LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.
There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.
The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206745 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.
I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths). There's a small
chance that this will appease the failing bots [1][2]. (If so, great!)
If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing. Once I've triggered those builds, I'll
revert both commits so the bots go green again.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
<rdar://problem/14292693>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206704 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r206666, as planned.
Still stumped on why the bots are failing. Sanitizer bots haven't
turned anything up. If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer. (In the meantime, I'll be
auditing my patch for undefined behaviour.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206677 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r206628, reapplying r206622 (and r206626).
Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.
So, I'm applying this again to get more feedback from the bots. I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206666 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r206622 and the MSVC fixup in r206626.
Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206628 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.
In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally). I'm hoping the mystery
is solved? I'll revert this again if the tests are still failing
remotely.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206622 91177308-0d34-0410-b5e6-96231b3b80d8
Reality is that we're never going to copy one of these. Supporting this
was becoming a nightmare because nothing even causes it to compile most
of the time. Lots of subtle errors built up that wouldn't have been
caught by any "normal" testing.
Also, make the move assignment actually work rather than the bogus swap
implementation that would just infloop if used. As part of that, factor
out the graph pointer updates into a helper to share between move
construction and move assignment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206583 91177308-0d34-0410-b5e6-96231b3b80d8
LazyCallGraph. This is the start of the whole point of this different
abstraction, but it is just the initial bits. Here is a run-down of
what's going on here. I'm planning to incorporate some (or all) of this
into comments going forward, hopefully with better editing and wording.
=]
The crux of the problem with the traditional way of building SCCs is
that they are ephemeral. The new pass manager however really needs the
ability to associate analysis passes and results of analysis passes with
SCCs in order to expose these analysis passes to the SCC passes. Making
this work is kind-of the whole point of the new pass manager. =]
So, when we're building SCCs for the call graph, we actually want to
build persistent nodes that stick around and can be reasoned about
later. We'd also like the ability to walk the SCC graph in more complex
ways than just the traditional postorder traversal of the current CGSCC
walk. That means that in addition to being persistent, the SCCs need to
be connected into a useful graph structure.
However, we still want the SCCs to be formed lazily where possible.
These constraints are quite hard to satisfy with the SCC iterator. Also,
using that would bypass our ability to actually add data to the nodes of
the call graph to facilite implementing the Tarjan walk. So I've
re-implemented things in a more direct and embedded way. This
immediately makes it easy to get the persistence and connectivity
correct, and it also allows leveraging the existing nodes to simplify
the algorithm. I've worked somewhat to make this implementation more
closely follow the traditional paper's nomenclature and strategy,
although it is still a bit obtuse because it isn't recursive, using
an explicit stack and a tail call instead, and it is interruptable,
resuming each time we need another SCC.
The other tricky bit here, and what actually took almost all the time
and trials and errors I spent building this, is exactly *what* graph
structure to build for the SCCs. The naive thing to build is the call
graph in its newly acyclic form. I wrote about 4 versions of this which
did precisely this. Inevitably, when I experimented with them across
various use cases, they became incredibly awkward. It was all
implementable, but it felt like a complete wrong fit. Square peg, round
hole. There were two overriding aspects that pushed me in a different
direction:
1) We want to discover the SCC graph in a postorder fashion. That means
the root node will be the *last* node we find. Using the call-SCC DAG
as the graph structure of the SCCs results in an orphaned graph until
we discover a root.
2) We will eventually want to walk the SCC graph in parallel, exploring
distinct sub-graphs independently, and synchronizing at merge points.
This again is not helped by the call-SCC DAG structure.
The structure which, quite surprisingly, ended up being completely
natural to use is the *inverse* of the call-SCC DAG. We add the leaf
SCCs to the graph as "roots", and have edges to the caller SCCs. Once
I switched to building this structure, everything just fell into place
elegantly.
Aside from general cleanups (there are FIXMEs and too few comments
overall) that are still needed, the other missing piece of this is
support for iterating across levels of the SCC graph. These will become
useful for implementing #2, but they aren't an immediate priority.
Once SCCs are in good shape, I'll be working on adding mutation support
for incremental updates and adding the pass manager that this analysis
enables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206581 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.
The old implementation had a fundamental flaw: precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).
The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue). The old analysis gives non-sensical results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
---- Block Freqs ----
entry = 1.0
for.cond1.preheader = 1.00103
for.cond4.preheader = 5.5222
for.body6 = 18095.19995
for.inc8 = 4.52264
for.inc11 = 0.00109
for.end13 = 0.0
The new analysis gives correct results:
Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
block-frequency-info: nested_loops
- entry: float = 1.0, int = 8
- for.cond1.preheader: float = 4001.0, int = 32007
- for.cond4.preheader: float = 16008001.0, int = 128064007
- for.body6: float = 64048012001.0, int = 512384096007
- for.inc8: float = 16008001.0, int = 128064007
- for.inc11: float = 4001.0, int = 32007
- for.end13: float = 1.0, int = 8
Most importantly, the frequency leaving each loop matches the frequency
entering it.
The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss. I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.
The new algorithm should generally have a complexity advantage over the
old. The previous algorithm was quadratic in the worst case. The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.
The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution. Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes. Mass is then distributed through the function, which is
now a DAG. Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.
There are some remaining flaws.
- Irreducible control flow isn't modelled correctly. LoopInfo and
MachineLoopInfo ignore irreducible edges, so this algorithm will
fail to scale accordingly. There's a note in the class
documentation about how to get closer. See also the comments in
test/Analysis/BlockFrequencyInfo/irreducible.ll.
- Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
the 64-bit integer precision used downstream.
- The "bias" calculation proposed on llvmdev is *not* incorporated
here. This will be added in a follow-up commit, once comments from
this review have been handled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206548 91177308-0d34-0410-b5e6-96231b3b80d8
to a more normal move operation on the graph itself. The definition
already got removed, but I missed the declaration.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206455 91177308-0d34-0410-b5e6-96231b3b80d8
This will become necessary to build up the SCC iterators and SCC
definitions. Moving it now so that subsequent diffs are incremental.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206454 91177308-0d34-0410-b5e6-96231b3b80d8
graph. This simplifies the custom move constructor operation to one of
walking the graph and updating the 'up' pointers to point to the new
location of the graph. Switch the nodes from a reference to a pointer
for the 'up' edge to facilitate this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206450 91177308-0d34-0410-b5e6-96231b3b80d8
It doesn't work. I'm still cleaning up all the places where I blindly
followed this pattern. There are more to come in this code too.
As a benefit, this lets the default copy and move operations Just Work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206375 91177308-0d34-0410-b5e6-96231b3b80d8
Moves redundant template parameters into an implementation detail of
BlockFrequencyInfoImpl.
No functionality change.
<rdar://problem/14292693>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206084 91177308-0d34-0410-b5e6-96231b3b80d8
This is a shared implementation class for BlockFrequencyInfo and
MachineBlockFrequencyInfo, not for BlockFrequency, a related (but
distinct) class.
No functionality change.
<rdar://problem/14292693>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206083 91177308-0d34-0410-b5e6-96231b3b80d8
Also updated as many loops as I could find using df_begin/idf_begin -
strangely I found no uses of idf_begin. Is that just used out of tree?
Also a few places couldn't use df_begin because either they used the
member functions of the depth first iterators or had specific ordering
constraints (I added a comment in the latter case).
Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T>
where you needed iterator_range<idf_iterator<T>>)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206016 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for an upcoming commit implementing unrolling preferences for
x86, this adds additional fields to the UnrollingPreferences structure:
- PartialThreshold and PartialOptSizeThreshold - Like Threshold and
OptSizeThreshold, but used when not fully unrolling. These are necessary
because we need different thresholds for full unrolling from those used when
partially unrolling (the full unrolling thresholds are generally going to be
larger).
- MaxCount - A cap on the unrolling factor when partially unrolling. This can
be used by a target to prevent the unrolled loop from exceeding some
resource limit independent of the loop size (such as number of branches).
There should be no functionality change for any in-tree targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205347 91177308-0d34-0410-b5e6-96231b3b80d8
Implement Pass::releaseMemory() in BlockFrequencyInfo and
MachineBlockFrequencyInfo. Just delete the private implementation when
not in use. Switch to a std::unique_ptr to make the logic more clear.
<rdar://problem/14292693>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204741 91177308-0d34-0410-b5e6-96231b3b80d8
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204435 91177308-0d34-0410-b5e6-96231b3b80d8
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.
Related to <rdar://problem/16381500>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204389 91177308-0d34-0410-b5e6-96231b3b80d8
as well. I don't see any particular need but it imposes no cost to
support it and it makes the API cleaner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203448 91177308-0d34-0410-b5e6-96231b3b80d8
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.
This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203437 91177308-0d34-0410-b5e6-96231b3b80d8
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203083 91177308-0d34-0410-b5e6-96231b3b80d8
to ensure we don't mess up any of the overrides. Necessary for cleaning
up the Value use iterators and enabling range-based traversing of use
lists.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202958 91177308-0d34-0410-b5e6-96231b3b80d8
a bit surprising, as the class is almost entirely abstracted away from
any particular IR, however it encodes the comparsion predicates which
mutate ranges as ICmp predicate codes. This is reasonable as they're
used for both instructions and constants. Thus, it belongs in the IR
library with instructions and constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202838 91177308-0d34-0410-b5e6-96231b3b80d8
this would have been required because of the use of DataLayout, but that
has moved into the IR proper. It is still required because this folder
uses the constant folding in the analysis library (which uses the
datalayout) as the more aggressive basis of its folder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202832 91177308-0d34-0410-b5e6-96231b3b80d8
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202821 91177308-0d34-0410-b5e6-96231b3b80d8
business.
This header includes Function and BasicBlock and directly uses the
interfaces of both classes. It has to do with the IR, it even has that
in the name. =] Put it in the library it belongs to.
This is one step toward making LLVM's Support library survive a C++
modules bootstrap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202814 91177308-0d34-0410-b5e6-96231b3b80d8