Modified patch by Adam Preuss.
This builds on the existing framework for block tracing, edge profiling and optimal edge profiling.
See -help-hidden for new flags.
For documentation, see the technical report "Implementation of Path Profiling..." in llvm.org/pubs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124515 91177308-0d34-0410-b5e6-96231b3b80d8
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124487 91177308-0d34-0410-b5e6-96231b3b80d8
a few loops accordingly. Should be no functional change.
This is a step for more accurate cost/benefit analysis of devirt/inlining
bonuses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124275 91177308-0d34-0410-b5e6-96231b3b80d8
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124183 91177308-0d34-0410-b5e6-96231b3b80d8
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124134 91177308-0d34-0410-b5e6-96231b3b80d8
computation, the Ancestor field is always set to the Parent, so we can remove
the explicit link entirely and merge the Parent and Ancestor fields. Instead of
checking for whether an ancestor exists for a node or not, we simply check
whether the node has already been processed. This is simpler if Compress is
inlined into Eval, so I did that as well.
This is about a 3% speedup running -domtree on test-suite + SPEC2000 & SPEC2006,
but it also opens up some opportunities for further improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124061 91177308-0d34-0410-b5e6-96231b3b80d8
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123417 91177308-0d34-0410-b5e6-96231b3b80d8
Add methods for accessing the (single) entry / exit edge of a region. If no such
edge exists, null is returned. Both accessors return the start block of the
corresponding edge. The edge can finally be formed by utilizing
Region::getEntry() or Region::getExit();
Contributed by: Andreas Simbuerger <simbuerg@fim.uni-passau.de>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123410 91177308-0d34-0410-b5e6-96231b3b80d8
void f(int* begin, int* end) { std::fill(begin, end, 0); }
which turns into a != exit expression where one pointer is
strided and (thanks to step #1) known to not overflow, and
the other is loop invariant.
The observation here is that, though the IV is strided by
4 in this case, that the IV *has* to become equal to the
end value. It cannot "miss" the end value by stepping over
it, because if it did, the strided IV expression would
eventually wrap around.
Handle this by turning A != B into "A-B != 0" where the A-B
part is known to be NUW.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123131 91177308-0d34-0410-b5e6-96231b3b80d8
a pointer value has potentially become escaping. Implementations can choose to either fall back to
conservative responses for that value, or may recompute their analysis to accomodate the change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122777 91177308-0d34-0410-b5e6-96231b3b80d8
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).
No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122728 91177308-0d34-0410-b5e6-96231b3b80d8
compile, and everyone's tests have shown it to be slower in practice, even for
quite large graphs.
I also hope to do an optimization that is only correct with the simpler data
structure, which would break this even further.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122684 91177308-0d34-0410-b5e6-96231b3b80d8
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
for each vertex. However, this is unnecessary, because each vertex is only
placed into a single bucket (that of its semidominator), and each vertex's
bucket is processed before it is added to any bucket itself.
Instead of using a bucket per vertex, we use a single array Buckets that has two
purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
the index of the first element in V's bucket. After V's bucket is processed,
Buckets[i] stores the index of the next element in the bucket to which V now
belongs, if any.
Reading from the buckets can also be optimized. Instead of processing the bucket
of V's parent at the end of processing V, we process the bucket of V itself at
the beginning of processing V. This means that the case of the root vertex can
be simplified somewhat. It also means that we don't need to look up the DFS
number of the semidominator of every node in the bucket we are processing,
since we know it is the current index being processed.
This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
larger speedups of around 12% on the larger benchmarks like GCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122680 91177308-0d34-0410-b5e6-96231b3b80d8
AliasAnalysis consumers, PartialAlias will be treated as MayAlias.
For AliasAnalysis chaining, MayAlias says "procede to the next analysis".
PartialAlias will be used to indicate that the query should terminate,
even though it didn't reach MustAlias or NoAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121507 91177308-0d34-0410-b5e6-96231b3b80d8