ConstantFoldInstruction on calls) by avoiding Value::getName(). getName() constructs
and returns an std::string, which does heap allocation stuff. This slightly speeds up
instcombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40924 91177308-0d34-0410-b5e6-96231b3b80d8
1. domtree is a tree, not a graph. There is no need to avoid revisiting nodes with a set.
2. the worklist can contain the child iterator pointers so we don't get N^2 rescanning of children.
This speeds up updateDFSNumbers significantly, making it basically free. On the testcase in PR1432,
this speeds up loopsimplify by another 3x, dropping it from the 12th most expensive pass to the to
the 30th. :) It used to be #1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40923 91177308-0d34-0410-b5e6-96231b3b80d8
natural loop canonicalization (which does many cfg xforms) by 4.3x, for
example. This also fixes a bug in postdom dfnumber computation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40920 91177308-0d34-0410-b5e6-96231b3b80d8
- Fix some minor bugs related to special markers on val# def. ~0U means
undefined, ~1U means dead val#.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40916 91177308-0d34-0410-b5e6-96231b3b80d8
kill instruction #, and source register number (iff the value# is defined by a
copy).
- Now def instruction # is set for every value#, not just for copy defined ones.
- Update some outdated code related inactive live ranges.
- Kill info not yet set. That's next patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40913 91177308-0d34-0410-b5e6-96231b3b80d8
SSE mode (all but conversions <-> other FP types, I think):
>>Do not mark all-80-bit operations as "Requires[FPStack]"
(which really means "not SSE").
>>Refactor load-and-extend to facilitate this.
>>Update comments.
>>Handle long double in SSE when computing FP_REG_KILL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40906 91177308-0d34-0410-b5e6-96231b3b80d8
Last x87 bits for full functionality (not
thoroughly tested, and long doubles do not work
in SSE modes at all - use -mcpu=i486 for now)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40886 91177308-0d34-0410-b5e6-96231b3b80d8
2. Make domtree printing print dfin/dfout #'s
3. Fix the Transforms/LoopSimplify/2004-04-13-LoopSimplifyUpdateDomFrontier.ll failure from last night (in DominanceFrontier::splitBlock).
w.r.t. #3, my patches last night happened to expose the bug, but this
has been broken since Owen's r35839 patch to LoopSimplify. The code
was subsequently moved over from LoopSimplify into Dominators, carrying
the latent bug. Fun stuff.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40858 91177308-0d34-0410-b5e6-96231b3b80d8
This shrinks it down to something small. On the testcase
from PR1432, this speeds up instcombine from 0.7959s to 0.5000s,
(59%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40840 91177308-0d34-0410-b5e6-96231b3b80d8
which dynamically allocates the string result. This speeds up dse on the
testcase from PR1432 from 0.3781s to 0.1804s (2.1x).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40838 91177308-0d34-0410-b5e6-96231b3b80d8
contents of the set were small, deallocate and shrink the set. This
avoids having us to memset as much data, significantly speeding up
some pathological cases. For example, this speeds up the verifier
from 0.3899s to 0.0763 (5.1x) on the testcase from PR1432 in a
release build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40837 91177308-0d34-0410-b5e6-96231b3b80d8
speeds up idom by about 45% and postidom by about 33%.
Some extra precautions must be taken not to invalidate densemap iterators.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40827 91177308-0d34-0410-b5e6-96231b3b80d8
DenseMap instead of an std::map. This speeds up postdomtree
by about 25% and domtree by about 23%. It also speeds up clients,
for example, domfrontier by 11%, mem2reg by 4% and ADCE by 6%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40826 91177308-0d34-0410-b5e6-96231b3b80d8
In the old way, we computed and inserted phi nodes for the whole IDF of
the definitions of the alloca, then computed which ones were dead and
removed them.
In the new method, we first compute the region where the value is live,
and use that information to only insert phi nodes that are live. This
eliminates the need to compute liveness later, and stops the algorithm
from inserting a bunch of phis which it then later removes.
This speeds up the testcase in PR1432 from 2.00s to 0.15s (14x) in a
release build and 6.84s->0.50s (14x) in a debug build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40825 91177308-0d34-0410-b5e6-96231b3b80d8
to the worklist, and handling the last one with a 'tail call'. This speeds
up PR1432 from 2.0578s to 2.0012s (2.8%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40822 91177308-0d34-0410-b5e6-96231b3b80d8
faster than with the 'local to a block' fastpath. This speeds
up PR1432 from 2.1232 to 2.0686s (2.6%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40818 91177308-0d34-0410-b5e6-96231b3b80d8
to increment NumLocalPromoted, and didn't actually delete the
dead alloca, leading to an extra iteration of mem2reg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40817 91177308-0d34-0410-b5e6-96231b3b80d8
stored value was a non-instruction value. Doh.
This increase the # single store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17 to 2.13s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40813 91177308-0d34-0410-b5e6-96231b3b80d8
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
block from the UsingBlocks vector. Not doing so forces us to go through the slow
path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432. This speeds it up from 2.90s to 2.20s (31%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40811 91177308-0d34-0410-b5e6-96231b3b80d8
a using block from the list if we handle it. Not doing this caused us
to not be able to promote (with the fast path) allocas which have uses (whoops).
This increases the # allocas hitting this fastpath from 4042 to 8935 on the
testcase in PR1432, speeding up mem2reg by 2.6x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40809 91177308-0d34-0410-b5e6-96231b3b80d8
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select, and llvm.ct* intrinsics. They are automatically upgraded by both the LLVM ASM reader and the bitcode reader. The test cases have been updated, with special tests added to ensure the automatic upgrading is supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40807 91177308-0d34-0410-b5e6-96231b3b80d8
(I've tried to get the info right for all targets,
but I'm not expert on all of them - check yours.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40792 91177308-0d34-0410-b5e6-96231b3b80d8
Generalize isPSHUFDMask and add a unary SHUFPD pattern so that SHUFPD's
memory operand alignment can be tested as well, with a fix to avoid
breaking MMX's use of isPSHUFDMask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40756 91177308-0d34-0410-b5e6-96231b3b80d8