With this fix we now successfully extract all 149 loops from 256.bzip2 without
crashing or miscompiling the program!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12493 91177308-0d34-0410-b5e6-96231b3b80d8
1. Names were not put on the new arguments created (ok, this just helps sanity :)
2. Fix outgoing pointer values
3. Do not insert stores for values that had not been computed
4. Fix some wierd problems with the outset calculation
This fixes CodeExtractor/2004-03-14-DominanceProblem.ll, making the extractor
work on at least one simple case!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12484 91177308-0d34-0410-b5e6-96231b3b80d8
Simplify the input/output finder. All elements of a basic block are
instructions. Any used arguments are also inputs. An instruction can only
be used by another instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12405 91177308-0d34-0410-b5e6-96231b3b80d8
* Don't insert a branch to the switch instruction after the call, just
make it a single block.
* Insert the new alloca instructions in the entry block of the original
function instead of having them execute dynamically
* Don't make the default edge of the switch instruction go back to the switch.
The loop extractor shouldn't create new loops!
* Give meaningful names to the alloca slots and the reload instructions
* Some minor code simplifications
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12402 91177308-0d34-0410-b5e6-96231b3b80d8
This also implements a two minor improvements:
* Don't insert live-out stores IN the region, insert them on the code path
that exits the region
* If the region is exited to the same block from multiple paths, share the
switch statement entry, live-out store code, and the basic block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12401 91177308-0d34-0410-b5e6-96231b3b80d8
a member of the class. While we're at it, turn the collection into a set
instead of a vector to improve efficiency and make queries simpler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12400 91177308-0d34-0410-b5e6-96231b3b80d8
loop information won't see it, and we could have unreachable blocks pointing to
the non-header node of blocks in a natural loop. This isn't tidy, so have the
loopsimplify pass clean it up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12380 91177308-0d34-0410-b5e6-96231b3b80d8
On the testcase from GCC PR12440, which has a LOT of loops (1392 of which require
preheaders to be inserted), this speeds up the loopsimplify pass from 1.931s to
0.1875s. The loop in question goes from 1.65s -> 0.0097s, which isn't bad. All of
these times are a debug build.
This adds a dependency on DominatorTree analysis that was not there before, but
we always had dominatortree available anyway, because LICM requires both loop
simplify and DT, so this doesn't add any extra analysis in practice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12362 91177308-0d34-0410-b5e6-96231b3b80d8
function, as long as the loop isn't the only one in that function. This should
help debugging passes easier with BugPoint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11936 91177308-0d34-0410-b5e6-96231b3b80d8
This case occurs many times in various benchmarks, especially when combined
with the previous patch. This allows it to get stuff like:
if (X == 4 || X == 3)
if (X == 5 || X == 8)
and
switch (X) {
case 4: case 5: case 6:
if (X == 4 || X == 5)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11797 91177308-0d34-0410-b5e6-96231b3b80d8
allowed in invoke instructions. Thus, if we are inlining a call to an intrinsic
function into an invoke site, we don't need to turn the call into an invoke!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11384 91177308-0d34-0410-b5e6-96231b3b80d8
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11307 91177308-0d34-0410-b5e6-96231b3b80d8
This causes the JIT, or LLC'd program to print out a nice message, explaining
WHY the program aborted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11184 91177308-0d34-0410-b5e6-96231b3b80d8
The problem is that the dominator update code didn't "realize" that it's
possible for the newly inserted basic block to dominate anything. Because
it IS possible, stuff was getting updated wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11137 91177308-0d34-0410-b5e6-96231b3b80d8
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11118 91177308-0d34-0410-b5e6-96231b3b80d8
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11116 91177308-0d34-0410-b5e6-96231b3b80d8
before we delete the original call site, allowing slight simplifications of
code, but nothing exciting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11109 91177308-0d34-0410-b5e6-96231b3b80d8
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticable performance impact, but paves the way for
future improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11108 91177308-0d34-0410-b5e6-96231b3b80d8
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7 -> 3.38s on the PR209.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11107 91177308-0d34-0410-b5e6-96231b3b80d8
process them all as a group. This speeds up SRoA/mem2reg from 28.46s to
0.62s on the testcase from PR209.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11100 91177308-0d34-0410-b5e6-96231b3b80d8
Added assert() to ensure symbol table is well formed.
Added code to remember the value that was found; resolving types can change
the symbol table and invalidate the value of the iterator.
Added comments to the ResolveTypes() function (mainly for my own benefit).
Please feel free to correct the comments if they are not accurate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@9693 91177308-0d34-0410-b5e6-96231b3b80d8
PHI node entries for unwind instructions just like for call instructions which
became invokes! This fixes PR57, tested by
Inline/2003-10-26-InlineInvokeExceptionDestPhi.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@9526 91177308-0d34-0410-b5e6-96231b3b80d8
break dominance relationships, and is otherwise bad. This fixes bug:
Inline/2003-10-13-AllocaDominanceProblem.ll. This also fixes miscompilation
of 3 176.gcc source files (reload1.c, global.c, flow.c)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@9109 91177308-0d34-0410-b5e6-96231b3b80d8
have a SINGLE backedge. This is useful to, for example, the -indvars pass.
This implements testcase LoopSimplify/single-backedge.ll and closes PR#34
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@9065 91177308-0d34-0410-b5e6-96231b3b80d8
Running the inliner on 252.eon used to take 48.4763s, now it takes 14.4148s.
In release mode, it went from taking 25.8741s to taking 11.5712s.
This also fixes a FIXME.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8890 91177308-0d34-0410-b5e6-96231b3b80d8
"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this doesn't even insert
the 120 PHI nodes that it used to which were trivially dead (in the process
of promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed, now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we speed things up a
total of 3.5179s, which is a 24.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8884 91177308-0d34-0410-b5e6-96231b3b80d8
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming
2. It does not insert dead phi nodes for them which would have to
subsequently be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8881 91177308-0d34-0410-b5e6-96231b3b80d8
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in _ZN7mrScene4ReadERSi
function used to be > 3.5s. Now it is < .075s. The function has a LOT of
allocas in it, so it appeared to be infinite looping, this should make it much
nicer. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8863 91177308-0d34-0410-b5e6-96231b3b80d8
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8861 91177308-0d34-0410-b5e6-96231b3b80d8
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8858 91177308-0d34-0410-b5e6-96231b3b80d8
* Eliminate the KillList instance variable, instead, just delete loads and
stores as they are "renamed", and delete allocas when they are done
* Make the 'visited' set an instance variable to avoid passing it on the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8857 91177308-0d34-0410-b5e6-96231b3b80d8
... by making sure to update PHI nodes to take into consideration the
extra edges we get if we inline a call instruction through an invoke.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@8664 91177308-0d34-0410-b5e6-96231b3b80d8