that it was leaving in loops after rotation (between the original latch
block and the original header.
With this change, it is possible for rotated loops to have just a single
basic block, which is useful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123075 91177308-0d34-0410-b5e6-96231b3b80d8
1. Rip out LoopRotate's domfrontier updating code. It isn't
needed now that LICM doesn't use DF and it is super complex
and gross.
2. Make DomTree updating code a lot simpler and faster. The
old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
SplitCriticalEdge instead of doing an overcomplex
reimplementation of it.
No behavior change, except for the name of the inserted preheader.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123072 91177308-0d34-0410-b5e6-96231b3b80d8
they all ready do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.
The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123064 91177308-0d34-0410-b5e6-96231b3b80d8
Add a unnamed_addr bit to global variables and functions. This will be used
to indicate that the address is not significant and therefore the constant
or function can be merged with others.
If an optimization pass can show that an address is not used, it can set this.
Examples of things that can have this set by the FE are globals created to
hold string literals and C++ constructors.
Adding unnamed_addr to a non-const global should have no effect unless
an optimization can transform that global into a constant.
Aliases are not allowed to have unnamed_addr since I couldn't figure
out any use for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123063 91177308-0d34-0410-b5e6-96231b3b80d8
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.
Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123060 91177308-0d34-0410-b5e6-96231b3b80d8
1. Take a flags argument instead of a bool. This makes
it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
more efficient. For lookup failures, don't drop null values
into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
and LoopUnswitch, kill it.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123058 91177308-0d34-0410-b5e6-96231b3b80d8
map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
ssa updator manipulation for values with nothing to update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123057 91177308-0d34-0410-b5e6-96231b3b80d8
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.
This allows memory instructions to be moved around INLINEASM instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123044 91177308-0d34-0410-b5e6-96231b3b80d8
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123034 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122995 91177308-0d34-0410-b5e6-96231b3b80d8
We were never generating any of these nodes with variable indices, and there
was one legalizer function asserting on a non-constant index. If we ever have
a need to support variable indices, we can add this back again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122993 91177308-0d34-0410-b5e6-96231b3b80d8
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122955 91177308-0d34-0410-b5e6-96231b3b80d8
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122952 91177308-0d34-0410-b5e6-96231b3b80d8
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)
to "ret i64 1000". This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array. This
allows us to transform it into a memset with a constant size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122950 91177308-0d34-0410-b5e6-96231b3b80d8
into a separate function, so that it can be called from a loop using a worklist
rather than a loop traversing a whole basic block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122943 91177308-0d34-0410-b5e6-96231b3b80d8
This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.
Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.
The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.
This is a work in progress and still not hooked up to RegAllocGreedy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122938 91177308-0d34-0410-b5e6-96231b3b80d8
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122936 91177308-0d34-0410-b5e6-96231b3b80d8
beginning of the "main" function. The assembler complains about the invalid
suffix for the 'call' instruction. The right instruction is "callq __main".
Patch by KS Sreeram!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122933 91177308-0d34-0410-b5e6-96231b3b80d8
hasBlockValue() that was causing iterator invalidations. Many thanks to Dimitry Andric for
tracking down those invalidations!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122906 91177308-0d34-0410-b5e6-96231b3b80d8
step is to only process instructions in subloops if they have been modified by
an earlier simplification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122869 91177308-0d34-0410-b5e6-96231b3b80d8
skipping them, but it should probably use a worklist and only revisit those
instructions in subloops that have actually changed. It should probably also
use a worklist after the first iteration like instsimplify now does. Regardless,
it's only 0.3% of opt -O2 time on 403.gcc if it replaces the instcombine placed
in the middle of the loop passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122868 91177308-0d34-0410-b5e6-96231b3b80d8
fewer things into the value numbering maps, but any speedup is beneath the noise threshold on my machine
on 403.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122844 91177308-0d34-0410-b5e6-96231b3b80d8
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122832 91177308-0d34-0410-b5e6-96231b3b80d8
case where a static caller is itself inlined everywhere else, and
thus may go away if it doesn't get too big due to inlining other
things into it. If there are references to the caller other than
calls, it will not be removed; account for this.
This results in same-day completion of the case in PR8853.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122821 91177308-0d34-0410-b5e6-96231b3b80d8
avoids adding them to the various value numbering tables, resulting in a minor (~3%) speedup for GVN
on 40.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122819 91177308-0d34-0410-b5e6-96231b3b80d8
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122806 91177308-0d34-0410-b5e6-96231b3b80d8
instruction *after* the store. The store will always be deleted
if the transformation kicks in, so we'd do an N^2 scan of every
loop block. Whoops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122805 91177308-0d34-0410-b5e6-96231b3b80d8
FunctionPass. It probably doesn't have a reason to be a LoopPass, as it will
probably drop the simple fixed point and either use RPO iteration or Duncan's
approach in instsimplify of only revisiting instructions that have changed.
The next step is to preserve LoopSimplify. This looks like it won't be too hard,
although the pass manager doesn't actually seem to respect when non-loop passes
claim to preserve LCSSA or LoopSimplify. This will have to be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122791 91177308-0d34-0410-b5e6-96231b3b80d8
prologue and epilogue if the adjustment is 8. Similarly, use pushl / popl if
the adjustment is 4 in 32-bit mode.
In the epilogue, takes care to pop to a caller-saved register that's not live
at the exit (either return or tailcall instruction).
rdar://8771137
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122783 91177308-0d34-0410-b5e6-96231b3b80d8
a pointer value has potentially become escaping. Implementations can choose to either fall back to
conservative responses for that value, or may recompute their analysis to accomodate the change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122777 91177308-0d34-0410-b5e6-96231b3b80d8
that are allowed to have metadata operands are intrinsic calls,
and the only ones that take metadata currently return void.
Just reject all void instructions, which should not be value
numbered anyway. To future proof things, add an assert to the
getHashValue impl for calls to check that metadata operands
aren't present.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122759 91177308-0d34-0410-b5e6-96231b3b80d8
nested values, so they can change and drop to null, which can
change the hash and cause havok.
It turns out that it isn't a good idea to value number stuff
with metadata operands anyway, so... don't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122758 91177308-0d34-0410-b5e6-96231b3b80d8
capacity on the Visited SmallPtrSet. On 403.gcc, this is about a 4.5% speedup of
CodeGenPrepare time (which itself is 10% of time spent in the backend).
This is progress towards PR8889.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122741 91177308-0d34-0410-b5e6-96231b3b80d8
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).
No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122728 91177308-0d34-0410-b5e6-96231b3b80d8
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122727 91177308-0d34-0410-b5e6-96231b3b80d8
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122719 91177308-0d34-0410-b5e6-96231b3b80d8
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122712 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122710 91177308-0d34-0410-b5e6-96231b3b80d8
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122707 91177308-0d34-0410-b5e6-96231b3b80d8
blocks in a loop, instead of just the header block. This makes it more
aggressive, able to handle Duncan's Ada examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122704 91177308-0d34-0410-b5e6-96231b3b80d8
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was
*just* a tree and didn't have DFS numbers. Checking DFS numbers is faster
and easier than "limiting the search of the tree".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122702 91177308-0d34-0410-b5e6-96231b3b80d8
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122695 91177308-0d34-0410-b5e6-96231b3b80d8
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122685 91177308-0d34-0410-b5e6-96231b3b80d8
size of a loop header instead of its own code size estimator.
This allows it to handle bitcasts etc more precisely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122681 91177308-0d34-0410-b5e6-96231b3b80d8
maintains the guarantee that the DenseSet expects two elements it contains to
not go from inequal to equal under its nose.
As a side-effect, this also lets us switch from iterating to a fixed-point to
actually maintaining a work queue of functions to look at again, and we don't
add thunks to our work queue so we don't need to detect and ignore them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122677 91177308-0d34-0410-b5e6-96231b3b80d8
earlyclobber stuff. This should fix PRs 2313 and 8157.
Unfortunately, no testcase, since it'd be dependent on register
assignments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122663 91177308-0d34-0410-b5e6-96231b3b80d8
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122654 91177308-0d34-0410-b5e6-96231b3b80d8
process those instructions that define phi sources. This is a 47% speedup of
StrongPHIElimination compile time on 403.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122627 91177308-0d34-0410-b5e6-96231b3b80d8
we are only interested in the defs when discovering interferences.
This is a 28% speedup running StrongPHIElimination on 403.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122596 91177308-0d34-0410-b5e6-96231b3b80d8
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:
1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.
2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.
3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.
I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.
I also haven't yet run any tests of the quality of the generated code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122582 91177308-0d34-0410-b5e6-96231b3b80d8
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122581 91177308-0d34-0410-b5e6-96231b3b80d8