simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123442 91177308-0d34-0410-b5e6-96231b3b80d8
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123441 91177308-0d34-0410-b5e6-96231b3b80d8
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123417 91177308-0d34-0410-b5e6-96231b3b80d8
This is a minor extension of SROA to handle a special case that is
important for some ARM NEON operations. Some of the NEON intrinsics
return multiple values, which are handled as struct types containing
multiple elements of the same vector type. The corresponding return
types declared in the arm_neon.h header have equivalent arrays. We
need SROA to recognize that it can split up those arrays and structs
into separate vectors, even though they are not always accessed with
the same type. SROA already handles loads and stores of an entire
alloca by using insertvalue/extractvalue to access the individual
pieces, and that code works the same regardless of whether the type
is a struct or an array. So, all that needs to be done is to check
for compatible arrays and homogeneous structs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123381 91177308-0d34-0410-b5e6-96231b3b80d8
SROA only split up structs and arrays one level at a time, so padding can
only cause trouble if it is located in between the struct or array elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123380 91177308-0d34-0410-b5e6-96231b3b80d8
is "X != 0 -> X" when X is a boolean. This occurs a lot because of the way
llvm-gcc converts gcc's conditional expressions. Add this, and a few other
similar transforms for completeness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123372 91177308-0d34-0410-b5e6-96231b3b80d8
point values to their integer representation through the SSE intrinsic
calls. This is the last part of a README.txt entry for which I have real
world examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123206 91177308-0d34-0410-b5e6-96231b3b80d8
larger memsets. Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:
_mad_synth_mute: ## @mad_synth_mute
## BB#0: ## %entry
pushq %rax
movl $4096, %esi ## imm = 0x1000
callq ___bzero
popq %rax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123089 91177308-0d34-0410-b5e6-96231b3b80d8
1. Rip out LoopRotate's domfrontier updating code. It isn't
needed now that LICM doesn't use DF and it is super complex
and gross.
2. Make DomTree updating code a lot simpler and faster. The
old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
SplitCriticalEdge instead of doing an overcomplex
reimplementation of it.
No behavior change, except for the name of the inserted preheader.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123072 91177308-0d34-0410-b5e6-96231b3b80d8
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.
Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123060 91177308-0d34-0410-b5e6-96231b3b80d8
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123034 91177308-0d34-0410-b5e6-96231b3b80d8
ret i64 ptrtoint (i8* getelementptr ([1000 x i8]* @X, i64 1, i64 sub (i64 0, i64 ptrtoint ([1000 x i8]* @X to i64))) to i64)
to "ret i64 1000". This allows us to correctly compute the trip count
on a loop in PR8883, which occurs with std::fill on a char array. This
allows us to transform it into a memset with a constant size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122950 91177308-0d34-0410-b5e6-96231b3b80d8
when safe.
The testcase is basically this nested loop:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
which gets turned into a single memset now. clang -O3 doesn't optimize
this yet though due to a phase ordering issue I haven't analyzed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122806 91177308-0d34-0410-b5e6-96231b3b80d8
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122727 91177308-0d34-0410-b5e6-96231b3b80d8
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy. Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122712 91177308-0d34-0410-b5e6-96231b3b80d8
blocks in a loop, instead of just the header block. This makes it more
aggressive, able to handle Duncan's Ada examples.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122704 91177308-0d34-0410-b5e6-96231b3b80d8
in the PR, the pass could break LCSSA form when inserting preheaders. It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122695 91177308-0d34-0410-b5e6-96231b3b80d8
header for now for memset/memcpy opportunities. It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
loops" into 2 basic block loops that loop-idiom was ignoring.
With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:
for (j=0; j<MAX_history; ++j) {
history_new[i][j+1] = history[2*i][j];
}
Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine. Woo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122685 91177308-0d34-0410-b5e6-96231b3b80d8
numbering, in which it considers (for example) "%a = add i32 %x, %y" and
"%b = add i32 %x, %y" to be equal because the operands are equal and the
result of the instructions only depends on the values of the operands.
This has almost no effect (it removes 4 instructions from gcc-as-one-file),
and perhaps slows down compilation: I measured a 0.4% slowdown on the large
gcc-as-one-file testcase, but it wasn't statistically significant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122654 91177308-0d34-0410-b5e6-96231b3b80d8
the original instruction, half the cases were missed (making it not
wrong but suboptimal). Also correct a typo (A <-> B) in the second
chunk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122414 91177308-0d34-0410-b5e6-96231b3b80d8
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122399 91177308-0d34-0410-b5e6-96231b3b80d8
I still think that LVI should be handling this, but that capability is some ways off in the future,
and this matters for some significant benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122378 91177308-0d34-0410-b5e6-96231b3b80d8
not assume this (for example in case more transforms get added below
it). Suggested by Frits van Bommel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122332 91177308-0d34-0410-b5e6-96231b3b80d8
quite often, but don't make much difference in practice presumably because
instcombine also knows them and more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122328 91177308-0d34-0410-b5e6-96231b3b80d8
a couple of existing transforms. This fires surprisingly often, for
example when compiling gcc "(X+(-1))+1->X" fires quite a lot as well
as various "and" simplifications (usually with a phi node operand).
Most of the time this doesn't make a real difference since the same
thing would have been done elsewhere anyway, eg: by instcombine, but
there are a few places where this results in simplifications that we
were not doing before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122326 91177308-0d34-0410-b5e6-96231b3b80d8
(they had just been forgotten before). Adding Xor causes "main" in the
existing testcase 2010-11-01-lshr-mask.ll to be hugely more simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122245 91177308-0d34-0410-b5e6-96231b3b80d8
argument. The generated alloca has to have at least the alignment of the
byval, if not, the client may be making assumptions that the new alloca won't
satisfy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122234 91177308-0d34-0410-b5e6-96231b3b80d8
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7: ## @test7
addq %rsi, %rdi
cmpq %rdi, %rsi
movl $42, %eax
cmovaq %rsi, %rax
ret
Now we get:
_test7: ## @test7
addq %rsi, %rdi
movl $42, %eax
cmovbq %rsi, %rax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122182 91177308-0d34-0410-b5e6-96231b3b80d8
sadd formed is half the size of the original type. We can
now compile this into a sadd.i8:
unsigned char X(char a, char b) {
int res = a+b;
if ((unsigned )(res+128) > 255U)
abort();
return res;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122178 91177308-0d34-0410-b5e6-96231b3b80d8
checking to see if the high bits of the original add result were dead.
Inserting a smaller add and zexting back to that size is not good enough.
This is likely to be the fix for 8816.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122177 91177308-0d34-0410-b5e6-96231b3b80d8
which have trapping constant exprs in them due to PHI nodes.
Eliminating them can cause the constant expr to be evalutated
on new paths if the input edges are critical.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122164 91177308-0d34-0410-b5e6-96231b3b80d8
on the DragonEgg self-host bot. Unfortunately, the testcase is pretty messy and doesn't reduce well due to
interactions with other parts of InstCombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122072 91177308-0d34-0410-b5e6-96231b3b80d8
a null endptr argument, because they may write to errno.
This fixes a seflhost miscompile observed on Linux targets when TBAA
was enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122014 91177308-0d34-0410-b5e6-96231b3b80d8
dragonegg self-host buildbot. Original commit message:
Add an InstCombine transform to recognize instances of manual overflow-safe addition
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121965 91177308-0d34-0410-b5e6-96231b3b80d8
(performing the addition in a wider type and explicitly checking for overflow), and
fold them down to intrinsics. This currently only supports signed-addition, but could
be generalized if someone works out the magic constant formulas for other operations.
Fixes <rdar://problem/8558713>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121905 91177308-0d34-0410-b5e6-96231b3b80d8
When it sees a promising select it now tries to figure out whether the condition of the select is known in any of the predecessors and if so it maps the operands appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121859 91177308-0d34-0410-b5e6-96231b3b80d8
which is simpler than finding a place to insert in BB.
- Don't perform the 'if condition hoisting' xform on certain
i1 PHIs, as it interferes with switch formation.
This re-fixes "example 7", without breaking the world hopefully.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121764 91177308-0d34-0410-b5e6-96231b3b80d8
first, it can kick in on blocks whose conditions have been
folded to a constant, even though one of the edges will be
trivially folded.
second, it doesn't clean up the "if diamond" that it just
eliminated away. This is a problem because other simplifycfg
xforms kick in depending on the order of block visitation,
causing pointless work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121762 91177308-0d34-0410-b5e6-96231b3b80d8
when simplifying, allowing them to be eagerly turned into switches. This
is the last step required to get "Example 7" from this blog post:
http://blog.regehr.org/archives/320
On X86, we now generate this machine code, which (to my eye) seems better
than the ICC generated code:
_crud: ## @crud
## BB#0: ## %entry
cmpb $33, %dil
jb LBB0_4
## BB#1: ## %switch.early.test
addb $-34, %dil
cmpb $58, %dil
ja LBB0_3
## BB#2: ## %switch.early.test
movzbl %dil, %eax
movabsq $288230376537592865, %rcx ## imm = 0x400000017001421
btq %rax, %rcx
jb LBB0_4
LBB0_3: ## %lor.rhs
xorl %eax, %eax
ret
LBB0_4: ## %lor.end
movl $1, %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121690 91177308-0d34-0410-b5e6-96231b3b80d8
(x & 2^n) ? 2^m+C : C
we can offset both arms by C to get the "(x & 2^n) ? 2^m : 0" form, optimize the
select to a shift and apply the offset afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121609 91177308-0d34-0410-b5e6-96231b3b80d8
(if available) as we go so that we get simple constantexprs not insane ones.
This fixes the failure of clang/test/CodeGenCXX/virtual-base-ctor.cpp
that the previous iteration of this patch had.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121111 91177308-0d34-0410-b5e6-96231b3b80d8
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120974 91177308-0d34-0410-b5e6-96231b3b80d8
20040709-1.c from the gcc testsuite. I was using the size of a
pointer instead of the pointee. This fixes rdar://8713376
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120519 91177308-0d34-0410-b5e6-96231b3b80d8
may-aliasing stores that partially overlap with different base
pointers. This implements PR6043 and the non-variable part of
PR8657
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120485 91177308-0d34-0410-b5e6-96231b3b80d8
now that DSE hacks on them. This fixes a regression I introduced,
by generalizing DSE to hack on transfers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120445 91177308-0d34-0410-b5e6-96231b3b80d8
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120406 91177308-0d34-0410-b5e6-96231b3b80d8
contains "ref".
Enhance DSE to use a modref query instead of a store-specific hack
to generalize the "ignore may-alias stores" optimization to handle
memset and memcpy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120368 91177308-0d34-0410-b5e6-96231b3b80d8