benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124487 91177308-0d34-0410-b5e6-96231b3b80d8
operand being factorized (and erased) could occur several times in Ops,
resulting in freed memory being used when the next occurrence in Ops was
analyzed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124287 91177308-0d34-0410-b5e6-96231b3b80d8
optimized code are:
(non-negative number)+(power-of-two) != 0 -> true
and
(x | 1) != 0 -> true
Instcombine knows about the second one of course, but only does it if X|1
has only one use. These fire thousands of times in the testsuite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124183 91177308-0d34-0410-b5e6-96231b3b80d8
occurs because instcombine sinks loads and inserts phis. This kicks in
on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in
spec2006 in some app that uses std::deque.
This resolves the last of rdar://7339113.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124090 91177308-0d34-0410-b5e6-96231b3b80d8
common cases. This triggers a surprising number of times in SPEC2K6
because min/max idioms end up doing this. For example, code from the
STL ends up looking like this to SRoA:
%202 = load i64* %__old_size, align 8, !tbaa !3
%203 = load i64* %__old_size, align 8, !tbaa !3
%204 = load i64* %__n, align 8, !tbaa !3
%205 = icmp ult i64 %203, %204
%storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
%206 = load i64* %storemerge.i, align 8, !tbaa !3
We can now promote both the __n and the __old_size allocas.
This addresses another chunk of rdar://7339113, poor codegen on
stringswitch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124088 91177308-0d34-0410-b5e6-96231b3b80d8
that have PHI or select uses of their element pointers. This can often happen
when instcombine sinks two loads into a successor, inserting a phi or select.
With this patch, we can scalarize the alloca, but the pinned elements are not
yet promoted. This is still a win for large aggregates where only one element
is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen
on stringswitch).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124070 91177308-0d34-0410-b5e6-96231b3b80d8
A == B, and A > B, does not mean we can fold it to true. We still need to
check for A ? B (A unordered B).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123993 91177308-0d34-0410-b5e6-96231b3b80d8
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123963 91177308-0d34-0410-b5e6-96231b3b80d8
While here, I'd like to complain about how vector is not an aggregate type
according to llvm::Type::isAggregateType(), but they're listed under aggregate
types in the LangRef and zero vectors are stored as ConstantAggregateZero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123956 91177308-0d34-0410-b5e6-96231b3b80d8
auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this). At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite. I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs. One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123911 91177308-0d34-0410-b5e6-96231b3b80d8
These were not recommended by my auto-simplifier since they don't fire often enough.
However they do fire from time to time, for example they remove one subtraction from
the final bitcode for 483.xalancbmk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123755 91177308-0d34-0410-b5e6-96231b3b80d8
simplification in fully optimized code. It occurs sporadically in the testsuite, and
many times in 403.gcc: the final bitcode has 131 fewer subtractions after this change.
The reason that the multiplies are not eliminated is the same reason that instcombine
did not catch this: they are used by other instructions (instcombine catches this with
a more general transform which in general is only profitable if the operands have only
one use).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123754 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes the original testcase in PR8927. It also causes a clang
binary built with a patched clang to increase in size by 0.21%.
We can probably get some of the size back by writing a pass that
detects that a global never has its pointer compared and adds
unnamed_addr to it (maybe extend global opt). It is also possible that
there are some other cases clang could add unnamed_addr to.
I will investigate extending globalopt next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123584 91177308-0d34-0410-b5e6-96231b3b80d8
then don't try to decimate it into its individual pieces. This will just make a mess of the
IR and is pointless if none of the elements are individually accessed. This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.
The testcase now is optimized to:
define i64 @test2(i64 %X) {
br label %L2
L2: ; preds = %0
ret i64 %X
}
before we generated:
define i64 @test2(i64 %X) {
%sroa.store.elt = lshr i64 %X, 56
%1 = trunc i64 %sroa.store.elt to i8
%sroa.store.elt8 = lshr i64 %X, 48
%2 = trunc i64 %sroa.store.elt8 to i8
%sroa.store.elt9 = lshr i64 %X, 40
%3 = trunc i64 %sroa.store.elt9 to i8
%sroa.store.elt10 = lshr i64 %X, 32
%4 = trunc i64 %sroa.store.elt10 to i8
%sroa.store.elt11 = lshr i64 %X, 24
%5 = trunc i64 %sroa.store.elt11 to i8
%sroa.store.elt12 = lshr i64 %X, 16
%6 = trunc i64 %sroa.store.elt12 to i8
%sroa.store.elt13 = lshr i64 %X, 8
%7 = trunc i64 %sroa.store.elt13 to i8
%8 = trunc i64 %X to i8
br label %L2
L2: ; preds = %0
%9 = zext i8 %1 to i64
%10 = shl i64 %9, 56
%11 = zext i8 %2 to i64
%12 = shl i64 %11, 48
%13 = or i64 %12, %10
%14 = zext i8 %3 to i64
%15 = shl i64 %14, 40
%16 = or i64 %15, %13
%17 = zext i8 %4 to i64
%18 = shl i64 %17, 32
%19 = or i64 %18, %16
%20 = zext i8 %5 to i64
%21 = shl i64 %20, 24
%22 = or i64 %21, %19
%23 = zext i8 %6 to i64
%24 = shl i64 %23, 16
%25 = or i64 %24, %22
%26 = zext i8 %7 to i64
%27 = shl i64 %26, 8
%28 = or i64 %27, %25
%29 = zext i8 %8 to i64
%30 = or i64 %29, %28
ret i64 %30
}
In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off. It's better to not generate this stuff
in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123571 91177308-0d34-0410-b5e6-96231b3b80d8
multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123568 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123526 91177308-0d34-0410-b5e6-96231b3b80d8
have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123524 91177308-0d34-0410-b5e6-96231b3b80d8
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use). Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123442 91177308-0d34-0410-b5e6-96231b3b80d8
threading of shifts over selects and phis while there. This fires here and
there in the testsuite, to not much effect. For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123441 91177308-0d34-0410-b5e6-96231b3b80d8