bits known clear in the result and don't care about the # casts
eliminated. TD is also dead but keeping it for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@93098 91177308-0d34-0410-b5e6-96231b3b80d8
1) don't try to optimize a sext or zext that is only used by a trunc, let
the trunc get optimized first. This avoids some pointless effort in
some common cases since instcombine scans down a block in the first pass.
2) Change the cost model for zext elimination to consider an 'and' cheaper
than a zext. This allows us to do it more aggressively, and for the next
patch to simplify the code quite a bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@93097 91177308-0d34-0410-b5e6-96231b3b80d8
commonIntCastTransforms into the callers, eliminating a switch,
and allowing the static predicate methods to be moved down to
live next to the corresponding function. No functionality
change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@93089 91177308-0d34-0410-b5e6-96231b3b80d8
that feeds into a zext, similar to the patch I did yesterday for sext.
There is a lot of room for extension beyond this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92962 91177308-0d34-0410-b5e6-96231b3b80d8
to an element of a vector in a static ctor) which occurs with an
unrelated patch I'm testing. Annoyingly, EvaluateStoreInto basically
does exactly the same stuff as InsertElement constant folding, but it
now handles vectors, and you can't insertelement into a vector. It
would be 'really nice' if GEP into a vector were not legal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92889 91177308-0d34-0410-b5e6-96231b3b80d8
phi nodes when deciding which pointers point to local memory.
I actually checked long ago how useful this is, and it isn't
very: it hardly ever fires in the testsuite, but since Chris
wants it here it is!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92836 91177308-0d34-0410-b5e6-96231b3b80d8
memcpy, memset and other intrinsics that only access their arguments
to be readnone if the intrinsic's arguments all point to local memory.
This improves the testcase in the README to readonly, but it could in
theory be made readnone, however this would involve more sophisticated
analysis that looks through the memcpy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92829 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, instcombine would only promote an expression tree to
the larger type if doing so eliminated two casts. This is because
a need to manually do the sign extend after the promoted expression
tree with two shifts. Now, we keep track of whether the result of
the computation is going to be properly sign extended already. If
so, we can unconditionally promote the expression, which allows us
to zap more sext's.
This implements rdar://6598839 (aka gcc pr38751)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92815 91177308-0d34-0410-b5e6-96231b3b80d8
The only difference is that EvaluateInDifferentType checks to ensure
they are profitable before doing them :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92788 91177308-0d34-0410-b5e6-96231b3b80d8
RecursivelyDeleteDeadPHINode, and DeleteDeadPHIs return a flag
indicating whether they made any changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92732 91177308-0d34-0410-b5e6-96231b3b80d8
dyn_castNotVal in the X+~X transform. dyn_castNotVal is
dramatic overkill for what the xform needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92704 91177308-0d34-0410-b5e6-96231b3b80d8
Eliminate the 'AddMaskingAnd' transformation, it is redundant with this
more general code right below it:
// A+B --> A|B iff A and B have no bits set in common.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92693 91177308-0d34-0410-b5e6-96231b3b80d8
that got instantiated. There is no reason for instcombine
to try this hard for simple associative optimizations. Next
up, eliminate the template completely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92692 91177308-0d34-0410-b5e6-96231b3b80d8
when doing this transform if the GEP is not inbounds. No testcase because
it is very difficult to trigger this: instcombine already canonicalizes
GEP indices to pointer size, so it relies specific permutations of the
instcombine worklist.
Thanks to Duncan for pointing this possible problem out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92495 91177308-0d34-0410-b5e6-96231b3b80d8
on the example in PR4216. This doesn't trigger in the testsuite,
so I'd really appreciate someone scrutinizing the logic for
correctness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92458 91177308-0d34-0410-b5e6-96231b3b80d8
occurs in 403.gcc in mode_mask_array, in safe-ctype.c (which
is copied in multiple apps) in _sch_istable, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92427 91177308-0d34-0410-b5e6-96231b3b80d8
when a consequtive sequence of elements all satisfies the
predicate. Like the double compare case, this generates better
code than the magic constant case and generalizes to more than
32/64 element array lookups.
Here are some examples where it triggers. From 403.gcc, most
accesses to the rtx_class array are handled, e.g.:
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=547]
%142 = icmp eq i8 %141, 105
@rtx_class = constant [153 x i8] c"xxxxxmmmmmmmmxxxxxxxxxxxxmxxxxxxiiixxxxxxxxxxxxxxxxxxxooxooooooxxoooooox3x2c21c2222ccc122222ccccaaaaaa<<<<<<<<<<<<<<<<<<111111111111bbooxxxxxxxxxxcc2211x", align 32 ; <[153 x i8]*> [#uses=543]
%165 = icmp eq i8 %164, 60
Also, most of the 59-element arrays (mode_class/rid_to_yy, etc)
optimized before are actually range compares. This lets 32-bit
machines optimize them.
400.perlbmk has stuff like this:
400.perlbmk: PL_regkind, even for 32-bit:
@PL_regkind = constant [62 x i8] c"\00\00\02\02\02\06\06\06\06\09\09\0B\0B\0D\0E\0E\0E\11\12\12\14\14\16\16\18\18\1A\1A\1C\1C\1E\1F !!!$$&'((((,-.///88886789:;8$", align 32 ; <[62 x i8]*> [#uses=4]
%811 = icmp ne i8 %810, 33
@PL_utf8skip = constant [256 x i8] c"\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\01\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\02\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\03\04\04\04\04\04\04\04\04\05\05\05\05\06\06\07\0D", align 32 ; <[256 x i8]*> [#uses=94]
%12 = icmp ult i8 %10, 2
etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92426 91177308-0d34-0410-b5e6-96231b3b80d8
two elements match or don't match with two comparisons. For
example, the testcase compiles into:
define i1 @test5(i32 %X) {
%1 = icmp eq i32 %X, 2 ; <i1> [#uses=1]
%2 = icmp eq i32 %X, 7 ; <i1> [#uses=1]
%R = or i1 %1, %2 ; <i1> [#uses=1]
ret i1 %R
}
This generalizes the previous xforms when the array is larger than
64 elements (and this case matches) and generates better code for
cases where it overlaps with the magic bitshift case.
This generalizes more cases than you might expect. For example,
400.perlbmk has:
@PL_utf8skip = constant [256 x i8] c"\01\01\01\...
%15 = icmp ult i8 %7, 7
403.gcc has:
@rid_to_yy = internal constant [114 x i16] [i16 259, i16 260, ...
%18 = icmp eq i16 %16, 295
and xalancbmk has a bunch of examples, such as
_ZN11xercesc_2_5L15gCombiningCharsE and _ZN11xercesc_2_5L10gBaseCharsE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92417 91177308-0d34-0410-b5e6-96231b3b80d8
arrays with variable indices into a comparison of the index
with a constant. The most common occurrence of this that
I see by far is stuff like:
if ("foobar"[i] == '\0') ...
which we compile into: if (i == 6), saving a load and
materialization of the global address. This also exposes
loop trip count information to later passes in many cases.
This triggers hundreds of times in xalancbmk, which is where I first
noticed it, but it also triggers in many other apps. Here are a few
interesting ones from various apps:
@must_be_connected_without = internal constant [8 x i8*] [i8* getelementptr inbounds ([3 x i8]* @.str64320, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str27283, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str71327, i64 0, i64 0), i8* getelementptr inbounds ([4 x i8]* @.str72328, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str18274, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8]* @.str11267, i64 0, i64 0), i8* getelementptr inbounds ([3 x i8]* @.str32288, i64 0, i64 0), i8* null], align 32 ; <[8 x i8*]*> [#uses=2]
%scevgep.i = getelementptr [8 x i8*]* @must_be_connected_without, i64 0, i64 %indvar.i ; <i8**> [#uses=1]
%17 = load ...
%18 = icmp eq i8* %17, null ; <i1> [#uses=1]
-> icmp eq i64 %indvar.i, 7
@yytable1095 = internal constant [84 x i8] c"\12\01(\05\06\07\08\09\0A\0B\0C\0D\0E1\0F\10\11266\1D: \10\11,-,0\03'\10\11B6\04\17&\18\1945\05\06\07\08\09\0A\0B\0C\0D\0E\1E\0F\10\11*\1A\1B\1C$3+>#%;<IJ=ADFEGH9KL\00\00\00C", align 32 ; <[84 x i8]*> [#uses=2]
%57 = getelementptr inbounds [84 x i8]* @yytable1095, i64 0, i64 %56 ; <i8*> [#uses=1]
%mode.0.in = getelementptr inbounds [9 x i32]* @mb_mode_table, i64 0, i64 %.pn ; <i32*> [#uses=1]
load ...
%64 = icmp eq i8 %58, 4 ; <i1> [#uses=1]
-> icmp eq i64 %.pn, 35 ; <i1> [#uses=0]
@gsm_DLB = internal constant [4 x i16] [i16 6554, i16 16384, i16 26214, i16 32767]
%scevgep.i = getelementptr [4 x i16]* @gsm_DLB, i64 0, i64 %indvar.i ; <i16*> [#uses=1]
%425 = load %scevgep.i
%426 = icmp eq i16 %425, -32768 ; <i1> [#uses=0]
-> false
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92411 91177308-0d34-0410-b5e6-96231b3b80d8
pointer to int casts that confuse later optimizations. See PR3351
for details.
This improves but doesn't complete fix 483.xalancbmk because llvm-gcc
does this xform in GCC's "fold" routine as well. Clang++ will do
better I guess.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92408 91177308-0d34-0410-b5e6-96231b3b80d8
a constantexpr gep on the 'base' side of the expression.
This completes comment #4 in PR3351, which comes from
483.xalancbmk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92402 91177308-0d34-0410-b5e6-96231b3b80d8
positive and negative forms of constants together. This
allows us to compile:
int foo(int x, int y) {
return (x-y) + (x-y) + (x-y);
}
into:
_foo: ## @foo
subl %esi, %edi
leal (%rdi,%rdi,2), %eax
ret
instead of (where the 3 and -3 were not factored):
_foo:
imull $-3, 8(%esp), %ecx
imull $3, 4(%esp), %eax
addl %ecx, %eax
ret
this started out as:
movl 12(%ebp), %ecx
imull $3, 8(%ebp), %eax
subl %ecx, %eax
subl %ecx, %eax
subl %ecx, %eax
ret
This comes from PR5359.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92381 91177308-0d34-0410-b5e6-96231b3b80d8
getMDKindID/getMDKindNames methods to LLVMContext (and add
convenience methods to Module), eliminating MetadataContext.
Move the state that it maintains out to LLVMContext.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92259 91177308-0d34-0410-b5e6-96231b3b80d8
I asked Devang to do back on Sep 27. Instead of going through the
MetadataContext class with methods like getMD() and getMDs(), just
ask the instruction directly for its metadata with getMetadata()
and getAllMetadata().
This includes a variety of other fixes and improvements: previously
all Value*'s were bloated because the HasMetadata bit was thrown into
value, adding a 9th bit to a byte. Now this is properly sunk down to
the Instruction class (the only place where it makes sense) and it
will be folded away somewhere soon.
This also fixes some confusion in getMDs and its clients about
whether the returned list is indexed by the MDID or densely packed.
This is now returned sorted and densely packed and the comments make
this clear.
This introduces a number of fixme's which I'll follow up on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92235 91177308-0d34-0410-b5e6-96231b3b80d8
non-templated IRBuilderBase class. Move that large CreateGlobalString
out of line, eliminating the need to #include GlobalVariable.h in IRBuilder.h
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92227 91177308-0d34-0410-b5e6-96231b3b80d8
SDISel. This optimization was causing simplifylibcalls to
introduce type-unsafe nastiness. This is the first step, I'll be
expanding the memcmp optimizations shortly, covering things that
we really really wouldn't want simplifylibcalls to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@92098 91177308-0d34-0410-b5e6-96231b3b80d8
load is needed when we have a small store into a large alloca (at which
point we get a load/insert/store sequence), but when you do a full-sized
store, this load ends up being dead.
This dead load is bad in really large nasty testcases where the load ends
up causing mem2reg to insert large chains of dependent phi nodes which only
ADCE can delete. Instead of doing this, just don't insert the dead load.
This fixes rdar://6864035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91917 91177308-0d34-0410-b5e6-96231b3b80d8
missing check that an array reference doesn't go past the end of the array,
and remove some redundant checks for in-bound array and vector references
that are no longer needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91897 91177308-0d34-0410-b5e6-96231b3b80d8
by merging all returns in a function into a single one, but simplifycfg
currently likes to duplicate the return (an unfortunate choice!)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91890 91177308-0d34-0410-b5e6-96231b3b80d8
instead of stored. This reduces memdep memory usage, and also eliminates a bunch of
weakvh's. This speeds up gvn on gcc.c-torture/20001226-1.c from 23.9s to 8.45s (2.8x)
on a different machine than earlier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91885 91177308-0d34-0410-b5e6-96231b3b80d8
load to avoid even messing around with SSAUpdate at all. In this case (which
is very common, we can just use the input value directly).
This speeds up GVN time on gcc.c-torture/20001226-1.c from 36.4s to 16.3s,
which still isn't great, but substantially better and this is a simple speedup
that applies to lots of different cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91851 91177308-0d34-0410-b5e6-96231b3b80d8
two-element arrays. After restructuring the SROA code, it was not safe to
do this without adding more checking. It is not clear that this special-case
has really been useful, and removing this simplifies the code quite a bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91828 91177308-0d34-0410-b5e6-96231b3b80d8
'GetValueInMiddleOfBlock' case, instead of inserting
duplicates.
A similar fix is almost certainly needed by the machine-level
SSAUpdate implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91820 91177308-0d34-0410-b5e6-96231b3b80d8
implement some optimizations for MIN(MIN()) and MAX(MAX()) and
MIN(MAX()) etc. This substantially improves the code in PR5822 but
doesn't kick in much elsewhere. 2 max's were optimized in
pairlocalalign and one in smg2000.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91814 91177308-0d34-0410-b5e6-96231b3b80d8
Use the presence of NSW/NUW to fold "icmp (x+cst), x" to a constant in
cases where it would otherwise be undefined behavior.
Surprisingly (to me at least), this triggers hundreds of the times in
a few benchmarks: lencode, ldecode, and 466.h264ref seem to *really*
like this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91812 91177308-0d34-0410-b5e6-96231b3b80d8
a bunch in lencode, ldecod, spass, 176.gcc, 252.eon, among others. It is
also the first part of PR5822
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91811 91177308-0d34-0410-b5e6-96231b3b80d8
where instcombine would have to split a critical edge due to a
phi node of an invoke. Since instcombine can't change the CFG,
it has to bail out from doing the transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91763 91177308-0d34-0410-b5e6-96231b3b80d8
* change FindElementAndOffset to return a uint64_t instead of unsigned, and
to identify the type to be used for that result in a GEP instruction.
* move "isa<ConstantInt>" to be first in conditional.
* replace some dyn_casts with casts.
* add a comment about handling mem intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91762 91177308-0d34-0410-b5e6-96231b3b80d8
bootstrap. This also replaces the WeakVH references that Chris objected to
with normal Value references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91711 91177308-0d34-0410-b5e6-96231b3b80d8
contains another loop, or an instruction. The loop form is
substantially more efficient on large loops than the typical
code it replaces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91654 91177308-0d34-0410-b5e6-96231b3b80d8
of 91296 that caused trouble -- the Processed list needs to be
preserved for the livetime of the pass, as AddUsersIfInteresting
is called from other passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91641 91177308-0d34-0410-b5e6-96231b3b80d8
to memcpy. (Such a memcpy is technically illegal, but in practice is safe
and is generated by struct self-assignment in C code.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91621 91177308-0d34-0410-b5e6-96231b3b80d8
problem", this broke llvm-gcc bootstrap for release builds on
x86_64-apple-darwin10.
This reverts commit db22309800b224a9f5f51baf76071d7a93ce59c9.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91534 91177308-0d34-0410-b5e6-96231b3b80d8
found last time. Instead of trying to modify the IR while iterating over it,
I've change it to keep a list of WeakVH references to dead instructions, and
then delete those instructions later. I also added some special case code to
detect and handle the situation when both operands of a memcpy intrinsic are
referencing the same alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91459 91177308-0d34-0410-b5e6-96231b3b80d8
isPodLike type trait. This is a generally useful type trait for
more than just DenseMap, and we really care about whether something
acts like a pod, not whether it really is a pod.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91421 91177308-0d34-0410-b5e6-96231b3b80d8
While scanning through the uses of an alloca, keep track of the current offset
relative to the start of the alloca, and check memory references to see if
the offset & size correspond to a component within the alloca. This has the
nice benefit of unifying much of the code from isSafeUseOfAllocation,
isSafeElementUse, and isSafeUseOfBitCastedAllocation. The code to rewrite
the uses of a promoted alloca, after it is determined to be safe, is
reorganized in the same way.
Also, when rewriting GEP instructions, mark them as "in-bounds" since all the
indices are known to be safe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91184 91177308-0d34-0410-b5e6-96231b3b80d8
value size. This only manifested when memdep inprecisely returns clobber,
which is do to a caching issue in the PR5744 testcase. We can 'efficiently
emulate' this by using '-no-aa'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@91004 91177308-0d34-0410-b5e6-96231b3b80d8
clobbers to forward pieces of large stores to small loads, we need to consider
the properly phi translated pointer in the store block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90978 91177308-0d34-0410-b5e6-96231b3b80d8
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90926 91177308-0d34-0410-b5e6-96231b3b80d8
I'm not aware that this does anything significant on its own, but it's
needed for another patch that I'm working on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@90864 91177308-0d34-0410-b5e6-96231b3b80d8