%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, -1
%and = and i64 %sub, %shr
ret i64 %and
to:
%shr = lshr i64 %key, 3
%0 = load i64* %val, align 8
%sub = add i64 %0, 2305843009213693951
%and = and i64 %sub, %shr
ret i64 %and
The demanded bit optimization is actually a pessimization because add -1 would
be codegen'ed as a sub 1. Teach the demanded constant shrinking optimization
to check for negated constant to make sure it is actually reducing the width
of the constant.
rdar://11793464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160101 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes ~70 lines in InstCombineLoadStoreAlloca.cpp and makes both functions a bit more aggressive than before :)
In theory, we can be more aggressive when removing an alloca than a malloc, because an alloca pointer should never escape, but we are not taking advantage of this anyway
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159952 91177308-0d34-0410-b5e6-96231b3b80d8
This means we can do cheap DSE for heap memory.
Nothing is done if the pointer excapes or has a load.
The churn in the tests is mostly due to objectsize, since we want to make sure we
don't delete the malloc call before evaluating the objectsize (otherwise it becomes -1/0)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159876 91177308-0d34-0410-b5e6-96231b3b80d8
another mechanical change accomplished though the power of terrible Perl
scripts.
I have manually switched some "s to 's to make escaping simpler.
While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.
Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159547 91177308-0d34-0410-b5e6-96231b3b80d8
This was done through the aid of a terrible Perl creation. I will not
paste any of the horrors here. Suffice to say, it require multiple
staged rounds of replacements, state carried between, and a few
nested-construct-parsing hacks that I'm not proud of. It happens, by
luck, to be able to deal with all the TCL-quoting patterns in evidence
in the LLVM test suite.
If anyone is maintaining large out-of-tree test trees, feel free to poke
me and I'll send you the steps I used to convert things, as well as
answer any painful questions etc. IRC works best for this type of thing
I find.
Once converted, switch the LLVM lit config to use ShTests the same as
Clang. In addition to being able to delete large amounts of Python code
from 'lit', this will also simplify the entire test suite and some of
lit's architecture.
Finally, the test suite runs 33% faster on Linux now. ;]
For my 16-hardware-thread (2x 4-core xeon e5520): 36s -> 24s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159525 91177308-0d34-0410-b5e6-96231b3b80d8
// C - zext(bool) -> bool ? C - 1 : C
if (ZExtInst *ZI = dyn_cast<ZExtInst>(Op1))
if (ZI->getSrcTy()->isIntegerTy(1))
return SelectInst::Create(ZI->getOperand(0), SubOne(C), C);
This ends up forming sext i1 instructions that codegen to terrible code. e.g.
int blah(_Bool x, _Bool y) {
return (x - y) + 1;
}
=>
movzbl %dil, %eax
movzbl %sil, %ecx
shll $31, %ecx
sarl $31, %ecx
leal 1(%rax,%rcx), %eax
ret
Without the rule, llvm now generates:
movzbl %sil, %ecx
movzbl %dil, %eax
incl %eax
subl %ecx, %eax
ret
It also helps with ARM (and pretty much any target that doesn't have a sext i1 :-).
The transformation was done as part of Eli's r75531. He has given the ok to
remove it.
rdar://11748024
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159230 91177308-0d34-0410-b5e6-96231b3b80d8
merge all zero-sized alloca's into one, fixing c43204g from the Ada ACATS
conformance testsuite. What happened there was that a variable sized object
was being allocated on the stack, "alloca i8, i32 %size". It was then being
passed to another function, which tested that the address was not null (raising
an exception if it was) then manipulated %size bytes in it (load and/or store).
The optimizers cleverly managed to deduce that %size was zero (congratulations
to them, as it isn't at all obvious), which made the alloca zero size, causing
the optimizers to replace it with null, which then caused the check mentioned
above to fail, and the exception to be raised, wrongly. Note that no loads
and stores were actually being done to the alloca (the loop that does them is
executed %size times, i.e. is not executed), only the not-null address check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159202 91177308-0d34-0410-b5e6-96231b3b80d8
- simplifycfg: invoke undef/null -> unreachable
- instcombine: invoke new -> invoke expect(0, 0) (an arbitrary NOOP intrinsic; only done if the allocated memory is unused, of course)
- verifier: allow invoke of intrinsics (to make the previous step work)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159146 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR5997.
These transforms were disabled because codegen couldn't deal with other
uses of trunc(x). This is now handled by the peephole pass.
This causes no regressions on x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159003 91177308-0d34-0410-b5e6-96231b3b80d8
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158919 91177308-0d34-0410-b5e6-96231b3b80d8
This saves a cast, and zext is more expensive on platforms with subreg support
than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750.
On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
same performance now when not inlining either function.
stupid_memchr: 323.0us
bsd_memchr: 321.0us
memchr: 479.0us
where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When
inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time,
I haven't fully understood the issue yet, something is grossly mangling the
loop after inlining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158297 91177308-0d34-0410-b5e6-96231b3b80d8
-%a + 42
into
42 - %a
previously we were emitting:
-(%a + 42)
This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
Will work on that next
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158237 91177308-0d34-0410-b5e6-96231b3b80d8
The test case feeds the following into InstCombine's visitSelect:
%tobool8 = icmp ne i32 0, 0
%phitmp = select i1 %tobool8, i32 3, i32 0
Then instcombine replaces the right side of the switch with 0, doesn't notice
that nothing changes and tries again indefinitely.
This fixes PR12897.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157587 91177308-0d34-0410-b5e6-96231b3b80d8
add an additional parameter to InstCombiner::EmitGEPOffset() to force it to *not* emit operations with NUW flag
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156585 91177308-0d34-0410-b5e6-96231b3b80d8
refactor code a bit to enable future changes to support run-time information
add support to compute allocation sizes at run-time if penalty > 1 (e.g., malloc(x), calloc(x, y), and VLAs)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156515 91177308-0d34-0410-b5e6-96231b3b80d8
<rdar://problem/11291436>.
This is a second attempt at a fix for this, the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155866 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Defer some shl transforms to DAGCombine.
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155362 91177308-0d34-0410-b5e6-96231b3b80d8
While the patch was perfect and defect free, it exposed a really nasty
bug in X86 SelectionDAG that caused an llc crash when compiling lencod.
I'll put the patch back in after fixing the SelectionDAG problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155181 91177308-0d34-0410-b5e6-96231b3b80d8
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155136 91177308-0d34-0410-b5e6-96231b3b80d8
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154285 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
overflow checking multiply intrinsic as well.
Add a test for this, updating the test from grep to FileCheck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153028 91177308-0d34-0410-b5e6-96231b3b80d8
alignment. If that's the case, then we want to make sure that we don't increase
the alignment of the store instruction. Because if we increase it to be "more
aligned" than the pointer, code-gen may use instructions which require a greater
alignment than the pointer guarantees.
<rdar://problem/11043589>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152907 91177308-0d34-0410-b5e6-96231b3b80d8
The 'CmpInst::isFalseWhenEqual' function returns 'false' for values other than
simply equality. For instance, it returns 'false' for <= or >=. This isn't the
correct behavior for this transformation, which is checking for strict equality
and non-equality. It was causing the gcc.c-torture/execute/frame-address.c test
to fail because it would completely (and incorrectly) optimize a whole function
into a 'ret i32 0'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152497 91177308-0d34-0410-b5e6-96231b3b80d8
by using llvm::isIdentifiedObject. Also teach it to handle GEPs that have
the same base pointer and constant operands. Fixes PR11238!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151449 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation is not safe in some pathological cases (signed icmp of pointers should be an
extremely rare thing, but it's valid IR!). Add an explanatory comment.
Kudos to Duncan for pointing out this edge case (and not giving up explaining it until I finally got it).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151055 91177308-0d34-0410-b5e6-96231b3b80d8
- Ignore pointer casts.
- Also expand GEPs that aren't constantexprs when they have one use or only constant indices.
- We now compile "&foo[i] - &foo[j]" into "i - j".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@150961 91177308-0d34-0410-b5e6-96231b3b80d8
Changing arguments from being passed as fixed to varargs is unsafe, as
the ABI may require they be handled differently (stack vs. register, for
example).
Remove two tests which rely on the bitcast being folded into the direct
call, which is exactly the transformation that's unsafe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149457 91177308-0d34-0410-b5e6-96231b3b80d8
Unfortunately I also had to disable constant-pool-sharing.ll the code it tests has been
updated to use the IL logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149148 91177308-0d34-0410-b5e6-96231b3b80d8
We still save an instruction when just the "and" part is replaced.
Also change the code to match comments more closely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147753 91177308-0d34-0410-b5e6-96231b3b80d8
This was intended to undo the sub canonicalization in cases where it's not profitable, but it also
finds some cases on it's own.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147256 91177308-0d34-0410-b5e6-96231b3b80d8
unsigned foo(unsigned x) { return 31 - __builtin_clz(x); }
now compiles into a single "bsrl" instruction on x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147255 91177308-0d34-0410-b5e6-96231b3b80d8
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147254 91177308-0d34-0410-b5e6-96231b3b80d8
I followed three heuristics for deciding whether to set 'true' or
'false':
- Everything target independent got 'true' as that is the expected
common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
set the flag in the way that exercises the most of codegen. For most
architectures this is also the likely path from a GCC builtin, with
'true' being set. It will (eventually) require lowering away that
difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
operation should be tested.
Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146370 91177308-0d34-0410-b5e6-96231b3b80d8
weak variable are compiled by different compilers, such as GCC and LLVM, while
LLVM may increase the alignment to the preferred alignment there is no reason to
think that GCC will use anything more than the ABI alignment. Since it is the
GCC version that might end up in the final program (as the linkage is weak), it
is wrong to increase the alignment of loads from the global up to the preferred
alignment as the alignment might only be the ABI alignment.
Increasing alignment up to the ABI alignment might be OK, but I'm not totally
convinced that it is. It seems better to just leave the alignment of weak
globals alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145413 91177308-0d34-0410-b5e6-96231b3b80d8
trampoline forms. Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145174 91177308-0d34-0410-b5e6-96231b3b80d8