- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158919 91177308-0d34-0410-b5e6-96231b3b80d8
With this change, we avoid relying on the IR Builder to constant fold the operations.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158829 91177308-0d34-0410-b5e6-96231b3b80d8
This saves a cast, and zext is more expensive on platforms with subreg support
than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750.
On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the
same performance now when not inlining either function.
stupid_memchr: 323.0us
bsd_memchr: 321.0us
memchr: 479.0us
where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When
inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time,
I haven't fully understood the issue yet, something is grossly mangling the
loop after inlining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158297 91177308-0d34-0410-b5e6-96231b3b80d8
-%a + 42
into
42 - %a
previously we were emitting:
-(%a + 42)
This fixes the infinite loop in PR12338. The generated code is still not perfect, though.
Will work on that next
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158237 91177308-0d34-0410-b5e6-96231b3b80d8
The test case feeds the following into InstCombine's visitSelect:
%tobool8 = icmp ne i32 0, 0
%phitmp = select i1 %tobool8, i32 3, i32 0
Then instcombine replaces the right side of the switch with 0, doesn't notice
that nothing changes and tries again indefinitely.
This fixes PR12897.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157587 91177308-0d34-0410-b5e6-96231b3b80d8
move EmitGEPOffset from InstCombine to Transforms/Utils/Local.h
(a draft of this) patch reviewed by Andrew, thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157261 91177308-0d34-0410-b5e6-96231b3b80d8
add an additional parameter to InstCombiner::EmitGEPOffset() to force it to *not* emit operations with NUW flag
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156585 91177308-0d34-0410-b5e6-96231b3b80d8
refactor code a bit to enable future changes to support run-time information
add support to compute allocation sizes at run-time if penalty > 1 (e.g., malloc(x), calloc(x, y), and VLAs)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@156515 91177308-0d34-0410-b5e6-96231b3b80d8
<rdar://problem/11291436>.
This is a second attempt at a fix for this, the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155866 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Defer some shl transforms to DAGCombine.
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155362 91177308-0d34-0410-b5e6-96231b3b80d8
While the patch was perfect and defect free, it exposed a really nasty
bug in X86 SelectionDAG that caused an llc crash when compiling lencod.
I'll put the patch back in after fixing the SelectionDAG problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155181 91177308-0d34-0410-b5e6-96231b3b80d8
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155136 91177308-0d34-0410-b5e6-96231b3b80d8
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154285 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8
alignment. If that's the case, then we want to make sure that we don't increase
the alignment of the store instruction. Because if we increase it to be "more
aligned" than the pointer, code-gen may use instructions which require a greater
alignment than the pointer guarantees.
<rdar://problem/11043589>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152907 91177308-0d34-0410-b5e6-96231b3b80d8
Renamed methods caseBegin, caseEnd and caseDefault with case_begin, case_end, and case_default.
Added some notes relative to case iterators.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152532 91177308-0d34-0410-b5e6-96231b3b80d8
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120130/136146.html
Implemented CaseIterator and it solves almost all described issues: we don't need to mix operand/case/successor indexing anymore. Base iterator class is implemented as a template since it may be initialized either from "const SwitchInst*" or from "SwitchInst*".
ConstCaseIt is just a read-only iterator.
CaseIt is read-write iterator; it allows to change case successor and case value.
Usage of iterator allows totally remove resolveXXXX methods. All indexing convertions done automatically inside the iterator's getters.
Main way of iterator usage looks like this:
SwitchInst *SI = ... // intialize it somehow
for (SwitchInst::CaseIt i = SI->caseBegin(), e = SI->caseEnd(); i != e; ++i) {
BasicBlock *BB = i.getCaseSuccessor();
ConstantInt *V = i.getCaseValue();
// Do something.
}
If you want to convert case number to TerminatorInst successor index, just use getSuccessorIndex iterator's method.
If you want initialize iterator from TerminatorInst successor index, use CaseIt::fromSuccessorIndex(...) method.
There are also related changes in llvm-clients: klee and clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152297 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation is not safe in some pathological cases (signed icmp of pointers should be an
extremely rare thing, but it's valid IR!). Add an explanatory comment.
Kudos to Duncan for pointing out this edge case (and not giving up explaining it until I finally got it).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151055 91177308-0d34-0410-b5e6-96231b3b80d8
- Ignore pointer casts.
- Also expand GEPs that aren't constantexprs when they have one use or only constant indices.
- We now compile "&foo[i] - &foo[j]" into "i - j".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@150961 91177308-0d34-0410-b5e6-96231b3b80d8
The purpose of refactoring is to hide operand roles from SwitchInst user (programmer). If you want to play with operands directly, probably you will need lower level methods than SwitchInst ones (TerminatorInst or may be User). After this patch we can reorganize SwitchInst operands and successors as we want.
What was done:
1. Changed semantics of index inside the getCaseValue method:
getCaseValue(0) means "get first case", not a condition. Use getCondition() if you want to resolve the condition. I propose don't mix SwitchInst case indexing with low level indexing (TI successors indexing, User's operands indexing), since it may be dangerous.
2. By the same reason findCaseValue(ConstantInt*) returns actual number of case value. 0 means first case, not default. If there is no case with given value, ErrorIndex will returned.
3. Added getCaseSuccessor method. I propose to avoid usage of TerminatorInst::getSuccessor if you want to resolve case successor BB. Use getCaseSuccessor instead, since internal SwitchInst organization of operands/successors is hidden and may be changed in any moment.
4. Added resolveSuccessorIndex and resolveCaseIndex. The main purpose of these methods is to see how case successors are really mapped in TerminatorInst.
4.1 "resolveSuccessorIndex" was created if you need to level down from SwitchInst to TerminatorInst. It returns TerminatorInst's successor index for given case successor.
4.2 "resolveCaseIndex" converts low level successors index to case index that curresponds to the given successor.
Note: There are also related compatability fix patches for dragonegg, klee, llvm-gcc-4.0, llvm-gcc-4.2, safecode, clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149481 91177308-0d34-0410-b5e6-96231b3b80d8
Changing arguments from being passed as fixed to varargs is unsafe, as
the ABI may require they be handled differently (stack vs. register, for
example).
Remove two tests which rely on the bitcast being folded into the direct
call, which is exactly the transformation that's unsafe.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149457 91177308-0d34-0410-b5e6-96231b3b80d8
Problem: LLVM needs more function attributes than currently available (32 bits).
One such proposed attribute is "address_safety", which shows that a function is being checked for address safety (by AddressSanitizer, SAFECode, etc).
Solution:
- extend the Attributes from 32 bits to 64-bits
- wrap the object into a class so that unsigned is never erroneously used instead
- change "unsigned" to "Attributes" throughout the code, including one place in clang.
- the class has no "operator uint64 ()", but it has "uint64_t Raw() " to support packing/unpacking.
- the class has "safe operator bool()" to support the common idiom: if (Attributes attr = getAttrs()) useAttrs(attr);
- The CTOR from uint64_t is marked explicit, so I had to add a few explicit CTOR calls
- Add the new attribute "address_safety". Doing it in the same commit to check that attributes beyond first 32 bits actually work.
- Some of the functions from the Attribute namespace are worth moving inside the class, but I'd prefer to have it as a separate commit.
Tested:
"make check" on Linux (32-bit and 64-bit) and Mac (10.6)
built/run spec CPU 2006 on Linux with clang -O2.
This change will break clang build in lib/CodeGen/CGCall.cpp.
The following patch will fix it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148553 91177308-0d34-0410-b5e6-96231b3b80d8
We still save an instruction when just the "and" part is replaced.
Also change the code to match comments more closely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147753 91177308-0d34-0410-b5e6-96231b3b80d8
This was intended to undo the sub canonicalization in cases where it's not profitable, but it also
finds some cases on it's own.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147256 91177308-0d34-0410-b5e6-96231b3b80d8
This has the obvious advantage of being commutable and is always a win on x86 because
const - x wastes a register there. On less weird architectures this may lead to
a regression because other arithmetic doesn't fuse with it anymore. I'll address that
problem in a followup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147254 91177308-0d34-0410-b5e6-96231b3b80d8
"half precision" floating-point with a first-class type.
This patch adds basic IR support (but not codegen support).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146786 91177308-0d34-0410-b5e6-96231b3b80d8
combining of the landingpad instruction. The ObjC personality function acts
almost identically to the C++ personality function. In particular, it uses
"null" as a "catch-all" value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142256 91177308-0d34-0410-b5e6-96231b3b80d8
profile metadata at the same time. Use it to preserve metadata attached
to a branch when re-writing it in InstCombine.
Add metadata to the canonicalize_branch InstCombine test, and check that
it is tranformed correctly.
Reviewed by Nick Lewycky!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142168 91177308-0d34-0410-b5e6-96231b3b80d8
Just pull the instruction name, but don't change the order of anything
else. That keeps --debug happy and non-crashing, but doesn't change
how the worklist gets built.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141210 91177308-0d34-0410-b5e6-96231b3b80d8
When updating the worklist for InstCombine, the Add/AddUsersToWorklist
functions may access the instruction(s) being added, for debug output for
example. If the instructions aren't yet added to the basic block, this
can result in a crash. Finish the instruction transformation before
adjusting the worklist instead.
rdar://10238555
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141203 91177308-0d34-0410-b5e6-96231b3b80d8
catch or repeated filter clauses. Teach instcombine a bunch
of tricks for simplifying landingpad clauses. Currently the
code only recognizes the GNU C++ and Ada personality functions,
but that doesn't stop it doing a bunch of "generic" transforms
which are hopefully fine for any real-world personality function.
If these "generic" transforms turn out not to be generic, they
can always be conditioned on the personality function. Probably
someone should add the ObjC++ personality function. I didn't as
I don't know anything about it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140852 91177308-0d34-0410-b5e6-96231b3b80d8
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC. While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function. To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function. Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!). Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC. Patch mostly by Sanjoy Das.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139140 91177308-0d34-0410-b5e6-96231b3b80d8
Optimize chained bitcasts of the form A->B->A.
Undo r138722 and change isEliminableCastPair to allow this case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138756 91177308-0d34-0410-b5e6-96231b3b80d8
- use SmallVectorImpl& for the function argument.
- ignore the operands on the GEP, even if they aren't constant! Much as we
pretend the malloc succeeds, we pretend that malloc + whatever-you-GEP'd-by
is not null. It's magic!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136757 91177308-0d34-0410-b5e6-96231b3b80d8
Don't replace a gep/bitcast with 'undef' because that will form a "free(undef)"
which in turn means "unreachable". What we wanted was a no-op. Instead, analyze
the whole tree and look for all the instructions we need to delete first, then
delete them second, not relying on the use_list to stay consistent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136752 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the 'resume' instruction class, IR parsing, and bitcode reading and
writing. The 'resume' instruction resumes propagation of an existing (in-flight)
exception whose unwinding was interrupted with a 'landingpad' instruction (to be
added later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136589 91177308-0d34-0410-b5e6-96231b3b80d8
specified in the same file that the library itself is created. This is
more idiomatic for CMake builds, and also allows us to correctly specify
dependencies that are missed due to bugs in the GenLibDeps perl script,
or change from compiler to compiler. On Linux, this returns CMake to
a place where it can relably rebuild several targets of LLVM.
I have tried not to change the dependencies from the ones in the current
auto-generated file. The only places I've really diverged are in places
where I was seeing link failures, and added a dependency. The goal of
this patch is not to start changing the dependencies, merely to move
them into the correct location, and an explicit form that we can control
and change when necessary.
This also removes a serialization point in the build because we don't
have to scan all the libraries before we begin building various tools.
We no longer have a step of the build that regenerates a file inside the
source tree. A few other associated cleanups fall out of this.
This isn't really finished yet though. After talking to dgregor he urged
switching to a single CMake macro to construct libraries with both
sources and dependencies in the arguments. Migrating from the two macros
to that style will be a follow-up patch.
Also, llvm-config is still generated with GenLibDeps.pl, which means it
still has slightly buggy dependencies. The internal CMake
'llvm-config-like' macro uses the correct explicitly specified
dependencies however. A future patch will switch llvm-config generation
(when using CMake) to be based on these deps as well.
This may well break Windows. I'm getting a machine set up now to dig
into any failures there. If anyone can chime in with problems they see
or ideas of how to solve them for Windows, much appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136433 91177308-0d34-0410-b5e6-96231b3b80d8
an assert on Darwin llvm-gcc builds.
Assertion failed: (castIsValid(op, S, Ty) && "Invalid cast!"), function Create, file /Users/buildslave/zorg/buildbot/smooshlab/slave-0.8/build.llvm-gcc-i386-darwin9-RA/llvm.src/lib/VMCore/Instructions.cpp, li\
ne 2067.
etc.
http://smooshlab.apple.com:8013/builders/llvm-gcc-i386-darwin9-RA/builds/2354
--- Reverse-merging r134893 into '.':
U include/llvm/Target/TargetData.h
U include/llvm/DerivedTypes.h
U tools/bugpoint/ExtractFunction.cpp
U unittests/Support/TypeBuilderTest.cpp
U lib/Target/ARM/ARMGlobalMerge.cpp
U lib/Target/TargetData.cpp
U lib/VMCore/Constants.cpp
U lib/VMCore/Type.cpp
U lib/VMCore/Core.cpp
U lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Instrumentation/ProfilingUtils.cpp
U lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/CodeGen/SjLjEHPrepare.cpp
--- Reverse-merging r134888 into '.':
G include/llvm/DerivedTypes.h
U include/llvm/Support/TypeBuilder.h
U include/llvm/Intrinsics.h
U unittests/Analysis/ScalarEvolutionTest.cpp
U unittests/ExecutionEngine/JIT/JITTest.cpp
U unittests/ExecutionEngine/JIT/JITMemoryManagerTest.cpp
U unittests/VMCore/PassManagerTest.cpp
G unittests/Support/TypeBuilderTest.cpp
U lib/Target/MBlaze/MBlazeIntrinsicInfo.cpp
U lib/Target/Blackfin/BlackfinIntrinsicInfo.cpp
U lib/VMCore/IRBuilder.cpp
G lib/VMCore/Type.cpp
U lib/VMCore/Function.cpp
G lib/VMCore/Core.cpp
U lib/VMCore/Module.cpp
U lib/AsmParser/LLParser.cpp
U lib/Transforms/Utils/CloneFunction.cpp
G lib/Transforms/Utils/CodeExtractor.cpp
U lib/Transforms/Utils/InlineFunction.cpp
U lib/Transforms/Instrumentation/GCOVProfiling.cpp
U lib/Transforms/Scalar/ObjCARC.cpp
U lib/Transforms/Scalar/SimplifyLibCalls.cpp
U lib/Transforms/Scalar/MemCpyOptimizer.cpp
G lib/Transforms/IPO/DeadArgumentElimination.cpp
U lib/Transforms/IPO/ArgumentPromotion.cpp
U lib/Transforms/InstCombine/InstCombineCompares.cpp
U lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
U lib/Transforms/InstCombine/InstCombineCalls.cpp
U lib/CodeGen/DwarfEHPrepare.cpp
U lib/CodeGen/IntrinsicLowering.cpp
U lib/Bitcode/Reader/BitcodeReader.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134949 91177308-0d34-0410-b5e6-96231b3b80d8
This tightens up checking for overflow in alloca sizes, based on feedback
from Duncan and John about the change in r132926.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134749 91177308-0d34-0410-b5e6-96231b3b80d8
all over the place in different styles and variants. Standardize on two
preferred entrypoints: one that takes a StructType and ArrayRef, and one that
takes StructType and varargs.
In cases where there isn't a struct type convenient, we now add a
ConstantStruct::getAnon method (whose name will make more sense after a few
more patches land).
It would be "really really nice" if the ConstantStruct::get and
ConstantVector::get methods didn't make temporary std::vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@133412 91177308-0d34-0410-b5e6-96231b3b80d8
might overflow. Re-typing the alloca to a larger type (e.g. double)
hoists a shift into the alloca, potentially exposing overflow in the
expression. rdar://problem/9265821
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132926 91177308-0d34-0410-b5e6-96231b3b80d8
crc32.[8|16|32] have been renamed to .crc32.32.[8|16|32] and
crc64.[8|16|32] have been renamed to .crc32.64.[8|64].
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@132163 91177308-0d34-0410-b5e6-96231b3b80d8
It's better to do this in codegen, mul.with.overflow(X, 2) is more canonical because it has only one use on "X".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@131798 91177308-0d34-0410-b5e6-96231b3b80d8
As an example, the change to InstCombineCalls catches a common case where a call to a bitcast of a function is rewritten.
Chris, does this approach look reasonable?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@131516 91177308-0d34-0410-b5e6-96231b3b80d8
This automagically provides a transform noticed by my super-optimizer
as occurring quite often: "rem x, (select cond, x, 1)" -> 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130694 91177308-0d34-0410-b5e6-96231b3b80d8
This obviously helps a lot if the division would be turned into a libcall
(think i64 udiv on i386), but div is also one of the few remaining instructions
on modern CPUs that become more expensive when the bitwidth gets bigger.
This also helps register pressure on i386 when dividing chars, divb needs
two 8-bit parts of a 16 bit register as input where divl uses two registers.
int foo(unsigned char a) { return a/10; }
int bar(unsigned char a, unsigned char b) { return a/b; }
compiles into (x86_64)
_foo:
imull $205, %edi, %eax
shrl $11, %eax
ret
_bar:
movzbl %dil, %eax
divb %sil, %al
movzbl %al, %eax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130615 91177308-0d34-0410-b5e6-96231b3b80d8
This shouldn't happen in practice because the icmp would be a constant.
Add a check so we don't miscompile code if something goes wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130446 91177308-0d34-0410-b5e6-96231b3b80d8
effective in avoiding recomputation of LCSSA form; the widespread
use of instsimplify (which looks through phi nodes) means it was
not preserving LCSSA form anyway; and instcombine is no longer
scheduled in the middle of the loop passes so this doesn't matter
anymore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130301 91177308-0d34-0410-b5e6-96231b3b80d8
when X has multiple uses. This is useful for exposing secondary optimizations,
but the X86 backend isn't ready for this when X has a single use. For example,
this can disable load folding.
This is inching towards resolving PR6627.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130238 91177308-0d34-0410-b5e6-96231b3b80d8
canonical, and generally leads to better code. Found while looking at
an article about saturating arithmetic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129545 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have a first-class way to represent unaligned loads, the unaligned
load intrinsics are superfluous.
First part of <rdar://problem/8460511>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129401 91177308-0d34-0410-b5e6-96231b3b80d8
space info. We crash with an assert in this case. This change checks that the
address space of the bitcasted pointer is the same as the gep ptr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128884 91177308-0d34-0410-b5e6-96231b3b80d8
It's possible to craft an input that hits the recursion limits in a way
that SimplifyDemandedBits doesn't simplify the icmp but ComputeMaskedBits
can infer which bits are zero.
No test case as it depends on too many other things. Fixes PR9609.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128777 91177308-0d34-0410-b5e6-96231b3b80d8
- Localize the check if an icmp has one use to a place where we know we're
introducing something that's likely more expensive than a sext from i1.
- Add an assert to make sure a case that would lead to a miscompilation is
folded away earlier.
- Fix a typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128744 91177308-0d34-0410-b5e6-96231b3b80d8
removes one use of X which helps it pass the many hasOneUse() checks.
In my analysis, this turns up very often where X = A >>exact B and that can't be
simplified unless X has one use (except by increasing the lifetime of A which is
generally a performance loss).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128373 91177308-0d34-0410-b5e6-96231b3b80d8
load and store reference same memory location, the memory location
is represented by getelementptr with two uses (load and store) and
the getelementptr's base is alloca with single use. At this point,
instructions from alloca to store can be removed.
(this pattern is generated when bitfield is accessed.)
For example,
%u = alloca %struct.test, align 4 ; [#uses=1]
%0 = getelementptr inbounds %struct.test* %u, i32 0, i32 0;[#uses=2]
%1 = load i8* %0, align 4 ; [#uses=1]
%2 = and i8 %1, -16 ; [#uses=1]
%3 = or i8 %2, 5 ; [#uses=1]
store i8 %3, i8* %0, align 4
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127565 91177308-0d34-0410-b5e6-96231b3b80d8
This happens a lot in clang-compiled C++ code because it adds overflow checks to operator new[]:
unsigned *foo(unsigned n) { return new unsigned[n]; }
We can optimize away the overflow check on 64 bit targets because (uint64_t)n*4 cannot overflow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127418 91177308-0d34-0410-b5e6-96231b3b80d8
the value splatted into every element. Extend this to getTrue and getFalse which
by providing new overloads that take Types that are either i1 or <N x i1>. Use
it in InstCombine to add vector support to some code, fixing PR8469!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127116 91177308-0d34-0410-b5e6-96231b3b80d8
possible. This goes into instcombine and instsimplify because instsimplify
doesn't need to check hasOneUse since it returns (almost exclusively) constants.
This fixes PR9343 #4#5 and #8!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127064 91177308-0d34-0410-b5e6-96231b3b80d8
intersection of the LHS and RHS ConstantRanges and return "false" when
the range is empty.
This simplifies some code and catches some extra cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126744 91177308-0d34-0410-b5e6-96231b3b80d8
function prototype into a call to a varargs prototype. We do
allow the xform if we have a definition, but otherwise we don't
want to risk that we're changing the abi in a subtle way. On
X86-64, for example, varargs require passing stuff in %al.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126363 91177308-0d34-0410-b5e6-96231b3b80d8
We usually catch this kind of optimization through InstSimplify's distributive
magic, but or doesn't distribute over xor in general.
"A | ~(A | B) -> A | ~B" hits 24 times on gcc.c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126081 91177308-0d34-0410-b5e6-96231b3b80d8
variations (some of these were already present so I unified the code). Spotted by my
auto-simplifier as occurring a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125734 91177308-0d34-0410-b5e6-96231b3b80d8
unsigned overflow (e.g. due to a negative array index), but
the scales on array size multiplications are known to not
sign wrap.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125409 91177308-0d34-0410-b5e6-96231b3b80d8
gep to explicit addressing, we know that none of the intermediate
computation overflows.
This could use review: it seems that the shifts certainly wouldn't
overflow, but could the intermediate adds overflow if there is a
negative index?
Previously the testcase would instcombine to:
define i1 @test(i64 %i) {
%p1.idx.mask = and i64 %i, 4611686018427387903
%cmp = icmp eq i64 %p1.idx.mask, 1000
ret i1 %cmp
}
now we get:
define i1 @test(i64 %i) {
%cmp = icmp eq i64 %i, 1000
ret i1 %cmp
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125271 91177308-0d34-0410-b5e6-96231b3b80d8
exact/nsw/nuw shifts and have instcombine infer them when it can prove
that the relevant properties are true for a given shift without them.
Also, a variety of refactoring to use the new patternmatch logic thrown
in for good luck. I believe that this takes care of a bunch of related
code quality issues attached to PR8862.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125267 91177308-0d34-0410-b5e6-96231b3b80d8
optimizations to be much more aggressive in the face of
exact/nsw/nuw div and shifts. For example, these (which
are the same except the first is 'exact' sdiv:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%A = sdiv exact i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%A = sdiv i64 %X, -5 ; X/-5 == 0 --> x == 0
%B = icmp eq i64 %A, 0
ret i1 %B
}
compile down to:
define i1 @sdiv_icmp4_exact(i64 %X) nounwind {
%1 = icmp eq i64 %X, 0
ret i1 %1
}
define i1 @sdiv_icmp4(i64 %X) nounwind {
%X.off = add i64 %X, 4
%1 = icmp ult i64 %X.off, 9
ret i1 %1
}
This happens when you do something like:
(ptr1-ptr2) == 42
where the pointers are pointers to non-unit types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125266 91177308-0d34-0410-b5e6-96231b3b80d8
and generally tidying things up. Only very trivial functionality changes
like now doing (-1 - A) -> (~A) for vectors too.
InstCombineAddSub.cpp | 296 +++++++++++++++++++++-----------------------------
1 file changed, 126 insertions(+), 170 deletions(-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125264 91177308-0d34-0410-b5e6-96231b3b80d8
versions of creation functions. Eventually, the "insertion point" versions
of these should just be removed, we do have IRBuilder afterall.
Do a massive rewrite of much of pattern match. It is now shorter and less
redundant and has several other widgets I will be using in other patches.
Among other changes, m_Div is renamed to m_IDiv (since it only matches
integer divides) and m_Shift is gone (it used to match all binops!!) and
we now have m_LogicalShift for the one client to use.
Enhance IRBuilder to have "isExact" arguments to things like CreateUDiv
and reduce redundancy within IRbuilder by having these methods chain to
each other more instead of duplicating code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@125194 91177308-0d34-0410-b5e6-96231b3b80d8
reassociation. No testcase, because I wasn't able to create a testcase
which actually demonstrates a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124713 91177308-0d34-0410-b5e6-96231b3b80d8
benchmarks, and that it can be simplified to X/Y. (In general you can only
simplify (Z*Y)/Y to Z if the multiplication did not overflow; if Z has the
form "X/Y" then this is the case). This patch implements that transform and
moves some Div logic out of instcombine and into InstructionSimplify.
Unfortunately instcombine gets in the way somewhat, since it likes to change
(X/Y)*Y into X-(X rem Y), so I had to teach instcombine about this too.
Finally, thanks to the NSW/NUW flags, sometimes we know directly that "Z*Y"
does not overflow, because the flag says so, so I added that logic too. This
eliminates a bunch of divisions and subtractions in 447.dealII, and has good
effects on some other benchmarks too. It seems to have quite an effect on
tramp3d-v4 but it's hard to say if it's good or bad because inlining decisions
changed, resulting in massive changes all over.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124487 91177308-0d34-0410-b5e6-96231b3b80d8
clang's -Wuninitialized-experimental warning.
While these don't look like real bugs, clang's
-Wuninitialized-experimental analysis is stricter
than GCC's, and these fixes have the benefit
of being general nice cleanups.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124073 91177308-0d34-0410-b5e6-96231b3b80d8
A == B, and A > B, does not mean we can fold it to true. We still need to
check for A ? B (A unordered B).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123993 91177308-0d34-0410-b5e6-96231b3b80d8
a select. A vector select is pairwise on each element so we'd need a new
condition with the right number of elements to select on. Fixes PR8994.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123963 91177308-0d34-0410-b5e6-96231b3b80d8
auto-simplier the transform most missed by early-cse is (zext X) != 0 -> X != 0.
This patch adds this transform and some related logic to InstructionSimplify
and removes some of the logic from instcombine (unfortunately not all because
there are several situations in which instcombine can improve things by making
new instructions, whereas instsimplify is not allowed to do this). At -O2 this
often results in more than 15% more simplifications by early-cse, and results in
hundreds of lines of bitcode being eliminated from the testsuite. I did see some
small negative effects in the testsuite, for example a few additional instructions
in three programs. One program, 483.xalancbmk, got an additional 35 instructions,
which seems to be due to a function getting an additional instruction and then
being inlined all over the place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123911 91177308-0d34-0410-b5e6-96231b3b80d8
multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123568 91177308-0d34-0410-b5e6-96231b3b80d8
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123417 91177308-0d34-0410-b5e6-96231b3b80d8
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123034 91177308-0d34-0410-b5e6-96231b3b80d8
if both A op B and A op C simplify. This fires fairly often but doesn't
make that much difference. On gcc-as-one-file it removes two "and"s and
turns one branch into a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122399 91177308-0d34-0410-b5e6-96231b3b80d8
This resolves a README entry and technically resolves PR4916,
but we still get poor code for the testcase in that PR because
GVN isn't CSE'ing uadd with add, filed as PR8817.
Previously we got:
_test7: ## @test7
addq %rsi, %rdi
cmpq %rdi, %rsi
movl $42, %eax
cmovaq %rsi, %rax
ret
Now we get:
_test7: ## @test7
addq %rsi, %rdi
movl $42, %eax
cmovbq %rsi, %rax
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122182 91177308-0d34-0410-b5e6-96231b3b80d8