The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)
These are nice, they get rid of the xor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185915 91177308-0d34-0410-b5e6-96231b3b80d8
Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems.
Thanks Ben and Alexey!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185886 91177308-0d34-0410-b5e6-96231b3b80d8
Back in r179493 we determined that two transforms collided with each
other. The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform. However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185808 91177308-0d34-0410-b5e6-96231b3b80d8
This is a complete re-write if the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185774 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first patch in a series of 3 patches which clean up how we create
runtime function declarations in the ARC optimizer when they do not exist
already in the IR.
Currently we have a bunch of duplicated code in ObjCARCOpts, ObjCARCContract
that does this. This patch refactors that code into a separate class called
ARCRuntimeEntryPoints which lazily creates the declarations for said
entrypoints.
The next two patches will consist of the work of refactoring
ObjCARCContract/ObjCARCOpts to use this new code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185740 91177308-0d34-0410-b5e6-96231b3b80d8
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185735 91177308-0d34-0410-b5e6-96231b3b80d8
This transform was originally added in r185257 but later removed in
r185415. The original transform would create instructions speculatively
and then discard them if the speculation was proved incorrect. This has
been replaced with a scheme that splits the transform into two parts:
preflight and fold. While we preflight, we build up fold actions that
inform the folding stage on how to act.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185667 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to create switches even if instcombine has munged two of the
incombing compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185632 91177308-0d34-0410-b5e6-96231b3b80d8
This changes behavior of -msan-poison-stack=0 flag from not poisoning stack
allocations to actively unpoisoning them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185538 91177308-0d34-0410-b5e6-96231b3b80d8
This implies annotating it as nounwind and its arguments as nocapture. To be
conservative, we do not annotate the arguments with noalias since some platforms
do not have restrict on the declaration for gettimeofday.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185502 91177308-0d34-0410-b5e6-96231b3b80d8
I'm reverting this commit because:
1. As discussed during review, it needs to be rewritten (to avoid creating and
then deleting instructions).
2. This is causing optimizer crashes. Specifically, I'm seeing things like
this:
While deleting: i1 %
Use still stuck around after Def is destroyed: <badref> = select i1 <badref>, i32 0, i32 1
opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.
I'd guess that these will go away once we're no longer creating/deleting
instructions here, but just in case, I'm adding a regression test.
Because the code is bring rewritten, I've just XFAIL'd the original regression test. Original commit message:
InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
to simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185415 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change. It should suffice to check the type of a debug info
metadata, instead of calling Verify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185383 91177308-0d34-0410-b5e6-96231b3b80d8
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185299 91177308-0d34-0410-b5e6-96231b3b80d8
Inserting a zext or trunc is sufficient. This pattern is somewhat common in
LLVM's pointer mangling code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185270 91177308-0d34-0410-b5e6-96231b3b80d8
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
%gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
%gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
%cmp.i = icmp ult i8* %gep.i, %gep2.i
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = icmp ne i1 %cmp.i, %cmp.i1
ret i1 %cmp
into:
%cmp.i = icmp slt [1 x i8]* %a, %b
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = xor i1 %cmp.i, %cmp.i1
ret i1 %cmp
By preserving the original sign, we now get:
ret i1 false
This fixes PR16483.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185259 91177308-0d34-0410-b5e6-96231b3b80d8
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
to simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185257 91177308-0d34-0410-b5e6-96231b3b80d8
We may, after other optimizations, find ourselves with IR that looks
like:
%shl = shl i32 1, %y
%cmp = icmp ult i32 %shl, 32
Instead, we should just compare the shift count:
%cmp = icmp ult i32 %y, 5
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185242 91177308-0d34-0410-b5e6-96231b3b80d8
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185230 91177308-0d34-0410-b5e6-96231b3b80d8
In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks.
We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185214 91177308-0d34-0410-b5e6-96231b3b80d8
- Build debug metadata for 'bare' Modules using DIBuilder
- DebugIR can be constructed to generate an IR file (to be seen by a debugger)
or not in cases where the user already has an IR file on disk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185193 91177308-0d34-0410-b5e6-96231b3b80d8
I used the class to safely reset the state of the builder's debug location. I
think I have caught all places where we need to set the debug location to a new
one. Therefore, we can replace the class by a function that just sets the debug
location.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185165 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update testing cases to make them conform to the format of DI classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185135 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r185099.
Looks like both the ppc-64 and mips bots are still failing after I reverted this
change.
Since:
1. The mips bot always performs a clean build,
2. The ppc64-bot failed again after a clean build (I asked the ppc-64
maintainers to clean the bot which they did... Thanks Will!),
I think it is safe to assume that this change was not the cause of the failures
that said builders were seeing. Thus I am recomitting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185111 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r185095. This is causing a FileCheck failure on
the 3dnow intrinsics on at least the mips/ppc bots but not on the x86
bots.
Reverting while I figure out what is going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185099 91177308-0d34-0410-b5e6-96231b3b80d8
The category which an APFloat belongs to should be dependent on the
actual value that the APFloat has, not be arbitrarily passed in by the
user. This will prevent inconsistency bugs where the category and the
actual value in APFloat differ.
I also fixed up all of the references to this constructor (which were
only in LLVM).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185095 91177308-0d34-0410-b5e6-96231b3b80d8
Use vectorized instruction instead of original instruction anchored in the
original loop.
Fixes PR16452 and t2075.c of PR16455.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185081 91177308-0d34-0410-b5e6-96231b3b80d8
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.
This fixes 3 test cases of PR16455.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185051 91177308-0d34-0410-b5e6-96231b3b80d8
The Builtin attribute is an attribute that can be placed on function call site that signal that even though a function is declared as being a builtin,
rdar://problem/13727199
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185049 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185020 91177308-0d34-0410-b5e6-96231b3b80d8
debug statements to add a missing newline. Also canonicalize to '\n' instead of
"\n"; the latter calls a function with a loop the former does not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184897 91177308-0d34-0410-b5e6-96231b3b80d8
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca. This patch just adds a check
to skip that conversion when it is unnecessary. This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184870 91177308-0d34-0410-b5e6-96231b3b80d8
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.
"LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.
radar://13681598"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184724 91177308-0d34-0410-b5e6-96231b3b80d8
CGSCC pass manager. This should insulate the inlining decisions from the
vectorization decisions, however it may have both compile time and code
size problems so it is just an experimental option right now.
Adding this based on a discussion with Arnold and it seems at least
worth having this flag for us to both run some experiments to see if
this strategy is workable. It may solve some of the regressions seen
with the loop vectorizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184698 91177308-0d34-0410-b5e6-96231b3b80d8
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.
radar://13681598
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184685 91177308-0d34-0410-b5e6-96231b3b80d8
This class checks dependences by subtracting two Scalar Evolution access
functions allowing us to catch very simple linear dependences.
The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependencies need to be a multiple of VF otherwise we impede
store-load forwarding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184684 91177308-0d34-0410-b5e6-96231b3b80d8
Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184683 91177308-0d34-0410-b5e6-96231b3b80d8
Untill now we detected the vectorizable tree and evaluated the cost of the
entire tree. With this patch we can decide to trim-out branches of the tree
that are not profitable to vectorizer.
Also, increase the max depth from 6 to 12. In the worse possible case where all
of the code is made of diamond-shaped graph this can bring the cost to 2**10,
but diamonds are not very common.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184681 91177308-0d34-0410-b5e6-96231b3b80d8
The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184671 91177308-0d34-0410-b5e6-96231b3b80d8
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184647 91177308-0d34-0410-b5e6-96231b3b80d8
This is apart of a series of patches to encapsulate PtrState.RRI and
make PtrState.RRI a private field of PtrState.
*NOTE* This is actually the second commit in the patch stream. I should
have put this note on the first such commit r184528.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184532 91177308-0d34-0410-b5e6-96231b3b80d8
This commit completely removes what is left of the simplify-libcalls
pass. All of the functionality has now been migrated to the instcombine
and functionattrs passes. The following C API functions are now NOPs:
1. LLVMAddSimplifyLibCallsPass
2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184459 91177308-0d34-0410-b5e6-96231b3b80d8
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184444 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change, the considered addressing modes may be invalid since the
maximum and minimum offsets were not taking into account.
This was causing an assertion failure.
The added test case exercices that behavior.
<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184341 91177308-0d34-0410-b5e6-96231b3b80d8
The type <3 x i8> is a common in graphics and we want to be able to vectorize it.
This changes accelerates bullet by 12% and 471_omnetpp by 5%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184317 91177308-0d34-0410-b5e6-96231b3b80d8
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184103 91177308-0d34-0410-b5e6-96231b3b80d8
This pass was assuming that if hasAddressTaken() returns false for a
function, the function's only uses are call sites. That's not true
because there can be references by BlockAddresses too.
Fix the pass to handle this case. Fix
BlockAddress::replaceUsesOfWithOnConstant() to allow a function's type
to be changed by RAUW'ing the function with a bitcast of the recreated
function.
Patch by Mark Seaborn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183933 91177308-0d34-0410-b5e6-96231b3b80d8
Most clients have already been moved from Path V1 to V2. The ones using V1
now include PathV1.h explicitly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183801 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of a custom implementation of replaceAllUsesWith, we just call
replaceAllUsesWith and recreate llvm.used and llvm.compiler-used.
This change is particularity interesting because it makes llvm see
through what clang is doing with static used functions in extern "C"
contexts. With this change, running clang -O2 in
extern "C" {
__attribute__((used)) static void foo() {}
}
produces
@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to
i8*)], section "llvm.metadata"
define internal void @foo() #0 {
entry:
ret void
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183756 91177308-0d34-0410-b5e6-96231b3b80d8
Variadic functions are particularly fragile in the face of ABI changes, so this
limits how much the pass changes them
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183625 91177308-0d34-0410-b5e6-96231b3b80d8
r183584 tries to derive some info from the code *AFTER* a call and apply
these derived info to the code *BEFORE* the call, which is not always safe
as the call in question may never return, and in this case, the derived
info is invalid.
Thank Duncan for pointing out this potential bug.
rdar://14073661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183606 91177308-0d34-0410-b5e6-96231b3b80d8
The MemCpyOpt pass is capable of optimizing:
callee(&S); copy N bytes from S to D.
into:
callee(&D);
subject to some legality constraints.
Assertion is triggered when the compiler tries to evalute "sizeof(typeof(D))",
while D is an opaque-typed, 'sret' formal argument of function being compiled.
i.e. the signature of the func being compiled is something like this:
T caller(...,%opaque* noalias nocapture sret %D, ...)
The fix is that when come across such situation, instead of calling some
utility functions to get the size of D's type (which will crash), we simply
assume D has at least N bytes as implified by the copy-instruction.
rdar://14073661
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183584 91177308-0d34-0410-b5e6-96231b3b80d8
IndVarSimplify is willing to move divide instructions outside of their
loop bodies if they are invariant of the loop. However, it may not be
safe to expand them if we do not know if they can trap.
Instead, check to see if it is not safe to expand the instruction and
skip the expansion.
This fixes PR16041.
Testcase by Rafael Ávila de Espíndola.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183239 91177308-0d34-0410-b5e6-96231b3b80d8
The problem this time seems to be a thinko. We were assuming that in the CFG
A
| \
| B
| /
C
speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true, the phi's are semantically in the edges, so
if the A->B->C path is taken, any code needed for A->C is not executed and we
have to consider it too when deciding to speculate B.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183226 91177308-0d34-0410-b5e6-96231b3b80d8
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.
We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.
Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183152 91177308-0d34-0410-b5e6-96231b3b80d8