This is needed so that perl's SHA can be compiled (otherwise
BBVectorize takes far too long to find its fixed point).
I'll try to come up with a reduced test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166738 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first of several steps to incorporate information from the new
TargetTransformInfo infrastructure into BBVectorize. Two things are done here:
1. Target information is used to determine whether it is profitable to fuse two
instructions: the cost of the vector operation must not exceed the combined
cost of the two original operations. Pairs that are not profitable are no
longer considered; equal-cost pairs are still accepted, because the current
cost information is incomplete (for intrinsics, for example). A sketch of this
check follows the list below.
2. The 'cost savings' computed for the profitability check are also used to
rank the DAGs that represent the potential vectorization plans. Specifically,
for nodes of non-trivial depth, the cost savings are used as the node
weight.
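A minimal sketch of the check from point 1; getScalarCost() and
getFusedVectorCost() are hypothetical stand-ins for the TargetTransformInfo-based
cost queries, not actual API:

  namespace llvm { class Instruction; }
  using llvm::Instruction;

  unsigned getScalarCost(const Instruction *I);        // hypothetical cost query
  unsigned getFusedVectorCost(const Instruction *I,
                              const Instruction *J);   // hypothetical cost query

  // Keep the candidate pair (I, J) only if fusing does not increase the cost;
  // equal-cost pairs are accepted because cost information is still incomplete.
  bool isProfitablePair(const Instruction *I, const Instruction *J) {
    return getFusedVectorCost(I, J) <= getScalarCost(I) + getScalarCost(J);
  }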
The next step will be to incorporate the shuffle costs into the DAG weighting;
this will assign weights to the edges of the DAG as well. Once that is done,
and when target information is available, we should be able to dispense with
the depth heuristic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166716 91177308-0d34-0410-b5e6-96231b3b80d8
Unreachable blocks can have invalid instructions. For example,
jump threading can produce self-referential instructions in
unreachable blocks. Also, we should not be spending time
optimizing unreachable code. Fixes PR14133.
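A minimal sketch of that guard, assuming the usual DominatorTree reachability
query (header paths as in a current LLVM tree); vectorizeBlock() is a
hypothetical stand-in for the per-block work:

  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Function.h"
  using namespace llvm;

  bool vectorizeBlock(BasicBlock &BB);  // hypothetical per-block driver

  bool runOnReachableBlocks(Function &F, DominatorTree &DT) {
    bool Changed = false;
    for (BasicBlock &BB : F) {
      // Unreachable blocks may contain invalid IR (e.g. self-referential
      // instructions left behind by jump threading), so skip them.
      if (!DT.isReachableFromEntry(&BB))
        continue;
      Changed |= vectorizeBlock(BB);
    }
    return Changed;
  }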
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166423 91177308-0d34-0410-b5e6-96231b3b80d8
We used SCEV to detect that A[X] is consecutive, assuming that X was the
induction variable. But X can be any expression that uses the induction
variable, for example X = i + 2.
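For illustration (this function is not from the patch), a loop of the kind
that is now recognized:

  // The store address A[i + 2] advances by one element per iteration, so the
  // access is consecutive even though the index is not the induction variable
  // itself.
  void store_shifted(int *A, int n) {
    for (int i = 0; i < n; ++i)
      A[i + 2] = i;
  }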
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166388 91177308-0d34-0410-b5e6-96231b3b80d8
This is important for nested-loop reductions such as:

  for (i = 0 .. n)
    for (j = 0 .. m)
      sum += ...

where, in the innermost loop, the reduction variable (sum) does not start at
zero: it carries the running total in from the outer loop.
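A compilable version of that pattern, for illustration only:

  // The inner loop's reduction of 'sum' starts from the running total carried
  // in from the outer loop, not from zero.
  int nested_sum(const int *A, int n, int m) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < m; ++j)
        sum += A[i * m + j];
    return sum;
  }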
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166387 91177308-0d34-0410-b5e6-96231b3b80d8
If the pointer is consecutive then it is safe to read and write through it. If the pointer is not
consecutive within the loop then it is unsafe to vectorize it, because we may hit an ordering issue.
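For illustration (not from the patch), a store through a non-consecutive
pointer where vectorization could change the outcome:

  // If B[i] repeats a value, the scalar loop defines a clear last-write-wins
  // order; vectorizing these stores as a scatter could violate that order.
  void scatter_store(int *A, const int *B, int n) {
    for (int i = 0; i < n; ++i)
      A[B[i]] = i;
  }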
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166371 91177308-0d34-0410-b5e6-96231b3b80d8
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadInstructions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
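A minimal sketch of the new calling convention (header paths as in a current
LLVM tree):

  #include "llvm/Analysis/TargetLibraryInfo.h"
  #include "llvm/Transforms/Utils/Local.h"
  using namespace llvm;

  void deleteIfTriviallyDead(Value *V, const TargetLibraryInfo *TLI) {
    // Without the TLI argument, calls such as malloc() are treated as opaque,
    // so a dead malloc is not removed even when its result is unused.
    RecursivelyDeleteTriviallyDeadInstructions(V, TLI);
  }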
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162841 91177308-0d34-0410-b5e6-96231b3b80d8
When both a load/store and its address computation are being vectorized, it can
happen that the address-computation vectorization destroys SCEV's ability
to analyze the relative pointer offsets. As a result (as with the aliasing
analysis info), we need to precompute the necessary information prior to
instruction fusing.
This was found during stress testing (running through the test suite with a very
low required chain length); unfortunately, I don't have a small test case.
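A minimal sketch of the precomputation idea (OffsetCache and the surrounding
structure are hypothetical, not the actual BBVectorize code):

  #include "llvm/Analysis/ScalarEvolution.h"
  #include "llvm/IR/Value.h"
  #include <map>
  #include <utility>
  using namespace llvm;

  static std::map<std::pair<Value *, Value *>, const SCEV *> OffsetCache;

  // Record the relative offset of two pointers before any IR is rewritten;
  // once the address computations are fused, SCEV may no longer be able to
  // recompute it.
  void precomputeOffset(ScalarEvolution &SE, Value *PtrA, Value *PtrB) {
    OffsetCache[{PtrA, PtrB}] =
        SE.getMinusSCEV(SE.getSCEV(PtrA), SE.getSCEV(PtrB));
  }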
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159332 91177308-0d34-0410-b5e6-96231b3b80d8
The original algorithm only used recursive pair fusion of equal-length
types. This is now extended to allow pairing of any types that share
the same underlying scalar type. Because we would still generally
prefer the 2^n-length types, those are formed first. Then a second
set of iterations forms the non-2^n-length types.
Also, a call to SimplifyInstructionsInBlock has been added after each
pairing iteration. This takes care of DCE (and a few other things)
that make the following iterations execute somewhat faster. For the
same reason, some of the simple shuffle-combination cases are now
handled internally.
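A sketch of that iteration structure; vectorizePairs() is a hypothetical
stand-in for one pairing pass:

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/Transforms/Utils/Local.h"
  using namespace llvm;

  bool vectorizePairs(BasicBlock &BB);  // hypothetical pairing iteration

  bool vectorizeBB(BasicBlock &BB) {
    bool Changed = false;
    while (vectorizePairs(BB)) {
      Changed = true;
      // Run DCE and other simple cleanups so the next pairing iteration sees
      // a smaller, simpler block.
      SimplifyInstructionsInBlock(&BB);
    }
    return Changed;
  }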
There is some additional refactoring work to be done, but I've had
many requests for this feature, so that refactoring will come
soon in future commits (as will additional test cases).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159330 91177308-0d34-0410-b5e6-96231b3b80d8
Maintaining this kind of checking in different places is dangerous; extending
Instruction::isSameOperationAs consolidates this logic into one place. Here
I've added an optional flags parameter and two flags that are important for
vectorization: CompareIgnoringAlignment and CompareUsingScalarTypes.
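A short usage sketch of the new flags (header path as in a current LLVM tree):

  #include "llvm/IR/Instruction.h"
  using namespace llvm;

  // True if I and J perform the same operation for pairing purposes, ignoring
  // alignment and comparing only the underlying scalar types.
  bool sameOpForPairing(const Instruction *I, const Instruction *J) {
    return I->isSameOperationAs(J, Instruction::CompareIgnoringAlignment |
                                       Instruction::CompareUsingScalarTypes);
  }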
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159329 91177308-0d34-0410-b5e6-96231b3b80d8