with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.
These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200212 91177308-0d34-0410-b5e6-96231b3b80d8
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.
If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.
Also made the variable name a bit more sensible while I'm here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200211 91177308-0d34-0410-b5e6-96231b3b80d8
LoopVectorize pass.
The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.
This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200198 91177308-0d34-0410-b5e6-96231b3b80d8
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.
The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200074 91177308-0d34-0410-b5e6-96231b3b80d8
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8
a reduction.
Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.
Fixes PR18526.
radar://15851149
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199570 91177308-0d34-0410-b5e6-96231b3b80d8
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199104 91177308-0d34-0410-b5e6-96231b3b80d8
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8
for (i = 0; i < N; ++i)
A[i * Stride1] += B[i * Stride2];
We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.
This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.
radar://13075509
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198950 91177308-0d34-0410-b5e6-96231b3b80d8
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.
Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198685 91177308-0d34-0410-b5e6-96231b3b80d8
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.
PR16729
radar://15653590
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197449 91177308-0d34-0410-b5e6-96231b3b80d8
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196537 91177308-0d34-0410-b5e6-96231b3b80d8
We use CSEBlocks to initialize a worklist:
SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());
so it must have a deterministic order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196520 91177308-0d34-0410-b5e6-96231b3b80d8
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.
Fixes PR18129.
radar://15582184
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196508 91177308-0d34-0410-b5e6-96231b3b80d8
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195791 91177308-0d34-0410-b5e6-96231b3b80d8
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.
Fixes PR18049.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195787 91177308-0d34-0410-b5e6-96231b3b80d8
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195773 91177308-0d34-0410-b5e6-96231b3b80d8
SLP vectorization. Based on the code in BBVectorizer.
Fixes PR17741.
Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195528 91177308-0d34-0410-b5e6-96231b3b80d8
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.
Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.
The test case will only fail when running with libgmalloc.
radar://15498655
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195162 91177308-0d34-0410-b5e6-96231b3b80d8
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.
The problem is loops like:
int main ()
{
int a = 1;
char b = 0;
lbl:
a &= 4;
b--;
if (b) goto lbl;
return a;
}
The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.
To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.
PR17532
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195008 91177308-0d34-0410-b5e6-96231b3b80d8
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.
This probably fixes PR17878.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194876 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.
The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193956 91177308-0d34-0410-b5e6-96231b3b80d8
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193926 91177308-0d34-0410-b5e6-96231b3b80d8
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.
On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.
This should also help lpbench and paq8p.
I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.
radar://15339680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193891 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193860 91177308-0d34-0410-b5e6-96231b3b80d8
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.
This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.
radar://15339680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193853 91177308-0d34-0410-b5e6-96231b3b80d8
Clear all data structures when resetting the RuntimeCheck data structure.
No test case. This was exposed by an upcomming change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193852 91177308-0d34-0410-b5e6-96231b3b80d8
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.
radar://15336950
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193573 91177308-0d34-0410-b5e6-96231b3b80d8
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.
radar://15332579
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193572 91177308-0d34-0410-b5e6-96231b3b80d8
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:
%58 = extractelement <2 x i32> %15, i32 0
%59 = extractelement i32 %58, i32 0
where the second extractelement is illegal because its first operand is not a vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193434 91177308-0d34-0410-b5e6-96231b3b80d8
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193349 91177308-0d34-0410-b5e6-96231b3b80d8