Commit Graph

141 Commits

Author SHA1 Message Date
Arnold Schwaighofer
7251a75f6e X86 cost model: Add cost for vectorized gather/scather
radar://14351991

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186189 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-12 19:16:07 +00:00
Arnold Schwaighofer
4a1c764264 ARM cost model: Add cost for gather/scather
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.

radar://14351991

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186188 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-12 19:16:04 +00:00
Arnold Schwaighofer
11eb51e239 LoopVectorize: Vectorize all accesses in address space zero with unit stride
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186088 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-11 15:21:55 +00:00
Arnold Schwaighofer
c14380d195 LoopVectorize: Math functions only read rounding mode
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185299 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-01 00:54:44 +00:00
Arnold Schwaighofer
57a7da8b23 LoopVectorize: Preserve debug location info
radar://14169017

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185122 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-28 00:38:54 +00:00
Arnold Schwaighofer
0bbbf7cbb0 LoopVectorize: Cache edge masks created during if-conversion
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185097 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 20:31:06 +00:00
Arnold Schwaighofer
0862d589ee LoopVectorize: Use vectorized loop invariant gep index anchored in loop
Use vectorized instruction instead of original instruction anchored in the
original loop.

Fixes PR16452 and t2075.c of PR16455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185081 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 15:11:55 +00:00
Arnold Schwaighofer
e2b9912a78 Fix spelling.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185052 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 01:01:11 +00:00
Arnold Schwaighofer
45ef457b8f LoopVectorize: Don't store a reversed value in the vectorized value map
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.

This fixes 3 test cases of PR16455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185051 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 00:45:41 +00:00
Arnold Schwaighofer
bc7c58d2b1 Reapply 184685 after the SetVector iteration order fix.
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.

"LoopVectorize: Use the dependence test utility class

We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184724 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-24 12:09:15 +00:00
Arnold Schwaighofer
ec677e2a64 Revert "LoopVectorize: Use the dependence test utility class"
This reverts commit cbfa1ca993.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184690 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-24 06:10:41 +00:00
Arnold Schwaighofer
cbfa1ca993 LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184685 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-24 03:55:48 +00:00
Pekka Jaaskelainen
a8a04380c5 Fix for a regression caused by the LoopVectorizer when
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184103 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-17 18:49:06 +00:00
Arnold Schwaighofer
47afc19625 LoopVectorize: PHIs with only outside users should prevent vectorization
We check that instructions in the loop don't have outside users (except if
they are reduction values). Unfortunately, we skipped this check for
if-convertable PHIs.

Fixes PR16184.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183035 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-31 19:53:50 +00:00
Paul Redmond
ee21b6f7b4 Add support for llvm.vectorizer metadata
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
  by making the root of additional loop metadata.
  - Loop::isAnnotatedParallel now looks for llvm.loop and associated
    llvm.mem.parallel_loop_access
  - document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
  - document llvm.vectorizer.* metadata
  - add utility class LoopVectorizerHints for getting/setting loop metadata
  - use llvm.vectorizer.width=1 to indicate already vectorized instead of
    already_vectorized
- update existing tests that used llvm.loop.parallel and
  llvm.vectorizer.already_vectorized

Reviewed by: Nadav Rotem


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-28 20:00:34 +00:00
Benjamin Kramer
959ecb2eec LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in it, don't assert on those cases.
Fixes PR16139.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182656 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-24 18:05:35 +00:00
Arnold Schwaighofer
6e4a9c14f6 LoopVectorize: Make Value pointers that could be RAUW'ed a VH
The Value pointers we store in the induction variable list can be RAUW'ed by a
call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing
in some other places where we store pointers that could potentially be RAUW'ed.

Fixes PR16073.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182485 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-22 16:54:56 +00:00
Arnold Schwaighofer
688b5103eb LoopVectorize: Handle single edge PHIs
We might encouter single edge PHIs - handle them with an identity select.

Fixes PR15990.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182199 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-18 18:38:34 +00:00
Arnold Schwaighofer
1386692ef6 LoopVectorize: Hoist conditional loads if possible
InstCombine can be uncooperative to vectorization and sink loads into
conditional blocks. This prevents vectorization.

Undo this optimization if there are unconditional memory accesses to the same
addresses in the loop.

radar://13815763

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181860 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-15 01:44:30 +00:00
Arnold Schwaighofer
123f18bcb9 LoopVectorize: Handle loops with multiple forward inductions
We used to give up if we saw two integer inductions. After this patch, we base
further induction variables on the chosen one like we do in the reverse
induction and pointer induction case.

Fixes PR15720.

radar://13851975

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181746 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-14 00:21:18 +00:00
Arnold Schwaighofer
9b5d70f076 LoopVectorize: Use the widest induction variable type
Use the widest induction type encountered for the cannonical induction variable.

We used to turn the following loop into an empty loop because we used i8 as
induction variable type and truncated 1024 to 0 as trip count.

int a[1024];
void fail() {
  int reverse_induction = 1023;
  unsigned char forward_induction = 0;
  while ((reverse_induction) >= 0) {
    forward_induction++;
    a[reverse_induction] = forward_induction;
    --reverse_induction;
  }
}

radar://13862901

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181667 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-11 23:04:28 +00:00
Nadav Rotem
ef3e423e23 Add an additional testcase for PR15882.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181646 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-10 22:55:44 +00:00
Arnold Schwaighofer
c121f5dc26 LoopVectorizer: Don't assert on the absence of induction variables
A computable loop exit count does not imply the presence of an induction
variable. Scalar evolution can return a value for an infinite loop.

Fixes PR15926.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181495 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-09 00:32:18 +00:00
Arnold Schwaighofer
280e1df858 LoopVectorizer: Improve reduction variable identification
The two nested loops were confusing and also conservative in identifying
reduction variables. This patch replaces them by a worklist based approach.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181369 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-07 21:55:37 +00:00
Arnold Schwaighofer
eb95cec176 LoopVectorize: getConsecutiveVector must respect signed arithmetic
We were passing an i32 to ConstantInt::get where an i64 was needed and we must
also pass the sign if we pass negatives numbers. The start index passed to
getConsecutiveVector must also be signed.

Should fix PR15882.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181286 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-07 04:37:05 +00:00
Arnold Schwaighofer
87defd0924 LoopVectorize: Add support for floating point min/max reductions
Add support for min/max reductions when "no-nans-float-math" is enabled. This
allows us to assume we have ordered floating point math and treat ordered and
unordered predicates equally.

radar://13723044

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181144 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-05 01:54:48 +00:00
Arnold Schwaighofer
c1738fdadd LoopVectorize: We don't need an identity element for min/max reductions
We can just use the initial element that feeds the reduction.

  max(max(x, y), z) == max(max(x,y), max(x,z))

radar://13723044

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181141 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-05 01:54:42 +00:00
Nadav Rotem
4bcd5f888f LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values.
By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements.

We can now vectorize this loop:

int foo(int *A, int *B, int n) {
  for (int i=0; i < n; i++) {
    int x = 9;
    if (A[i] > B[i]) {
      if (A[i] > 19) {
        x = 3;
      } else if (B[i] < 4 ) {
        x = 4;
      } else {
        x = 5;
      }
    }
    A[i] = x;
  }
}



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181037 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-03 17:42:55 +00:00
Manman Ren
436849be6a TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180935 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-02 18:11:35 +00:00
Manman Ren
2dc50d3067 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180796 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-30 17:52:57 +00:00
Nadav Rotem
7557e521e5 LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180593 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-26 05:08:59 +00:00
Nadav Rotem
975b1ddf60 LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180570 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-25 19:55:03 +00:00
Arnold Schwaighofer
a4b8b4ccc9 LoopVectorize: Scalarize padded types
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.

Patch by Daisuke Takahashi!

Fixes PR15758.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180196 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-24 16:16:01 +00:00
Arnold Schwaighofer
b03ad17536 LoopVectorizer: Bail out if we don't have datalayout we need it
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180195 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-24 16:15:58 +00:00
Nadav Rotem
a7d9a6ee63 LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order.
This fixes a miscompilation in FreeBSD's regex library.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180121 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen
2e59a125fc Call the potentially costly isAnnotatedParallel() only once.
Made the uniform write test's checks a bit stricter.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180119 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen
a8958769ea Refuse to (even try to) vectorize loops which have uniform writes,
even if erroneously annotated with the parallel loop metadata.

Fixes Bug 15794: 
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180081 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-23 08:08:51 +00:00
Arnold Schwaighofer
a3fb330d05 LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179773 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-18 17:22:34 +00:00
Benjamin Kramer
403fc14370 LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179757 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-18 14:29:13 +00:00
Arnold Schwaighofer
08a0e8f8db LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179381 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-12 15:15:19 +00:00
Arnold Schwaighofer
ac2cc0170f LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.

radar://13576547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178809 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-04 23:26:27 +00:00
Benjamin Kramer
13497b3aa7 X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-01 10:23:49 +00:00
Arnold Schwaighofer
c184a5f4ca LoopVectorizer: Insert some white space to make test case more readable
Also remove some unneeded function attributes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177114 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14 21:31:09 +00:00
Arnold Schwaighofer
e2188d9c43 Add missing asserts flag to test - it uses debug flags
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177102 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14 19:01:58 +00:00
Arnold Schwaighofer
d517da33b7 LoopVectorize: Invert case when we use a vector cmp value to query select cost
We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177098 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14 18:54:36 +00:00
Benjamin Kramer
1cb47b9afe Test case hygiene.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176772 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-09 18:25:40 +00:00
Arnold Schwaighofer
56ee544a3a LoopVectorizer: Ignore dbg.value instructions
We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic
and don't transfer them to the vectorized code.

radar://13378964

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176768 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-09 15:56:34 +00:00
Benjamin Kramer
35a4a0ca51 Force cpu in test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176702 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-08 17:01:18 +00:00
Benjamin Kramer
f22d9cfa6d Insert the reduction start value into the first bypass block to preserve domination.
Fixes PR15344.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176701 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-08 16:58:37 +00:00
Arnold Schwaighofer
5f0d9dbdf4 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-02 04:02:52 +00:00