40 Commits

Author SHA1 Message Date
Raul E. Silvera
6df2b69098 When analyzing vectors of element type that require legalization,
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.

Summary:
Include legalization overhead when computing scalarization cost

Reviewers: hfinkel, nadav

CC: chandlerc, rnk, llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203509 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-10 22:59:13 +00:00
Nico Rieck
c15d3a82ae Add extra CHECK prefix to tests with explicit prefix
These tests mistakenly assume that CHECK is still available even if an
explicit prefix is specified.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-16 13:28:15 +00:00
Andrea Di Biagio
029a76b0a2 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-12 23:43:47 +00:00
Tim Northover
0c245b69f7 X86: add costs for 64-bit vector ext/trunc & rebalance
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.

It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.

Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.

rdar://problem/15981990

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-06 18:18:36 +00:00
Benjamin Kramer
bb41c75ab5 X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.
Also update the cost model.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193270 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-23 21:06:07 +00:00
Yi Jiang
cdfb43f0a6 X86 horizontal vector reduction cost model
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191021 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-19 17:48:48 +00:00
Arnold Schwaighofer
65457b679a Costmodel: Add support for horizontal vector reductions
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.

We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:

 (v0, v1, v2, v3)
   \   \  /  /
     \  \  /
       +  +

 (v0+v2, v1+v3, undef, undef)
    \      /
 ((v0+v2) + (v1+v3), undef, undef)

 %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
 %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
 %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
 %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
 %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is excercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.

We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).

 (v0, v1, v2, v3)
  \   /    \  /
 (v0+v1, v2+v3, undef, undef)
    \     /
 ((v0+v1)+(v2+v3), undef, undef, undef)

  %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
  %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  %r = extractelement <4 x float> %bin.rdx.1, i32 0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190876 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-17 18:06:50 +00:00
Daniel Dunbar
24ec2e5a72 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-16 00:37:11 +00:00
Hal Finkel
04b84c2f92 Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo
This fixes an oversight that Intrinsic::nearbyint was not being mapped to
ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to
nearbyint calls for some targets).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185783 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-08 03:24:07 +00:00
Nadav Rotem
16d36a5cd1 CostModel: improve the cost model for load/store of non power-of-two types such as <3 x float>, which are popular in graphics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185085 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 17:52:04 +00:00
Arnold Schwaighofer
34eb2406b4 X86 cost model: Vectorizing integer division is a bad idea
radar://14057959

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184872 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-25 19:14:09 +00:00
Manman Ren
e78d832097 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180743 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-29 22:42:01 +00:00
Arnold Schwaighofer
9c63f0d687 X86 cost model: Exit before calling getSimpleVT on non-simple VTs
getSimpleVT can only handle simple value types.

radar://13676022

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-17 20:04:53 +00:00
Nadav Rotem
9eb366acba CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179413 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-12 21:15:03 +00:00
Arnold Schwaighofer
813456527e X86 cost model: Model cost for uitofp and sitofp on SSE2
The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-08 18:05:48 +00:00
Arnold Schwaighofer
cd3d60c450 TargetLowering: Fix getTypeConversion handling of extended vector types
The code in getTypeConversion attempts to promote the element vector type
before it trys to split or widen the vector.
After it failed finding a legal vector type by promoting it would continue using
the promoted vector element type. Thereby missing legal splitted vector types.
For example the type v32i32 that has a legal split of 4 x v3i32 on x86/sse2
would be transformed to: v32i256 and from there on successively split to:
v16i256, v8i256, v1i256 and then finally ends up as an i64 type.
By resetting the vector element type to the original vector element type that
existed before the promotion the code will attempt to split the vector type to
smaller vector widths of the same type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-07 20:22:56 +00:00
Arnold Schwaighofer
2537f3c659 X86 cost model: Differentiate cost for vector shifts of constants
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.

With this change we differentiate between shifts of constants and other shifts.

radar://13576547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-04 23:26:24 +00:00
Arnold Schwaighofer
6b6050b229 X86 cost model: Vector shifts are expensive in most cases
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.

For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.

radar://13130673
radar://13537826

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-03 21:46:05 +00:00
Benjamin Kramer
13497b3aa7 X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-01 10:23:49 +00:00
Michael Liao
f74e9bf650 Correct cost model for vector shift on AVX2
- After moving logic recognizing vector shift with scalar amount from
  DAG combining into DAG lowering, we declare to customize all vector
  shifts even vector shift on AVX is legal. As a result, the cost model
  needs special tuning to identify these legal cases.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-20 22:01:10 +00:00
Nadav Rotem
b05130e1b2 Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 18:38:27 +00:00
Arnold Schwaighofer
5f0d9dbdf4 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-02 04:02:52 +00:00
Benjamin Kramer
8611d4449a Cost model support for lowered math builtins.
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.

Differential Revision: http://llvm-reviews.chandlerc.com/D466

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-28 19:09:33 +00:00
Elena Demikhovsky
52981c4b60 I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
 
 The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
 The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added a cost of this operations to the AVX costs table.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 12:42:54 +00:00
Arnold Schwaighofer
fb55a8fd7c ARM cost model: Address computation in vector mem ops not free
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.

radar://13097204

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174713 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-08 14:50:48 +00:00
Nadav Rotem
f85ec865f0 We are not ready to estimate the cost of integer expansions based on the number of parts. This test is too noisy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170999 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-23 09:11:07 +00:00
Nadav Rotem
f5637c3997 Improve the X86 cost model for loads and stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170830 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-21 01:33:59 +00:00
Jakub Staszak
270bfbd3d1 Reverse order of checking SSE level when calculating compare cost, so we check
AVX2 before AVX.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170464 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-18 22:57:56 +00:00
Nadav Rotem
0602bb4659 Cost Model: change the default cost of control flow instructions (br / ret / ...) to zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169423 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-05 21:21:26 +00:00
Nadav Rotem
a6fb97a49a CostModel: add another known vector trunc optimization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167488 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-06 21:17:17 +00:00
Nadav Rotem
b042868c01 Cost Model: add tables for some avx type-conversion hacks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167480 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-06 19:33:53 +00:00
Nadav Rotem
7ae3bcca45 CostModel: Add tables for the common x86 compares.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167421 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 23:48:20 +00:00
Nadav Rotem
2d1528b358 Code Model: Improve the accuracy of the zext/sext/trunc vector cost estimation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167412 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 22:20:53 +00:00
Nadav Rotem
a4ab5290e6 Cost Model: Normalize the insert/extract index when splitting types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167402 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 21:12:13 +00:00
Nadav Rotem
75138f58b0 Cost Model: teach the cost model about expanding integers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167401 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 21:11:10 +00:00
Nadav Rotem
e623702c22 Implement the cost of abnormal x86 instruction lowering as a table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167395 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 19:32:46 +00:00
Nadav Rotem
b4b04c3fa0 X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167347 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-03 00:39:56 +00:00
Nadav Rotem
0c31e43ff3 Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167333 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02 23:27:16 +00:00
Nadav Rotem
f149567160 CostModel: add support for Vector Insert and Extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167329 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02 22:31:56 +00:00
Nadav Rotem
6bed58ef24 Add a cost model analysis that allows us to estimate the cost of IR-level instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167324 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-02 21:48:17 +00:00