Commit Graph

84 Commits

Author SHA1 Message Date
Chandler Carruth
1937233a22 [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227685 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 11:17:59 +00:00
Chandler Carruth
a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 03:43:40 +00:00
Elena Demikhovsky
70bae89669 Implemented cost model for masked load/store operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227035 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 08:44:46 +00:00
Elena Demikhovsky
b31322328a Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224829 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 07:49:20 +00:00
Elena Demikhovsky
1c3a1516f8 Loop Vectorizer minor changes in the code -
some comments, function names, identation.

Reviewed here: http://reviews.llvm.org/D6527


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224218 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-14 09:43:50 +00:00
Elena Demikhovsky
73ae1df82c Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 09:40:44 +00:00
Michael Liao
d3c452a506 [X86] Clean up whitespace as well as minor coding style
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 05:20:33 +00:00
Duncan P. N. Exon Smith
54786a0936 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 21:29:14 +00:00
Craig Topper
71777d18ad Add missing override keywords.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 09:40:13 +00:00
Elena Demikhovsky
ae1ae2c3a1 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 08:07:43 +00:00
Elena Demikhovsky
18e1185ddf AVX-512: SINT_TO_FP cost model and some bugfixes
Checked some corner cases, for example translation
of <8 x i1> to <8 x double>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221883 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 11:46:16 +00:00
Quentin Colombet
8201185d61 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221657 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 02:23:47 +00:00
Elena Demikhovsky
0218e1e1da AVX-512: added cost for some AVX-512 instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217863 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 07:57:37 +00:00
Sanjay Patel
87c977a52b Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:58:16 +00:00
Karthik Bhat
e637d65af3 Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 04:56:54 +00:00
Eric Christopher
9f85dccfc6 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:25:23 +00:00
Adam Nemet
074b752cc9 [X86] AVX512: Enable it in the Loop Vectorizer
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.

The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 18:22:33 +00:00
Andrea Di Biagio
60e9a53c21 [CostModel][x86] Improved cost model for alternate shuffles.
This patch:
 1) Improves the cost model for x86 alternate shuffles (originally
added at revision 211339);
 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.

Alternate shuffles are a special kind of blend; on x86, we can often
easily lowered alternate shuffled into single blend
instruction (depending on the subtarget features).

The existing cost model didn't take into account subtarget features.
Also, it had a couple of "dead" entries for vector types that are never
legal (example: on x86 types v2i32 and v2f32 are not legal; those are
always either promoted or widened to 128-bit vector types).

The new x86 cost model takes into account what target features we have
before returning the shuffle cost (i.e. the number of instructions
after the blend is lowered/expanded).

This patch also teaches the Cost Model Analysis how to identify and analyze
alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
 - added function 'isAlternateVectorMask';
 - added some logic to check if an instruction is a alternate shuffle and, in
   case, call the target specific TTI to get the corresponding shuffle cost;
 - added a test to verify the cost model analysis on alternate shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212296 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 22:24:18 +00:00
Karthik Bhat
d2ce9392dc Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer.
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-20 04:32:48 +00:00
Saleem Abdulrasool
a984b30b76 X86: stifle GCC warning
lib/Target/X86/X86TargetTransformInfo.cpp: In member function ‘virtual unsigned int {anonymous}::X86TTI::getIntImmCost(unsigned int, unsigned int, const llvm::APInt&, llvm::Type*) const’:
lib/Target/X86/X86TargetTransformInfo.cpp:920:60: warning: enumeral and non-enumeral type in conditional expression [enabled by default]

This seems like an unhelpful warning, but there doesnt seem to be a controlling
flag, so add an explicit cast to silence the warning.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210806 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 17:56:18 +00:00
Juergen Ributzka
8b9e31c6e2 [ConstantHoisting][X86] Improve the cost model for small constants with large types (i64 and above).
This improves the X86 cost model for small constants with large types. Before
this commit we would even hoist trivial constants such as i96 2.

This is related to <rdar://problem/17070936>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 00:32:29 +00:00
Eric Christopher
4551b0a800 Fix typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-22 01:21:44 +00:00
Juergen Ributzka
2f8bca00bb [ConstantHoisting][X86] Change the cost model to never hoist constants for types larger than i128.
Currently the X86 backend doesn't support types larger than i128 very well. For
example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted.

This fix changes the cost model to never hoist constants for types larger than
i128. Once the codegen issues have been resolved, the cost model can be updated
to allow also larger types.

This is related to <rdar://problem/16954938>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 21:00:53 +00:00
Hal Finkel
f35ce2376c Move late partial-unrolling thresholds into the processor definitions
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).

This does represent a small functionality change:
 - For generic x86-64 (which uses the SB model and, thus, will get some
   unrolling).
 - For AMD cores (because they still currently use the SB scheduling model)
 - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
   the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208289 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-08 09:14:44 +00:00
Hal Finkel
df60e43e05 [X86TTI] Remove the unrolling branch limits
The loop stream detector (LSD) on modern Intel cores, which optimizes the
execution of small loops, has limits on the number of taken branches in
addition to uop-count limits (modern AMD cores have similar limits).
Unfortunately, at the IR level, estimating the number of branches that will be
taken is difficult. For one thing, it strongly depends on later passes (block
placement, etc.). The original implementation took a conservative approach and
limited the maximal BB DFS depth of the loop.  However, fairly-extensive
benchmarking by several of us has revealed that this is the wrong approach. In
fact, there are zero known cases where the branch limit prevents a detrimental
unrolling (but plenty of cases where it does prevent beneficial unrolling).

While we could improve the current branch counting logic by incorporating
branch probabilities, this further complication seems unjustified without a
motivating regression. Instead, unless and until a regression appears, the
branch counting will be removed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208255 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-07 22:25:18 +00:00
Michael Zolotukhin
c80b103a2b [X86] Never hoist the shift value of a shift instruction.
There is no need to check if we want to hoist the immediate value of an
shift instruction. Simply return TCC_Free right away.

This change is like r206101, but for X86.

rdar://problem/16190769

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 19:17:32 +00:00
Benjamin Kramer
3551384ae2 X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2.
Includes a fix for a horrible typo that caused all SDIV costs to be
slightly off :)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 18:47:54 +00:00
Benjamin Kramer
d9ced7112e X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now.
Turn vectorization back on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207320 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-26 14:53:05 +00:00
Craig Topper
c848b1bbcf [C++] Use 'nullptr'. Target edition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207197 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 05:30:21 +00:00
Chandler Carruth
42e8630239 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206842 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 02:41:26 +00:00
Juergen Ributzka
172e0ca8c5 Add comments and test case for [X86TTI] Make constant base pointers for GetElementPtr opaque (r204739).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205468 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 21:45:36 +00:00
Hal Finkel
e30aa957e3 Implement X86TTI::getUnrollingPreferences
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).

These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.

The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
  ControlLoops-dbl - 13% speedup
  ControlLoops-flt - 15% speedup
  Reductions-dbl - 7.5% speedup

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 18:50:34 +00:00
Adam Nemet
4ffbb65494 [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well
Pretty obvious follow-on to r205159 to also handle conversion from double
besides float.

Fixes <rdar://problem/16373208>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205253 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 21:54:48 +00:00
Adam Nemet
cb1800772a [X86] Adjust cost of FP_TO_UINT v8f32->v8i32
There is no direct AVX instruction to convert to unsigned.  I have some ideas
how we may be able to do this with three vector instructions but the current
backend just bails on this to get it scalarized.

See the comment why we need to adjust the cost returned by BasicTTI.

The test is a bit roundabout (and checks assembly rather than bit code) because
I'd like it to work even if at some point we could vectorize this conversion.

Fixes <rdar://problem/16371920>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205159 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-30 18:07:13 +00:00
Quentin Colombet
d5f800aab3 [X86][Vector Cost Model] Add a comment to explain the workaround
in my previous commit (r204884).

<rdar://problem/16381225>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204972 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 22:27:41 +00:00
Quentin Colombet
566abecc9f [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64
and v4i64->v4f64.

The new costs match what we did for SSE2 and reflect the reality of our codegen.

<rdar://problem/16381225>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204884 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 00:52:16 +00:00
Jim Grosbach
f20d9ee6a6 X86: Correct vectorization cost model for v8f32->v8i8.
Fix the cost model to reflect the reality of our codegen.

rdar://16370633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204880 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 00:04:11 +00:00
Juergen Ributzka
feaa46379a [X86TTI] Make constant base pointers for getElementPtr opaque.
If getElementPtr uses a constant as base pointer, then make the constant opaque.
This prevents constant folding it with the offset. The offset can usually be
encoded in the load/store instruction itself and the base address doesn't have
to be rematerialized several times.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204739 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-25 18:01:25 +00:00
Juergen Ributzka
e987eb12b6 [Stackmaps][X86TTI] Fix think-o in getIntImmCost calculation.
The cost for the first four stackmap operands was always TCC_Free.
This is only true for the first two operands. All other operands
are TCC_Free if they are within 64bit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204738 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-25 18:01:23 +00:00
Juergen Ributzka
d3cf783ed1 [Constant Hoisting] Make the constant materialization cost operand dependent
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.

Related to <rdar://problem/16381500>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204435 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-21 06:04:45 +00:00
Juergen Ributzka
ee3242ed0b Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass."
I will break this up into smaller pieces for review and recommit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204393 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-20 20:17:13 +00:00
Juergen Ributzka
228c72a841 [Constant Hoisting] Extend coverage of the constant hoisting pass.
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.

Related to <rdar://problem/16381500>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204389 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-20 19:55:52 +00:00
Craig Topper
3b89e528c4 [C++11] Remove 'virtual' keyword from methods marked with 'override' keyword.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203444 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-10 05:29:18 +00:00
Chandler Carruth
436906ab3c [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203437 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-10 02:45:14 +00:00
Craig Topper
629b96cb4f Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-02 09:09:27 +00:00
Craig Topper
4eb03f049e Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202618 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-02 08:08:51 +00:00
Andrea Di Biagio
029a76b0a2 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-12 23:43:47 +00:00
Tim Northover
0c245b69f7 X86: add costs for 64-bit vector ext/trunc & rebalance
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.

It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.

Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.

rdar://problem/15981990

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-06 18:18:36 +00:00
Juergen Ributzka
943ce55f39 Revert "Revert "Add Constant Hoisting Pass" (r200034)"
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200062 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-25 02:02:55 +00:00
Hans Wennborg
503793e834 Revert "Add Constant Hoisting Pass" (r200034)
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.

We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200058 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-25 01:18:18 +00:00