Commit Graph

525 Commits

Author SHA1 Message Date
Arnold Schwaighofer
a16c1b55e2 LoopVectorizer: Enable unrolling of conditional stores and the load/store
unrolling heuristic per default

Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-02 03:12:34 +00:00
Reid Kleckner
86cb795388 Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit r200576.  It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-01 01:37:30 +00:00
Chandler Carruth
093b0413fe [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.

Patch by Raul Silvera! (Much delayed due to my not running dcommit)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-31 21:14:40 +00:00
Chandler Carruth
93228f6199 [vectorizer] Tweak the way we do small loop runtime unrolling in the
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.

I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200530 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-31 10:51:08 +00:00
Arnold Schwaighofer
8dc253e97b LoopVectorizer: Don't count the induction variable multiple times
When estimating register pressure, don't count the induction variable mulitple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-29 04:36:12 +00:00
Chandler Carruth
05d43d8b6f [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200294 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-28 09:10:41 +00:00
Arnold Schwaighofer
a47aa4b4ef LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    v0 = (extract v, 0) + 10;
    v1 = (extract v, 1) + 10;
    if (v0 < 10)
      a[i] = v0;
    if (v1 < 10)
      a[i] = v1;
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200270 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-28 01:01:53 +00:00
Chandler Carruth
5f61e70eac [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 13:11:50 +00:00
Chandler Carruth
1c4746ed70 [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200215 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 11:41:50 +00:00
Chandler Carruth
91a3f1dc8e [vectorizer] Simplify code to use existing helpers on the Function
object and fewer pointless variables.

Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 11:27:37 +00:00
Chandler Carruth
424b2b0093 [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 11:12:24 +00:00
Chandler Carruth
9f22a8788f [vectorizer] Add some flags which are useful for conducting experiments
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.

These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200212 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 11:12:19 +00:00
Chandler Carruth
3fa842d791 [vectorizer] Fix a trivial oversight where we always requested the
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.

If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.

Also made the variable name a bit more sensible while I'm here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 11:12:14 +00:00
Chandler Carruth
0afd0bc5fa [vectorizer] Clean up the handling of unvectorized loop unrolling in the
LoopVectorize pass.

The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.

This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200198 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 08:17:58 +00:00
Chandler Carruth
754b83a51a [LPM] Conclude my immediate work by making the LoopVectorizer
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.

The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200074 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-25 10:01:55 +00:00
Alp Toker
ae43cab6ba Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-24 17:20:08 +00:00
Arnold Schwaighofer
2becaaf3a1 LoopVectorizer: A reduction that has multiple uses of the reduction value is not
a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-19 03:18:31 +00:00
Arnold Schwaighofer
e96fec2e43 LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199291 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-15 03:35:46 +00:00
Chandler Carruth
7f2eff792a [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199104 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-13 13:07:17 +00:00
Chandler Carruth
56e1394c88 [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-13 09:26:24 +00:00
Arnold Schwaighofer
73c9559237 LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199015 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi
55da404566 LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199001 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-11 09:59:27 +00:00
Arnold Schwaighofer
ee3f7de62e LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198950 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-10 18:20:32 +00:00
Chandler Carruth
974a445bd9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198685 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-07 11:48:04 +00:00
Arnold Schwaighofer
83196a9fcb LoopVectorizer: Don't if-convert constant expressions that can trap
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.

PR16729
radar://15653590

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197449 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-17 01:11:01 +00:00
NAKAMURA Takumi
0d87d72fa7 Prune redundant dependencies in LLVMBuild.txt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196988 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-11 00:30:57 +00:00
NAKAMURA Takumi
e0c0c4bdf6 Whitespaces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196880 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-10 05:39:12 +00:00
Jakub Staszak
7ae72bfd94 Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces
overall time of LLVM compilation by ~1%.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196667 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-07 21:20:17 +00:00
Renato Golin
07d9471bc5 Add #pragma vectorize enable/disable to LLVM
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196537 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 21:20:02 +00:00
Rafael Espindola
3bb1184a1a Fix non-deterministic behavior.
We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196520 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 18:28:01 +00:00
Arnold Schwaighofer
9e0807cb61 SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196508 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 15:14:40 +00:00
Alp Toker
087ab613f4 Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 05:44:44 +00:00
Nadav Rotem
7e8ff837e6 PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195791 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-26 22:24:25 +00:00
Arnold Schwaighofer
b40f14eb89 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195787 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-26 22:11:23 +00:00
Nadav Rotem
bba8da2ba0 PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195773 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-26 17:29:19 +00:00
Chandler Carruth
fbe605e712 Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195528 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-23 00:48:34 +00:00
Arnold Schwaighofer
21a47246f9 SLPVectorizer: Fix whitespace errors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195468 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-22 15:47:17 +00:00
Yi Jiang
709a31b5f9 SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195406 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-22 01:57:02 +00:00
Arnold Schwaighofer
4bc2e3a32d SLPVectorizer: Fix stale for Value pointer array
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.

Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.

The test case will only fail when running with libgmalloc.

radar://15498655

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195162 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-19 22:20:20 +00:00
Arnold Schwaighofer
413f7bea8d SLPVectorizer: Fix whitespace errors
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195161 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-19 22:20:18 +00:00
Arnold Schwaighofer
07a3c481c6 LoopVectorizer: Extend the induction variable to a larger type
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.

The problem is loops like:
int main ()
{
  int a = 1;
  char b = 0;
  lbl:
    a &= 4;
    b--;
    if (b) goto lbl;
  return a;
}

The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.

To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.

PR17532

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195008 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-18 13:14:32 +00:00
Arnold Schwaighofer
4634338655 LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194876 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-15 23:09:33 +00:00
Renato Golin
4921d5b0a9 Move debug message in vectorizer
No functional change, just better reporting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194388 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-11 16:27:35 +00:00
Benjamin Kramer
63d8f88686 SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering.
STL debug mode checks this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194015 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-04 21:34:55 +00:00
Benjamin Kramer
ec346c1314 SLPVectorizer: Add a missing pair of parens. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193958 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-03 12:54:32 +00:00
Benjamin Kramer
0c7ba3cef2 SLPVectorizer: When CSEing generated gathers only scan blocks containing them.
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.

The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193956 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-03 12:27:52 +00:00
Benjamin Kramer
9bbc7b4e49 SLPVectorizer: Remove duplicated function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193927 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-02 14:46:27 +00:00
Benjamin Kramer
ff566d8f44 LoopVectorize: Remove quadratic behavior the local CSE.
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193926 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-02 13:39:00 +00:00
Arnold Schwaighofer
bc28e88a28 LoopVectorizer: Move cse code into its own function
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193895 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-01 23:28:54 +00:00
Arnold Schwaighofer
f4775827d0 LoopVectorizer: Perform redundancy elimination on induction variables
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.

On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.

This should also help lpbench and paq8p.

I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.

radar://15339680

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193891 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-01 22:18:19 +00:00