llvm-6502/test/Transforms/LoopVectorize/X86
Hal Finkel e30aa957e3 Implement X86TTI::getUnrollingPreferences
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
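As a rough illustration of the size argument above, the C++ sketch below picks the largest unroll factor whose unrolled body still fits in a fixed uop budget. It is not the code from this commit. The function name `unrollCountForUopBudget` and the budget parameter are hypothetical, and real per-core uop cache and LSD limits are not stated in this message.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM API): largest unroll factor such that
// Count * LoopUops <= MaxUops, i.e. the unrolled body still fits in an
// assumed uop cache / LSD budget. Returns 1 (no unrolling) when the loop
// size is unknown (0) or the body alone already exceeds the budget.
unsigned unrollCountForUopBudget(unsigned LoopUops, unsigned MaxUops) {
  assert(MaxUops > 0 && "uop budget must be positive");
  if (LoopUops == 0 || LoopUops > MaxUops)
    return 1;
  // Integer division rounds down, so the unrolled body never overflows.
  return MaxUops / LoopUops;
}
```

With a hypothetical 28-uop budget, a 12-uop body would be unrolled twice (24 uops), while a 40-uop body would be left alone.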

These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
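The depth heuristic described above can be sketched roughly as follows. This is a minimal C++ illustration under stated assumptions, not the actual X86TTI implementation. `LoopCFG`, `maxLoopDepth`, and `capByTakenBranches` are hypothetical names. The loop is modeled as a plain successor list with block 0 as the header. Every edge is counted as a taken branch, which is the conservative part. Each block keeps only its shortest depth from the header, and the result is the maximum of those minima, which is the optimistic part. The walk gives up once a path exceeds a limit, which is the "partial" part.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical loop CFG: block 0 is the loop header and Succs[B] lists the
// in-loop successors of block B. Edges back to the header are left out.
struct LoopCFG {
  std::vector<std::vector<int>> Succs;
};

// Partial DFS: MinDepth[B] holds the shortest path length (in edges) found
// so far from the header to B. A block is re-explored only when a shorter
// path reaches it; the walk stops (sets TooDeep) once a path exceeds Limit.
static void visit(const LoopCFG &L, int B, unsigned Depth, unsigned Limit,
                  std::vector<unsigned> &MinDepth, bool &TooDeep) {
  if (TooDeep || Depth >= MinDepth[B])
    return;
  if (Depth > Limit) {
    TooDeep = true;
    return;
  }
  MinDepth[B] = Depth;
  for (int S : L.Succs[B])
    visit(L, S, Depth + 1, Limit, MinDepth, TooDeep);
}

// Maximum over all reached blocks of their minimum depth. TooDeep is set if
// the walk stopped early, in which case the caller should not rely on it.
unsigned maxLoopDepth(const LoopCFG &L, unsigned Limit, bool &TooDeep) {
  assert(!L.Succs.empty() && "loop must have a header block");
  std::vector<unsigned> MinDepth(L.Succs.size(), UINT_MAX);
  TooDeep = false;
  visit(L, 0, 0, Limit, MinDepth, TooDeep);
  unsigned Max = 0;
  for (unsigned D : MinDepth)
    if (D != UINT_MAX)
      Max = std::max(Max, D);
  return Max;
}

// Cap an unroll factor so that Count iterations, each assumed to take up to
// Depth forward branches plus the backedge, stay within MaxTakenBranches.
unsigned capByTakenBranches(unsigned Count, unsigned Depth,
                            unsigned MaxTakenBranches) {
  unsigned Cap = MaxTakenBranches / (Depth + 1);
  return std::max(1u, std::min(Count, Cap));
}
```

For a diamond-shaped body (header, two arms, join) the sketch reports a depth of 2, so with a hypothetical budget of 8 taken branches the unroll factor is capped at 8 / 3 = 2.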

The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
  ControlLoops-dbl - 13% speedup
  ControlLoops-flt - 15% speedup
  Reductions-dbl - 7.5% speedup

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 18:50:34 +00:00
already-vectorized.ll Move partial/runtime unrolling late in the pipeline 2014-03-31 23:23:51 +00:00
avx1.ll
constant-vector-operand.ll
conversion-cost.ll
cost-model.ll
fp32_to_uint32-cost-model.ll [X86] Adjust cost of FP_TO_UINT v8f32->v8i32 2014-03-30 18:07:13 +00:00
fp64_to_uint32-cost-model.ll [X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well 2014-03-31 21:54:48 +00:00
fp_to_sint8-cost-model.ll add 'requires asserts' to test that needs it 2014-03-27 00:20:42 +00:00
gather-cost.ll
gcc-examples.ll
illegal-parallel-loop-uniform-write.ll
lit.local.cfg
metadata-enable.ll Implement X86TTI::getUnrollingPreferences 2014-04-01 18:50:34 +00:00
min-trip-count-switch.ll
no-vector.ll
parallel-loops-after-reg2mem.ll
parallel-loops.ll
rauw-bug.ll SLPVectorizer: Fix stale for Value pointer array 2013-11-19 22:20:20 +00:00
reduction-crash.ll
small-size.ll [vectorizer] Completely disable the block frequency guidance of the loop 2014-01-28 09:10:41 +00:00
struct-store.ll
tripcount.ll LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary 2013-11-26 22:11:23 +00:00
uint64_to_fp64-cost-model.ll [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64 2014-03-27 00:52:16 +00:00
unroll_selection.ll
unroll-pm.ll
unroll-small-loops.ll [vectorizer] Tweak the way we do small loop runtime unrolling in the 2014-01-31 10:51:08 +00:00
vector_ptr_load_store.ll
vector-scalar-select-cost.ll
x86_fp80-vector-store.ll