llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-01-18 13:34:04 +00:00

Arnold Schwaighofer 65457b679a Costmodel: Add support for horizontal vector reductions

Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.

We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:

 (v0, v1, v2, v3)
   \   \  /  /
     \  \  /
       +  +

 (v0+v2, v1+v3, undef, undef)
    \      /
 ((v0+v2) + (v1+v3), undef, undef, undef)

 %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
 %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
 %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
 %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
 %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that allows clients to ask for the cost of such a reduction, since backends
might generate code that is cheaper than the sum of the individual instruction
costs. This interface is exercised by the CostModel analysis pass, which looks
for reduction patterns like the one above, starting at extractelements, and
calls the cost model interface if it sees a matching sequence.
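
To make the data flow of the splitting reduction concrete, here is a minimal
C++ sketch that simulates it on plain floats. It is not code from this commit:
the function name splitReduceAdd is hypothetical, and lanes that the IR marks
as undef are simply left holding stale values that are never read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): simulates the splitting
// add-reduction on a vector whose length is a power of two.
float splitReduceAdd(std::vector<float> v) {
  const std::size_t n = v.size();
  assert(n != 0 && (n & (n - 1)) == 0 && "length must be a power of two");
  for (std::size_t half = n / 2; half >= 1; half /= 2) {
    // One (shufflevector, fadd) level: the shuffle moves lanes
    // [half, 2*half) down to [0, half), then the add combines them.
    // Lanes at index >= half are treated as undef from here on.
    for (std::size_t i = 0; i < half; ++i)
      v[i] += v[i + half];
  }
  return v[0]; // corresponds to extractelement ..., i32 0
}
```

For a four-lane input the first level forms v0+v2 and v1+v3, and the second
level adds those two partial sums, matching the diagram above.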

We will also support a second form, pairwise reduction, that is well supported
on common architectures (haddps, vpadd, faddp).

 (v0, v1, v2, v3)
  \   /    \  /
 (v0+v1, v2+v3, undef, undef)
    \     /
 ((v0+v1)+(v2+v3), undef, undef, undef)

  %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
  %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  %r = extractelement <4 x float> %bin.rdx.1, i32 0
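
The shuffle masks of the pairwise form follow a regular pattern: at each level
the left operand gathers the even lanes and the right operand gathers the odd
lanes of the lanes still live. The sketch below generates those masks. It is an
illustration only: pairwiseMasks and the Undef constant are hypothetical names,
and encoding undef mask elements as -1 is a convention chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Convention for this sketch: -1 stands for an undef mask element.
constexpr int Undef = -1;

// Hypothetical helper (not an LLVM API): returns the (left, right) shuffle
// masks for one level of a pairwise reduction over `width` lanes.
std::pair<std::vector<int>, std::vector<int>>
pairwiseMasks(std::size_t width, std::size_t level) {
  assert(width != 0 && (width & (width - 1)) == 0 &&
         "width must be a power of two");
  // Number of defined lanes produced by this level: width / 2^(level+1).
  std::size_t live = width;
  for (std::size_t l = 0; l <= level && live != 0; ++l)
    live /= 2;
  assert(live >= 1 && "level out of range for this width");

  std::vector<int> left(width, Undef), right(width, Undef);
  for (std::size_t i = 0; i < live; ++i) {
    left[i] = static_cast<int>(2 * i);      // even lanes
    right[i] = static_cast<int>(2 * i + 1); // odd lanes
  }
  return {left, right};
}
```

Calling pairwiseMasks(4, 0) and pairwiseMasks(4, 1) reproduces the masks of
%rdx.shuf.0.* and %rdx.shuf.1.* in the sequence above.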

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190876 91177308-0d34-0410-b5e6-96231b3b80d8

2013-09-17 18:06:50 +00:00

IPA

Disable inlining between sanitized and non-sanitized functions.

2013-08-08 08:22:39 +00:00

AliasAnalysis.cpp

Reimplement isPotentiallyReachable to make nocapture deduction much stronger.

2013-07-27 01:24:00 +00:00

AliasAnalysisCounter.cpp

…

AliasAnalysisEvaluator.cpp

…

AliasDebugger.cpp

…

AliasSetTracker.cpp

In AliasSetTracker, do not change the alias set to "mod/ref" when adding

2013-09-12 20:15:50 +00:00

Analysis.cpp

…

BasicAliasAnalysis.cpp

Remove trailing spaces.

2013-08-24 14:16:00 +00:00

BlockFrequencyInfo.cpp

…

BranchProbabilityInfo.cpp

Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.

2013-07-04 01:31:24 +00:00

CaptureTracking.cpp

Extend 'readonly' and 'readnone' to work on function arguments as well as

2013-07-06 00:29:58 +00:00

CFG.cpp

Add some constantness.

2013-08-20 23:04:15 +00:00

CFGPrinter.cpp

…

CMakeLists.txt

Also update CMakeLists.txt for r187283.

2013-07-27 01:25:51 +00:00

CodeMetrics.cpp

…

ConstantFolding.cpp

Move variable under condition where it is used

2013-09-12 01:07:58 +00:00

CostModel.cpp

Costmodel: Add support for horizontal vector reductions

2013-09-17 18:06:50 +00:00

DependenceAnalysis.cpp

Remove extraneous semicolon.

2013-08-06 16:40:40 +00:00

DominanceFrontier.cpp

…

DomPrinter.cpp

…

InstCount.cpp

…

InstructionSimplify.cpp

Add ISD::FROUND for libm round()

2013-08-07 22:49:12 +00:00

Interval.cpp

…

IntervalPartition.cpp

…

IVUsers.cpp

…

LazyValueInfo.cpp

Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.

2013-07-04 01:31:24 +00:00

LibCallAliasAnalysis.cpp

…

LibCallSemantics.cpp

…

Lint.cpp

Fix lint assert on integer vector division

2013-08-26 23:29:33 +00:00

LLVMBuild.txt

…

Loads.cpp

…

LoopInfo.cpp

Add 'const' qualifiers to static const char* variables.

2013-07-16 01:17:10 +00:00

LoopPass.cpp

Comment: try to clarify loop iteration order.

2013-07-20 23:10:31 +00:00

Makefile

…

MemDepPrinter.cpp

…

MemoryBuiltins.cpp

Treat nothrow forms of ::operator delete and ::operator delete[] as

2013-07-21 23:11:42 +00:00

MemoryDependenceAnalysis.cpp

…

ModuleDebugInfoPrinter.cpp

…

NoAliasAnalysis.cpp

…

PathNumbering.cpp

…

PathProfileInfo.cpp

…

PathProfileVerifier.cpp

…

PHITransAddr.cpp

…

PostDominators.cpp

…

ProfileDataLoader.cpp

Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.

2013-07-14 04:42:23 +00:00

ProfileDataLoaderPass.cpp

…

ProfileEstimatorPass.cpp

Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.

2013-07-04 01:31:24 +00:00

ProfileInfo.cpp

…

ProfileInfoLoader.cpp

…

ProfileInfoLoaderPass.cpp

…

ProfileVerifierPass.cpp

…

PtrUseVisitor.cpp

…

README.txt

…

RegionInfo.cpp

Reorder headers according to lint.

2013-08-21 21:14:19 +00:00

RegionPass.cpp

…

RegionPrinter.cpp

…

ScalarEvolution.cpp

Teach ScalarEvolution about pointer address spaces

2013-09-10 19:55:24 +00:00

ScalarEvolutionAliasAnalysis.cpp

…

ScalarEvolutionExpander.cpp

Teach ScalarEvolution about pointer address spaces

2013-09-10 19:55:24 +00:00

ScalarEvolutionNormalization.cpp

…

SparsePropagation.cpp

…

TargetTransformInfo.cpp

Costmodel: Add support for horizontal vector reductions

2013-09-17 18:06:50 +00:00

Trace.cpp

…

TypeBasedAliasAnalysis.cpp

TBAA: add isTBAAVtableAccess to MDNode so clients can call the function

2013-09-06 22:47:05 +00:00

ValueTracking.cpp

Fix assert with GEP ptr vector indexing structs

2013-08-19 21:43:16 +00:00

README.txt

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.
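
The claim that both forms equal (%n * %n) can be checked numerically. The C++
sketch below steps the recurrence {1,+,3,+,2} and evaluates the expanded
expression for small inputs. It rests on two assumptions that are not stated
above: the exit value is taken at iteration %n - 1, and %n is at least 2 and
small enough that 64-bit arithmetic cannot overflow, so the i65 widening is not
needed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of the chain of recurrences {1,+,3,+,2} at iteration `iter`
// (iteration 0 yields the start value 1), computed by stepping.
std::uint64_t evalAddRec(std::uint64_t iter) {
  std::uint64_t value = 1, step = 3;
  const std::uint64_t stepOfStep = 2;
  for (std::uint64_t i = 0; i < iter; ++i) {
    value += step;      // advance the outer recurrence
    step += stepOfStep; // advance the inner recurrence {3,+,2}
  }
  return value;
}

// The expanded expression quoted above, restricted to small n >= 2 so that
// plain 64-bit arithmetic gives the same result as the i65 computation.
std::uint64_t expandedForm(std::uint64_t n) {
  assert(n >= 2 && "this sketch only covers n >= 2");
  return (2 * (((n - 2) * (n - 1)) / 2) + 3 * n) - 2;
}
```

Under those assumptions, evalAddRec(n - 1), expandedForm(n), and n * n agree,
since the recurrence at iteration i is 1 + 3i + i(i - 1) = (i + 1)^2.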

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,
ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))
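
The fold is valid because truncation commutes with negation in modular
arithmetic, so the first two terms cancel. The C++ sketch below checks this
with unsigned types, whose wraparound is well defined. The function names are
hypothetical, and undef is modelled as a single fixed value u, which is a
simplification of what undef means in the IR.

```cpp
#include <cassert>
#include <cstdint>

// trunc i64 ... to i32, modelled on unsigned integers.
std::uint32_t trunc32(std::uint64_t x) { return static_cast<std::uint32_t>(x); }

// ((trunc (-1 * %arg5)) + (trunc %arg5) + (-1 * (trunc undef)))
std::uint32_t originalExpr(std::uint64_t arg5, std::uint64_t u) {
  const std::uint32_t negArg = trunc32(std::uint64_t{0} - arg5);
  const std::uint32_t negUndef = std::uint32_t{0} - trunc32(u);
  return static_cast<std::uint32_t>(negArg + trunc32(arg5) + negUndef);
}

// (-1 * (trunc undef))
std::uint32_t foldedExpr(std::uint64_t u) {
  return std::uint32_t{0} - trunc32(u);
}

// Asserts that both forms agree for one choice of inputs.
void checkFold(std::uint64_t arg5, std::uint64_t u) {
  assert(originalExpr(arg5, u) == foldedExpr(u));
}
```

The result is independent of %arg5, which is why the folded form no longer
mentions it.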

//===---------------------------------------------------------------------===//