llvm-6502/include/llvm
Arnold Schwaighofer 65457b679a Costmodel: Add support for horizontal vector reductions
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.

We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:

 (v0, v1, v2, v3)
   \   \  /  /
     \  \  /
       +  +

 (v0+v2, v1+v3, undef, undef)
    \      /
 ((v0+v2) + (v1+v3), undef, undef)

 %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
 %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
 %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
 %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
 %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is excercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.

We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).

 (v0, v1, v2, v3)
  \   /    \  /
 (v0+v1, v2+v3, undef, undef)
    \     /
 ((v0+v1)+(v2+v3), undef, undef, undef)

  %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
  %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  %r = extractelement <4 x float> %bin.rdx.1, i32 0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190876 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-17 18:06:50 +00:00
..
ADT Add warn_unused_result to empty() on various containers. 2013-09-13 17:33:24 +00:00
Analysis Costmodel: Add support for horizontal vector reductions 2013-09-17 18:06:50 +00:00
Assembly Enable *BasicBlockPass::createPrinterPass() 2013-02-08 23:37:41 +00:00
Bitcode Add function attribute 'optnone'. 2013-08-23 11:53:55 +00:00
CodeGen simplify expression 2013-09-17 00:15:33 +00:00
Config [conf] Add config variable to disable crash related overrides. 2013-08-30 20:39:21 +00:00
DebugInfo Add support for DebugFission to DWARF parser 2013-08-27 09:20:22 +00:00
ExecutionEngine Fix include guards. 2013-08-20 22:52:02 +00:00
IR Add llvm.x86.* intrinsics for Intel SHA Extensions 2013-09-17 13:44:39 +00:00
IRReader Split out the IRReader header and the utility functions it provides into 2013-03-26 02:25:37 +00:00
MC Somehow this important part of the patch, where I actually check the Mask, 2013-09-12 14:23:19 +00:00
Object Move everything depending on Object/MachOFormat.h over to Support/MachO.h. 2013-09-01 04:28:48 +00:00
Option Option parsing: support case-insensitive option matching. 2013-08-28 20:04:31 +00:00
Support ELF: Add support for the exclude section bit for gas compat. 2013-09-15 19:53:20 +00:00
TableGen Move StringToOffsetTable into the TableGen include directory so I can use it in clang. 2013-08-29 05:09:55 +00:00
Target Add an instruction deprecation feature to TableGen. 2013-09-12 10:28:05 +00:00
Transforms Remove the long, long defunct IR block placement pass. 2013-09-14 09:28:14 +00:00
AutoUpgrade.h Remove trailing whitespace, fix file path in comment 2013-07-20 17:46:00 +00:00
CMakeLists.txt Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
DebugInfo.h Debug Info: move class definition of DIRef. 2013-09-11 18:55:55 +00:00
DIBuilder.h Revert r190269 to fix dragonegg failures. 2013-09-08 06:02:56 +00:00
GVMaterializer.h Fix include guards so they exactly match file names. 2013-01-10 00:45:19 +00:00
InitializePasses.h Remove the long, long defunct IR block placement pass. 2013-09-14 09:28:14 +00:00
InstVisitor.h Move all of the header files which are involved in modelling the LLVM IR 2013-01-02 11:36:10 +00:00
LinkAllIR.h Rename LinkAllVMCore.h to LinkAllIR.h since VMCore directory was renamed to IR. 2013-01-10 21:55:02 +00:00
LinkAllPasses.h Remove the long, long defunct IR block placement pass. 2013-09-14 09:28:14 +00:00
Linker.h Fix a performance bug in the Linker. 2013-05-04 05:05:18 +00:00
Pass.h moves doInitialization and doFinalization to the Pass class and removes some unreachable code in MachineModuleInfo 2012-12-03 21:56:57 +00:00
PassAnalysisSupport.h Fix include guards so they exactly match file names. 2013-01-10 00:45:19 +00:00
PassManager.h This patch breaks up Wrap.h so that it does not have to include all of 2013-05-01 20:59:00 +00:00
PassManagers.h Use a DenseMap instead of a std::map for AnalysisID -> Pass* maps. This reduces the pass-manager overhead from FPPassManager::runOnFunction() by about 10%. 2013-02-26 01:31:59 +00:00
PassRegistry.h This patch breaks up Wrap.h so that it does not have to include all of 2013-05-01 20:59:00 +00:00
PassSupport.h Fix include guards so they exactly match file names. 2013-01-10 00:45:19 +00:00