llvm-6502/include/llvm/CodeGen
Andrew Trick 573931394f MI-Sched: handle latency of in-order operations with the new machine model.
The per-operand machine model allows the target to define "unbuffered"
processor resources. This change is a quick, cheap way to model stalls
caused by the latency of operations that use such resources. This only
applies when the processor's micro-op buffer size is non-zero
(Out-of-Order). We can't precisely model in-order stalls during
out-of-order execution, but this is an easy and effective
heuristic. It benefits Cortex-A9 scheduling when using the new
machine model, which is not yet on by default.
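
The heuristic described above can be illustrated with a minimal standalone
sketch. It is not the code from this commit: the Op struct, its fields, and
simulateIssue are hypothetical names chosen for illustration, and the model is
deliberately simplified to a single issue counter. The idea shown is that an
operation using an unbuffered (in-order) resource stalls the scheduler's
current cycle until it is ready, while a buffered operation does not.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a scheduling unit (not an LLVM class).
struct Op {
  unsigned readyCycle;   // earliest cycle at which the op's operands are available
  bool usesUnbuffered;   // true if the op occupies an unbuffered (in-order) resource
};

// Walks the ops in issue order and returns the cycle reached after the last one.
// Assumption: issueWidth ops can issue per cycle; buffered ops never stall issue.
unsigned simulateIssue(const std::vector<Op> &ops, unsigned issueWidth) {
  unsigned cycle = 0;
  unsigned issuedThisCycle = 0;
  for (const Op &op : ops) {
    // An in-order resource cannot hide latency in the micro-op buffer,
    // so account for the likely stall by advancing to the ready cycle.
    if (op.usesUnbuffered && op.readyCycle > cycle) {
      cycle = op.readyCycle;
      issuedThisCycle = 0;
    }
    ++issuedThisCycle;
    if (issuedThisCycle == issueWidth) {
      ++cycle;
      issuedThisCycle = 0;
    }
  }
  return cycle;
}
```

With an issue width of 1, marking the middle operation of a three-operation
sequence as unbuffered with a ready cycle of 3 moves the final cycle later than
in the fully buffered case, which is the stall the heuristic tries to expose to
the scheduler.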

MI-Sched for armv7 was evaluated on Swift (and was left disabled only because
of a performance bug related to predication). However, we never
evaluated Cortex-A9 performance on MI-Sched in its current form. This
change adds MI-Sched functionality to reach performance goals on
A9. The only remaining change is to allow MI-Sched to run as a PostRA
pass.

I evaluated performance using a set of options to estimate the performance impact once MI-Sched is the default on armv7:
-mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false

For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results:
(min run time over 2 runs, filtering tiny changes)

Speedups:
| Benchmarks/BenchmarkGame/recursive         |  52.39% |
| Benchmarks/VersaBench/beamformer           |  20.80% |
| Benchmarks/Misc/pi                         |  19.97% |
| Benchmarks/Misc/mandel-2                   |  19.95% |
| SPEC/CFP2000/188.ammp                      |  18.72% |
| Benchmarks/McCat/08-main/main              |  18.58% |
| Benchmarks/Misc-C++/Large/sphereflake      |  18.46% |
| Benchmarks/Olden/power                     |  17.11% |
| Benchmarks/Misc-C++/mandel-text            |  16.47% |
| Benchmarks/Misc/oourafft                   |  15.94% |
| Benchmarks/Misc/flops-7                    |  14.99% |
| Benchmarks/FreeBench/distray               |  14.26% |
| SPEC/CFP2006/470.lbm                       |  14.00% |
| mediabench/mpeg2/mpeg2dec/mpeg2decode      |  12.28% |
| Benchmarks/SmallPT/smallpt                 |  10.36% |
| Benchmarks/Misc-C++/Large/ray              |   8.97% |
| Benchmarks/Misc/fp-convert                 |   8.75% |
| Benchmarks/Olden/perimeter                 |   7.10% |
| Benchmarks/Bullet/bullet                   |   7.03% |
| Benchmarks/Misc/mandel                     |   6.75% |
| Benchmarks/Olden/voronoi                   |   6.26% |
| Benchmarks/Misc/flops-8                    |   5.77% |
| Benchmarks/Misc/matmul_f64_4x4             |   5.19% |
| Benchmarks/MiBench/security-rijndael       |   5.15% |
| Benchmarks/Misc/flops-6                    |   5.10% |
| Benchmarks/Olden/tsp                       |   4.46% |
| Benchmarks/MiBench/consumer-lame           |   4.28% |
| Benchmarks/Misc/flops-5                    |   4.27% |
| Benchmarks/mafft/pairlocalalign            |   4.19% |
| Benchmarks/Misc/himenobmtxpa               |   4.07% |
| Benchmarks/Misc/lowercase                  |   4.06% |
| SPEC/CFP2006/433.milc                      |   3.99% |
| Benchmarks/tramp3d-v4                      |   3.79% |
| Benchmarks/FreeBench/pifft                 |   3.66% |
| Benchmarks/Ptrdist/ks                      |   3.21% |
| Benchmarks/Adobe-C++/loop_unroll           |   3.12% |
| SPEC/CINT2000/175.vpr                      |   3.12% |
| Benchmarks/nbench                          |   2.98% |
| SPEC/CFP2000/183.equake                    |   2.91% |
| Benchmarks/Misc/perlin                     |   2.85% |
| Benchmarks/Misc/flops-1                    |   2.82% |
| Benchmarks/Misc-C++-EH/spirit              |   2.80% |
| Benchmarks/Misc/flops-2                    |   2.77% |
| Benchmarks/NPB-serial/is                   |   2.42% |
| Benchmarks/ASC_Sequoia/CrystalMk           |   2.33% |
| Benchmarks/BenchmarkGame/n-body            |   2.28% |
| Benchmarks/SciMark2-C/scimark2             |   2.27% |
| Benchmarks/Olden/bh                        |   2.03% |
| skidmarks10/skidmarks                      |   1.81% |
| Benchmarks/Misc/flops                      |   1.72% |

Slowdowns:
| Benchmarks/llubenchmark/llu                | -14.14% |
| Benchmarks/Polybench/stencils/seidel-2d    |  -5.67% |
| Benchmarks/Adobe-C++/functionobjects       |  -5.25% |
| Benchmarks/Misc-C++/oopack_v1p8            |  -5.00% |
| Benchmarks/Shootout/hash                   |  -2.35% |
| Benchmarks/Prolangs-C++/ocean              |  -2.01% |
| Benchmarks/Polybench/medley/floyd-warshall |  -1.98% |
| Polybench/linear-algebra/kernels/3mm       |  -1.95% |
| Benchmarks/McCat/09-vor/vor                |  -1.68% |

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196516 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-05 17:55:58 +00:00
..
PBQP Dereference the node iterator when dumping the PBQP graph structure in DOT 2013-11-21 06:30:14 +00:00
Analysis.h
AsmPrinter.h Reland 196270 "Generalize debug info / EH emission in AsmPrinter" 2013-12-03 15:10:23 +00:00
CalcSpillWeights.h CalcSpillWeights: allow overidding the spill weight normalizing function 2013-11-11 19:56:14 +00:00
CallingConvLower.h
CommandFlags.h Speling fixes. 2013-10-22 15:18:03 +00:00
DAGCombine.h
DFAPacketizer.h
EdgeBundles.h
FastISel.h Avoid illegal integer promotion in fastisel 2013-11-15 19:09:27 +00:00
FunctionLoweringInfo.h
GCMetadata.h
GCMetadataPrinter.h
GCs.h
GCStrategy.h
IntrinsicLowering.h
ISDOpcodes.h Add addrspacecast instruction. 2013-11-15 01:34:59 +00:00
JITCodeEmitter.h
LatencyPriorityQueue.h
LexicalScopes.h Remove capability for polymorphic destruction from LexicalScope 2013-11-20 00:54:28 +00:00
LinkAllAsmWriterComponents.h
LinkAllCodegenComponents.h
LiveInterval.h Replacing HUGE_VALF with llvm::huge_valf in order to work around a warning triggered in MSVC 12. 2013-11-13 00:15:44 +00:00
LiveIntervalAnalysis.h Represent RegUnit liveness with LiveRange instance 2013-10-10 21:29:02 +00:00
LiveIntervalUnion.h Rename LiveRange to LiveInterval::Segment 2013-10-10 21:28:43 +00:00
LiveRangeEdit.h
LiveRegMatrix.h
LiveRegUnits.h LiveRegUnits: Use *MBB for consistency and convenience. 2013-10-14 22:18:59 +00:00
LiveStackAnalysis.h
LiveVariables.h
MachineBasicBlock.h Even more spelling fixes for "instruction". 2013-09-28 13:42:22 +00:00
MachineBlockFrequencyInfo.h Added MachineBlockFrequencyInfo::view for displaying the block frequency propagation graph via graphviz. 2013-12-03 00:49:33 +00:00
MachineBranchProbabilityInfo.h
MachineCodeEmitter.h
MachineCodeInfo.h
MachineConstantPool.h
MachineDominators.h
MachineFrameInfo.h
MachineFunction.h
MachineFunctionAnalysis.h
MachineFunctionPass.h
MachineInstr.h Rename parameter: defined regs are not incoming. 2013-10-10 21:28:38 +00:00
MachineInstrBuilder.h
MachineInstrBundle.h
MachineJumpTableInfo.h
MachineLoopInfo.h
MachineMemOperand.h
MachineModuleInfo.h
MachineModuleInfoImpls.h
MachineOperand.h Fix a typo where we were creating <def,kill> operands instead of 2013-11-22 00:46:32 +00:00
MachinePassRegistry.h
MachinePostDominators.h
MachineRegisterInfo.h [weak vtables] Remove a bunch of weak vtables 2013-11-19 00:57:56 +00:00
MachineRelocation.h
MachineScheduler.h [weak vtables] Remove a bunch of weak vtables 2013-11-19 00:57:56 +00:00
MachineSSAUpdater.h
MachineTraceMetrics.h
MachORelocation.h
Passes.h Move the old pass manager infrastructure into a legacy namespace and 2013-11-09 12:26:54 +00:00
PseudoSourceValue.h
RegAllocPBQP.h Re-apply r194300 with fixes for warnings. 2013-11-09 03:08:56 +00:00
RegAllocRegistry.h
RegisterClassInfo.h
RegisterPressure.h Represent RegUnit liveness with LiveRange instance 2013-10-10 21:29:02 +00:00
RegisterScavenging.h
ResourcePriorityQueue.h
RuntimeLibcalls.h Fix filename in header comment 2013-11-16 15:40:54 +00:00
ScheduleDAG.h MI-Sched: handle latency of in-order operations with the new machine model. 2013-12-05 17:55:58 +00:00
ScheduleDAGInstrs.h Correct word hyphenations 2013-12-05 05:44:44 +00:00
ScheduleDFS.h
ScheduleHazardRecognizer.h
SchedulerRegistry.h
ScoreboardHazardRecognizer.h
SelectionDAG.h Split SETCC if VSELECT requires splitting too. 2013-11-22 00:39:23 +00:00
SelectionDAGISel.h Add OPC_CheckChildSame0-3 to the DAG isel matcher. This replaces sequences of MoveChild, CheckSame, MoveParent. Saves 846 bytes from the X86 DAG isel matcher, ~300 from ARM, ~840 from Hexagon. 2013-10-05 05:38:16 +00:00
SelectionDAGNodes.h Add addrspacecast instruction. 2013-11-15 01:34:59 +00:00
SlotIndexes.h
StackMaps.h Refactor a lot of patchpoint/stackmap related code to simplify and make it 2013-11-29 03:07:54 +00:00
StackProtector.h Reformat code with clang-format. 2013-10-30 02:25:14 +00:00
TargetLoweringObjectFileImpl.h
TargetSchedule.h IfConverter: Use TargetSchedule for instruction latencies 2013-09-30 15:28:56 +00:00
ValueTypes.h Implement AArch64 vector load/store multiple N-element structure class SIMD(lselem). 2013-10-10 17:00:52 +00:00
ValueTypes.td Implement aarch64 neon instruction set AdvSIMD (Across). 2013-10-05 08:22:10 +00:00
VirtRegMap.h