llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-11-02 07:11:49 +00:00


Chandler Carruth a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting it into the right form.

There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because the chaining-based delegation meant there was no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here.

The other reason of course is that this is a much more natural fit for the new pass manager. This will lay the groundwork for a type-erased per-function info object that can look up the correct subtarget and even cache it.

Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below.

The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this.

The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8

2015-01-31 03:43:40 +00:00
AsmPrinter	Compute the ELF SectionKind from the flags.	2015-01-29 17:33:21 +00:00
SelectionDAG	Add nullptr checks for TargetSelectionDAGInfo in SelectionDAG.	2015-01-28 23:50:40 +00:00
AggressiveAntiDepBreaker.cpp	Correct the AggressiveAntiDepBreaker's handling of subregisters defining super registers	2015-01-28 14:44:14 +00:00
AggressiveAntiDepBreaker.h	mop up: "Don’t duplicate function or class name at the beginning of the comment."	2014-09-21 14:48:16 +00:00
AllocationOrder.cpp
AllocationOrder.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
Analysis.cpp	Add assertions for out of bound index in ComputeLinearIndex	2015-01-14 05:38:48 +00:00
AntiDepBreaker.h	mop up: "Don’t duplicate function or class name at the beginning of the comment."	2014-09-21 14:48:16 +00:00
AtomicExpandPass.cpp	Migrate AtomicExpandPass and DwarfEHPrepare to using a Function-ized getSubtargetImpl.	2015-01-27 01:04:42 +00:00
BasicTargetTransformInfo.cpp	[PM] Change the core design of the TTI analysis to use a polymorphic	2015-01-31 03:43:40 +00:00
BranchFolding.cpp	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>	2014-11-19 07:49:26 +00:00
BranchFolding.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
CalcSpillWeights.cpp	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>	2014-11-19 07:49:26 +00:00
CallingConvLower.cpp	musttail: Only set the inreg flag for fastcall and vectorcall	2015-01-12 23:28:23 +00:00
CMakeLists.txt	Add a Windows EH preparation pass that zaps resumes	2015-01-29 00:41:44 +00:00
CodeGen.cpp	[PM] Change the core design of the TTI analysis to use a polymorphic	2015-01-31 03:43:40 +00:00
CodeGenPrepare.cpp	[PM] Change the core design of the TTI analysis to use a polymorphic	2015-01-31 03:43:40 +00:00
CriticalAntiDepBreaker.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
CriticalAntiDepBreaker.h	mop up: "Don’t duplicate function or class name at the beginning of the comment."	2014-09-21 14:48:16 +00:00
DeadMachineInstructionElim.cpp	Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics	2015-01-13 00:48:10 +00:00
DFAPacketizer.cpp	Remove the TargetMachine from DFAPacketizer since it was only	2014-10-14 01:03:16 +00:00
DwarfEHPrepare.cpp	EHPrepare: Remove leftover initialization code for DomTrees.	2015-01-29 13:26:50 +00:00
EarlyIfConversion.cpp	The subtarget is cached on the MachineFunction. Access it directly.	2015-01-27 07:31:29 +00:00
EdgeBundles.cpp
ErlangGC.cpp	Revert GCStrategy ownership changes	2015-01-26 18:26:35 +00:00
ExecutionDepsFix.cpp	ExecutionDepsFix: Correctly handle wide registers.	2014-12-17 19:13:47 +00:00
ExpandISelPseudos.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
ExpandPostRAPseudos.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
ForwardControlFlowIntegrity.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
GCMetadata.cpp	Revert GCStrategy ownership changes	2015-01-26 18:26:35 +00:00
GCMetadataPrinter.cpp	clang-format all the GC related files (NFC)	2015-01-16 23:16:12 +00:00
GCRootLowering.cpp	Remove gc.root's performCustomLowering	2015-01-28 19:28:03 +00:00
GCStrategy.cpp	Revert GCStrategy ownership changes	2015-01-26 18:26:35 +00:00
GlobalMerge.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
IfConversion.cpp	The subtarget is cached on the MachineFunction. Access it directly.	2015-01-27 07:31:29 +00:00
InlineSpiller.cpp	LiveIntervalAnalysis: Factor out code to update liveness on physreg def removal	2015-01-21 18:50:21 +00:00
InterferenceCache.cpp
InterferenceCache.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
IntrinsicLowering.cpp	[PATCH][Interpreter] Add missing FP intrinsic lowering.	2014-08-30 18:33:35 +00:00
JumpInstrTables.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
LatencyPriorityQueue.cpp
LexicalScopes.cpp	DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself.	2014-10-14 18:22:52 +00:00
LiveDebugVariables.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
LiveDebugVariables.h	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
LiveInterval.cpp	LiveInterval: Implement feedback by Quentin Colombet.	2015-01-07 23:35:11 +00:00
LiveIntervalAnalysis.cpp	LiveIntervalAnalysis: Mark subregister defs as undef when we determined they are only reading a dead superregister value	2015-01-21 22:55:13 +00:00
LiveIntervalUnion.cpp	LiveIntervalUnion: Allow specification of liverange when unifying/extracting.	2014-12-10 01:12:59 +00:00
LivePhysRegs.cpp
LiveRangeCalc.cpp	LiveInterval: Introduce createMainRangeFromSubranges().	2014-12-24 02:11:51 +00:00
LiveRangeCalc.h	LiveRangeCalc: Rewrite subrange calculation	2014-12-16 04:03:38 +00:00
LiveRangeEdit.cpp	MachineRegisterInfo can access TII off of the MachineFunction's	2015-01-27 01:15:16 +00:00
LiveRegMatrix.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
LiveStackAnalysis.cpp	Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries.	2014-11-17 05:50:14 +00:00
LiveVariables.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
LLVMBuild.txt
LLVMTargetMachine.cpp	std::unique_ptrify the MCStreamer argument to createAsmPrinter	2015-01-18 20:29:04 +00:00
LocalStackSlotAllocation.cpp	[Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend	2014-12-01 22:52:56 +00:00
MachineBasicBlock.cpp	The leak detector is dead, long live asan and valgrind.	2014-12-22 13:00:36 +00:00
MachineBlockFrequencyInfo.cpp	Revert "Introduce a string_ostream string builder facilty"	2014-06-26 22:52:05 +00:00
MachineBlockPlacement.cpp	[MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.	2015-01-14 20:19:29 +00:00
MachineBranchProbabilityInfo.cpp
MachineCombiner.cpp	remove function names from comments; NFC	2015-01-27 22:26:56 +00:00
MachineCopyPropagation.cpp	Have MachineFunction cache a pointer to the subtarget to make lookups	2014-08-05 02:39:49 +00:00
MachineCSE.cpp	[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction.	2014-12-02 18:09:51 +00:00
MachineDominanceFrontier.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
MachineDominators.cpp	[MachineDominatorTree] Provide a method to inform a MachineDominatorTree that a	2014-08-13 21:00:07 +00:00
MachineFunction.cpp	Remove MergeableConst.	2015-01-29 14:12:41 +00:00
MachineFunctionAnalysis.cpp	Remove unused member variable.	2014-10-14 18:53:16 +00:00
MachineFunctionPass.cpp	[LPM] Stop using the string based preservation API. It is an	2015-01-28 04:57:56 +00:00
MachineFunctionPrinterPass.cpp	Rename argument strings of codegen passes to avoid collisions with command line	2014-12-13 04:52:04 +00:00
MachineInstr.cpp	LiveIntervalAnalysis: Mark subregister defs as undef when we determined they are only reading a dead superregister value	2015-01-21 22:55:13 +00:00
MachineInstrBundle.cpp	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>	2014-11-19 07:49:26 +00:00
MachineLICM.cpp	[MachineLICM] A command-line option to hoist even cheap instructions	2015-01-08 22:10:48 +00:00
MachineLoopInfo.cpp
MachineModuleInfo.cpp	Classify functions by EH personality type rather than using the triple	2015-01-23 18:49:01 +00:00
MachineModuleInfoImpls.cpp
MachinePassRegistry.cpp
MachinePostDominators.cpp
MachineRegionInfo.cpp	[cleanup] Re-sort all the #include lines in LLVM using	2015-01-14 11:23:27 +00:00
MachineRegisterInfo.cpp	MachineRegisterInfo can access TII off of the MachineFunction's	2015-01-27 01:15:16 +00:00
MachineScheduler.cpp	The subtarget is cached on the MachineFunction. Access it directly.	2015-01-27 07:31:29 +00:00
MachineSink.cpp	Use DomTree in MachineSink to sink over diamonds.	2014-12-04 10:36:42 +00:00
MachineSSAUpdater.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
MachineTraceMetrics.cpp	The subtarget is cached on the MachineFunction. Access it directly.	2015-01-27 07:31:29 +00:00
MachineVerifier.cpp	MachineVerifier: Allow undef reads if a matching superreg is defined.	2015-01-14 22:25:14 +00:00
Makefile
module.modulemap
OcamlGC.cpp	Revert GCStrategy ownership changes	2015-01-26 18:26:35 +00:00
OptimizePHIs.cpp	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>	2014-11-19 07:49:26 +00:00
Passes.cpp	Add a Windows EH preparation pass that zaps resumes	2015-01-29 00:41:44 +00:00
PeepholeOptimizer.cpp	Peephole opt needs optimizeSelect() to keep track of newly created MIs	2015-01-13 07:07:13 +00:00
PHIElimination.cpp	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>	2014-11-19 07:49:26 +00:00
PHIEliminationUtils.cpp
PHIEliminationUtils.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
PostRASchedulerList.cpp	The subtarget is cached on the MachineFunction. Access it directly.	2015-01-27 07:31:29 +00:00
ProcessImplicitDefs.cpp	Have MachineFunction cache a pointer to the subtarget to make lookups	2014-08-05 02:39:49 +00:00
PrologEpilogInserter.cpp	Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics	2015-01-13 00:48:10 +00:00
PrologEpilogInserter.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
PseudoSourceValue.cpp	Make isAliased property for fixed-offset stack objects adjustable	2014-08-16 00:17:02 +00:00
README.txt
RegAllocBase.cpp	[RegAllocGreedy] Introduce a late pass to repair broken hints.	2015-01-08 01:16:39 +00:00
RegAllocBase.h	[RegAllocGreedy] Introduce a late pass to repair broken hints.	2015-01-08 01:16:39 +00:00
RegAllocBasic.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
RegAllocFast.cpp	[RegAllocFast] Handle implicit definitions conservatively.	2014-12-03 23:38:08 +00:00
RegAllocGreedy.cpp	[RegAllocGreedy] Introduce a late pass to repair broken hints.	2015-01-08 01:16:39 +00:00
RegAllocPBQP.cpp	Have the PBQP register allocator use the subtarget on the MachineFunction.	2015-01-27 08:27:06 +00:00
RegisterClassInfo.cpp	Silence more static analyzer warnings.	2014-12-15 18:48:43 +00:00
RegisterCoalescer.cpp	Update a few calls to getSubtarget<> to either be getSubtargetImpl	2015-01-27 07:54:39 +00:00
RegisterCoalescer.h	mop up: "Don’t duplicate function or class name at the beginning of the comment."	2014-09-20 22:39:16 +00:00
RegisterPressure.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
RegisterScavenging.cpp	Grab the subtarget and subtarget dependent variables off of	2014-10-14 07:22:00 +00:00
ScheduleDAG.cpp	Replace some uses of getSubtargetImpl with the cached version	2015-01-27 08:48:42 +00:00
ScheduleDAGInstrs.cpp	Update a few calls to getSubtarget<> to either be getSubtargetImpl	2015-01-27 07:54:39 +00:00
ScheduleDAGPrinter.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
ScoreboardHazardRecognizer.cpp	Change MCSchedModel to be a struct of statically initialized data.	2014-09-02 17:43:54 +00:00
ShadowStackGC.cpp	Remove gc.root's performCustomLowering	2015-01-28 19:28:03 +00:00
ShadowStackGCLowering.cpp	Remove gc.root's performCustomLowering	2015-01-28 19:28:03 +00:00
SjLjEHPrepare.cpp	Replace some uses of getSubtargetImpl with the cached version	2015-01-27 08:48:42 +00:00
SlotIndexes.cpp
Spiller.h	[RegAlloc] Kill off the trivial spiller - nobody is using it any more.	2014-11-06 19:12:38 +00:00
SpillPlacement.cpp	Fix the threshold added in r186434 (a re-apply of r185393) and updaated	2014-10-02 22:23:14 +00:00
SpillPlacement.h	Fix the threshold added in r186434 (a re-apply of r185393) and updaated	2014-10-02 22:23:14 +00:00
SplitKit.cpp	LiveIntervalAnalysis: Factor out code to update liveness on vreg def removal	2015-01-21 19:02:30 +00:00
SplitKit.h	Canonicalize header guards into a common format.	2014-08-13 16:26:38 +00:00
StackColoring.cpp	IR: Split Metadata from Value	2014-12-09 18:38:53 +00:00
StackMapLivenessAnalysis.cpp	[StackMaps] Allow the target to pre-process the live-out mask	2015-01-13 17:47:59 +00:00
StackMaps.cpp	Move DataLayout back to the TargetMachine from TargetSubtargetInfo	2015-01-26 19:03:15 +00:00
StackProtector.cpp	Replace some uses of getSubtargetImpl with the cached version	2015-01-27 08:48:42 +00:00
StackSlotColoring.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
StatepointExampleGC.cpp	Revert GCStrategy ownership changes	2015-01-26 18:26:35 +00:00
TailDuplication.cpp	Have MachineFunction cache a pointer to the subtarget to make lookups	2014-08-05 02:39:49 +00:00
TargetFrameLoweringImpl.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
TargetInstrInfo.cpp	Move DataLayout back to the TargetMachine from TargetSubtargetInfo	2015-01-26 19:03:15 +00:00
TargetLoweringBase.cpp	Move DataLayout back to the TargetMachine from TargetSubtargetInfo	2015-01-26 19:03:15 +00:00
TargetLoweringObjectFileImpl.cpp	Compute the ELF SectionKind from the flags.	2015-01-29 17:33:21 +00:00
TargetOptionsImpl.cpp	Migrate ABIName to MCTargetOptions so that it can be shared between	2015-01-14 00:50:31 +00:00
TargetRegisterInfo.cpp	Introduce register dump helper	2014-11-19 19:46:11 +00:00
TargetSchedule.cpp	Remove unnecessary TargetMachine.h includes.	2014-10-14 07:22:08 +00:00
TwoAddressInstructionPass.cpp	Replace some uses of getSubtargetImpl with the cached version	2015-01-27 08:48:42 +00:00
UnreachableBlockElim.cpp	Replace size method call of containers to empty method where appropriate	2015-01-15 11:41:30 +00:00
VirtRegMap.cpp	LiveInterval: Use range based for loops for subregister ranges.	2014-12-11 00:59:06 +00:00
WinEHPrepare.cpp	Fix memory leak in WinEHPrepare introduced in r227405.	2015-01-30 22:07:05 +00:00

README.txt

//===---------------------------------------------------------------------===//

Common register allocation / spilling problem:

        mul lr, r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        ldr r4, [sp, #+52]
        mla r4, r3, lr, r4

can be:

        mul lr, r4, lr
        mov r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

and then "merge" mul and mov:

        mul r4, r4, lr
        str r4, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

It also increases the likelihood that the store will become dead.
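The first rewrite above can be sketched on a toy instruction list. It turns the reload into a copy that is issued while the stored value is still in a register. This is a minimal illustration, not LLVM's `MachineInstr` API: `Inst`, `touches` and `replaceReloadWithCopy` are hypothetical names. The copy is placed right after the spill rather than just before it, which is equivalent here. The rewrite is only legal when the reload's destination is neither read nor written between the spill and the reload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction model (not LLVM's MachineInstr API).
struct Inst {
  std::string text;               // printable form, for illustration only
  std::vector<std::string> defs;  // registers written
  std::vector<std::string> uses;  // registers read
  int storeSlot = -1;             // >= 0: spills uses[0] to this stack slot
  int reloadSlot = -1;            // >= 0: reloads defs[0] from this stack slot
};

// True if `in` reads or writes `reg`.
static bool touches(const Inst &in, const std::string &reg) {
  for (const auto &d : in.defs)
    if (d == reg) return true;
  for (const auto &u : in.uses)
    if (u == reg) return true;
  return false;
}

// Turn "spill R -> S ... reload S -> D" into "spill R -> S; mov D, R ..." and
// delete the reload. Safe only if D is neither read nor written in between,
// so the early copy clobbers nothing live and survives until its use.
// Rewrites at most one reload per call; returns true if `code` changed.
bool replaceReloadWithCopy(std::vector<Inst> &code) {
  for (std::size_t j = 0; j < code.size(); ++j) {
    if (code[j].reloadSlot < 0)
      continue;
    const int slot = code[j].reloadSlot;
    const std::string dst = code[j].defs[0];
    // Walk backwards to the nearest spill into the same slot.
    std::size_t k = j;
    while (k > 0 && code[k - 1].storeSlot != slot)
      --k;
    if (k == 0)
      continue;  // no earlier spill to this slot in the block
    const std::size_t spill = k - 1;
    bool dstFree = true;
    for (std::size_t m = spill + 1; m < j; ++m)
      if (touches(code[m], dst))
        dstFree = false;
    if (!dstFree)
      continue;
    const std::string src = code[spill].uses[0];
    code.erase(code.begin() + j);
    code.insert(code.begin() + spill + 1,
                Inst{"mov " + dst + ", " + src, {dst}, {src}});
    return true;
  }
  return false;
}
```

On the sequence above, `mov r4, lr` lands right after the `str` and the `ldr r4` disappears. Merging the copy into the `mul` (the second rewrite) would be a separate peephole.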

//===---------------------------------------------------------------------===//

bb27 ...
        ...
        %reg1037 = ADDri %reg1039, 1
        %reg1038 = ADDrs %reg1032, %reg1039, %NOREG, 10
    Successors according to CFG: 0x8b03bf0 (#5)

bb76 (0x8b03bf0, LLVM BB @0x8b032d0, ID#5):
    Predecessors according to CFG: 0x8b0c5f0 (#3) 0x8b0a7c0 (#4)
        %reg1039 = PHI %reg1070, mbb<bb76.outer,0x8b0c5f0>, %reg1037, mbb<bb27,0x8b0a7c0>

Note ADDri is not a two-address instruction. However, its result %reg1037 is an
operand of the PHI node in bb76 and its operand %reg1039 is the result of the
PHI node. We should treat it as two-address code and make sure the ADDri is
scheduled after any node that reads %reg1039.
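One way to express this constraint is to add an ordering edge from every other reader of the PHI value to the ADDri before scheduling, as a tied two-address operand would. The sketch below is a hypothetical toy: `Node` and `scheduleAsTwoAddress` are not LLVM's `ScheduleDAG` API. It uses Kahn's topological sort and breaks ties by original order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical toy scheduling node (not LLVM's SUnit API).
struct Node {
  std::string name;
  std::string def;                // virtual register defined ("" if none)
  std::vector<std::string> uses;  // virtual registers read
};

// Topologically order `nodes` honoring def -> use edges plus extra edges that
// force `tied` to run after every other reader of `phiReg`, as if `tied` were
// a two-address instruction. Ties are broken by original order. Returns node
// indices in schedule order, or an empty vector if the edges form a cycle.
std::vector<std::size_t> scheduleAsTwoAddress(const std::vector<Node> &nodes,
                                              std::size_t tied,
                                              const std::string &phiReg) {
  const std::size_t n = nodes.size();
  std::vector<std::vector<std::size_t>> succ(n);
  std::vector<std::size_t> indeg(n, 0);
  auto addEdge = [&](std::size_t from, std::size_t to) {
    succ[from].push_back(to);
    ++indeg[to];
  };
  for (std::size_t d = 0; d < n; ++d) {
    if (nodes[d].def.empty())
      continue;
    for (std::size_t u = 0; u < n; ++u)
      for (const auto &r : nodes[u].uses)
        if (u != d && r == nodes[d].def)
          addEdge(d, u);  // data dependence
  }
  for (std::size_t u = 0; u < n; ++u)
    for (const auto &r : nodes[u].uses)
      if (u != tied && r == phiReg)
        addEdge(u, tied);  // readers of the PHI value must precede `tied`

  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      std::greater<std::size_t>> ready;
  for (std::size_t i = 0; i < n; ++i)
    if (indeg[i] == 0)
      ready.push(i);
  std::vector<std::size_t> order;
  while (!ready.empty()) {
    const std::size_t i = ready.top();
    ready.pop();
    order.push_back(i);
    for (std::size_t s : succ[i])
      if (--indeg[s] == 0)
        ready.push(s);
  }
  if (order.size() != n)
    return {};  // the extra edges created a cycle
  return order;
}
```

For bb27 this schedules ADDrs before ADDri, so %reg1037 and %reg1039 can share a register. If the ADDri's own result fed another reader of %reg1039, the extra edge would create a cycle and the sketch returns an empty schedule.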

//===---------------------------------------------------------------------===//

Use local information (i.e. the register scavenger) to assign the value loaded
from [sp, #+4] a free register, so that later reloads such as the marked one can
reuse it:
        ldr r3, [sp, #+4]
        add r3, r3, #3
        ldr r2, [sp, #+8]
        add r2, r2, #2
        ldr r1, [sp, #+4]  <==
        add r1, r1, #1
        ldr r0, [sp, #+4]
        add r0, r0, #2
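Finding such a register is the question a register scavenger answers locally. Below is a minimal sketch under simplifying assumptions. `Inst` and `findFreeReg` are hypothetical names. Every register in the candidate pool is assumed to be dead on entry to and exit from the range, which real code would verify with liveness information.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: its text plus every register it reads or writes.
struct Inst {
  std::string text;
  std::set<std::string> regs;
};

// Return the first register in `pool` that no instruction in
// code[begin..end] (inclusive) reads or writes. Assumes each pool register is
// dead on entry to and exit from the range.
std::optional<std::string> findFreeReg(const std::vector<Inst> &code,
                                       std::size_t begin, std::size_t end,
                                       const std::vector<std::string> &pool) {
  for (const auto &reg : pool) {
    bool used = false;
    for (std::size_t i = begin; i <= end && !used; ++i)
      used = code[i].regs.count(reg) != 0;
    if (!used)
      return reg;
  }
  return std::nullopt;
}
```

Suppose a free register such as r12 exists (an assumption; it must really be dead here). The first load could then become `ldr r12, [sp, #+4]`, and the adds could read r12 directly (`add r3, r12, #3`, `add r1, r12, #1`, `add r0, r12, #2`). The two later reloads of `[sp, #+4]` would then disappear.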

//===---------------------------------------------------------------------===//

LLVM aggressively hoists common subexpressions out of loops. This can sometimes
have negative side effects:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
load [i + R1]
...
load [i + R2]
...
load [i + R3]

Suppose register pressure is high, so R1, R2, and R3 may be spilled. We need to
implement proper rematerialization to handle this:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
R1 = X + 4  @ re-materialized
load [i + R1]
...
R2 = X + 7 @ re-materialized
load [i + R2]
...
R3 = X + 15 @ re-materialized
load [i + R3]
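The rematerialization rewrite above can be sketched as a pass over the loop body. Instead of reloading a spilled value, it re-emits the value's cheap definition right before each use. `RematDef`, `Line` and `rematerializeUses` are hypothetical names. The sketch assumes X stays in a register throughout the loop, which is what makes `X + c` trivially rematerializable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A trivially rematerializable value: Reg = Base + Offset, where Base (X in
// the example) is assumed to stay available in a register inside the loop.
struct RematDef {
  std::string base;
  int offset;
};

// One loop-body line; `reg` names the spilled value it reads ("" if none).
struct Line {
  std::string text;
  std::string reg;
};

// Instead of reloading a spilled value from its stack slot, re-emit its
// defining instruction immediately before each use.
std::vector<std::string> rematerializeUses(
    const std::vector<Line> &body,
    const std::map<std::string, RematDef> &defs) {
  std::vector<std::string> out;
  for (const Line &line : body) {
    auto it = defs.find(line.reg);
    if (it != defs.end())
      out.push_back(line.reg + " = " + it->second.base + " + " +
                    std::to_string(it->second.offset) +
                    "  @ re-materialized");
    out.push_back(line.text);
  }
  return out;
}
```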

Furthermore, with re-association, we can enable sharing:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
T = i + X
load [T + 4]
...
load [T + 7]
...
load [T + 15]

//===---------------------------------------------------------------------===//

It's not always a good idea to choose rematerialization over spilling. If all
the load / store instructions would be folded then spilling is cheaper because
it won't require new live intervals / registers. See 2003-05-31-LongShifts for
an example.
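The trade-off can be written as a toy policy. This hypothetical sketch encodes only the rule stated above. `Candidate`, `Strategy` and `choose` are illustrative names, not LLVM's spiller interface. A real allocator would likely also weigh spill and rematerialization costs by how often each block executes.

```cpp
#include <cassert>
#include <vector>

enum class Strategy { Spill, Rematerialize };

// Hypothetical summary of one spill candidate.
struct Candidate {
  bool rematerializable;          // e.g. R1 = X + 4 with X still available
  bool storeFoldable;             // the spill store folds into the defining instr
  std::vector<bool> useFoldable;  // per use: the reload folds into the user
};

// Toy policy: when every spill store and reload folds into an existing
// instruction, spilling needs no new live interval or register, so prefer it.
// Otherwise rematerialize if possible, since that avoids memory traffic.
Strategy choose(const Candidate &c) {
  if (!c.rematerializable)
    return Strategy::Spill;
  if (!c.storeFoldable)
    return Strategy::Rematerialize;
  for (bool folds : c.useFoldable)
    if (!folds)
      return Strategy::Rematerialize;
  return Strategy::Spill;
}
```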

//===---------------------------------------------------------------------===//

With a copying garbage collector, derived pointers must not be retained across
collector safe points; the collector could move the objects and invalidate the
derived pointer. This is bad enough in the first place, but safe points can
crop up unpredictably. Consider:

        %array = load { i32, [0 x %obj*] }** %array_addr
        %nth_el = getelementptr { i32, [0 x %obj*] }* %array, i32 0, i32 1, i32 %n
        %old = load %obj** %nth_el
        %z = sdiv i64 %x, %y
        store %obj* %new, %obj** %nth_el

If the i64 division is lowered to a libcall, then a safe point will (must)
appear for the call site. If a collection occurs, %array and %nth_el no longer
point into the correct object.

The fix for this is to copy address calculations so that dependent pointers
are never live across safe point boundaries. But the loads cannot be copied
like this if there was an intervening store, so this may be hard to get right.

Only a concurrent mutator can trigger a collection at the libcall safe point.
So single-threaded programs do not have this requirement, even with a copying
collector. Still, LLVM optimizations would probably undo a front-end's careful
work.
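The fix can be illustrated with a toy moving collector written in C++. In this hypothetical model (`ToyHeap` and `replaceElement` are illustrative names), the collector only updates the root it knows about. The derived element address is therefore recomputed after the safe point instead of being kept live across it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy copying collector for a single array object. `root` models a GC root
// that the collector knows about and updates when it moves the object; raw
// element pointers derived from it are invisible to the collector.
struct ToyHeap {
  std::unique_ptr<std::vector<long>> root;

  // A safe point: copy the object to new storage, redirect the root, and
  // release the old storage. Any derived pointer into the old copy dangles.
  void collect() { root = std::make_unique<std::vector<long>>(*root); }
};

// Keep only the root live across the safe point and recompute the derived
// address (the getelementptr) after it.
long replaceElement(ToyHeap &heap, std::size_t n, long newValue) {
  const long old = (*heap.root)[n];  // %old = load %nth_el
  heap.collect();                    // safe point, e.g. a division libcall
  long *nth = &(*heap.root)[n];      // recomputed %nth_el
  *nth = newValue;                   // the store lands in the live copy
  return old;
}
```

The load of the old value happens before the safe point. As noted above, copying loads past an intervening store is where this approach gets hard.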

//===---------------------------------------------------------------------===//

The OCaml frametable structure supports liveness information. It would be good
to support it.

//===---------------------------------------------------------------------===//

The FIXME in ComputeCommonTailLength in BranchFolding.cpp needs to be
revisited. The check is there to work around a misuse of directives in inline
assembly.

//===---------------------------------------------------------------------===//

It would be good to detect collector/target compatibility instead of silently
doing the wrong thing.

//===---------------------------------------------------------------------===//

It would be really nice to be able to write patterns in .td files for copies,
which would eliminate a bunch of explicit predicates on them (e.g. no side
effects).  Once this is in place, it would be even better to have tblgen
synthesize the various copy insertion/inspection methods in TargetInstrInfo.

//===---------------------------------------------------------------------===//

Stack coloring improvements:

1. Do proper LiveStackAnalysis on all stack objects including those which are
   not spill slots.
2. Reorder objects to fill in gaps between objects.
   e.g. 4, 1, <gap>, 4, 1, 1, 1, <gap>, 4 => 4, 1, 1, 1, 1, 4, 4
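For item 2, one simple reordering (not necessarily what LLVM does) is to place objects in order of decreasing alignment. When every alignment is a power of two and each size is a multiple of its alignment, this needs no padding. The sketch assumes alignment equals size and that the frame starts suitably aligned. `frameSize` and `reorderByAlignment` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Lay out stack objects in the given order, aligning each object to its own
// size (a simplifying assumption; real objects carry a separate alignment).
// Returns the total frame size in bytes, including any padding gaps.
std::uint64_t frameSize(const std::vector<std::uint64_t> &sizes) {
  std::uint64_t offset = 0;
  for (std::uint64_t size : sizes) {
    assert(size > 0 && "zero-sized objects are not modeled");
    const std::uint64_t align = size;
    offset = (offset + align - 1) / align * align;  // round up to alignment
    offset += size;
  }
  return offset;
}

// Place objects in order of decreasing alignment (here, decreasing size).
std::vector<std::uint64_t> reorderByAlignment(std::vector<std::uint64_t> sizes) {
  std::stable_sort(sizes.begin(), sizes.end(),
                   [](std::uint64_t a, std::uint64_t b) { return a > b; });
  return sizes;
}
```

For the example sizes, the original order needs 20 bytes, with gaps of 3 and 1 bytes. Both the reordering given in the list above and the sorted order need 16.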

//===---------------------------------------------------------------------===//

The scheduler should be able to sort nearby instructions by their address. For
example, in an expanded memset sequence it's not uncommon to see code like this:

  movl $0, 4(%rdi)
  movl $0, 8(%rdi)
  movl $0, 12(%rdi)
  movl $0, 0(%rdi)

Each of the stores is independent, and the scheduler is currently making an
arbitrary decision about the order.
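Here is a minimal sketch of that tie-break. It assumes the stores are already known to be independent of everything else in the region. It reorders only stores off a single base register whose byte ranges do not overlap, and otherwise leaves the order alone. `Store` and `sortStoresByAddress` are hypothetical names, not part of LLVM's scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// A store of `size` bytes to base + offset.
struct Store {
  std::string base;
  std::int64_t offset;
  std::int64_t size;
};

// Sort a run of stores by ascending address. Returns false and leaves
// `stores` untouched if independence cannot be shown in this toy model.
bool sortStoresByAddress(std::vector<Store> &stores) {
  if (stores.empty())
    return true;
  for (const Store &s : stores)
    if (s.base != stores.front().base)
      return false;  // different base registers might alias
  std::vector<Store> sorted = stores;
  std::stable_sort(sorted.begin(), sorted.end(),
                   [](const Store &a, const Store &b) {
                     return a.offset < b.offset;
                   });
  for (std::size_t i = 1; i < sorted.size(); ++i)
    if (sorted[i - 1].offset + sorted[i - 1].size > sorted[i].offset)
      return false;  // overlapping stores must keep their original order
  stores = std::move(sorted);
  return true;
}
```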

//===---------------------------------------------------------------------===//

Another opportunity in this code is that the $0 could be moved to a register:

  movl $0, 4(%rdi)
  movl $0, 8(%rdi)
  movl $0, 12(%rdi)
  movl $0, 0(%rdi)

This would save substantial code size, especially for longer sequences like
this. It would be easy to have a rule telling isel to avoid matching MOV32mi
if the immediate has more than some fixed number of uses. It's more involved
to teach the register allocator how to do late folding to recover from
excessive register pressure.
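The size argument can be made concrete. `movl $0, 4(%rdi)` encodes in 7 bytes (opcode, ModRM, 8-bit displacement, 32-bit immediate). `movl %eax, 4(%rdi)` takes 3 bytes, and `xorl %eax, %eax` takes 2. A sketch of the use-count rule follows. `immediatesToMaterialize` and the `maxUses` threshold are hypothetical. The rule deliberately ignores register pressure, which is the harder part described above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Given the immediates stored by a run of MOV32mi-style stores, return the
// immediates used by more than `maxUses` stores; a hypothetical isel rule
// would materialize those in a register instead of encoding them each time.
std::set<std::int64_t> immediatesToMaterialize(
    const std::vector<std::int64_t> &storeImms, unsigned maxUses) {
  std::map<std::int64_t, unsigned> uses;
  for (std::int64_t imm : storeImms)
    ++uses[imm];
  std::set<std::int64_t> result;
  for (const auto &[imm, count] : uses)
    if (count > maxUses)
      result.insert(imm);
  return result;
}
```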