llvm-6502/lib/CodeGen
Jakob Stoklund Olesen 8c54a62061 Rebuild RegScavenger::DistanceMap each time it is needed.
The register scavenger maintains a DistanceMap that maps MI pointers to their
distance from the top of the current MBB. The DistanceMap is built
incrementally in forward() and in bulk in findFirstUse(). It is used by
scavengeRegister() to determine which candidate register has the longest
unused interval.

Unfortunately the DistanceMap contents can become outdated. The first time
scavengeRegister() is called, the DistanceMap is filled to cover the MBB. If
then instructions are inserted in the MBB (as they always are following
scavengeRegister()), the recorded distances are too short. This causes bad
behaviour in the included test case where a register use /after/ the current
position is ignored because findFirstUse() thinks is is /before/ the current
position. A "using an undefined register" assertion follows promptly.

The fix is to build a fresh DistanceMap at the top of scavengeRegister(), and
discard it after use. This means that DistanceMap is no longer needed as a
RegScavenger member variable, and forward() doesn't need to update it.

The fix then discloses issue number two in the same test case: The candidate
search in scavengeRegister() finds a CSR that has been saved in the prologue,
but is currently unused. It would be both inefficient and wrong to spill such
a register in the emergency spill slot. In the present case, the emergency
slot restore is placed immediately before the normal epilogue restore, leading
to a "Redefining a live register" assertion.

Fix number two: When scavengerRegister() stumbles upon an unused register that
is overwritten later in the MBB, return that register early. It is important
to verify that the register is defined later in the MBB, otherwise it might be
an unspilled CSR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@78650 91177308-0d34-0410-b5e6-96231b3b80d8
2009-08-11 06:25:12 +00:00
..
AsmPrinter SjLj based exception handling unwinding support. This patch is nasty, brutish 2009-08-11 00:09:57 +00:00
PBQP Fix some -Asserts unused variable warnings. 2009-08-08 00:40:46 +00:00
SelectionDAG SjLj based exception handling unwinding support. This patch is nasty, brutish 2009-08-11 00:09:57 +00:00
BranchFolding.cpp Rewrite previous patch to follow Chris' stylistic 2009-08-07 17:41:29 +00:00
CMakeLists.txt Post RA scheduler changes. Introduce a hazard recognizer that uses the target schedule information to accurately model the pipeline. Update the scheduler to correctly handle multi-issue targets. 2009-08-10 15:55:25 +00:00
CodePlacementOpt.cpp
DeadMachineInstructionElim.cpp
Dump.cpp
DwarfEHPrepare.cpp
ELF.h ELF improvements: 2009-08-08 17:29:04 +00:00
ELFCodeEmitter.cpp
ELFCodeEmitter.h
ELFWriter.cpp Move ConstantExpr handling to ResolveConstantExpr method and also 2009-08-10 03:32:40 +00:00
ELFWriter.h Move ConstantExpr handling to ResolveConstantExpr method and also 2009-08-10 03:32:40 +00:00
ExactHazardRecognizer.cpp Fix a -Asserts warning. 2009-08-11 06:22:47 +00:00
ExactHazardRecognizer.h Post RA scheduler changes. Introduce a hazard recognizer that uses the target schedule information to accurately model the pipeline. Update the scheduler to correctly handle multi-issue targets. 2009-08-10 15:55:25 +00:00
GCMetadata.cpp
GCMetadataPrinter.cpp
GCStrategy.cpp
IfConversion.cpp
IntrinsicLowering.cpp
LatencyPriorityQueue.cpp
LazyLiveness.cpp
LiveInterval.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
LiveIntervalAnalysis.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
LiveStackAnalysis.cpp
LiveVariables.cpp
LLVMTargetMachine.cpp SjLj based exception handling unwinding support. This patch is nasty, brutish 2009-08-11 00:09:57 +00:00
LowerSubregs.cpp Remove RegisterScavenger::isSuperRegUsed(). This completely reverses the mistaken commit r77904. 2009-08-08 13:19:10 +00:00
MachineBasicBlock.cpp
MachineDominators.cpp
MachineFunction.cpp SjLj based exception handling unwinding support. This patch is nasty, brutish 2009-08-11 00:09:57 +00:00
MachineFunctionAnalysis.cpp
MachineFunctionPass.cpp
MachineInstr.cpp
MachineLICM.cpp
MachineLoopInfo.cpp
MachineModuleInfo.cpp
MachinePassRegistry.cpp
MachineRegisterInfo.cpp
MachineSink.cpp
MachineVerifier.cpp Clean out per-function data after the machine code verifier is done with it. 2009-08-08 15:34:50 +00:00
MachO.h
MachOCodeEmitter.cpp
MachOCodeEmitter.h
MachOWriter.cpp
MachOWriter.h
Makefile
ObjectCodeEmitter.cpp Remove accidental commited comment 2009-08-05 07:00:43 +00:00
OcamlGC.cpp
Passes.cpp
PHIElimination.cpp
PHIElimination.h
PostRASchedulerList.cpp Replace DOUT. 2009-08-11 01:44:26 +00:00
PreAllocSplitting.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
PrologEpilogInserter.cpp
PrologEpilogInserter.h
PseudoSourceValue.cpp
README.txt
RegAllocLinearScan.cpp
RegAllocLocal.cpp
RegAllocPBQP.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
RegAllocSimple.cpp
RegisterCoalescer.cpp
RegisterScavenging.cpp Rebuild RegScavenger::DistanceMap each time it is needed. 2009-08-11 06:25:12 +00:00
ScheduleDAG.cpp
ScheduleDAGEmit.cpp
ScheduleDAGInstrs.cpp Post RA scheduler changes. Introduce a hazard recognizer that uses the target schedule information to accurately model the pipeline. Update the scheduler to correctly handle multi-issue targets. 2009-08-10 15:55:25 +00:00
ScheduleDAGInstrs.h
ScheduleDAGPrinter.cpp
ShadowStackGC.cpp Privatize the StructType table, which unfortunately involves routing contexts through a number of APIs. 2009-08-05 23:16:16 +00:00
ShrinkWrapping.cpp
SimpleHazardRecognizer.h Post RA scheduler changes. Introduce a hazard recognizer that uses the target schedule information to accurately model the pipeline. Update the scheduler to correctly handle multi-issue targets. 2009-08-10 15:55:25 +00:00
SimpleRegisterCoalescing.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
SimpleRegisterCoalescing.h
Spiller.cpp
Spiller.h
StackProtector.cpp
StackSlotColoring.cpp
StrongPHIElimination.cpp Modified VNInfo. The "copy" member is now a union which holds the copy for a register interval, or the defining register for a stack interval. Access is via getCopy/setCopy and getReg/setReg. 2009-08-10 23:43:28 +00:00
TargetInstrInfoImpl.cpp
TwoAddressInstructionPass.cpp Code clean up. 2009-08-07 00:28:58 +00:00
UnreachableBlockElim.cpp
VirtRegMap.cpp
VirtRegMap.h
VirtRegRewriter.cpp Fix a bunch of namespace pollution. 2009-08-07 01:32:21 +00:00
VirtRegRewriter.h

//===---------------------------------------------------------------------===//

Common register allocation / spilling problem:

        mul lr, r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        ldr r4, [sp, #+52]
        mla r4, r3, lr, r4

can be:

        mul lr, r4, lr
        mov r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

and then "merge" mul and mov:

        mul r4, r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

It also increase the likelyhood the store may become dead.

//===---------------------------------------------------------------------===//

I think we should have a "hasSideEffects" flag (which is automatically set for
stuff that "isLoad" "isCall" etc), and the remat pass should eventually be able
to remat any instruction that has no side effects, if it can handle it and if
profitable.

For now, I'd suggest having the remat stuff work like this:

1. I need to spill/reload this thing.
2. Check to see if it has side effects.
3. Check to see if it is simple enough: e.g. it only has one register
destination and no register input.
4. If so, clone the instruction, do the xform, etc.

Advantages of this are:

1. the .td file describes the behavior of the instructions, not the way the
   algorithm should work.
2. as remat gets smarter in the future, we shouldn't have to be changing the .td
   files.
3. it is easier to explain what the flag means in the .td file, because you
   don't have to pull in the explanation of how the current remat algo works.

Some potential added complexities:

1. Some instructions have to be glued to it's predecessor or successor. All of
   the PC relative instructions and condition code setting instruction. We could
   mark them as hasSideEffects, but that's not quite right. PC relative loads
   from constantpools can be remat'ed, for example. But it requires more than
   just cloning the instruction. Some instructions can be remat'ed but it
   expands to more than one instruction. But allocator will have to make a
   decision.

4. As stated in 3, not as simple as cloning in some cases. The target will have
   to decide how to remat it. For example, an ARM 2-piece constant generation
   instruction is remat'ed as a load from constantpool.

//===---------------------------------------------------------------------===//

bb27 ...
        ...
        %reg1037 = ADDri %reg1039, 1
        %reg1038 = ADDrs %reg1032, %reg1039, %NOREG, 10
    Successors according to CFG: 0x8b03bf0 (#5)

bb76 (0x8b03bf0, LLVM BB @0x8b032d0, ID#5):
    Predecessors according to CFG: 0x8b0c5f0 (#3) 0x8b0a7c0 (#4)
        %reg1039 = PHI %reg1070, mbb<bb76.outer,0x8b0c5f0>, %reg1037, mbb<bb27,0x8b0a7c0>

Note ADDri is not a two-address instruction. However, its result %reg1037 is an
operand of the PHI node in bb76 and its operand %reg1039 is the result of the
PHI node. We should treat it as a two-address code and make sure the ADDri is
scheduled after any node that reads %reg1039.

//===---------------------------------------------------------------------===//

Use local info (i.e. register scavenger) to assign it a free register to allow
reuse:
        ldr r3, [sp, #+4]
        add r3, r3, #3
        ldr r2, [sp, #+8]
        add r2, r2, #2
        ldr r1, [sp, #+4]  <==
        add r1, r1, #1
        ldr r0, [sp, #+4]
        add r0, r0, #2

//===---------------------------------------------------------------------===//

LLVM aggressively lift CSE out of loop. Sometimes this can be negative side-
effects:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
load [i + R1]
...
load [i + R2]
...
load [i + R3]

Suppose there is high register pressure, R1, R2, R3, can be spilled. We need
to implement proper re-materialization to handle this:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
R1 = X + 4  @ re-materialized
load [i + R1]
...
R2 = X + 7 @ re-materialized
load [i + R2]
...
R3 = X + 15 @ re-materialized
load [i + R3]

Furthermore, with re-association, we can enable sharing:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
T = i + X
load [T + 4]
...
load [T + 7]
...
load [T + 15]
//===---------------------------------------------------------------------===//

It's not always a good idea to choose rematerialization over spilling. If all
the load / store instructions would be folded then spilling is cheaper because
it won't require new live intervals / registers. See 2003-05-31-LongShifts for
an example.

//===---------------------------------------------------------------------===//

With a copying garbage collector, derived pointers must not be retained across
collector safe points; the collector could move the objects and invalidate the
derived pointer. This is bad enough in the first place, but safe points can
crop up unpredictably. Consider:

        %array = load { i32, [0 x %obj] }** %array_addr
        %nth_el = getelementptr { i32, [0 x %obj] }* %array, i32 0, i32 %n
        %old = load %obj** %nth_el
        %z = div i64 %x, %y
        store %obj* %new, %obj** %nth_el

If the i64 division is lowered to a libcall, then a safe point will (must)
appear for the call site. If a collection occurs, %array and %nth_el no longer
point into the correct object.

The fix for this is to copy address calculations so that dependent pointers
are never live across safe point boundaries. But the loads cannot be copied
like this if there was an intervening store, so may be hard to get right.

Only a concurrent mutator can trigger a collection at the libcall safe point.
So single-threaded programs do not have this requirement, even with a copying
collector. Still, LLVM optimizations would probably undo a front-end's careful
work.

//===---------------------------------------------------------------------===//

The ocaml frametable structure supports liveness information. It would be good
to support it.

//===---------------------------------------------------------------------===//

The FIXME in ComputeCommonTailLength in BranchFolding.cpp needs to be
revisited. The check is there to work around a misuse of directives in inline
assembly.

//===---------------------------------------------------------------------===//

It would be good to detect collector/target compatibility instead of silently
doing the wrong thing.

//===---------------------------------------------------------------------===//

It would be really nice to be able to write patterns in .td files for copies,
which would eliminate a bunch of explicit predicates on them (e.g. no side 
effects).  Once this is in place, it would be even better to have tblgen 
synthesize the various copy insertion/inspection methods in TargetInstrInfo.

//===---------------------------------------------------------------------===//

Stack coloring improvments:

1. Do proper LiveStackAnalysis on all stack objects including those which are
   not spill slots.
2. Reorder objects to fill in gaps between objects.
   e.g. 4, 1, <gap>, 4, 1, 1, 1, <gap>, 4 => 4, 1, 1, 1, 1, 4, 4