mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-01-15 07:34:33 +00:00
[PowerPC] Update comment re: VSX copy-instruction selection
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.

My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread. So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.

Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:

  unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator I,
                         unsigned DestReg, unsigned SrcReg,
                         unsigned StartDist = 1,
                         unsigned Depth = 3) const;

with an implementation like:

  if (!Depth)
    return PPC::XXLOR;

  const unsigned MaxDist = 30;
  unsigned Dist = StartDist;
  for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
    if (J->isTransient() && !J->isCopy())
      continue;

    if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
      return PPC::XXLOR;

    ++Dist;
  }

  // We've exceeded the required distance for the high-latency form, use it.
  if (Dist > MaxDist)
    return PPC::XVCPSGNDP;

  // If this is only an exit block, use the low-latency form.
  if (MBB.succ_empty())
    return PPC::XXLOR;

  // We've reached the end of the block, check the successor blocks (up to some
  // depth), and use the high-latency form if that is okay with all successors.
  for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
    if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                      Dist, --Depth) == PPC::XXLOR)
      return PPC::XXLOR;
  }

  // All of our successor blocks seem okay with the high-latency variant, so
  // we'll use it.
  return PPC::XVCPSGNDP;

and then changed the copy opcode selection from:

  Opc = PPC::XXLOR;

to:

  Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);

In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204591 91177308-0d34-0410-b5e6-96231b3b80d8
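For readers who want to try the distance heuristic without an LLVM build, here is a minimal stand-alone sketch of the same idea. It is not the code from the experiment: `ToyInstr`, `ToyBlock`, `CopyOpc` and `chooseCopy` are hypothetical stand-ins for the real `MachineInstr`/`MachineBasicBlock` API, and `IsTransient` here means "transient and not a copy". One deliberate difference, which is an assumption about intent: the sketch passes `Depth - 1` to each successor rather than `--Depth`, so every successor is searched with the same remaining depth.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for the two VSX copy opcodes discussed in the commit message.
enum class CopyOpc { XXLOR, XVCPSGNDP };

// Hypothetical, simplified instruction: only the properties the heuristic reads.
struct ToyInstr {
  bool ReadsDest = false;   // reads the copy's destination register
  bool IsBarrier = false;   // call or return
  bool IsTransient = false; // transient non-copy; not counted toward distance
};

// Hypothetical basic block: a list of instructions plus successor blocks.
struct ToyBlock {
  std::vector<ToyInstr> Instrs;
  std::vector<const ToyBlock *> Succs;
};

// P7: 120 GCT entries shared by up to 4 threads gives 30 per thread.
constexpr unsigned MaxDist = 120 / 4;
static_assert(MaxDist == 30, "per-thread GCT share on the P7");

// Returns the high-latency opcode only if no use, call, or return can be
// found within MaxDist counted instructions on every path that is explored.
CopyOpc chooseCopy(const ToyBlock &BB, std::size_t Start,
                   unsigned StartDist = 1, unsigned Depth = 3) {
  if (Depth == 0)
    return CopyOpc::XXLOR;

  unsigned Dist = StartDist;
  for (std::size_t I = Start; I < BB.Instrs.size() && Dist <= MaxDist; ++I) {
    const ToyInstr &MI = BB.Instrs[I];
    if (MI.IsTransient)
      continue;
    if (MI.IsBarrier || MI.ReadsDest)
      return CopyOpc::XXLOR;
    ++Dist;
  }

  // Far enough from any use: the high-latency form is acceptable.
  if (Dist > MaxDist)
    return CopyOpc::XVCPSGNDP;

  // Exit block reached before the required distance: stay low-latency.
  if (BB.Succs.empty())
    return CopyOpc::XXLOR;

  // Every successor must agree before the high-latency form is chosen.
  for (const ToyBlock *S : BB.Succs)
    if (chooseCopy(*S, 0, Dist, Depth - 1) == CopyOpc::XXLOR)
      return CopyOpc::XXLOR;

  return CopyOpc::XVCPSGNDP;
}
```

In this toy model, a block of 40 ordinary instructions with no use of the destination selects `XVCPSGNDP`, while a use at any point within the first 30 counted instructions selects `XXLOR`.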
parent 2236f9348d
commit 72448143b5
@@ -710,12 +710,14 @@ void PPCInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
   else if (PPC::VRRCRegClass.contains(DestReg, SrcReg))
     Opc = PPC::VOR;
   else if (PPC::VSRCRegClass.contains(DestReg, SrcReg))
-    // FIXME: There are really two different ways this can be done, and we
-    // should pick the better one depending on the situation:
+    // There are two different ways this can be done:
     //   1. xxlor : This has lower latency (on the P7), 2 cycles, but can only
     //      issue in VSU pipeline 0.
     //   2. xmovdp/xmovsp: This has higher latency (on the P7), 6 cycles, but
     //      can go to either pipeline.
+    // We'll always use xxlor here, because in practically all cases where
+    // copies are generated, they are close enough to some use that the
+    // lower-latency form is preferable.
     Opc = PPC::XXLOR;
   else if (PPC::CRBITRCRegClass.contains(DestReg, SrcReg))
     Opc = PPC::CROR;