[PowerPC] ELFv2 stack space reduction

The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)
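
As an illustration of the sizes listed above, here is a minimal sketch of the 64-bit layout arithmetic. The helper names (linkageSize, tocSaveOffset, ParamSaveAreaSize) are hypothetical and are not the LLVM routines touched by this patch; the sketch assumes the TOC save slot is the last doubleword of the linkage area in both ABIs.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only (64-bit ELF, 8-byte doublewords).
constexpr unsigned DoublewordSize = 8;

// ELFv1 linkage area: 6 doublewords; ELFv2 drops the two unused ones.
constexpr unsigned linkageSize(bool IsELFv2) {
  return (IsELFv2 ? 4 : 6) * DoublewordSize;
}

// Assumption: the TOC save slot is the last doubleword of the linkage area.
constexpr unsigned tocSaveOffset(bool IsELFv2) {
  return linkageSize(IsELFv2) - DoublewordSize;
}

// Parameter save area: room for the 8 GPR argument registers.
constexpr unsigned ParamSaveAreaSize = 8 * DoublewordSize;

static_assert(linkageSize(false) == 48, "ELFv1 linkage area");
static_assert(linkageSize(true) == 32, "ELFv2 linkage area");
static_assert(tocSaveOffset(false) == 40, "ELFv1 TOC save slot");
static_assert(tocSaveOffset(true) == 24, "ELFv2 TOC save slot");
static_assert(ParamSaveAreaSize == 64, "parameter save area");
```

The 16-byte difference in the linkage area is also why stack offsets in the updated tests move down by 16.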

This patch implements the required changes:
- reducing the linkage area, and the associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag)
- (partially) handling the case where the parameter save area is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.
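
A rough sketch of the caller-side behavior described above: the call frame always reserves the linkage area plus the full 64-byte parameter save area, even when the arguments need less. The function name callFrameBytes is hypothetical, and the sketch ignores the final rounding to the target stack alignment.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the conservative caller-side reservation.
// ArgBytes is the space the outgoing arguments would occupy on the stack.
inline unsigned callFrameBytes(unsigned ArgBytes, bool IsELFv2) {
  const unsigned LinkageSize = IsELFv2 ? 32 : 48;
  const unsigned ParamSaveAreaSize = 64; // room for 8 GPRs
  // Always keep at least the full parameter save area, needed or not.
  return std::max(LinkageSize + ArgBytes, LinkageSize + ParamSaveAreaSize);
}
```

Under this model a call with no arguments still reserves 96 bytes on ELFv2 (112 on ELFv1); a follow-on optimization could shrink the ELFv2 figure when the callee is known not to need the area.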

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the parameter save area in those cases where that area
is not present.
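
The callee-side decision can be sketched with a simplified model. This is not the patch's code: it assumes only 8-byte integer and floating-point arguments (no byval, vector, or over-aligned arguments), and the names ArgKind, calleeHasParameterArea, ByValHome, and byValHome are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum class ArgKind { Int64, Double };

// First pass: does the ABI guarantee that the caller allocated the
// parameter save area? Simplified: every argument occupies one doubleword.
inline bool calleeHasParameterArea(const std::vector<ArgKind> &Args,
                                   bool IsELFv2, bool IsVarArg) {
  const unsigned LinkageSize = IsELFv2 ? 32 : 48;
  const unsigned ParamAreaSize = 8 * 8; // 8 GPRs
  bool HasParameterArea = !IsELFv2 || IsVarArg;
  unsigned ArgOffset = LinkageSize;
  unsigned AvailableFPRs = 13;
  for (ArgKind Kind : Args) {
    // An argument past the register-backed area lives in memory ...
    bool UseMemory = ArgOffset >= LinkageSize + ParamAreaSize;
    ArgOffset += 8;
    // ... unless it is actually passed in a floating-point register.
    if (Kind == ArgKind::Double && AvailableFPRs > 0) {
      --AvailableFPRs;
      UseMemory = false;
    }
    if (UseMemory)
      HasParameterArea = true;
  }
  return HasParameterArea;
}

// Where a byval argument is materialized so that it has an address.
enum class ByValHome { CallerParamArea, LocalSlot };

inline ByValHome byValHome(bool HasParameterArea, unsigned ArgOffset,
                           unsigned ArgSize, unsigned LinkageSize) {
  if (HasParameterArea || ArgOffset + ArgSize > LinkageSize + 64)
    return ByValHome::CallerParamArea;
  return ByValHome::LocalSlot;
}
```

In this model an ELFv2 function taking eight integers gets no guarantee, a ninth integer forces the area to exist, and thirteen floating-point arguments still fit entirely in registers.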

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213490 91177308-0d34-0410-b5e6-96231b3b80d8
Ulrich Weigand 2014-07-20 23:43:15 +00:00
parent edfd4f18bc
commit 7fc5011e8d
6 changed files with 118 additions and 27 deletions

@@ -1203,7 +1203,9 @@ bool PPCFastISel::processCallArgs(SmallVectorImpl<Value*> &Args,
   CCState CCInfo(CC, IsVarArg, *FuncInfo.MF, TM, ArgLocs, *Context);
   // Reserve space for the linkage area on the stack.
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false);
+  bool isELFv2ABI = PPCSubTarget->isELFv2ABI();
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false,
+                                                          isELFv2ABI);
   CCInfo.AllocateStack(LinkageSize, 8);
   CCInfo.AnalyzeCallOperands(ArgVTs, ArgFlags, CC_PPC64_ELF_FIS);
@@ -1232,6 +1234,7 @@ bool PPCFastISel::processCallArgs(SmallVectorImpl<Value*> &Args,
   // Because we cannot tell if this is needed on the caller side, we have to
   // conservatively assume that it is needed. As such, make sure we have at
   // least enough stack space for the caller to store the 8 GPRs.
+  // FIXME: On ELFv2, it may be unnecessary to allocate the parameter area.
   NumBytes = std::max(NumBytes, LinkageSize + 64);
   // Issue CALLSEQ_START.

@@ -400,7 +400,8 @@ unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
   // Maximum call frame needs to be at least big enough for linkage area.
   unsigned minCallFrameSize = getLinkageSize(Subtarget.isPPC64(),
-                                             Subtarget.isDarwinABI());
+                                             Subtarget.isDarwinABI(),
+                                             Subtarget.isELFv2ABI());
   maxCallFrameSize = std::max(maxCallFrameSize, minCallFrameSize);
   // If we have dynamic alloca then maxCallFrameSize needs to be aligned so

@@ -76,8 +76,8 @@ public:
   /// getTOCSaveOffset - Return the previous frame offset to save the
   /// TOC register -- 64-bit SVR4 ABI only.
-  static unsigned getTOCSaveOffset(void) {
-    return 40;
+  static unsigned getTOCSaveOffset(bool isELFv2ABI) {
+    return isELFv2ABI ? 24 : 40;
   }
   /// getFramePointerSaveOffset - Return the previous frame offset to save the
@@ -109,9 +109,10 @@ public:
   /// getLinkageSize - Return the size of the PowerPC ABI linkage area.
   ///
-  static unsigned getLinkageSize(bool isPPC64, bool isDarwinABI) {
+  static unsigned getLinkageSize(bool isPPC64, bool isDarwinABI,
+                                 bool isELFv2ABI) {
     if (isDarwinABI || isPPC64)
-      return 6 * (isPPC64 ? 8 : 4);
+      return (isELFv2ABI ? 4 : 6) * (isPPC64 ? 8 : 4);
     // SVR4 ABI:
     return 8;

@@ -2190,6 +2190,54 @@ static unsigned CalculateStackSlotAlignment(EVT ArgVT, ISD::ArgFlagsTy Flags,
   return Align;
 }
+/// CalculateStackSlotUsed - Return whether this argument will use its
+/// stack slot (instead of being passed in registers). ArgOffset,
+/// AvailableFPRs, and AvailableVRs must hold the current argument
+/// position, and will be updated to account for this argument.
+static bool CalculateStackSlotUsed(EVT ArgVT, ISD::ArgFlagsTy Flags,
+                                   unsigned PtrByteSize,
+                                   unsigned LinkageSize,
+                                   unsigned ParamAreaSize,
+                                   unsigned &ArgOffset,
+                                   unsigned &AvailableFPRs,
+                                   unsigned &AvailableVRs) {
+  bool UseMemory = false;
+  // Respect alignment of argument on the stack.
+  unsigned Align = CalculateStackSlotAlignment(ArgVT, Flags, PtrByteSize);
+  ArgOffset = ((ArgOffset + Align - 1) / Align) * Align;
+  // If there's no space left in the argument save area, we must
+  // use memory (this check also catches zero-sized arguments).
+  if (ArgOffset >= LinkageSize + ParamAreaSize)
+    UseMemory = true;
+  // Allocate argument on the stack.
+  ArgOffset += CalculateStackSlotSize(ArgVT, Flags, PtrByteSize);
+  // If we overran the argument save area, we must use memory
+  // (this check catches arguments passed partially in memory)
+  if (ArgOffset > LinkageSize + ParamAreaSize)
+    UseMemory = true;
+  // However, if the argument is actually passed in an FPR or a VR,
+  // we don't use memory after all.
+  if (!Flags.isByVal()) {
+    if (ArgVT == MVT::f32 || ArgVT == MVT::f64)
+      if (AvailableFPRs > 0) {
+        --AvailableFPRs;
+        return false;
+      }
+    if (ArgVT == MVT::v4f32 || ArgVT == MVT::v4i32 ||
+        ArgVT == MVT::v8i16 || ArgVT == MVT::v16i8 ||
+        ArgVT == MVT::v2f64 || ArgVT == MVT::v2i64)
+      if (AvailableVRs > 0) {
+        --AvailableVRs;
+        return false;
+      }
+  }
+  return UseMemory;
+}
 /// EnsureStackAlignment - Round stack frame size up from NumBytes to
 /// ensure minimum alignment required for target.
 static unsigned EnsureStackAlignment(const TargetMachine &Target,
@@ -2275,7 +2323,7 @@ PPCTargetLowering::LowerFormalArguments_32SVR4(
                  getTargetMachine(), ArgLocs, *DAG.getContext());
   // Reserve space for the linkage area on the stack.
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(false, false);
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(false, false, false);
   CCInfo.AllocateStack(LinkageSize, PtrByteSize);
   CCInfo.AnalyzeFormalArguments(Ins, CC_PPC32_SVR4);
@@ -2468,6 +2516,7 @@ PPCTargetLowering::LowerFormalArguments_64SVR4(
                                       SmallVectorImpl<SDValue> &InVals) const {
   // TODO: add description of PPC stack frame format, or at least some docs.
   //
+  bool isELFv2ABI = Subtarget.isELFv2ABI();
   bool isLittleEndian = Subtarget.isLittleEndian();
   MachineFunction &MF = DAG.getMachineFunction();
   MachineFrameInfo *MFI = MF.getFrameInfo();
@@ -2479,8 +2528,8 @@ PPCTargetLowering::LowerFormalArguments_64SVR4(
                            (CallConv == CallingConv::Fast));
   unsigned PtrByteSize = 8;
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false);
-  unsigned ArgOffset = LinkageSize;
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false,
+                                                          isELFv2ABI);
   static const MCPhysReg GPR[] = {
     PPC::X3, PPC::X4, PPC::X5, PPC::X6,
@@ -2502,12 +2551,29 @@ PPCTargetLowering::LowerFormalArguments_64SVR4(
   const unsigned Num_FPR_Regs = 13;
   const unsigned Num_VR_Regs = array_lengthof(VR);
-  unsigned GPR_idx, FPR_idx = 0, VR_idx = 0;
+  // Do a first pass over the arguments to determine whether the ABI
+  // guarantees that our caller has allocated the parameter save area
+  // on its stack frame. In the ELFv1 ABI, this is always the case;
+  // in the ELFv2 ABI, it is true if this is a vararg function or if
+  // any parameter is located in a stack slot.
+  bool HasParameterArea = !isELFv2ABI || isVarArg;
+  unsigned ParamAreaSize = Num_GPR_Regs * PtrByteSize;
+  unsigned NumBytes = LinkageSize;
+  unsigned AvailableFPRs = Num_FPR_Regs;
+  unsigned AvailableVRs = Num_VR_Regs;
+  for (unsigned i = 0, e = Ins.size(); i != e; ++i)
+    if (CalculateStackSlotUsed(Ins[i].VT, Ins[i].Flags,
+                               PtrByteSize, LinkageSize, ParamAreaSize,
+                               NumBytes, AvailableFPRs, AvailableVRs))
+      HasParameterArea = true;
   // Add DAG nodes to load the arguments or copy them out of registers. On
   // entry to a function on PPC, the arguments start after the linkage area,
   // although the first ones are often in registers.
+  unsigned ArgOffset = LinkageSize;
+  unsigned GPR_idx, FPR_idx = 0, VR_idx = 0;
   SmallVector<SDValue, 8> MemOps;
   Function::const_arg_iterator FuncArg = MF.getFunction()->arg_begin();
   unsigned CurArgIdx = 0;
@@ -2552,8 +2618,17 @@ PPCTargetLowering::LowerFormalArguments_64SVR4(
       }
       // Create a stack object covering all stack doublewords occupied
-      // by the argument.
-      int FI = MFI->CreateFixedObject(ArgSize, ArgOffset, true);
+      // by the argument. If the argument is (fully or partially) on
+      // the stack, or if the argument is fully in registers but the
+      // caller has allocated the parameter save anyway, we can refer
+      // directly to the caller's stack frame. Otherwise, create a
+      // local copy in our own frame.
+      int FI;
+      if (HasParameterArea ||
+          ArgSize + ArgOffset > LinkageSize + Num_GPR_Regs * PtrByteSize)
+        FI = MFI->CreateFixedObject(ArgSize, ArgOffset, true);
+      else
+        FI = MFI->CreateStackObject(ArgSize, Align, false);
       SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
       // Handle aggregates smaller than 8 bytes.
@@ -2697,7 +2772,10 @@ PPCTargetLowering::LowerFormalArguments_64SVR4(
   // Area that is at least reserved in the caller of this function.
   unsigned MinReservedArea;
-  MinReservedArea = std::max(ArgOffset, LinkageSize + 8 * PtrByteSize);
+  if (HasParameterArea)
+    MinReservedArea = std::max(ArgOffset, LinkageSize + 8 * PtrByteSize);
+  else
+    MinReservedArea = LinkageSize;
   // Set the size that is at least reserved in caller of this function. Tail
   // call optimized functions' reserved stack space needs to be aligned so that
@@ -2758,7 +2836,8 @@ PPCTargetLowering::LowerFormalArguments_Darwin(
                            (CallConv == CallingConv::Fast));
   unsigned PtrByteSize = isPPC64 ? 8 : 4;
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(isPPC64, true);
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(isPPC64, true,
+                                                          false);
   unsigned ArgOffset = LinkageSize;
   // Area that is at least reserved in caller of this function.
   unsigned MinReservedArea = ArgOffset;
@@ -3616,6 +3695,8 @@ PPCTargetLowering::FinishCall(CallingConv::ID CallConv, SDLoc dl,
                               int SPDiff, unsigned NumBytes,
                               const SmallVectorImpl<ISD::InputArg> &Ins,
                               SmallVectorImpl<SDValue> &InVals) const {
+  bool isELFv2ABI = Subtarget.isELFv2ABI();
   std::vector<EVT> NodeTys;
   SmallVector<SDValue, 8> Ops;
   unsigned CallOpc = PrepareCall(DAG, Callee, InFlag, Chain, dl, SPDiff,
@@ -3691,7 +3772,7 @@ PPCTargetLowering::FinishCall(CallingConv::ID CallConv, SDLoc dl,
     SDVTList VTs = DAG.getVTList(MVT::Other, MVT::Glue);
     EVT PtrVT = DAG.getTargetLoweringInfo().getPointerTy();
     SDValue StackPtr = DAG.getRegister(PPC::X1, PtrVT);
-    unsigned TOCSaveOffset = PPCFrameLowering::getTOCSaveOffset();
+    unsigned TOCSaveOffset = PPCFrameLowering::getTOCSaveOffset(isELFv2ABI);
     SDValue TOCOff = DAG.getIntPtrConstant(TOCSaveOffset);
     SDValue AddTOC = DAG.getNode(ISD::ADD, dl, MVT::i64, StackPtr, TOCOff);
     Chain = DAG.getNode(PPCISD::LOAD_TOC, dl, VTs, Chain, AddTOC, InFlag);
@@ -3784,7 +3865,8 @@ PPCTargetLowering::LowerCall_32SVR4(SDValue Chain, SDValue Callee,
                  getTargetMachine(), ArgLocs, *DAG.getContext());
   // Reserve space for the linkage area on the stack.
-  CCInfo.AllocateStack(PPCFrameLowering::getLinkageSize(false, false), PtrByteSize);
+  CCInfo.AllocateStack(PPCFrameLowering::getLinkageSize(false, false, false),
+                       PtrByteSize);
   if (isVarArg) {
     // Handle fixed and variable vector arguments differently.
@@ -4012,9 +4094,11 @@ PPCTargetLowering::LowerCall_64SVR4(SDValue Chain, SDValue Callee,
     MF.getInfo<PPCFunctionInfo>()->setHasFastCall();
   // Count how many bytes are to be pushed on the stack, including the linkage
-  // area, and parameter passing area. We start with at least 48 bytes, which
-  // is reserved space for [SP][CR][LR][3 x unused].
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false);
+  // area, and parameter passing area. On ELFv1, the linkage area is 48 bytes
+  // reserved space for [SP][CR][LR][2 x unused][TOC]; on ELFv2, the linkage
+  // area is 32 bytes reserved space for [SP][CR][LR][TOC].
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(true, false,
+                                                          isELFv2ABI);
   unsigned NumBytes = LinkageSize;
   // Add up all the space actually used.
@@ -4036,6 +4120,7 @@ PPCTargetLowering::LowerCall_64SVR4(SDValue Chain, SDValue Callee,
   // Because we cannot tell if this is needed on the caller side, we have to
   // conservatively assume that it is needed. As such, make sure we have at
   // least enough stack space for the caller to store the 8 GPRs.
+  // FIXME: On ELFv2, it may be unnecessary to allocate the parameter area.
   NumBytes = std::max(NumBytes, LinkageSize + 8 * PtrByteSize);
   // Tail call needs the stack to be aligned.
@@ -4374,7 +4459,7 @@ PPCTargetLowering::LowerCall_64SVR4(SDValue Chain, SDValue Callee,
     // Load r2 into a virtual register and store it to the TOC save area.
     SDValue Val = DAG.getCopyFromReg(Chain, dl, PPC::X2, MVT::i64);
     // TOC save area offset.
-    unsigned TOCSaveOffset = PPCFrameLowering::getTOCSaveOffset();
+    unsigned TOCSaveOffset = PPCFrameLowering::getTOCSaveOffset(isELFv2ABI);
     SDValue PtrOff = DAG.getIntPtrConstant(TOCSaveOffset);
     SDValue AddPtr = DAG.getNode(ISD::ADD, dl, PtrVT, StackPtr, PtrOff);
     Chain = DAG.getStore(Val.getValue(1), dl, Val, AddPtr, MachinePointerInfo(),
@@ -4434,7 +4519,8 @@ PPCTargetLowering::LowerCall_Darwin(SDValue Chain, SDValue Callee,
   // Count how many bytes are to be pushed on the stack, including the linkage
   // area, and parameter passing area. We start with 24/48 bytes, which is
   // prereserved space for [SP][CR][LR][3 x unused].
-  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(isPPC64, true);
+  unsigned LinkageSize = PPCFrameLowering::getLinkageSize(isPPC64, true,
+                                                          false);
   unsigned NumBytes = LinkageSize;
   // Add up all the space actually used.

@@ -7,11 +7,11 @@ target triple = "powerpc64le-unknown-linux-gnu"
 define void @test_indirect(void ()* nocapture %fp) {
 ; CHECK-LABEL: @test_indirect
   tail call void %fp()
-; CHECK-DAG: std 2, 40(1)
+; CHECK-DAG: std 2, 24(1)
 ; CHECK-DAG: mr 12, 3
 ; CHECK-DAG: mtctr 3
 ; CHECK: bctrl
-; CHECK-NEXT: ld 2, 40(1)
+; CHECK-NEXT: ld 2, 24(1)
   ret void
 }

@@ -22,7 +22,7 @@ entry:
   ret void
 }
 ; CHECK: @callee1
-; CHECK: lwz {{[0-9]+}}, 120(1)
+; CHECK: lwz {{[0-9]+}}, 104(1)
 ; CHECK: blr
 define void @caller1() {
@@ -32,7 +32,7 @@ entry:
   ret void
 }
 ; CHECK: @caller1
-; CHECK: stw {{[0-9]+}}, 120(1)
+; CHECK: stw {{[0-9]+}}, 104(1)
 ; CHECK: bl test1
 declare void @test1(%struct.small_arg* sret, %struct.large_arg* byval, %struct.small_arg* byval)
@@ -42,7 +42,7 @@ entry:
   ret float %x
 }
 ; CHECK: @callee2
-; CHECK: lfs {{[0-9]+}}, 152(1)
+; CHECK: lfs {{[0-9]+}}, 136(1)
 ; CHECK: blr
 define void @caller2() {
@@ -52,7 +52,7 @@ entry:
   ret void
 }
 ; CHECK: @caller2
-; CHECK: stfs {{[0-9]+}}, 152(1)
+; CHECK: stfs {{[0-9]+}}, 136(1)
 ; CHECK: bl test2
 declare float @test2(float, float, float, float, float, float, float, float, float, float, float, float, float, float)