mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-05-24 18:38:50 +00:00
This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exit blocks.

As an example and to avoid regressions being introduced, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64.

** Context **

Currently we insert the prologue and epilogue of the method/function in the entry and exit blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places.

** Motivating example **

Let us consider the following function that performs a call only in one branch of an if:

  define i32 @f(i32 %a, i32 %b) {
    %tmp = alloca i32, align 4
    %tmp2 = icmp slt i32 %a, %b
    br i1 %tmp2, label %true, label %false

  true:
    store i32 %a, i32* %tmp, align 4
    %tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
    br label %false

  false:
    %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
    ret i32 %tmp.0
  }

On AArch64 this code generates (removing the cfi directives to ease readability):

  _f:                                     ; @f
  ; BB#0:
    stp  x29, x30, [sp, #-16]!
    mov  x29, sp
    sub  sp, sp, #16             ; =16
    cmp  w0, w1
    b.ge LBB0_2
  ; BB#1:                                 ; %true
    stur w0, [x29, #-4]
    sub  x1, x29, #4             ; =4
    mov  w0, wzr
    bl   _doSomething
  LBB0_2:                                 ; %false
    mov  sp, x29
    ldp  x29, x30, [sp], #16
    ret

With shrink-wrapping we could generate:

  _f:                                     ; @f
  ; BB#0:
    cmp  w0, w1
    b.ge LBB0_2
  ; BB#1:                                 ; %true
    stp  x29, x30, [sp, #-16]!
    mov  x29, sp
    sub  sp, sp, #16             ; =16
    stur w0, [x29, #-4]
    sub  x1, x29, #4             ; =4
    mov  w0, wzr
    bl   _doSomething
    add  sp, x29, #16            ; =16
    ldp  x29, x30, [sp], #16
  LBB0_2:                                 ; %false
    ret

Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call.

** Proposed Solution **

This patch introduces a new machine pass that performs the shrink-wrapping analysis (see the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore points into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI.

Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hacks to properly avoid frequently executed points. Instead, it relies on dominance and loop properties.

The pass is off by default and each target can opt in by setting the EnableShrinkWrap boolean to true in its derived class of TargetPassConfig. This setting can also be overridden on the command line by using -enable-shrink-wrap.

Before you try out the pass for your target, make sure you properly fix your emitPrologue/emitEpilogue/adjustForXXX methods to cope with basic blocks that are not necessarily the entry block.

** Design Decisions **

1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI, but for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some point we can expand this to several save points and restore points. The impacted components would then be:
   - The pass itself: New algorithm needed.
   - MachineFrameInfo: Hold a list or set of save/restore points instead of one pointer.
   - PEI: Should loop over the save points and restore points.

Anyhow, at least for this first iteration, I do not believe it is worthwhile to support the complex cases. We should revisit that when we have motivating examples.

Differential Revision: http://reviews.llvm.org/D9210

<rdar://problem/3201744>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8
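The idea of relying on dominance rather than data-flow analysis can be illustrated on a toy control-flow graph. The sketch below is not the code from ShrinkWrap.cpp: `Graph`, `computeDominators`, `nearestCommonDominator`, and `findSaveRestore` are hypothetical names, the CFG is assumed to have a single exit block, and the loop handling and the save/restore consistency checks that the real pass needs are left out. It picks the save point as the nearest common dominator of the blocks that need the frame, and the restore point as their nearest common post-dominator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG (hypothetical): Graph[i] lists the successors of block i.
using Graph = std::vector<std::vector<int>>;
using DomSets = std::vector<std::vector<bool>>;

// Reverse every edge, so Result[S] lists the predecessors of S.
Graph reverseGraph(const Graph &G) {
  Graph R(G.size());
  for (std::size_t B = 0; B < G.size(); ++B)
    for (int S : G[B])
      R[S].push_back(static_cast<int>(B));
  return R;
}

// Classic iterative dominator sets: Dom[B][D] is true when D dominates B.
DomSets computeDominators(const Graph &G, int Root) {
  const std::size_t N = G.size();
  const Graph Preds = reverseGraph(G);
  DomSets Dom(N, std::vector<bool>(N, true));
  Dom[Root].assign(N, false);
  Dom[Root][Root] = true;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      if (static_cast<int>(B) == Root)
        continue;
      // Intersect the dominator sets of all predecessors, then add B itself.
      std::vector<bool> New(N, true);
      for (int P : Preds[B])
        for (std::size_t D = 0; D < N; ++D)
          New[D] = New[D] && Dom[P][D];
      New[B] = true;
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The common dominators of a set of blocks form a chain in the dominator
// tree; the nearest one is the deepest, i.e. the one with the most dominators.
int nearestCommonDominator(const DomSets &Dom, const std::vector<int> &Blocks) {
  int Best = -1;
  std::size_t BestDepth = 0;
  for (std::size_t D = 0; D < Dom.size(); ++D) {
    bool Common = true;
    for (int B : Blocks)
      Common = Common && Dom[B][D];
    if (!Common)
      continue;
    const std::size_t Depth = static_cast<std::size_t>(
        std::count(Dom[D].begin(), Dom[D].end(), true));
    if (Depth > BestDepth) {
      Best = static_cast<int>(D);
      BestDepth = Depth;
    }
  }
  return Best;
}

struct SavePoints {
  int Save;    // Block that would receive the prologue.
  int Restore; // Block that would receive the epilogue.
};

// Post-dominators are dominators of the reversed CFG rooted at the exit.
SavePoints findSaveRestore(const Graph &G, int Entry, int Exit,
                           const std::vector<int> &UsesFrame) {
  const DomSets Dom = computeDominators(G, Entry);
  const DomSets PostDom = computeDominators(reverseGraph(G), Exit);
  return {nearestCommonDominator(Dom, UsesFrame),
          nearestCommonDominator(PostDom, UsesFrame)};
}
```

For the motivating example, number the blocks entry = 0, %true = 1, %false = 2, so the graph is `{{1, 2}, {2}, {}}`. When only block 1 needs the frame, `findSaveRestore(G, 0, 2, {1})` selects block 1 for both the save and the restore point, matching the shrink-wrapped assembly. When blocks 1 and 2 both need it, the result falls back to the entry and exit blocks.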
262 lines
11 KiB
C++
//===-- llvm/Target/TargetFrameLowering.h ---------------------------*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// Interface to describe the layout of a stack frame on the target machine.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_TARGETFRAMELOWERING_H
#define LLVM_TARGET_TARGETFRAMELOWERING_H

#include "llvm/CodeGen/MachineBasicBlock.h"
#include <utility>
#include <vector>

namespace llvm {
  class CalleeSavedInfo;
  class MachineFunction;
  class RegScavenger;

/// Information about stack frame layout on the target. It holds the direction
/// of stack growth, the known stack alignment on entry to each function, and
/// the offset to the locals area.
///
/// The offset to the local area is the offset from the stack pointer on
/// function entry to the first location where function data (local variables,
/// spill locations) can be stored.
class TargetFrameLowering {
public:
  enum StackDirection {
    StackGrowsUp,        // Adding to the stack increases the stack address
    StackGrowsDown       // Adding to the stack decreases the stack address
  };

  // Maps a callee saved register to a stack slot with a fixed offset.
  struct SpillSlot {
    unsigned Reg;
    int Offset; // Offset relative to stack pointer on function entry.
  };
private:
  StackDirection StackDir;
  unsigned StackAlignment;
  unsigned TransientStackAlignment;
  int LocalAreaOffset;
  bool StackRealignable;
public:
  TargetFrameLowering(StackDirection D, unsigned StackAl, int LAO,
                      unsigned TransAl = 1, bool StackReal = true)
    : StackDir(D), StackAlignment(StackAl), TransientStackAlignment(TransAl),
      LocalAreaOffset(LAO), StackRealignable(StackReal) {}

  virtual ~TargetFrameLowering();

  // These methods return information that describes the abstract stack layout
  // of the target machine.

  /// getStackGrowthDirection - Return the direction the stack grows
  ///
  StackDirection getStackGrowthDirection() const { return StackDir; }

  /// getStackAlignment - This method returns the number of bytes to which the
  /// stack pointer must be aligned on entry to a function. Typically, this
  /// is the largest alignment for any data object in the target.
  ///
  unsigned getStackAlignment() const { return StackAlignment; }

  /// getTransientStackAlignment - This method returns the number of bytes to
  /// which the stack pointer must be aligned at all times, even between
  /// calls.
  ///
  unsigned getTransientStackAlignment() const {
    return TransientStackAlignment;
  }

  /// isStackRealignable - This method returns whether the stack can be
  /// realigned.
  bool isStackRealignable() const {
    return StackRealignable;
  }

  /// getOffsetOfLocalArea - This method returns the offset of the local area
  /// from the stack pointer on entrance to a function.
  ///
  int getOffsetOfLocalArea() const { return LocalAreaOffset; }

  /// isFPCloseToIncomingSP - Return true if the frame pointer is close to
  /// the incoming stack pointer, false if it is close to the post-prologue
  /// stack pointer.
  virtual bool isFPCloseToIncomingSP() const { return true; }

  /// assignCalleeSavedSpillSlots - Allows target to override spill slot
  /// assignment logic. If implemented, assignCalleeSavedSpillSlots() should
  /// assign frame slots to all CSI entries and return true. If this method
  /// returns false, spill slots will be assigned using generic implementation.
  /// assignCalleeSavedSpillSlots() may add, delete or rearrange elements of
  /// CSI.
  virtual bool
  assignCalleeSavedSpillSlots(MachineFunction &MF,
                              const TargetRegisterInfo *TRI,
                              std::vector<CalleeSavedInfo> &CSI) const {
    return false;
  }

  /// getCalleeSavedSpillSlots - This method returns a pointer to an array of
  /// pairs, that contains an entry for each callee saved register that must be
  /// spilled to a particular stack location if it is spilled.
  ///
  /// Each entry in this array contains a <register,offset> pair, indicating the
  /// fixed offset from the incoming stack pointer that each register should be
  /// spilled at. If a register is not listed here, the code generator is
  /// allowed to spill it anywhere it chooses.
  ///
  virtual const SpillSlot *
  getCalleeSavedSpillSlots(unsigned &NumEntries) const {
    NumEntries = 0;
    return nullptr;
  }

  /// targetHandlesStackFrameRounding - Returns true if the target is
  /// responsible for rounding up the stack frame (probably at emitPrologue
  /// time).
  virtual bool targetHandlesStackFrameRounding() const {
    return false;
  }

  /// emitPrologue/emitEpilogue - These methods insert prolog and epilog code
  /// into the function.
  virtual void emitPrologue(MachineFunction &MF,
                            MachineBasicBlock &MBB) const = 0;
  virtual void emitEpilogue(MachineFunction &MF,
                            MachineBasicBlock &MBB) const = 0;

  /// Adjust the prologue to have the function use segmented stacks. This works
  /// by adding a check even before the "normal" function prologue.
  virtual void adjustForSegmentedStacks(MachineFunction &MF,
                                        MachineBasicBlock &PrologueMBB) const {}

  /// Adjust the prologue to add Erlang Run-Time System (ERTS) specific code in
  /// the assembly prologue to explicitly handle the stack.
  virtual void adjustForHiPEPrologue(MachineFunction &MF,
                                     MachineBasicBlock &PrologueMBB) const {}

  /// Adjust the prologue to add an allocation at a fixed offset from the frame
  /// pointer.
  virtual void
  adjustForFrameAllocatePrologue(MachineFunction &MF,
                                 MachineBasicBlock &PrologueMBB) const {}

  /// spillCalleeSavedRegisters - Issues instruction(s) to spill all callee
  /// saved registers and returns true if it isn't possible / profitable to do
  /// so by issuing a series of store instructions via
  /// storeRegToStackSlot(). Returns false otherwise.
  virtual bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,
                                         MachineBasicBlock::iterator MI,
                                        const std::vector<CalleeSavedInfo> &CSI,
                                         const TargetRegisterInfo *TRI) const {
    return false;
  }

  /// restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee
  /// saved registers and returns true if it isn't possible / profitable to do
  /// so by issuing a series of load instructions via loadRegFromStackSlot().
  /// Returns false otherwise.
  virtual bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
                                           MachineBasicBlock::iterator MI,
                                        const std::vector<CalleeSavedInfo> &CSI,
                                        const TargetRegisterInfo *TRI) const {
    return false;
  }

  /// hasFP - Return true if the specified function should have a dedicated
  /// frame pointer register. For most targets this is true only if the function
  /// has variable sized allocas or if frame pointer elimination is disabled.
  virtual bool hasFP(const MachineFunction &MF) const = 0;

  /// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
  /// not required, we reserve argument space for call sites in the function
  /// immediately on entry to the current function. This eliminates the need for
  /// add/sub sp brackets around call sites. Returns true if the call frame is
  /// included as part of the stack frame.
  virtual bool hasReservedCallFrame(const MachineFunction &MF) const {
    return !hasFP(MF);
  }

  /// canSimplifyCallFramePseudos - When possible, it's best to simplify the
  /// call frame pseudo ops before doing frame index elimination. This is
  /// possible only when frame index references between the pseudos won't
  /// need adjusting for the call frame adjustments. Normally, that's true
  /// if the function has a reserved call frame or a frame pointer. Some
  /// targets (Thumb2, for example) may have more complicated criteria,
  /// however, and can override this behavior.
  virtual bool canSimplifyCallFramePseudos(const MachineFunction &MF) const {
    return hasReservedCallFrame(MF) || hasFP(MF);
  }

  // needsFrameIndexResolution - Do we need to perform FI resolution for
  // this function. Normally, this is required only when the function
  // has any stack objects. However, targets may want to override this.
  virtual bool needsFrameIndexResolution(const MachineFunction &MF) const;

  /// getFrameIndexOffset - Returns the displacement from the frame register to
  /// the stack frame of the specified index.
  virtual int getFrameIndexOffset(const MachineFunction &MF, int FI) const;

  /// getFrameIndexReference - This method should return the base register
  /// and offset used to reference a frame index location. The offset is
  /// returned directly, and the base register is returned via FrameReg.
  virtual int getFrameIndexReference(const MachineFunction &MF, int FI,
                                     unsigned &FrameReg) const;

  /// Same as above, except that the 'base register' will always be RSP, not
  /// RBP on x86. This is used exclusively for lowering STATEPOINT nodes.
  /// TODO: This should really be a parameterizable choice.
  virtual int getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI,
                                          unsigned &FrameReg) const {
    // There is no generic implementation: the default aborts, and only x86
    // overrides this.
    llvm_unreachable("unimplemented for non-x86");
    return 0;
  }

  /// processFunctionBeforeCalleeSavedScan - This method is called immediately
  /// before PrologEpilogInserter scans the physical registers used to determine
  /// what callee saved registers should be spilled. This method is optional.
  virtual void processFunctionBeforeCalleeSavedScan(MachineFunction &MF,
                                             RegScavenger *RS = nullptr) const {

  }

  /// processFunctionBeforeFrameFinalized - This method is called immediately
  /// before the specified function's frame layout (MF.getFrameInfo()) is
  /// finalized. Once the frame is finalized, MO_FrameIndex operands are
  /// replaced with direct constants. This method is optional.
  ///
  virtual void processFunctionBeforeFrameFinalized(MachineFunction &MF,
                                             RegScavenger *RS = nullptr) const {
  }

  /// eliminateCallFramePseudoInstr - This method is called during prolog/epilog
  /// code insertion to eliminate call frame setup and destroy pseudo
  /// instructions (but only if the Target is using them). It is responsible
  /// for eliminating these instructions, replacing them with concrete
  /// instructions. This method need only be implemented if using call frame
  /// setup/destroy pseudo instructions.
  ///
  virtual void
  eliminateCallFramePseudoInstr(MachineFunction &MF,
                                MachineBasicBlock &MBB,
                                MachineBasicBlock::iterator MI) const {
    llvm_unreachable("Call Frame Pseudo Instructions do not exist on this "
                     "target!");
  }
};

} // End llvm namespace

#endif
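To make the default hook relationships in the header above easier to follow, here is a small self-contained sketch. It does not use LLVM: `ToyFunction`, `MiniFrameLowering`, `ToyTargetFrameLowering`, `findFixedSpillOffset`, and the register numbers and offsets are all hypothetical stand-ins, and only the bodies of `hasReservedCallFrame`, `canSimplifyCallFramePseudos`, and `getCalleeSavedSpillSlots` mirror the defaults shown in the header.

```cpp
#include <cassert>

// Hypothetical stand-in for MachineFunction: only the facts hasFP() looks at.
struct ToyFunction {
  bool HasVarSizedAllocas;
  bool DisableFPElim;
};

// Cut-down mirror of a few TargetFrameLowering hooks (not the real class).
class MiniFrameLowering {
public:
  struct SpillSlot {
    unsigned Reg;
    int Offset; // Offset relative to stack pointer on function entry.
  };

  virtual ~MiniFrameLowering() = default;

  virtual bool hasFP(const ToyFunction &F) const = 0;

  // Same default bodies as in the header above.
  virtual bool hasReservedCallFrame(const ToyFunction &F) const {
    return !hasFP(F);
  }
  virtual bool canSimplifyCallFramePseudos(const ToyFunction &F) const {
    return hasReservedCallFrame(F) || hasFP(F);
  }
  virtual const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const {
    NumEntries = 0;
    return nullptr;
  }
};

// A made-up target: needs a frame pointer for variable sized allocas or when
// frame pointer elimination is disabled, and pins two registers to fixed slots.
class ToyTargetFrameLowering : public MiniFrameLowering {
public:
  bool hasFP(const ToyFunction &F) const override {
    return F.HasVarSizedAllocas || F.DisableFPElim;
  }
  const SpillSlot *getCalleeSavedSpillSlots(unsigned &NumEntries) const override {
    static const SpillSlot Slots[] = {{30, -8}, {29, -16}}; // Invented values.
    NumEntries = sizeof(Slots) / sizeof(Slots[0]);
    return Slots;
  }
};

// Walk the (pointer, count) table the way a caller of the hook would.
// Returns true and sets Offset when Reg has a fixed slot.
bool findFixedSpillOffset(const MiniFrameLowering &TFL, unsigned Reg,
                          int &Offset) {
  unsigned NumEntries = 0;
  const MiniFrameLowering::SpillSlot *Slots =
      TFL.getCalleeSavedSpillSlots(NumEntries);
  for (unsigned I = 0; I != NumEntries; ++I) {
    if (Slots[I].Reg == Reg) {
      Offset = Slots[I].Offset;
      return true;
    }
  }
  return false; // Not listed: the code generator may place it anywhere.
}
```

One consequence worth noticing: with the default bodies, `canSimplifyCallFramePseudos` evaluates `!hasFP(F) || hasFP(F)` and is therefore always true, so it only becomes meaningful once a target overrides it or `hasReservedCallFrame`.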