the target can handle a given basic block as prologue
or epilogue.
Related to <rdar://problem/20821487>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238292 91177308-0d34-0410-b5e6-96231b3b80d8
If there is an InstAlias defined for an instruction that had a custom
converter (AsmMatchConverter), then when the alias is matched,
the custom converter will be used rather than the converter generated
by the InstAlias.
This patch adds the UseInstAsmMatchConverter field to the InstAlias
class, which allows you to override this behavior and force the
converter generated by the InstAlias to be used.
This is required for some future improvemnts to the R600 assembler.
Differential Revision: http://reviews.llvm.org/D9083
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238210 91177308-0d34-0410-b5e6-96231b3b80d8
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238108 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238080 91177308-0d34-0410-b5e6-96231b3b80d8
This is in preparation to making changes needed to stop resetting
NoFramePointerElim in resetTargetOptions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238079 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238051 91177308-0d34-0410-b5e6-96231b3b80d8
This starts merging MCSection and MCSectionData.
There are a few issues with the current split between MCSection and
MCSectionData.
* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.
* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.
* It makes it harder to remember where each item is.
The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.
Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237936 91177308-0d34-0410-b5e6-96231b3b80d8
This was previously returning int. However there are no negative opcode
numbers and more importantly this was needlessly different from
MCInstrDesc::getOpcode() (which even is the value returned here) and
SDValue::getOpcode()/SDNode::getOpcode().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237611 91177308-0d34-0410-b5e6-96231b3b80d8
This adds new SDNodes for signed/unsigned min/max. These nodes are built from
select/icmp pairs matched at SDAGBuilder stage.
This patch adds the nodes, as well as legalization support and sets them to
be "expand" for all targets.
NFC for now; this will be tested when I switch AArch64 to using these new
nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237423 91177308-0d34-0410-b5e6-96231b3b80d8
to use the information in the module rather than TargetOptions.
We've had and clang has used the use-soft-float attribute for some
time now so have the backends set a subtarget feature based on
a particular function now that subtargets are created based on
functions and function attributes.
For the one middle end soft float check go ahead and create
an overloadable TargetLowering::useSoftFloat function that
just checks the TargetSubtargetInfo in all cases.
Also remove the command line option that hard codes whether or
not soft-float is set by using the attribute for all of the
target specific test cases - for the generic just go ahead and
add the attribute in the one case that showed up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237079 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8
This should make it clear under which narrow circumstances implicit
physreg uses are okay when rematerializing and prevent people from
accidentally allowing too much when overriding
isReallyTriviallyReMaterializable() even with the weaker assert in the
RegisterCoalescer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235679 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds legalization support to operate on FP16 as a load/store type
and do operations on it as floats.
Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll
Reviewers: srhines, t.p.northover
Differential Revision: http://reviews.llvm.org/D8755
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235215 91177308-0d34-0410-b5e6-96231b3b80d8
formatted_raw_ostream is a wrapper over another stream to add column and line
number tracking.
It is used only for asm printing.
This patch moves the its creation down to where we know we are printing
assembly. This has the following advantages:
* Simpler lifetime management: std::unique_ptr
* We don't compute column and line number of object files :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234535 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Add classof implementations to the raw_ostream classes."
Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream."
The underlying issue can be fixed without classof.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234495 91177308-0d34-0410-b5e6-96231b3b80d8
If we know we are producing an object, we don't need to wrap the stream
in a formatted_raw_ostream anymore.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234461 91177308-0d34-0410-b5e6-96231b3b80d8
Specify an allocation order with a register class. This is used by register
allocators with a greedy heuristic. This is usefull as it is sometimes
beneficial to color more constrained classes first.
Differential Revision: http://reviews.llvm.org/D8626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233743 91177308-0d34-0410-b5e6-96231b3b80d8
per-function subtarget.
Currently, code-gen passes the default or generic subtarget to the constructors
of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which
enables some targets (AArch64, ARM, and X86) to change their instprinter's
behavior based on the subtarget feature bits. Since the backend can now use
different subtargets for each function, instprinter has to be changed to use the
per-function subtarget rather than the default subtarget.
This patch takes the first step towards enabling instprinter to change its
behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to
AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the
various print methods table-gen auto-generates.
I will follow up with changes to instprinters of AArch64, ARM, and X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@233411 91177308-0d34-0410-b5e6-96231b3b80d8
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D7791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232943 91177308-0d34-0410-b5e6-96231b3b80d8
TargetMachine::getSubtargetImpl routines.
This keeps the target independent code free of bare subtarget
calls while the remainder of the backends are migrated, or not
if they don't wish to support per-function subtargets as would
be needed for function multiversioning or LTO of disparate
cpu subarchitecture types, e.g.
clang -msse4.2 -c foo.c -emit-llvm -o foo.bc
clang -c bar.c -emit-llvm -o bar.bc
llvm-link foo.bc bar.bc -o baz.bc
llc baz.bc
and get appropriate code for what the command lines requested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232885 91177308-0d34-0410-b5e6-96231b3b80d8
LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its
answer when the base register changes. Unfortunately this isn't true
in thumb1, where SP-based loads allow a larger offset than
non-SP-based loads, and this causes the base register reuse code to
generate instructions that are unencodable, causing an assertion
failure.
Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which
ARMBaseRegisterInfo can then make use of to give the correct answer.
Differential Revision: http://reviews.llvm.org/D8419
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232825 91177308-0d34-0410-b5e6-96231b3b80d8
This enables us to remove calls to the subtarget from the TargetMachine
and with a small hack for backends that require global subtarget
information for module level code generation, e.g. mips abi flags, as
mentioned in a fixme in the code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232776 91177308-0d34-0410-b5e6-96231b3b80d8
Some subregisters are only to indicate different access sizes, while not
providing any way to actually divide the register up into multiple
disjunct parts. Avoid tracking subregister liveness in these cases as it
is not beneficial.
Differential Revision: http://reviews.llvm.org/D8429
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232695 91177308-0d34-0410-b5e6-96231b3b80d8