The combiner was creating Q-register loads and stores, which then had to be spilled because there are no callee-save Q registers!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214634 91177308-0d34-0410-b5e6-96231b3b80d8
Add branch weights to branch instructions, so that the following passes can
optimize based on it (i.e. basic block ordering).
Fixes <rdar://problem/17887137>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214537 91177308-0d34-0410-b5e6-96231b3b80d8
ADDS and SUBS cannot encode negative immediates or immediates larger than 12bit.
This fix checks if the immediate version can be used under this constraints and
if we can convert ADDS to SUBS or vice versa to support negative immediates.
Also update the test cases to test the immediate versions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214470 91177308-0d34-0410-b5e6-96231b3b80d8
This commit updates the existing SelectionDAG tests for the stackmap and patchpoint
intrinsics and enables FastISel testing. It also splits up the tests into separate
files, due to different codegen between SelectionDAG and FastISel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214382 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the large code model for MachO uses the GOT to make function calls.
Emit the required adrp and ldr instructions to load the address from the GOT.
Related to <rdar://problem/17733076>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214381 91177308-0d34-0410-b5e6-96231b3b80d8
This improves the code generation for the XALU intrinsics when the
condition is feeding a select instruction.
This also updates and enables the XALU unit tests for FastISel.
This fixes <rdar://problem/17831117>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214350 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the shift-immediate versions are not supported by tblgen and
hopefully this can be later removed, once the required support has been
added to tblgen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214345 91177308-0d34-0410-b5e6-96231b3b80d8
'J' represents a negative number suitable for an add/sub alias
instruction, but while preparing it to become an int64_t we were
mangling the sign extension. So "i32 -1" became 0xffffffffLL, for
example.
Should fix one half of PR20456.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214052 91177308-0d34-0410-b5e6-96231b3b80d8
address of the stack guard was being spilled to the stack.
Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register.
<rdar://problem/12475629>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213967 91177308-0d34-0410-b5e6-96231b3b80d8
This commit implements the frameaddress intrinsic for the AArch64 architecture
in FastISel.
There were two test cases that pretty much tested the same, so I combined them
to a single test case.
Fixes <rdar://problem/17811834>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213959 91177308-0d34-0410-b5e6-96231b3b80d8
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.
This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.
This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).
I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.
With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).
Differential Revision: http://reviews.llvm.org/D4638
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213898 91177308-0d34-0410-b5e6-96231b3b80d8
This bug is introduced by r211144. The element of operand may be
smaller than the element of result, but previous commit can
only handle the contrary condition. This commit is to handle this
scenario and generate optimized codes like ZIP1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213830 91177308-0d34-0410-b5e6-96231b3b80d8
The transform to constant fold unary operations with an AND across a
vector comparison applies when the constant is not a splat of a scalar
as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213800 91177308-0d34-0410-b5e6-96231b3b80d8
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213799 91177308-0d34-0410-b5e6-96231b3b80d8
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.
Reduced test case from Chad. Thanks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213788 91177308-0d34-0410-b5e6-96231b3b80d8
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2
However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213758 91177308-0d34-0410-b5e6-96231b3b80d8
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213748 91177308-0d34-0410-b5e6-96231b3b80d8
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.
This enables patchpoint intrinsic lowering for AArch64.
This fixes <rdar://problem/17733076>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213704 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:
fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
fpext -> FP16_TO_FP (if f16 illegal) -> libcall
Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).
As a result we can remove some redundant AArch64 patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213507 91177308-0d34-0410-b5e6-96231b3b80d8
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213378 91177308-0d34-0410-b5e6-96231b3b80d8
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213372 91177308-0d34-0410-b5e6-96231b3b80d8
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
%cmp = fcmp oeq <4 x float> %val, %test
%ext = zext <4 x i1> %cmp to <4 x i32>
%result = sitofp <4 x i32> %ext to <4 x float>
ret <4 x float> %result
}
Before this change, the code is generated as:
fcmeq.4s v0, v0, v1
movi.4s v1, #0x1 // Integer splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
scvtf.4s v0, v0 // Convert each lane to f32.
ret
After, the code is improved to:
fcmeq.4s v0, v0, v1
fmov.4s v1, #1.00000000 // f32 splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
ret
The svvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.
Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so leaving them for another time when it becomes
appropriate.
rdar://17693791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213341 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213248 91177308-0d34-0410-b5e6-96231b3b80d8
Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to
implement their corresponding ACLE and MSVC intrinsics.
This patch ports ARM dmb, dsb, isb intrinsic to AArch64.
Differential Revision: http://reviews.llvm.org/D4520
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213247 91177308-0d34-0410-b5e6-96231b3b80d8
Sufficiently twisted use of TableGen lets us write patterns directly for f16
(as an i16 promoted to i32) -> f32 conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212933 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.
Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection. The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212883 91177308-0d34-0410-b5e6-96231b3b80d8
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212812 91177308-0d34-0410-b5e6-96231b3b80d8
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.
rdar://17594379
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212638 91177308-0d34-0410-b5e6-96231b3b80d8
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.
rdar://17594379
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212573 91177308-0d34-0410-b5e6-96231b3b80d8
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:
typedef unsigned int uint128_t __attribute__((mode(TI)));
typedef int sint128_t __attribute__((mode(TI)));
This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.
Tests included.
rdar://17516686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212492 91177308-0d34-0410-b5e6-96231b3b80d8
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.
So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212443 91177308-0d34-0410-b5e6-96231b3b80d8
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.
This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.
This also provides a foundation for other targets to have more granular
control over how vector types are legalized.
Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]
http://reviews.llvm.org/D4322
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212242 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commits r212189 and r212190.
While this pass was accidentally disabled (until r212073), r205437
slipped in a use of `auto` that should have been `auto&`.
This fixes PR20188.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212201 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r212109, which reverted r212088.
However, disable the assert as it's not necessary for correctness. There are
several corner cases that the assert needed to handle better for in-order
scheduling, but none of them are incorrect scheduler behavior. The assert is
mainly there to collect good unit tests like this and ensure that the
target-independent scheduler is working as expected with the various machine
models.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212187 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r212088, which is causing a number of spec
failures. Will provide reduced test cases shortly.
PR20057
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212109 91177308-0d34-0410-b5e6-96231b3b80d8
AArch64AddressTypePromotion was doing nothing because it was using the
old semantics of `Use` and `uses()`, when it really wanted to get at the
`users()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212073 91177308-0d34-0410-b5e6-96231b3b80d8
The combine for mul x, pow2 +/- 1 is unchanged. Test cases for
both combines as well as mul x, pow2 have been added as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212044 91177308-0d34-0410-b5e6-96231b3b80d8
Fixe for Bug 20057 - Assertion failied in llvm::SUnit* llvm::SchedBoundary::pickOnlyChoice(): Assertion `i <= (HazardRec->getMaxLookAhead() + MaxObservedStall) && "permanent hazard"'
Thanks to Chad for the test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211865 91177308-0d34-0410-b5e6-96231b3b80d8
"Fix PR20056: Implement pseudo LDR <reg>, =<literal/label> for AArch64"
Missed files are added in this commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211605 91177308-0d34-0410-b5e6-96231b3b80d8
ReconstructShuffle() may wrongly creat a CONCAT_VECTOR trying to
concat 2 of v2i32 into v4i16. This commit is to fix this issue and
try to generate UZP1 instead of lots of MOV and INS.
Patch is initalized by Kevin Qin, and refactored by Tim Northover.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211144 91177308-0d34-0410-b5e6-96231b3b80d8
To make sure branches are in range, we need to do a better job of estimating
the length of an inline assembly block than "it's probably 1 instruction, who'd
write asm with more than that?".
Fortunately there's already a (highly suspect, see how many ways you can think
of to break it!) callback for this purpose, which is used by the other targets.
rdar://problem/17277590
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211095 91177308-0d34-0410-b5e6-96231b3b80d8
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210985 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is to move GlobalMerge pass from Transform/Scalar
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210951 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210903 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210640 91177308-0d34-0410-b5e6-96231b3b80d8
As Ana Pazos pointed out, these have to be restored to their incoming values
before a function returns; i.e. before the tail call. So they can't be used
correctly as the destination register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210525 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we were abandonning the attempt, leading to some combination of
extra work (when selection of a load/store fails completely) and inferior code
(when this leads to a real memcpy call instead of inlining).
rdar://problem/17187463
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210520 91177308-0d34-0410-b5e6-96231b3b80d8
We were hitting an assert if FastISel couldn't create the load or store we
requested. Currently this happens for large frame-local addresses, though
CodeGen could be improved there.
rdar://problem/17187463
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210519 91177308-0d34-0410-b5e6-96231b3b80d8
The tests check that the following transform happens:
(ldr|str) X, [x20]
...
sub x20, x20, #16
->
(ldr|str) X, [x20], #-16
with X being either w0, x0, s0, d0 or q0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210113 91177308-0d34-0410-b5e6-96231b3b80d8
This means the output of LowerFormalArguments returns a lowered
SDValue with the correct type (expected in SelectionDAGBuilder).
Without this, an assertion under a DEBUG macro triggers when those
types are passed on the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210102 91177308-0d34-0410-b5e6-96231b3b80d8
Add tests for the following transform:
add x8, x8, #16
...
str X, [x8]
->
str X, [x8, #16]!
with X being either w0, x0, s0, d0 or q0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210021 91177308-0d34-0410-b5e6-96231b3b80d8
Add tests for the following transform:
add x8, x8, #16
...
ldr X, [x8]
->
ldr X, [x8, #16]!
with X being either w0, x0, s0, d0 or q0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210018 91177308-0d34-0410-b5e6-96231b3b80d8
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209883 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux
self-hosting. Because nearly every regression test triggers a segfault, I hope
this will be easy to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209747 91177308-0d34-0410-b5e6-96231b3b80d8
Add regression tests for the following transformation:
str X, [x20]
...
add x20, x20, #32
->
str X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209715 91177308-0d34-0410-b5e6-96231b3b80d8
Add regression tests for the following transformation:
ldr X, [x20]
...
add x20, x20, #32
->
ldr X, [x20], #32
with X being either w0, x0, s0, d0 or q0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209711 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is debatable. There are two possible approaches, neither
of which is really satisfactory:
1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin,
and 8 bits otherwise.
2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller
to 8 bits. This goes against the spirit of "zeroext" I think, but
it's a bit of a vague construct anyway (by definition you're going
to extend to the amount required by the ABI, that's why it's the
ABI!).
This implements option 2. The DAG machinery really isn't setup for the
first (there's a fairly strong assumption that "zeroext" goes to at
least the smallest register size), and even if it was the resulting
DAG looks like it would be inferior in many cases.
Theoretically we could add AssertZext nodes in the consumers of
ABI-passed values too now, but this actually seems to make the code
worse in practice by making truncation proceed in two steps. The code
produced is equally valid if we continue to assume only the low bit is
defined.
Should fix PR19850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209637 91177308-0d34-0410-b5e6-96231b3b80d8
We can eliminate the custom C++ code in favour of some TableGen to
check the same things. Functionality should be identical, except for a
buffer overrun that was present in the C++ code and meant webkit
failed if any small argument needed to be passed on the stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209636 91177308-0d34-0410-b5e6-96231b3b80d8
We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding.
As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209618 91177308-0d34-0410-b5e6-96231b3b80d8
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.
The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.
Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209576 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208978 91177308-0d34-0410-b5e6-96231b3b80d8
TableGen has a fairly dubious heuristic to decide whether an alias should be
printed: does the alias have lest operands than the real instruction. This is
bad enough (particularly with no way to override it), but it should at least be
calculated consistently for both strings.
This patch implements that logic: first get the *correct* string for the
variant, in the same way as the Matcher, without guessing; then count the
number of whitespace chars.
There are basically 4 changes this brings about after the previous
commits; all of these appear to be good, so I have changed the tests:
+ ARM64: we print "neg X, Y" instead of "sub X, xzr, Y".
+ ARM64: we skip implicit "uxtx" and "uxtw" modifiers.
+ Sparc: we print "mov A, B" instead of "or %g0, A, B".
+ Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208969 91177308-0d34-0410-b5e6-96231b3b80d8
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208934 91177308-0d34-0410-b5e6-96231b3b80d8
In all cases, if a "mov" alias exists, it is the canonical form of the
instruction. Now that TableGen can support aliases containing syntax variants,
we can enable them and improve the quality of the asm output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208874 91177308-0d34-0410-b5e6-96231b3b80d8
To get at least one use of the change (and some actual tests) in with its
commit, I've enabled the AArch64 & ARM64 NEON mov aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208867 91177308-0d34-0410-b5e6-96231b3b80d8
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208210 91177308-0d34-0410-b5e6-96231b3b80d8
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208192 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
An alias has the address of what it points to, so it also has the same
alignment.
This allows a few optimizations to see past aliases for free.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208103 91177308-0d34-0410-b5e6-96231b3b80d8
The canonical form of the BFM instruction is always one of the more explicit
extract or insert operations, which makes reading output much easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207752 91177308-0d34-0410-b5e6-96231b3b80d8
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207644 91177308-0d34-0410-b5e6-96231b3b80d8
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207635 91177308-0d34-0410-b5e6-96231b3b80d8
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.
Also, print a comment giving the full addend since it can be helpful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207633 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial port of r204816 (cpirker "Elf support for MC-JIT
runtime dynamic linker") from AArch64 to ARM64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207625 91177308-0d34-0410-b5e6-96231b3b80d8
There are no patterns for this. This was already fixed for ARM64 but I forgot
to apply it to AArch64 too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207515 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207215 91177308-0d34-0410-b5e6-96231b3b80d8
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207116 91177308-0d34-0410-b5e6-96231b3b80d8
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.
This should address PR19424.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207102 91177308-0d34-0410-b5e6-96231b3b80d8
Covers quite a few extra instructions (like any of the max/min ones
which were broken until recently on ARM64).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206575 91177308-0d34-0410-b5e6-96231b3b80d8
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.
This also enables the AArch64 tests which inspired the previous
untested commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206574 91177308-0d34-0410-b5e6-96231b3b80d8
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.
More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206570 91177308-0d34-0410-b5e6-96231b3b80d8
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.
Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206569 91177308-0d34-0410-b5e6-96231b3b80d8
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206568 91177308-0d34-0410-b5e6-96231b3b80d8
These ones used completely different sets of intrinsics, so the only way to do
it is create a separate ARM64 copy and change them all.
Other than that, CodeGen was straightforward, no deficiencies detected here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206392 91177308-0d34-0410-b5e6-96231b3b80d8
The most important part here is that we should actuall emit the stubs we refer
to in the exception table, but as a side issue this uses more sensible & GCC
compatible representations for some of the bits of information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206380 91177308-0d34-0410-b5e6-96231b3b80d8
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic less constraints to
satisfy and so sometimes allows it to pick better sequences.
Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206379 91177308-0d34-0410-b5e6-96231b3b80d8
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.
Picked up while merging some AArch64 tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206365 91177308-0d34-0410-b5e6-96231b3b80d8
Sometimes we need emit the bits that would actually be a MOVN when producing a
relocated MOVZ instruction (don't ask). But not always, a check which ARM64 got
wrong until now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206289 91177308-0d34-0410-b5e6-96231b3b80d8
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206288 91177308-0d34-0410-b5e6-96231b3b80d8
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206286 91177308-0d34-0410-b5e6-96231b3b80d8
Code is mostly copied directly across, with a slight extension of the
ISelDAGToDAG function so that it can cope with the floating-point constants
being behind a litpool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206285 91177308-0d34-0410-b5e6-96231b3b80d8
Code change is because optimizeCompareInstr didn't know how to pull the
condition code out of FCSEL instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206171 91177308-0d34-0410-b5e6-96231b3b80d8
There was one definite issue in ARM64 (the off-by-1 check for whether
a shift could be folded in) and one difference that is probably
correct: ARM64 didn't fold nodes with multiple uses into the
arithmetic operations unless optimising for code size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206168 91177308-0d34-0410-b5e6-96231b3b80d8
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206167 91177308-0d34-0410-b5e6-96231b3b80d8
In AArch64 i64 to i32 truncate operation is a subregister access.
This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205925 91177308-0d34-0410-b5e6-96231b3b80d8
When register allocator's stage is RS_Spill, we choose spill over using the CSR
for the first time, if the spill cost is lower than CSRCost.
When register allocator's stage is < RS_Split, we choose pre-splitting over
using the CSR for the first time, if the cost of splitting is lower than
CSRCost.
CSRCost is set with command-line option "regalloc-csr-first-time-cost". The
default value is 0 to generate the same codes as before this commit.
With a value of 15 (1 << 14 is the entry frequency), I measured performance
gain of 3% on 253.perlbmk and 1.7% on 197.parser, with instrumented PGO,
on an arm device.
rdar://16162005
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204690 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, only regular AArch64 instructions were annotated with SchedRW lists.
This patch does the same for NEON enabling these instructions to be scheduled by
the MIScheduler. Additionally, store operations are now modeled and a few
SchedRW lists were updated for bug fixes (e.g. multiple def operands).
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204505 91177308-0d34-0410-b5e6-96231b3b80d8
.data_region is only used in Darwin, so it shouldn't be generated
for other OS. Currently AArch64 doesn't support darwin yet, so
I removed it from AArch64. When Darwin is supported someday, we can
add it back and associate it with Darwin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204424 91177308-0d34-0410-b5e6-96231b3b80d8
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203559 91177308-0d34-0410-b5e6-96231b3b80d8
The function was making too many assumptions about its input:
1. The NEON_VDUP optimisation was far too aggressive, assuming (I
think) that the input would always be BUILD_VECTOR.
2. We were treating most unknown concats as legal (by returning Op
rather than SDValue()). I think only concats of pairs of vectors are
actually legal.
http://llvm.org/PR19094
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203450 91177308-0d34-0410-b5e6-96231b3b80d8
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203204 91177308-0d34-0410-b5e6-96231b3b80d8
for the Cortex-A53 subtarget in the AArch64 backend.
This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.
Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203125 91177308-0d34-0410-b5e6-96231b3b80d8
for the Cortex-A53 subtarget in the AArch64 backend.
This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.
Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202767 91177308-0d34-0410-b5e6-96231b3b80d8
The table argument is always 128-bit (and interpreted as <16 x i8>) so the
extra specifier for it is just clutter.
No user-visible behaviour change, so no tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202258 91177308-0d34-0410-b5e6-96231b3b80d8
The va_start macro for AArch64 must set va_list.__stack to the address
following the last named argument on the stack, rounded up to an alignment
of 8 bytes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201797 91177308-0d34-0410-b5e6-96231b3b80d8
As v1i1 is illegal, the type legalizer tries to scalarize such node. But if the type operands of SETCC is legal, the scalarization algorithm will cause an assertion failure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201381 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201333 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201237 91177308-0d34-0410-b5e6-96231b3b80d8