We were swapping the true & false results while testing for FMAX/FMIN,
but not putting them back to the original state if the later checks
failed.
Should fix PR19700.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208469 91177308-0d34-0410-b5e6-96231b3b80d8
this patch disables the dead register elimination pass and the load/store pair
optimization pass at -O0. The ILP optimizations don't require the optimization
level to be checked because the call to addILPOpts is predicated with the
necessary check. The AdvSIMDScalar pass is disabled by default at all
optimization levels. This patch leaves that pass disabled by default.
Also, move command-line options into ARM64TargetMachine.cpp and add a few
additional flags to aid in debugging. This fixes an issue with the
-debug-pass=Structure flag where passes were printed, but not actually run
(i.e., AdvSIMDScalar pass).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208223 91177308-0d34-0410-b5e6-96231b3b80d8
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208210 91177308-0d34-0410-b5e6-96231b3b80d8
The AAPCS states that values passed in registers must have a value as though
they had been loaded with "LDR". LDR is equivalent to "LD1.64 vX.1D" - that is,
loading scalars to vector registers and loading 1-element vectors is equivalent.
The logic implemented here is to ensure that at all call boundaries and during
formal argument lowering all vectors are treated as their bitwidth-based floating
point scalar counterpart, which is always one of f64 or f128 (v2i32 -> f64,
v4i32 -> f128 etc). A BITCAST is inserted so that the appropriate REV will be
generated during code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208198 91177308-0d34-0410-b5e6-96231b3b80d8
Because we've canonicalised on using LD1/ST1, every time we do a bitcast
between vector types we must do an equivalent lane reversal.
Consider a simple memory load followed by a bitconvert then a store.
v0 = load v2i32
v1 = BITCAST v2i32 v0 to v4i16
store v4i16 v2
In big endian mode every memory access has an implicit byte swap. LDR and
STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that
is, they treat the vector as a sequence of elements to be byte-swapped.
The two pairs of instructions are fundamentally incompatible. We've decided
to use LD1/ST1 only to simplify compiler implementation.
LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV. This makes
the original code sequence: v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = BITCAST v2i32 v1 to v4i16
v3 = REV v4i16 v2 (implicit)
store v4i16 v3
But this is now broken - the value stored is different to the value loaded
due to lane reordering. To fix this, on every BITCAST we must perform two
other REVs:
v0 = load v2i32
v1 = REV v2i32 (implicit)
v2 = REV v2i32
v3 = BITCAST v2i32 v2 to v4i16
v4 = REV v4i16
v5 = REV v4i16 v4 (implicit)
store v4i16 v5
This means an extra two instructions, but actually in most cases the two REV
instructions can be combined into one. For example:
(REV64_2s (REV64_4h X)) === (REV32_4h X)
There is also no 128-bit REV instruction. This must be synthesized with an
EXT instruction.
Most bitconverts require some sort of conversion. The only exceptions are:
a) Identity conversions - vNfX <-> vNiX
b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX
Even though there are hundreds of changed lines, I have a fairly high confidence
that they are somewhat correct. The changes to add two REV instructions per
bitcast were pretty mechanical, and once I'd done that I threw the resulting
.td at a script I wrote which combined the two REVs together (and added
an EXT instruction, for f128) based on an instruction description I gave it.
This was much less prone to error than doing it all manually, plus my brain
would not just have melted but would have vapourised.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208194 91177308-0d34-0410-b5e6-96231b3b80d8
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208192 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207838 91177308-0d34-0410-b5e6-96231b3b80d8
The canonical form of the BFM instruction is always one of the more explicit
extract or insert operations, which makes reading output much easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207752 91177308-0d34-0410-b5e6-96231b3b80d8
For pattern like ((x >> C1) & Mask) << C2, DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
%shr = lshr i64 %x, 4
%and = and i64 %shr, 15
%arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, %i64 2, i64 %and
%0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
lsr x8, x0, #1
and x8, x8, #0x78
add x8, x9, x8
If using ubfx, it only needs 2 instrs:
ubfx x8, x0, #4, #4
add x8, x9, x8, lsl #3
This fixes bug 19589
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207702 91177308-0d34-0410-b5e6-96231b3b80d8
The canonical syntax for shifts by a variable amount does not end with 'v', but
that syntax should be supported as an alias (presumably for legacy reasons).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207649 91177308-0d34-0410-b5e6-96231b3b80d8
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207644 91177308-0d34-0410-b5e6-96231b3b80d8
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207635 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they
accept weird shifts which are more naturally understandable in hex notation).
Also changes BRK/HINT etc, which is probably a neutral change, but easier than
the alternative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207634 91177308-0d34-0410-b5e6-96231b3b80d8
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.
Also, print a comment giving the full addend since it can be helpful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207633 91177308-0d34-0410-b5e6-96231b3b80d8
There's no need for local symbols to go through the GOT, in fact it seems GNU ld is not even emitting GOT entries for local symbols and will error out when trying to resolve a GOT relocation for a local symbol.
This bug triggers when bootstrapping clang on AArch64 Linux with -fPIC and the ARM64 backend. The AArch64 backend is not affected.
With this commit it's now possible to bootstrap clang on AArch64 Linux with the ARM64 backend (-fPIC, -O3).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207226 91177308-0d34-0410-b5e6-96231b3b80d8
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.
This should address PR19424.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207102 91177308-0d34-0410-b5e6-96231b3b80d8
ANDS does not use the same encoding scheme as other xxxS instructions (e.g.,
ADDS). Take that into account to avoid wrong peephole optimization.
<rdar://problem/16693089>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207020 91177308-0d34-0410-b5e6-96231b3b80d8
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206949 91177308-0d34-0410-b5e6-96231b3b80d8
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.
This fixes PR19503.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206834 91177308-0d34-0410-b5e6-96231b3b80d8