POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214881 91177308-0d34-0410-b5e6-96231b3b80d8
It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of
the global symbol being used as an address in the addressing mode, but
this avoids the dependence on hardcoded set labels that keep changing
(5+ commits over the last few years that each update the set label as it
changes due to other, unrelated differences in output). This could've,
instead, been changed to match the set name then match the name in the
string pool but that would present other issues (needing to skip over
the sets that weren't of interest, etc) and checking that the addresses
(granted, without relocations applied - so it's not the whole story)
match in the two variable location descriptions seems sufficient and
fairly stable here.
There are a few similar other tests with similar label dependence that
I'll update soonish.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214878 91177308-0d34-0410-b5e6-96231b3b80d8
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214865 91177308-0d34-0410-b5e6-96231b3b80d8
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214859 91177308-0d34-0410-b5e6-96231b3b80d8
found by a single test reduced out of a failure on llvm-stress.
The start of the problem (and the crash) came when we tried to use
a find of a non-used slot in the move-to half of the move-mask as the
target for two bad-half inputs. While if lucky this will be the first of
a pair of slots which we can place the bad-half inputs into, it isn't
actually guaranteed. This really isn't surprising, not sure what I was
thinking. The correct way to find the two unused slots is to look for
one of the *used* slots. We know it isn't that pair, and we can use some
modular arithmetic to find the other pair by masking off the odd bit and
adding 2 modulo 4. With this, we reliably found a viable pair of slots
for the bad-half inputs.
Sadly, that wasn't enough. We also had a wrong code bug that surfaced
when I reduced the test case for this where we would use the same slot
twice for the two bad inputs. This is because both of the bad inputs
could be in odd slots originally and thus the mod-2 mapping would
actually be the same. The whole point of the weird indexing into the
pair of empty slots was to try to leverage when the end result needed
the two bad-half inputs to be paired in a dword and pre-pair them in the
correct orrientation. This is less important with the powerful combining
we're now doing, and also easier and more reliable to achieve be noting
that we add the bad-half inputs in order. Thus, if they are in a dword
pair, the low part of that will be the first input in the sequence.
Always putting that in the low element will just do the right thing in
addition to computing the correct result.
Test case added. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214849 91177308-0d34-0410-b5e6-96231b3b80d8
This implements basic argument lowering for AArch64 in FastISel. It only
handles a small subset of the C calling convention. It supports simple
arguments that can be passed in GPR and FPR registers.
This should cover most of the trivial cases without falling back to
SelectionDAG.
This fixes <rdar://problem/17890986>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214846 91177308-0d34-0410-b5e6-96231b3b80d8
It broke compiling of most Benchmark and internal test, as clang got
clashed by segmentation fault or assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214845 91177308-0d34-0410-b5e6-96231b3b80d8
sequence on AArch64
Re-commit of r214669 without changes to test cases
LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
LLVM:: CodeGen/AArch64/dp-3source.ll
This resolves the reported compfails of the original commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214832 91177308-0d34-0410-b5e6-96231b3b80d8
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214800 91177308-0d34-0410-b5e6-96231b3b80d8
Duplicate the vararg tests for linux and add a tests which mixed
vararg arguments with darwin positional parameters.
Patch by: Janne Grunau <j@jannau.net>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214799 91177308-0d34-0410-b5e6-96231b3b80d8
This fix changes the parameters #r and #s that are passed to the UBFM/SBFM
instruction to get the zero/sign-extension for free.
The original problem was that the shift left would use the 32-bit shift even for
i8/i16 value types, which could leave the upper bits set with "garbage" values.
The arithmetic shift right on the other side would use the wrong MSB as sign-bit
to determine what bits to shift into the value.
This fixes <rdar://problem/17907720>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214788 91177308-0d34-0410-b5e6-96231b3b80d8
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.
It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214785 91177308-0d34-0410-b5e6-96231b3b80d8
scalar integer instruction pass.
This is a patch I had lying around from a few months ago. The pass is
currently disabled by default, so nothing to interesting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214779 91177308-0d34-0410-b5e6-96231b3b80d8
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range. This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.
Differential Revision: http://reviews.llvm.org/D4751
Patch by Vadim Chugunov!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214775 91177308-0d34-0410-b5e6-96231b3b80d8
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214718 91177308-0d34-0410-b5e6-96231b3b80d8
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214716 91177308-0d34-0410-b5e6-96231b3b80d8
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
Reviewed by Bill Schmidt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214714 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch also fixes an issue with the way the Mips assembler enables/disables architecture
features. Before this patch, the assembler never disabled feature bits. For example,
.set mips64
.set mips32r2
would result in the 'OR' of mips64 with mips32r2 feature bits which isn't right.
Unfortunately this isn't trivial to fix because there's not an easy way to clear
feature bits as the algorithm in MCSubtargetInfo (ToggleFeature) only clears the bits
that imply the feature being cleared and not the implied bits by the feature (there's a
better explanation to the code I added).
Patch by Matheus Almeida and updated by Toma Tabacu
Reviewers: vmedic, matheusalmeida, dsanders
Reviewed By: dsanders
Subscribers: tomatabacu, llvm-commits
Differential Revision: http://reviews.llvm.org/D4123
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214709 91177308-0d34-0410-b5e6-96231b3b80d8
use of PACKUS. It's cleaner that way.
I looked at implementing clever combine-based folding of PACKUS chains
into PSHUFB but it is quite hard and doesn't seem likely to be worth it.
The most annoying part would be detecting that the correct masking had
been done to use PACKUS-style instructions as a blend operation rather
than there being any saturating as is indicated by its name. We generate
really nice code for what few test cases I've come up with that aren't
completely contrived for this by just directly prefering PSHUFB and so
let's go with that strategy for now. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214707 91177308-0d34-0410-b5e6-96231b3b80d8
patterns of v16i8 shuffles.
This implements one of the more important FIXMEs for the SSE2 support in
the new shuffle lowering. We now generate the optimal shuffle sequence
for truncate-derived shuffles which show up essentially everywhere.
Unfortunately, this exposes a weakness in other parts of the shuffle
logic -- we can no longer form PSHUFB here. I'll add the necessary
support for that and other things in a subsequent commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214702 91177308-0d34-0410-b5e6-96231b3b80d8
I spent some time looking into a better or more principled way to handle
this. For example, by detecting arbitrary "unneeded" ORs... But really,
there wasn't any point. We just shouldn't build blatantly wrong code so
late in the pipeline rather than adding more stages and logic later on
to fix it. Avoiding this is just too simple.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214680 91177308-0d34-0410-b5e6-96231b3b80d8
Fundamentally, there isn't a really portable way to test the constant
pool contents. Instead, pin this test to the bare-metal triple. This
also makes it a 64-bit triple which allows us to only match a single
constant pool rather than two. It can also just hard code the '.' prefix
as the format should be stable now that it has a fixed triple. Finally,
I've switched it to use CHECK-NEXT to be more precise in the instruction
sequence expected and to use variables rather than hard coding decisions
by the register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214679 91177308-0d34-0410-b5e6-96231b3b80d8
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214670 91177308-0d34-0410-b5e6-96231b3b80d8
sequence - AArch64 target support
This patch turns off madd/msub generation in the DAGCombiner and generates
them in the MachineCombiner instead. It replaces the original code sequence
with the combined sequence when it is beneficial to do so.
When there is no machine model support it always generates the madd/msub
instruction. This is true also when the objective is to optimize for code
size: when the combined sequence is shorter is always chosen and does not
get evaluated.
When there is a machine model the combined instruction sequence
is evaluated for critical path and resource length using machine
trace metrics and the original code sequence is replaced when it is
determined to be faster.
rdar://16319955
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214669 91177308-0d34-0410-b5e6-96231b3b80d8