A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206573 91177308-0d34-0410-b5e6-96231b3b80d8
Tests will be committed shortly when all optimisations needed to
support AArch64's neon-copy.ll file are supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206571 91177308-0d34-0410-b5e6-96231b3b80d8
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.
Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206569 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves the performance of vector creation in caseiswhere where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.
rdar://16349427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206496 91177308-0d34-0410-b5e6-96231b3b80d8
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic less constraints to
satisfy and so sometimes allows it to pick better sequences.
Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206379 91177308-0d34-0410-b5e6-96231b3b80d8
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206288 91177308-0d34-0410-b5e6-96231b3b80d8
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206286 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds patterns to generate the cls instruction ARM64. Includes tests
for 64 bit and 32 bit operands.
rdar://15611957
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206079 91177308-0d34-0410-b5e6-96231b3b80d8
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).
This should fix PR19345, assuming there's only one issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205758 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.
It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.
It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.
This should also fix PR19331.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205616 91177308-0d34-0410-b5e6-96231b3b80d8
Switching between i32 and i64 based on the LHS type is a good idea in
theory, but pre-legalisation uses i64 regardless of our choice,
leading to potential ISel errors.
Should fix PR19294.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205519 91177308-0d34-0410-b5e6-96231b3b80d8
The previous attempt was fine with optimisations, but was actually rather
cavalier with its types. When compiled at -O0, it produced invalid COPY
MachineInstrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205422 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8