This patch fixes the multiple breakages on ARM test-suite after the SLP
vectorizer was introduced by default on O3. The problem was an illegal
vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple.
The guard protects this code from breaking (cause of the problems) but
doesn't fix the issue that is generating the odd vector in the first
place, which also needs to be investigated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187658 91177308-0d34-0410-b5e6-96231b3b80d8
Function attributes are the future! So just query whether we want to realign the
stack directly from the function instead of through a random target options
structure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187618 91177308-0d34-0410-b5e6-96231b3b80d8
While the .td entry is nice and all, it takes a pretty gross hack in
ARMAsmParser::ParseInstruction() because of handling of other "subs"
instructions to get it to match. Ran it by Jim Grosbach and he said it was
about what he expected to make this work given the existing code.
rdar://14214063
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187530 91177308-0d34-0410-b5e6-96231b3b80d8
When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the
bitwidth of the second operands to both ands match before comparing the negation
of the values.
Split the check of the value of the second operands to the ands. Move the cast
and variable declaration slightly higher to make it slightly easier to follow.
Bug-Id: 16700
Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187404 91177308-0d34-0410-b5e6-96231b3b80d8
do in the SDag when lowering references to the GOT: use
ARMConstantPoolSymbol rather than creating a dummy global variable. The
computation of the alignment still feels weird (it uses IR types and
datalayout) but it preserves the exact previous behavior. This change
fixes the memory leak of the global variable detected on the valgrind
leak checking bot.
Thanks to Benjamin Kramer for pointing me at ARMConstantPoolSymbol to
handle this use case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187303 91177308-0d34-0410-b5e6-96231b3b80d8
me) should start watching this bot more as its catching lots of bugs.
The fix here is to not construct the global if we aren't going to need
it. That's cheaper anyways, and globals have highly predictable types in
practice. I've added an assert to catch skew between our manual testing
of the type and the actual type just for paranoia's sake.
Note that this pattern is actually fine in most globals because when you
build a global with a module it automatically is moved to be owned by
that module. But here, we're in isel and don't really want to do that.
The solution of not creating a global is simpler anyways.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187302 91177308-0d34-0410-b5e6-96231b3b80d8
When vectors are built from a single value, the ARM lowering issues a
scalar_to_vector node.
This node is then always morphed into a move from the general purpose unit to
the vector unit.
When the value comes from a load, this can be simplified into a vector load to
the right lane.
This patch changes the lowering of insert_vector_elt to expose a vector
friendly pattern in this situation.
This is a step toward fixing <rdar://problem/14170854>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186999 91177308-0d34-0410-b5e6-96231b3b80d8
instructions. With this patch:
1. ldr.n is recognized as mnemonic for the short encoding
2. ldr.w is recognized as menmonic for the long encoding
3. ldr will map to either short or long encodings depending on the size of the offset
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186831 91177308-0d34-0410-b5e6-96231b3b80d8
After Ulrich's r180677 (thanks!) TableGen is intelligent enough to
handle tied constraints involving complex operands properly, so
virtually all of the ARM custom converters are now unnecessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186810 91177308-0d34-0410-b5e6-96231b3b80d8
indirect branches correctly. Under some circumstances, this led to the deletion
of basic blocks that were the destination of indirect branches. In that case it
left indirect branches to nowhere in the code.
This patch replaces, and is more general than either of the previous fixes for
indirect-branch-analysis issues, r181161 and r186461.
For other branches (not indirect) this refactor should have *almost* identical
behavior to the previous version. There are some corner cases where this
refactor is able to analyze blocks that the previous version could not (e.g.
this necessitated the update to thumb2-ifcvt2.ll).
<rdar://problem/14464830>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186735 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a new class for non-predicable NEON instructions and a
new DecoderNamespace for v8 NEON instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186504 91177308-0d34-0410-b5e6-96231b3b80d8
My patch 'r183551 - ARM FastISel integer sext/zext improvements' was incorrect when emitting ARM register-immediate ASR, LSL, LSR instructions: they are pseudo-instructions in ARMInstrInfo.td and I should have used MOVsi instead.
This is not an issue when code is generated through a .s file, but is an issue when generated straight to a .o (-filetype=obj).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186489 91177308-0d34-0410-b5e6-96231b3b80d8
block. Blocks that have an indirect branch terminator, even if it's not the
last terminator, should still be treated as unanalyzable.
<rdar://problem/14437274>
Reducing a useful regression test case is proving difficult - I hope to have
one soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186461 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an instruction alias to make the assembler recognize the alternate literal form: pli [PC, #+/-<imm>]
See A8.8.129 in the ARM ARM (DDI 0406C.b).
Fixes <rdar://problem/14403733>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186459 91177308-0d34-0410-b5e6-96231b3b80d8
We'd forgotten to provide string representations for the special ARMISD atomic
nodes; this adds them in. No effect on CodeGen, just makes the output of
"-view-whatever-dags" slightly more readable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186406 91177308-0d34-0410-b5e6-96231b3b80d8
Intrinsics already existed for the 64-bit variants, so these support operations
of size at most 32-bits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186392 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables calls to __aeabi_idivmod when in EABI mode,
by using the remainder value returned on registers (R1),
enabled by the ARM triple "none-eabi". Note that Darwin and
GNUEABI triples will continue lowering on GNU style, that is,
using the stack for the remainder.
Still need to add SREM/UREM support fix for 64-bit lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186390 91177308-0d34-0410-b5e6-96231b3b80d8
ARM paired GPR COPY was being lowered to two MOVr without CC. This
patch puts the CC back.
My test is a reduction of the case where I encountered the issue,
64-bit atomics use paired GPRs.
The issue only occurs with selectionDAG, FastISel doesn't encounter it
so I didn't bother calling it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186226 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.
radar://14351991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186188 91177308-0d34-0410-b5e6-96231b3b80d8
Address calculation for gather/scather in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.
radar://14351991
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186187 91177308-0d34-0410-b5e6-96231b3b80d8
Currently ARM is the only backend that supports FMA instructions (for at least some subtargets) but does not implement this virtual, so FMAs are never generated except from explicit fma intrinsic calls. Apparently this is due to the fact that it supports both fused (one rounding step) and unfused (two rounding step) multiply + add instructions. This patch clarifies that this the case without changing behavior by implementing the virtual function to simply return false, as the default TargetLoweringBase version does.
It is possible that some cpus perform the fused version faster than the unfused version and vice-versa, so the function implementation should be revisited if hard data is found.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185994 91177308-0d34-0410-b5e6-96231b3b80d8
Propagate the fix from r185712 to Thumb2 codegen as well. Original
commit message applies here as well:
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and
packs them in the bottom half of "x". An arithmetic and logic shift are
only equivalent in this context if the shift amount is 16. We would be
shifting in ones into the bottom 16bits instead of zeros if "y" is
negative.
rdar://14338767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185982 91177308-0d34-0410-b5e6-96231b3b80d8
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.
radar://14338767
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185712 91177308-0d34-0410-b5e6-96231b3b80d8
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.
In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185688 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a new decoder table/namespace 'VFPV8', as these instructions have their
top 4 bits as 0b1111, while other Thumb instructions have 0b1110.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185642 91177308-0d34-0410-b5e6-96231b3b80d8