match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137810 91177308-0d34-0410-b5e6-96231b3b80d8
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137807 91177308-0d34-0410-b5e6-96231b3b80d8
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:
For this shuffle:
shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
<i32 1, i32 0, i32 7, i32 6>
This was expanded to:
vextractf128 $1, %ymm1, %xmm2
vpextrq $0, %xmm2, %rax
vmovd %rax, %xmm1
vpextrq $1, %xmm2, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1
vpextrq $0, %xmm0, %rax
vmovd %rax, %xmm2
vpextrq $1, %xmm0, %rax
vmovd %rax, %xmm0
vpunpcklqdq %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
ret
Now we get:
vshufpd $1, %xmm0, %xmm0, %xmm0
vextractf128 $1, %ymm1, %xmm1
vshufpd $1, %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137733 91177308-0d34-0410-b5e6-96231b3b80d8
Mips1 does not support double precision loads or stores, therefore two single
precision loads or stores must be used in place of these instructions. This
patch treats double precision loads and stores as if they are legal
instructions until MCInstLowering, instead of generating the single precision
instructions during instruction selection or Prolog/Epilog code insertion.
Without the changes made in this patch, llc produces code that has the same
problem described in r137484 or bails out when
MipsInstrInfo::storeRegToStackSlot or loadRegFromStackSlot is called before
register allocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137711 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed. Our register allocator must be awesome!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137551 91177308-0d34-0410-b5e6-96231b3b80d8
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137519 91177308-0d34-0410-b5e6-96231b3b80d8
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137362 91177308-0d34-0410-b5e6-96231b3b80d8
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137361 91177308-0d34-0410-b5e6-96231b3b80d8
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137308 91177308-0d34-0410-b5e6-96231b3b80d8
data in-register prior to saving to memory. When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137238 91177308-0d34-0410-b5e6-96231b3b80d8
def : Pat<(X86Movss VR128:$src1,
(bc_v4i32 (v2i64 (load addr:$src2)))),
(MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137227 91177308-0d34-0410-b5e6-96231b3b80d8
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
less instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137133 91177308-0d34-0410-b5e6-96231b3b80d8
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints. It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.
When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.
This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.
This fixes PR10602.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137050 91177308-0d34-0410-b5e6-96231b3b80d8
externally visable, create a local symbol to use in the CFE. If not, use the
function label itself.
Fixes PR10420.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136716 91177308-0d34-0410-b5e6-96231b3b80d8
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136642 91177308-0d34-0410-b5e6-96231b3b80d8
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.
While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136541 91177308-0d34-0410-b5e6-96231b3b80d8
Later passes /are/ using this information when running the register
scavenger.
This fixes the second problem in PR10520.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136440 91177308-0d34-0410-b5e6-96231b3b80d8
This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.
The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136439 91177308-0d34-0410-b5e6-96231b3b80d8
Code like that would only be produced by bugpoint, but we should still
handle it correctly.
When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.
Fixes part of PR10520.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136401 91177308-0d34-0410-b5e6-96231b3b80d8
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the later both lanes have independent
masks. Handle this properly and and add support for vpermilpd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136200 91177308-0d34-0410-b5e6-96231b3b80d8
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.
This fixes PR10503.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136174 91177308-0d34-0410-b5e6-96231b3b80d8
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.
This code is generated for __builtin_clz and friends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136167 91177308-0d34-0410-b5e6-96231b3b80d8
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136157 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoid emiting on more instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136002 91177308-0d34-0410-b5e6-96231b3b80d8
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@135980 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.
Fix this by always rewriting <undef> operands with the register a def
operand would use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@135885 91177308-0d34-0410-b5e6-96231b3b80d8