SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138317 91177308-0d34-0410-b5e6-96231b3b80d8
instead of 2. They were already defined this way in their regular
version, but not for the intrinsics versions (*_Int), and that would work
for assembly emission but not for object code, since a MachineOperand
would be missing. This commit fix PR10697.
Also removed the {VSQRT,VRSQRT,VRCP}r_Int forms and match the intrinsic
via INSERT_SUBREG+EXTRACT_SUBREG patterns. The same couldn't be done for
memory versions because sse_load_f32/sse_load_f64 operand need special
handling and don't work like regular "addr" operands.
There are right now 114 "*_Int" and 98 "Int_*" forms! I'm slowly
removing them as I step through, but hope we can get rid of these
someday, they are really annoying :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138012 91177308-0d34-0410-b5e6-96231b3b80d8
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137810 91177308-0d34-0410-b5e6-96231b3b80d8
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137807 91177308-0d34-0410-b5e6-96231b3b80d8
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:
For this shuffle:
shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
<i32 1, i32 0, i32 7, i32 6>
This was expanded to:
vextractf128 $1, %ymm1, %xmm2
vpextrq $0, %xmm2, %rax
vmovd %rax, %xmm1
vpextrq $1, %xmm2, %rax
vmovd %rax, %xmm2
vpunpcklqdq %xmm1, %xmm2, %xmm1
vpextrq $0, %xmm0, %rax
vmovd %rax, %xmm2
vpextrq $1, %xmm0, %rax
vmovd %rax, %xmm0
vpunpcklqdq %xmm2, %xmm0, %xmm0
vinsertf128 $1, %xmm1, %ymm0, %ymm0
ret
Now we get:
vshufpd $1, %xmm0, %xmm0, %xmm0
vextractf128 $1, %ymm1, %xmm1
vshufpd $1, %xmm1, %xmm1, %xmm1
vinsertf128 $1, %xmm1, %ymm0, %ymm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137733 91177308-0d34-0410-b5e6-96231b3b80d8
Allow a target assembly parser to do context sensitive constraint checking
on a potential instruction match. This will be used, for example, to handle
Thumb2 IT block parsing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137675 91177308-0d34-0410-b5e6-96231b3b80d8
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137519 91177308-0d34-0410-b5e6-96231b3b80d8
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137362 91177308-0d34-0410-b5e6-96231b3b80d8
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137308 91177308-0d34-0410-b5e6-96231b3b80d8
data in-register prior to saving to memory. When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137238 91177308-0d34-0410-b5e6-96231b3b80d8
def : Pat<(X86Movss VR128:$src1,
(bc_v4i32 (v2i64 (load addr:$src2)))),
(MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137227 91177308-0d34-0410-b5e6-96231b3b80d8