checking each standalone condition and decide whether emit target
specific nodes or remove the condition if it's already matched before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113031 91177308-0d34-0410-b5e6-96231b3b80d8
"Use target specific nodes instead of relying in unpckl and
unpckh pattern fragments during isel time. Also place a
depth limit in getShuffleScalarElt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113020 91177308-0d34-0410-b5e6-96231b3b80d8
mask pattern fragment", which depends on r112934, which introduced some infinite
loop and select failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112998 91177308-0d34-0410-b5e6-96231b3b80d8
- Teach getShuffleScalarElt how to handle more target
specific nodes, so the DAGCombine can make use of it.
- Add another hack to avoid the node update problem
during legalization. More description on the comments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112934 91177308-0d34-0410-b5e6-96231b3b80d8
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.
This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112861 91177308-0d34-0410-b5e6-96231b3b80d8
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:
movaps (%rdi), %xmm0
movaps (%rax), %xmm1
movaps %xmm0, %xmm2
movss %xmm1, %xmm2
shufps $36, %xmm2, %xmm0
now is generated as:
movaps (%rdi), %xmm0
movaps %xmm0, %xmm1
movlps (%rax), %xmm1
shufps $36, %xmm1, %xmm0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112753 91177308-0d34-0410-b5e6-96231b3b80d8
times. This patch causes llc and llvm-mc (which both default to
verbose-asm) to print out comments after a few common shuffle
instructions which indicates the shuffle mask, e.g.:
insertps $113, %xmm3, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm3[1]
unpcklps %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
pshufd $1, %xmm1, %xmm1 ## xmm1 = xmm1[1,0,0,0]
This is carefully factored to keep the information extraction (of the
shuffle mask) separate from the printing logic. I plan to move the
extraction part out somewhere else at some point for other parts of
the x86 backend that want to introspect on the behavior of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112387 91177308-0d34-0410-b5e6-96231b3b80d8
when the top elements of a vector are undefined. This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined. For example, on:
_Complex float f32(_Complex float A, _Complex float B) {
return A+B;
}
We used to produce (with SSE2, SSE4.1+ uses insertps):
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $16, %xmm2, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm1
movdqa %xmm2, %xmm0
unpcklps %xmm1, %xmm0
ret
We now produce:
_f32: ## @f32
movdqa %xmm0, %xmm2
addss %xmm1, %xmm2
pshufd $1, %xmm1, %xmm1
pshufd $1, %xmm0, %xmm3
addss %xmm1, %xmm3
movaps %xmm2, %xmm0
unpcklps %xmm3, %xmm0
ret
This implements rdar://8368414
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112378 91177308-0d34-0410-b5e6-96231b3b80d8
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112377 91177308-0d34-0410-b5e6-96231b3b80d8
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112348 91177308-0d34-0410-b5e6-96231b3b80d8