llvm-6502/test/CodeGen
Andrea Di Biagio 54529ed1c4 [X86] Avoid introducing extra shuffles when lowering packed vector shifts.
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, it introduces an extra DAG node to
extract the splat value from the shuffle. The splat value is then used
to generate the shift count of a target-specific shift.

However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.

Example:

;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
  %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  %shl = shl <4 x i32> %a, %c
  ret <4 x i32> %shl
}
;;

Before this patch, llc generated the following code (-mattr=+avx):
  vpshufd $0, %xmm1, %xmm1   # xmm1 = xmm1[0,0,0,0]
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

With this patch, the redundant splat operation is removed from the code:
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq
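
The splat-index idea described above can be modeled outside of LLVM. The following is a minimal stand-alone C++17 sketch, not the code from this patch: the names `Vec`, `getSplatIndex` and `shlBySplat` are hypothetical and do not correspond to LLVM APIs. It assumes that negative mask entries stand for undef lanes and that a shift count of 32 or more yields zero, which matches the behavior of the x86 packed shift instructions rather than the IR-level `shl`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model types and helpers; none of these are LLVM APIs.
using Vec = std::vector<std::uint32_t>;

// Returns the splat index 'I' if every defined mask element selects the same
// lane. Negative entries model undef lanes (an assumption of this sketch).
// Returns nullopt for a non-splat mask or a mask that is entirely undef.
std::optional<int> getSplatIndex(const std::vector<int> &mask) {
  std::optional<int> idx;
  for (int m : mask) {
    if (m < 0)
      continue;            // undef lane: compatible with any splat index
    if (idx && *idx != m)
      return std::nullopt; // two different lanes selected: not a splat
    idx = m;
  }
  return idx;
}

// Shifts every lane of 'a' left by a single scalar count. The count is read
// directly from lane I of the first shuffle operand 'b', so the splat shuffle
// itself is never materialized.
std::optional<Vec> shlBySplat(const Vec &a, const Vec &b,
                              const std::vector<int> &mask) {
  assert(a.size() == mask.size() && "one mask entry per result lane");
  std::optional<int> idx = getSplatIndex(mask);
  if (!idx || static_cast<std::size_t>(*idx) >= b.size())
    return std::nullopt;   // not a splat of the first operand
  const std::uint32_t count = b[static_cast<std::size_t>(*idx)];
  Vec out;
  out.reserve(a.size());
  for (std::uint32_t x : a)
    out.push_back(count < 32 ? x << count : 0u); // guard against UB in C++
  return out;
}
```

For the example above, the mask is `{0, 0, 0, 0}`, so the splat index is 0 and the shift count is taken from lane 0 of `%b`; a mask such as `{0, 1, 0, 0}` is not a splat and the sketch returns no result.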


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 12:13:30 +00:00
AArch64 [AArch64] Combining Load and IntToFp should check for neon availability 2014-12-04 20:25:50 +00:00
ARM Add missing FP build attribute tests. 2014-12-05 08:22:47 +00:00
CPP
Generic
Hexagon
Inputs
Mips [mips] Fix passing of small structures for big-endian O32. 2014-12-02 20:40:27 +00:00
MSP430
NVPTX [NVPTX] Do not emit .weak symbols for NVPTX 2014-12-01 21:16:17 +00:00
PowerPC Revert "r223440 - Consider subregs when calling MI::registerDefIsDead for phys deps" 2014-12-05 02:07:35 +00:00
R600 R600/SI: Remove i1 pseudo VALU ops 2014-12-03 05:22:35 +00:00
SPARC
SystemZ
Thumb Re-add support to llvm-objdump for Mach-O universal files and archives with -macho 2014-12-04 23:56:27 +00:00
Thumb2 ARM: allow constpool entry to be moved to the user's block in all cases. 2014-11-13 17:58:53 +00:00
X86 [X86] Avoid introducing extra shuffles when lowering packed vector shifts. 2014-12-05 12:13:30 +00:00
XCore