llvm-6502/test/CodeGen
Chandler Carruth 4ea3097d08 [x86] Teach the vector shuffle lowering to make a more nuanced decision
between splitting a vector into 128-bit lanes and recombining them vs.
decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane
shuffles would be much more expensive to represent even though we end up
with a fast blend at the root. Instead, we can do a better job of
shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281
for AVX1. However, the bug persists in AVX2 because I've made this
change reasonably conservative. The cases where it makes sense in AVX2
to split into 128-bit lanes are much more rare because we can often do
full permutations across all elements of the 256-bit vector. However,
the particular test case in PR21281 is an example of one of the rare
cases where it is *always* better to work in a single 128-bit lane. I'm
going to try to teach the logic to detect and form the good code even in
AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would
craftily use vpermilps in AVX1 to shuffle both high and low halves at
the same time. We no longer pull that off, and not for any really good
reason. Ultimately, I think this is just another missing nuance to the
selection heuristic that I'll try to add in afterward, but this change
already seems strictly worth doing considering the magnitude of the
improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for
you.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 04:06:10 +00:00
..
AArch64 [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value. 2014-11-13 00:36:46 +00:00
ARM Add Forward Control-Flow Integrity. 2014-11-11 21:08:02 +00:00
CPP
Generic
Hexagon Handle ctor/init_array initialization. 2014-11-03 14:56:05 +00:00
Inputs
Mips [mips][micromips] Add predicate 'InMicroMips' at CodeGen patterns for microMIPS instructions 2014-11-12 13:30:10 +00:00
MSP430
NVPTX [NVPTX] Add NVPTXLowerStructArgs pass 2014-11-05 18:19:30 +00:00
PowerPC Revert part of the PIC tests (TLS part) 2014-11-12 16:50:15 +00:00
R600 R600/SI: Fix broken check prefixes in test 2014-11-08 00:02:57 +00:00
SPARC
SystemZ
Thumb Improve logic that decides if its profitable to commute when some of the virtual registers involved have uses/defs chains connecting them to physical register. Fix up the tests that this change improves. 2014-11-05 06:43:02 +00:00
Thumb2
X86 [x86] Teach the vector shuffle lowering to make a more nuanced decision 2014-11-13 04:06:10 +00:00
XCore