LLVM backend for 6502
Go to file
Chandler Carruth 4ea3097d08 [x86] Teach the vector shuffle lowering to make a more nuanced decision
between splitting a vector into 128-bit lanes and recombining them vs.
decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane
shuffles would be much more expensive to represent even though we end up
with a fast blend at the root. Instead, we can do a better job of
shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281
for AVX1. However, the bug persists in AVX2 because I've made this
change reasonably conservative. The cases where it makes sense in AVX2
to split into 128-bit lanes are much more rare because we can often do
full permutations across all elements of the 256-bit vector. However,
the particular test case in PR21281 is an example of one of the rare
cases where it is *always* better to work in a single 128-bit lane. I'm
going to try to teach the logic to detect and form the good code even in
AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would
craftily use vpermilps in AVX1 to shuffle both high and low halves at
the same time. We no longer pull that off, and not for any really good
reason. Ultimately, I think this is just another missing nuance to the
selection heuristic that I'll try to add in afterward, but this change
already seems strictly worth doing considering the magnitude of the
improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for
you.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 04:06:10 +00:00
autoconf
bindings
cmake
docs
examples
include llvm-readobj: Print out address table when dumping COFF delay-import table 2014-11-13 03:22:54 +00:00
lib [x86] Teach the vector shuffle lowering to make a more nuanced decision 2014-11-13 04:06:10 +00:00
projects
test [x86] Teach the vector shuffle lowering to make a more nuanced decision 2014-11-13 04:06:10 +00:00
tools llvm-readobj: Print out address table when dumping COFF delay-import table 2014-11-13 03:22:54 +00:00
unittests Drop a few unneeded ctor calls (missed code review comment). 2014-11-13 00:36:34 +00:00
utils Make TreePattern::error use Twine 2014-11-11 23:48:11 +00:00
.arcconfig
.clang-format
.clang-tidy
.gitignore
CMakeLists.txt Pass PRIVATE to target_link_libraries if using shared libraries. 2014-11-07 15:33:56 +00:00
CODE_OWNERS.TXT Add Tom Stellard's role as 3.5 release manager. 2014-09-12 08:07:31 +00:00
configure
CREDITS.TXT Rise from the dead and update personal info 2014-08-25 17:51:04 +00:00
LICENSE.TXT Remove projects/sample. 2014-03-12 22:40:22 +00:00
llvm.spec.in
LLVMBuild.txt Remove the very substantial, largely unmaintained legacy PGO 2013-10-02 15:42:23 +00:00
Makefile
Makefile.common
Makefile.config.in
Makefile.rules
README.txt

Low Level Virtual Machine (LLVM)
================================

This directory and its subdirectories contain source code for the Low Level
Virtual Machine, a toolkit for the construction of highly optimized compilers,
optimizers, and runtime environments.

LLVM is open source software. You may freely distribute it under the terms of
the license agreement found in LICENSE.txt.

Please see the documentation provided in docs/ for further
assistance with LLVM, and in particular docs/GettingStarted.rst for getting
started with LLVM and docs/README.txt for an overview of LLVM's
documentation setup.

If you're writing a package for LLVM, see docs/Packaging.rst for our
suggestions.