llvm-6502/test
Chandler Carruth fa68750e54 [x86] Unify the horizontal adding used for popcount lowering taking the
best approach of each.

For vNi16, we use the SHL + ADD + SRL pattern, which seems easily the best.
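
As a rough illustration of the SHL + ADD + SRL idea, here is a scalar C++ model of a single i16 lane. It assumes each byte of the lane already holds a per-byte popcount (0 to 8); the function name is hypothetical and this is only a sketch, not the lowering code from the patch.

```cpp
#include <cstdint>

// Hypothetical scalar model of one i16 lane. Assumes each of the two bytes
// in `lane` already holds a per-byte popcount in the range 0..8.
std::uint16_t horizontal_add_i16(std::uint16_t lane) {
    // SHL: move the low byte's count into the high byte position.
    std::uint16_t shl = static_cast<std::uint16_t>(lane << 8);
    // ADD: the high byte now holds hi + lo (at most 16, so no overflow).
    std::uint16_t add = static_cast<std::uint16_t>(lane + shl);
    // SRL: bring the sum down to the low byte and clear the high byte.
    return static_cast<std::uint16_t>(add >> 8);
}
```

In the vector form the same three steps would apply to every i16 lane at once.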

For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases
there is a huge improvement with this in IACA's estimated throughput --
over 2x higher throughput!!!! -- but the measurements are too good to be
true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks
slightly faster, but I'm not sure I believe any of the measurements at
this point. Both use exactly the same uops, though. It is hard to be
confident of anything past that.
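
For reference, the alternative SHL + ADD + SHL + ADD + SRL sequence can be modeled on one i32 lane as below. This is a scalar sketch under the assumption that each byte of the lane holds a per-byte popcount (0 to 8); the function name is hypothetical and is not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical scalar model of one i32 lane. Assumes each of the four bytes
// in `lane` already holds a per-byte popcount in the range 0..8.
std::uint32_t horizontal_add_i32(std::uint32_t lane) {
    std::uint32_t x = lane;
    x += x << 8;    // SHL + ADD: each byte now holds itself plus the byte below it
    x += x << 16;   // SHL + ADD: the top byte now holds the sum of all four bytes
    return x >> 24; // SRL: move the total (at most 32) down to the low byte
}
```

No byte can overflow here, since the largest partial sum is 32.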

If anyone wants to collect very detailed (Agner-level) timings with the
result of this patch, or with the i32 case replaced with SHL + ADD + SHL
+ ADD + SRL, I'd be very interested. Note that you'll need to test it on
both Ivybridge and Haswell, with each of SSE3, SSSE3, and AVX selected, as
I saw unique behavior in each of these buckets with IACA, all of which
should be checked against measured performance.

But this patch is still a useful improvement: it drops duplicate work
and gets the much nicer PSADBW lowering for v2i64.
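
To show why PSADBW suits v2i64, here is a scalar C++ model of what the instruction computes on one 64-bit half: the sum of absolute differences of eight byte pairs. With the second operand all zeros, that is simply the sum of the eight bytes, which is the horizontal add needed when each byte holds a per-byte popcount. The function name is hypothetical and this is a sketch, not the patch's code.

```cpp
#include <cstdint>

// Hypothetical scalar model of PSADBW on one 64-bit half of a vector:
// sums |a_i - b_i| over the eight bytes of `a` and `b`.
std::uint64_t psadbw_lane(std::uint64_t a, std::uint64_t b) {
    std::uint64_t sum = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t ab = static_cast<std::uint8_t>(a >> (8 * i));
        std::uint8_t bb = static_cast<std::uint8_t>(b >> (8 * i));
        sum += (ab > bb) ? (ab - bb) : (bb - ab); // absolute difference
    }
    return sum; // with b == 0 this is the sum of the bytes of a
}
```

Calling `psadbw_lane(counts, 0)` on a lane of per-byte popcounts yields the i64 popcount in a single step.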

I'd still like to rephrase this in terms of generic horizontal sum. It's
a bit lame to have a special case of that just for popcount.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@238652 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-30 10:35:03 +00:00
..
Analysis [DependenceAnalysis] Extend unifySubscriptType for handling coupled subscript groups. 2015-05-29 16:58:08 +00:00
Assembler
Bindings
Bitcode [BitcodeReader] Change an assert to a call to Error() 2015-05-30 00:17:20 +00:00
BugPoint
CodeGen [x86] Unify the horizontal adding used for popcount lowering taking the 2015-05-30 10:35:03 +00:00
DebugInfo Object, ELF: Use error code instead of calling report_fatal_error() 2015-05-28 20:25:42 +00:00
ExecutionEngine [Mips64] Add support for MCJIT for MIPS64r2 and MIPS64r6 2015-05-28 13:48:41 +00:00
Feature
FileCheck
Instrumentation [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko! 2015-05-28 07:51:49 +00:00
Integer
JitListener
Linker
LTO
MC Add support for VSX FMA single-precision instructions to the PPC back end 2015-05-29 17:13:25 +00:00
Object Stop inventing symbol sizes. 2015-05-22 15:43:00 +00:00
Other
SymbolRewriter
TableGen Use std::bitset for SubtargetFeatures. 2015-05-26 10:47:10 +00:00
tools [llvm-readobj/ELF] Teach how to decode DF_1_XXX flags 2015-05-25 19:12:18 +00:00
Transforms [IR] fptrunc-of-fptrunc isn't an EliminableCastPair. 2015-05-29 00:04:30 +00:00
Unit
Verifier
YAMLParser
.clang-format
CMakeLists.txt
lit.cfg
lit.site.cfg.in
Makefile
Makefile.tests
TestRunner.sh