Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.
Differential Revision: http://reviews.llvm.org/D4292
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211892 91177308-0d34-0410-b5e6-96231b3b80d8
half-shuffles, even looking through intervening instructions in a chain.
Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.
Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.
Differential Revision: http://reviews.llvm.org/D4291
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211890 91177308-0d34-0410-b5e6-96231b3b80d8
trivially redundant.
This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.
I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.
Differential Revision: http://reviews.llvm.org/D4240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211889 91177308-0d34-0410-b5e6-96231b3b80d8
x86 backend.
This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.
Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
There are cases where a previous shuffle could be modified slightly to
arrange for elements to be in the correct position and a later shuffle
eliminated. Doing this eagerly added quite a bit of complexity, and
so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
added. This is essential for real world shuffles as it turns out...
Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent
of course.
Differential Revision: http://reviews.llvm.org/D4225
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211888 91177308-0d34-0410-b5e6-96231b3b80d8
SystemZRegisterInfo and replace it with the subtarget as that's
all they needed in the first place. Update all uses and calls
accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211877 91177308-0d34-0410-b5e6-96231b3b80d8
both MSP430InstrInfo and MSP430RegisterInfo. Remove unused member
variable StackAlign from MSP430RegisterInfo. Update constructors
accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211835 91177308-0d34-0410-b5e6-96231b3b80d8
For now I used a separate template for these sub-vector/tuple broadcasts
rather than sharing the mem variants with avx512_int_broadcast_rm.
<rdar://problem/17402869>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211828 91177308-0d34-0410-b5e6-96231b3b80d8
for the Sparc port. Use the same initializeSubtargetDependencies
function to handle initialization similar to the other ports to
handle dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211811 91177308-0d34-0410-b5e6-96231b3b80d8
The default rounding mode to initialize the mode register needs
to be reported to the runtime. Fill in other bits a kernel
may be interested in setting for future use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211791 91177308-0d34-0410-b5e6-96231b3b80d8
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211779 91177308-0d34-0410-b5e6-96231b3b80d8
Additional compliant GAS names for coprocessor register name
are enabled for all instruction with parameter MCK_CoprocReg:
LDC,LDC2,STC,STC2,CDP,CDP2,MCR,MCR2,MCRR,MCRR2,MRC,MRC2,MRRC,MRRC2
Patch by Andrey Kuharev.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211776 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the backend how to canonicalize a shuffle vectors
according to the rule:
- (shuffle (FADD A, B), (FSUB A, B), Mask) ->
(shuffle (FSUB A, -B), (FADD A, -B), Mask)
Where 'Mask' is:
<0,5,2,7> ;; for v4f32 and v4f64 shuffles.
<0,3> ;; for v2f64 shuffles.
<0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.
In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.
This new rule allows to convert a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.
The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211771 91177308-0d34-0410-b5e6-96231b3b80d8
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter) . The former was accepting vector registers
as destination rather than mask registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211750 91177308-0d34-0410-b5e6-96231b3b80d8
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.
small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.
This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.
The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211749 91177308-0d34-0410-b5e6-96231b3b80d8