Commit Graph

11858 Commits

Author SHA1 Message Date
Chandler Carruth
bf21d40070 [x86] Teach the new vector shuffle lowering to widen floating point
elements as well as integer elements in order to form simpler shuffle
patterns.

This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.

Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218911 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 21:37:14 +00:00
Chandler Carruth
130d072eb7 [x86] Move the vperm2f128 test to be vperm2x128 and test both the
floating point and integer domains.

Merge the AVX2 test into it and add an extra RUN line. Generate clean
FileCheck statements with my script. Remove the now merged AVX2 tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218903 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 20:11:11 +00:00
Chandler Carruth
fd94d1bd76 [x86] Just delete the last combine test file.
This file isn't really doing anything useful. Many of the tests that
seem to be combined are also repeats from other test files. Many of the
other tests, despite the comment that they should be combined into
a single shuffle... well... aren't combined into a single shuffle.
=/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 08:05:57 +00:00
Chandler Carruth
7d663050b4 [x86] Merge still more combine tests into the common file. These at
least seem *slightly* more interesting test wise, although given how
spotily we actually combine anything, I remain somewhat suspicious.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 08:02:34 +00:00
Chandler Carruth
cd912001b4 [x86] Merge the third combining test into the generic one and add proper
checks for all the ISA variants.

If the SSE2 checks here terrify you, good. This is (in large part) the
kind of amazingly bad code that is holding LLVM back when vectorizing on
older ISAs.

At the same time, these tests seem increasingly dubious to me. There are
a very large number of tests and it isn't clear that they are
systematically covering a specific set of functionality. Anyways,
I don't want to reduce testing during the transition, I just want to
consolidate it to where it is easier to manage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218860 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:56:47 +00:00
Chandler Carruth
84b3f53bcb [x86] Merge the second set of vector combining tests into a common test
file.

Some of these really don't make sense to test -- we're testing for the
*lack* of combining two shuffles into one, presumably because the two
would generate better shuffles in the end. But if you look at the
generated code shown here, in many cases the generated code is, frankly,
terrible. Or we combine any two generated shuffles back into a single
instruction! I've left a FIXME to revisit these decisions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:42:58 +00:00
Chandler Carruth
84c7078ddc [x86] Merge the bitwise operation shuffle combining into the common test
file, adding assertions across the ISA variants for it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218858 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:30:24 +00:00
Chandler Carruth
f25a5f3290 [x86] Update this test to run a full complement of the ISA extensions,
and use the new grouped FileCheck patterns to match them.

No interesting changes yet, but this test is now in proper form to have
the other shuffle combining tests merged into it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218857 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:22:26 +00:00
Chandler Carruth
fd153c13d8 [x86] Minimize the parameters to this test for clarity.
The test has to do with DAG combines, and so it doesn't need the new
vector shuffle lowering to be effective. Also, it has a nice in-IR
triple string which we should really be using rather than command line
flags (unless it varies form RUN-line to RUN-line). Finally, I much
prefer letting LLVM synthesize the correct datalayout string from the
triple rather than baking one in here that will just become stale.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218856 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:17:15 +00:00
Chandler Carruth
1ad4741e91 [x86] Add a comment clarifying that this test should span all manners of
generic DAG combining of shuffles relevant to x86.

My plan is to fold a bunch of the other DAG combining test cases into
this one, while converting them to use the nice new FileCheck assertion
syntax.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218855 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 07:13:25 +00:00
Chandler Carruth
0c9da85213 [x86] Switch some of the new consolidated vector tests to use
a bare-metal triple and have nice BB labels, etc.

No significant change here, just tidying up to have a consistent set of
OS-agnostic vector functionality here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218854 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 06:52:19 +00:00
Chandler Carruth
4bbf21e71e [x86] Improve and correct how the new vector shuffle lowering was
matching and lowering 64-bit insertions.

The first problem was that we weren't looking through bitcasts to
discover that we *could* lower as insertions. Once fixed, we in turn
weren't looking through bitcasts to discover that we could fold a load
into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR
node around the inserted element and instead were passing a scalar to
a DAG node that expected a vector. It turns out there are some patterns
that will "lower" this into the correct asm, but the rest of the X86
backend is very unhappy with such antics.

This should fix a few more edge case regressions I've spotted going
through the regression test suite to enable the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218839 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 23:14:28 +00:00
Sanjay Patel
2b918388ab Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578
Negative FABS of either a scalar or vector should be handled the same way
on x86 with SSE/AVX: a single OR instruction of the FP operand with a
constant to light up the sign bit(s).

http://llvm.org/bugs/show_bug.cgi?id=20578

Differential Revision: http://reviews.llvm.org/D5201



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218822 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:20:06 +00:00
Chandler Carruth
93803535ad [x86] Merge the remaining test cases into vector-blend.ll and remove all
the ISA-specific test files.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218818 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:07:07 +00:00
Chandler Carruth
b1b266ca9c [x86] Expand the ISA coverage of our blend test in preparation for
merging ISA-specific testing into this file.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218816 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:03:21 +00:00
Chandler Carruth
11e1c61b86 [x86] Merge the interesting test cases from blend-msb.ll into
vector-blend.ll and remove the former.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218814 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:56:57 +00:00
Chandler Carruth
ccee7a87e0 [x86] Move the AVX blend test to a generic name. I'm going to fold other
blend tests into this one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218813 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:52:55 +00:00
Chandler Carruth
f90b2fbd35 [x86] Remove a test that wasn't doing anything really. We have plenty of
better tests for zext of vectors at this point.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:50:58 +00:00
Chandler Carruth
eeb4a0c7ef [x86] Add a 32-bit run to the sext test, and remove a sad vec_sext.ll
test file.

This old test had a bunch of functions that were never even checked. =/
The only thing it really did was to make sure that we did something
reasonable in 32-bit mode with SSE4.1. Adding another run line to the
main vector-sext.ll test seems a better way to do that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218810 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:49:54 +00:00
Chandler Carruth
3916c2642d [x86] Teach both sext and zext vector tests to cover a nice wide range
of architectures: SSE2, SSSE3, SSE4.1, AVX, and AVX2.

Unfortunately, this exposses the absolute horror of the code we generate
for many of these patterns. Anyone wanting to familiarize themselves
with the x86 backend and improve performance could do a lot of good
sitting down and making these test cases not look so terrible. While the
new vector shuffle code I'm working on well help some, it won't fix all
of the crimes here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218807 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:41:36 +00:00
Chandler Carruth
03eb5f8ff6 [x86] Sort the ISA-specific RUN lines for vector-sext.ll to go from
oldest to newest. This makes more sense to me and is more consistent
with other tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218802 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:32:44 +00:00
Tim Northover
c1b22c63c0 ARM: yes it can (as of r218789)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:31:58 +00:00
Chandler Carruth
edc47287e1 [x86] Rename avx-{s,z}ext.ll to vector-{s,z}ext.ll.
These tests are far and away the best sext and zext tests we have for
vectors. I'm going to merge the other similar tests into them and expand
the ISA coverage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:30:30 +00:00
Chandler Carruth
0f1d402ad2 [x86] Cleanup and re-generate the checks for avx-zext.ll using the new
script.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218799 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:27:16 +00:00
Chandler Carruth
024b291415 [x86] Generate the FileCheck assertions for avx-blend.ll with my new
script to make them nice and predictable. This will ease updating them
for the new vector shuffle lowering and seeing the delta if any.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218795 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:19:45 +00:00
Chandler Carruth
0c7a4b9d27 [x86] Clean up and generate detailed FileCheck assertions for
avx-sext.ll using my new script.

Also add an AVX2 mode to this test.

Part of cleaning up the test suite before enabling the new vector
shuffle lowering. This also highlights some of the abysmal failures of
the old shuffle lowering. Check out those 'pinsrw' and 'pextrw'
sequences!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218794 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:19:32 +00:00
Tim Northover
472f2a056d ARM: allow copying of CPSR when all else fails.
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.

rdar://problem/18011155

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218789 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 19:21:03 +00:00
Adrian Prantl
02474a32eb Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218787 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:55:02 +00:00
Reed Kotler
befab0a552 Add fptrunc to mips fast-sel
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel

Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: rfuhler

Differential Revision: http://reviews.llvm.org/D5553

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218785 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:47:02 +00:00
Adrian Prantl
10c4265675 Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218782 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:10:54 +00:00
Adrian Prantl
076fd5dfc1 Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 17:55:39 +00:00
Tom Stellard
56077f5796 R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218776 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 17:15:17 +00:00
Jingyue Wu
ccd995ab0c Revert r216862 due to a performance regression
Reported by Alexey Volkov in PR21115


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218771 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 15:22:13 +00:00
Oliver Stannard
9d7038c437 [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218763 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 13:13:18 +00:00
Chandler Carruth
7d64681274 [x86] Fix a few more tiny patterns with the new vector shuffle lowering
that keep cropping up in the regression test suite.

This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.

In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218756 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 11:14:02 +00:00
Asiri Rathnayake
e9bbacd0a8 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218751 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 09:59:45 +00:00
Oliver Stannard
ff18b9ff38 [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 09:02:17 +00:00
Sasa Stankovic
05a13f0bd0 [mips] For indirect calls we don't need $gp to point to .got. Mips linker
doesn't generate lazy binding stub for a function whose address is taken in
the program.

Differential Revision: http://reviews.llvm.org/D5067


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218744 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 08:22:21 +00:00
Chandler Carruth
9e2fe46484 [x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218734 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 03:19:43 +00:00
Chandler Carruth
429670f0e8 [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is
the same speed as pshufd but we can fold loads into the pmovzx
instructions.

This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218733 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 02:25:54 +00:00
Chandler Carruth
afe75172b1 [x86] Teach the new vector shuffle lowering about VBROADCAST and
VPBROADCAST.

This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218724 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 00:41:21 +00:00
Chandler Carruth
b3ce94707a [x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle test
cases.

While clearly we don't need the AVX vector width, these ISA extensions
often cause us to select different instructions and we should cover them
even with the narrow vector width.

Also, while here, nuke the stress_test2 contents. There is no reason to
try to FileCheck this entire body when it is mostly a test for
successfully surviving the code generator.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218710 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 22:16:23 +00:00
Chandler Carruth
3a926b9b5c [x86] Update the exact FileCheck syntax of the 256-bit and 512-bit
shuffle tests to match that used in the script I posted and now used
consistently in 128-bit tests.

Nothing interesting changing here, just using the label name as the
FileCheck label and a slightly more general comment marker consumption
strategy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218709 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 22:04:45 +00:00
Chandler Carruth
8b6f2eee07 [x86] Rework all of the 128-bit vector shuffle tests with my handy test
updating script so that they are more thorough and consistent.

Specific fixes here include:
- Actually test VEX-encoded AVX mnemonics.
- Actually use an SSE 4.1 run to test SSE 4.1 features!
- Correctly check instructions sequences from the start of the function.
- Elide the shuffle operands and comment designator in a consistent way.
- Test all of the architectures instead of just the ones I was motivated
  to manually author.

I've gone back through and fixed up any egregious issues I spotted. Let
me know if I missed something you really dislike.

One downside to this is that we're now not as diligently using FileCheck
variables for registers. I would be much more concerned with this if we
had larger register usage, but there just aren't that interesting of
register choices here and most of the registers are constrained by the
ABI. Ultimately, I don't think this is likely to be the maintenance
burden for these tests and updating them again should be staright
forward.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 21:44:34 +00:00
Juergen Ributzka
9952c922c2 Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:59:35 +00:00
Matt Arsenault
28233d3a63 R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:49:48 +00:00
Reed Kotler
8a6f79e58d Add numeric extend, trunctate to mips fast-isel
Summary:
 Add numeric extend, trunctate to mips fast-isel

 Reactivates D4827



Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218681 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 16:30:13 +00:00
Robert Khasanov
8acdc5232d [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218670 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 12:15:52 +00:00
Robert Khasanov
175ff01f0f [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:41:54 +00:00
Robert Khasanov
cfa5724d50 [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218668 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:32:22 +00:00
Robert Khasanov
58da66b2bf [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218667 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:19:50 +00:00
Chandler Carruth
4abb04a65c [x86] Revert r218588, r218589, and r218600. These patches were pursuing
a flawed direction and causing miscompiles. Read on for details.

Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.

In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.

However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.

There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.

My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
   converts a VSELECT with a constant condition operand into
   a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
   entirely on the DAG combine to catch cases where we can match to an
   immediate-controlled blend instruction.

One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 02:52:28 +00:00
Chandler Carruth
52b072d73f [x86] Add some vector-register broadcast operations to the 256-bit v4
tests which were missing them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218657 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 02:32:36 +00:00
Matt Arsenault
cbb188bffc R600: Fix broken check lines, missing scalar case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218655 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 01:05:29 +00:00
Juergen Ributzka
a0af4b0271 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218653 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 00:49:58 +00:00
Eric Christopher
6a2169eb6f Add soft-float to the key for the subtarget lookup in the TargetMachine
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.

Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 21:57:54 +00:00
Matt Arsenault
49cbc1891b R600/SI: Also fix fsub + fadd a, a to mad combines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218609 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 14:59:38 +00:00
Matt Arsenault
a5f45d5444 R600/SI: Fix using mad with multiplies by 2
These turn into fadds, so combine them into the target
mad node.

fadd (fadd (a, a), b) -> mad 2.0, a, b

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218608 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 14:59:34 +00:00
Chandler Carruth
8ac2f142a8 [x86] Make the new vector shuffle lowering lower blends as VSELECT
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.

The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 09:57:07 +00:00
Chandler Carruth
d23f1883d3 [x86] Delete a bunch of really bad and totally unnecessary code in the
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.

The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218589 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 02:01:20 +00:00
Chandler Carruth
8e93ce1780 [x86] Add the dispatch skeleton to the new vector shuffle lowering for
AVX-512.

There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.

Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218585 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 00:37:27 +00:00
Chandler Carruth
b61dfec824 [x86] Teach the new vector shuffle lowering to fall back on AVX-512
vectors.

Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218583 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 23:53:10 +00:00
Chandler Carruth
4f4280469c [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2
lowerings.

This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.

Anyways, thanks to Hal for helping me sort out what these *should* be.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218582 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 23:23:55 +00:00
Chandler Carruth
3f40848670 [x86] Fix a really silly bug that I introduced fixing another bug in the
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.

Have I mentioned that I loathe implicit conversions recently? :: sigh ::

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 06:11:04 +00:00
Chandler Carruth
21b69296fb [x86] Fix yet another bug in the new vector shuffle lowering's handling
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.

Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218575 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 03:30:25 +00:00
James Molloy
aada52189e [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218569 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 17:02:54 +00:00
Chandler Carruth
72c3b07dfd [x86] Fix terrible bugs everywhere in the new vector shuffle lowering
and in the target shuffle combining when trying to widen vector
elements.

Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 04:42:44 +00:00
Chandler Carruth
8470b5b812 [x86] Flip the sentinel values used in the target shuffle mask decoding
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 04:42:39 +00:00
Sanjay Patel
676af35b38 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 23:01:47 +00:00
Chandler Carruth
0a31a52b91 [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic
that managed to elude all of my fuzz testing historically. =/

Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 20:41:45 +00:00
Matt Arsenault
5435c66a33 R600/SI: Add strict check lines to div_scale tests.
This has weird operand requirements so it's worthwhile
to have very strict checks for its operands.

Add different combinations of SGPR operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218535 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:55:11 +00:00
Matt Arsenault
d991d2217b R600/SI Allow same SGPR to be used for multiple operands
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.

This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:55:03 +00:00
Matt Arsenault
aed12d4bad R600/SI: Partially move operand legalization to post-isel hook.
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:59 +00:00
Matt Arsenault
8a70e28114 R600/SI: Don't move operands that are required to be SGPRs
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:52 +00:00
Matt Arsenault
26b2a7834e R600/SI: Fix using wrong operand indices when commuting
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.

When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:43 +00:00
Chandler Carruth
7929a210d5 [x86] In the new vector shuffle lowering, when trying to do another
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:24:26 +00:00
Chandler Carruth
7164a4ae0a [x86] Fix a large collection of bugs that crept in as I fleshed out the
AVX support.

New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[

These were all detected by fuzz-testing. (I <3 fuzz testing.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:11:02 +00:00
Robert Khasanov
26ba182fdf [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables.
Added lowering tests for these instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218508 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 09:48:50 +00:00
David Xu
abf5bf221f Revert patch of r218493, delete the test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 02:40:54 +00:00
David Xu
c41ae2a5c4 Redundant store instructions should be removed as dead code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 02:02:09 +00:00
Eric Christopher
55a90ab4ef Add the first backend support for on demand subtarget creation
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.

Things to note:

a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.

b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.

c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:44:08 +00:00
Matt Arsenault
deaa9d8c72 R600: Avoid repeated check lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:12:36 +00:00
Matt Arsenault
584886c0bb R600/SI: Fix emitting trailing whitespace after s_waitcnt
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:09:46 +00:00
Adam Nemet
2f3ccfc257 [AVX512] Make vextract*x4/vinsert*x4 tests check for the index as well
Extend test so that it provides coverage for the next commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218479 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:48:47 +00:00
Matt Arsenault
3011a602be R600: Fix some missing conversion testcases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218474 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:16:18 +00:00
Matt Arsenault
556ae0484a Remove duplicated RUN lines in middle of test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:16:14 +00:00
Bruno Cardoso Lopes
f4230250a1 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable.  This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.

Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.

<rdar://problem/18021659>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218472 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:14:26 +00:00
Tom Stellard
29d48e6a49 R600/SI: Add support for global atomic add
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218457 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 18:30:26 +00:00
Robin Morisset
79826e015e Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 17:27:43 +00:00
Sid Manning
733681d3bd Add missing attributes !cmp.[eq,gt,gtu] instructions.
These instructions do not indicate they are extendable or the
number of bits in the extendable operand.  Rename to match
architected names.  Add a testcase for the intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 13:09:54 +00:00
Daniel Sanders
03fe69e90d [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle struct's correctly on big-endian N32/N64 return values.
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.

We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5286

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 12:15:05 +00:00
Chandler Carruth
4b667ee436 [x86] Teach the new vector shuffle lowering to use AVX2 instructions for
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.

Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think its better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.

Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.

This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:

1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
   apply

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 11:03:55 +00:00
Chandler Carruth
05901d80ba [x86] Teach the new vector shuffle lowering a fancier way to lower
256-bit vectors with lane-crossing.

Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218446 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 10:21:15 +00:00
Chandler Carruth
2e8d2c727c [x86] Fix an oversight in the v8i32 path of the new vector shuffle
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.

This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.

For reference, the reason that we check for the the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218442 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 04:10:27 +00:00
Chandler Carruth
e3bb4bb2d5 [x86] Implement AVX2 support for v32i8 in the new vector shuffle
lowering.

This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218440 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:52:12 +00:00
Chandler Carruth
1d63231455 [x86] More tweaks to the v32i8 test cases.
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218439 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:44:39 +00:00
Chandler Carruth
a87d04a759 [x86] Re-work a bunch of the v32i8 test cases to actually involve byte
shuffles rather than word shuffles.

As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218438 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:20:02 +00:00
Chandler Carruth
ef673b3c73 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218434 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 01:13:38 +00:00
Akira Hatanaka
0253523c92 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218431 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 00:26:15 +00:00
Chandler Carruth
bdecfeb723 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218430 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 00:24:19 +00:00
Moritz Roth
8c4e64af8a [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 16:35:50 +00:00
Chandler Carruth
10cd8098a7 [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 09:39:41 +00:00
Matt Arsenault
0bb38df86c R600/SI: Fix weird CHECK-DAG usage
This prevents these from failing in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218356 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 02:14:26 +00:00
Tom Stellard
81c6c9690a R600/SI: Enable selecting SALU inside branches
We can do this now that the FixSGPRLiveRanges pass is working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218353 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:33:28 +00:00
Chandler Carruth
6717f9d907 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:24:44 +00:00
Chandler Carruth
8415f84e49 [x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:03:57 +00:00
Robin Morisset
73ce2886b1 Fix swift-atomics testcase
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218341 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 23:18:01 +00:00
Chandler Carruth
30ce74b5e3 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:39:02 +00:00
Chandler Carruth
798f2849c3 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:14:14 +00:00
Robin Morisset
30e7514d01 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:59:25 +00:00
Robin Morisset
58bca6e8ec [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:46:49 +00:00
Chandler Carruth
7024c7e949 [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:16:12 +00:00
Chandler Carruth
4850be49a3 [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 11:15:19 +00:00
Chandler Carruth
8f637786d8 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 10:08:29 +00:00
Sanjay Patel
c4ef4e47c2 tighten up checks
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK_NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218284 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:46:44 +00:00
Sanjay Patel
90969b9ee0 remove unnecessary labels; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:52:53 +00:00
Juergen Ributzka
af989653e0 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218275 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:08:53 +00:00
Chandler Carruth
56c7cfe41f [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218267 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 20:25:08 +00:00
Sanjay Patel
6539887847 Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:54:01 +00:00
Akira Hatanaka
73c604b290 Fix test case commited in r218242 to appease buildbot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218261 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:07:20 +00:00
Tom Stellard
e1bc40b1e6 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 16:44:04 +00:00
Tom Stellard
6d625ad495 R600/SI: Add support for global atomic add
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218254 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 15:35:35 +00:00
Pavel Chupin
25c57d5cfe [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 13:11:35 +00:00
Robert Lougher
2ee97f03a4 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 11:54:38 +00:00
Chandler Carruth
ec35919c9a [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218233 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 03:05:23 +00:00
Chandler Carruth
de95c380c7 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 00:32:15 +00:00
Chandler Carruth
37bb4b0365 [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:46:13 +00:00
Chandler Carruth
7da57cf5b4 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218224 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:32:42 +00:00
Chandler Carruth
974542d7d8 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:35:14 +00:00
Chandler Carruth
1a5f7f54f4 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:49:46 +00:00
Chandler Carruth
6ef31b0079 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:20:44 +00:00
Chandler Carruth
7a94357b04 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218212 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:13:11 +00:00
Chandler Carruth
7922d3e39a [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:01:19 +00:00
Chandler Carruth
e4cb9d5f25 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218210 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:51:33 +00:00
Chandler Carruth
29720a4bad [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:17:55 +00:00
Chandler Carruth
0dd52092d0 [x86] Add some more comprehensive tests for v4f64 blending.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:12:19 +00:00
Chandler Carruth
57191b0b48 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218206 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:07:41 +00:00
Chandler Carruth
291140b112 [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:22 +00:00
Chandler Carruth
1ca1e33c3a [x86] Add some more test cases covering specific blend patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218200 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:01:26 +00:00
Chandler Carruth
b7ef7f97a8 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218199 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 08:49:27 +00:00
Chandler Carruth
ae464b2ba1 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 22:09:27 +00:00
Chandler Carruth
479d0ba62b [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218192 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:26:41 +00:00
David Majnemer
182c8ff6c0 Update tests which broke from r218189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218191 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:18:43 +00:00
Chandler Carruth
9c7ffd20df [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 20:52:07 +00:00
Chandler Carruth
cc727ec92a [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218188 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 18:36:39 +00:00
Chandler Carruth
c16105b078 [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 04:15:22 +00:00
Chandler Carruth
cc62abbe39 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:32:25 +00:00
Peter Collingbourne
87f7e75e58 Fix crash with an insertvalue that produces an empty object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218171 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 00:10:47 +00:00
Matt Arsenault
1a505ebae4 R600: Un-xfail a test which passes with pass disabled
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:20 +00:00
Matt Arsenault
ea3a0242f4 R600/SI: Un-xfail tests which work now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:18 +00:00