10876 Commits

Author SHA1 Message Date
Chandler Carruth
db5e1dafa4 [x86] Bypass the shuffle mask comment generation when not using verbose
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218362 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 03:06:34 +00:00
Chandler Carruth
7e0f903c6c [x86] Hoist the logic for extracting the relevant bits of information
from the MachineInstr into the caller which is already doing a switch
over the instruction.

This will make it more clear how to compute different operands to feed
the comment selection for example.

Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218361 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 02:24:41 +00:00
Chandler Carruth
5f671bae7c [x86] Start refactoring the comment printing logic in the MC lowering of
vector shuffles.

This is just the beginning by hoisting it into its own function and
making use of early exit to dramatically simplify the flow of the
function. I'm going to be incrementally refactoring this until it is
a bit less magical how this applies to other instructions, and I can
teach it how to dig a shuffle mask out of a register. Then I plan to
hook it up to VPERMD so we get our mask comments for it.

No functionality changed yet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218357 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 02:16:12 +00:00
Chandler Carruth
6717f9d907 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:24:44 +00:00
Chandler Carruth
8415f84e49 [x86] Fix a really terrible bug in the repeated 128-bin-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:03:57 +00:00
Chandler Carruth
30ce74b5e3 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:39:02 +00:00
Chandler Carruth
798f2849c3 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:14:14 +00:00
Robin Morisset
30e7514d01 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:59:25 +00:00
Chandler Carruth
7024c7e949 [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:16:12 +00:00
Lang Hames
3025b00b7f [MCJIT] Nuke MachineRelocation and MachineCodeEmitter. Now that the old JIT is
gone they're no longer needed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218320 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:08:47 +00:00
Chandler Carruth
4850be49a3 [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 11:15:19 +00:00
Chandler Carruth
8f637786d8 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 10:08:29 +00:00
Chandler Carruth
4b365159bf [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218282 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:29:42 +00:00
Kaelyn Takata
cdc451b1ae Fix a "typo" from my previous commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:17:59 +00:00
Kaelyn Takata
1488ba63fe Silence unused variable warnings in the new stub functions that occur
when assertions are disabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218280 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:14:13 +00:00
Chandler Carruth
8571ae37ae [x86] Stub out the integer lowering of 256-bit vectors with AVX2
support. No interesting functionality yet, but this will let me
implement one vector type at a time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218277 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:45:57 +00:00
Ehsan Akhgari
e6f6980d5b ms-inline-asm: Fix parsing label names inside bracket expressions
Summary:
This fixes a couple of issues.  One is ensuring that AOK_Label rewrite
rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to
be able to skip the brackets properly.  The other part of the fix ensures
that we don't overwrite Identifier when looking up the identifier, and
that we use the locally available information to generate the AOK_Label
rewrite in ParseIntelIdentifier.  Doing that in CreateMemForInlineAsm
would be problematic since the Start location there may point to the
beginning of a bracket expression, and not necessarily the beginning of
an identifier.

This also means that we don't need to carry around the InternlName field,
which helps simplify the code.

Test Plan: This will be tested on the clang side.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5445

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218270 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 20:40:36 +00:00
Sanjay Patel
6539887847 Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:54:01 +00:00
Pavel Chupin
25c57d5cfe [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 13:11:35 +00:00
Robert Lougher
2ee97f03a4 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 11:54:38 +00:00
Ehsan Akhgari
ffaafbe92d ms-inline-asm: Add a sema callback for looking up label names
The implementation of the callback in clang's Sema will return an
internal name for labels.

Test Plan: Will be tested in clang.

Reviewers: rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4587

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218229 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 02:21:35 +00:00
Chandler Carruth
de95c380c7 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 00:32:15 +00:00
Chandler Carruth
37bb4b0365 [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:46:13 +00:00
Chandler Carruth
974e872b03 [x86] With the stronger canonicalization of shuffles added in r218216,
the new vector shuffle lowering no longer needs to check both symmetric
forms of UNPCK patterns for v4f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218217 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:37:51 +00:00
Chandler Carruth
974542d7d8 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
assymetry in the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:35:14 +00:00
Chandler Carruth
38e181630a [x86] Refactor the logic to form SHUFPS instruction patterns to lower
a generic vector shuffle mask into a helper that isn't specific to the
other things that influence which choice is made or the specific types
used with the instruction.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218215 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:03:00 +00:00
Chandler Carruth
1a5f7f54f4 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:49:46 +00:00
Chandler Carruth
6ef31b0079 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:20:44 +00:00
Chandler Carruth
7922d3e39a [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:01:19 +00:00
Chandler Carruth
fdaf59e9b1 [x86] Explicitly lower to a blend early if it is trivial to do so for
v8f32 shuffles in the new vector shuffle lowering code.

This is very cheap to do and makes it much more clear that anything more
expensive but overlapping with this lowering should be selected
afterward (for example using AVX2's VPERMPS). However, no functionality
changed here as without this code we would fall through to create no-op
shuffles of each input and a blend. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218209 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:40:39 +00:00
Chandler Carruth
29720a4bad [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:17:55 +00:00
Chandler Carruth
25089558f2 [x86] Switch the blend implementation to use a MVT switch rather than
awkward conditions. The readability improvement of this will be even
more important as I generalize it to handle more types.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218205 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 10:36:12 +00:00
Chandler Carruth
4127d76566 [x86] Remove some essentially lying comments from the v4f64 path of the
new vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218204 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 10:27:14 +00:00
Chandler Carruth
05a8a724e2 [x86] Fix a helper to reflect that what we actually care about is
128-bit lane crossings, not 'half' crossings. This came up in code
review ages ago, but I hadn't really addresesd it. Also added some
documentation for the helper.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:25 +00:00
Chandler Carruth
291140b112 [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:22 +00:00
Chandler Carruth
ae464b2ba1 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 22:09:27 +00:00
Chandler Carruth
9c7ffd20df [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 20:52:07 +00:00
Chandler Carruth
c16105b078 [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 04:15:22 +00:00
Chandler Carruth
9ba9f1a7e6 [x86] Refactor the code for emitting INSERTPS to reuse the zeroable mask
analysis used elsewhere. This removes the last duplicate of this logic.
Also simplify the code here quite a bit. No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218176 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:57:01 +00:00
Chandler Carruth
cc62abbe39 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
of for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, its AVX stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:32:25 +00:00
Chandler Carruth
8924ed3db4 [x86] Replace some duplicated logic reasoning about whether particular
vector lanes can be modeled as zero with a call to the new function that
computes a bit-vector representing that information.

No functionality changed here, but will allow doing more clever things
with the zero-test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218174 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 02:44:21 +00:00
Robin Morisset
613c7d0b35 [X86] Erase some obsolete comments from README.txt
I just tried reproducing some of the optimization failures in README.txt in the
X86 backend, and many of them could not be reproduced. In general the entire
file appears quite bit-rotted, whatever interesting parts remain should be
moved to bugzilla, and the rest deleted. I did not spend the time to do that,
so I just deleted the few I tried reproducing which are obsolete, to save some
time to whoever will find the courage to do it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218170 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:56:46 +00:00
Chandler Carruth
f7ca3552ff [x86] Hoist a function up to the rest of the non-type-specific lowering
helpers, and re-flow the logic to use early exit and be a bit more
readable.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218155 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 21:52:10 +00:00
Chandler Carruth
401b720aa8 [x86] Hoist the actual lowering logic into a helper function to separate
it from the shuffle pattern matching logic.

Also cleaned up variable names, comments, etc. No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218152 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 21:20:08 +00:00
Chandler Carruth
dc58d1e099 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo muny instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218143 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 20:00:32 +00:00
Chandler Carruth
89436b4160 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 09:45:21 +00:00
Chandler Carruth
ec1f7b1c87 [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218112 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:37:44 +00:00
Chandler Carruth
330aa6fd6b [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218102 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 06:07:49 +00:00
Aaron Ballman
c21e4e197d Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218062 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 17:34:23 +00:00
Robert Khasanov
262d57d578 [SKX] Deriving rmb multiclasses from general one (avx512_icmp_packed_rmb and avx512_icmp_cc_rmb).
Thanks Adam Nemet for notice about this.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218051 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 14:06:55 +00:00