5285 Commits

Author SHA1 Message Date
Chandler Carruth
36cf5d68be [x86] Add an SSE4.1 mode to this test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217072 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 20:39:06 +00:00
Chandler Carruth
87508f1d87 [x86] Make this test check everything for both SSE2 and AVX1 modes,
using a common 'all' prefix for the common test output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217063 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 19:39:10 +00:00
Alexander Potapenko
fac68d2d70 Fix PR20800: correctly calculate the offset of the subq instruction when generating compact unwind info.
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for different sizes of "push %reg" instruction
sizes.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217020 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 07:11:34 +00:00
Robin Morisset
76b55cc4b1 [X86] Allow atomic operations using immediates to avoid using a register
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.

Similarily, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.

This second point matches the first issue identified in
  http://llvm.org/bugs/show_bug.cgi?id=17281

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:16:29 +00:00
Reid Kleckner
f93099eb1c CodeGen: Handle va_start in the entry block
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().

Fixes PR20828.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 18:42:44 +00:00
Rafael Espindola
1e556a80ff Replace -use-init-array with -use-ctors.
We have been using .init-array for most systems for quiet some time,
but tools like llc are still defaulting to .ctors because the old
option was never changed.

This patch makes llc default to .init-array and changes the option to
be -use-ctors.

Clang is not affected by this. It has its own fancier logic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216905 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 13:54:53 +00:00
Jingyue Wu
88350bf61d Fix a typo in comments in r216862, NFC
PR20766 -> PR20776. Thanks Roman Divacky for the catch!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216883 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 14:55:04 +00:00
Jingyue Wu
d43e6df10b [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.

Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.

Reviewers: eliben, meheff, Jiangning

Reviewed By: Jiangning

Subscribers: jholewinski, aemerson, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4814



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 03:47:25 +00:00
Robin Morisset
217b38e19a Fix typos in comments, NFC
Summary: Just fixing comments, no functional change.

Test Plan: N/A

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216784 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:53:01 +00:00
Reid Kleckner
9436574d1b musttail: Forward regparms of variadic functions on x86_64
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).

Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.

Reviewers: majnemer

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216780 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:42:08 +00:00
Reid Kleckner
dae28732f4 Verifier: Don't reject varargs callee cleanup functions
We've rejected these kinds of functions since r28405 in 2006 because
it's impossible to lower the return of a callee cleanup varargs
function. However there are lots of legal ways to leave such a function
without returning, such as aborting. Today we can leave a function with
a musttail call to another function with the correct prototype, and
everything works out.

I'm removing the verifier check declaring that a normal return from such
a function is UB.

Reviewed By: nlewycky

Differential Revision: http://reviews.llvm.org/D5059

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216779 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:25:28 +00:00
Reid Kleckner
1469e29334 X86: Fix conflict over ESI between base register and rep;movsl
The new solution is to not use this lowering if there are any dynamic
allocas in the current function. We know up front if there are dynamic
allocas, but we don't know if we'll need to create stack temporaries
with large alignment during lowering. Conservatively assume that we will
need such temporaries.

Reviewed By: hans

Differential Revision: http://reviews.llvm.org/D5128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216775 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 20:50:31 +00:00
Robert Khasanov
37e671e894 [SKX] Enable lowering of integer CMP operations.
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.

Reviewed by Elena Demikhovsky.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 08:46:04 +00:00
Rafael Espindola
4bb535027e On MachO, don't put non-private constants in mergeable sections.
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the context of the
section.

Since private GVs are the ones the get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216682 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 20:13:31 +00:00
Sanjay Patel
cf9661c6f9 Fix a logic bug in x86 vector codegen: sext (zext (x) ) != sext (x) (PR20472).
Remove a block of code from LowerSIGN_EXTEND_INREG() that was added with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=177421

And caused:
http://llvm.org/bugs/show_bug.cgi?id=20472 (more analysis here)
http://llvm.org/bugs/show_bug.cgi?id=18054

The testcases confirm that we (1) don't remove a zext op that is necessary and (2) generate
a pmovz instead of punpck if SSE4.1 is available. Although pmovz is 1 byte longer, it allows 
folding of the load, and so saves 3 bytes overall.

Differential Revision: http://reviews.llvm.org/D4909



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216679 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 18:59:22 +00:00
Chandler Carruth
da3a293313 [x86] Clean up some tests to use FileCheck and combine two into a single
file.

Changing code that is covered by these tests is just too hard to debug
currently, and now it will be clear the nature of the changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216643 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 03:41:28 +00:00
Chandler Carruth
7e3dc40fab [x86] Fix a regression introduced with r213897 for 32-bit targets where
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.

This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:39:47 +00:00
Chandler Carruth
963a5e6c61 [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.

However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.

The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.

Original commit message for r215611 which covers the rest of the chang:
  [SDAG] Fix a case where we would iteratively legalize a node during
  combining by replacing it with something else but not re-process the
  node afterward to remove it.

  In a truly remarkable stroke of bad luck, this would (in the test case
  attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition
  to deleting the dead nodes more quickly we also ensure that if it
  *stops* being dead for any reason it makes it back through the
  legalizer. Without this, the test case will end up failing during
  instruction selection due to an and node with a type we don't have an
  instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:22:16 +00:00
Elena Demikhovsky
fe0c6ead85 AVX-512: Added intrinsic for VMOVSS store form with mask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216530 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 07:38:43 +00:00
Chandler Carruth
bfed08e41f [x86] Start fixing a really subtle and terrible form of miscompile in
these DAG combines.

The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing
a node with its operand, you can cause its uses to CSE to itself, which
then causes their uses to become your uses which causes them to be
picked up by the RAUW. For nodes that are determined to be "no-ops",
this is "fine". But if the RAUW is one of several steps to enact
a transformation, this causes the DAG to really silently eat an discard
nodes that you would never expect. It took days for me to actually
pinpoint a test case triggering this and a really frustrating amount of
time to even comprehend the bug because I never even thought about the
ability of RAUW to iteratively consume nodes due to CSE-ing them into
itself.

To fix this, we have to build up a brand-new chain of operations any
time we are combining across (potentially) intervening nodes. But once
the logic is added to do this, another issue surfaces: CombineTo eagerly
deletes the one node combined, *but no others*. This is... really
frustrating. If deleting it makes its operands become dead, those
operand nodes often won't go onto the worklist in the
order you would want -- they're already on it and not near the top. That
means things higher on the worklist will get combined prior to these
dead nodes being GCed out of the worklist, and if the chain is long, the
immediate users won't be enough to re-detect where the root of the chain
is that became single-use again after deleting the dead nodes. The
better way to do this is to never immediately delete nodes, and instead
to just enqueue them so we can recursively delete them. The
combined-from node is typically not on the worklist anyways by virtue of
having been popped off.... But that in turn breaks other tests that
*require* CombineTo to delete unused nodes. :: sigh ::

Fortunately, there is a better way. This whole routine should have been
returning the replacement rather than using CombineTo which is quite
hacky. Switch to that, and all the pieces fall together.

I suspect the same kind of miscompile is possible in the half-shuffle
folding code, and potentially the recursive folding code. I'll be
switching those over to a pattern more like this one for safety's sake
even though I don't immediately have any test cases for them. Note that
the only way I got a test case for this instance was with *heavily* DAG
combined 256-bit shuffle sequences generated by my fuzzer. ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216319 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 10:25:15 +00:00
Nick Lewycky
f591b9c33e Revert r215611 because it caused the infinite loop in bug 20736. There is a reduced testcase in that bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216307 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 00:45:03 +00:00
Reid Kleckner
d89c0abc07 ARM / x86_64 varargs: Don't save regparms in prologue without va_start
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.

Most of the test suite changes are adding va_start calls to existing
tests to keep things working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 21:59:26 +00:00
Benjamin Kramer
daada81e5c DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 13:28:02 +00:00
Robert Khasanov
fec1abaeab [x86] Added _addcarry_ and _subborrow_ intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:43:43 +00:00
Robert Khasanov
10dacc4b52 [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:27:00 +00:00
Sanjay Patel
89305e5345 Don't prevent a vselect of constants from becoming a single load (PR20648).
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.

Differential Revision: http://reviews.llvm.org/D4934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith
4641d5dbdf X86: Add missing triples from r216119
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:58:59 +00:00
Duncan P. N. Exon Smith
5012f1db20 X86: Align the stack on word boundaries in LowerFormalArguments()
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.

Patch by Thomas Jablin!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:40:59 +00:00
Keno Fischer
4b1cddbaf0 Do not insert a tail call when returning multiple values on X86
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if it's return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.

Test Plan: Test case from the original bug report is included.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D4968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216117 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:00:37 +00:00
Sanjay Patel
3deb3e32b2 critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308)
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Differential Revision: http://reviews.llvm.org/D4977



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216114 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 18:03:00 +00:00
Pavel Chupin
aadaac228d [x32] Fix FrameIndex check in SelectLEA64_32Addr
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.

Test Plan: lea tests modified to include x32/nacl and new test added

Reviewers: nadav, dschuff, t.p.northover

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4929

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216065 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 11:59:22 +00:00
Juergen Ributzka
96b1e70c66 Reapply [FastISel][X86] Add large code model support for materializing floating-point constants (r215595).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216012 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:13 +00:00
Juergen Ributzka
9c23685dd2 Reapply [FastISel][X86] Use XOR to materialize the "0" value (r215594).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:10 +00:00
Juergen Ributzka
e8757c5dbb Reapply [FastISel][X86] Emit more efficient instructions for integer constant materialization (r215593).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.

This fixes <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216010 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:06 +00:00
Juergen Ributzka
f08cddcf56 Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588).
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.

I run it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regeressions.

Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216006 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:05:24 +00:00
Akira Hatanaka
6290308366 [X86, X87 stackifier] Do not mark an operand of a debug instruction as kill.
<rdar://problem/16952634>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215962 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 02:09:57 +00:00
Elena Demikhovsky
9735ccb7ea AVX-512: Fixed a bug in emitting compare for MVT:i1 type.
Added a test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215889 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-18 11:59:06 +00:00
NAKAMURA Takumi
0ac5626b56 llvm/test/CodeGen/X86/fmul-combines.ll: Appease Windows x64. <4 x float> is passed by stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215821 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 22:28:37 +00:00
Matt Arsenault
5f8a9ae17c Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215820 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 10:14:19 +00:00
Chandler Carruth
a3805f1c73 [x86] Teach lots of the new vector shuffle lowering to use UNPCK
instructions for blend operations at 128 bits. This was a serious hole
in our prior blend lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215819 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 09:42:15 +00:00
Andrea Di Biagio
89cea3c36b [DAGCombiner] Improve the folding of target independet shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In case, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, $xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.

Added new test cases to test 'combine-vec-shuffle-5.ll'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215797 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 00:29:44 +00:00
Chandler Carruth
92ee945e2e [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions
where applicable for blending.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215737 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:42:00 +00:00
Chandler Carruth
12e69a0267 [x86] Add the initial skeleton of type-based dispatch for AVX vectors in
the new shuffle lowering and an implementation for v4 shuffles.

This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code for the test cases included to the
existing vector shuffle lowering. There are a few cases that are
entertainingly better. ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215702 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 11:01:40 +00:00
Chandler Carruth
886f0101a7 [x86] Fix the very broken formation of vpunpck instructions in the
target-specific shuffl DAG combines.

We were recognizing the paired shuffles backwards. This code needs to be
replaced anyways as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up, this is the minimal fix to the
behavior.

In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215690 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 03:54:49 +00:00
Chandler Carruth
477f28c48d [x86] Fix PR20540 where the x86 shuffle DAG combiner had completely
broken logic for merging shuffle masks in the face of SM_SentinelZero
mask operands.

While these are '-1' they don't mean 'undef' the way '-1' means in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much more clear. In the rest it was really weirdly broken.

Also, rewrite the entire merging thing to be a more directy operation
with a single loop and just doing math to map the indices through the
various masks.

Also add a bunch of asserts to try to make in extremely clear what the
different masks can possibly look like.

Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.

Thanks to several different people for sending test cases, and for
Robert Khasanov for an initial attempt at fixing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215687 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 02:43:18 +00:00
Juergen Ributzka
6398a7f5fd Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 19:56:28 +00:00
Adam Nemet
41c5e687ed [AVX512] Add test for FMA masking instrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 17:13:33 +00:00
Adam Nemet
90eb948fc9 [AVX512] Switch FMA intrinsics to the masking version
This does the renaming and updates the lowering logic.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215664 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 17:13:30 +00:00
Sanjay Patel
9615d702ad optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215646 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 15:15:28 +00:00
Chandler Carruth
cad1711154 [x86] Begin stubbing out the AVX support in the new vector shuffle
lowering scheme.

Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.

Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215637 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 12:13:59 +00:00