Commit Graph

10401 Commits

Author SHA1 Message Date
Adam Nemet
ccba8025da [X86] AVX512: Move compressed displacement logic to TD
This does not actually move the logic yet but reimplements it in the Tablegen
language.  Then asserts that the new implementation results in the same value.

The next patch will remove the assert and the temporary use of the TSFlags and
remove the C++ implementation.

The formula requires a limited form of the logical left and right operators.
I implemented these with the bit-extract/insert operator (i.e. blah{bits}).

No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 17:04:34 +00:00
Saleem Abdulrasool
a056166dc2 MC: fix MCAsmInfo usage for windows-itanium
Windows itanium uses the GNUCOFF assmebly format, not ELF.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213274 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 16:27:40 +00:00
Tim Northover
6c701b9aca CodeGen: generate single libcall for fptrunc -> f16 operations.
Previously we asserted on this code. Currently compiler-rt doesn't
actually implement any of these new libcalls, but external help is
pretty much the only viable option for LLVM.

I've followed the much more generic "__truncST2" naming, as opposed to
the odd name for f32 -> f16 truncation. This can obviously be changed
later, or overridden by any targets that need to.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213252 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 11:12:12 +00:00
Tim Northover
ed05086d61 X86: support double extension of f16 type.
x86 has no native ability to extend an f16 to f64, but the same result
is obtained if we expand it into two separate extensions: f16 -> f32
-> f64.

Unfortunately the same is not true for truncate, so that still results
in a compilation failure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213251 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 11:04:04 +00:00
Tim Northover
3e61ccdded CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213248 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 10:51:23 +00:00
Sanjay Patel
8758b729f5 Remove Atom references in description.
Any CPU can run this pass.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-16 20:18:49 +00:00
Tim Northover
1cafa00e26 CodeGen: don't form illegail EXTLOAD operations.
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.

This is not a good situation, and a more reasonable default can be
formed by ackowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).

Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-16 15:37:24 +00:00
Andrea Di Biagio
35f6e97777 [X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'.
Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.

This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.

The reason why it is important to do this is because the DAGCombiner
conservatively avoids combining a pair of shuffles if the resulting shuffle
node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were
wrongly considered not to be legal. This was the root cause of some poor-code
generation bugs.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213137 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-16 11:29:39 +00:00
David Majnemer
6f7532bb97 X86: Simplify X86WindowsTargetObjectFile::getSectionForConstant
There exists a helper function to abstract away the various differences
between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc.

Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213104 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 23:01:10 +00:00
Sanjay Patel
f7e042324a Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 22:39:58 +00:00
Cameron McInally
ef886879b6 Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213073 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 16:24:24 +00:00
Cameron McInally
160c4cb678 Add x86 patterns to match a specific add-with-carry.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213070 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 15:03:32 +00:00
Andrea Di Biagio
a34d957b61 Silence a warning in conditional expression.
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
No functional change intended.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213061 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 10:53:44 +00:00
David Majnemer
218f127b63 Fix typo in comment
No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213052 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 07:11:32 +00:00
Juergen Ributzka
cce1bd7a0b [FastISel][X86] Remove no longer needed functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213051 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 06:35:53 +00:00
Juergen Ributzka
1b0266d7cb [FastISel][X86] Implement the FastLowerIntrinsicCall hook.
Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively
implements the target hook.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213050 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 06:35:50 +00:00
Juergen Ributzka
566afe2a15 [FastISel][X86] Implement the FastLowerCall hook.
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.

This should also enable patchpoint intrinsic lowering for FastISel on X86.

Related to <rdar://problem/17427052>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213049 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 06:35:47 +00:00
Juergen Ributzka
a107ebd54d Revert "[FastISel][X86] Remove no longer needed functions."
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."

This reverts commit r213035, r213036, and r213037 to make the
buildbots happy again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213048 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 05:23:40 +00:00
David Majnemer
cdc1044944 CodeGen: Handle ConstantVector and undef in WinCOFF constant pools
The constant pool entry code for WinCOFF assumed that vector constants
would be formed using ConstantDataVector, it did not expect to see a
ConstantVector.  Furthermore, it did not expect undef as one of the
elements of the vector.

ConstantVectors should be handled like ConstantDataVectors, treat Undef
as zero.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213038 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 02:34:12 +00:00
Juergen Ributzka
3c0737454d [FastISel][X86] Remove no longer needed functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213037 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 02:22:56 +00:00
Juergen Ributzka
a7d1d3a513 [FastISel][X86] Implement the FastLowerIntrinsicCall hook.
Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively
implements the target hook.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213036 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 02:22:53 +00:00
Juergen Ributzka
805486c3f9 [FastISel][X86] Implement the FastLowerCall hook.
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.

This should also enable patchpoint intrinsic lowering for FastISel on X86.

Related to <rdar://problem/17427052>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213035 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 02:22:49 +00:00
Adam Nemet
3afd71fc7d [X86] Specify all TSFlags bit-offsets symbolically
No functional change.

The offsets for the other bitfields are specified symbolically.  I need to
increase the size for one of the earlier fields which is easier after this
cleanup.

Why these bits are relative to VEXShift is a bit strange but that is for
another cleanup.

I made sure that the values for the enums are unchanged after this change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-14 23:18:39 +00:00
David Majnemer
38d8be1ad8 CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF
COFF lacks a feature that other object file formats support: mergeable
sections.

To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section.  This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.

This fixes PR20262.

Differential Revision: http://reviews.llvm.org/D4482

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213006 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-14 22:57:27 +00:00
Saleem Abdulrasool
5335b49f96 X86: correct 64-bit atomics on 32-bit
We would emit a libcall for a 64-bit atomic on x86 after SVN r212119.  This was
due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a
32-bit target.  They were added at different times and would result in the
border condition being mishandled.

This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic
operations on x86 at the cost of restoring a long-standing bug in the codegen.
We emit a cmpxchg8b on all x86 targets even where the CPU does not support this
instruction (pre-Pentium CPUs).  Although this bug should be fixed, this was
present prior to SVN r212119 and this change, so this is not really introducing
a regression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212956 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-14 16:28:13 +00:00
Tim Northover
9e251d6d41 X86: remove temporary atomicrmw used during lowering.
We construct a temporary "atomicrmw xchg" instruction when lowering atomic
stores for widths that aren't supported natively. This isn't on the top-level
worklist though, so it won't be removed automatically and we have to do it
ourselves once that itself has been lowered.

Thanks Saleem for pointing this out!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212948 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-14 15:31:13 +00:00
Saleem Abdulrasool
1f1e6e7884 MC: make DWARF and Windows unwinding handling more similar
Rename member variables and functions for the MCStreamer for DWARF-like
unwinding management.  Rename the Windows ones as well and make the naming and
handling similar across the two.  No functional change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212912 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-13 19:03:36 +00:00
Juergen Ributzka
b2e9db4109 Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
This reverts commit r212851, because it broke the memset lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212855 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 23:10:08 +00:00
Juergen Ributzka
83772afe0e [FastISel][X86] Implement the FastLowerIntrinsicCall hook.
Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively
implements the target hook.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212851 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 22:37:43 +00:00
Quentin Colombet
52c50f5ec5 [X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI.
Also add a few comments.

<rdar://problem/17581756>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212808 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 12:08:23 +00:00
Adam Nemet
11c9a50611 [X86] AVX512: Improve readability of isCDisp8
No functional change.  As I was trying to understand this function, I found
that variables were reused with confusing names and the broadcast case was a
bit too implicit.  Hopefully, this is an improvement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212795 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 05:23:25 +00:00
Adam Nemet
fcc56eb31f [X86] AVX512: Simplify logic in isCDisp8
It was computing the VL/n case as:
  MemObjSize = VectorByteSize / ElemByteSize / Divider * ElemByteSize

ElemByteSize not only falls out but VectorByteSize/Divider now actually
matches the definition of VL/n.

Also some formatting fixes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212794 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 05:23:12 +00:00
Akira Hatanaka
60079c18bc [X86] Mark pseudo instruction TEST8ri_NOEREX as hasSIdeEffects=0.
Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable
macro-fusion.

<rdar://problem/15680770>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 18:00:53 +00:00
Zinovy Nis
a2bc403710 [x32] Add AsmBackend for X32 which uses ELF32 with x86_64 (the author is Pavel Chupin).
This is minimal change for backend required to have "hello world" compiled and working on x32 target (x86_64-linux-gnux32). More patches for x32 will follow.

Differential Revision: http://reviews.llvm.org/D4181



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212716 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 13:03:26 +00:00
Chandler Carruth
a4e7f05a28 [x86] Add another combine that is particularly useful for the new vector
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.

This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 11:09:29 +00:00
Chandler Carruth
977aab501d [x86] Expand the target DAG combining for PSHUFD nodes to be able to
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.

This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212695 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 09:57:36 +00:00
Chandler Carruth
dc90a3ab8f [x86] Tweak the v16i8 single input special case lowering for shuffles
that splat i8s into i16s.

Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.

With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 09:16:40 +00:00
Chandler Carruth
95b14b00da [x86] Initial improvements to the new shuffle lowering for v16i8
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.

This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.

There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 04:34:06 +00:00
Chandler Carruth
c4788790f6 [x86] Refactor some of the new code for lowering v16i8 shuffles to
remove duplication and make it easier to select different strategies.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212674 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 02:24:26 +00:00
Chandler Carruth
515f228ec3 [SDAG] Make the new zext-vector-inreg node default to expand so targets
don't need to set it manually.

This is based on feedback from Tom who pointed out that if every target
needs to handle this we need to reach out to those maintainers. In fact,
it doesn't make sense to duplicate everything when anything other than
expand seems unlikely at this stage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212661 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 22:53:04 +00:00
Benjamin Kramer
08b75dd061 TargetRegisterInfo: Remove function that fell out of use years ago.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212636 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 18:53:57 +00:00
Adam Nemet
074b752cc9 [X86] AVX512: Enable it in the Loop Vectorizer
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.

The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 18:22:33 +00:00
Benjamin Kramer
4afbd3e941 X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.

Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 11:12:39 +00:00
Chandler Carruth
ce184e95f9 [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening
vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Differential Revision: http://reviews.llvm.org/D4405

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:58:18 +00:00
Chandler Carruth
a2a8a72f6f [x86] Initialize a pointer to null to fix a bug in r212602.
This should restore GCC hosts (which happen to put the bad stuff into
the pointer) and MSan, etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212606 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:36:42 +00:00
Chandler Carruth
98eac0a244 [x86] Re-apply a variant of the x86 side of r212324 now that the rest
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.

The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.

This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
  (where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
  now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
  a correct check of whether the splat is the only user of a loaded
  value, checking that the splat actually crosses multiple lanes before
  using a broadcast, and handling broadcasts of non-constant splats.

As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:06:58 +00:00
Chandler Carruth
25b7d54e7f [x86,SDAG] Sink the logic for folding shuffles of splats more
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.

This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.

First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.

Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.

Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.

But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 08:45:38 +00:00
Adam Nemet
f189d9cdf7 [X86] AVX512: Only allow k1-k7 as predicates to vpcmp*
As destination k0 is allowed but not as predicate/writemask.

I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 00:22:32 +00:00
Andrea Di Biagio
cfb83b7bac [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types.
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.

define <4 x float> @test1(<4 x float> %V) {
  %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  ret <4 x float> %2
}



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212498 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 23:25:23 +00:00
Juergen Ributzka
1154be8198 [FastISel][X86] Fix smul.with.overflow.i8 lowering.
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.

This fixes <rdar://problem/17549300>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 21:52:21 +00:00