Commit Graph

10453 Commits

Author SHA1 Message Date
Chandler Carruth
fb1293fd4c [x86] Teach the target shuffle mask extraction to recognize unary forms
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.

This detects cases where a single register is used for both operands,
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form, which allows the existing DAG combine for
shuffle instructions to actually work at all.

As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.

Also as a consequence, a bunch of missed opportunities to form pshufb
now can be formed. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially when splatting the 0th byte (the
common case), since then we can use a zeroed register as the mask.
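
For reference, the unary-form handling boils down to a mask rewrite. A minimal
sketch of the idea (the function name here is invented, not the actual code in
the backend):

  #include <vector>

  // If both shuffle operands turn out to be the same register, lane indices in
  // [NumElts, 2*NumElts) refer to a copy of the first operand, so remap them
  // onto [0, NumElts) and treat the shuffle as unary from then on.
  static void foldMaskToUnaryForm(std::vector<int> &Mask, int NumElts) {
    for (int &M : Mask)
      if (M >= NumElts)
        M -= NumElts;
  }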

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:27:38 +00:00
Chandler Carruth
8a64a46c97 [x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as
PSHUFB forms. This will be important to update some AVX tests when I add
PSHUFB combining.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214624 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:08:17 +00:00
Akira Hatanaka
306030f8aa [X86] Simplify X87 stackifier pass.
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.

It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction was live across another inline-asm instruction, as shown
in the following sequence of machine instructions:

1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0

<rdar://problem/16952634>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214580 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 22:19:41 +00:00
Eric Christopher
f9799dbe51 Add a non-const subtarget returning function to the target machine
so that we can use it to get the old-style JIT out of the subtarget.

This code should be removed when the old-style JIT is removed
(imminently).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214560 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 21:18:01 +00:00
Reid Kleckner
ab418066a2 MS inline asm: Use memory constraints for functions instead of registers
This is consistent with how we parse them in a standalone .s file, and
inline assembly shouldn't differ.

This fixes errors about requiring more registers than available in
cases like this:
  void f();
  void __declspec(naked) g() {
    __asm pusha
    __asm call f
    __asm popa
    __asm ret
  }

There are no registers available to pass the address of 'f' into the asm
blob.  The asm should now directly call 'f'.

Tests will land in Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214550 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 20:21:24 +00:00
Philip Reames
ec9de4677a Add support for StackMap section for ELF/Linux systems
This patch adds code to emit the StackMap section on ELF systems. This section is required to support the llvm.experimental.stackmap and llvm.experimental.patchpoint intrinsics.

Reviewers: ributzka, echristo

Differential Revision: http://reviews.llvm.org/D4574



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214538 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 18:47:09 +00:00
Reid Kleckner
21e23ab6f9 MS inline asm: Fix null SMLoc when 'ptr' is missing after dword & co
This improves the diagnostics from the regular assembler, but more
importantly it fixes an assertion when parsing inline assembly.  Test
landing in Clang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214468 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 00:59:22 +00:00
Kevin Enderby
42deb12738 Add support for the X86 Software Guard Extensions (SGX) instructions in the assembler.
This allows assembling the two new instructions, encls and enclu, for the
SKX processor model.

Note the diffs are bigger than one might think, but to fit the new
MRM_CF and MRM_D7 forms into the right places, things had to be
renumbered and shuffled down, causing a few more diffs.

rdar://16228228


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214460 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:57:38 +00:00
Reid Kleckner
0b3444cca9 X86 MC: Don't crash on empty memory operand parens
Instead, create an absolute memory operand.

Fixes PR20504.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214457 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:26:35 +00:00
Reid Kleckner
7895ae3135 X86 MC: Reject invalid segment registers before a memory operand colon
Previously we would execute unreachable during object emission.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214456 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:03:22 +00:00
Louis Gerbarg
7d54c5b0f2 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently, when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load, the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reorder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
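
Illustratively, the conservative rule the fix enables is simply that the merged
load may only be marked invariant when every load it replaces was invariant (a
sketch with invented names, not the actual DAGCombiner code):

  #include "llvm/CodeGen/SelectionDAGNodes.h"

  // The new load created for the switch result inherits invariance only if
  // both original loads were invariant; otherwise it must stay conservative.
  static bool newLoadIsInvariant(const llvm::LoadSDNode *LHS,
                                 const llvm::LoadSDNode *RHS) {
    return LHS->isInvariant() && RHS->isInvariant();
  }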

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 21:45:05 +00:00
Evgeniy Stepanov
8a78bb9836 [asan] Support x86 REP MOVS asm instrumentation.
Patch by Yuri Gorshenin.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214395 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 09:11:04 +00:00
Juergen Ributzka
450c33b212 [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering.
UNDEF arguments are not meant to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214366 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 00:11:11 +00:00
Reid Kleckner
a749eccfed X86 asm parser: Avoid duplicating the list of aliased instructions
No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214364 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 00:07:33 +00:00
Reid Kleckner
7af08a4a9b X86 asm parser: Use a loop to disambiguate suffixes instead of copy paste
This works towards making the Intel syntax asm matcher use a completely
different disambiguation strategy.

No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214352 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 22:23:11 +00:00
Juergen Ributzka
07731b34fb [FastISel] Move the helper function isCommutativeIntrinsic into FastISel base class.
Move the helper function isCommutativeIntrinsic into the FastISel base class,
so it can be used by more than just one backend.
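
Roughly the shape such a shared helper takes (the exact intrinsic list handled
by the patch may differ; this is just a sketch):

  #include "llvm/IR/IntrinsicInst.h"
  using namespace llvm;

  static bool isCommutativeIntrinsic(const IntrinsicInst *II) {
    switch (II->getIntrinsicID()) {
    case Intrinsic::sadd_with_overflow:
    case Intrinsic::uadd_with_overflow:
    case Intrinsic::smul_with_overflow:
    case Intrinsic::umul_with_overflow:
      return true;
    default:
      return false;
    }
  }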

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 22:04:28 +00:00
Robert Khasanov
281d2bf320 [SKX] Enabling mask logic instructions: encoding, lowering
Instructions: KAND{BWDQ}, KANDN{BWDQ}, KOR{BWDQ}, KXOR{BWDQ}, KXNOR{BWDQ}

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214081 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-28 13:46:45 +00:00
Matt Arsenault
2dd264c8a3 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
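
A hedged sketch of the kind of check the alignment value makes possible (a free
function with invented names; the real hook is a TargetLowering override and
the real R600 logic considers more cases):

  #include "llvm/CodeGen/ValueTypes.h"

  // 8- and 16-byte accesses are acceptable as long as they are at least
  // 4-byte aligned, so legalization does not need to split them.
  static bool accessIsOkUnsplit(llvm::EVT VT, unsigned Align) {
    return VT.getStoreSize() >= 8 && Align >= 4;
  }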

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214055 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 17:46:40 +00:00
Chandler Carruth
33153513fb [x86] Sink a variable only used by asserts into the asserts. Should fix
some -Werror bots, sorry for the noise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214043 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 01:45:49 +00:00
Chandler Carruth
a6f9501b62 [x86] Add a much more powerful framework for combining x86 shuffle
instructions in the legalized DAG, and leverage it to combine long
sequences of instructions to PSHUFB.

Eventually, the other x86-instruction-specific shuffle combines will
probably all be driven out of this routine. But the real motivation is
to detect after we have fully legalized and optimized a shuffle to the
minimal number of x86 instructions whether it is profitable to replace
the chain with a fully generic PSHUFB instruction even though doing so
requires either a load from a constant pool or tying up a register with
the mask.

While the Intel manuals claim it should be used when it replaces 5 or
more instructions (!!!!), my experience is that it is actually very fast
on modern chips, and so I've gone with a much more aggressive model of
replacing any sequence of 3 or more instructions.

I've also taught it to do some basic canonicalization to special-purpose
instructions which have smaller encodings than their generic
counterparts.

There are still quite a few FIXMEs here, and I've not yet implemented
support for lowering blends with PSHUFB (where its power really shines
due to being able to zero out lanes), but this starts implementing real
PSHUFB support even when using the new, fancy shuffle lowering. =]
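
The core operation behind the combine, shown illustratively, is composing the
masks of a chain of single-input shuffles so the whole chain collapses into one
byte mask suitable for PSHUFB (invented helper; undef lanes encoded as -1):

  #include <vector>

  // Result lane i of the outer shuffle reads Inner[Outer[i]]; propagate
  // undef (-1) lanes rather than indexing with them.
  static std::vector<int> composeMasks(const std::vector<int> &Inner,
                                       const std::vector<int> &Outer) {
    std::vector<int> Composed(Outer.size());
    for (size_t i = 0; i != Outer.size(); ++i)
      Composed[i] = Outer[i] < 0 ? -1 : Inner[Outer[i]];
    return Composed;
  }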

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214042 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 01:15:58 +00:00
Nick Lewycky
c94ff3dc78 Fix broken assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 05:44:15 +00:00
NAKAMURA Takumi
09ed816174 X86ShuffleDecode.cpp: Silence a warning. [-Wunused-variable]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214016 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 04:53:05 +00:00
Chandler Carruth
5bce4d8edf [x86] Fix PR20355 (for real). There are many layers to this bug.
The tale starts with r212808 which attempted to fix inversion of the low
and high bits when lowering MUL_LOHI. Sadly, that commit did not include
any positive test cases, and just removed some operations from a test
case where the actual logic being changed isn't fully visible from the
test.

What this commit did was two things. First, it reversed the low and high
results in the formation of the MERGE_VALUES node for the multiple
results. This is entirely correct.

Second it changed the shuffles for extracting the low and high
components from the i64 results of the multiplies to extract them
assuming a big-endian-style encoding of the multiply results. This
second change is wrong. There is no big-endian encoding in x86, the
results of the multiplies are normal v2i64s: when cast to v4i32, the low
i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3.

However, the first change wasn't enough to actually fix the bug, which
is (I assume) why the second change was also made. There was another bug
in the MERGE_VALUES formation: we weren't using a VTList, and so were
getting a single result node! When grabbing the *second* result from the
node, we got... well, it could be anything. I think this *appeared* to
invert things, but it had to be causing other problems as well.

Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
have been fine, right? NOOOPE! Because the core bug was never addressed,
the test in vector-idiv failed when I fixed the MERGE_VALUES node.
Because there are essentially no docs for this node, I had to guess at
how to fix it and tried swapping the operands, restoring the order of
the original code before r212808. While this "fixed" the test case (in
that we produced the right instructions), we were still extracting the
wrong elements of the i64s, and thus PR20355 was still broken.

This commit essentially reverts the big-endian-style extraction part of
r212808 and goes back to the original masks which were correct. Now that
the MERGE_VALUES node formation is also correct, everything works. I've
also included a more detailed test from PR20355 to make sure this stays
fixed.
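
For reference, on little-endian x86 a v2i64 holding the products P0 and P1 is
laid out as a v4i32 like this:

  v4i32 lane:  0        1        2        3
  contents:    lo(P0)   hi(P0)   lo(P1)   hi(P1)

so the low halves are extracted with the shuffle mask {0, 2} and the high
halves with {1, 3}.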

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 03:46:57 +00:00
Chandler Carruth
86de7ad211 [x86] Revert r214007: Fix PR20355 ...
The clever way to implement signed multiplication in terms of unsigned *is
already implemented*, tested, and working correctly. The bug is
somewhere else. Re-investigating.

This will teach me to not scroll far enough to read the code that did
what I thought needed to be done.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214009 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 02:14:54 +00:00
Chandler Carruth
47a12d8d2c [x86] Fix PR20355 (and dups) by not using unsigned multiplication when
signed multiplication is requested. While there is not a difference in
the *low* half of the result, the *high* half (used specifically to
implement the signed division by these constants) certainly is used. The
test case I've nuked was actively asserting wrong code.

There is a delightful solution to doing signed multiplication even when
we don't have it that Richard Smith has crafted, but I'll add the
machinery back and implement that in a follow-up patch. This at least
restores correctness.
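
A small worked example of why the *high* half differs (8-bit lanes for brevity;
the patch concerns wider types, but the arithmetic is the same):

  unsigned: 0xFF * 0x03 = 255 * 3 = 765 = 0x02FD  -> high byte 0x02
  signed:   0xFF * 0x03 =  -1 * 3 =  -3 = 0xFFFD  -> high byte 0xFF

The low byte (0xFD) is identical either way, which is why the bug only shows up
when the high half is consumed, as in the signed division-by-constant expansion.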

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214007 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 01:52:13 +00:00
NAKAMURA Takumi
5eb69c2337 Update X86/Utils/LLVMBuild.txt corresponding to r213986. "Core" has been introduced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213995 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 00:45:43 +00:00
Chandler Carruth
b1fa0cf8b4 [x86] Fix unused variable warning in no-asserts build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213989 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 00:04:41 +00:00
Chandler Carruth
30e89dd882 [x86] Teach the X86 backend to print shuffle comments for PSHUFB
instructions which happen to have a constant mask.

Currently, this only handles a very narrow set of cases, but those
happen to be the cases that I care about for testing shuffles sanely.
This is a bit trickier than other shuffle instructions because we're
decoding constants out of the constant pool. The current MC layer makes
it completely impossible to inspect a constant pool entry, so we have to
do it at the MI level and attach the comment to the streamer on its way
out. So no joy for disassembling, but it does make test cases and asm
dumps *much* nicer.

Sorry for no test cases, but it didn't really seem that valuable to go
trolling through existing old test cases and updating them. I'll have
lots of testing of this in the upcoming patch for SSSE3 emission in the
new vector shuffle lowering code paths.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213986 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 23:47:11 +00:00
Akira Hatanaka
0651a556fe [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new
target-independent node and pseudo instruction which gets expanded post-RA into a
sequence of instructions that load the stack guard value. The register allocator
can now just rematerialize the value when it can't keep it in a register.

<rdar://problem/12475629>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213967 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 19:31:34 +00:00
Chandler Carruth
568ab6a8dc [SDAG] Enable the new assert for out-of-range result numbers in
SDValues, fixing the two bugs left in the regression suite.

The key for both of these was the use a single value type rather than
a VTList which caused an unintentionally single-result merge-value node.
Fix this by getting the appropriate VTList in place.

Doing this exposed that the comments in x86's code about how MUL_LOHI
operands are handled are wrong. The bug with the use of out-of-range
result numbers was hiding the bug about the order of operands here (as
best I can tell). There are more places where the code still appears to
get this backwards...
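
For reference, the difference is just in how the node is built; a minimal
sketch with invented variable names:

  // Wrong: a single result type produces a one-result node, so asking for
  // result number 1 is out of range.
  SDValue Bad = DAG.getNode(ISD::SMUL_LOHI, DL, VT, LHS, RHS);

  // Right: a VTList gives the node both the low and high results.
  SDVTList VTs = DAG.getVTList(VT, VT);
  SDValue Mul = DAG.getNode(ISD::SMUL_LOHI, DL, VTs, LHS, RHS);
  SDValue Lo = Mul.getValue(0);
  SDValue Hi = Mul.getValue(1);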

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213931 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 09:19:23 +00:00
Lang Hames
76cbffa2a9 [X86] Clarify some stackmap shadow optimization code as based on review
feedback from Eric Christopher.

No functional change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213917 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 02:29:19 +00:00
Chandler Carruth
1b1fbccf49 [x86] Make vector legalization of extloads work more like the "normal"
vector operation legalization with support for custom target lowering
and fallback to expand when it fails, and use this to implement sext and
anyext load lowering for x86 in a more principled way.

Previously, the x86 backend relied on a target DAG combine to "combine
away" sextload and extload nodes prior to legalization, or would expand
them during legalization with terrible code. This is particularly
problematic because the DAG combine relies on running over non-canonical
DAG nodes at just the right time to match several common and important
patterns. It used a combine rather than lowering because we didn't have
good lowering support, and to expose some tricks being employed to more
combine phases.

With this change it becomes a proper lowering operation, the backend
marks that it can lower these nodes, and I've added support for handling
the canonical forms that don't have direct legal representations such as
sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
for this behavior continue to pass even after the DAG combiner begins
running more systematically over every node.

There is some noise caused by this in the test suite where we actually
use vector extends instead of subregister extraction. This doesn't
really seem like the right thing to do, but is unlikely to be a critical
regression. We do regress in one case where by lowering to the
target-specific patterns early we were able to combine away extraneous
legal math nodes. However, this regression is completely addressed by
switching to a widening based legalization which is what I'm working
toward anyways, so I've just switched the test to that mode.

Differential Revision: http://reviews.llvm.org/D4654

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 22:09:56 +00:00
Lang Hames
b96e833817 [X86] Optimize stackmap shadows on X86.
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.

To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.

This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.
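
A minimal sketch of the book-keeping idea (the names are illustrative, not the
actual StackMapShadowTracker interface):

  // Track how many instruction bytes have been emitted since the last
  // stackmap requested a shadow, and pad with nops only for what is missing.
  struct ShadowTracker {
    unsigned RequiredShadowSize = 0; // shadow bytes the last stackmap asked for
    unsigned InstrBytesEmitted = 0;  // instruction bytes seen since then

    void startShadow(unsigned Size) {
      RequiredShadowSize = Size;
      InstrBytesEmitted = 0;
    }
    void countInstruction(unsigned Bytes) { InstrBytesEmitted += Bytes; }

    // Nop padding still required before the next stackmap/patchpoint or at
    // the end of the basic block.
    unsigned paddingNeeded() const {
      return InstrBytesEmitted >= RequiredShadowSize
                 ? 0
                 : RequiredShadowSize - InstrBytesEmitted;
    }
  };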

<rdar://problem/14959522>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213892 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 20:40:55 +00:00
Reid Kleckner
5b93c8af72 Replace an assertion with a fatal error
Frontends are responsible for putting inalloca on parameters that would
be passed in memory and not registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213891 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 19:53:33 +00:00
Saleem Abdulrasool
9a050915f9 X86: silence sign comparison warning
GCC 4.8 detected a signed compare [-Wsign-compare].  Add a cast for the
destination index.  Add an assert to catch a potential overflow, however unlikely
it may be.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213878 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 17:12:06 +00:00
NAKAMURA Takumi
9fda421ee4 Update library dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 02:10:42 +00:00
Filipe Cabecinhas
d28cfd150b Fixed PR20411 - bug in getINSERTPS()
When we had a vector_shuffle where we had an input from each vector, we
could miscompile it because we were assuming the input from V2 wouldn't
be moved from where it was in the vector.

Added a test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213826 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 01:28:21 +00:00
Rafael Espindola
34c5b5b952 Finish inverting the MC -> Object dependency.
There were still some disassembler bits in lib/MC, but their use of Object
was only visible in the includes they used, not in the symbols.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213808 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 22:26:07 +00:00
Jim Grosbach
adb6a3649e [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
The transform to constant fold unary operations with an AND across a
vector comparison applies even when the constant is not a splat of a
scalar.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 20:41:43 +00:00
Jim Grosbach
4070037e3c X86: restrict combine to when type sizes are safe.
The folding of unary operations through a vector compare and mask operation
is only safe if the unary operation result is of the same size as its input.
For example, it's not safe for [su]itofp from v4i32 to v4f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213799 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 20:41:38 +00:00
Robert Khasanov
3922da8ae8 [SKX] Enabling mask instructions: encoding, lowering
KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213757 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 14:49:42 +00:00
Andrea Di Biagio
bf0fb36d72 Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions".
This change fully reverts r211771.
That revision added a canonicalization rule which has the potential to cause a
combine cycle in the target-independent canonicalizing DAG combine.

The plan is to move the logic that forms target-specific addsub nodes into
the lowering of shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213736 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 11:20:24 +00:00
Tim Northover
50f2f1434c X86: drop relocations on __eh_frame sections globally.
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).

This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.

PR20212, rdar://problem/17544795

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 15:47:09 +00:00
Elena Demikhovsky
bf348c4e46 AVX-512: Fixed intrinsic of VSQRTPS/PD instructions.
I set the number and types of the parameters according to the GCC intrinsics.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 11:07:31 +00:00
Robert Khasanov
aac33cfc08 [SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features.
Enabling HasAVX512{DQ,BW,VL} predicates.
Adding VK2, VK4, VK32, VK64 masked register classes.
Adding new types (v64i8, v32i16) to VR512.
Extending calling conventions for new types (v64i8, v32i16)

Patch by Zinovy Nis <zinovy.y.nis@intel.com>
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213545 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 14:54:21 +00:00
Andrea Di Biagio
3d1975d44b [DAG] Refactor some logic. No functional change.
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preparation for an upcoming change to the DAGCombiner.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213503 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 07:28:51 +00:00
Tim Northover
e683321270 X86: support fpext/fptrunc operations to and from 16-bit floats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213374 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:25 +00:00
Jim Grosbach
c6058e2462 X86: Constant fold converting vector setcc results to float.
Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp, through the mask
operation and constant fold the operation away. Generally speaking:
  UNARYOP(AND(VECTOR_CMP(x,y), constant))
      --> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).

This implements the transform where UNARYOP is [su]int_to_fp.

For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}

Before this change, the SSE code is generated as:
LCPI0_0:
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _foo
  .align  4, 0x90
_foo:                                   ## @foo
  cmpeqps %xmm1, %xmm0
  andps LCPI0_0(%rip), %xmm0
  cvtdq2ps  %xmm0, %xmm0
  retq

After, the code is improved to:
LCPI0_0:
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _foo
  .align  4, 0x90
_foo:                                   ## @foo
  cmpeqps %xmm1, %xmm0
  andps LCPI0_0(%rip), %xmm0
  retq

The cvtdq2ps has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via the ModRM operand of andps.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213342 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 00:40:56 +00:00
Nico Weber
c1ef24ce39 ms inline asm: Don't add x86 segment registers to the clobber list.
Clang tries to check the clobber list but doesn't list segment registers in its
x86 register list. This fixes PR20343.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213303 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 20:24:55 +00:00
Adam Nemet
6ae2941874 [X86] AVX512: Add disassembler support for compressed displacement
There are two parts here.  The first is to modify tablegen to adjust the encoding
type ENCODING_RM with the scaling factor.

The second is to use the new encoding types to compute the correct
displacement in the decoder.
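
As a worked example of the scaling (the exact scale N depends on the
instruction's tuple type, so treat the numbers as illustrative): for a full
64-byte zmm memory operand, the displacement in

  vmovaps 0x80(%rax), %zmm0

is 0x80 = 128 = 2 * 64, so it can be encoded as the single byte disp8 = 2, and
the decoder has to multiply it back by N = 64 to recover the real displacement.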

Fixes <rdar://problem/17608489>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 17:04:56 +00:00