10519 Commits

Author SHA1 Message Date
Chandler Carruth
e6329cf303 [x86] Fix a crash and wrong-code bug in the new vector lowering all
found by a single test reduced out of a failure on llvm-stress.

The start of the problem (and the crash) came when we tried to use
a find of a non-used slot in the move-to half of the move-mask as the
target for two bad-half inputs. While if lucky this will be the first of
a pair of slots which we can place the bad-half inputs into, it isn't
actually guaranteed. This really isn't surprising, not sure what I was
thinking. The correct way to find the two unused slots is to look for
one of the *used* slots. We know it isn't that pair, and we can use some
modular arithmetic to find the other pair by masking off the odd bit and
adding 2 modulo 4. With this, we reliably found a viable pair of slots
for the bad-half inputs.
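
The slot arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the four slots of a half are numbered 0..3 and grouped into the pairs {0,1} and {2,3}; the helper name `otherPairStart` is hypothetical and does not come from the patch.

```cpp
#include <cassert>

// Given any slot that is already in use, return the first slot of the
// *other* pair. Pairs are {0,1} and {2,3}.
// NOTE: hypothetical helper, written only to illustrate the arithmetic.
inline int otherPairStart(int UsedSlot) {
  assert(UsedSlot >= 0 && UsedSlot < 4 && "slot index out of range");
  int PairStart = UsedSlot & ~1; // mask off the odd bit: yields 0 or 2
  return (PairStart + 2) % 4;    // step to the other pair, modulo 4
}
```

For a used slot of 0 or 1 this yields 2, and for a used slot of 2 or 3 it yields 0, so the result never lands in the pair containing the used slot.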

Sadly, that wasn't enough. We also had a wrong code bug that surfaced
when I reduced the test case for this where we would use the same slot
twice for the two bad inputs. This is because both of the bad inputs
could be in odd slots originally and thus the mod-2 mapping would
actually be the same. The whole point of the weird indexing into the
pair of empty slots was to try to leverage when the end result needed
the two bad-half inputs to be paired in a dword and pre-pair them in the
correct orientation. This is less important with the powerful combining
we're now doing, and also easier and more reliable to achieve by noting
that we add the bad-half inputs in order. Thus, if they are in a dword
pair, the low part of that will be the first input in the sequence.
Always putting that in the low element will just do the right thing in
addition to computing the correct result.

Test case added. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214849 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 08:19:21 +00:00
Eric Christopher
6035518e3b Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 02:39:49 +00:00
Eric Christopher
9f85dccfc6 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:25:23 +00:00
Reid Kleckner
0b04cf6e71 Fix failure to invoke exception handler on Win64
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range.  This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.

Differential Revision: http://reviews.llvm.org/D4751

Patch by Vadim Chugunov!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214775 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:05:27 +00:00
Akira Hatanaka
93a454952b [X86] Place parentheses around "isMask_32(STReturns) && N <= 2".
This corrects r214672, which was committed to silence a gcc warning.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 17:23:38 +00:00
Robert Khasanov
7017934668 [SKX] Enabling load/store instructions: encoding
Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32, VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32, VMOVDQU64, VMOVUPD, VMOVUPS

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214719 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 14:35:15 +00:00
Aaron Ballman
c245c18ba3 Improving the name of the function parameter, which happens to solve two likely-less-than-useful MSVC warnings: warning C4258: 'I' : definition from the for loop is ignored; the definition from the enclosing scope is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 13:51:27 +00:00
Chandler Carruth
48593d7934 [x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever
use of PACKUS. It's cleaner that way.

I looked at implementing clever combine-based folding of PACKUS chains
into PSHUFB but it is quite hard and doesn't seem likely to be worth it.
The most annoying part would be detecting that the correct masking had
been done to use PACKUS-style instructions as a blend operation rather
than there being any saturating as is indicated by its name. We generate
really nice code for what few test cases I've come up with that aren't
completely contrived for this by just directly preferring PSHUFB and so
let's go with that strategy for now. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 10:17:35 +00:00
Chandler Carruth
93f5d9f093 [x86] Implement more aggressive use of PACKUS chains for lowering common
patterns of v16i8 shuffles.

This implements one of the more important FIXMEs for the SSE2 support in
the new shuffle lowering. We now generate the optimal shuffle sequence
for truncate-derived shuffles which show up essentially everywhere.

Unfortunately, this exposes a weakness in other parts of the shuffle
logic -- we can no longer form PSHUFB here. I'll add the necessary
support for that and other things in a subsequent commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214702 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 09:40:02 +00:00
Chandler Carruth
73100d8f33 [x86] Handle single input shuffles in the SSSE3 case more intelligently.
I spent some time looking into a better or more principled way to handle
this. For example, by detecting arbitrary "unneeded" ORs... But really,
there wasn't any point. We just shouldn't build blatantly wrong code so
late in the pipeline rather than adding more stages and logic later on
to fix it. Avoiding this is just too simple.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 01:14:24 +00:00
Saleem Abdulrasool
f83017c0b7 X86: silence warning (-Wparentheses)
GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition:

lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
   assert(STReturns == 0 || isMask_32(STReturns) && N <= 2);
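
A minimal sketch of the parenthesized condition that silences this warning. Because `&&` already binds tighter than `||`, the added parentheses do not change the result. `isLowBitMask32` is a stand-in written for this example (true for a non-empty run of ones starting at bit 0), and `stReturnsLooksValid` is a hypothetical wrapper, not code from the file.

```cpp
#include <cstdint>

// Stand-in predicate: true when Value is a non-empty run of ones starting
// at bit 0, e.g. 0x1, 0x3, 0x7. Written for this example only.
inline bool isLowBitMask32(uint32_t Value) {
  return Value != 0 && ((Value + 1) & Value) == 0;
}

// Hypothetical wrapper around the asserted condition. The explicit
// parentheses make the grouping visible; the truth table is the same as
// for the unparenthesized form.
inline bool stReturnsLooksValid(uint32_t STReturns, unsigned N) {
  return STReturns == 0 || (isLowBitMask32(STReturns) && N <= 2);
}
```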

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214672 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-03 23:00:39 +00:00
Saleem Abdulrasool
83eaeba2a8 MC: virtualise EmitWindowsUnwindTables
This makes EmitWindowsUnwindTables a virtual function and lowers the
implementation of the function to the X86WinCOFFStreamer.  This method is a
target specific operation.  This enables making the behaviour target dependent
by isolating it entirely to the target specific streamer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214664 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-03 18:51:26 +00:00
Chandler Carruth
caf471e820 [x86] Remove the FIXME that was implemented in r214628. Managed to
forget to update the comment here... =/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214630 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 11:34:23 +00:00
Chandler Carruth
1029c7003f [x86] Largely complete the use of PSHUFB in the new vector shuffle
lowering with a small addition to it and adding PSHUFB combining.

There is one obvious place in the new vector shuffle lowering where we
should form PSHUFBs directly: when without them we will unpack a vector
of i8s across two different registers and do a potentially 4-way blend
as i16s only to re-pack them into i8s afterward. This is the crazy
expensive fallback path for i8 shuffles and we can just directly use
pshufb here as it will always be cheaper (the unpack and pack are
two instructions so even a single shuffle between them hits our
three instruction limit for forming PSHUFB).

However, this doesn't generate very good code in many cases, and it
leaves a bunch of common patterns not using PSHUFB. So this patch also
adds support for extracting a shuffle mask from PSHUFB in the X86
lowering code, and uses it to handle PSHUFBs in the recursive shuffle
combining. This allows us to combine through them, combine multiple ones
together, and generally produce sufficiently high quality code.
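
Combining two byte shuffles into one amounts to composing their masks. The sketch below assumes a simplified representation in which each mask entry is a source lane index and -1 marks a zeroed lane; `Mask16` and `composeMasks` are hypothetical names, and this is not the combiner's actual code.

```cpp
#include <array>
#include <cstddef>

// One entry per output byte: a source lane index, or -1 for a zeroed lane.
using Mask16 = std::array<int, 16>;

// Compose two shuffles applied back to back: Inner runs first, Outer second.
// Output lane I reads lane Outer[I] of the intermediate value, which in turn
// came from lane Inner[Outer[I]] of the original source.
inline Mask16 composeMasks(const Mask16 &Inner, const Mask16 &Outer) {
  Mask16 Combined{};
  for (std::size_t I = 0; I < Combined.size(); ++I)
    Combined[I] = Outer[I] < 0 ? -1 : Inner[Outer[I]];
  return Combined;
}
```

A zeroed lane in either input mask stays zeroed in the composed mask, so a chain of shuffles collapses to a single mask that one PSHUFB could apply.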

Extracting the PSHUFB mask is annoyingly complex because it could be
either pre-legalization or post-legalization. At least this doesn't have
to deal with re-materialized constants. =] I've added decode routines to
handle the different patterns that show up at this level and we dispatch
through them as appropriate.

The two primary test cases are updated. For the v16 test case there is
still a lot of room for improvement. Since I was going through it
systematically I left behind a bunch of FIXME lines that I'm hoping to
turn into ALL lines by the end of this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:39:15 +00:00
Chandler Carruth
a66122515a [x86] Switch to using the variable we extracted this operand into.
Spotted this missed refactoring by inspection when reading code, and it
doesn't change the functionality at all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214627 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:29:36 +00:00
Chandler Carruth
3c92a7aac1 [x86] Fix a few typos in my comments spotted in passing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214626 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:29:34 +00:00
Chandler Carruth
fb1293fd4c [x86] Teach the target shuffle mask extraction to recognize unary forms
of normally binary shuffle instructions like PUNPCKL and MOVLHPS.

This detects cases where a single register is used for both operands
making the shuffle behave in a unary way. We detect this and adjust the
mask to use the unary form which allows the existing DAG combine for
shuffle instructions to actually work at all.

As a consequence, this uncovered a number of obvious bugs in the
existing DAG combine which are fixed. It also now canonicalizes several
shuffles even with the existing lowering. These typically are trying to
match the shuffle to the domain of the input where before we only really
modeled them with the floating point variants. All of the cases which
change to an integer shuffle here have something in the integer domain, so
there are no more or fewer domain crosses here AFAICT. Technically, it
might be better to go from a GPR directly to the floating point domain,
but detecting floating point *outputs* despite integer inputs is a lot
more code and seems unlikely to be worthwhile in practice. If folks are
seeing domain-crossing regressions here though, let me know and I can
hack something up to fix it.

Also as a consequence, a bunch of missed opportunities to form pshufb
now can be formed. Notably, splats of i8s now form pshufb.
Interestingly, this improves the existing splat lowering too. We go from
3 instructions to 1. Yes, we may tie up a register, but it seems very
likely to be worth it, especially if splatting the 0th byte (the
common case) as then we can use a zeroed register as the mask.
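
To see why a zeroed register works as the mask when splatting byte 0, here is a scalar model of the 128-bit PSHUFB semantics: a control byte with bit 7 set produces zero, otherwise its low four bits select a source byte. The names `Bytes16` and `pshufbModel` are made up for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Scalar model of 128-bit PSHUFB. For each output byte: if bit 7 of the
// control byte is set the result is 0, otherwise the low 4 bits of the
// control byte select which source byte to copy.
inline Bytes16 pshufbModel(const Bytes16 &Src, const Bytes16 &Control) {
  Bytes16 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = (Control[I] & 0x80) ? 0 : Src[Control[I] & 0x0F];
  return Out;
}
```

With an all-zero control vector every output byte selects source byte 0, which is exactly a splat of the 0th byte.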

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:27:38 +00:00
Chandler Carruth
8a64a46c97 [x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as
PSHUFB forms. This will be important to update some AVX tests when I add
PSHUFB combining.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214624 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 10:08:17 +00:00
Akira Hatanaka
306030f8aa [X86] Simplify X87 stackifier pass.
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.

It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction was live across another inline-asm instruction, as shown
in the following sequence of machine instructions:

1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0

<rdar://problem/16952634>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214580 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 22:19:41 +00:00
Eric Christopher
f9799dbe51 Add a non-const subtarget returning function to the target machine
so that we can use it to get the old-style JIT out of the subtarget.

This code should be removed when the old-style JIT is removed
(imminently).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214560 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 21:18:01 +00:00
Reid Kleckner
ab418066a2 MS inline asm: Use memory constraints for functions instead of registers
This is consistent with how we parse them in a standalone .s file, and
inline assembly shouldn't differ.

This fixes errors about requiring more registers than available in
cases like this:
  void f();
  void __declspec(naked) g() {
    __asm pusha
    __asm call f
    __asm popa
    __asm ret
  }

There are no registers available to pass the address of 'f' into the asm
blob.  The asm should now directly call 'f'.

Tests will land in Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214550 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 20:21:24 +00:00
Philip Reames
ec9de4677a Add support for StackMap section for ELF/Linux systems
This patch adds code to emit the StackMap section on ELF systems. This section is required to support llvm.experimental.stackmap and llvm.experimental.patchpoint intrinsics.

Reviewers: ributzka, echristo

Differential Revision: http://reviews.llvm.org/D4574



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214538 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 18:47:09 +00:00
Reid Kleckner
21e23ab6f9 MS inline asm: Fix null SMLoc when 'ptr' is missing after dword & co
This improves the diagnostics from the regular assembler, but more
importantly it fixes an assertion when parsing inline assembly.  Test
landing in Clang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214468 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 00:59:22 +00:00
Kevin Enderby
42deb12738 Add support for the X86 Software Guard Extensions (SGX) instructions in the assembler.
This allows assembling the two new instructions, encls and enclu for the
SKX processor model.

Note the diffs are a bit bigger than one might think: to fit the new
MRM_CF and MRM_D7 into the right places, existing entries had to be
renumbered and shuffled down, causing a few more diffs.

rdar://16228228


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214460 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:57:38 +00:00
Reid Kleckner
0b3444cca9 X86 MC: Don't crash on empty memory operand parens
Instead, create an absolute memory operand.

Fixes PR20504.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214457 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:26:35 +00:00
Reid Kleckner
7895ae3135 X86 MC: Reject invalid segment registers before a memory operand colon
Previously we would execute unreachable during object emission.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214456 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 23:03:22 +00:00
Louis Gerbarg
7d54c5b0f2 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reorder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 21:45:05 +00:00
Evgeniy Stepanov
8a78bb9836 [asan] Support x86 REP MOVS asm instrumentation.
Patch by Yuri Gorshenin.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214395 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 09:11:04 +00:00
Juergen Ributzka
450c33b212 [FastISel][AArch64 and X86] Don't emit stores for UNDEF arguments during function call lowering.
UNDEF arguments are not meant to be touched - especially for the webkit_js
calling convention. This fix reproduces the already existing behavior of
SelectionDAG in FastISel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214366 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 00:11:11 +00:00
Reid Kleckner
a749eccfed X86 asm parser: Avoid duplicating the list of aliased instructions
No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214364 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 00:07:33 +00:00
Reid Kleckner
7af08a4a9b X86 asm parser: Use a loop to disambiguate suffixes instead of copy paste
This works towards making the Intel syntax asm matcher use a completely
different disambiguation strategy.

No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214352 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 22:23:11 +00:00
Juergen Ributzka
07731b34fb [FastISel] Move the helper function isCommutativeIntrinsic into FastISel base class.
Move the helper function isCommutativeIntrinsic into the FastISel base class,
so it can be used by more than just one backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 22:04:28 +00:00
Robert Khasanov
281d2bf320 [SKX] Enabling mask logic instructions: encoding, lowering
Instructions: KAND{BWDQ}, KANDN{BWDQ}, KOR{BWDQ}, KXOR{BWDQ}, KXNOR{BWDQ}

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214081 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-28 13:46:45 +00:00
Matt Arsenault
2dd264c8a3 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214055 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 17:46:40 +00:00
Chandler Carruth
33153513fb [x86] Sink a variable only used by asserts into the asserts. Should fix
some -Werror bots, sorry for the noise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214043 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 01:45:49 +00:00
Chandler Carruth
a6f9501b62 [x86] Add a much more powerful framework for combining x86 shuffle
instructions in the legalized DAG, and leverage it to combine long
sequences of instructions to PSHUFB.

Eventually, the other x86-instruction-specific shuffle combines will
probably all be driven out of this routine. But the real motivation is
to detect after we have fully legalized and optimized a shuffle to the
minimal number of x86 instructions whether it is profitable to replace
the chain with a fully generic PSHUFB instruction even though doing so
requires either a load from a constant pool or tying up a register with
the mask.

While the Intel manuals claim it should be used when it replaces 5 or
more instructions (!!!!) my experience is that it is actually very fast
on modern chips, and so I've gone with a much more aggressive model of
replacing any sequence of 3 or more instructions.

I've also taught it to do some basic canonicalization to special-purpose
instructions which have smaller encodings than their generic
counterparts.

There are still quite a few FIXMEs here, and I've not yet implemented
support for lowering blends with PSHUFB (where its power really shines
due to being able to zero out lanes), but this starts implementing real
PSHUFB support even when using the new, fancy shuffle lowering. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214042 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 01:15:58 +00:00
Nick Lewycky
c94ff3dc78 Fix broken assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 05:44:15 +00:00
NAKAMURA Takumi
09ed816174 X86ShuffleDecode.cpp: Silence a warning. [-Wunused-variable]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214016 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 04:53:05 +00:00
Chandler Carruth
5bce4d8edf [x86] Fix PR20355 (for real). There are many layers to this bug.
The tale starts with r212808 which attempted to fix inversion of the low
and high bits when lowering MUL_LOHI. Sadly, that commit did not include
any positive test cases, and just removed some operations from a test
case where the actual logic being changed isn't fully visible from the
test.

What this commit did was two things. First, it reversed the low and high
results in the formation of the MERGE_VALUES node for the multiple
results. This is entirely correct.

Second it changed the shuffles for extracting the low and high
components from the i64 results of the multiplies to extract them
assuming a big-endian-style encoding of the multiply results. This
second change is wrong. There is no big-endian encoding in x86, the
results of the multiplies are normal v2i64s: when cast to v4i32, the low
i32s are at offsets 0 and 2, and the high i32s are at offsets 1 and 3.
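
The lane layout described here can be illustrated with plain integers. This sketch models a v2i64 as two 64-bit values and spells out the little-endian v4i32 view explicitly with shifts, so it does not depend on the host's byte order; all function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// View two 64-bit products as four 32-bit lanes in x86 (little-endian)
// order: each i64 contributes its low half first, then its high half.
inline std::array<uint32_t, 4> asV4I32(const std::array<uint64_t, 2> &V) {
  return {uint32_t(V[0]), uint32_t(V[0] >> 32),
          uint32_t(V[1]), uint32_t(V[1] >> 32)};
}

// Low i32s live at lane offsets 0 and 2.
inline std::array<uint32_t, 2> lowHalves(const std::array<uint32_t, 4> &L) {
  return {L[0], L[2]};
}

// High i32s live at lane offsets 1 and 3.
inline std::array<uint32_t, 2> highHalves(const std::array<uint32_t, 4> &L) {
  return {L[1], L[3]};
}
```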

However, the first change wasn't enough to actually fix the bug, which
is (I assume) why the second change was also made. There was another bug
in the MERGE_VALUES formation: we weren't using a VTList, and so were
getting a single result node! When grabbing the *second* result from the
node, we got... well.. could be anything. I think this *appeared* to
invert things, but had to be causing other problems as well.

Fortunately, I fixed the MERGE_VALUES issue in r213931, so we should
have been fine, right? NOOOPE! Because the core bug was never addressed,
the test in vector-idiv failed when I fixed the MERGE_VALUES node.
Because there are essentially no docs for this node, I had to guess at
how to fix it and tried swapping the operands, restoring the order of
the original code before r212808. While this "fixed" the test case (in
that we produced the right instructions) we were still extracting the
wrong elements of the i64s, and thus PR20355 was still broken.

This commit essentially reverts the big-endian-style extraction part of
r212808 and goes back to the original masks which were correct. Now that
the MERGE_VALUES node formation is also correct, everything works. I've
also included a more detailed test from PR20355 to make sure this stays
fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 03:46:57 +00:00
Chandler Carruth
86de7ad211 [x86] Revert r214007: Fix PR20355 ...
The clever way to implement signed multiplication with unsigned *is
already implemented* and tested and working correctly. The bug is
somewhere else. Re-investigating.

This will teach me to not scroll far enough to read the code that did
what I thought needed to be done.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214009 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 02:14:54 +00:00
Chandler Carruth
47a12d8d2c [x86] Fix PR20355 (and dups) by not using unsigned multiplication when
signed multiplication is requested. While there is not a difference in
the *low* half of the result, the *high* half (used specifically to
implement the signed division by these constants) certainly is used. The
test case I've nuked was actively asserting wrong code.

There is a delightful solution to doing signed multiplication even when
we don't have it that Richard Smith has crafted, but I'll add the
machinery back and implement that in a follow-up patch. This at least
restores correctness.
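
The difference between the two halves is easy to check on 32-bit scalars. The sketch below also shows one well-known identity for deriving the signed high half from the unsigned one; whether that is the approach the message has in mind is an assumption, and every function name here is hypothetical.

```cpp
#include <cstdint>

// Low 32 bits of the product: identical for signed and unsigned inputs.
inline uint32_t mulLo(uint32_t A, uint32_t B) { return A * B; }

// High 32 bits of the unsigned 32x32 -> 64 product.
inline uint32_t mulHiUnsigned(uint32_t A, uint32_t B) {
  return uint32_t((uint64_t(A) * uint64_t(B)) >> 32);
}

// High 32 bits of the signed 32x32 -> 64 product (reference version).
inline uint32_t mulHiSigned(int32_t A, int32_t B) {
  return uint32_t(uint64_t(int64_t(A) * int64_t(B)) >> 32);
}

// Standard identity: start from the unsigned high half and subtract the
// other operand once for each negative input (arithmetic modulo 2^32).
inline uint32_t mulHiSignedViaUnsigned(int32_t A, int32_t B) {
  uint32_t UA = uint32_t(A), UB = uint32_t(B);
  uint32_t Hi = mulHiUnsigned(UA, UB);
  if (A < 0) Hi -= UB;
  if (B < 0) Hi -= UA;
  return Hi;
}
```

For -3 times 5, both interpretations give the same low half, but the unsigned high half is 4 while the signed high half is all ones, which is why a division-by-constant sequence that consumes the high half needs the signed form.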

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214007 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 01:52:13 +00:00
NAKAMURA Takumi
5eb69c2337 Update X86/Utils/LLVMBuild.txt corresponding to r213986. "Core" has been introduced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213995 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 00:45:43 +00:00
Chandler Carruth
b1fa0cf8b4 [x86] Fix unused variable warning in no-asserts build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213989 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-26 00:04:41 +00:00
Chandler Carruth
30e89dd882 [x86] Teach the X86 backend to print shuffle comments for PSHUFB
instructions which happen to have a constant mask.

Currently, this only handles a very narrow set of cases, but those
happen to be the cases that I care about for testing shuffles sanely.
This is a bit trickier than other shuffle instructions because we're
decoding constants out of the constant pool. The current MC layer makes
it completely impossible to inspect a constant pool entry, so we have to
do it at the MI level and attach the comment to the streamer on its way
out. So no joy for disassembling, but it does make test cases and asm
dumps *much* nicer.

Sorry for no test cases, but it didn't really seem that valuable to go
trolling through existing old test cases and updating them. I'll have
lots of testing of this in the upcoming patch for SSSE3 emission in the
new vector shuffle lowering code paths.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213986 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 23:47:11 +00:00
Akira Hatanaka
0651a556fe [stack protector] Fix a potential security bug in stack protector where the
address of the stack guard was being spilled to the stack.

Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register. 

<rdar://problem/12475629>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213967 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 19:31:34 +00:00
Chandler Carruth
568ab6a8dc [SDAG] Enable the new assert for out-of-range result numbers in
SDValues, fixing the two bugs left in the regression suite.

The key for both of these was the use a single value type rather than
a VTList which caused an unintentionally single-result merge-value node.
Fix this by getting the appropriate VTList in place.

Doing this exposed that the comments in x86's code about how MUL_LOHI
operands are handled are wrong. The bug with the use of out-of-range
result numbers was hiding the bug about the order of operands here (as
best I can tell). There are more places where the code appears to get
this backwards still...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213931 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 09:19:23 +00:00
Lang Hames
76cbffa2a9 [X86] Clarify some stackmap shadow optimization code as based on review
feedback from Eric Christopher.

No functional change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213917 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 02:29:19 +00:00
Chandler Carruth
1b1fbccf49 [x86] Make vector legalization of extloads work more like the "normal"
vector operation legalization with support for custom target lowering
and fallback to expand when it fails, and use this to implement sext and
anyext load lowering for x86 in a more principled way.

Previously, the x86 backend relied on a target DAG combine to "combine
away" sextload and extload nodes prior to legalization, or would expand
them during legalization with terrible code. This is particularly
problematic because the DAG combine relies on running over non-canonical
DAG nodes at just the right time to match several common and important
patterns. It used a combine rather than lowering because we didn't have
good lowering support, and to expose some tricks being employed to more
combine phases.

With this change it becomes a proper lowering operation, the backend
marks that it can lower these nodes, and I've added support for handling
the canonical forms that don't have direct legal representations such as
sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
for this behavior continue to pass even after the DAG combiner begins
running more systematically over every node.

There is some noise caused by this in the test suite where we actually
use vector extends instead of subregister extraction. This doesn't
really seem like the right thing to do, but is unlikely to be a critical
regression. We do regress in one case where by lowering to the
target-specific patterns early we were able to combine away extraneous
legal math nodes. However, this regression is completely addressed by
switching to a widening based legalization which is what I'm working
toward anyways, so I've just switched the test to that mode.

Differential Revision: http://reviews.llvm.org/D4654

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 22:09:56 +00:00
Lang Hames
b96e833817 [X86] Optimize stackmap shadows on X86.
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.

To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
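
The bookkeeping described above can be sketched roughly as follows. This is a simplified model, not the actual StackMapShadowTracker: the class name `ShadowTrackerSketch` and its methods are invented for illustration, and real nop emission is reduced to returning a byte count.

```cpp
#include <cstddef>

// Simplified model of shadow tracking: remember how many bytes of shadow
// the most recent stackmap requires, count the instruction bytes emitted
// after it, and report how many bytes of nop padding are still missing.
class ShadowTrackerSketch {
public:
  // Called when a stackmap with a shadow requirement is emitted.
  void startShadow(std::size_t RequiredBytes) {
    Required = RequiredBytes;
    Emitted = 0;
    Active = true;
  }

  // Called for every instruction emitted after the stackmap.
  void countBytes(std::size_t N) {
    if (Active)
      Emitted += N;
  }

  // Called right before the next stackmap/patchpoint or at the end of the
  // basic block. Returns the number of nop bytes needed (possibly zero)
  // and closes the current shadow.
  std::size_t finishShadow() {
    std::size_t Pad = (Active && Emitted < Required) ? Required - Emitted : 0;
    Active = false;
    return Pad;
  }

private:
  std::size_t Required = 0;
  std::size_t Emitted = 0;
  bool Active = false;
};
```

Deferring the padding decision to `finishShadow` is what saves nops: instructions that follow the stackmap already fill part of the shadow, so only the shortfall is padded.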

This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.

<rdar://problem/14959522>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213892 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 20:40:55 +00:00
Reid Kleckner
5b93c8af72 Replace an assertion with a fatal error
Frontends are responsible for putting inalloca on parameters that would
be passed in memory and not registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213891 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 19:53:33 +00:00