Commit Graph

11381 Commits

Author SHA1 Message Date
Reid Kleckner
c9fbc97e95 x86: Fix large model calls to __chkstk for dynamic allocas
In the large code model, we now put __chkstk in %r11 before calling it.

Refactor the code so that we only do this once. Simplify things by using
__chkstk_ms instead of __chkstk on cygming. We already use that symbol
in the prolog emission, and it simplifies our logic.

Second half of PR18582.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227519 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 23:58:04 +00:00
Sanjay Patel
65d9a05c76 Change SmallVector param to the more general ArrayRef; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227514 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 23:35:04 +00:00
Reid Kleckner
850420cd14 x86: Remove the W64ALLOCA pseudo
This is just an alias for CALL64pcrel32, and we can just use that opcode
with explicit defs in the MI.

No functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227508 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 23:09:37 +00:00
Reid Kleckner
cb867e4ac4 Update comments to use unreachable instead of llvm.trap, as implemented now
win64: Call __chkstk through a register with the large code model

Fixes half of PR18582. True dynamic allocas will still have a
CALL64pcrel32 which will fail.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227503 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 22:33:00 +00:00
David Blaikie
1ba26f8da1 DebugInfo: Teach Fast ISel to respect the debug location of comparisons in jumps
The use of the DbgLoc in FastISel is probably something we should fix.
It's prone to leaking the wrong location into instructions - we should
have a clear chain of custody from the debug location of an IR
Instruction to that of a MachineInstr to avoid such leakage.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227481 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 19:09:18 +00:00
Robert Lougher
1031549bec [X86] Use single add/sub for large stack offsets
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue.  This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/substracted with a single operation.

Differential Revision: http://reviews.llvm.org/D7226


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227458 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 16:18:29 +00:00
Simon Pilgrim
d0e8688ebc Spelling fixes. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227376 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 22:03:52 +00:00
Simon Pilgrim
5d8772fef5 Line endings fix. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227374 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 21:56:52 +00:00
Sanjay Patel
143826f71b invert check for less indentation; use local vars to reduce duplication; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227355 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 19:44:21 +00:00
Sanjay Patel
9598bbc542 use SDValue methods directly instead of getNode()->* ; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227334 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 18:01:31 +00:00
Michael Kuperstein
837fe4388b [x32] Change the condition from bitness to LP64 for TCRETURNdi64.
TCRETURNmi64, which was mistakenly changed in r227307 will wait for another day.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227317 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 16:11:35 +00:00
Michael Kuperstein
0906c8fc1c [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.

Differential Revision: http://reviews.llvm.org/D7196

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227308 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 14:08:22 +00:00
Michael Kuperstein
e5b95695ea [x32] Enable sibcall optimization on x32.
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227307 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 13:38:48 +00:00
Elena Demikhovsky
b9d3801cd2 AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227303 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 10:21:27 +00:00
Craig Topper
aef361807e [X86] Teach disassembler to handle illegal immediates on AVX512 integer compare instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227302 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 10:09:56 +00:00
Craig Topper
f3a2214da8 [X86] Merge printSSECC and printAVXCC. They only differed by an assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 10:09:52 +00:00
Alexey Samsonov
00b7a940e7 Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector"
This reverts commits r226953 and r226974.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227248 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 21:34:11 +00:00
Simon Pilgrim
44513da617 [X86][SSE] Float comparisons can sometimes be safely commuted
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.

Differential Revision: http://reviews.llvm.org/D7178

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:29:24 +00:00
Simon Pilgrim
3ba85ab23a [X86][PCLMUL] Enable commutation for PCLMUL instructions
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.

Differential Revision: http://reviews.llvm.org/D7180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227141 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:00:18 +00:00
Alex Rosenberg
8da9a6686a Use a different encoding for debugtrap on PS4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:09:27 +00:00
Eric Christopher
04bcc11905 Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227113 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:03:15 +00:00
Sanjay Patel
956d6f0cf5 Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221)
This patch fixes the following miscompile:

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
  %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %a0 = extractelement <2 x double> %0, i32 0
  %conv = fptrunc double %a0 to float
  %a1 = extractelement <2 x double> %0, i32 1
  %conv3 = fptrunc double %a1 to float
  tail call void @callee2(float %conv, float %conv3) nounwind
  ret void
}

Current codegen:

sqrtsd	%xmm0, %xmm1        ## high element of %xmm1 is undef here
xorps	%xmm0, %xmm0
cvtsd2ss	%xmm1, %xmm0
shufpd	$1, %xmm1, %xmm1
cvtsd2ss	%xmm1, %xmm1 ## operating on undef value
jmp	_callee

This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) 
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).

All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); 
this should be the final patch needed to resolve that bug.

Differential Revision: http://reviews.llvm.org/D6885



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227111 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 18:42:16 +00:00
Elena Demikhovsky
717d41d8c3 AVX-512: Changes in operations on masks registers for KNL and SKX
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store size of v8i1, v4i1 and v2i1 are changed to 8 bits



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227043 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 12:47:15 +00:00
Craig Topper
ff763041d2 [X86] Give scalar VRNDSCALE instructions priority in AVX512 mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227039 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 08:49:22 +00:00
Craig Topper
e3792c042d Simplify a multiclass. No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227038 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 08:49:19 +00:00
Craig Topper
e237954ed8 Remove tab characters. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227036 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 08:45:32 +00:00
Elena Demikhovsky
70bae89669 Implemented cost model for masked load/store operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227035 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 08:44:46 +00:00
Craig Topper
896c1e9b70 [X86] Replace i32i8imm on SSE/AVX instructions with i32u8imm which will make the assembler bounds check them. It will also make them print as unsigned.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227032 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 02:21:16 +00:00
Craig Topper
046047ccc3 [X86] Use u8imm in several places that used i32i8imm that don't require an i32 type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227031 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 02:21:13 +00:00
Craig Topper
a92d03bb7a Remove tab characters. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227030 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 02:21:11 +00:00
Bruno Cardoso Lopes
88869354d8 [x86] Fix a comment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226974 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 00:22:04 +00:00
Bruno Cardoso Lopes
807360ab08 [x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector
Handle the poor codegen for i64/x86xmm->v2i64 (%mm -> %xmm) moves. Instead of
using stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the current behavior prior to
improvements for vector legalization of extloads in r213897.

This commit fixes the regression and as a side-effect also remove some
unnecessary shuffles.

In the new attached testcase, we go from:

pshufw  $-18, (%rdi), %mm0
movq    %mm0, -8(%rsp)
movq    -8(%rsp), %xmm0
pshufd  $-44, %xmm0, %xmm0
movd    %xmm0, %eax
...

To:

pshufw  $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd    %xmm0, %eax
...

Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226953 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 22:44:16 +00:00
Reid Kleckner
26ba4c13a7 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226920 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 18:49:01 +00:00
Eric Christopher
ab74a03c00 Remove some local variables in place of just querying for them
in the couple of asserts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226917 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 17:22:44 +00:00
Craig Topper
d05a6aa4e6 [x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226902 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 08:00:59 +00:00
Craig Topper
c3942c9623 [X86] Add IntrNoMem to the AVX512 conflict intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226897 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 06:11:45 +00:00
Simon Pilgrim
316b43f7df [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.

Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226873 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:39:59 +00:00
Alexander Potapenko
331e7db8ef Mark |TLI| variables used to suppress -Wunused-variable warnings.
(These vars are only used in assertions)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226815 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 13:03:33 +00:00
Elena Demikhovsky
2785766bc8 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:07:59 +00:00
Craig Topper
deb2e51099 Revert r226798. Guess I missed the patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226802 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 09:01:20 +00:00
Craig Topper
efad370a06 Use u8imm instead of i32i8imm on a couple instructions that have no patterns and thus no reason to use a larger operand size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226798 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 08:53:11 +00:00
Craig Topper
6660dcedd3 [X86] Remove some unused multiclasses from AVX512 instruction file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226797 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 08:53:08 +00:00
Saleem Abdulrasool
3f6dea4864 ARM: fail less catastrophically on invalid Windows input
Windows supports a restricted set of relocations (compared to ARM ELF).  In some
cases, we may end up generating an unsupported relocation.  This can occur with
bad input to the assembler in particular (the frontend should never generate
code that cannot be compiled).  Generate an error rather than just aborting.

The change in the API is driven by the desire to provide a slightly more helpful
message for debugging purposes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226779 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 04:03:32 +00:00
Simon Pilgrim
3f6acdd265 [X86][SSE] Missing SSE/AVX1 memory folding integer instructions
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.

The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.

Differential Revision: http://reviews.llvm.org/D7094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:43:30 +00:00
Simon Pilgrim
4269590166 [X86][SSE] Added support for SSE3 lane duplication shuffle instructions
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds.

Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify theses instructions as unary shuffles.

Also adds a missing tablegen pattern for MOVDDUP.

Differential Revision: http://reviews.llvm.org/D7042



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226716 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:44:35 +00:00
Simon Pilgrim
a7a4b836a3 [X86][SSE] movddup shuffle mask decodes
Patch to provide shuffle decodes and asm comments for the SSE3/AVX1 movddup double duplication instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226705 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:02:30 +00:00
Ahmed Bougacha
34288d885e [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions.  This for instance enables
a DAGCombine to fire on code such as
  (and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
  (zextload ...)
as seen in the testcase changes.

There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.

Differential Revision: http://reviews.llvm.org/D6533


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226676 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 17:07:06 +00:00
Michael Kuperstein
0b4244ade1 [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226661 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 14:44:05 +00:00
Craig Topper
74670deb21 [x86] Remove some unnecessary and slightly confusing typecasts from some patterns. I think it actually went i32->iPtr->i32 in some of these cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226647 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 08:43:57 +00:00
Craig Topper
51da87a580 [X86] Convert all the i8imm used by AVX512 and MMX instructions to u8imm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226646 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 08:43:49 +00:00
Craig Topper
951d088ae7 [X86] Convert all the i8imm used by SSE and AVX instructions to u8imm.
This makes the assembler check their size and removes a hack from the disassembler to avoid sign extending the immediate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226645 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 08:15:54 +00:00
Craig Topper
f81b1f346a [x86] Add assembly parser bounds checking to the immediate value for cmpss/cmpsd/cmpps/cmppd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226642 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 06:07:53 +00:00
Craig Topper
d624796bc6 [x86] Add some mayLoad/hasSideEffects flags. Remove one that was already covered by a pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226562 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 12:15:30 +00:00
Simon Pilgrim
2a2f94c5f2 [X86][AVX] Missing AVX1 memory folding float instructions
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).

Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.

Differential Revision: http://reviews.llvm.org/D7055



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226513 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 22:40:45 +00:00
Rafael Espindola
a23cc6a1ea Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.

Original message:

In an assembly expression like

bar:
  .long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226503 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 21:11:14 +00:00
Craig Topper
6f91bd79f3 [x86] Change AVX512 intrinsics to take a 8-bit immediate for the comparision kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches SSE and AVX1 equivalents. Also add auto upgrade support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226430 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 06:07:27 +00:00
David Blaikie
341a7e245e std::unique_ptrify the MCStreamer argument to createAsmPrinter
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226414 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 20:29:04 +00:00
Saleem Abdulrasool
d4bdb4e0b2 X86: fix comment typo in AsmParser
Fix a typo.  NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226313 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 20:16:06 +00:00
Adam Nemet
ad2ac976af [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226296 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 18:50:09 +00:00
Andrea Di Biagio
ac7b9c828f [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226277 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 14:55:26 +00:00
Joerg Sonnenberger
638077aa41 Support @PLT loads on 32bit x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226182 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 17:59:02 +00:00
Craig Topper
06185e7f6b Hide some redundant AVX512 instructions from the asm parser, but force them to show up in the disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226155 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 09:37:15 +00:00
Rafael Espindola
8327f0bca1 Revert "Add r224985 back with two fixes."
This reverts commit r225644 while I debug a regression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226022 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 19:07:23 +00:00
David Majnemer
3eafccc036 Use the operand vector instead so inline assembly can be validated too
The buildbots got upset after r225941, this should hopefully fix things.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225954 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 06:14:36 +00:00
Saleem Abdulrasool
e1f65e239a X86: only access operands if they are present
If there is no associated immediate (MS style inline asm), do not try to access
the operand, assume that it is valid.  This should fix the buildbots after SVN
r225941.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225950 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 05:37:10 +00:00
JF Bastien
7f0cbb5703 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225948 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 05:24:33 +00:00
Saleem Abdulrasool
1679d0d3c2 X86: validate 'int' instruction
The int instruction takes as an operand an 8-bit immediate value.  Validate that
the input is valid rather than silently truncating the value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225941 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 05:10:21 +00:00
JF Bastien
21befa7761 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225908 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:07:26 +00:00
Adam Nemet
293f71ddd2 [AVX512] Unpack support in new shuffle lowering
This now handles both 32 and 64-bit element sizes.

In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by
Chandler's update_llc_test_checks.py.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225838 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 22:20:18 +00:00
Adam Nemet
cebcef6329 [AVX512] Add pretty-printing of shuffle mask for unpacks
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225837 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 22:20:14 +00:00
Reid Kleckner
d8f69a7201 Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225752 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 01:51:34 +00:00
Reid Kleckner
221a7075cf Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225746 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 00:48:10 +00:00
Simon Pilgrim
c815f15537 [X86][SSE] Minor regression fix for r225551
r225551 vector byte shuffle optimization caused an assertion as fully zeroable vectors can be produced under certain circumstances. This fix drops the assert and returns a zero vector where the assert would have failed.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225718 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 22:38:08 +00:00
Ahmed Bougacha
cd5bbd8bad [X86] Also create+widen FMIN/FMAX nodes for v2f32.
This happens in the HINT benchmark, where the SLP-vectorizer created
v2f32 fcmp/select code.  The "correct" solution would have been to
teach the vectorizer cost model that v2f32 isn't legal (because really,
it isn't), but if we can vectorize we might as well do so.

We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on.
v3f32 were already widened to v4f32 by the generic unroll-and-build-vector
legalization.

rdar://15763436
Differential Revision: http://reviews.llvm.org/D6557


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225691 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 20:31:30 +00:00
Rafael Espindola
5512415ade Add r224985 back with two fixes.
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.

The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.

Thanks to Michael Zolotukhin for testing this!

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225644 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 18:13:07 +00:00
Simon Pilgrim
dc5a2dabfe [X86][SSE] Minor fix to VPBLENDW AVX2 commutation.
D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225612 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 22:08:01 +00:00
David Majnemer
85a0cb9bf2 Revert most of r225597
We can't rely on a DataLayout enlightened constant folder.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225599 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 07:29:51 +00:00
David Majnemer
d2f4460ee7 X86: Properly decode shuffle masks when the constant pool type is weird
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation.  This occurs when Constants have
the same bit pattern but have different types.

Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.

This fixes PR22188.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225597 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 05:08:57 +00:00
Saleem Abdulrasool
776673ea09 X86: teach X86TargetLowering about L,M,O constraints
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm which may have immediate constant constraints violated,
the backend should be able to cope with the invalid inline asm a bit better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225596 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 04:39:24 +00:00
Simon Pilgrim
47abf0e3da [X86][SSE] Improved (v)insertps shuffle matching
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.

This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced.

We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.

Differential Revision: http://reviews.llvm.org/D6879



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225589 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 19:45:33 +00:00
Simon Pilgrim
34630b6ea9 [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable.

Differential Revision: http://reviews.llvm.org/D6878



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225551 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 22:03:19 +00:00
Lang Hames
4c553e0367 Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision
introduced.

A test case for the bug was already committed in r225385.

Patch by Rafael Espindola.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225534 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 18:55:42 +00:00
Chandler Carruth
72dd2f22b5 [x86] Add a flag to control the vector shuffle legality predicates that
complements the new vector shuffle lowering code path. This flag,
naturally, is *off* because we've not tested or evaluated the results of
this at all. However, the flag will make it much easier to evaluate
whether we can be this aggressive and whether there are missing vector
shuffle lowering optimizations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225491 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 01:24:36 +00:00
Ahmed Bougacha
3df0ee1141 [X86] Reflow comment. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225455 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 17:49:48 +00:00
Michael Kuperstein
0858c28ca8 [X86] Don't try to generate direct calls to TLS globals
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.

Differential Revision: http://reviews.llvm.org/D6862

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225438 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 11:50:58 +00:00
Craig Topper
367b67df3e [X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the LEA variants in Intel syntax. The memory operand is inherently unsized.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 07:41:30 +00:00
Ahmed Bougacha
7fac1d945f [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 00:51:32 +00:00
Matthias Braun
e081880cc1 X86: VZeroUpperInserter: shortcut should not trigger if we have any function live-ins.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225419 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 00:33:48 +00:00
Ahmed Bougacha
8065738154 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225392 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 21:27:10 +00:00
Ahmed Bougacha
1d04b11927 [X86] Fix 512->256 typo in comments. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225367 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 19:38:50 +00:00
David Majnemer
60812c05e7 X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed.  However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.

It is desirable to have this be configurable so that a function might
opt-out of stack probes.  Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.

Patch by Andrew H!

N.B.  The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225360 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 18:14:07 +00:00
Ahmed Bougacha
d412c608fc [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
    float foo(float x) { return copysign(1.0, x); }
We used to generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    movss  <1.000000e+00>, %xmm1
    andps  <nan>, %xmm1
    orps   %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.

We now generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    orps   <1.000000e+00,0,0,0>, %xmm0

Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225357 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 17:33:03 +00:00
Craig Topper
76bdb26595 [X86] Merge a switch statement inside a default case of another switch statement on the same variable. There was no additional code in the default so this should be no functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225345 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 08:10:38 +00:00
Craig Topper
ebddeaee40 [X86] Don't mark the shift by 1 instructions as isConvertibleToThreeAddress. There is no handling for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225344 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 08:10:36 +00:00
Craig Topper
decea4d6a6 [X86] Remove some unused TYPE enums from the disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225343 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 07:47:52 +00:00
Lang Hames
84acf09f32 Revert r224935 "Refactor duplicated code. No intended functionality change."
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225311 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 23:04:36 +00:00
Craig Topper
f6145affbf [X86] Add OpSize32 to XBEGIN_4. Add XBEGIN_2 with OpSize16.
Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225256 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 08:59:30 +00:00
Craig Topper
88dc6cd5c9 [X86] Make isel select the 2-byte register form of INC/DEC even in non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering.
Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225252 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 07:35:50 +00:00
David Majnemer
b3065539bd X86: Don't make illegal GOTTPOFF relocations
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocation target a movq or addq instruction.

Prohibit the truncation of such loads to movl or addl.

This fixes PR22083.

Differential Revision: http://reviews.llvm.org/D6839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225250 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 07:12:52 +00:00
Craig Topper
e16c86e524 [X86] Remove 16-bit and 32-bit offset jump instructions from the AsmParser. We always select the 8-bit size and let the assembler backend relax to the larger size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225243 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 04:23:57 +00:00
Craig Topper
1299ae2403 [X86] Make isel select the shorter form of jump instructions instead of the long form.
The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225242 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 04:23:53 +00:00
Lang Hames
bce877c84c Revert r225048: It broke ObjC on AArch64.
I've filed http://llvm.org/PR22100 to track this issue.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225228 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 00:54:32 +00:00
Brad Smith
0d3cd751c4 Remove X86 .quad workaround for buggy GNU assembler on OpenBSD / Bitrig.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225227 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 00:53:52 +00:00
Simon Pilgrim
a5b2142af1 [X86][SSE] lowerVectorShuffleAsByteShift tidyup
Removed local isSequential predicate and use standard helper isSequentialOrUndefInRange instead.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225216 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 22:08:48 +00:00
Simon Pilgrim
dd0552884b [X86][SSE] Fixed description for isSequentialOrUndefInRange. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225202 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 21:09:48 +00:00
Craig Topper
9bf73516cb Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225160 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 10:15:49 +00:00
Craig Topper
a5ba79effa [X86] Remove the predicates from the register forms of the 2-byte inc and dec instructions. Remove the 32-bit mode only versions that existed for the disassembler. Move the patterns out of the instructions so they can still be qualified with predicates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225157 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:19:12 +00:00
Craig Topper
34f96b9cc3 [X86] Simplify code a little by just summing flags instead of conditionally incrementing. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225156 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:19:10 +00:00
Craig Topper
a481b50d75 [X86] Remove unnecessary redeclaration of a variable with the same assignment as the beginning of the function. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225155 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:19:07 +00:00
Craig Topper
d571fdb20b [X86] Remove a strange fixme referring to a hack that doesn't seem to exist since the code is in a comment. Can't figure out what the body of the 'if' was supposed to be anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225154 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:19:05 +00:00
Craig Topper
1d8d3b62ad [x86] Reduce text duplication for similar operand class declarations in tablegen instruction info. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225153 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:19:03 +00:00
Craig Topper
68f83ee530 [X86] Fix the immediate size to match the address size in the operand types for the move to/from absolute memory instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225152 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 08:18:59 +00:00
Craig Topper
00d70e98f0 Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers.
Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225114 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 08:16:34 +00:00
Craig Topper
01c99892ca [X86] Disassembler support for move to/from %rax with a 32-bit memory offset is REX.W and AdSize prefix are both present.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225099 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 00:00:20 +00:00
Craig Topper
e3c96460b2 [X86] Use 32-bit sign extended immediate for 64-bit LOCK_ArithBinOp with sign extended immediate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225098 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 00:00:14 +00:00
Andrea Di Biagio
7fee03e4d5 Improved comments. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225080 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 10:47:46 +00:00
Craig Topper
8ae0fd2bb3 [X86] Bring some better consistency to the naming of the move to/from %al/ax/eax/rax with memory offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225078 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 07:36:23 +00:00
Craig Topper
71fc42dbf6 [X86] Make the instructions that use AdSize16/32/64 co-exist together without using mode predicates.
This is necessary to allow the disassembler to be able to handle AdSize32 instructions in 64-bit mode when address size prefix is used.

Eventually we should probably also support 'addr32' and 'addr16' in the assembler to override the address size on some of these instructions. But for now we'll just use special operand types that will lookup the current mode size to select the right instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225075 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-02 07:02:25 +00:00
Rafael Espindola
8093abb745 Add r224985 back with a fix.
The issues was that AArch64 has additional restrictions on when local
relocations can be used. We have to take those into consideration when
deciding to put a L symbol in the symbol table or not.

Original message:

Remove doesSectionRequireSymbols.

In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225048 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 17:19:34 +00:00
Rafael Espindola
937e781f49 Revert "Remove doesSectionRequireSymbols."
This reverts commit r224985.

I am investigating why it made an Apple bot unhappy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225044 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 16:06:48 +00:00
Craig Topper
51f423ff30 [X86] Fix disassembly of absolute moves to work correctly in 16 and 32-bit modes with all 4 combinations of OpSize and AdSize prefixes being present or not.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225036 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 07:07:31 +00:00
Craig Topper
e8ffd99e4e [x86] Simplify detection of jcxz/jecxz/jrcxz in disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225035 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 07:07:11 +00:00
Peter Collingbourne
d8ae3e1fee x86_64: Fix calls to __morestack under the large code model.
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.

To avoid these issues, perform an indirect call via a read-only memory
location containing the address.

This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.

Differential Revision: http://reviews.llvm.org/D6787

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225003 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-30 20:05:19 +00:00
Rafael Espindola
65300b95e6 Remove doesSectionRequireSymbols.
In an assembly expression like

bar:
.long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF to use relocation with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224985 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-30 13:13:27 +00:00
Craig Topper
d52bd88fad [X86] Fix some cases where some 8-bit instructions were marked as being convertible to three address instructions, but aren't really.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224940 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 16:25:26 +00:00
Craig Topper
67044e9a6a [X86] Add the 0x82 instructions to the disassebmler. They are identical in functionality to the 0x80 opcode instructions, but are not valid in 64-bit mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224939 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 16:25:23 +00:00
Craig Topper
702d11e595 [x86] Refactor some tablegen instruction info classes slightly to prepare for another change. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224938 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 16:25:22 +00:00
Craig Topper
b96ee8810f [x86] Remove unused classes from tablegen instruction info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224937 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 16:25:19 +00:00
Rafael Espindola
a21d820952 Add segmented stack support for DragonFlyBSD.
Patch by Michael Neumann.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 15:47:28 +00:00
Rafael Espindola
2a1c1c9dea Refactor duplicated code.
No intended functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224935 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 15:18:31 +00:00
Keno Fischer
41bda9f201 [X86][ISel] Fix a regression I introduced in r224884
The else case ResultReg was not checked for validity.
To my surprise, this case was not hit in any of the
existing test cases. This includes a new test cases
that tests this path.

Also drop the `target triple` declaration from the
original test as suggested by H.J. Lu, because
apparently with it the test won't be run on Linux

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224901 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 15:20:57 +00:00
Michael Kuperstein
bfa4a373f4 [X86] Add missing memory variants to AVX false dependency breaking
Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246)

Differential Revision: http://reviews.llvm.org/D6780

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224900 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 13:15:05 +00:00
Andrea Di Biagio
70a7cda495 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code

In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

Differential Revision: http://reviews.llvm.org/D6728


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 11:07:35 +00:00
Craig Topper
04c853b269 [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Correctly this time. I did the wrong patterns the first time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224891 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 20:08:45 +00:00
Aaron Ballman
22376afd76 Fixing another -Wunused-variable warning, this time in release builds without asserts. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224889 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 19:17:53 +00:00
Aaron Ballman
88e25192c2 Removing a variable that is set but never used, to silence a -Wunused-but-set-variable warning; NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224888 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 19:01:19 +00:00
Craig Topper
d840bf4ba9 [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Forgot to do this when I did SSE/SSE2/AVX/AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224887 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 18:51:06 +00:00
Craig Topper
3e9bf4c0d0 [x86] Assert on invalid immediates in the instruction printer for cmp.ps/pd/ss/sd instead of truncating the immediate. The assembly parser and instruction selection shouldn't generate invalid immediates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224886 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 18:11:00 +00:00
Craig Topper
6ba84e58da [x86] Prevent llvm.x86.cmp.ps/pd/ss/sd from being selected with bad immediates. The frontend now checks this when the builtin is used. This will allow the instruction printer to not have to deal with invalid immediates on these instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224885 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 18:10:56 +00:00
Keno Fischer
cc80af1b4f [FastIsel][X86] Fix invalid register replacement for bool args
Summary:
Consider the following IR:

  %3 = load i8* undef
  %4 = trunc i8 %3 to i1
  %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
  ret %jl_value_t.0* %5

Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:

  %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

Later, when the load is lowered, it will insert

  %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with

  %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

which is an SSA violation and causes problems later down the road.

This fixes PR21557.

Test Plan: Test test case from PR21557 is added to the test suite.

Reviewers: ributzka

Reviewed By: ributzka

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224884 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 13:10:15 +00:00
Craig Topper
a996db696b [X86] Add the debug registers DR8-DR15 so we can assemble and disassemble references to them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 18:20:05 +00:00
Craig Topper
6eb3e3ce10 [X86] Don't fail disassembly if REX.R/REX.B is used on an MMX register. Similar fix to not fail to disassembler CR9-CR15 references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224861 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 18:19:44 +00:00
Craig Topper
654a66dbd3 Teach disassembler to handle illegal immediates on (v)cmpps/pd/ss/sd instructions. Instead of rejecting we'll just generate the _alt forms that don't try to alter the mnemonic. While I'm here, merge some common code in the Instruction printers for the condition code replacement and fix the mask on SSE to be 3-bits instead of 4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224846 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 06:36:28 +00:00
Craig Topper
50d894e4d3 Use MCPhysReg for table of register encodings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224845 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 06:36:23 +00:00
Elena Demikhovsky
b31322328a Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224829 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 07:49:20 +00:00
Craig Topper
3bc4397f1f [X86] Remove the single AdSize indicator and replace it with separate AdSize16/32/64 flags.
This removes a hardcoded list of instructions in the CodeEmitter. Eventually I intend to remove the predicates on the affected instructions since in any given mode two of them are valid if we supported addr32/addr16 prefixes in the assembler.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224809 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 06:05:22 +00:00
Elena Demikhovsky
1a637e9fc0 AVX-512: Added FMA instructions, intrinsics an tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224764 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 10:30:39 +00:00
Elena Demikhovsky
6709428067 AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224760 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 09:36:28 +00:00
Jim Grosbach
860122b3b7 X86: Don't over-align combined loads.
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.

rdar://19190968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224746 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 00:35:23 +00:00
Reid Kleckner
34b7fde802 Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224745 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 23:58:37 +00:00
Bruno Cardoso Lopes
ba059464c3 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224725 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 19:45:43 +00:00
Elena Demikhovsky
c1aa521fb4 AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:52:48 +00:00
Craig Topper
b10afb51d6 [X86] Add hasSideEffects = 0 to CALLpcrel16. This matches what is inferred from patterns for the 32-bit version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-21 20:05:06 +00:00
Craig Topper
b8f8f2dbed [X86] Swap operand order in Intel syntax on a bunch of aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224687 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 23:05:59 +00:00
Craig Topper
a9bae8c3da [X86] Swap operand order of imul aliases in Intel syntax. Also disable printing of the alias instead of the real instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224686 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 23:05:57 +00:00
Craig Topper
0c8f0f0403 [X86] Remove '*' from asm strings in far call/jump aliases for Intel syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224685 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 23:05:55 +00:00
Craig Topper
58331b67cb [X86] Don't swap the order of segment and offset in immediate form of far call/jump in Intel syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224684 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 23:05:52 +00:00
Craig Topper
ae39073d99 [X86] Immediate forms of far call/jump are not valid in x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224678 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 07:43:27 +00:00
Elena Demikhovsky
573b762b68 Masked load and store codegen - fixed 128-bit vectors
The codegen failed on 128-bit types on AVX2.
I added patterns and in td files and tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224647 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 23:27:57 +00:00
Reid Kleckner
0f85d54670 Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 22:19:48 +00:00
Sanjay Patel
9ccbf1a260 Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. 

Differential Revision: http://reviews.llvm.org/D6330



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224624 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 22:16:28 +00:00
Robert Khasanov
d25d7bb372 [AVX512] Enable FP arithmetic lowering for AVX512VL subsets.
Added RegOp2MemOpTable4 to transform 4th operand from register to memory in merge-masked versions of instructions. 
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224516 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 12:28:22 +00:00
Craig Topper
539998ec8d [X86] Use correct opsize on indirect call and jump aliases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224497 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 05:02:12 +00:00
Craig Topper
4ae31b7e7d [X86] Don't use PS prefix on LDMXCSR/STMXCSR.
Near as I can tell prefixes are ignored on these instructions except for a comment in the Intel docs about 0xf3. Binutils disassembler seems to ignore prefixes on these instructions. Our disassembler still doesn't distinguish PS and "no prefix" well enough for this to make a functional change, but it helps with experiments I'm doing on a potential new disassembler table builder.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224496 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 05:02:10 +00:00
Craig Topper
c1dd90797f [X86] Remove unnecessary 'In64BitMode' predicate for instructions that already indicate use of REX.W.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 05:02:08 +00:00
Michael Kuperstein
fd350586f5 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224429 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 12:32:17 +00:00
Quentin Colombet
1e2604dccc [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector type, so when make
such it does not kick in in such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224402 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 01:36:17 +00:00
Reid Kleckner
0c7f4e46b6 Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion."
This reverts commit r224351. It causes assertion failures when building
ICU.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224397 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 00:29:23 +00:00
Simon Pilgrim
2385e98928 [X86][SSE] Vector double -> float conversion memory folding (cvtpd2ps)
Added a missing memory folding relationship for the (V)CVTPD2PS instruction - we can safely fold these for stack reloads.

Differential Revision: http://reviews.llvm.org/D6663



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224383 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 22:30:10 +00:00
JF Bastien
50b8835451 x86-32: PUSHF/POPF use/def EFLAGS
Summary: As a side-quest for D6629 jvoung pointed out that I should use -verify-machineinstrs and this found a bug in x86-32's handling of EFLAGS for PUSHF/POPF. This patch fixes the use/def, and adds -verify-machineinstrs to all x86 tests which contain 'EFLAGS'. One exception: this patch leaves inline-asm-fpstack.ll as-is because it fails -verify-machineinstrs in a way unrelated to EFLAGS. This patch also modifies cmpxchg-clobber-flags.ll along the lines of what D6629 already does by also testing i386.

Test Plan: ninja check

Reviewers: t.p.northover, jvoung

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224359 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 20:15:45 +00:00
Quentin Colombet
93b6e016b1 [CodeGenPrepare] Move sign/zero extensions near loads using type promotion.
This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %es      # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequences happen a lot on code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224351 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 19:09:03 +00:00
Robert Khasanov
41100285aa [AVX512] Enable integer arithmetic lowering for AVX512BW/VL subsets.
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224349 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 18:24:07 +00:00
Sanjay Patel
8fe9488a40 combine consecutive subvector 16-byte loads into one 32-byte load
This is a fix for PR21709 ( http://llvm.org/bugs/show_bug.cgi?id=21709 ).
When we have 2 consecutive 16-byte loads that are merged into one 32-byte vector,
we can use a single 32-byte load instead. 
But we don't do this for SandyBridge / IvyBridge because they have slower 32-byte memops.
We also don't bother using 32-byte *integer* loads on a machine that only has AVX1 (btver2)
because those operands would have to be split in half anyway since there is no support for
32-byte integer math ops.

Differential Revision: http://reviews.llvm.org/D6492



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224344 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 16:30:01 +00:00
Robert Khasanov
43906dedbf [AVX512] Add a comment for avx512_broadcast_pat multiclass
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224341 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 16:12:11 +00:00
Elena Demikhovsky
4519623e9f X86: Added FeatureVectorUAMem for all AVX architectures.
According to AVX specification:

"Most arithmetic and data processing instructions encoded using the VEX prefix and
performing memory accesses have more flexible memory alignment requirements
than instructions that are encoded without the VEX prefix. Specifically,
With the exception of explicitly aligned 16 or 32 byte SIMD load/store instructions,
most VEX-encoded, arithmetic and data processing instructions operate in
a flexible environment regarding memory address alignment, i.e. VEX-encoded
instruction with 32-byte or 16-byte load semantics will support unaligned load
operation by default. Memory arguments for most instructions with VEX prefix
operate normally without causing #GP(0) on any byte-granularity alignment
(unlike Legacy SSE instructions)."

The same for AVX-512.

This change does not affect anything right now, because only the "memop pattern fragment"
depends on FeatureVectorUAMem and it is not used in AVX patterns.
All AVX patterns are based on the "unaligned load" anyway.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224330 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 09:10:08 +00:00
JF Bastien
13c782674a x86: Emit LOCK prefix after DATA16
Summary: x86 allows either ordering for the LOCK and DATA16 prefixes, but using GCC+GAS leads to different code generation than using LLVM. This change matches the order that GAS emits the x86 prefixes when a semicolon isn't used in inline assembly (see tc-i386.c comment before define LOCK_PREFIX), and helps simplify tooling that operates on the instruction's byte sequence (such as NaCl's validator). This change shouldn't have any performance impact.

Test Plan: ninja check

Reviewers: craig.topper, jvoung

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6630

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224283 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 22:34:58 +00:00
Ahmed Bougacha
77effd8d7e [X86] Also pretty-print shuffle mask for INSERTPS rm variants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224260 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 19:17:54 +00:00
Michael Kuperstein
299e0d4c24 [X86] Break false dependencies before partial register updates when the source operand is in memory
Adds the various "rm" instruction variants into the list of instructions that have a partial register update. Also adds all variants of SQRTSD that were missing in the original list.

Differential Revision: http://reviews.llvm.org/D6620

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224246 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 13:18:21 +00:00
Elena Demikhovsky
3f2027522c AVX-512: Added EXPAND instructions and intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224241 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 10:03:52 +00:00
Elena Demikhovsky
1c3a1516f8 Loop Vectorizer minor changes in the code -
some comments, function names, identation.

Reviewed here: http://reviews.llvm.org/D6527


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224218 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-14 09:43:50 +00:00
Robert Khasanov
5dc8ac87f1 [AVX512] Enabling bit logic lowering
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224132 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 17:02:18 +00:00
Robert Khasanov
a4f5a5525d [AVX512] Enabling MIN/MAX lowering.
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 15:10:43 +00:00
Robert Khasanov
b59ec5ad50 [AVX512] Minor fix in lowering pattern for broadcast intrustions.
No functional change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224122 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 14:21:30 +00:00
Sanjay Patel
6f44989d39 remove function names from comments; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224080 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 23:38:43 +00:00
Sanjay Patel
033d8ea7a9 return without temporary; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224076 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 23:30:36 +00:00
Matthias Braun
8ac056b9dd Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224075 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 23:18:03 +00:00
Ahmed Bougacha
11fcb48306 [X86] Add a temporary testcase for PR21876/r223996.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224074 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 23:07:52 +00:00
Matthias Braun
5b17297b3d [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

This is the 2nd attempt at this after realizing that PassManager::add() may
actually delete the pass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224059 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 21:26:47 +00:00
Rafael Espindola
428923cfe2 This reverts commit r224043 and r224042.
check-llvm was failing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224045 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 20:03:57 +00:00
Matthias Braun
e9256e340b Enable machineverifier in debug mode for X86, ARM, AArch64, Mips
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224043 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 19:42:09 +00:00
Matthias Braun
71f56c4aac [CodeGen] Add print and verify pass after each MachineFunctionPass by default
Previously print+verify passes were added in a very unsystematic way, which is
annoying when debugging as you miss intermediate steps and allows bugs to stay
unnotice when no verification is performed.

To make this change practical I added the possibility to explicitely disable
verification. I used this option on all places where no verification was
performed previously (because alot of places actually don't pass the
MachineVerifier).
In the long term these problems should be fixed properly and verification
enabled after each pass. I'll enable some more verification in subsequent
commits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224042 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 19:42:05 +00:00
Cameron McInally
14273ae2e4 [AVX512] Add support for 512b variable bit shift intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224028 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 17:13:05 +00:00
Elena Demikhovsky
11fb1d0eb5 AVX-512: Added all forms of COMPRESS instruction
+ intrinsics + tests


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 15:02:24 +00:00
Michael Kuperstein
6ad268ca66 [X86] When converting movs to pushes, don't assume MOVmi operand is an actual immediate
This should fix PR21878.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224010 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 11:26:16 +00:00
Elena Demikhovsky
bcb1a626b6 AVX-512: Fixed a bug in lowering setcc for MVT::i1 type
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224008 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 10:21:12 +00:00
Ahmed Bougacha
0aac0703f8 [X86] Add back AVX2 VR256 PMOVX patterns.
We can't reach those from zext, but other parts of the backend (the shuffle
lowering) generate 256-bit VZEXT nodes.

Fixes PR21876.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223996 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-11 04:32:17 +00:00
Sanjay Patel
3cd5b83bb8 Match new shuffle codegen for MOVHPD patterns
Add patterns to match SSE (shufpd) and AVX (vpermilpd) shuffle codegen
when storing the high element of a v2f64. The existing patterns were
only checking for an unpckh type of shuffle. 

http://llvm.org/bugs/show_bug.cgi?id=21791

Differential Revision: http://reviews.llvm.org/D6586



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 16:58:54 +00:00
Michael Kuperstein
89db49fb9b [X86] Make a code path in EltsFromConsecutiveLoads work only on vectors it expects
EltsFromConsecutiveLoads was apparently only ever called for 128-bit vectors, and assumed this implicitly. r223518 started calling it for AVX-sized vectors, causing the code path that had this assumption to crash.
This adds a check to make this path fire only for 128-bit vectors.

Differential Revision: http://reviews.llvm.org/D6579

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 08:46:12 +00:00
Robert Khasanov
648f7c7eb1 [AVX512] Added lowering for VBROADCASTSS/SD instructions.
Lowering patterns were written through avx512_broadcast_pat multiclass as pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
Added lowering tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223804 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 18:45:30 +00:00
Robert Khasanov
c50f9f15f5 [AVX512] Added VPBROADCAST{BWDQ} (Load with Broadcast Integer Data from General Purpose Register) encodings for AVX512-BW/VL subsets
Added encoding tests.
        


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223787 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 16:38:41 +00:00
Chandler Carruth
de0cdb0890 [x86] Fix the test to actually test things for the CPU names, add the
missing barcelona CPU which that test uncovered, and remove the 32-bit
x86 CPUs which I really wasn't prepared to audit and test thoroughly.

If anyone wants to clean up the 32-bit only x86 CPUs, go for it.

Also, if anyone else wants to try to de-duplicate the AMD CPUs, that'd
be cool, but from the looks of it wouldn't save as much as it did for
the Intel CPUs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223774 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 14:25:55 +00:00
Aaron Ballman
93fac96d4b Removing an unused variable to silence a -Wunused-but-set-variable warning. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223773 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 13:20:11 +00:00
Chandler Carruth
8729843d31 [x86] Bring some sanity to the x86 CPU processor definitions.
Notably, this adds simple micro-architecture names for the Intel CPU
variants, and defines the old 'core'-based names as aliases. GCC has
started to simplify their documented interface to use these names as
well, so it seems like we can start to converge on a consistent pattern.

I'd appreciate Intel double checking the entries that aren't yet
documented widely, especially Atom (Bonnell and Silvermont), Knights
Landing, and Skylake. But this change shouldn't break any existing
users.

Also, ran clang-format to re-format this code and it actually worked
(modulo a tiny bug) so hopefully we can start to stop thinking about
formatting this stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223769 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 10:58:36 +00:00
Elena Demikhovsky
ff3745b4ff AVX-512: Added some comments to ERI scalar intrinsics.
No functional change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223761 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 07:06:32 +00:00
Michael Kuperstein
77c1b73211 [X86] Convert esp-relative movs of function arguments into pushes, step 1
This handles the simplest case for mov -> push conversion:
1. x86-32 calling convention, everything is passed through the stack.
2. There is no reserved call frame.
3. Only registers or immediates are pushed, no attempt to combine a mem-reg-mem sequence into a single PUSHmm.

Differential Revision: http://reviews.llvm.org/D6503

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223757 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 06:10:44 +00:00
Bruno Cardoso Lopes
43edafcc07 [CompactUnwind] Fix register encoding logic
Fix a compact unwind encoding logic bug which would try to encode
more callee saved registers than it should, leading to early bail out
in the encoding logic and abusive use of DWARF frame mode unnecessarily.

Also remove no-compact-unwind.ll which was testing the wrong thing
based on this bug and move it to valid 'compact unwind' tests. Added
other few more tests too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 18:18:32 +00:00
Andrea Di Biagio
eafdf26d89 [X86] Improved tablegen patters for matching TZCNT/LZCNT.
Teach ISel how to match a TZCNT/LZCNT from a conditional move if the
condition code is X86_COND_NE.
Existing tablegen patterns only allowed to match TZCNT/LZCNT from a
X86cond with condition code equal to X86_COND_E. To avoid introducing
extra rules, I added an 'ImmLeaf' definition that checks if the
condition code is COND_E or COND_NE.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223668 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 17:47:18 +00:00
Andrea Di Biagio
ae16ff1c42 [X86] Improved lowering of packed v8i16 vector shifts by non-constant count.
Before this patch, the backend sub-optimally expanded the non-constant shift
count of a v8i16 shift into a sequence of two 'movd' plus 'movzwl'.

With this patch the backend checks if the target features sse4.1. If so, then
it lets the shuffle legalizer deal with the expansion of the shift amount.

Example:
;;
define <8 x i16> @test(<8 x i16> %A, <8 x i16> %B) {
  %shamt = shufflevector <8 x i16> %B, <8 x i16> undef, <8 x i32> zeroinitializer
  %shl = shl <8 x i16> %A, %shamt
  ret <8 x i16> %shl
}
;;

Before (with -mattr=+avx):
  vmovd  %xmm1, %eax
  movzwl  %ax, %eax
  vmovd  %eax, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq

Now:
  vpxor  %xmm2, %xmm2, %xmm2
  vpblendw  $1, %xmm1, %xmm2, %xmm1
  vpsllw  %xmm1, %xmm0, %xmm0
  retq


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223660 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 14:36:51 +00:00
Elena Demikhovsky
c4fbd3fd62 X86 intrinsics moved form X86ISelLowering.cpp to X86IntrinsicsInfo.h
X86ISelLowering.cpp has a long switch for intrinsics. I moved a part of
this long switch to the new intrinsics table in X86IntrinsicsInfo.h.
No functional changes, just code and compile time optimization.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223641 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 09:03:08 +00:00
Ahmed Bougacha
3b9ac8c7c3 [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
Most patterns will go away once the extload legalization changes land.

Differential Revision: http://reviews.llvm.org/D6125


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223567 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 01:31:07 +00:00
Ahmed Bougacha
f5e810be25 [X86] Cleanup FCOPYSIGN lowering. NFC intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223542 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 23:11:36 +00:00
Sanjay Patel
ab4ad4f98e Optimize merging of scalar loads for 32-byte vectors [X86, AVX]
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then subsequently
glue them back together), we should look for the easy case where we can just load
all elements in a single op.

An example of the codegen change is:

From:

vmovss  16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps       $16, 20(%rdi), %xmm1, %xmm1
vinsertps       $32, 24(%rdi), %xmm1, %xmm1
vinsertps       $48, 28(%rdi), %xmm1, %xmm1
vinsertf128     $1, %xmm1, %ymm0, %ymm0
retq

To:

vmovups (%rdi), %ymm0
retq

Differential Revision: http://reviews.llvm.org/D6536



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223518 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 21:28:14 +00:00
Jan Wen Voung
a44126f432 Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress.
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617

In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox so that load/stores
off of rbp don't require sandbox adjustment (zeroing the top
32-bits, then filling those by adding r15).

However, llvm.frameaddress returns a pointer and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.

Test Plan: test/CodeGen/X86/frameaddr.ll

Reviewers: dschuff, nadav

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6514

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:55:53 +00:00
Andrea Di Biagio
6a9a49d7ab [X86] Improved lowering of packed vector shifts to vpsllq/vpsrlq.
SSE2/AVX non-constant packed shift instructions only use the lower 64-bit of
the shift count. 

This patch teaches function 'getTargetVShiftNode' how to deal with shifts
where the shift count node is of type MVT::i64.

Before this patch, function 'getTargetVShiftNode' only knew how to deal with
shift count nodes of type MVT::i32. This forced the backend to wrongly
truncate the shift count to MVT::i32, and then zero-extend it back to MVT::i64.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223505 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:02:22 +00:00
Andrea Di Biagio
54529ed1c4 [X86] Avoid introducing extra shuffles when lowering packed vector shifts.
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, then it introduces an extra dag node to
extract the splat value from the shuffle. The splat value is then used
to generate a shift count of a target specific shift.

However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.

Example:

;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
  %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  %shl = shl <4 x i32> %a, %c
  ret <4 x i32> %shl
}
;;

Before this patch, llc generated the following code (-mattr=+avx):
  vpshufd $0, %xmm1, %xmm1   # xmm1 = xmm1[0,0,0,0]
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

With this patch, the redundant splat operation is removed from the code.
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 12:13:30 +00:00
Eric Christopher
52978c2adf Rename the x86 isTargetMacho to isTargetMachO for uniformity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223421 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 00:22:38 +00:00
Eric Christopher
62b1007007 Both of these subtargets have functions that check whether or
not the target is mach-o. Use them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223420 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 00:22:35 +00:00
Ahmed Bougacha
3d5af84aa6 [X86] Delete dead code in fcopysign lowering. NFC.
r32900 introduced custom lowering for fcopysign, with two checks to
change the magnitude value's type if it's larger/smaller than the sign
value's type.  r32932 replaced that code for the smaller case.
r43205 did the same for the larger case, but left the old code, now dead.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223415 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 23:52:15 +00:00
Bruno Cardoso Lopes
9eb2a386c7 [x86] Fix isOffsetSuitableForCodeModel kernel code model offset
Offset == 0 is a valid offset for kernel code model according to the
x86_64 System V ABI. Found by inspection, no testcase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223383 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 20:36:06 +00:00
Michael Kuperstein
5e343e6fd0 [X86] Improve a dag-combine that handles a vector extract -> zext sequence.
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.

Differential Revision: http://reviews.llvm.org/D6501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223360 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 13:49:51 +00:00
Andrea Di Biagio
e6cb70164e [X86] Simplify code. NFC.
Replaced some logic that checked if a build_vector node is doing a splat of a
non-undef value with a call to method BuildVectorSDNode::getSplatValue().
No functional change intended.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223354 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 11:21:44 +00:00
Elena Demikhovsky
73ae1df82c Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 09:40:44 +00:00
Michael Liao
d3c452a506 [X86] Clean up whitespace as well as minor coding style
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 05:20:33 +00:00
Michael Liao
fd0832ea89 [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp
Commit on 

- This patch fixes the bug described in
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html

The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.

Impact at run-time on affected functions is::

  - One extra store in the prologue, The store saves the base pointer.
  - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.

Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.

http://reviews.llvm.org/D6388

Patch by Arch Robison <arch.robison@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223329 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:56:38 +00:00
Matt Arsenault
459e595697 Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223323 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:06:57 +00:00
Sanjay Patel
7e4c9bda0a fix typos, grammar, formatting; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223276 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 22:28:05 +00:00
Ahmed Bougacha
ad41590c48 [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80.
The X86AsmParser intel handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
  movdqa [rax], xmm0


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223187 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 02:03:26 +00:00
Simon Pilgrim
ec49b722fd [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead.

The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Differential Revision: http://reviews.llvm.org/D6458



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 22:31:23 +00:00
Philip Reames
712af374c1 Remove unneccessary code introduced with 223101.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223132 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 18:06:10 +00:00
Sanjay Patel
0a24620459 fix typo in comment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 17:25:27 +00:00
Nick Lewycky
1bd6c6210f Fix variable used only in assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 01:09:56 +00:00
Philip Reames
0dfac4002b Try to fix a bot failure due to a variable used only in an assert.
Specifically, bot lld-x86_64-darwin13.  Resulting from change 223085.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 23:27:45 +00:00
Philip Reames
78cc6fcb01 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223085 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 22:52:56 +00:00
Duncan P. N. Exon Smith
54786a0936 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 21:29:14 +00:00
Sanjay Patel
c5992119fc Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222925 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 18:40:18 +00:00
Elena Demikhovsky
10c8f38047 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.

http://reviews.llvm.org/D6378



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222820 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-26 10:46:49 +00:00
Craig Topper
c0dae440e6 Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-26 00:46:26 +00:00
Simon Pilgrim
7f6cee9626 [X86][SSE] Improvements to byte shift shuffle matching
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.

Differential Revision: http://reviews.llvm.org/D6409



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222796 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 22:34:59 +00:00
Cameron McInally
9f4bb0420d [AVX512] Add 512b integer shift by variable intrinsics and patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222786 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 20:41:51 +00:00
Craig Topper
690b96281f Remove space before tab in all AVX512 mnemonic strings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222778 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 20:11:23 +00:00
Andrea Di Biagio
a1e1f01699 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-24 12:23:15 +00:00
Michael Kuperstein
d539147834 [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends-up uninitialized.

Differential Revision: http://reviews.llvm.org/D6377

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 13:09:06 +00:00
Craig Topper
71777d18ad Add missing override keywords.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 09:40:13 +00:00
Elena Demikhovsky
ae1ae2c3a1 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 08:07:43 +00:00
Simon Pilgrim
53a43d38df Tidied up target triple OS detection. NFC
Use Triple::isOS*() helper functions where possible.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-22 19:12:10 +00:00
Chandler Carruth
e915b4b7c8 [x86] Teach the vector shuffle yet another step of canonicalization.
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222614 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-22 09:18:53 +00:00
Sanjay Patel
28660d4b2f Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222544 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 17:40:04 +00:00
Chandler Carruth
46c5a97adc [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222539 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 14:53:03 +00:00
Chandler Carruth
0889d65fd5 [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 14:33:24 +00:00
Chandler Carruth
bd357588a1 [x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222533 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 13:56:05 +00:00