10972 Commits

Author SHA1 Message Date
Ahmed Bougacha
3b9ac8c7c3 [X86] Refactor PMOV[SZ]Xrm to add missing AVX2 patterns.
Most patterns will go away once the extload legalization changes land.

Differential Revision: http://reviews.llvm.org/D6125


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223567 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-06 01:31:07 +00:00
Ahmed Bougacha
f5e810be25 [X86] Cleanup FCOPYSIGN lowering. NFC intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223542 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 23:11:36 +00:00
Sanjay Patel
ab4ad4f98e Optimize merging of scalar loads for 32-byte vectors [X86, AVX]
Fix the poor codegen seen in PR21710 ( http://llvm.org/bugs/show_bug.cgi?id=21710 ).
Before we crack 32-byte build vectors into smaller chunks (and then subsequently
glue them back together), we should look for the easy case where we can just load
all elements in a single op.

An example of the codegen change is:

From:

vmovss  16(%rdi), %xmm1
vmovups (%rdi), %xmm0
vinsertps       $16, 20(%rdi), %xmm1, %xmm1
vinsertps       $32, 24(%rdi), %xmm1, %xmm1
vinsertps       $48, 28(%rdi), %xmm1, %xmm1
vinsertf128     $1, %xmm1, %ymm0, %ymm0
retq

To:

vmovups (%rdi), %ymm0
retq
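
As an illustration only (not code from this commit), the "easy case" can be modeled as a check that every lane of the build vector is loaded from consecutive addresses off the same base. The function name and the `(base, offset)` encoding below are assumptions made for this sketch, not LLVM APIs.

```python
# Illustrative model only: names and data encoding are hypothetical.
def is_single_wide_load(elements, elt_size):
    """Return True if all lanes are loads from consecutive addresses.

    elements: list of (base_pointer_name, byte_offset) pairs, one per lane.
    elt_size: size of one element in bytes.
    """
    if not elements:
        return False
    base, first = elements[0]
    return all(
        b == base and off == first + i * elt_size
        for i, (b, off) in enumerate(elements)
    )

# Eight 4-byte floats at (%rdi), 4(%rdi), ..., 28(%rdi), as in the example above.
lanes = [("rdi", 4 * i) for i in range(8)]
print(is_single_wide_load(lanes, 4))
```

When the check succeeds, the eight scalar loads can in principle be replaced by one 32-byte load, which matches the single `vmovups (%rdi), %ymm0` shown above.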

Differential Revision: http://reviews.llvm.org/D6536



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223518 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 21:28:14 +00:00
Jan Wen Voung
a44126f432 Use 32-bit ebp for NaCl64 in a limited case: llvm.frameaddress.
Summary:
Follow up to [x32] "Use ebp/esp as frame and stack pointer":
http://reviews.llvm.org/D4617

In that earlier patch, NaCl64 was made to always use rbp.
That's needed for most cases because rbp should hold a full
64-bit address within the NaCl sandbox so that load/stores
off of rbp don't require sandbox adjustment (zeroing the top
32-bits, then filling those by adding r15).
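
A minimal sketch of the sandbox adjustment described above (zero the top 32 bits, then add r15). The concrete base value is a made-up example, and the helper name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sandbox_adjust(addr, r15):
    """Zero the top 32 bits of addr, then fill them by adding the base in r15."""
    return (addr & MASK32) + r15

# Hypothetical sandbox base; chosen only so the arithmetic is easy to follow.
sandbox_base = 0x7F00_0000_0000
untrusted = 0xDEAD_BEEF_1234_5678
print(hex(sandbox_adjust(untrusted, sandbox_base)))
```

A register that already holds a full 64-bit address inside the sandbox, as rbp does here, can be used for loads and stores without this extra step.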

However, llvm.frameaddress returns a pointer and pointers
are 32-bit for NaCl64. In this case, use ebp instead, which
will make the register copy type check. A similar mechanism
may be needed for llvm.eh.return, but is not added in this change.

Test Plan: test/CodeGen/X86/frameaddr.ll

Reviewers: dschuff, nadav

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D6514

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:55:53 +00:00
Andrea Di Biagio
6a9a49d7ab [X86] Improved lowering of packed vector shifts to vpsllq/vpsrlq.
SSE2/AVX non-constant packed shift instructions only use the lower 64 bits of
the shift count.

This patch teaches function 'getTargetVShiftNode' how to deal with shifts
where the shift count node is of type MVT::i64.

Before this patch, function 'getTargetVShiftNode' only knew how to deal with
shift count nodes of type MVT::i32. This forced the backend to wrongly
truncate the shift count to MVT::i32, and then zero-extend it back to MVT::i64.
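
One way to see why the truncate/zero-extend round trip loses information is to model a single 64-bit lane of a packed left shift. This is a sketch under the assumption that a count above 63 zeroes the lane; the helper names are hypothetical and not taken from the patch.

```python
MASK64 = (1 << 64) - 1

def psllq_lane(value, count):
    """Model one 64-bit lane of a packed left shift with a 64-bit count."""
    return 0 if count > 63 else (value << count) & MASK64

def trunc_then_zext(count):
    """Truncate the count to 32 bits, then zero-extend it back to 64 bits."""
    return count & 0xFFFFFFFF

count = 1 << 32  # greater than 63, so the lane should become zero
full = psllq_lane(1, count)
narrowed = psllq_lane(1, trunc_then_zext(count))
print(full, narrowed)
```

With the full 64-bit count the lane is cleared, while the truncated count becomes 0 and leaves the value unchanged.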


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223505 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:02:22 +00:00
Andrea Di Biagio
54529ed1c4 [X86] Avoid introducing extra shuffles when lowering packed vector shifts.
When lowering a vector shift node, the backend checks if the shift count is a
shuffle with a splat mask. If so, then it introduces an extra dag node to
extract the splat value from the shuffle. The splat value is then used
to generate a shift count of a target specific shift.

However, if we know that the shift count is a splat shuffle, we can use the
splat index 'I' to extract the I-th element from the first shuffle operand.
The advantage is that the splat shuffle may become dead since we no longer
use it.

Example:

;;
define <4 x i32> @example(<4 x i32> %a, <4 x i32> %b) {
  %c = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> zeroinitializer
  %shl = shl <4 x i32> %a, %c
  ret <4 x i32> %shl
}
;;

Before this patch, llc generated the following code (-mattr=+avx):
  vpshufd $0, %xmm1, %xmm1   # xmm1 = xmm1[0,0,0,0]
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq

With this patch, the redundant splat operation is removed from the code.
  vpxor  %xmm2, %xmm2
  vpblendw $3, %xmm1, %xmm2, %xmm1 # xmm1 = xmm1[0,1],xmm2[2,3,4,5,6,7]
  vpslld %xmm1, %xmm0, %xmm0
  retq
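
The equivalence this relies on can be sketched as follows: any element of a splat shuffle with index I equals element I of the first shuffle operand. The helpers below are a toy model of `shufflevector`, not LLVM code, and `None` stands in for undef.

```python
def shufflevector(a, b, mask):
    """Toy model: pick elements from the concatenation of a and b."""
    src = a + b
    return [src[i] for i in mask]

def splat_index(mask):
    """Return I if every mask entry is I, otherwise None."""
    if mask and all(m == mask[0] for m in mask):
        return mask[0]
    return None

b = [7, 8, 9, 10]
undef = [None] * 4
mask = [0, 0, 0, 0]  # zeroinitializer, as in the example above

c = shufflevector(b, undef, mask)
i = splat_index(mask)
print(c[0] == b[i])
```

Because the shift count can be read directly from `b[i]`, the shuffle result `c` no longer has a user and may be removed.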


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223461 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 12:13:30 +00:00
Eric Christopher
52978c2adf Rename the x86 isTargetMacho to isTargetMachO for uniformity.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223421 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 00:22:38 +00:00
Eric Christopher
62b1007007 Both of these subtargets have functions that check whether or
not the target is mach-o. Use them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223420 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 00:22:35 +00:00
Ahmed Bougacha
3d5af84aa6 [X86] Delete dead code in fcopysign lowering. NFC.
r32900 introduced custom lowering for fcopysign, with two checks to
change the magnitude value's type if it's larger/smaller than the sign
value's type.  r32932 replaced that code for the smaller case.
r43205 did the same for the larger case, but left the old code, now dead.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223415 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 23:52:15 +00:00
Bruno Cardoso Lopes
9eb2a386c7 [x86] Fix isOffsetSuitableForCodeModel kernel code model offset
Offset == 0 is a valid offset for kernel code model according to the
x86_64 System V ABI. Found by inspection, no testcase.
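
A rough sketch of the kind of check involved, written as a toy model. Only the statement that an offset of 0 is valid comes from the message above; the signed 32-bit range test and the rejection of negative offsets are assumptions made for illustration, and the function name is hypothetical.

```python
def kernel_offset_ok(offset, has_symbolic_displacement=True):
    """Toy model of an offset check for the kernel code model."""
    # Assumption: the offset has to fit a signed 32-bit immediate field.
    if not -(1 << 31) <= offset < (1 << 31):
        return False
    # Assumption: without a symbolic displacement there is no extra restriction.
    if not has_symbolic_displacement:
        return True
    # The fix: accept offset == 0 as well, so compare with >= instead of >.
    return offset >= 0

print(kernel_offset_ok(0))
```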

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223383 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 20:36:06 +00:00
Michael Kuperstein
5e343e6fd0 [X86] Improve a dag-combine that handles a vector extract -> zext sequence.
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.
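
As a hedged illustration of what "an extract, movs and shifts" can amount to on x86-64, the sketch below splits the two 64-bit halves of a <4 x i32> into four zero-extended lanes using masks and shifts. It models the arithmetic only; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def extract_i32_lanes(lo64, hi64):
    """Split two 64-bit halves of a <4 x i32> into four zero-extended lanes."""
    return [lo64 & MASK32, lo64 >> 32, hi64 & MASK32, hi64 >> 32]

# Lanes 1, 2, 3, 4 packed little-endian into two 64-bit halves.
lo = (2 << 32) | 1
hi = (4 << 32) | 3
print(extract_i32_lanes(lo, hi))
```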

Differential Revision: http://reviews.llvm.org/D6501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223360 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 13:49:51 +00:00
Andrea Di Biagio
e6cb70164e [X86] Simplify code. NFC.
Replaced some logic that checked if a build_vector node is doing a splat of a
non-undef value with a call to method BuildVectorSDNode::getSplatValue().
No functional change intended.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223354 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 11:21:44 +00:00
Elena Demikhovsky
73ae1df82c Masked Load / Store Intrinsics - the CodeGen part.
I'm recommitting the codegen part of the patch.
The vectorizer part will be sent for review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
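
For readers unfamiliar with masked memory operations, here is a small element-wise model of the intended semantics. It is a sketch, not the intrinsic's implementation: memory is modeled as a Python list of elements rather than bytes, and alignment is ignored.

```python
def masked_load(memory, addr, passthru, mask):
    """Load lane i from memory only where mask[i] is set; else keep passthru[i]."""
    return [
        memory[addr + i] if m else p
        for i, (p, m) in enumerate(zip(passthru, mask))
    ]

def masked_store(memory, addr, value, mask):
    """Store lane i to memory only where mask[i] is set."""
    for i, (v, m) in enumerate(zip(value, mask)):
        if m:
            memory[addr + i] = v

memory = [10, 11, 12, 13]
loaded = masked_load(memory, 0, [0, 0, 0, 0], [1, 0, 1, 0])
masked_store(memory, 0, [90, 91, 92, 93], [0, 1, 0, 1])
print(loaded, memory)
```

Lanes whose mask bit is clear never touch memory, which is what makes these operations usable for conditional accesses in vectorized loops.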

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 09:40:44 +00:00
Michael Liao
d3c452a506 [X86] Clean up whitespace as well as minor coding style
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 05:20:33 +00:00
Michael Liao
fd0832ea89 [X86] Restore X86 base pointer after call to llvm.eh.sjlj.setjmp
Commit on 

- This patch fixes the bug described in
  http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html

The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.

Impact at run-time on affected functions is:

  - One extra store in the prologue. The store saves the base pointer.
  - One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.

Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.

http://reviews.llvm.org/D6388

Patch by Arch Robison <arch.robison@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223329 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:56:38 +00:00
Matt Arsenault
459e595697 Allow target to specify prefix for labels
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223323 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:06:57 +00:00
Sanjay Patel
7e4c9bda0a fix typos, grammar, formatting; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223276 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 22:28:05 +00:00
Ahmed Bougacha
ad41590c48 [X86][MC] Intel syntax: accept implicit memory operand sizes larger than 80.
The X86AsmParser intel handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
  movdqa [rax], xmm0
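
The idea of trying each memory operand size can be sketched as below. This is a toy model: the `accepts` callback, the exact size lists, and the "exactly one match" rule are assumptions for illustration, not the parser's actual code.

```python
OLD_SIZES = (8, 16, 32, 64, 80)
NEW_SIZES = OLD_SIZES + (128, 256, 512)  # adds xmmword, ymmword, zmmword

def infer_operand_size(accepts, sizes):
    """Return the single size the instruction accepts, or None if not unique."""
    matches = [s for s in sizes if accepts(s)]
    return matches[0] if len(matches) == 1 else None

# Hypothetical matcher: "movdqa [rax], xmm0" needs a 128-bit memory operand.
def movdqa_accepts(size):
    return size == 128

print(infer_operand_size(movdqa_accepts, OLD_SIZES))
print(infer_operand_size(movdqa_accepts, NEW_SIZES))
```

With the shorter list no candidate matches, which corresponds to the "invalid operand" error described above.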


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223187 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 02:03:26 +00:00
Simon Pilgrim
ec49b722fd [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
4i32 shuffles for single insertions into zero vectors lower to X86vzmovl, which was using (v)blendps and causing domain switch stalls. This patch fixes this by using (v)pblendw instead.

The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Differential Revision: http://reviews.llvm.org/D6458



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 22:31:23 +00:00
Philip Reames
712af374c1 Remove unnecessary code introduced with 223101.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223132 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 18:06:10 +00:00
Sanjay Patel
0a24620459 fix typo in comment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 17:25:27 +00:00
Nick Lewycky
1bd6c6210f Fix variable used only in assertion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 01:09:56 +00:00
Philip Reames
0dfac4002b Try to fix a bot failure due to a variable used only in an assert.
Specifically, bot lld-x86_64-darwin13.  Resulting from change 223085.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 23:27:45 +00:00
Philip Reames
78cc6fcb01 [Statepoints 2/4] Statepoint infrastructure for garbage collection: MI & x86-64 Backend
This is the second patch in a small series.  This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints.  It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code.  I will be submitting the SelectionDAG parts within the next 24-48 hours.  Since those pieces are by far the most complicated, I wanted to minimize the size of that patch.  That patch will include the tests which exercise the functionality in this patch.  The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.

The STATEPOINT pseudo node is generated after all gc values are explicitly spilled to stack slots.  The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes.  The STATEPOINT is modeled as modifying all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT.  (Doing so would break relocation semantics for collectors which wish to relocate roots.)

The implementation of STATEPOINT is closely modeled on PATCHPOINT.  Eventually, much of the code in this patch will be removed.  The long term plan is to merge the functionality provided by statepoints and patchpoints.  Merging their implementations in the backend is likely to be a good starting point.

Reviewed by: atrick, ributzka



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223085 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 22:52:56 +00:00
Duncan P. N. Exon Smith
54786a0936 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 21:29:14 +00:00
Sanjay Patel
c5992119fc Enable FeatureFastUAMem for btver2
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.

Replace the existing supposed small memcpy test with an actual test of a small memcpy. 
The previous test wasn't using FileCheck either.

This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6360



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222925 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-28 18:40:18 +00:00
Elena Demikhovsky
10c8f38047 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.

http://reviews.llvm.org/D6378



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222820 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-26 10:46:49 +00:00
Craig Topper
c0dae440e6 Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-26 00:46:26 +00:00
Simon Pilgrim
7f6cee9626 [X86][SSE] Improvements to byte shift shuffle matching
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.

Differential Revision: http://reviews.llvm.org/D6409



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222796 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 22:34:59 +00:00
Cameron McInally
9f4bb0420d [AVX512] Add 512b integer shift by variable intrinsics and patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222786 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 20:41:51 +00:00
Craig Topper
690b96281f Remove space before tab in all AVX512 mnemonic strings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222778 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 20:11:23 +00:00
Andrea Di Biagio
a1e1f01699 [X86] Improved target specific combine on VSELECT dag nodes.
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
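
To illustrate why movss can act as a blend, the sketch below models the register form of movss (replace lane 0 of the destination, keep the rest) and compares it with an element-wise select. Both helpers are toy models with hypothetical names.

```python
def vselect(mask, a, b):
    """Element-wise select: take a[i] where mask[i] is set, else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]

def movss(dst, src):
    """Register-to-register movss: lane 0 comes from src, the rest from dst."""
    return [src[0]] + dst[1:]

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

# Selecting only lane 0 from a is the same as movss with b as the destination.
print(vselect([1, 0, 0, 0], a, b) == movss(b, a))
```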

Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-24 12:23:15 +00:00
Michael Kuperstein
d539147834 [X86] Fixes bug in build_vector v4x32 lowering
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:

1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.

This caused a crash, since the source value for the insertps ends up uninitialized.

Differential Revision: http://reviews.llvm.org/D6377

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 13:09:06 +00:00
Craig Topper
71777d18ad Add missing override keywords.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 09:40:13 +00:00
Elena Demikhovsky
ae1ae2c3a1 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-23 08:07:43 +00:00
Simon Pilgrim
53a43d38df Tidied up target triple OS detection. NFC
Use Triple::isOS*() helper functions where possible.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-22 19:12:10 +00:00
Chandler Carruth
e915b4b7c8 [x86] Teach the vector shuffle yet another step of canonicalization.
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222614 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-22 09:18:53 +00:00
Sanjay Patel
28660d4b2f Add a feature flag for slow 32-byte unaligned memory accesses [x86].
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for 
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
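
The difference between the old proxy and an explicit flag can be sketched as follows. The class, field names, and per-CPU settings are a hypothetical model built from the description above, not the subtarget code itself.

```python
from dataclasses import dataclass

@dataclass
class Subtarget:
    name: str
    has_avx: bool
    has_avx2: bool
    slow_unaligned_32: bool  # hypothetical name for the new feature flag

def avoid_unaligned_32_old(st):
    """Old behavior: the absence of AVX2 was used as a proxy."""
    return st.has_avx and not st.has_avx2

def avoid_unaligned_32_new(st):
    """New behavior: consult the explicit flag."""
    return st.has_avx and st.slow_unaligned_32

sandybridge = Subtarget("sandybridge", True, False, True)
btver2 = Subtarget("btver2", True, False, False)

for st in (sandybridge, btver2):
    print(st.name, avoid_unaligned_32_old(st), avoid_unaligned_32_new(st))
```

In this model Sandy Bridge gets the same answer either way, while btver2 is no longer treated as slow merely because it lacks AVX2.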

Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).

Differential Revision: http://reviews.llvm.org/D6355



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222544 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 17:40:04 +00:00
Chandler Carruth
46c5a97adc [x86] Restructure the checking patterns for v16 and v32 avx2 vector
shuffle lowering to allow much better blend matching.

Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222539 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 14:53:03 +00:00
Chandler Carruth
0889d65fd5 [x86] Make the previous logic significantly less conservative and get
a bunch more improvements.

Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.

This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 14:33:24 +00:00
Chandler Carruth
bd357588a1 [x86] Teach the x86 vector shuffle lowering to detect mergeable 128-bit
lanes.

By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.

While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222533 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 13:56:05 +00:00
Alexey Volkov
d0d0424368 [X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers
Differential Revision: http://reviews.llvm.org/D5938
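
A hedged sketch of the general technique of bypassing a slow wide division: test at run time whether both operands fit in the narrow width and, if so, divide in that width. The runtime-check shape and the helper below are assumptions for illustration; only the 16-bit versus 64-bit choice comes from the commit title.

```python
def divide_with_bypass(a, b, narrow_bits=16):
    """Divide non-negative integers, noting which path would be taken.

    Returns (quotient, path), where path is "narrow" or "wide".
    """
    if a < 0 or b <= 0:
        raise ValueError("sketch covers non-negative a and positive b only")
    # One OR and one shift test both operands at once.
    if (a | b) >> narrow_bits == 0:
        return a // b, "narrow"
    return a // b, "wide"

print(divide_with_bypass(1000, 7))
print(divide_with_bypass(1 << 40, 7))
```

The quotient is identical on both paths; the only intended difference is the cost of the division instruction that would be chosen.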



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222521 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 11:19:34 +00:00
Craig Topper
e0ed7df6b0 Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222509 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 05:58:21 +00:00
Quentin Colombet
c91f34ae54 [X86] Do not custom lower UINT_TO_FP when the target type does not
match the custom lowering.

<rdar://problem/19026326>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 00:47:19 +00:00
Reid Kleckner
9c390888f7 Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/
Follow up to r221940, where I must not have caught em all. NFC

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222481 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-20 23:51:47 +00:00
Reid Kleckner
d12434058d Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222480 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-20 23:37:18 +00:00
Saleem Abdulrasool
e6c1fc9a44 X86: use the correct alloca symbol for Windows Itanium
Windows Itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT.  This corrects the emission of stack probes on i686-windows-itanium.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222439 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-20 18:01:26 +00:00
Craig Topper
136d5aeba4 Fix a typo in a comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222412 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-20 05:22:37 +00:00
Andrea Di Biagio
53daaff125 [X86] Improved lowering of v4x32 build_vector dag nodes.
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.

With this patch, a build_vector that performs a blend with zero is 
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in an optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.

This patch also improves the logic that lowers a build_vector into an insertps
with zero masking. See for example the extra test cases added to test sse41.ll.

Differential Revision: http://reviews.llvm.org/D6311


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222375 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-19 19:34:29 +00:00
Simon Pilgrim
a6943fff90 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
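
The implicit zero fill mentioned above can be modeled on a 16-byte vector as below. This is a sketch of the instruction semantics as I understand them, with index 0 as the least significant byte; the Python helpers are not part of the patch.

```python
def pslldq(v, n):
    """Shift a 16-byte vector left by n bytes, shifting in zero bytes."""
    n = min(n, 16)
    return [0] * n + v[:16 - n]

def psrldq(v, n):
    """Shift a 16-byte vector right by n bytes, shifting in zero bytes."""
    n = min(n, 16)
    return v[n:] + [0] * n

v = list(range(1, 17))
print(pslldq(v, 4))
print(psrldq(v, 4))
```

A shuffle that moves elements by a whole number of bytes and fills the vacated positions with zeros matches one of these two forms, so no separate zero register is required.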

Differential Revision: http://reviews.llvm.org/D5699



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222340 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-19 10:06:49 +00:00