Commit Graph

Keno Fischer
41bda9f201 [X86][ISel] Fix a regression I introduced in r224884
In the else case, ResultReg was not checked for validity. To my
surprise, this case was not hit in any of the existing test cases, so
this commit includes a new test case that exercises this path.

Also drop the `target triple` declaration from the original test, as
suggested by H.J. Lu, because apparently with it the test won't be run
on Linux.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224901 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 15:20:57 +00:00
Michael Kuperstein
bfa4a373f4 [X86] Add missing memory variants to AVX false dependency breaking
Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246)

Differential Revision: http://reviews.llvm.org/D6780

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224900 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 13:15:05 +00:00
Andrea Di Biagio
70a7cda495 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\endcode

In this example, basic block %then.bb is taken if value %val is not zero.
The phi node in %end.bb propagates the size in bits of %val (i.e. 64)
only when %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.
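
As a rough C++ analogue (illustrative only; the function names are mine, not the pass's), the guarded call collapses to a straight-line select on targets whose cttz is defined at zero:

  #include <cstdint>

  // Guarded form, mirroring the IR above: cttz only runs when val != 0.
  uint64_t cttz_branchy(uint64_t val) {
    if (val == 0) return 64;        // %entry -> %end.bb
    return __builtin_ctzll(val);    // %then.bb
  }

  // Speculated form: when cttz is cheap and defined at zero (BMI's
  // TZCNT returns 64 for a zero input), the branch becomes a select.
  uint64_t cttz_speculated(uint64_t val) {
    return val == 0 ? 64 : __builtin_ctzll(val);
  }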

Differential Revision: http://reviews.llvm.org/D6728


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 11:07:35 +00:00
Elena Demikhovsky
8499a501e4 Scalarizer for masked load and store intrinsics.
Masked vector intrinsics are a part of common LLVM IR, but they are only natively supported on AVX2 and AVX-512 targets. This patch adds code that translates the masked intrinsics for all other targets: each masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.
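
A minimal C++ sketch of the per-lane semantics (illustrative only; the real pass emits IR basic blocks, not a loop):

  #include <array>
  #include <cstddef>

  // Each lane is loaded only when its mask bit is set; otherwise the
  // pass-through value is kept. Targets without masked moves get an
  // expansion with one conditional block per lane.
  template <typename T, std::size_t N>
  std::array<T, N> masked_load(const T *ptr, const std::array<bool, N> &mask,
                               std::array<T, N> passthru) {
    for (std::size_t i = 0; i < N; ++i)
      if (mask[i])
        passthru[i] = ptr[i];
    return passthru;
  }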

http://reviews.llvm.org/D6436



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 08:54:45 +00:00
David Majnemer
bd64447bf3 PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models.  This means that a simple ICmpInst might actually
result in invalidating the CTR register.

In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.

This fixes PR22034.

Differential Revision: http://reviews.llvm.org/D6786

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224890 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 19:45:38 +00:00
Keno Fischer
cc80af1b4f [FastIsel][X86] Fix invalid register replacement for bool args
Summary:
Consider the following IR:

  %3 = load i8* undef
  %4 = trunc i8 %3 to i1
  %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
  ret %jl_value_t.0* %5

Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was, plus an "and 1", causing the part of the
MBB responsible for this argument to look something like this:

  %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

Later, when the load is lowered, it will insert

  %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with

  %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

which is an SSA violation and causes problems later down the road.

This fixes PR21557.

Test Plan: The test case from PR21557 is added to the test suite.

Reviewers: ributzka

Reviewed By: ributzka

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224884 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 13:10:15 +00:00
Rafael Espindola
3bea6d7604 No need to run llvm-as. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 16:42:47 +00:00
Hal Finkel
d7b2788e51 [PowerPC] [FastISel] i1 constants must be zero extended
When materializing constant i1 values, they must be zero extended. We represent
i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code
path was dead for i1 values prior to r216006 (which is why this did not manifest in
miscompiles until recently).
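
In C++ terms (a trivial illustration of the convention, not the PPC FastISel code):

  #include <cstdint>

  // An i1 'true' must materialize as 1 (zero-extended) in a 32-bit
  // register, never as -1 (sign-extended):
  uint32_t materialize_i1(bool b) { return b ? 1u : 0u; }  // [0, 1]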

Fixes -O0 self-hosting on PPC64/Linux.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224842 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 23:08:25 +00:00
Elena Demikhovsky
b31322328a Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224829 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 07:49:20 +00:00
David Majnemer
e277a13a71 CodeGen: Error on redefinitions instead of asserting
It's possible to have a prior definition of a symbol in module asm.
Raise an error instead of crashing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224828 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 23:06:55 +00:00
David Majnemer
d36cad9914 CodeGen: Allow aliases to be overridden by variables
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224827 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 22:44:29 +00:00
David Majnemer
e54eacce75 MC: Label definitions are permitted after .set directives
.set directives may be overridden by other .set directives as well as
label definitions.

This fixes PR22019.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 10:27:50 +00:00
Hal Finkel
c9e5247ea7 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 22:29:40 +00:00
Colin LeMahieu
3c3fc28384 [Hexagon] Reapplying 224775 load words.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224786 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 20:02:16 +00:00
Colin LeMahieu
6a9ef539c6 Reverting 224775 until mayLoad flag is addressed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224783 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 19:22:59 +00:00
Colin LeMahieu
5b7d5db23b [Hexagon] Adding word loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224775 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 18:06:56 +00:00
Elena Demikhovsky
1a637e9fc0 AVX-512: Added FMA instructions, intrinsics and tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224764 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 10:30:39 +00:00
Hal Finkel
2bea947207 [PowerPC] Don't mark the return-address slot as immutable
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.
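
For reference, the intrinsic being lowered corresponds to the GCC/Clang builtin:

  // Clang lowers __builtin_return_address(0) to @llvm.returnaddress(i32 0).
  void *current_return_address() { return __builtin_return_address(0); }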

Fixes PR21976.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224761 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 09:45:06 +00:00
Elena Demikhovsky
6709428067 AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224760 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 09:36:28 +00:00
Michael Kuperstein
1f0ddef593 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224759 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 08:59:45 +00:00
Hal Finkel
775294d183 [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.
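
The expansion in question, as a hedged i32 sketch (BuildSDIVPow2 emits the equivalent node sequence; the bug was attempting the same for i64 on PPC32):

  #include <cstdint>

  // Standard signed-divide-by-power-of-two expansion: add (2^k - 1) to
  // negative inputs before the shift so the result truncates toward
  // zero, matching sdiv. Assumes arithmetic right shift on signed ints.
  int32_t sdiv_by_8(int32_t x) {
    return (x + ((x >> 31) & 7)) >> 3;  // equals x / 8 for all inputs
  }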

Fixes PR21928.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224758 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 08:38:50 +00:00
Ahmed Bougacha
bc47ceef43 [ARM] Don't break alignment when combining base updates into load/stores.
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added, as-is, as an operand to
the ARMISD::VLD/VST node.  However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type).  But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.

This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.

Differential Revision: http://reviews.llvm.org/D6759


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 06:07:31 +00:00
Jim Grosbach
860122b3b7 X86: Don't over-align combined loads.
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.

rdar://19190968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224746 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 00:35:23 +00:00
Reid Kleckner
34b7fde802 Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224745 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 23:58:37 +00:00
Colin LeMahieu
9c0a115fbe [Hexagon] Adding memb instruction. Fixing whitespace in test from 224730.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224735 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 21:40:43 +00:00
Bruno Cardoso Lopes
ba059464c3 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local Haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a vector parallel bit-twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0x0F0F0F0F;
v = v + (v >> 8);
v = v + (v >> 16);
v = v & 0x0000003F;
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
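
The same sequence as a self-contained C++ function; the vector lowering applies these steps lane-wise with SIMD and/add/shift ops:

  #include <cstdint>

  uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 // 2-bit sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit sums
    v = v + (v >> 8);
    v = v + (v >> 16);
    return v & 0x3Fu;                                 // 0..32
  }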

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224725 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 19:45:43 +00:00
Quentin Colombet
7b88565334 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 18:11:52 +00:00
Elena Demikhovsky
c1aa521fb4 AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:52:48 +00:00
Karthik Bhat
0c2590a266 Lower multiply-negate operation to mneg on AArch64
This patch pattern-matches code such as:
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9
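
The source-level shape that produces the pair is a negated multiply, e.g. (illustrative):

  #include <cstdint>

  // -(a * b) previously compiled to a NEG followed by a MUL; with this
  // pattern it selects a single MNEG.
  int32_t negated_multiply(int32_t a, int32_t b) { return -(a * b); }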

Review: http://reviews.llvm.org/D6754



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:38:58 +00:00
Rafael Espindola
b646a4b0b8 Convert a few tests to FileCheck. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:29:46 +00:00
Matt Arsenault
d796cf2e01 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.
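
Concretely, for an i8 value compared after sign-extension (a hedged illustration; the combine fires when the constant survives truncation):

  #include <cstdint>

  // (sext i8 x to i32) == -1 is equivalent to x == (i8)-1, so the
  // comparison can be done at the narrow type.
  bool wide_cmp(int8_t x)   { return (int32_t)x == -1; }
  bool narrow_cmp(int8_t x) { return x == (int8_t)-1; }  // same result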

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224691 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-21 16:48:42 +00:00
Chandler Carruth
795bfc9234 [x86] Change the test added in r223774 to first check the spelling of
the error message for a bogus processor, and then look specifically for
that error message using FileCheck.

I actually tried to write the test this way at first, but drew a blank
on how to ensure the error message stayed in sync (oops). Now that I've
recalled how to do that, this is clearly better.

It also fixes an issue with a malloc implementation that actually prints
to stderr in all cases, which was causing problems for some builders it
seems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 02:19:22 +00:00
Elena Demikhovsky
573b762b68 Masked load and store codegen - fixed 128-bit vectors
The codegen failed for 128-bit types on AVX2.
Added the missing patterns to the .td files, plus tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224647 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 23:27:57 +00:00
Matt Arsenault
7fc3bdab6a R600/SI: Only form min/max with 1 use.
If the condition is used for something else, this increases
the number of instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224646 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 23:15:30 +00:00
Reid Kleckner
0f85d54670 Add the ExceptionHandling::MSVC enumeration
It is intended to be used for a family of personality functions that
have similar IR preparation requirements. Typically when interoperating
with MSVC personality functions, bits of functionality need to be
outlined from the main function into helper functions. There is also
usually more than one landing pad per invoke, which does not match the
LLVM IR landingpad representation.

None of this is implemented yet. This change just adds a new enum that
is active for *-windows-msvc and delegates to the EH removal preparation
pass.  No functionality change for other targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 22:19:48 +00:00
Sanjay Patel
9ccbf1a260 Model sqrtss as a binary operation with one source operand tied to the destination (PR14221)
This is a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).
That patch started to fix PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ), but it was not completed. 

Differential Revision: http://reviews.llvm.org/D6330



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224624 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 22:16:28 +00:00
Tom Stellard
87bd2fa24b R600/SI: Make sure non-inline constants aren't folded into mubuf soffset operand
mubuf instructions now define the soffset field using the SCSrc_32
register class which indicates that only SGPRs and inline constants
are allowed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 22:15:30 +00:00
Peter Collingbourne
9caa5bdec0 CodeGen: do not attempt to invalidate virtual registers for zero-sized phis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224615 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 20:50:07 +00:00
Sanjay Patel
3c3cd10928 merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-19 20:23:41 +00:00
Jozef Kolek
c9ae6ee7a0 [mips][microMIPS] Fix bugs related to atomic SC/LL instructions
Fix bugs related to atomic microMIPS SC/LL instructions: while expanding atomic
operations, the mips32r2 encoding was emitted instead of the microMIPS one.

Differential Revision: http://reviews.llvm.org/D6659


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224524 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 16:39:29 +00:00
Toma Tabacu
c82c8a824a [mips] Clean up the CodeGen/Mips/inlineasmmemop.ll test. NFC.
Summary:
Improve comments and remove a redundant attribute list.
There are no functional changes (to the CHECK's or to the code).

Part of these changes were suggested in http://reviews.llvm.org/D6637.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 13:03:51 +00:00
Robert Khasanov
d25d7bb372 [AVX512] Enable FP arithmetic lowering for AVX512VL subsets.
Added RegOp2MemOpTable4 to transform the 4th operand from register to memory in merge-masked versions of instructions.
Added lowering tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224516 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 12:28:22 +00:00
Eric Christopher
c559ba7251 Add a new string member to the TargetOptions struct for the name
of the ABI we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 02:20:58 +00:00
Eric Christopher
360cbd4839 Model ARM backend ABI selection after the front end code doing the
same. This will change the "bare metal" ABI from APCS to AAPCS.

The only difference between the front and back end code is that
the code for Triple::GNU was added for environment. That will migrate
to the front end shortly.

Tests updated with the ABI they were originally testing in the case
of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux
triples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 02:08:45 +00:00
Matt Arsenault
aa14ffddcf R600/SI: Fix f64 inline immediates
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224458 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 21:04:08 +00:00
Jingyue Wu
9830b459be [NVPTX] Fix bugs related to isSingleValueType
Summary:
With isSingleValueType starting to treat vector types as single-value types,
code that uses this interface needs to be updated.

Test Plan:
vector-global.ll
nvcl-param-align.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D6573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224440 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 17:59:04 +00:00
Timur Iskhodzhanov
898a22d7a5 Fix CR/LF line endings in test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224437 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 17:52:12 +00:00
Michael Kuperstein
fd350586f5 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224429 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 12:32:17 +00:00
Toma Tabacu
3fea427a63 [mips] Set GCC-compatible MIPS assembler options before inline asm blocks.
Summary:
When generating MIPS assembly, LLVM always overrides the default assembler options by emitting the '.set noreorder', '.set nomacro' and '.set noat' directives,
while GCC uses the default options if an assembly-level function contains inline assembly code.

This becomes a problem when the code generated by LLVM is interleaved with inline assembly which assumes GCC-like assembler options (from Linux, for example).

This patch fixes these conflicts by setting the appropriate assembler options at the beginning of an inline asm block and popping them at the end.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224425 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 10:56:16 +00:00
Quentin Colombet
1e2604dccc [CodeGenPrepare] Reapply r224351 with a fix for the assertion failure:
The type promotion helper does not support vector types, so it does
not kick in for such cases.

Original commit message:
[CodeGenPrepare] Move sign/zero extensions near loads using type promotion.

This patch extends the optimization in CodeGenPrepare that moves a sign/zero
extension near a load when the target can combine them. The optimization may
promote any operations between the extension and the load to make that possible.

Although this optimization may be beneficial for all targets, in particular
AArch64, this is enabled for X86 only as I have not benchmarked it for other
targets yet.


** Context **

Most targets feature extended loads, i.e., loads that perform a zero or sign
extension for free. In that context it is interesting to expose such pattern in
CodeGenPrepare so that the instruction selection pass can form such loads.
Sometimes, this pattern is blocked because of instructions between the load and
the extension. When those instructions are promotable to the extended type, we
can expose this pattern.


** Motivating Example **

Let us consider an example:
define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) {
  %ld = load i8* %addr1
  %zextld = zext i8 %ld to i32
  %ld2 = load i32* %addr2
  %add = add nsw i32 %ld2, %zextld
  %sextadd = sext i32 %add to i64
  %zexta = zext i8 %a to i32
  %addza = add nsw i32 %zexta, %zextld
  %sextaddza = sext i32 %addza to i64
  %addb = add nsw i32 %b, %zextld
  %sextaddb = sext i32 %addb to i64
  call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb)
  ret void
}

As it is, this IR generates the following assembly on x86_64:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movl  (%rsi), %esi     # plain load
  addl  %eax, %esi       # 32-bit add
  movslq  %esi, %rdi     # sign extend the result of add
  movzbl  %dl, %edx      # zero extend the first argument
  addl  %eax, %edx       # 32-bit add
  movslq  %edx, %rsi     # sign extend the result of add
  addl  %eax, %ecx       # 32-bit add
  movslq  %ecx, %rdx     # sign extend the result of add
[...]
The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA.

Now, by promoting the additions to form more extended loads we would generate:
[...]
  movzbl  (%rdi), %eax   # zero-extended load
  movslq  (%rsi), %rdi   # sign-extended load
  addq  %rax, %rdi       # 64-bit add
  movzbl  %dl, %esi      # zero extend the first argument
  addq  %rax, %rsi       # 64-bit add
  movslq  %ecx, %rdx     # sign extend the second argument
  addq  %rax, %rdx       # 64-bit add
[...]
The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA.

This kind of sequence happens a lot in code using 32-bit indexes on 64-bit
architectures.

Note: The throughput numbers are similar on Sandy Bridge and Haswell.


** Proposed Solution **

To avoid the penalty of all these sign/zero extensions, we merge them in the
loads at the beginning of the chain of computation by promoting all the chain of
computation on the extended type. The promotion is done if and only if we do not
introduce new extensions, i.e., if we do not degrade the code quality.
To achieve this, we extend the existing “move ext to load” optimization with the
promotion mechanism introduced to match larger patterns for addressing mode
(r200947).
The idea of this extension is to perform the following transformation:
ext(promotableInst1(...(promotableInstN(load))))
=>
promotedInst1(...(promotedInstN(ext(load))))
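
In C++ terms, and assuming the intermediate i32 arithmetic carries nsw so the promotion is legal (illustrative only):

  #include <cstdint>

  // Before: ext(add(load, c)) -- the sext is stuck behind the add.
  int64_t ext_after_add(const int32_t *p, int32_t c) {
    return (int64_t)(*p + c);
  }

  // After: add(ext(load), ext(c)) -- the sext sits on the load, where
  // x86-64 folds it into a single movslq.
  int64_t ext_on_load(const int32_t *p, int32_t c) {
    return (int64_t)*p + (int64_t)c;
  }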

The promotion mechanism in that optimization is enabled by a new TargetLowering
switch, which is off by default. In other words, by default, the optimization
performs the “move ext to load” optimization as it was before this patch.


** Performance **

Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10.
Tested Optimization Levels: O3/Os
Tests: llvm-testsuite + externals.
Results:
- No regression beside noise.
- Improvements:
CINT2006/473.astar:  ~2%
Benchmarks/PAQ8p: ~2%
Misc/perlin: ~3%

The results are consistent for both O3 and Os.

<rdar://problem/18310086>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224402 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-17 01:36:17 +00:00