Commit Graph

12318 Commits

Author SHA1 Message Date
Hal Finkel
0ef8f3189e [PowerPC] Enable speculation of cttz/ctlz
PPC has an instruction for ctlz with defined zero behavior, and our lowering of
cttz (provided by DAGCombine) is also efficient and branchless, so speculating
these makes sense.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225150 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 05:24:42 +00:00
Hal Finkel
9cad6c8a24 [PowerPC] Materialize i64 constants using rotation with masking
r225135 added the ability to materialize i64 constants using rotations in order
to reduce the instruction count. Sometimes we can use a rotation only with some
extra masking, so that we take advantage of the fact that generating a bunch of
extra higher-order 1 bits is easy using li/lis.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225147 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 03:41:38 +00:00
Simon Pilgrim
c0c36083da [X86][SSE] Added vector packing test for pr12412
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225138 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-04 19:08:03 +00:00
Simon Pilgrim
dc18ec0e0d [X86][SSE] Added vector integer truncation tests - based off pr15524
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225137 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-04 17:52:00 +00:00
Hal Finkel
2ac0826af3 [PowerPC] Materialize i64 constants using rotation
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing a rotated constant, and then applying the inverse rotation, requires
fewer instructions than the direct method. If so, do that instead.

In r225132, I added support for forming constants using bit inversion. In
effect, this reverts that commit and replaces it with rotation support. The bit
inversion is useful for turning constants that are mostly ones into ones that
are mostly zeros (thus enabling a more-efficient shift-based materialization),
but the same effect can be obtained by using negative constants and a rotate,
and that is at least as efficient, if not more.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225135 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-04 15:43:55 +00:00
Hal Finkel
d138a7bb3f [PowerPC] Materialize i64 constants using bit inversion
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing the bit-reversed constant, and then flipping the bits, requires
fewer instructions than the direct method. If so, do that instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225132 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-04 12:35:03 +00:00
Saleem Abdulrasool
97f8f69a7f ARM: permit tail calls to weak externals on COFF
Weak externals are resolved statically, so we can actually generate the tail
call on PE/COFF targets without breaking the requirements.  It is questionable
whether we want to propagate the current behaviour for MachO as the requirements
are part of the ARM ELF specifications, and it seems that prior to the SVN
r215890, we would have tail'ed the call.  For now, be conservative and only
permit it on PE/COFF where the call will always be fully resolved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225119 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 21:35:00 +00:00
Hal Finkel
e05b232c20 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 17:58:24 +00:00
Hal Finkel
a1d22cc789 [PowerPC] Use 16-byte alignment for modern cores for functions/loops
Most modern PowerPC cores prefer that functions and loops start on
16-byte-aligned boundaries (*), so instruct block placement, etc. to make this
happen. The branch selector has also been adjusted so account for the extra
nops that might now be inserted before loop headers.

(*) Some cores actually prefer other alignments for small loops, but that will
    be addressed in a follow-up commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225115 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 14:58:25 +00:00
Hal Finkel
958b670c34 [PowerPC] Add support for the CMPB instruction
Newer POWER cores, and the A2, support the cmpb instruction. This instruction
compares its operands, treating each of the 8 bytes in the GPRs separately,
returning a 'mask' result of 0 (for false) or -1 (for true) in each byte.

Code generation support is added, in the form of a PPCISelDAGToDAG
DAG-preprocessing routine, that recognizes patterns close to what the
instruction computes (either exactly, or related by a constant masking
operation), and generates the cmpb instruction (along with any necessary
constant masking operation). This can be expanded if use cases arise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225106 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 01:16:37 +00:00
Hal Finkel
84cd524ee9 [PowerPC] Improve instruction selection bit-permuting operations (64-bit)
This is the second installment of improvements to instruction selection for "bit
permutation" instruction sequences. r224318 added logic for instruction
selection for 32-bit bit permutation sequences, and this adds lowering for
64-bit sequences. The 64-bit sequences are more complicated than the 32-bit
ones because:
  a) the 64-bit versions of the 32-bit rotate-and-mask instructions
     work by replicating the lower 32-bits of the value-to-be-rotated into the
     upper 32 bits -- and integrating this into the cost modeling for the various
     bit group operations is non-trivial
  b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions
     cannot, in one instruction, specify the
     mask starting index, the mask ending index, and the rotation factor. Also,
     forming arbitrary 64-bit constants is more complicated than in 32-bit mode
     because the number of instructions necessary is value dependent.

Plus, support for 'late masking' was added: it is sometimes more efficient to
treat the overall value as if it had no mandatory zero bits when planning the
bit-group insertions, and then mask them in at the very end. Unfortunately, as
the structure of the bit groups is different in the two cases, the more
feasible implementation technique was to generate both instruction sequences,
and then pick the shorter one.

And finally, we now generate reasonable code for i64 bswap:

        rldicl 5, 3, 16, 0
        rldicl 4, 3, 8, 0
        rldicl 6, 3, 24, 0
        rldimi 4, 5, 8, 48
        rldicl 5, 3, 32, 0
        rldimi 4, 6, 16, 40
        rldicl 6, 3, 48, 0
        rldimi 4, 5, 24, 32
        rldicl 5, 3, 56, 0
        rldimi 4, 6, 40, 16
        rldimi 4, 5, 48, 8
        rldimi 4, 3, 56, 0

vs. what we used to produce:

        li 4, 255
        rldicl 5, 3, 24, 40
        rldicl 6, 3, 40, 24
        rldicl 7, 3, 56, 8
        sldi 8, 3, 8
        sldi 10, 3, 24
        sldi 12, 3, 40
        rldicl 0, 3, 8, 56
        sldi 9, 4, 32
        sldi 11, 4, 40
        sldi 4, 4, 48
        andi. 5, 5, 65280
        andis. 6, 6, 255
        andis. 7, 7, 65280
        sldi 3, 3, 56
        and 8, 8, 9
        and 4, 12, 4
        and 9, 10, 11
        or 6, 7, 6
        or 5, 5, 0
        or 3, 3, 4
        or 7, 9, 8
        or 4, 6, 5
        or 3, 3, 7
        or 3, 3, 4

which is 12 instructions, instead of 25, and seems optimal (at least in terms
of code size).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225056 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-01 02:53:29 +00:00
Alexey Samsonov
c0319dd9c2 Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225031 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-31 00:40:28 +00:00
Peter Collingbourne
d8ae3e1fee x86_64: Fix calls to __morestack under the large code model.
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.

To avoid these issues, perform an indirect call via a read-only memory
location containing the address.

This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.

Differential Revision: http://reviews.llvm.org/D6787

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225003 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-30 20:05:19 +00:00
Colin LeMahieu
88e5659aaf [Hexagon] Adding reg-reg indexed load forms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224997 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-30 18:58:47 +00:00
Philip Reames
35f43b8786 Semantic tests for memory invalidation at statepoints
These are simply a collection of tests intended to show that information about the contents of gc references in the heap is lost at a statepoint. I've tried to write them so that they don't disallow correct transformations, while still being fairly easy to understand.

p.s. Ideas for additional tests are welcome.

Differential Revision: http://reviews.llvm.org/D6491



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224971 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 23:55:33 +00:00
Colin LeMahieu
0bd2ffae08 [Hexagon] Adding post-increment register form stores and register-immediate form stores with tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224952 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 20:44:51 +00:00
Rafael Espindola
a21d820952 Add segmented stack support for DragonFlyBSD.
Patch by Michael Neumann.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224936 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-29 15:47:28 +00:00
NAKAMURA Takumi
ff95215754 llvm/test/CodeGen/X86/fast-isel-call-bool.ll: Add explicit -mtriple=x86_64-unknown to satisfy x64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224907 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 23:37:11 +00:00
Keno Fischer
41bda9f201 [X86][ISel] Fix a regression I introduced in r224884
The else case ResultReg was not checked for validity.
To my surprise, this case was not hit in any of the
existing test cases. This includes a new test cases
that tests this path.

Also drop the `target triple` declaration from the
original test as suggested by H.J. Lu, because
apparently with it the test won't be run on Linux

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224901 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 15:20:57 +00:00
Michael Kuperstein
bfa4a373f4 [X86] Add missing memory variants to AVX false dependency breaking
Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246)

Differential Revision: http://reviews.llvm.org/D6780

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224900 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 13:15:05 +00:00
Andrea Di Biagio
70a7cda495 [CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.

Example:
\code
entry:
  %cmp = icmp eq i64 %val, 0
  br i1 %cmp, label %end.bb, label %then.bb

then.bb:
  %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
  br label %end.bb

end.bb:
  %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code

In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.

With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.

Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

Differential Revision: http://reviews.llvm.org/D6728


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 11:07:35 +00:00
Elena Demikhovsky
8499a501e4 Scalarizer for masked load and store intrinsics.
Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.

http://reviews.llvm.org/D6436



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-28 08:54:45 +00:00
David Majnemer
bd64447bf3 PowerPC: CTR shouldn't fire if a TLS call is in the loop
Determining the address of a TLS variable results in a function call in
certain TLS models.  This means that a simple ICmpInst might actually
result in invalidating the CTR register.

In such cases, do not attempt to rely on the CTR register for loop
optimization purposes.

This fixes PR22034.

Differential Revision: http://reviews.llvm.org/D6786

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224890 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 19:45:38 +00:00
Keno Fischer
cc80af1b4f [FastIsel][X86] Fix invalid register replacement for bool args
Summary:
Consider the following IR:

  %3 = load i8* undef
  %4 = trunc i8 %3 to i1
  %5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
  ret %jl_value_t.0* %5

Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:

  %vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7

Later, when the load is lowered, it will insert

  %vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14

but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with

  %vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15

which is an SSA violation and causes problems later down the road.

This fixes PR21557.

Test Plan: Test test case from PR21557 is added to the test suite.

Reviewers: ributzka

Reviewed By: ributzka

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224884 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-27 13:10:15 +00:00
Rafael Espindola
3bea6d7604 No need to run llvm-as. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-26 16:42:47 +00:00
Hal Finkel
d7b2788e51 [PowerPC] [FastISel] i1 constants must be zero extended
When materializing constant i1 values, they must be zero extended. We represent
i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code
path was dead for i1 values prior to r216006 (which is why this did not manifest in
miscompiles until recently).

Fixes -O0 self-hosting on PPC64/Linux.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224842 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 23:08:25 +00:00
Elena Demikhovsky
b31322328a Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224829 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-25 07:49:20 +00:00
David Majnemer
e277a13a71 CodeGen: Error on redefinitions instead of asserting
It's possible to have a prior definition of a symbol in module asm.
Raise an error instead of crashing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224828 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 23:06:55 +00:00
David Majnemer
d36cad9914 CodeGen: Allow aliases to be overridden by variables
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224827 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 22:44:29 +00:00
David Majnemer
e54eacce75 MC: Label definitions are permitted after .set directives
.set directives may be overridden by other .set directives as well as
label definitions.

This fixes PR22019.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-24 10:27:50 +00:00
Hal Finkel
c9e5247ea7 [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64
On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl
instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction
sequence is interpreted by the unwinding code in libgcc. To make sure these
occur as a pair, as with other pairings interpreted by the linker, fuse the two
instructions into one instruction (for code generation only).

In the future, we might wish to do this by emitting CFI directives instead,
but this solution is simpler, and mirrors what GCC does. Additional discussion
on this point is contained in the PR.

Fixes PR22015.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 22:29:40 +00:00
Colin LeMahieu
3c3fc28384 [Hexagon] Reapplying 224775 load words.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224786 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 20:02:16 +00:00
Colin LeMahieu
6a9ef539c6 Reverting 224775 until mayLoad flag is addressed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224783 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 19:22:59 +00:00
Colin LeMahieu
5b7d5db23b [Hexagon] Adding word loads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224775 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 18:06:56 +00:00
Elena Demikhovsky
1a637e9fc0 AVX-512: Added FMA instructions, intrinsics an tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224764 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 10:30:39 +00:00
Hal Finkel
2bea947207 [PowerPC] Don't mark the return-address slot as immutable
It is tempting to mark the fixed stack slot used to store the return address as
immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the
function, it is not completely immutable: it is written during the function
prologue. When using post-RA instruction scheduling, the prologue instructions
are available for scheduling, and we're not free to interchange the order of a
particular store in the prologue with loads from that stack location.

Fixes PR21976.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224761 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 09:45:06 +00:00
Elena Demikhovsky
6709428067 AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224760 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 09:36:28 +00:00
Michael Kuperstein
1f0ddef593 [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224759 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 08:59:45 +00:00
Hal Finkel
775294d183 [PowerPC] Don't attempt a 64-bit pow2 division on PPC32
In r224033, in moving the signed power-of-2 division expansion into
BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a
64-bit division on PPC32. This later asserts.

Fixes PR21928.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224758 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 08:38:50 +00:00
Ahmed Bougacha
bc47ceef43 [ARM] Don't break alignment when combining base updates into load/stores.
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node.  However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type).  But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.

This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.

Differential Revision: http://reviews.llvm.org/D6759


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 06:07:31 +00:00
Jim Grosbach
860122b3b7 X86: Don't over-align combined loads.
When combining consecutive loads+inserts into a single vector load,
we should keep the alignment of the base load. Doing otherwise can, and does,
lead to using overly aligned instructions. In the included test case, for
example, using a 32-byte vmovaps on a 16-byte aligned value. Oops.

rdar://19190968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224746 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 00:35:23 +00:00
Reid Kleckner
34b7fde802 Make musttail more robust for vector types on x86
Previously I tried to plug musttail into the existing vararg lowering
code. That turned out to be a mistake, because non-vararg calls use
significantly different register lowering, even on x86. For example, AVX
vectors are usually passed in registers to normal functions and memory
to vararg functions.  Now musttail uses a completely separate lowering.

Hopefully this can be used as the basis for non-x86 perfect forwarding.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224745 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 23:58:37 +00:00
Colin LeMahieu
9c0a115fbe [Hexagon] Adding memb instruction. Fixing whitespace in test from 224730.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224735 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 21:40:43 +00:00
Bruno Cardoso Lopes
ba059464c3 [x86] Add vector @llvm.ctpop intrinsic custom lowering
Currently, when ctpop is supported for scalar types, the expansion of
@llvm.ctpop.vXiY uses vector element extractions, insertions and individual
calls to @llvm.ctpop.iY. When not, expansion with bit-math operations is used
for the scalar calls.

Local haswell measurements show that we can improve vector @llvm.ctpop.vXiY
expansion in some cases by using a using a vector parallel bit twiddling
approach, based on:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = ((v + (v >> 4) & 0xF0F0F0F)
v = v + (v >> 8)
v = v + (v >> 16)
v = v & 0x0000003F
(from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)

When scalar ctpop isn't supported, the approach above performs better for
v2i64, v4i32, v4i64 and v8i32 (see numbers below). And even when scalar ctpop
is supported, this approach performs ~2x better for v8i32.

Here, x86_64 implies -march=corei7-avx without ctpop and x86_64h includes ctpop
support with -march=core-avx2.

== [x86_64h - new]
v8i32: 0.661685
v4i32: 0.514678
v4i64: 0.652009
v2i64: 0.324289
== [x86_64h - old]
v8i32: 1.29578
v4i32: 0.528807
v4i64: 0.65981
v2i64: 0.330707

== [x86_64 - new]
v8i32: 1.003
v4i32: 0.656273
v4i64: 1.11711
v2i64: 0.754064
== [x86_64 - old]
v8i32: 2.34886
v4i32: 1.72053
v4i64: 1.41086
v2i64: 1.0244

More work for other vector types will come next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224725 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 19:45:43 +00:00
Quentin Colombet
7b88565334 [CodeGenPrepare] Handle properly the promotion of operands when this does not
generate instructions.

Fixes PR21978.
Related to <rdar://problem/18310086>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 18:11:52 +00:00
Elena Demikhovsky
c1aa521fb4 AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:52:48 +00:00
Karthik Bhat
0c2590a266 Lower multiply-negate operation to mneg on AArch64
This patch pattern matches code such as-
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9

Review: http://reviews.llvm.org/D6754



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:38:58 +00:00
Rafael Espindola
b646a4b0b8 Convert a few tests to FileCheck. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:29:46 +00:00
Matt Arsenault
d796cf2e01 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224691 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-21 16:48:42 +00:00
Chandler Carruth
795bfc9234 [x86] Change the test added in r223774 to first check the spelling of
the error message for a bogus processor, and then look specifically for
that error message using FileCheck.

I actually tried to write the test this way at first, but drew a blank
on how to ensure the error message stayed in sync (oops). Now that I've
recalled how to do that, this is clearly better.

It also fixes an issue with a malloc implementation that actually prints
to stderr in all cases, which was causing problems for some builders it
seems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-20 02:19:22 +00:00