Commit Graph

30864 Commits

Author SHA1 Message Date
Robin Morisset
2b1874cbd4 [Power] Improve the expansion of atomic loads/stores
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.

For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}
```
from
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr
```
into
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

```
which looks like a pretty clear win to me.

Test Plan:
fixed the tests + new test for indexed accesses + make check-all

Reviewers: jfb, wschmidt, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5587

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 22:27:07 +00:00
Juergen Ributzka
b3f91b0af7 [Stackmaps] Make ithe frame-pointer required for stackmaps.
Do not eliminate the frame pointer if there is a stackmap or patchpoint in the
function. All stackmap references should be FP relative.

This fixes PR21107.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218920 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 22:21:49 +00:00
Chandler Carruth
bf21d40070 [x86] Teach the new vector shuffle lowering to widen floating point
elements as well as integer elements in order to form simpler shuffle
patterns.

This is the primary reason why we were failing to match some of the
2-and-2 floating point shuffles such as PR21140. Even after fixing this
we need to support some extra patterns in the backend in order to match
the resulting X86ISD::UNPCKL nodes into the correct instructions. This
commit should fix PR21140 and includes more comprehensive testing of
insertion patterns in v4 shuffles.

Not all of the added tests are beautiful. For example, we don't have
clever instructions to insert-via-load in the integer domain. There are
also some places where we aren't sufficiently cunning with our use of
movq and movd, but that's future work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218911 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 21:37:14 +00:00
Tilmann Scheller
ad7783df73 [NVPTX] Remove dead code.
Found by the Clang static analyzer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218874 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 15:12:48 +00:00
Joerg Sonnenberger
92583e0712 Support padding unaligned data in .text.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218870 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 13:41:42 +00:00
Chandler Carruth
4bbf21e71e [x86] Improve and correct how the new vector shuffle lowering was
matching and lowering 64-bit insertions.

The first problem was that we weren't looking through bitcasts to
discover that we *could* lower as insertions. Once fixed, we in turn
weren't looking through bitcasts to discover that we could fold a load
into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR
node around the inserted element and instead were passing a scalar to
a DAG node that expected a vector. It turns out there are some patterns
that will "lower" this into the correct asm, but the rest of the X86
backend is very unhappy with such antics.

This should fix a few more edge case regressions I've spotted going
through the regression test suite to enable the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218839 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 23:14:28 +00:00
Eric Christopher
300743f74a constify the TargetMachine argument used in the subtarget and
lowering constructors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:36:28 +00:00
Sanjay Patel
2b918388ab Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578
Negative FABS of either a scalar or vector should be handled the same way
on x86 with SSE/AVX: a single OR instruction of the FP operand with a
constant to light up the sign bit(s).

http://llvm.org/bugs/show_bug.cgi?id=20578

Differential Revision: http://reviews.llvm.org/D5201



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218822 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:20:06 +00:00
Eric Christopher
406dccea99 Now that the optimization level is adjusting the feature string
before we hit the subtarget, remove the constructor parameter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218817 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:05:35 +00:00
Eric Christopher
c9038d9c1b Rework the PPC TargetMachine so that the non-function specific
overrides happen at TargetMachine creation and not on every
subtarget creation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218805 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:38:26 +00:00
Eric Christopher
2e07dedce3 constify TargetMachine parameter for X86TargetLowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218804 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:38:22 +00:00
Sanjay Patel
72447214a6 Don't repeat function/variable name in comment. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 19:39:32 +00:00
Tim Northover
472f2a056d ARM: allow copying of CPSR when all else fails.
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.

rdar://problem/18011155

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218789 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 19:21:03 +00:00
Adrian Prantl
02474a32eb Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218787 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:55:02 +00:00
Reed Kotler
befab0a552 Add fptrunc to mips fast-sel
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel

Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: rfuhler

Differential Revision: http://reviews.llvm.org/D5553

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218785 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:47:02 +00:00
Adrian Prantl
10c4265675 Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218782 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 18:10:54 +00:00
Adrian Prantl
076fd5dfc1 Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 17:55:39 +00:00
Tom Stellard
56077f5796 R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218776 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 17:15:17 +00:00
Toma Tabacu
a2878ec715 [mips] Rename emit and parse functions for the .cpload assembler directive. NFC.
Summary: It's better if we have a consistent name for .cpload-related functions.

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218768 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 14:53:19 +00:00
Tom Stellard
f7082f9bd7 R600/SI: Add a generic pseudo EXP instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 14:44:45 +00:00
Tom Stellard
cbb63311cd R600/SI: Add generic pseudo MTBUF instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218766 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 14:44:43 +00:00
Tom Stellard
f69ae4815a R600/SI: Add generic pseudo SMRD instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218765 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 14:44:42 +00:00
Oliver Stannard
9d7038c437 [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218763 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 13:13:18 +00:00
Chandler Carruth
7d64681274 [x86] Fix a few more tiny patterns with the new vector shuffle lowering
that keep cropping up in the regression test suite.

This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.

In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218756 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 11:14:02 +00:00
Chandler Carruth
a1b88ab2c1 [x86] Delete some extraneous logic from the new vector shuffle lowering.
Nothing was relying on this and there are potentially some edge cases
that it would not be correct under. Removing it seems better than trying
to "fix" it as nothing was relying on it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218755 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 11:13:57 +00:00
Tom Coxon
01649dea92 [AArch64] Allow access to all system registers with MRS/MSR instructions.
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers
is:
    op0 op1  CRn   CRm   op2
    11  xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:
    op0 op1  CRn   CRm   op2
    xx  xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218753 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 10:13:59 +00:00
Asiri Rathnayake
e9bbacd0a8 Add missing natual vector cast.
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218751 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 09:59:45 +00:00
Oliver Stannard
ff18b9ff38 [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 09:02:17 +00:00
Daniel Sanders
9a11fba79f [mips] Fix disassembly of [ls][wd]c[23], cache, and pref
Fixes PR21015, and PR20993.                                                       
                                                                                  
Patch by Jun Koi



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218745 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 08:26:55 +00:00
Sasa Stankovic
05a13f0bd0 [mips] For indirect calls we don't need $gp to point to .got. Mips linker
doesn't generate lazy binding stub for a function whose address is taken in
the program.

Differential Revision: http://reviews.llvm.org/D5067


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218744 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 08:22:21 +00:00
Nick Lewycky
b69f873ee1 Fix typo in comment from r218733
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218739 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 03:37:34 +00:00
Chandler Carruth
9e2fe46484 [x86] Teach the new vector shuffle lowering to be even more aggressive
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.

This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.

Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218734 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 03:19:43 +00:00
Chandler Carruth
429670f0e8 [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is
the same speed as pshufd but we can fold loads into the pmovzx
instructions.

This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218733 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 02:25:54 +00:00
Adam Nemet
d0d5b08fbd [AVX512] Remove space before \t in AsmStrings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218725 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 00:41:32 +00:00
Chandler Carruth
afe75172b1 [x86] Teach the new vector shuffle lowering about VBROADCAST and
VPBROADCAST.

This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218724 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 00:41:21 +00:00
Sanjay Patel
cafc85bf1e Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 20:28:48 +00:00
Juergen Ributzka
9952c922c2 Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:59:35 +00:00
Matt Arsenault
28233d3a63 R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:49:48 +00:00
Matt Arsenault
532a5c7dc6 R600/SI: Update VOP3b to not include obsolete operands
abs / neg are now part of the srcN_modifiers operands

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218691 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:49:43 +00:00
Reed Kotler
8a6f79e58d Add numeric extend, trunctate to mips fast-isel
Summary:
 Add numeric extend, trunctate to mips fast-isel

 Reactivates D4827



Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218681 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 16:30:13 +00:00
Tom Coxon
8a23890385 [AArch64] Remove unnecessary whitespace. (Test commit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 16:23:16 +00:00
Robert Khasanov
8acdc5232d [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218670 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 12:15:52 +00:00
Robert Khasanov
175ff01f0f [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:41:54 +00:00
Robert Khasanov
cfa5724d50 [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218668 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:32:22 +00:00
Robert Khasanov
58da66b2bf [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218667 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:19:50 +00:00
Job Noorman
deb16c9eac Make sure aggregates are properly alligned on MSP430.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:15:44 +00:00
Chandler Carruth
4abb04a65c [x86] Revert r218588, r218589, and r218600. These patches were pursuing
a flawed direction and causing miscompiles. Read on for details.

Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.

In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.

However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.

There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.

My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
   converts a VSELECT with a constant condition operand into
   a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
   entirely on the DAG combine to catch cases where we can match to an
   immediate-controlled blend instruction.

One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 02:52:28 +00:00
Matt Arsenault
108eb07f61 Fix missing C++ mode comment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218654 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 01:05:27 +00:00
Juergen Ributzka
a0af4b0271 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218653 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 00:49:58 +00:00
Juergen Ributzka
e8a9706eda [FastISel][AArch64] Factor out scale factor calculation. NFC.
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218652 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 00:49:54 +00:00