Commit Graph

12872 Commits

Author SHA1 Message Date
Akira Hatanaka
522cf235e9 Fix a bug in DemoteRegToStack where a reload instruction was inserted into the
wrong basic block.

This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.

Also, hoist the code that computes the insertion point up to the only place
that needs that computation.

rdar://problem/15978721


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228566 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 06:38:23 +00:00
Sanjay Patel
4616d7dd2f fix test attributes; this is an SSE2 test, not a Nehalem test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228546 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:14:27 +00:00
Sanjay Patel
751e3f1f80 fix test attributes; this is an x86-64 test, not a Nehalem test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228545 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:10:40 +00:00
Sanjay Patel
714f3d3a0f fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228544 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:05:03 +00:00
Sanjay Patel
e755d452e0 fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228541 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:50:58 +00:00
Sanjay Patel
78547012ac fix test attributes; these are x86-64 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228536 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:05:53 +00:00
Sanjay Patel
8d32999929 fix test attributes; these are MMX tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228535 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:01:12 +00:00
Sanjay Patel
7596cf2b66 fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228534 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 19:50:55 +00:00
Sanjay Patel
c3803c8bc2 generalize test; nothing Nehalem-specific here
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228532 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 19:38:25 +00:00
Simon Pilgrim
c92ffedc5c [X86][AVX2] AVX2 broadcast + permute memory folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228528 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 18:33:13 +00:00
Tim Northover
848d812931 ARM & AArch64: teach LowerVSETCC that output type size may differ from input.
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.

Fortunately this is easy enough, just extend or truncate the naturally
compared result.

I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.

Should fix PR21645.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228518 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 00:50:47 +00:00
Simon Pilgrim
437265ee96 [X86][AVX2] AVX2 integer stack folding tests.
This adds tests for the remaining AVX2 instructions that currently support memory folding.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228513 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 23:28:16 +00:00
Simon Pilgrim
2134ae7f38 [X86][AVX] Added missing stack folding support + test for vptest ymm instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228509 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 21:44:06 +00:00
Simon Pilgrim
710e70bb70 [X86][SSE] Added missing stack folding tests for (v)mpsadbw instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228506 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 21:20:11 +00:00
Simon Pilgrim
3281412d2a [X86] Force fp stack folding tests to keep to specific domain.
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228495 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 16:14:55 +00:00
Simon Pilgrim
bf4a435d0a [X86][AVX2] More AVX2 integer stack folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228494 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 16:07:27 +00:00
David Majnemer
fdac306a12 MC: Emit COFF section flags in the "proper" order
COFF section flags are not idempotent:
  'rd' will make a read-write section because 'd' implies write
  'dr' will make a read-only section because 'r' disables write

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228490 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 08:26:40 +00:00
Hal Finkel
05bd43dc6e [PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228488 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 07:32:58 +00:00
Hal Finkel
9ce4011708 [PowerPC] Fixup incomplete revert of test/CodeGen/PowerPC/tls-pic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:30:06 +00:00
Simon Pilgrim
148482dd6b [X86][AVX2] Begun adding AVX2 integer stack folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228462 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:12:15 +00:00
Hal Finkel
9168f717c9 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:07:40 +00:00
Reid Kleckner
6dc42dd2da Don't dllexport declarations
Fixes PR22488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228411 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 17:59:49 +00:00
Michel Danzer
7097d17da0 R600/SI: Amend a test to ensure WQM is enabled for LDS in pixel shaders
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228374 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 02:51:29 +00:00
Michel Danzer
971f0f0071 R600/SI: Don't enable WQM for V_INTERP_* instructions v2
Doesn't seem necessary anymore. I think this was mostly compensating for
not enabling WQM for texture sampling instructions.

v2: Add test coverage
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228373 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 02:51:25 +00:00
Michel Danzer
a7879dcf33 R600/SI: Also enable WQM for image opcodes which calculate LOD v3
If whole quad mode isn't enabled for these, the level of detail is
calculated incorrectly for pixels along diagonal triangle edges, causing
artifacts.

v2: Use a TSFlag instead of lots of switch cases
v3: Add test coverage

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228372 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 02:51:20 +00:00
Matthias Braun
3fd0775f06 AArch64: Make test more robust.
Avoid the creation of select instructions which can result in different
scheduling of the selects.

I also added a bunch of additional volatile stores. Those avoid a
CodeGen problem (bug?) where normalizing and denormalizing the control
flow moves all shift instructions into the first block, where ISel can't
match them together with the cmps.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228362 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 23:52:14 +00:00
Matthias Braun
b8b2dff046 X86: Test cleanup
Use FileCheck, make it more consistent and do not rely on unoptimized
or(cmp,cmp) getting combined for max to be matched.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228361 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 23:52:12 +00:00
Colin LeMahieu
71166427a3 [Hexagon] Simplifying and formatting several patterns. Changing a pattern multiply to be expanded.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228347 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 21:13:25 +00:00
Hal Finkel
b8a6712c27 [PowerPC] Prepare loops for pre-increment loads/stores
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX
vector load/stores). Using these on embedded cores can be very important, but
most loops are not naturally set up to use them. We can often change that,
however, by placing loops into a non-canonical form. Generically, this means
transforming loops like this:

  for (int i = 0; i < n; ++i)
    array[i] = c;

to look like this:

  T *p = &array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;

The key point is that the addresses accessed are pulled into dedicated PHIs and
"pre-decremented" in the loop preheader. This allows the use of pre-increment
load/store instructions without loop peeling.

A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is
introduced to perform this transformation. I've used this code out-of-tree for
generating code for the PPC A2 for over a year. Somewhat to my surprise,
running the test suite + externals on a P7 with this transformation enabled
showed no performance regressions, and one speedup:

External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
	-2.32514% +/- 1.03736%

So I'm going to enable it on everything for now. I was surprised by this
because, on the POWER cores, these pre-increment load/store instructions are
cracked (and, thus, harder to schedule effectively). But seeing no regressions,
and feeling that it is generally easier to split instructions apart late than
it is to combine them late, this might be the better approach regardless.

In the future, we might want to integrate this functionality into LSR (but
currently LSR does not create new PHI nodes, so (for that and other reasons)
significant work would need to be done).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228328 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:43:00 +00:00
Hal Finkel
885b67a5c3 [PowerPC] Generate pre-increment floating-point ld/st instructions
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228327 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:42:53 +00:00
Ahmed Bougacha
ec35069525 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.
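
A hedged source-level example of the kind of pattern involved (the
function is hypothetical, and whether the vectorizer forms a vector
sextload at all depends on its cost model):

  // If vectorized, the loads plus sign extensions below become a vector
  // sextload (<4 x i8> widened to <4 x i64>). With this change, an extload
  // on an illegal (splittable) vector type can be split into smaller legal
  // extloads (e.g. ones SSE4.1 selects as pmovsx) instead of forming a node
  // that later has to be scalarized.
  void widen(const signed char *src, long long *dst) {
    for (int i = 0; i < 4; ++i)
      dst[i] = src[i];
  }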

Differential Revision: http://reviews.llvm.org/D6904


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228325 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:31:02 +00:00
Andrew Trick
c4ae8cbc5d X86 ABI fix for return values > 24 bytes.
The return value's address must be returned in %rax.
i.e. the callee needs to copy the sret argument (%rdi)
into the return value (%rax).

This probably won't manifest as a bug when the caller is LLVM-compiled
code. But it is an ABI guarantee and tools expect it.
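
As a hedged illustration of the guarantee (the struct and function are
hypothetical; assumes the SysV x86-64 convention):

  // Too large to be returned in registers, so the caller passes a hidden
  // pointer to the result slot (the sret argument, in %rdi).
  struct Big { long a, b, c, d; };

  Big makeBig() {
    // Besides filling in the result, the callee must copy the incoming sret
    // pointer into %rax before returning; external tools rely on that even
    // though LLVM-compiled callers typically never read %rax here.
    return Big{1, 2, 3, 4};
  }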

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228321 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:09:05 +00:00
Tom Stellard
c7198528eb R600/SI: Fix bug in TTI loop unrolling preferences
We should be setting UnrollingPreferences::MaxCount to MAX_UINT instead
of UnrollingPreferences::Count.

Count is a 'forced unrolling factor', while MaxCount sets an upper
limit to the unrolling factor.

Setting Count to MAX_UINT was causing the loop in the testcase to be
unrolled 15 times, when it only had a maximum of 4 iterations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228303 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:32:18 +00:00
Tom Stellard
041211cd79 R600/SI: Fix bug from insertion of llvm.SI.end.cf into loop headers
The llvm.SI.end.cf intrinsic is used to mark the end of if-then blocks,
if-then-else blocks, and loops.  It is responsible for updating the
exec mask to re-enable threads that had been masked during the preceding
control flow block.  For example:

s_mov_b64 exec, 0x3                 ; Initial exec mask
s_mov_b64 s[0:1], exec              ; Saved exec mask
v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
do_stuff()
s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf

The bug fixed by this patch was one where the llvm.SI.end.cf intrinsic
was being inserted into the header of loops.  This would happen when
an if block terminated in a loop header and we would end up with
code like this:

s_mov_b64 exec, 0x3                 ; Initial exec mask
s_mov_b64 s[0:1], exec              ; Saved exec mask
v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
do_stuff()

LOOP:                       ; Start of loop header
s_or_b64 exec, exec, s[0:1] ; llvm.SI.end.cf <- BUG: the exec mask has the
                            ;   same value at the beginning of each loop
                            ;   iteration.
do_stuff();
s_cbranch_execnz LOOP

The fix is to create a new basic block before the loop and insert the
llvm.SI.end.cf there.  This way the exec mask is restored before the
start of the loop instead of at the beginning of each iteration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228302 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:32:15 +00:00
Bill Schmidt
202b6045bf [PowerPC] Implement the vclz instructions for PWR8
Patch by Kit Barton.

Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes.  This is a fairly straightforward addition
after the changes made for vpopcnt:

 1. Add the correct definitions for the various instructions in
    PPCInstrAltivec.td
 2. Make the CTLZ operation legal on vector types when using P8Altivec
    in PPCISelLowering.cpp 

Test Plan

Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.

Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:24:47 +00:00
Bruno Cardoso Lopes
04715c9915 [X86][MMX] Handle i32->mmx conversion using movd
Implement a BITCAST dag combine to transform i32->mmx conversion patterns
into a X86 specific node (MMX_MOVW2D) and guarantee that moves between
i32 and x86mmx are better handled, i.e., don't use store-load to do the
conversion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228293 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 13:23:07 +00:00
Bruno Cardoso Lopes
d4299719af [X86][MMX] Add several bitcast tests
Avoid regression in previously supported MMX code by adding different
combinations of tests which exercise MMX bitcasts. Small improvements
to these patterns should come next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228292 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 13:22:57 +00:00
Matt Arsenault
81eb6ca158 R600/SI: Fix i64 truncate to i1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228273 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 06:05:13 +00:00
Ahmed Bougacha
a7f2cf45f3 [ARM] Use patterns instead of hardcoded regs in test. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228259 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 01:52:19 +00:00
Ahmed Bougacha
42ec3433ef [ARM] Make testcase more explicit. NFC.
The q8/d16 thing is silly; I'd be happy to hear about a better
way to write those tests where simple substitution isn't enough.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228258 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 01:45:28 +00:00
Tom Stellard
26bfda9dd3 R600/SI: Enable subreg liveness by default
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228228 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 23:14:18 +00:00
Rafael Espindola
e247dd2839 Don't try to make sections in comdats SHF_MERGE.
Parts of llvm were not expecting it and we wouldn't print
the entity size of the section.

Given what comdats are used for, having SHF_MERGE sections would be
just a small improvement, so just disable it for now.

Fixes pr22463.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228196 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 21:27:24 +00:00
Tom Stellard
89c96b1cd0 R600/SI: Expand misaligned 16-bit memory accesses
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228190 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 20:49:52 +00:00
Tom Stellard
fd4c349de2 R600/SI: Make more store operations legal
v2i32, i32, trunc i32 to i16, and trunc i32 to i8 stores are legal for
all address spaces.  We had marked them as custom in order to lower
them for the private address space, but this is no longer necessary.

This enables lowering of misaligned stores of these types in the
DAGLegalizer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228189 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 20:49:51 +00:00
Tom Stellard
056a34916a R600: Don't promote i64 stores to v2i32 during DAG legalization
We take care of this during instruction selection now.  This
fixes a potential infinite loop when lowering misaligned stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228188 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 20:49:49 +00:00
Bill Schmidt
b9fc61d031 Add missing test case from r228046
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228182 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 20:00:04 +00:00
Matthias Braun
a602c10686 MachineCSE: Clear dead-def flag on CSE.
In case CSE reuses a previously unused register, the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.

This fixes PR22439 and the corresponding rdar://19694987

Differential Revision: http://reviews.llvm.org/D7395

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228178 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 19:35:16 +00:00
Michael Kuperstein
8f260e3084 Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228168 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 18:54:01 +00:00
Colin LeMahieu
47d6e4d009 [Hexagon] Adding encoding information for absolute-reg mode stores. Xfailing a test until constant extenders are correctly put in the same packet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228158 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 17:52:06 +00:00
Zoran Jovanovic
8dc0ae6606 [mips][microMIPS] Implement CodeGen support for SW16 and LW16 instructions
Differential Revision: http://reviews.llvm.org/D6581


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228149 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 15:43:17 +00:00
Daniel Sanders
712010f655 [mips] Remove unused check prefix from tests. NFC.
Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7376

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 14:48:39 +00:00
Renato Golin
0966a4e370 Adding support to LLVM for targeting Cortex-A72
Currently, Cortex-A72 is modelled as a Cortex-A57, except that the fp
load balancing pass isn't enabled for Cortex-A72, as it's not
profitable to have it enabled for this core.

Patch by Ranjeet Singh.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228140 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 13:31:29 +00:00
Chandler Carruth
b0589710cc [x86] Give movss and movsd execution domains in the x86 backend.
This associates movss and movsd with the packed single and packed double
execution domains (resp.). While this is largely cosmetic, as we now
don't have weird ping-pong-ing between single and double precision, it
is also useful because it avoids the domain fixing algorithm from seeing
domain breaks that don't actually exist. It will also be much more
important if we have an execution domain default other than packed
single, as that would cause us to mix movss and movsd with integer
vector code on a regular basis, a very bad mixture.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228135 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:58:53 +00:00
Chandler Carruth
886bbe2d76 [x86] Remove a low-value test that was just checking how we cleared
a register. We have lots of tests covering this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228133 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:47:34 +00:00
Chandler Carruth
424a198c30 [x86] Mechanically update a bunch of tests' check lines using the latest
version of the script.

Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
  (yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues

Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228132 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:46:53 +00:00
Renato Golin
ff01f89466 Reverting VLD1/VST1 base-updating/post-incrementing combining
This reverts patches 223862, 224198, 224203, and 224754, which were all
related to the vector load/store combining and were reverted/reapplied
a few times due to the same alignment problems we're seeing now.

Further tests, mainly self-hosting Clang, will be needed to reapply this
patch in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228129 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:11:59 +00:00
Chandler Carruth
d16b9cd3d4 [x86] Include the destination register in the check-lines for AVX
instructions.

No actual change here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228127 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 09:18:27 +00:00
Chandler Carruth
82b686e611 [x86] Add some tests I missed in the prior commit to cover blends with
zero for v8i16 as well.

These exhibit the same domain badness, but also exhibit other weaknesses
in our blend lowering. More fixes to come.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228126 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 09:15:46 +00:00
Chandler Carruth
da681cc578 [x86] Start to introduce bit-masking based blend lowering.
This is the simplest form of bit-math based blending which only fires
when we are blending with zero and is relatively profitable. I've only
enabled this path on very specific lowering strategies. I'm planning to
widen its applicability in subsequent patches, but so far you'll notice
that even though we get fewer shufps instructions, we *still* do the bit
math in the FP execution port. I'm looking into why this is still
happening.
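
A minimal sketch of the idea with SSE2 intrinsics (the lane pattern is
arbitrary): blending a vector with zero is just a bitwise AND against a
per-lane mask, so no shuffle or blend unit is needed.

  #include <emmintrin.h>

  __m128i blend_with_zero(__m128i v) {
    // Keep lanes 1 and 3, force lanes 0 and 2 to zero; a single pand
    // against a constant mask implements the whole blend.
    const __m128i keep = _mm_set_epi32(-1, 0, -1, 0);
    return _mm_and_si128(v, keep);
  }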

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228124 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 09:06:05 +00:00
Chandler Carruth
6b1eacb0b5 [x86] Add tests for blends-with-zero on 4-element vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228122 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 09:05:58 +00:00
Bill Schmidt
89e8a17b4d [PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 05:51:56 +00:00
Chandler Carruth
786f55c1fb [x86] Refresh the checks of a number of tests using
update_llc_test_checks.py.

The exact format of the checks has changed over time. This includes
different indenting rules, new shuffle comments that have been added,
and more operand hiding behind regular expressions.

No functional change to the tests are expected here, but this will make
subsequent patches have a clean diff as they change shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228097 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 00:58:42 +00:00
Chandler Carruth
18ee73e456 [x86] Switch to using the long '--check-prefix' form which the
update_llc_test_checks.py script uses, and refresh the checks in this
test.

No functionality changed here, just bringing this test up to work with
automated updates using the python script.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228096 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 00:58:40 +00:00
Chandler Carruth
877ac0a034 [x86] Port this test to use utils/update_llc_test_checks.py.
This will make it easy to update as I change some parts of the X86
backend, makes it more clear what instruction differences are
introduced, and I find it makes it a bit easier to read as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228095 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 00:58:37 +00:00
Sanjay Patel
f1ac92a3b9 improved CHECK
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228086 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 00:24:06 +00:00
Simon Pilgrim
3d04e48cb6 [X86][SSE] psrl(w/d/q) and psll(w/d/q) bit shifts for SSE2
Patch to match cases where shuffle masks can be reduced to bit shifts. Similar to byte shift shuffle matching from D5699.
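
A hedged sketch using clang vector extensions (the function and lane
pattern are illustrative): a zero-filling shuffle whose net effect is a
per-32-bit-lane right shift by 16 bits can now be matched to psrld.

  typedef short v8i16 __attribute__((vector_size(16)));

  v8i16 shift_like_shuffle(v8i16 v) {
    v8i16 zero = {0, 0, 0, 0, 0, 0, 0, 0};
    // result = <v[1], 0, v[3], 0, v[5], 0, v[7], 0>, which on a little-endian
    // target is the same as shifting each 32-bit lane right by 16 (psrld $16).
    return __builtin_shufflevector(v, zero, 1, 8, 3, 8, 5, 8, 7, 8);
  }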

Differential Revision: http://reviews.llvm.org/D6649

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228047 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:58:29 +00:00
Chandler Carruth
2e3524ec17 [x86] Add two truly horrific test cases for the new vector shuffle
lowering. I'm prepping patches to improve these, and this will let the
delta of those patches show the improvement. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228044 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:56:28 +00:00
Chandler Carruth
d5a61c2958 [x86] Update the indent and layout of some tests in this file. NFC
This is just to remove noise when using the update_llc_test_checks
script.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228043 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:56:24 +00:00
Marek Olsak
90eef42c8e R600/SI: Remove the -CHECK suffix from all FileCheck prefixes in LIT tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228040 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:53:27 +00:00
Marek Olsak
e1a8ca95be R600/SI: Fix B64 VALU shifts on VI
SI only has standard versions. VI only has REV versions.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228037 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:53:01 +00:00
Chandler Carruth
dc5e49a1c4 [x86] Tweak my update script to use test case function names starting
with 'stress' to indicate that the specific output isn't interesting and
relax them to only check the last instruction (a ret).

I've updated the one test case that really uses this to name the one
'stress_test' which was actually producing output we can directly check.
With this, the script doesn't introduce noise when run over the v16 test
file.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228033 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:26:45 +00:00
Colin LeMahieu
3c159ed1a0 [Hexagon] Converting XTYPE/SHIFT intrinsics. Cleaning out old intrinsic patterns and updating tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228026 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 20:40:52 +00:00
Simon Pilgrim
646722d55f [X86][SSE] Added general integer shuffle matching for MOVQ instruction
This patch adds general shuffle pattern matching for the MOVQ zero-extend instruction (copy lower 64bits, zero upper) for all 128-bit integer vectors, it is added as a fallback test in lowerVectorShuffleAsZeroOrAnyExtend.
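
A hedged sketch using clang vector extensions (the function is
hypothetical): copying the low 64 bits and zeroing the upper 64 bits is
exactly the pattern that now maps onto movq.

  typedef int v4i32 __attribute__((vector_size(16)));

  v4i32 low_qword(v4i32 v) {
    v4i32 zero = {0, 0, 0, 0};
    // Elements 0 and 1 come from v; elements 2 and 3 come from the zero
    // vector, so the whole shuffle is a single movq xmm, xmm.
    return __builtin_shufflevector(v, zero, 0, 1, 4, 5);
  }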

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228022 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 20:09:18 +00:00
Colin LeMahieu
861e105e61 [Hexagon] Updating XTYPE/PRED intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228019 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:43:59 +00:00
Colin LeMahieu
30f48c7dc4 [Hexagon] Updating XTYPE/PERM intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228015 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:36:59 +00:00
Simon Pilgrim
71a4e9522e [X86][AVX2] Enabled shuffle matching for the AVX2 zero extension (128bit -> 256bit) vpmovzx* instructions.
Differential Revision: http://reviews.llvm.org/D7251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228014 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:34:09 +00:00
Rafael Espindola
f4e2998eda Fix typo in test/CodeGen/X86/sibcall.ll (pr22331).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228011 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:20:26 +00:00
Colin LeMahieu
6217146dce [Hexagon] Adding missing vector multiply instruction encodings. Converting multiply intrinsics and updating tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228010 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 19:15:11 +00:00
Sanjay Patel
9b4cc76745 Merge consecutive 16-byte loads into one 32-byte load (PR22329)
This patch detects consecutive vector loads using the existing 
EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329

This patch effectively reverts the tablegen additions of D6492 / 
http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack.

The test cases that were added with that patch are simply modified to load
from varying offsets of a base pointer. These loads did not match the existing
tablegen patterns.

A happy side effect of doing this optimization earlier is that we can now fold
the load into a math op where possible; this is shown in some of the updated
checks in the test file.
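
A hedged intrinsics-level sketch of the pattern from PR22329 (the wrapper
function is hypothetical; assumes AVX is enabled):

  #include <immintrin.h>

  // Two adjacent 16-byte loads assembled into a 256-bit value. With this
  // change the combiner can merge them into one 32-byte load (a single
  // vmovups ymm) rather than a 16-byte load plus a vinsertf128 from memory.
  __m256 load32(const float *p) {
    __m128 lo = _mm_loadu_ps(p);      // elements 0..3
    __m128 hi = _mm_loadu_ps(p + 4);  // elements 4..7, contiguous with lo
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
  }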

Differential Revision: http://reviews.llvm.org/D7303



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228006 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 18:54:00 +00:00
Colin LeMahieu
936986d12d [Hexagon] Converting complex number intrinsics and adding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227995 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 18:16:28 +00:00
Colin LeMahieu
a3a588d983 [Hexagon] Adding vector intrinsics for alu32/alu and xtype/alu.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227993 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 18:01:45 +00:00
Marek Olsak
a95296a86e R600/SI: Don't generate non-existent LSHL, LSHR, ASHR B32 variants on VI
This can happen when a REV instruction is commuted.

The trick is not to define the _vi versions of instructions, which has these
consequences:
- code generation will always fail if a pseudo cannot be lowered
  (very useful to catch bugs where an unsupported instruction somehow makes
   it to the printer)
- ability to query if a pseudo can be lowered, which is done in commuteOpcode
  to prevent REV from commuting to non-REV on VI

Tested-by: Michel Dänzer <michel.daenzer@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227990 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 17:38:12 +00:00
Marek Olsak
b19dbd9eb3 R600/SI: Fix dependency between instruction writing M0 and S_SENDMSG on VI (v2)
This fixes a hang when using an empty geometry shader.

v2: - don't add s_nop when followed by s_waitcnt
    - cosmetic changes

Tested-by: Michel Dänzer <michel.daenzer@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227986 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 17:37:52 +00:00
Sanjay Patel
3cf9267d4e Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371).
r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit.
The commit log says that change did not affect anything, but that's not correct.
That change allowed SSE instructions to have unaligned mem operands folded into
math ops, and that's not allowed in the default specification for any SSE variant. 

The bug is exposed when compiling for an AVX-capable CPU that had this feature
flag but without enabling AVX codegen. Another mistake in r224330 was not adding
the feature flag to all AVX CPUs; the AMD chips were excluded.

This is part of the fix for PR22371 ( http://llvm.org/bugs/show_bug.cgi?id=22371 ).

This feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem".
Changed the existing test case for the feature bit to reflect the new name and
renamed the test file itself to better reflect the feature.
Added runs to fold-vex.ll to check for the failing codegen.

Note that the feature bit is not set by default on any CPU because it may require a
configuration register setting to enable the enhanced unaligned behavior.
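
A hedged illustration with SSE intrinsics (the function is hypothetical;
assumes a pre-AVX target) of why the folding is illegal by default:

  #include <xmmintrin.h>

  __m128 accumulate(const float *p, __m128 acc) {
    __m128 v = _mm_loadu_ps(p);   // explicit unaligned load (movups)
    // Folding this load into the add would produce addps with a memory
    // operand, which faults unless the address is 16-byte aligned, so the
    // two instructions must stay separate unless FeatureSSEUnalignedMem is
    // explicitly enabled for the target CPU.
    return _mm_add_ps(acc, v);
  }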



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227983 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 17:13:04 +00:00
Bill Schmidt
aeba87d6a6 Disable 32-bit tests in tls-pic.ll until they can be repaired
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227981 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:57:38 +00:00
Bill Schmidt
b32d6f455f Further revise too-restrictive test CodeGen/PowerPC/tls-pic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227980 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:33:55 +00:00
Bill Schmidt
f336df5f3f Further revise too-restrictive test CodeGen/PowerPC/tls-pic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227978 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:29:52 +00:00
Bill Schmidt
5114e12df0 Revise too-restrictive test CodeGen/PowerPC/tls-pic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227977 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:24:05 +00:00
Bill Schmidt
1123a81009 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __tls_get_addr has taken
place.
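
A hedged source-level example of the CSE opportunity described above (the
variable and function are hypothetical; assumes a general- or
local-dynamic TLS model under -fPIC):

  thread_local int counter;

  int bump(int delta) {
    counter += delta;   // first address computation via __tls_get_addr
    return counter;     // second access; with this change MachineCSE can
                        // common the two __tls_get_addr sequences into one
  }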


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227976 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:16:01 +00:00
Sanjay Patel
ec60318bf5 Improve test to actually check for a folded load.
This test was checking for lack of a "movaps" (an aligned load)
rather than a "movups" (an unaligned load). It also included
a store which complicated the checking.

Add specific CPU runs to prevent subtarget feature flag overrides
from inhibiting this optimization.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227972 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 15:37:18 +00:00
Bruno Cardoso Lopes
7df357f552 [X86][MMX] Improve transfer from mmx to i32
Improve EXTRACT_VECTOR_ELT DAG combine to catch conversion patterns
between x86mmx and i32 with more layers of indirection.

Before:
  movq2dq %mm0, %xmm0
  movd %xmm0, %eax
After:
  movd %mm0, %eax

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227969 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 14:46:49 +00:00
Alex Rosenberg
cba5c599e8 Revert part of r227437 as it was unnecessary. Thanks to echristo for
pointing this out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227897 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 23:58:54 +00:00
Bruno Cardoso Lopes
d821e0a5cc [X86][MMX] Add tests for MMX extract element
LLVM ToT produces poor MMX code compared to 3.5. However, part of the previous
functionality can be achieved by using -x86-experimental-vector-widening-legalization.
Add tests to be sure we don't regress again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227869 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 22:00:48 +00:00
Bruno Cardoso Lopes
12c944ba10 [X86][MMX] Cleanup shuffle, bitcast and insert element tests
- Merge MMX arg passing test files
- Merge MMX bitcast, insert elt and shuffle tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227867 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 21:56:11 +00:00
Tom Stellard
d73d1062fe R600/SI: 64-bit and larger memory access must be at least 4-byte aligned
This is true for SI only. CI+ supports unaligned memory accesses,
but this requires driver support, so for now we disallow unaligned
accesses for all GCN targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227822 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 18:02:28 +00:00
Tom Stellard
80e70ee18e R600/SI: Merge two test files
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227821 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 18:02:23 +00:00
Ahmed Bougacha
0f1a21bcb8 [AArch64] Prefer DUP/MOV ("CPY") to INS for vector_extract.
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.

Differential Revision: http://reviews.llvm.org/D7307


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227820 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 17:55:57 +00:00
Sanjay Patel
f766946abd fix typo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227815 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 17:47:30 +00:00
Jan Wen Voung
1a63641597 Fix ARM peephole optimizeCompare to avoid optimizing unsigned cmp to 0.
Summary:
Previously it only avoided optimizing signed comparisons to 0.
Sometimes the DAGCombiner will optimize the unsigned comparisons
to 0 before it gets to the peephole pass, but sometimes it doesn't.

Fix for PR22373.

Test Plan: test/CodeGen/ARM/sub-cmp-peephole.ll

Reviewers: jfb, manmanren

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D7274

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227809 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-02 16:56:50 +00:00
Hal Finkel
3bafb64914 [PowerPC] VSX stores don't also read
The VSX store instructions were also picking up an implicit "may read" from the
default pattern, which was an intrinsic (and we don't currently have a way of
specifying write-only intrinsics).

This was causing MI verification to fail for VSX spill restores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227759 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 19:07:41 +00:00
Hal Finkel
ec716cecda [PowerPC] Better scheduling for isel on P7/P8
isel is actually a cracked instruction on the P7/P8, and must start a dispatch
group. The scheduling model should reflect this so that we don't bunch too many
of them together when possible.

Thanks to Bill Schmidt and Pat Haugen for helping to sort this out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227758 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 17:52:16 +00:00
Michael Kuperstein
acd5f13c88 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

(Re-commit of r227728)
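
A hedged sketch of the effect at a call site (assumes 32-bit x86 at -Os;
the callee is hypothetical):

  void callee(int, int, int);

  void caller() {
    // Before: argument stores relative to %esp, roughly
    //   movl $3, 8(%esp); movl $2, 4(%esp); movl $1, (%esp); calll callee
    // After this pass (optsize/minsize, stack alignment <= 8):
    //   pushl $3; pushl $2; pushl $1; calll callee
    callee(1, 2, 3);
  }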

Differential Revision: http://reviews.llvm.org/D6789


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227752 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 16:56:04 +00:00
Michael Kuperstein
5b61b8f53c Revert r227728 due to bad line endings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227746 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 16:15:07 +00:00
Hal Finkel
8f5c829c1e [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions
The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).

The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 15:03:28 +00:00
Michael Kuperstein
59d9986259 [X86] Convert esp-relative movs of function arguments to pushes, step 2
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a 
reserved call frame), and perform rudimentary call folding. It still doesn't 
have a heuristic, so it is enabled only for optsize/minsize, with stack 
alignment <= 8, where it ought to be a fairly clear win.

Differential Revision: http://reviews.llvm.org/D6789

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227728 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 11:44:44 +00:00
Elena Demikhovsky
516052acd3 AVX2: Added 2 more tests for gather intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227718 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 08:52:15 +00:00
Jingyue Wu
f15b696b79 [NVPTX] Emit .pragma "nounroll" for loops marked with nounroll
Summary:
The CUDA driver can unroll loops when jit-compiling PTX. To ensure that a
loop marked with llvm.loop.unroll.disable is not unrolled by the CUDA
driver, we need to emit .pragma "nounroll" at the header of that loop.

This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.
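
A minimal source-level sketch (the function is hypothetical; clang's loop
pragma is one way to attach llvm.loop.unroll.disable):

  // When compiled for NVPTX, the unroll-disable metadata attached by the
  // pragma below should now also surface as .pragma "nounroll" at the loop
  // header in the emitted PTX, so the CUDA driver's JIT leaves it alone.
  void scale(float *a, int n, float s) {
  #pragma clang loop unroll(disable)
    for (int i = 0; i < n; ++i)
      a[i] *= s;
  }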

Test Plan: test/CodeGen/NVPTX/nounroll.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7041

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227703 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 02:27:45 +00:00
Matt Arsenault
9061eb6d2e R600/SI: Only select cvt_flr/cvt_rpi with no NaNs.
These have different behavior from cvt_i32_f32 on NaN.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227693 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 21:28:13 +00:00
Simon Pilgrim
982005c23e [X86][SSE] Shuffle mask decode support for zero extend, scalar float/double moves and integer load instructions
This patch adds shuffle mask decodes for integer zero extends (pmovzx** and movq xmm,xmm) and scalar float/double loads/moves (movss/movsd).

Also adds shuffle mask decodes for integer loads (movd/movq).

Differential Revision: http://reviews.llvm.org/D7228

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227688 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 14:09:36 +00:00
Saleem Abdulrasool
81674807cc ARM: support stack probe size on Windows on ARM
Now that -mstack-probe-size is piped through to the backend via the function
attribute as on Windows x86, honour the value to permit handling of non-default
values for stack probes.  This is needed /Gs with the clang-cl driver or
-mstack-probe-size with the clang driver when targeting Windows on ARM.
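
A hedged sketch (the frame size and build line are illustrative) of what
honouring the attribute means:

  // Built with something like:
  //   clang --target=armv7-windows -mstack-probe-size=8192 ...
  // Frames larger than the probe size must call the stack probe routine
  // before touching the new pages; smaller frames can skip the probe.
  void big_local(void) {
    volatile char buf[16384];   // exceeds the 8 KiB probe size above
    buf[0] = 0;
  }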

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227667 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 02:26:37 +00:00
Ahmed Bougacha
cfb61b368c [AArch64] Add a few more DUP testcases. NFC.
Also, don't lie about testing index 0.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227642 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 23:41:15 +00:00
Ahmed Bougacha
f2045fb1f2 [AArch64] Robustize neon-scalar-copy.ll tests. NFC.
Some of those didn't even have run lines: they were removed
inadvertently during the Great Merge of 2014.

They used to check for DUPs, but now we go through W-regs?
Filed PR22418 for that potential regression.

For now, just make the tests explicit, so we know where we stand.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227635 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 23:13:57 +00:00
Ahmed Bougacha
ee93f014cc [X86] Cleanup tabs in test vector-zext.ll. NFC.
Some tests have tabs, some don't.
In vector-[sz]ext.ll, space wins (well duh!).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227615 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 21:41:28 +00:00
Reid Kleckner
e359929517 Win64: Put a REX_W prefix on all TAILJMP* instructions
MSDN's x64 software conventions page says that this is one of the fixed
list of legal epilogues:
https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx

Presumably this is how the unwinder distinguishes epilogue jumps from
in-function control flow.

Also normalize the way we place "## TAILCALL" comments on such jumps.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227611 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 21:03:31 +00:00
Hao Liu
7d3a44a692 [AArch64] Fix PR21675, a bug about lowering llvm.ctpop.i32. We should not use "DAG.getUNDEF(MVT::v8i8)" to get an all-zero vector.
Patch by Wei-cheng Wang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227550 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 02:13:53 +00:00
Reid Kleckner
c9fbc97e95 x86: Fix large model calls to __chkstk for dynamic allocas
In the large code model, we now put __chkstk in %r11 before calling it.

Refactor the code so that we only do this once. Simplify things by using
__chkstk_ms instead of __chkstk on cygming. We already use that symbol
in the prolog emission, and it simplifies our logic.

Second half of PR18582.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227519 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 23:58:04 +00:00
Reid Kleckner
cb867e4ac4 Update comments to use unreachable instead of llvm.trap, as implemented now
win64: Call __chkstk through a register with the large code model

Fixes half of PR18582. True dynamic allocas will still have a
CALL64pcrel32 which will fail.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227503 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 22:33:00 +00:00
Matt Arsenault
fa711758df R600/SI: Implement enableAggressiveFMAFusion
Add tests for the various combines. This should
always be at least cycle neutral on all subtargets for f64,
and faster on some. For f32 we should prefer selecting
v_mad_f32 over v_fma_f32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227484 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 19:34:32 +00:00
Colin LeMahieu
dec5091220 [Hexagon] Deleting old variants of intrinsics and adding missing tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227474 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 17:26:56 +00:00
Colin LeMahieu
f1b4917f1b [Hexagon] Adding CR intrinsic tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227463 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 16:55:37 +00:00
Tom Stellard
51a3c27d6e R600/SI: Define a schedule model and enable the generic machine scheduler
The schedule model is not complete yet, and could be improved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227461 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 16:55:25 +00:00
Robert Lougher
1031549bec [X86] Use single add/sub for large stack offsets
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue.  This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/subtracted with a single operation.

Differential Revision: http://reviews.llvm.org/D7226


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227458 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 16:18:29 +00:00
Colin LeMahieu
c7260e2ffa [Hexagon] Adding XTYPE/PRED intrinsic tests. Converting predicate types to i32 instead of i1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227457 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 16:08:43 +00:00
Bill Schmidt
a5ea0b50a4 [PowerPC] Complete setting the baseline for ppc64le
Patch by Nemanja Ivanovic.

As was uncovered by the failing test case (when run on non-PPC
platforms), the feature set when compiling with -march=ppc64le was not
being picked up. This change ensures that if the -mcpu option is not
specified, the correct feature set is picked up regardless of whether
we are on PPC or not.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227455 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 15:59:09 +00:00
Alex Rosenberg
1c31f0df49 Make the test actually test what it's supposed to test. Add a test for the from-memory variant of vcvtph2ps for 256-bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227446 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 15:19:54 +00:00
Alex Rosenberg
27f7a0622c Cleanup a few tests on sse4a machines and FileCheckize along the way.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227437 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 13:31:32 +00:00
Rafael Espindola
6cc1119408 Don't create multiple mergeable sections with -fdata-sections.
ELF has support for sections that can be split into fixed size or
null terminated entities.

Since these sections can be split by the linker, it is not necessary
to split them in codegen.

This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.

The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.

The flip side is the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.

The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227431 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 12:43:28 +00:00
Charlie Turner
1a8618cbbf Add a missing Tag_DIV_use test for Cortex-M7.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227429 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 11:19:54 +00:00
Reid Kleckner
f77571aeac Add a Windows EH preparation pass that zaps resumes
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.

Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors.  Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception.  Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7216

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227405 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 00:41:44 +00:00
Colin LeMahieu
24373e35a4 [Hexagon] Updating several V5 intrinsics and adding FP tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227379 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 22:08:16 +00:00
Colin LeMahieu
a5062b38a9 [Hexagon] Adding XTYPE/MPY intrinsic tests and some missing multiply instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227347 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 19:16:17 +00:00
Colin LeMahieu
4925c39604 [Hexagon] Deleting a lot of old variants of intrinsics and updating references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227338 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 18:29:11 +00:00
Colin LeMahieu
1135b0df02 [Hexagon] Converting XTYPE/BIT intrinsic patterns and adding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227335 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 18:06:23 +00:00
Colin LeMahieu
a5070baf7e [Hexagon] Replacing XTYPE/SHIFT intrinsic patterns. Adding tests and missing instructions with tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227330 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 17:37:59 +00:00
Colin LeMahieu
0f5f7c0c1b [Hexagon] Replacing old intrinsic tests with organized versions that match the reference manual.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227321 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 16:58:05 +00:00
Tom Stellard
ff340f98e3 R600: Move DataLayout to AMDGPUTargetMachine
This is a follow up to r227113.

It is now required to use the amdgcn target for SI and newer GPUs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227316 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 16:04:26 +00:00
Michael Kuperstein
0906c8fc1c [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.

Differential Revision: http://reviews.llvm.org/D7196

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227308 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 14:08:22 +00:00
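As a hypothetical illustration of the commit above (not from its tests): 40 = 5 * 2^3, so the multiply below can be lowered as an lea computing x + 4*x followed by a shl by 3 instead of a 32-bit imul.

  unsigned times40(unsigned x) {
    return x * 40u;   /* 5 * 2^3: expect lea + shl rather than imull */
  }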
Michael Kuperstein
e5b95695ea [x32] Enable sibcall optimization on x32.
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227307 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 13:38:48 +00:00
Elena Demikhovsky
b9d3801cd2 AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND, etc.
These prevent merging between nodes with rounding and other standard nodes.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227303 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-28 10:21:27 +00:00
Quentin Colombet
24508d33fb Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates an infinite loop in DAG combine in the LLVM test-suite
for aarch64 with -mcpu=cyclone (just having NEON may be enough to expose this).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227272 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 23:58:01 +00:00
Alexey Samsonov
00b7a940e7 Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector"
This reverts commits r226953 and r226974.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227248 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 21:34:11 +00:00
Sanjay Patel
c94f9d3d2f Merge vector stores into wider vector stores (PR21711)
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector. 

Instead of producing this:

vmovaps %xmm0, (%rdi)
vextractf128    $1, %ymm0, 16(%rdi)

This patch merges the 128-bit stores into a single 256-bit store:

vmovups %ymm0, (%rdi)

Differential Revision: http://reviews.llvm.org/D7208



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227242 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 20:50:27 +00:00
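A hedged C sketch of the 'f3' shape described in the commit above, written with AVX intrinsics (the function name is made up; compile with -mavx): two 128-bit stores taken from one 256-bit source vector, which the combine can merge into a single 256-bit store.

  #include <immintrin.h>

  void store_halves(__m256 v, float *p) {
    _mm_storeu_ps(p,     _mm256_castps256_ps128(v));    /* low 128 bits  */
    _mm_storeu_ps(p + 4, _mm256_extractf128_ps(v, 1));  /* high 128 bits */
  }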
Ramkumar Ramachandra
e67f5de1f7 overloaded-intrinsic-name: exercise anyptr on struct
No other test I know shows how struct names are mangled in overloaded
intrinsic functions.

Differential Revision: http://reviews.llvm.org/D7037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227229 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 20:03:08 +00:00
Marek Olsak
fd55bcd060 R600/SI: Enable all tests that pass on VI without changes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227214 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 17:27:15 +00:00
Simon Pilgrim
44513da617 [X86][SSE] Float comparisons can sometimes be safely commuted
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.

Differential Revision: http://reviews.llvm.org/D7178

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:29:24 +00:00
Simon Pilgrim
3ba85ab23a [X86][PCLMUL] Enable commutation for PCLMUL instructions
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.

Differential Revision: http://reviews.llvm.org/D7180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227141 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:00:18 +00:00
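A hedged C illustration of the commutation described in the commit above (hypothetical function names; compile with -mpclmul). Swapping the two sources while exchanging the low/high selector bits of the immediate (0x01 <-> 0x10) selects the same pair of 64-bit halves, and carry-less multiplication is commutative, so both functions compute the same product.

  #include <wmmintrin.h>

  __m128i clmul_mixed(__m128i a, __m128i b) {
    return _mm_clmulepi64_si128(a, b, 0x01);
  }

  __m128i clmul_mixed_commuted(__m128i a, __m128i b) {
    return _mm_clmulepi64_si128(b, a, 0x10);  /* same result as clmul_mixed */
  }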
Simon Pilgrim
38c35f3e2c Line endings fix. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227138 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:28:32 +00:00
Matt Arsenault
b33118d503 R600: Clean up the 'or' test
Fix broken check lines, use multiple check prefixes,
add an additional test for i1 or.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227137 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:16:10 +00:00
Simon Pilgrim
1d34ec14e9 Line endings fix. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227136 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:15:42 +00:00
Bruno Cardoso Lopes
0f3d4650f7 [x86][MMX] Rename and cleanup tests: arith, intrinsics and shuffle
- Rename mmx-builtins to mmx-intrinsics to match other intrinsic test naming.
- Remove tests that duplicate functionality from mmx-intrinsics.ll.
- Move arith related tests to mmx-arith.ll.
- MMX related shuffle goes to vector-shuffle-mmx.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227130 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 20:06:51 +00:00
Justin Holewinski
33eac3ee53 [NVPTX] Generate a more optimal sequence for select of i1
Instead of creating a pattern like "(p && a) || ((!p) && b)",
just expand the i8 operands to i32 and perform the selp on them.

Fixes PR22246

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227123 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:52:20 +00:00
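A hypothetical C sketch of the kind of boolean select the commit above targets: the old lowering built the (p && a) || (!p && b) logic form, while the new one widens the i8 operands to 32 bits and emits a single selp.

  unsigned char select_bool(int p, unsigned char a, unsigned char b) {
    return p ? a : b;   /* previously lowered roughly as (p && a) || (!p && b) */
  }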
Justin Holewinski
48e872230d [NVPTX] Handle floating-point conversion patterns that are not explicitly ordered or unordered
Fixes PR22322

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:11:20 +00:00
Alex Rosenberg
8da9a6686a Use a different encoding for debugtrap on PS4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:09:27 +00:00
Sanjay Patel
956d6f0cf5 Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221)
This patch fixes the following miscompile:

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
  %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %a0 = extractelement <2 x double> %0, i32 0
  %conv = fptrunc double %a0 to float
  %a1 = extractelement <2 x double> %0, i32 1
  %conv3 = fptrunc double %a1 to float
  tail call void @callee2(float %conv, float %conv3) nounwind
  ret void
}

Current codegen:

sqrtsd	%xmm0, %xmm1        ## high element of %xmm1 is undef here
xorps	%xmm0, %xmm0
cvtsd2ss	%xmm1, %xmm0
shufpd	$1, %xmm1, %xmm1
cvtsd2ss	%xmm1, %xmm1 ## operating on undef value
jmp	_callee

This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) 
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).

All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); 
this should be the final patch needed to resolve that bug.

Differential Revision: http://reviews.llvm.org/D6885



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227111 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 18:42:16 +00:00
Eric Christopher
fcd3c4065d Move the Mips target to storing the ABI in the TargetMachine rather
than on MipsSubtargetInfo.

This required a bit of massaging in the MC level to handle this since
MC is a) largely a collection of disparate classes with no hierarchy,
and b) there's no overarching equivalent to the TargetMachine, instead
only the subtarget via MCSubtargetInfo (which is the base class of
TargetSubtargetInfo).

We're now storing the ABI in both the TargetMachine level and in the
MC level because the AsmParser and the TargetStreamer both need to
know what ABI we have to parse assembly and emit objects. The target
streamer has a pointer to the one in the asm parser and is updated
when the asm parser is created. This is fragile as the FIXME comment
notes, but shouldn't be a problem in practice since we always
create an asm parser before attempting to emit object code via the
assembler. The TargetMachine now contains the ABI so that the DataLayout
can be constructed dependent upon ABI.

All testcases have been updated to use the -target-abi command line
flag so that we can set the ABI without using a subtarget feature.

Should be no change visible externally here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227102 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 17:33:46 +00:00
Sanjay Patel
dbc6dda771 fix line-endings; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227095 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 17:21:36 +00:00
Vasileios Kalintiris
536bce219d [mips] Enable arithmetic and binary operations for the i128 data type.
Summary:
This patch adds support for some operations that were missing from
128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these
changes we can support the __int128_t and __uint128_t data types
from C/C++.

Depends on D7125

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7143

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227089 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 12:33:22 +00:00
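A minimal, hypothetical C example of the source-level types the commit above enables on mips64 (the function names are made up):

  __int128_t  add128(__int128_t a, __int128_t b)   { return a + b; }
  __uint128_t mul128(__uint128_t a, __uint128_t b) { return a * b; }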
Vasileios Kalintiris
71ec66e7fd [mips] Add tests for bitwise binary and integer arithmetic operators.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7125

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227087 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 12:04:40 +00:00
Vasileios Kalintiris
823e8548a0 Revert "[mips] Fix assertion on i128 addition/subtraction on MIPS64"
This reverts commit r227003. Support for addition/subtraction and
various other operations for the i128 data type will be added in a
future commit based on the review D7143.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227082 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 09:53:30 +00:00
Craig Topper
1656cc8ddb [X86] Change comparison immediate type to i8 in test cases for AVX512 floating point comparisons. The type was already changed in the definitions and was being auto-upgraded to the new type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227064 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 23:26:12 +00:00
Craig Topper
fd176682b9 [X86] Use i8 immediate for comparison type on AVX512 packed integer instructions. This matches floating point equivalents. Includes autoupgrade support to convert old code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227063 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 23:26:02 +00:00
Bill Schmidt
9dbb6a4f63 [PowerPC] Revert ppc64le-aggregates.ll test changes from r227053
It appears we have different behavior with and without -mcpu=pwr8 even
with ppc64le defaulting to POWER8.  The failure appears as follows:

/home/bb/cmake-llvm-x86_64-linux/llvm-project/llvm/test/CodeGen/PowerPC/ppc64le-aggregates.ll:268:14: error: expected string not found in input
; CHECK-DAG: lfs 1, 0([[REG]])
             ^
<stdin>:497:11: note: scanning from here
 ld 3, .LC1@toc@l(3)
          ^
<stdin>:497:11: note: with variable "REG" equal to "3"
 ld 3, .LC1@toc@l(3)
          ^
<stdin>:514:2: note: possible intended match here
 lfs 1, 0(4)
 ^

Reverting this particular test case change.  Nemanja, please have a look
at the reason for the failure.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227055 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 18:18:54 +00:00
Bill Schmidt
c536eed4d8 [PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8
Test by Nemanja Ivanovic.

Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.

Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.

In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227053 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 18:05:42 +00:00
Elena Demikhovsky
717d41d8c3 AVX-512: Changes in operations on masks registers for KNL and SKX
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store sizes of v8i1, v4i1 and v2i1 are changed to 8 bits



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227043 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 12:47:15 +00:00
Alexei Starovoitov
fb490166f4 bpf: add missing lit.local.cfg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227009 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 18:20:52 +00:00
Alexei Starovoitov
4fe85c7548 BPF backend
Summary:
V8->V9:
- cleanup tests

V7->V8:
- addressed feedback from David:
- switched to range-based 'for' loops
- fixed formatting of tests

V6->V7:
- rebased and adjusted AsmPrinter args
- CamelCased .td, fixed formatting, cleaned up names, removed unused patterns
- diffstat: 3 files changed, 203 insertions(+), 227 deletions(-)

V5->V6:
- addressed feedback from Chandler:
- reinstated full verbose standard banner in all files
- fixed variables that were not in CamelCase
- fixed names of #ifdef in header files
- removed redundant braces in if/else chains with single statements
- fixed comments
- removed trailing empty line
- dropped debug annotations from tests
- diffstat of these changes:
  46 files changed, 456 insertions(+), 469 deletions(-)

V4->V5:
- fix setLoadExtAction() interface
- clang-formated all where it made sense

V3->V4:
- added CODE_OWNERS entry for BPF backend

V2->V3:
- fix metadata in tests

V1->V2:
- addressed feedback from Tom and Matt
- removed top level change to configure (now everything via 'experimental-backend')
- reworked error reporting via DiagnosticInfo (similar to R600)
- added few more tests
- added cmake build
- added Triple::bpf
- tested on linux and darwin

V1 cover letter:
---------------------
Linux recently gained a "universal in-kernel virtual machine" which is called
eBPF or extended BPF. The name comes from "Berkeley Packet Filter", since the
new instruction set is based on it.
This patch adds a new backend that emits extended BPF instruction set.

The concept and development are covered by the following articles:
http://lwn.net/Articles/599755/
http://lwn.net/Articles/575531/
http://lwn.net/Articles/603983/
http://lwn.net/Articles/606089/
http://lwn.net/Articles/612878/

One of use cases: dtrace/systemtap alternative.

bpf syscall manpage:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe

instruction set description and differences vs classic BPF:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt

Short summary of instruction set:
- 64-bit registers
  R0      - return value from in-kernel function, and exit value for BPF program
  R1 - R5 - arguments from BPF program to in-kernel function
  R6 - R9 - callee saved registers that in-kernel function will preserve
  R10     - read-only frame pointer to access stack
- two-operand instructions like +, -, *, mov, load/store
- implicit prologue/epilogue (invisible stack pointer)
- no floating point, no simd

Short history of extended BPF in kernel:
interpreter in 3.15, x64 JIT in 3.16, arm64 JIT, verifier, bpf syscall in 3.18, more to come in the future.

It's a very small and simple backend.
There is no support for global variables, arbitrary function calls, floating point, varargs,
exceptions, indirect jumps, arbitrary pointer arithmetic, alloca, etc.
From a C front-end point of view it's very restricted. This is done on purpose, since the kernel
rejects all programs that it cannot prove safe. It rejects programs with loops
and with memory accesses via arbitrary pointers. When the kernel accepts a program, it is
guaranteed that the program will terminate and will not crash the kernel.

This patch implements all the 'must have' bits. There are several things on the TODO list,
so this is not the end of development.
Most of the code is boilerplate, copy-pasted from other backends.
The only odd things are the lack of < and <= instructions, the specialized load_byte intrinsics
and 'compare and goto' as a single instruction.
The current instruction set is fixed, but more instructions can be added in the future.

Signed-off-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>

Subscribers: majnemer, chandlerc, echristo, joerg, pete, rengolin, kristof.beyls, arsenm, t.p.northover, tstellarAMD, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D6494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227008 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 17:51:26 +00:00
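A hypothetical sketch (not from the patch) of the restricted style of C such a backend can compile: straight-line code with no loops, no floating point, and no external calls.

  int classify_packet(unsigned long long pkt_len) {
    if (pkt_len < 64)
      return 0;      /* drop undersized packets      */
    if (pkt_len > 1500)
      return 2;      /* mark oversized packets       */
    return 1;        /* accept everything in between */
  }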
Daniel Sanders
f945a40f9d [mips] Fix assertion on i128 addition/subtraction on MIPS64
Summary:
In addition to the included tests, this fixes
test/CodeGen/Generic/i128-addsub.ll on a mips64 host.

Reviewers: atanasyan, sagar, vmedic

Reviewed By: vmedic

Subscribers: sdkie, llvm-commits

Differential Revision: http://reviews.llvm.org/D6610

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227003 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 12:58:10 +00:00
Andrea Di Biagio
b981dc4a21 [DAG] Fix wrong canonicalization performed on shuffle nodes.
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.

Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227002 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 11:54:29 +00:00
Quentin Colombet
af1cd03764 [AArch64][LoadStoreOptimizer] Form LDPSW when possible.
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.

<rdar://problem/19583480>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226978 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 01:25:54 +00:00
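A hypothetical C example of the pattern the commit above targets: two adjacent sign-extending 32-bit loads that the load/store optimizer can now pair into a single ldpsw.

  long long sum_pair(const int *p) {
    return (long long)p[0] + (long long)p[1];   /* candidates for ldpsw */
  }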
Tom Stellard
5b37a2e5ff R600/SI: Emit .hsa.version section for amdhsa OS
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226970 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 23:59:08 +00:00
Reid Kleckner
339591e0a9 Fix assertion when C++ EH filters are present in functions using SEH
Should fix PR22305.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226969 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 23:51:25 +00:00
Bruno Cardoso Lopes
807360ab08 [x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector
Handle the poor codegen for i64/x86mmx->v2i64 (%mm -> %xmm) moves. Instead of
using stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the current behavior prior to
improvements for vector legalization of extloads in r213897.

This commit fixes the regression and as a side-effect also remove some
unnecessary shuffles.

In the new attached testcase, we go from:

pshufw  $-18, (%rdi), %mm0
movq    %mm0, -8(%rsp)
movq    -8(%rsp), %xmm0
pshufd  $-44, %xmm0, %xmm0
movd    %xmm0, %eax
...

To:

pshufw  $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd    %xmm0, %eax
...

Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226953 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 22:44:16 +00:00
Tom Stellard
511a3c71fc R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.

It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226945 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 22:05:45 +00:00
Reid Kleckner
26ba4c13a7 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226920 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 18:49:01 +00:00
Jyoti Allur
245caec9b3 This patch fixes issue with lowering below mentioned pattern :-
_foo:
        smull	 r0, r1, r1, r0
	smull	 r2, r3, r3, r2
	adds	r0, r2, r0
	adc	r1, r3, r1
	bx	lr

to

_foo:
        smull	 r0, r1, r1, r0
	smlal	 r0, r1, r3, r2
	bx	lr


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226904 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 09:10:03 +00:00
Craig Topper
d05a6aa4e6 [x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226902 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 08:00:59 +00:00
Jan Vesely
1d07592ec7 R600: Try to use lower types for 64bit division if possible
v2: add and enable tests for SI

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226881 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 23:42:43 +00:00
Simon Pilgrim
316b43f7df [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.

Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226873 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:39:59 +00:00
Simon Pilgrim
6377361399 Line endings fixes. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226872 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:27:37 +00:00
Simon Pilgrim
c7d6e9b0f9 [X86][SSE] Simplified PSUBUS tests
Removed loops from PSUBUS tests - ensures folding is tested. Also renamed the SSE2 tests to SSSE3 to match the cpu.

This is a follow up commit agreed in http://reviews.llvm.org/D7094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226871 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:19:58 +00:00
Ramkumar Ramachandra
230796b278 Intrinsics: introduce llvm_any_ty aka ValueType Any
Specifically, gc.result benefits from this greatly. Instead of:

gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...

We now have a gc.result.* that can specialize to literally any type.

Differential Revision: http://reviews.llvm.org/D7020

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226857 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 20:14:38 +00:00
Sanjay Patel
05d5e213c4 merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.

Differential Revision: http://reviews.llvm.org/D6850
 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226845 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 18:21:26 +00:00
Michael Kuperstein
a52ddfa930 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226816 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 13:07:28 +00:00
Michael Kuperstein
8fc1c3a619 Revert r226811, MSVC accepts code sane compilers don't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226814 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:48:07 +00:00
Michael Kuperstein
0a979a09ae [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226811 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:37:23 +00:00
Elena Demikhovsky
2785766bc8 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when, after vectorization, we have the type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations; they will be submitted separately.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:07:59 +00:00
Elena Demikhovsky
cdce03426d Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but the store operation can't be narrowed to this type,
since the size of the VT (1 bit) is not equal to its actual store size (8 bits).

Added a test provided by David (dag@cray.com)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226805 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 09:39:08 +00:00
Reid Kleckner
73f671a60e SEH: Finish writing the catch-all test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226768 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 02:31:09 +00:00
Reid Kleckner
0d056fd4c3 Win64 SEH: Emit the constant 1 for catch-all into xdata
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226767 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 02:27:44 +00:00
Simon Pilgrim
3f6acdd265 [X86][SSE] Missing SSE/AVX1 memory folding integer instructions
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.

The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.

Differential Revision: http://reviews.llvm.org/D7094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:43:30 +00:00
Tim Northover
f5f8a3e6a6 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226740 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:17:19 +00:00
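A hypothetical C-level illustration of the fold in the commit above: (x & M) | (x & N) folds to x & (M | N).

  unsigned middle_bytes(unsigned x) {
    return (x & 0x00ff0000u) | (x & 0x0000ff00u);   /* == x & 0x00ffff00 */
  }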
Matt Arsenault
85661f76e3 R600: Add checks for urem/srem by a constant
Make sure this uses the faster expansion using magic constants
to avoid the full division path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226734 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:56:15 +00:00
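A hypothetical C example of the remainder-by-constant cases the new checks cover; these are expected to lower through the multiply-by-magic-constant expansion rather than a full division.

  unsigned rem7(unsigned x) { return x % 7u; }
  int      srem9(int x)     { return x % 9;  }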
Simon Pilgrim
4269590166 [X86][SSE] Added support for SSE3 lane duplication shuffle instructions
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big use of these being that they avoid many single source shuffles from needing to use (pre-AVX) dual source instructions such as SHUFPD/SHUFPS: causing extra moves and preventing load folds.

Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.

Also adds a missing tablegen pattern for MOVDDUP.

Differential Revision: http://reviews.llvm.org/D7042



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226716 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:44:35 +00:00
Matt Arsenault
50c3bc9956 R600: Add missing tests for i64 srem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226713 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:43:19 +00:00
Jonathan Roelofs
cab5680f6c Fix load-store optimizer on thumbv4t
Thumbv4t does not have lo->lo copies other than MOVS,
and that can't be predicated. So emit MOVS when needed
and bail if there's a predicate.

http://reviews.llvm.org/D6592


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226711 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:39:43 +00:00
Matt Arsenault
305228cc0b R600/SI: Custom lower fround
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
one fewer instruction if BFI is available.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226682 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 18:18:25 +00:00
Colin LeMahieu
62b9c33e13 [Hexagon] Converting multiply and accumulate with immediate intrinsics to patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226681 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 18:13:15 +00:00
Ahmed Bougacha
34288d885e [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions.  This for instance enables
a DAGCombine to fire on code such as
  (and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
  (zextload ...)
as seen in the testcase changes.

There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.

Differential Revision: http://reviews.llvm.org/D6533


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226676 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 17:07:06 +00:00
Tim Northover
c49e57ade1 Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226665 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:48:52 +00:00
Tim Northover
004d725549 AArch64: add backend option to reserve x18 (platform register)
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.

From: Andrew Turner <andrew@fubar.geek.nz>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226664 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:43:31 +00:00
Tim Northover
47f47f5d2a DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226663 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:43:28 +00:00
Michael Kuperstein
0b4244ade1 [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226661 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 14:44:05 +00:00
Simon Pilgrim
650d4f00ae [X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC.
Changed the AVX1 tests' register spill tail call to return an xmm like the SSE42 version - this makes doing diffs between them a lot easier without affecting the spills themselves.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226623 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 00:02:13 +00:00
Simon Pilgrim
8608e5bbc7 [X86][SSE] Added SSE/AVX1 integer stack folding tests.
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226622 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:54:17 +00:00
Simon Pilgrim
32f0438ada [X86][SSE] Added SSE fp stack folding tests.
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226621 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:50:18 +00:00
Simon Pilgrim
bddfe2660e [X86][AVX] Renamed AVX1 fp stack folding tests. NFC.
The SSE42 version of the AVX1 float stack folding tests will be added shortly; this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226620 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:45:50 +00:00
Colin LeMahieu
0d9733d596 [Hexagon] Adding intrinsics for doubleword ALU operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226606 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 20:45:05 +00:00
Colin LeMahieu
50e4abc0ee [Hexagon] Removing unnecessary clutter in intrinsic tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226602 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:46:07 +00:00
Daniel Jasper
529ff2f257 Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.

This is not a complete solution but works around the most pressing
issue.

Review: http://reviews.llvm.org/D7070

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226600 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:43:33 +00:00
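A hypothetical C sketch of the sparse-switch shape the change above addresses: the value range is wide but only a handful of cases exist, so small sub-ranges should not be scored as jump-table candidates.

  int bucket_of(int v) {
    switch (v) {
    case 3:     return 0;
    case 100:   return 1;
    case 1024:  return 2;
    case 65536: return 3;
    default:    return -1;
    }
  }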
Ramkumar Ramachandra
b8ae0acfaf [GC] Verify-pass void vararg functions in gc.statepoint
With the appropriate Verifier changes, extracting the result out of a
statepoint wrapping a vararg function crashes. However, a void vararg
function works fine: commit this first step.

Differential Revision: http://reviews.llvm.org/D7071

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226599 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:42:46 +00:00
Tom Stellard
5d96beaab5 R600/SI: Fix simple-loop.ll test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226596 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:33:02 +00:00
Tom Stellard
ad7a884efe R600/SI: Add kill flag when copying scratch offset to a register
This allows us to re-use the same register for the scratch offset
when accessing large private arrays.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226585 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 17:49:45 +00:00
Tom Stellard
a978a481bb R600/SI: Don't store scratch buffer frame index in MUBUF offset field
We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226584 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 17:49:43 +00:00
Kai Nacke
fa4d8baf54 [mips] Add registers and ALL check prefix to octeon test case.
No functional change.

Reviewed by D. Sanders


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226574 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 16:14:02 +00:00
Kai Nacke
57e80129b9 [mips] Add octeon branch instructions bbit0/bbit032/bbit1/bbit132
This commit adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132.
It also includes patterns for instruction selection and test cases.

Reviewed by D. Sanders


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226573 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 16:10:51 +00:00
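A hypothetical C example of the single-bit test-and-branch pattern the new bbit instructions are selected for:

  int bit5_set(unsigned long long x) {
    if (x & (1ULL << 5))   /* test a single bit and branch on it */
      return 1;
    return 0;
  }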
Simon Pilgrim
2a2f94c5f2 [X86][AVX] Missing AVX1 memory folding float instructions
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).

Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as it's trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me.

Differential Revision: http://reviews.llvm.org/D7055



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226513 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 22:40:45 +00:00
Colin LeMahieu
dc8beeba1b [Hexagon] Updating muxir/ri/ii intrinsics. Setting predicate registers as compatible with i32 rather than doing custom type conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226500 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 20:31:18 +00:00
Colin LeMahieu
3bea6a4959 [Hexagon] Converting intrinsics combine imm/imm, simple shifts and extends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226483 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:56:19 +00:00
Colin LeMahieu
596cfabbc4 [Hexagon] Converting remaining ALU32/ALU intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226480 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:33:58 +00:00
Colin LeMahieu
254d992ab8 [Hexagon] Converting ALU32/ALU intrinsics to new patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226478 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:22:19 +00:00
Greg Fitzgerald
4b96323e81 [AArch64] Implement GHC calling convention
Original patch by Luke Iannini.  Minor improvements and test added by
Erik de Castro Lopo.

Differential Revision: http://reviews.llvm.org/D6877

From: Erik de Castro Lopo <erikd@mega-nerd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226473 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 17:40:05 +00:00
Colin LeMahieu
438f1e4979 [Hexagon] Converting halfword to double accumulating multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226472 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 17:36:32 +00:00
Rafael Espindola
4b678bff4e Bring r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats when
needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 15:16:06 +00:00
Michael Kuperstein
5a0c8601d3 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226433 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:30:47 +00:00
Hal Finkel
d1f1656447 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:20:27 +00:00
Matt Arsenault
676db0a373 R600: Remove redundant test
This is already covered in ftrunc.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226412 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 19:30:32 +00:00
Simon Pilgrim
a707cf4652 [X86][SSE] Added scalar min/max folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226406 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 18:06:23 +00:00
Simon Pilgrim
7c62013afe [X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226405 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 17:04:32 +00:00
Simon Pilgrim
2793dd7a29 [X86][SSE] Added scalar conversion stack folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226404 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 16:22:15 +00:00
Simon Pilgrim
65a3a9bea3 AVX1 stack folding tests. NFC.
I've begun adding more exhaustive tests - all floating point instructions should now either be tested or have placeholders. We do seem to have a number of missing instructions; I will add a patch for review once the remaining working instructions are added.

I'll then move on to SSE tests and then the integer instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226400 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:56:39 +00:00
Hal Finkel
a01b583dbc [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226399 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:08:47 +00:00
Hal Finkel
962cff0c08 [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't, we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226369 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 03:57:34 +00:00
Mehdi Amini
5eed637b34 Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 01:35:56 +00:00
Matt Arsenault
b2bb846f17 R600: Clean up floor tests
These were using different naming schemes,
not using multiple check prefixes and not using
-LABEL.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226333 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 22:11:00 +00:00
Colin LeMahieu
d08a6cd083 [Hexagon] Converting halfword to doubleword multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226326 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 21:41:57 +00:00
Colin LeMahieu
9a56f06825 [Hexagon] Converting accumulating halfword multiply intrinsics to patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226324 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 21:36:34 +00:00
Colin LeMahieu
67451fe320 [Hexagon] Beginning converting intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226318 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 20:38:54 +00:00
Adam Nemet
ad2ac976af [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226296 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 18:50:09 +00:00
Adam Nemet
1310e4f3c7 [AVX512] Remove trailing whitespaces in this test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226295 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 18:50:07 +00:00
Andrea Di Biagio
ac7b9c828f [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226277 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 14:55:26 +00:00
Simon Pilgrim
3717f7c80c [X86] Refactored stack memory folding tests to explicitly force register spilling
The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations.

This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects, with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).

More exhaustive tests will follow shortly, I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.

Differential Revision: http://reviews.llvm.org/D6932



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226264 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 09:32:54 +00:00
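A hedged C analogue of the testing pattern described in the commit above (the real tests are LLVM IR; this is an approximation for 64-bit x86, and the function name is made up). An inline-asm nop that claims to clobber every xmm register forces the live vector values to be spilled, so the reload feeding the following instruction can be checked for folding.

  #include <immintrin.h>

  __m128 fold_addps(__m128 a, __m128 b) {
    __asm__ volatile("nop"
                     ::: "xmm0",  "xmm1",  "xmm2",  "xmm3",  "xmm4",  "xmm5",
                         "xmm6",  "xmm7",  "xmm8",  "xmm9",  "xmm10", "xmm11",
                         "xmm12", "xmm13", "xmm14", "xmm15");
    return _mm_add_ps(a, b);   /* ideally addps with a folded stack reload */
  }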
Timur Iskhodzhanov
6a7c74de33 Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226251 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 08:38:45 +00:00
Hal Finkel
92cd0ca3b2 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226247 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 04:40:58 +00:00
Rafael Espindola
dfe88a08c7 Revert "Revert Don't create new comdats in CodeGen"
This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226242 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 02:22:55 +00:00
Matt Arsenault
ab2315014e R600/SI: Add patterns for v_cvt_{flr|rpi}_i32_f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226230 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 23:58:35 +00:00
Matt Arsenault
c204f47feb R600/SI: Fix trailing comma with modifiers
Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226226 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 23:17:03 +00:00
Hal Finkel
94dc061e85 [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226207 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:17:34 +00:00
Colin LeMahieu
02b677594c [Hexagon] Updating indexed load-extend patterns and changing test to new expected output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226206 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:07:52 +00:00
Hal Finkel
39b09ae788 Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers""
Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'dead defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the machinstr verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226200 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 20:32:09 +00:00
Matt Arsenault
ecbec418bd R600/SI: Improve fpext / fptrunc test coverage
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226197 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 19:39:42 +00:00