Commit Graph

10535 Commits

Author SHA1 Message Date
Adam Nemet
d290fa608f [X86] Improve buildFromShuffleMostly for AVX
For a 256-bit BUILD_VECTOR consisting mostly of shuffles of 256-bit vectors,
both the BUILD_VECTOR and its operands may need to be legalized in multiple
steps.  Consider:

(v8f32 (BUILD_VECTOR (extract_vector_elt (v8f32 %vreg0,) Constant<1>),
                     (extract_vector_elt %vreg0, Constant<2>),
                     (extract_vector_elt %vreg0, Constant<3>),
                     (extract_vector_elt %vreg0, Constant<4>),
                     (extract_vector_elt %vreg0, Constant<5>),
                     (extract_vector_elt %vreg0, Constant<6>),
                     (extract_vector_elt %vreg0, Constant<7>),
                     %vreg1))

a. We can't build a 256-bit vector efficiently so, we need to split it into
two 128-bit vecs and combine them with VINSERTX128.

b. Operands like (extract_vector_elt (v8f32 %vreg0), Constant<7>) needs to be
split into a VEXTRACTX128 and a further extract_vector_elt from the
resulting 128-bit vector.

c. The extract_vector_elt from b. is lowered into a shuffle to the first
element and a movss.

Depending on the order in which we legalize the BUILD_VECTOR and its
operands[1], buildFromShuffleMostly may be faced with:

(v4f32 (BUILD_VECTOR (extract_vector_elt
                      (vector_shuffle<1,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<2,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     (extract_vector_elt
                      (vector_shuffle<3,u,u,u> (extract_subvector %vreg0, Constant<4>), undef),
                      Constant<0>),
                     %vreg1))

In order to figure out the underlying vector and their identity we need to see
through the shuffles.

[1] Note that the order in which operations and their operands are legalized is
only guaranteed in the first iteration of LegalizeDAG.

Fixes <rdar://problem/16296956>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 19:44:16 +00:00
Duncan P. N. Exon Smith
ebb5d29473 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206622 and the MSVC fixup in r206626.

Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith
54850bedf2 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.

In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally).  I'm hoping the mystery
is solved?  I'll revert this again if the tests are still failing
remotely.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith
1e1954f749 Add some target triples for better determinism
These tests were failing on some buildbots after r206548 (reverted in
r206556), but passing locally.

They were missing target triples, so maybe that's the problem?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:22:19 +00:00
Tim Northover
7b4b261611 AArch64/ARM64: add more NEON tests.
Mostly no testing this time, since they were just wrangling
target-specific intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206613 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:53 +00:00
Tim Northover
f34a512a68 ARM64: disable generation of .loh directives outside MachO.
Part of PR19455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:46 +00:00
Tim Northover
9cfd368302 ARM64: don't emit .subsections_via_symbols on ELF.
Part of PR19455.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:41 +00:00
Tim Northover
1d5a2ad8a6 ARM64: add extra NEG pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206609 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 14:54:35 +00:00
Tim Northover
936285440b AArch64/ARM64: port more AArch64 tests to ARM64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206592 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 13:16:55 +00:00
Tim Northover
753cfe6172 AArch64/ARM64: add non-scalar lowering for more FCVT operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206591 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 13:16:42 +00:00
Tim Northover
7b4b522ec8 AArch64/ARM64: improve spotting of EXT instructions from VECTOR_SHUFFLE.
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206588 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 12:50:58 +00:00
Benjamin Kramer
c32e261a1a X86: Pattern match scalar loads + vcvtph2ps into just vcvtph2ps.
vcvtph2ps only reads the lower 64 bits of the address passed to the
intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206579 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 10:45:33 +00:00
Tim Northover
fb96efa7dd AArch64/ARM64: port atomics test to ARM64.
Covers quite a few extra instructions (like any of the max/min ones
which were broken until recently on ARM64).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206575 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:31 +00:00
Tim Northover
0d6995985a AArch64/ARM64: spot a greater variety of concat_vector operations.
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.

This also enables the AArch64 tests which inspired the previous
untested commits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206574 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:27 +00:00
Tim Northover
70b63374f2 ARM64: implement cunning optimisation from AArch64
A vector extract followed by a dup can become a single instruction even if the
types don't match. AArch64 handled this in ISelLowering, but a few reasonably
simple patterns can take care of it in TableGen, so that's where I've put it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:20 +00:00
Tim Northover
66643da8fc AArch64/ARM64: emit all vector FP comparisons as such.
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.

More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:07 +00:00
Tim Northover
937290d7ed AArch64/ARM64: port BSL logic from AArch64 & enable test.
I enhanced it a little in the process. The decision shouldn't really be beased
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.

Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206569 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:31:01 +00:00
Tim Northover
2f5d14af9d AArch64/ARM64: copy byval implementation from AArch64.
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206568 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 09:30:52 +00:00
Jiangning Liu
bc3655f9c8 This is one of the optimizations ported from ARM64 to AArch64 to address the performance gap between these two back ends. The test case newly added for AArch64 already exists in ARM64.
Patched by Z.Zheng



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206559 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 05:58:09 +00:00
Matt Arsenault
746734df1a R600/SI: Try to use scalar BFE.
Use scalar BFE with constant shift and offset when possible.
This is complicated by the fact that the scalar version packs
the two operands of the vector version into one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206558 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 05:19:26 +00:00
Jiangning Liu
532a5ffe4c This commit enables unaligned memory accesses of vector types on AArch64 back end. This should boost vectorized code performance.
Patched by Z. Zheng



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206557 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 03:58:38 +00:00
Duncan P. N. Exon Smith
c7a3b95c0f Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206549.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206556 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith
cc1e1707b8 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives non-sensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206548 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 01:57:45 +00:00
Matt Arsenault
6834a55df3 R600/SI: Match sign_extend_inreg to s_sext_i32_i8 and s_sext_i32_i16
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206547 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 01:53:18 +00:00
Tom Stellard
cfe02c46dc R600/SI: Use SReg_64 instead of VSrc_64 when selecting BUILD_PAIR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206541 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 00:36:21 +00:00
Louis Gerbarg
fc8fa8238d Make test/CodeGen/ARM64/vector-insertion.ll explicitly select neon syntax
Change the command line vector-insertion.ll to explicitly set the neon syntax
to apple so that buildbots that default to other syntaxes won't fail.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206502 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 21:32:41 +00:00
Tom Stellard
93ea1378d2 R600/SI: Stop using i128 as the resource descriptor type
Having i128 as a legal type complicates the legalization phase.  v4i32
is already a legal type, so we will use that instead.

This fixes several piglit tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206500 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 21:00:11 +00:00
Louis Gerbarg
5540570374 Improve ARM64 vector creation
This patch improves the performance of vector creation in caseiswhere where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.

rdar://16349427

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206496 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 20:51:50 +00:00
Jim Grosbach
4af58f145d ARM64: [su]xtw use W regs as inputs, not X regs.
Update the SXT[BHW]/UXTW instruction aliases and the shifted reg addressing
mode handling.

PR19455 and rdar://16650642

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 20:47:31 +00:00
Tim Northover
90dd89ed81 ARM64: switch to IR-based atomic operations.
Goodbye code!

(Game: spot the bug fixed by the change).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206490 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 20:00:33 +00:00
Tim Northover
fa9a0aa77b ARM64: add acquire/release versions of the existing atomic intrinsics.
These will be needed to support IR-level lowering of atomic
operations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 20:00:24 +00:00
Josh Magee
a32348530f [stack protector] Make the StackProtector pass respect ssp-buffer-size.
Previously, SSPBufferSize was assigned the value of the "stack-protector-buffer-size"
attribute after all uses of SSPBufferSize.  The effect was that the default
SSPBufferSize was always used during analysis.  I moved the check for the
attribute before the analysis; now --param ssp-buffer-size= works correctly again.

Differential Revision: http://reviews.llvm.org/D3349


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 19:08:36 +00:00
Matt Arsenault
9e383d4b48 R600/SI: f64 frint is legal on CI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206475 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 17:06:37 +00:00
Matt Arsenault
003de065a3 R600/SI: Fix zext from i1 to i64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206437 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 02:03:08 +00:00
Adam Nemet
e1a38f7041 [ARM64] Fix "Cannot select" for vector ctpop
The commit of r205855:

Author: Arnold Schwaighofer <aschwaighofer@apple.com>
Date:   Wed Apr 9 14:20:47 2014 +0000

    SLPVectorizer: Only vectorize intrinsics whose operands are widened equally

    The vectorizer only knows how to vectorize intrinics by widening all operands by
    the same factor.

    Patch by Tyler Nowicki!

exposed a backend bug causing a regression (Cannot select ctpop).

The commit msg is a bit confusing because the patch actually changes the
behavior for the loop-vectorizer as well.  As things got refactored into a
helper ctpop got snuck in to the trivially-vectorizable helper which is now
used by both vectorizers.  In other words, we started seeing vector-ctpops in
the backend.

This change makes ctpop LegalizeAction::Expand for the types not supported by
the byte-only CNT instruction.  We may be able to custom-lower these later to
a single CNT but this is to fix the compiler crash first.

Fixes <rdar://problem/16578951>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206433 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-17 01:01:37 +00:00
Matheus Almeida
c308f165a0 [mips] Add initial support for NaN2008 in the back-end.
This is so that EF_MIPS_NAN2008 is set if we are using IEEE 754-2008
NaN encoding (-mnan=2008). This patch also adds support for parsing
'.nan legacy' and '.nan 2008' assembly directives. The handling of
these directives should match GAS' behaviour i.e., the last directive
in use sets the ELF header bit (EF_MIPS_NAN2008).

Differential Revision: http://reviews.llvm.org/D3346


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206396 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 15:48:55 +00:00
Tim Northover
f539725734 AArch64/ARM64: port some NEON tests to ARM64
These ones used completely different sets of intrinsics, so the only way to do
it is create a separate ARM64 copy and change them all.

Other than that, CodeGen was straightforward, no deficiencies detected here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206392 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 15:28:02 +00:00
Daniel Sanders
4134d06487 [mips] Fix emission of '.option pic0' for MIPS-IV.
Summary: This was a case of incorrect usage of hasMips64() vs isABI_N64()

Reviewers: matheusalmeida, dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D3398

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206388 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 13:58:57 +00:00
Daniel Sanders
849ca451c8 [mips] Correct r206370 to account for non-Linux targets using the small data section.
This should fix the ninja-x64-msvc-RA-centos6 builder.

I suspect the check in MipsSubtarget.cpp is incorrect and is really trying to
check for a bare-metal target rather and anything other than linux. I'll
investigate this.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206385 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 12:29:08 +00:00
Tim Northover
115d4f407b ARM64: specify triple so that Linux tests pass
Now that Linux is trying to reparse all inline asm it chokes on the different
comment character in this test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206382 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 12:03:56 +00:00
Tim Northover
1a8adcb569 AArch64/ARM64: add another set of tests from AArch64
Another batch with no code changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206381 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 11:53:07 +00:00
Tim Northover
1a44333f0e AArch64/ARM64: port across stub handling for ELF C++ exceptions.
The most important part here is that we should actuall emit the stubs we refer
to in the exception table, but as a side issue this uses more sensible & GCC
compatible representations for some of the bits of information.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206380 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 11:52:55 +00:00
Tim Northover
fef8e383eb ARM64: use 32-bit moves for constants where possible.
If we know that a particular 64-bit constant has all high bits zero, then we
can rely on the fact that 32-bit ARM64 instructions automatically zero out the
high bits of an x-register. This gives the expansion logic less constraints to
satisfy and so sometimes allows it to pick better sequences.

Came up while porting test/CodeGen/AArch64/movw-consts.ll: this will allow a
32-bit MOVN to be used in @test8 soon.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206379 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 11:52:51 +00:00
Tim Northover
ea9988a812 ARM64: use the integrated assembler on ELF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206378 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 11:52:40 +00:00
Matheus Almeida
8e0f5768a6 [mips] Emit '.set nomicromips' before a function's entry label
if not in micromips mode.

The test (elf_st_other.ll) was renamed as the name and description didn't
make sense as the test wasn't checking any symbol table entry.

Differential Revision: http://reviews.llvm.org/D3346



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 11:46:59 +00:00
Daniel Sanders
b78eac2d4d [mips] Correct callee saved list for the N32 ABI and enable test
Summary: Depends on D3339

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3340

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 10:23:37 +00:00
Daniel Sanders
c90bd4a882 [mips] Add calling convention tests covering O32, N32, and N64.
Summary:
I had difficulty finding tests for the N32 and N64 ABI so I've added a
collection of calling convention tests based on the document MIPS ABIs
Described (MD00305), the MIPSpro N32 Handbook, and the SYSV ABI. Where the
documents/implementations disagree, I've used GCC to resolve the conflict.

A few interesting details:
* For N32, LLVM uses 64-bit pointers when saving $ra despite pointers being
  32-bit. I've yet to find a supporting statement in the ABI documentation but
  the current behaviour matches GCC.

* For O32, the non-variable portion of a varargs argument list is also subject
  to the rule that floating-point is passed via GPR's (on N32/N64 only the
  variable portion is subject to this rule). This agrees with GCC's behaviour
  and the SYSV ABI but contradicts part of the MIPSpro N32 Handbook which talks about O32's behaviour.

* The N32 implementation has the wrong callee-saved register list.
  (I already have a fix for this but will commit it as a follow-up).

I've left RUN-TODO lines in for O32 on MIPS64. I don't plan to support this case
for now but we should revisit it.

Reviewers: matheusalmeida, vmedic

Reviewed By: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3339

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206370 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 09:59:46 +00:00
Tim Northover
946bea39dc ARM64: explicitly ask for Apple NEON syntax so test passes on Linux
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206368 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 09:13:44 +00:00
Tim Northover
be50dc8b1f ARM64: mark x7 as used when an i128 gets shunted onto the stack.
The second half of a split i128 was ending up in x7, which is not a good thing.

This is another part of PR19432.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206366 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 09:03:25 +00:00
Tim Northover
7474c171e1 DAGCombiner: don't optimise non-existant litpool load
This particular DAG combine is designed to kick in when both ConstantFPs will
end up being loaded via a litpool, however those nodes have a semi-legal
status, dictated by isFPImmLegal so in some cases there wouldn't have been a
litpool in the first place. Don't try to be clever in those circumstances.

Picked up while merging some AArch64 tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206365 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 09:03:09 +00:00
Matt Arsenault
9fde950b1b R600: Extend r600 sign_extend_inreg tests for EG
Patch by: Jan Vesely <jan.vesely@rutgers.edu>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206349 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-16 01:41:34 +00:00
Matt Arsenault
bec5c611e1 R600/SI: Print more immediates in hex format
Print in decimal for inline immediates, and hex otherwise. Use hex
always for offsets in addressing offsets.

This approximately matches what the shader compiler does.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206335 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 22:32:49 +00:00
Nick Lewycky
569e5a5b1c Make this test not match its own filename, when being run from a path that includes the string 'add'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206331 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 22:29:32 +00:00
Matt Arsenault
f8ea0352e0 R600/SI: Fix loads of i1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206330 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 22:28:39 +00:00
Akira Hatanaka
f3930395f5 Make FastISel::SelectInstruction return before target specific fast-isel code
handles Intrinsic::trap if TargetOptions::TrapFuncName is set.

This fixes a bug in which the trap function was not taken into consideration
when a program was compiled without optimization (at -O0).

<rdar://problem/16291933>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206323 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 21:30:06 +00:00
Andrea Di Biagio
749e8fee34 [X86] Improve the lowering of packed shifts by constant build_vector.
This patch teaches the backend how to efficiently lower logical and
arithmetic packed shifts on both SSE and AVX/AVX2 machines.

When possible, instead of scalarizing a vector shift, the backend should try
to expand the shift into a sequence of two packed shifts by immedate count
followed by a MOVSS/MOVSD.

Example
  (v4i32 (srl A, (build_vector < X, Y, Y, Y>)))

Can be rewritten as:
  (v4i32 (MOVSS (srl A, <Y,Y,Y,Y>), (srl A, <X,X,X,X>)))

[with X and Y ConstantInt]

The advantage is that the two new shifts from the example would be lowered into
X86ISD::VSRLI nodes. This is always cheaper than scalarizing the vector into
four scalar shifts plus four pairs of vector insert/extract.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206316 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 19:30:48 +00:00
Quentin Colombet
49ad5d5dd5 [ARM64] Set default CPU to generic instead of cyclone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206313 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 19:08:46 +00:00
Robert Lougher
4634a9ba1e Revert r191049/r191059 as it can produce wrong code (see PR17975).
It has already been reverted on the 3.4 branch in r196521.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206311 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 18:34:24 +00:00
Tim Northover
1e3f66afe8 AArch64/ARM64: enable more AArch64 tests on ARM64.
No code changes for this bunch, just some test rejigs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206291 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:29 +00:00
Tim Northover
5f8234943d AArch64/ARM64: add missing pattern for extending load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206290 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:19 +00:00
Tim Northover
2a83cb71ad AArch64/ARM64: only mangle MOVZ/MOVN during encoding when needed
Sometimes we need emit the bits that would actually be a MOVN when producing a
relocated MOVZ instruction (don't ask). But not always, a check which ARM64 got
wrong until now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206289 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:15 +00:00
Tim Northover
5080ae2e21 AArch64/ARM64: add support for large code-model jump tables.
I've left the MachO CodeGen as it is, there's a reasonable chance it should use
the GOT like ConstPools, but I'm not certain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206288 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:11 +00:00
Tim Northover
e9cae9b79f AArch64/ARM64: add patterns for various commutations of FNMADD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206287 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:06 +00:00
Tim Northover
e8bc8a7d58 AArch64/ARM64: add half as a storage type on ARM64.
This brings it into line with the AArch64 behaviour and should open the way for
certain OpenCL features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206286 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 14:00:03 +00:00
Tim Northover
380fa65d5d AArch64/ARM64: copy patterns for fixed-point conversions
Code is mostly copied directly across, with a slight extension of the
ISelDAGToDAG function so that it can cope with the floating-point constants
being behind a litpool.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206285 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 13:59:57 +00:00
Tim Northover
1291807e03 ARM64: add constraints to various FastISel operations
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206284 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 13:59:53 +00:00
Tim Northover
f90c8c7063 AArch64/ARM64: add more arm64 lines to AArch64 regression tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206282 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 13:59:44 +00:00
Tim Northover
3f2d713f4d AArch64/ARM64: add dp tests from AArch64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 13:59:40 +00:00
Quentin Colombet
e8603c43f2 [Register Coalescer] Add a test case for 206060.
<rdar://problem/16582185>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206235 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 01:15:32 +00:00
Louis Gerbarg
261f0df185 Fix for codegen bug that could cause illegal cmn instruction generation
In rare cases the dead definition elimination pass code can cause illegal cmn
instructions when it replaces dead registers on instructions that use
unmaterialized frame indexes. This patch disables the dead definition
optimization for instructions which include frame index operands.

rdar://16438284

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 21:05:05 +00:00
Louis Gerbarg
27539d46cc Add a flag to disable the ARM64DeadRegisterDefinitionsPass
This patch adds a -arm64-dead-def-elimination flag so that it is possible to
disable dead definition elimination. Includes test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 21:05:02 +00:00
Akira Hatanaka
268c0509a9 Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly.
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.

This fixes PR18705 and <rdar://problem/15991090>.

Differential Revision: http://reviews.llvm.org/D3363



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206194 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 16:56:19 +00:00
Richard Trieu
b79d042e4e Fix 2008-03-05-SxtInRegBug.ll so that the CHECK-NOT will not match the filename.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 16:53:50 +00:00
Daniel Sanders
6ede5d24b8 [mips] Fix fcopysign for MIPS-IV and add the test.
Summary:
This was another incorrect use of hasMips64() vs isGP64bit().

Depends on D3344

Reviewers: matheusalmeida, vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3347

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206187 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 16:24:12 +00:00
Daniel Sanders
0543dab791 [mips] MIPS-IV is broadly the same as MIPS64 so duplicate all -mcpu=mips64 tests with -mcpu=mips4 as a starting point
Summary:
Two exceptions to this:
  test/CodeGen/Mips/octeon.ll
  test/CodeGen/Mips/octeon_popcnt.ll
these test extensions to MIPS64

One test is altered for MIPS-IV:
  test/CodeGen/Mips/mips64countleading.ll
    Tests dclo/dclz which were added in MIPS64. The MIPS-IV version tests
    that dclo/dclz are not emitted.

Four tests fail and are not in this patch:
  test/CodeGen/Mips/abicalls.ll
  test/CodeGen/Mips/fcopysign-f32-f64.ll
  test/CodeGen/Mips/fcopysign.ll
  test/CodeGen/Mips/stack-alignment.ll

Depends on D3343

Reviewers: matheusalmeida, vmedic

Reviewed By: vmedic

Differential Revision: http://reviews.llvm.org/D3344

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206185 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 16:00:28 +00:00
Daniel Sanders
327be6483d [mips] Fix more incorrect uses of HasMips64 and isMips64()
Summary:
- Conditional moves acting on 64-bit GPR's should require MIPS-IV rather than MIPS64
- ISD::MUL, and ISD::MULH[US] should be lowered on all 64-bit ISA's

Patch by David Chisnall
His work was sponsored by: DARPA, AFRL

I've added additional testcases to cover as much of the codegen changes
affecting MIPS-IV as I can. Where I've been unable to find an existing
MIPS64 testcase that can be re-used for MIPS-IV (mainly tests covering
ISD::GlobalAddress and similar), I at least agree that MIPS-IV should
behave like MIPS64. Further testcases that are fixed by this patch will follow
in my next commit. The testcases from that commit that fail for MIPS-IV without
this patch are:
    LLVM :: CodeGen/Mips/2010-07-20-Switch.ll
    LLVM :: CodeGen/Mips/cmov.ll
    LLVM :: CodeGen/Mips/eh-dwarf-cfa.ll
    LLVM :: CodeGen/Mips/largeimmprinting.ll
    LLVM :: CodeGen/Mips/longbranch.ll
    LLVM :: CodeGen/Mips/mips64-f128.ll
    LLVM :: CodeGen/Mips/mips64directive.ll
    LLVM :: CodeGen/Mips/mips64ext.ll
    LLVM :: CodeGen/Mips/mips64fpldst.ll
    LLVM :: CodeGen/Mips/mips64intldst.ll
    LLVM :: CodeGen/Mips/mips64load-store-left-right.ll
    LLVM :: CodeGen/Mips/sint-fp-store_pattern.ll

Reviewers: dsanders

Reviewed By: dsanders

CC: matheusalmeida

Differential Revision: http://reviews.llvm.org/D3343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206183 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 15:44:42 +00:00
Tim Northover
f86c5472c0 ARM64: specify full triple in tests to pacify Windows.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 13:18:48 +00:00
Tim Northover
069bcd7b38 AArch64: add newline to end of test files.
Should be no other change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206174 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 13:18:40 +00:00
Tim Northover
856ecbc068 ARM64: remove buggy REV16 pattern.
The 32-bit pattern is still valid: 0123 -> 3210 -> 1032.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206172 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:59:52 +00:00
Tim Northover
e90d4e2c69 AArch64/ARM64: enable directcond.ll test on ARM64.
Code change is because optimizeCompareInstr didn't know how to pull the
condition code out of FCSEL instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206171 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:51:06 +00:00
Tim Northover
97c1fc0832 ARM64: add patterns for csXYZ with reversed operands.
AArch64 tests for this, and it's obviously a good idea. Have to invert the
condition code, of course.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206170 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:51:02 +00:00
Tim Northover
dce294e7b1 ARM64: enable more regression tests from AArch64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206169 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:50:58 +00:00
Tim Northover
3c68c5c55e ARM64: add support for AArch64's addsub_ext.ll
There was one definite issue in ARM64 (the off-by-1 check for whether
a shift could be folded in) and one difference that is probably
correct: ARM64 didn't fold nodes with multiple uses into the
arithmetic operations unless optimising for code size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206168 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:50:50 +00:00
Tim Northover
41b47904ba ARM64: optimise (cmp x, (sub 0, y)) to (cmn x, y).
This transformation is only valid when being used for an EQ or NE
comparison since the flags change otherwise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206167 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:50:47 +00:00
Tim Northover
e23fabc2ac ARM64: start porting regression test suite from AArch64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206166 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:50:41 +00:00
Richard Osborne
c9fa660ddd [XCore] Don't create invalid MKMSK instructions inside loadImmediate().
Summary:
Previously loadImmediate() would produce MKMSK instructions with invalid
immediate values such as mkmsk r0, 9. Fix this by checking the mask size
is valid.

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3289

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206163 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 12:30:35 +00:00
Hal Finkel
6a34916fbf [PowerPC] Fix rlwimi isel when mask is not constant
We had been using the known-zero values of the operand of the or to construct
the mask for an rlwimi; this is not quite correct, but fine when the mask is
constant. When the mask is constant, then the known zeros of the operand must
be a superset of the zeros in the mask. However, when the mask is not a
constant, then there might be bits in the operand that are not known to be zero
that, at runtime, might be zero in the mask. Therefore, we check that any bits
not known to be zero *are* known to be one in the mask. Otherwise, we can't
fold the mask with the or and shift.

This was revealed as a miscompile of
MultiSource/Benchmarks/BitBench/drop3/drop3 when I started experimenting with
constant hoisting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206136 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-13 17:10:58 +00:00
Hal Finkel
f4c3a5601a [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms were SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-12 21:52:38 +00:00
Richard Trieu
6a871a361d Add extra checks to mvn.ll test to prevent the "f1" check from matching
on a directory name instead of the function name.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206104 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-12 04:47:04 +00:00
Hal Finkel
e13b87c08d Reenable use of TBAA during CodeGen
We had disabled use of TBAA during CodeGen (even when otherwise using AA)
because the ptrtoint/inttoptr used by CGP for address sinking caused BasicAA to
miss basic type punning that it should catch (and, thus, we'd fail to override
TBAA when we should).

However, when AA is in use during CodeGen, CGP now uses normal GEPs and
bitcasts, instead of ptrtoint/inttoptr, when doing address sinking. As a
result, BasicAA should be able to make us do the right thing in the face of
type-punning, and it seems safe to enable use of TBAA again. self-hosting seems
fine on PPC64/Linux on the P7, with TBAA enabled and -misched=shuffle.

Note: We still don't update TBAA when merging stack slots, although because
BasicAA should now catch all such cases, this is no longer a blocking issue.
Nevertheless, I plan to commit code to deal with this properly in the near
future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206093 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-12 01:26:00 +00:00
Hal Finkel
24517d023f Add the ability to use GEPs for address sinking in CGP
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
  1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
  2. In cases where type-punning was used, and BasicAA was used
     to override TBAA, BasicAA may no longer do so. (this had forced us to disable
     all use of TBAA in CodeGen; something which we can now enable again)

This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.

I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-12 00:59:48 +00:00
Louis Gerbarg
5672630b7c Add ARM64 CLS patterns
This patch adds patterns to generate the cls instruction ARM64. Includes tests
for 64 bit and 32 bit operands.

rdar://15611957

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206079 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-11 22:27:58 +00:00
Quentin Colombet
010311b104 [RegAllocGreedy][Last Chance Recoloring] Change the name of the exhaustive search option.
fexhaustive-register-search => exhaustive-register-search
'f' is a Clang thing!

This is related to PR18747.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206075 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-11 21:51:09 +00:00
Quentin Colombet
92a892e8f9 [RegAllocGreedy][Last Chance Recoloring] Addition of
-fexhaustive-register-search option to allow an exhaustive search during last
chance recoloring.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206072 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-11 21:39:44 +00:00
Tom Stellard
e04360918b SelectionDAG: Use helper function to improve legalization of ISD::MUL
The TargetLowering::expandMUL() helper contains lowering code extracted
from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more
ISD::MUL patterns without having to use a library call.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206037 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-11 16:12:01 +00:00
Reid Kleckner
bc1fd917f0 Move the segmented stack switch to a function attribute
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.

Patch by Luqman Aden and Alex Crichton!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-10 22:58:43 +00:00
Josh Magee
ee66766568 [stack protector] Refactor and clean-up test. No functionality change.
Refactored stack-protector.ll to use new-style function attributes everywhere
and eliminated unnecessary attributes.

This cleanup is in preparation for an upcoming test change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205996 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-10 22:47:27 +00:00
Jim Grosbach
6472a514dc X86: Tighten up test.
llc CPU autodection bites again. Speculative fix for bot failures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205940 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-10 00:27:43 +00:00
Jim Grosbach
afb4ef3549 Add support for load folding of avx1 logical instructions
AVX supports logical operations using an operand from memory. Unfortunately
because integer operations were not added until AVX2 the AVX1 logical
operation's types were preventing the isel from folding the loads. In a limited
number of cases the peephole optimizer would fold the loads, but most were
missed. This patch adds explicit patterns with appropriate casts in order for
these loads to be folded.

The included test cases run on reduced examples and disable the peephole
optimizer to ensure the folds are being pattern matched.

Patch by Louis Gerbarg <lgg@apple.com>

rdar://16355124

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205938 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 23:39:25 +00:00
Jim Grosbach
e9915738be SelectionDAG: Don't constant fold target-specific nodes.
FoldConstantArithmetic() only knows how to deal with a few target independent
ISD opcodes. Bail early if it sees a target-specific ISD node. These node do
funny things with operand types which may break the assumptions of the code
that follows, and there's no actual folding that can be done anyway. For example,
non-constant 256 bit vector shifts on X86 have a shift-amount operand that's a
128-bit v4i32 vector regardless of what the first operand type is and that breaks
the assumption that the operand types must match.

rdar://16530923

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205937 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 23:28:11 +00:00
Chad Rosier
c3de5ed072 [AArch64] Implement the isZExtFree APIs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205926 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 20:51:21 +00:00
Chad Rosier
fe5c9cee80 [AArch64] Implement the isTruncateFree API.
In AArch64 i64 to i32 truncate operation is a subregister access.

This allows more opportunities for LSR optmization to eliminate
variables of different types (i32 and i64).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205925 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 20:43:40 +00:00
Quentin Colombet
e4b14ebcbd [DAGCombiner] DAG combine does not know how to combine indexed loads with
sign/zero/any extensions. However a few places were not checking properly the
property of the load and were turning an indexed load into a regular extended
load. Therefore the indexed value was lost during the process and this was
triggering an assertion.

<rdar://problem/16389332>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 20:03:05 +00:00
Justin Holewinski
77f268945e [NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205907 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 15:39:15 +00:00
Justin Holewinski
ac4c131de6 [NVPTX] Add support for addrspacecast in global variable initializers, including emitting generic() when casting to address space 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205906 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 15:39:11 +00:00
Alp Toker
46d36be2eb Fix some doc and comment typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 14:47:27 +00:00
Bradley Smith
5a09ce9ad1 [ARM64] Rename LR to the UAL-compliant 'X30'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205885 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 14:43:59 +00:00
Bradley Smith
37fe6627f6 [ARM64] Rename FP to the UAL-compliant 'X29'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205884 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 14:43:50 +00:00
Elena Demikhovsky
0d5d656524 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205850 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 12:37:50 +00:00
Daniel Sanders
e777fb4725 Re-commit: [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the processor which are used to select between the 1985 and 2008 versions of IEEE 754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid operation exceptions when given NaN), in 2008 mode they are non-arithmetic (i.e. they are copies).

nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because the ISA spec does not explicitly state that they obey Has2008 and ABS2008.

Fixed the issue with the previous version of this patch (r205628). A pre-existing 'let Predicate =' statement was removing some predicates that were necessary for FP64 to behave correctly.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3274



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205844 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 09:56:43 +00:00
Matt Arsenault
d4786ed1de R600/SI: Match not instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205837 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 07:16:16 +00:00
Tim Northover
87a79507fa ARM64: scalarize v1i64 mul operation
This is the second part of fixing PR19367.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205836 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 07:07:02 +00:00
Tim Northover
7db3c63bb2 ARM64: add pattern for <1 x i64> custom not node.
This should fix PR19367.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205835 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-09 06:55:39 +00:00
Juergen Ributzka
c6a7502a80 [Constant Hoisting][ARM64] Enable constant hoisting for ARM64.
This implements the target-hooks for ARM64 to enable constant hoisting.

This fixes <rdar://problem/14774662> and <rdar://problem/16381500>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-08 20:39:59 +00:00
Tim Northover
362090adf5 ARM64: fix fmsub patterns which assumed accum operand was first
Confusingly, the NEON fmla instructions put the accumulator first but the
scalar versions put it at the end (like the fma lib function & LLVM's
intrinsic).

This should fix PR19345, assuming there's only one issue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205758 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-08 12:23:51 +00:00
Elena Demikhovsky
dbcb670605 AVX-512: Added fp_to_uint and uint_to_fp patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-08 07:24:02 +00:00
Reed Kotler
bb0572a5d1 Reverting commit r205628 due to mips64 issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205741 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-07 22:11:40 +00:00
Tom Stellard
1d8c7eb225 R600/SI: Handle INSERT_SUBREG in SIFixSGPRCopies
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-07 19:45:45 +00:00
Tom Stellard
5c9bb7119a R600: Match 24-bit arithmetic patterns in a Target DAGCombine
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.

This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched.  This occasionally
resulted in some instructions being incorrectly deleted from the
program.

v2:
  - Fix bug with 64-bit mul

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205731 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-07 19:45:41 +00:00
NAKAMURA Takumi
2e858fcaad Quick fix: Triple::isOSMSVCRT() should be false for targeting cygwin.
It affected callee's stack pop in x86. It is one of devergences between cygwin and mingw since mingw-gcc-4.6.

Added testcases to llvm/test/CodeGen/X86/win32_sret.ll for cygwin.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205688 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-06 10:01:23 +00:00
Hal Finkel
b12c642bbf [PowerPC] Add a full condition code register to make the "cc" clobber work
gcc inline asm supports specifying "cc" as a clobber of all condition
registers. Add just enough modeling of the full register to make this work.
Fixed PR19326.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205630 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 15:15:57 +00:00
Daniel Sanders
dc404fff12 [mips] abs.[ds], and neg.[ds] should be allowed regardless of -enable-no-nans-fp-math
Summary:
They behave in accordance with the Has2008 and ABS2008 configuration bits of the
processor which are used to select between the 1985 and 2008 versions of IEEE
754. In 1985 mode, these instructions are arithmetic (i.e. they raise invalid
operation exceptions when given NaN), in 2008 mode they are non-arithmetic
(i.e. they are copies).

nmadd.[ds], and nmsub.[ds] are still subject to -enable-no-nans-fp-math because
the ISA spec does not explicitly state that they obey Has2008 and ABS2008.

Reviewers: matheusalmeida

Reviewed By: matheusalmeida

Differential Revision: http://llvm-reviews.chandlerc.com/D3274

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 14:52:54 +00:00
Tim Northover
659e09e372 DAGLegalize: add last-ditch type-legalization for VSELECT.
When LLVM sees something like (v1iN (vselect v1i1, v1iN, v1iN)) it can
decide that the result is OK (v1i64 is legal on AArch64, for example)
but it still need scalarising because of that v1i1. There was no code
to do this though.

AArch64 and ARM64 have DAG combines to produce efficient code and
prevent that occuring in *most* such situations, but there are edge
cases that they miss. This adds a legalization to cope with that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205626 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 14:49:30 +00:00
Tim Northover
4a4d62bfb9 ARM64: handle v1i1 types arising from setcc properly.
There were several overlapping problems here, and this solution is
closely inspired by the one adopted in AArch64 in r201381.

Firstly, scalarisation of v1i1 setcc operations simply fails if the
input types are legal. This is fixed in LegalizeVectorTypes.cpp this
time, and allows AArch64 code to be simplified slightly.

Second, vselect with such a setcc feeding into it ends up in
ScalarizeVectorOperand, where it's not handled. I experimented with an
implementation, but found that whatever DAG came out was rather
horrific. I think Hao's DAG combine approach is a good one for
quality, though there are edge cases it won't catch (to be fixed
separately).

Should fix PR19335.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205625 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 14:49:21 +00:00
Tim Northover
0eb313be18 ARM64: use regalloc-friendly COPY_TO_REGCLASS for bitcasts
The previous patterns directly inserted FMOV or INS instructions into
the DAG for scalar_to_vector & bitconvert patterns. This is horribly
inefficient and can generated lots more GPR <-> FPR register traffic
than necessary.

It's much better to emit instructions the register allocator
understands so it can coalesce the copies when appropriate.

It led to at least one ISelLowering hack to avoid the problems, which
was incorrect for v1i64 (FPR64 has no dsub). It can now be removed
entirely.

This should also fix PR19331.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205616 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 09:03:09 +00:00
Tim Northover
604dff27c9 ARM64: add 128-bit MLA operations to the custom selection code.
Without this change, the llvm_unreachable kicked in. The code pattern
being spotted is rather non-canonical for 128-bit MLAs, but it can
happen and there's no point in generating sub-optimal code for it just
because it looks odd.

Should fix PR19332.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205615 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 09:03:02 +00:00
Quentin Colombet
3b2b5dfa9b [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are encountered and register allocation failed.

This is related to PR18747

Patch by MAYUR PANDEY <mayur.p@samsung.com>.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205601 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 02:05:21 +00:00
Quentin Colombet
cc99615837 Revert r205599, the commit was not intended to have so many changes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 02:02:49 +00:00
Quentin Colombet
c65a77b92d [RegAllocGreedy][Last Chance Recoloring] Emit diagnostics when last chance
recoloring cut-offs are hit.

This is related to PR18747.

Patch by MAYUR PANDEY <mayur.p@samsung.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205599 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 01:58:57 +00:00
Saleem Abdulrasool
09ebd6d735 ARM: fix test case missed in previous roundup
This should hopefully bring the last MSVC buildbot back to green!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205596 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 01:19:56 +00:00
Saleem Abdulrasool
2abadea537 ARM: yet another round of ARM test clean ups
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205586 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 23:47:24 +00:00
Eli Bendersky
25540a7f39 Optimize away unnecessary address casts.
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.

Patch by Jingyue Wu.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205571 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 21:18:25 +00:00
Lang Hames
89218827c8 [ARM64] Teach the ARM64DeadRegisterDefinition pass to respect implicit-defs.
When rematerializing through truncates, the coalescer may produce instructions
with dead defs, but live implicit-defs of subregs:
E.g.
  %X1<def,dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32

These instructions are live, and their definitions should not be rewritten.

Fixes <rdar://problem/16492408>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205565 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 20:51:08 +00:00
Tom Stellard
a7469745de R600: Correct opcode for BFE_INT
Acording to AMD documentation, the correct opcode for
BFE_INT is 0x5, not 0x4

Fixes Arithm/Absdiff.Mat/3 OpenCV test

Patch by: Bruno Jiménez

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205562 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 20:19:29 +00:00
Tom Stellard
50c16fb65c R600/SI: Lower 64-bit immediates using REG_SEQUENCE
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 20:19:27 +00:00
NAKAMURA Takumi
df34283c2a llvm/test/CodeGen/X86/peephole-multiple-folds.ll: Relax expressions to satisfy win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205559 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 20:07:51 +00:00
Saleem Abdulrasool
5fe5b3dcc8 ARM: update even more tests
More updating of tests to be explicit about the target triple rather than
relying on the default target triple supporting ARM mode.

Indicate to lit that object emission is not yet available for Windows on ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205545 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 17:35:22 +00:00
Saleem Abdulrasool
27b1252c13 ARM: fixup more tests to specify the target more explicitly
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default.  This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.

Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205541 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 16:01:44 +00:00
Tim Northover
d5561bb1f0 ARM: tell LLVM about zext properties of ldrexb/ldrexh
Implementing this via ComputeMaskedBits has two advantages:
  + It actually works. DAGISel doesn't deal with the chains properly
    in the previous pattern-based solution, so they never trigger.
  + The information can be used in other DAG combines, as well as the
    trivial "get rid of truncs". For example if the trunc is in a
    different basic block.

rdar://problem/16227836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 15:10:35 +00:00
Tim Northover
3eb87654a5 ARM: skip cmpxchg failure barrier if ordering is monotonic.
The terminal barrier of a cmpxchg expansion will be either Acquire or
SequentiallyConsistent. In either case it can be skipped if the
operation has Monotonic requirements on failure.

rdar://problem/15996804

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205535 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 13:06:54 +00:00
Tim Northover
badb137729 ARM: expand atomic ldrex/strex loops in IR
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).

Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:

1. an atomicrmw followed by using the *new* value can be more
   efficient. As an IR pass, simple CSE could handle this
   efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
   in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
   optimisation.

I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205525 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 11:44:58 +00:00
Silviu Baranga
3f11cd0d25 [ARM] When generating a vpaddl node the input lane type is not always the type of the
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205523 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 10:44:27 +00:00
Tim Northover
107283d7cb ARM64: add regression test for r205519.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205520 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 09:36:05 +00:00
Tim Northover
b642eb5dbc ARM64: don't generate __sincos_stret calls unless on MachO
This should fix PR19314.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205514 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 07:06:13 +00:00
Richard Trieu
1498ceee9e Fix test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 00:14:18 +00:00
Lang Hames
ed2154b816 [CodeGen] Teach the peephole optimizer to remember (and exploit) all folding
opportunities in the current basic block, rather than just the last one seen.

<rdar://problem/16478629>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205481 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 22:59:58 +00:00
Juergen Ributzka
75cea2c73e Add comments and test case for [DAG] Keep the opaque constant flag when performing unary constant folding operations (r204737).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205474 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 22:21:01 +00:00
Saleem Abdulrasool
6d4e5ab349 ARM: fixup tests to specify the target more explicitly
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default.  This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.

Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205465 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 21:22:03 +00:00
Saleem Abdulrasool
396e5e328c ARM: update subtarget information for Windows on ARM
Update the subtarget information for Windows on ARM.  This enables using the MC
layer to target Windows on ARM.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205459 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 20:32:05 +00:00
Tom Stellard
adb852ddf3 TargetLibraryInfo: Disable memcpy and memset on R600
There are no implementations of these for R600.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205455 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 19:53:29 +00:00
Jim Grosbach
acb6d9834a ARM: cortex-m0 doesn't support unaligned memory access.
Unlike other v6+ processors, cortex-m0 never supports unaligned accesses.
From the v6m ARM ARM:

"A3.2 Alignment support: ARMv6-M always generates a fault when an unaligned
access occurs."

rdar://16491560

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205452 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 19:28:13 +00:00
Kai Nacke
b96fc4a5ea [mips] Add more Octeon cnMips instructions
Adds the instructions ext/ext32/cins/cins32.
It also changes pop/dpop to accept the two operand version and
adds a simple pattern to generate baddu.
Tests for the two operand versions (including baddu/dmul/dpop/pop)
and the code generation pattern for baddu are included.

Reviewed by: Daniel.Sanders@imgtec.com


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 18:40:43 +00:00
Oliver Stannard
af48fc4136 ARM: Add support for segmented stacks
Patch by Alex Crichton, ILyoan, Luqman Aden and Svetoslav.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205430 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 16:10:33 +00:00
Tim Northover
6584d94610 ARM64: use GOT for weak symbols & PIC.
Weak symbols cannot use the small code model's usual ADRP sequences since the
instruction simply may not be able to encode a value of 0.

This redirects them to use the GOT, which hopefully linkers are able to cope
with even in the static relocation model.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205426 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 14:39:11 +00:00
Tim Northover
671c92d886 ARM64: fix lowering of fp128 fptosi/fptoui
We were creating libcall nodes that returned an MVT::f128, when these
particular operations actually return an int of some stripe.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205425 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 14:39:07 +00:00
Tim Northover
3844cadc9a ARM64: make sure first argument to INSERT_SUBVECTOR has right type.
Again, coalescing and other optimisations swiftly made the MachineInstrs
consistent again, but when compiled at -O0 a bad INSERT_SUBREGISTER was
produced.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205423 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 14:38:58 +00:00
Tim Northover
87e824120d ARM64: convert fp16 narrowing ISel to pseudo-instruction
The previous attempt was fine with optimisations, but was actually rather
cavalier with its types. When compiled at -O0, it produced invalid COPY
MachineInstrs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205422 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 14:38:54 +00:00
Job Noorman
4e7ec2b053 Mark FPB as a reserved register when needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205421 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 13:13:56 +00:00
Renato Golin
421397ac00 Remove duplicated DMB instructions
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.

Patch by Reinoud Elhorst.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205409 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 09:03:43 +00:00
Hal Finkel
4a6c0afc52 [PowerPC] Add some missing VSX bitcast patterns
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205352 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 19:24:27 +00:00
Reid Kleckner
f319a2c3b3 Support segmented stacks on Win64
Identical to Win32 method except the GS segment register is used for TLS
instead of FS and pvArbitrary is at TEB offset 0x28 instead of 0x14.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205342 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 18:34:21 +00:00
Matt Arsenault
a173c01556 Fix missing RUN line in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205341 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 18:34:13 +00:00
Matt Arsenault
eb8eac6be2 Make isSetCCEquivalent respect the TargetBooleanContents
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205336 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 18:13:26 +00:00
Tim Northover
c077472250 ARM: add cyclone CPU with ZeroCycleZeroing feature.
The Cyclone CPU is similar to swift for most LLVM purposes, but does have two
preferred instructions for zeroing a VFP register. This teaches LLVM about
them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205309 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 13:22:02 +00:00
Tim Northover
ed839120c4 ARM64: add intrinsic for pmull (p64 x p64 = p128) operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205302 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 12:22:37 +00:00
Tim Northover
acfb679618 ARM64: add patterns for more lane-wise ld1/st1 operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205294 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 10:37:09 +00:00
Tim Northover
233c567ea9 ARM64: fix bug in ld3r (1d) SelectionDAG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205293 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 10:37:03 +00:00
Alexey Volkov
1d75829ddc [x86] Do not convert to cmp32 for Atom arch by Sergey Okunev
Differential Revision: http://llvm-reviews.chandlerc.com/D2824


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205288 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 08:13:07 +00:00
David Blaikie
a07c1ab4e6 DebugInfo: Avoid creating unnecessary/empty line tables and remove the special case of '0' in DwarfCompileUnit::initStmtList by just always using a label difference
This moves one case of raw text checking down into the MCStreamer
interfaces in the form of a virtual function, even if we ultimately end
up consolidating on the one-or-many line tables issue one day, this is
nicer in the interim. This just generally streamlines a bunch of use
cases into a common code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205287 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-01 08:07:52 +00:00
Juergen Ributzka
d8c9577764 [Stackmaps] Update the stackmap format to use 64-bit relocations for the function address and properly align all entries.
This commit updates the stackmap format to version 1 to indicate the
reorganizaion of several fields. This was done in order to align stackmap
entries to their natural alignment and to minimize padding.

Fixes <rdar://problem/16005902>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205254 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 22:14:04 +00:00
Matt Arsenault
193c3e91b9 R600: Compute masked bits for min and max
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 19:35:33 +00:00
Matt Arsenault
828bfc7350 R600: Add BFE, BFI, and BFM intrinsics to help with writing tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205236 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 18:21:18 +00:00
Hal Finkel
e2b3751924 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205231 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 17:48:16 +00:00
Yaron Keren
b6f3f9ba45 Two updated tests for MinGW 32 and 64 exception handling code generation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205227 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 17:34:15 +00:00
Eli Bendersky
416a0e993f Fix for PR19099 - NVPTX produces invalid symbol names.
This is a more thorough fix for the issue than r203483. An IR pass will run
before NVPTX codegen to make sure there are no invalid symbol names that can't
be consumed by the ptxas assembler.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205212 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:56:26 +00:00
Tim Northover
93b3fcae28 ARM64: add extra patterns for scalar shifts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205209 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:46 +00:00
Tim Northover
812a174ba4 ARM64: add extra scalar neg pattern & tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:42 +00:00
Tim Northover
b35ffdff16 ARM64: add patterns for scalar sqdmlal & sqdmlsl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:38 +00:00
Tim Northover
9542846832 ARM64: add more patterns for commuted fmsub operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205206 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:34 +00:00
Tim Northover
4417e07e39 ARM64: shuffle patterns around for fmin/fmax & add tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205205 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:30 +00:00
Tim Northover
576c3f709f ARM64: add more scalar patterns for usqadd & suqadd.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205204 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:26 +00:00
Tim Northover
13277a78bc ARM64: add more scalar patterns for reciprocal ops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:22 +00:00
Tim Northover
8f93e159ed ARM64: add i64 scalar pattern for @llvm.arm64.abs
This will be used by the Clang front-end code for vabsd_s64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 15:46:17 +00:00
Tom Stellard
1f143aa3e9 R600/SI: Lower i64 SELECT by bitcasting to a vector type
This allows allows us to replace ISD::EXTRACT_ELEMENT, which is lowered
using shifts, with ISD::EXTRACT_VECTOR_ELT, which is a no-op.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205187 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 14:01:55 +00:00
Zoran Jovanovic
077aa54e4e Fixed issue with microMIPS JAL instruction.
Differential Revision: http://llvm-reviews.chandlerc.com/D3200


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205185 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 14:00:10 +00:00
Hal Finkel
65fafbb109 Look at shuffles of build_vectors in DAGCombiner::visitEXTRACT_VECTOR_ELT
When the loop vectorizer vectorizes code that uses the loop induction variable,
we often end up with IR like this:

  %b1 = insertelement <2 x i32> undef, i32 %v, i32 0
  %b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
  %i = add <2 x i32> %b2, <i32 2, i32 3>

If the add in this example is not legal (as is the case on PPC with VSX), it
will be scalarized, and we'll end up with a number of extract_vector_elt nodes
with the vector shuffle as the input operand, and that vector shuffle is fed by
one or more build_vector nodes. By the time that vector operations are
expanded, visitEXTRACT_VECTOR_ELT will not create new extract_vector_elt by
looking through the vector shuffle (to make sure that no illegal operations are
created), and so the extract_vector_elt -> vector shuffle -> build_vector is
never simplified to an operand of the build vector.

By looking at build_vectors through a shuffle we fix this particular situation,
preventing a vector from being built, only to be deconstructed again (for the
scalarized add) -- an expensive proposition when this all needs to be done via
the stack. We probably want a more comprehensive fix here where we look back
recursively through any shuffles to any build_vectors or scalar_to_vectors,
etc. but that can come later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205179 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 11:43:19 +00:00
Daniel Sanders
dd67fd636c [mips] Check emitted code for llvm.bswap.i32 on MIPS16/MIPS64 and llvm.bswap.i64 on MIPS16.
While reviewing r204163, I noticed that the MIPS16 test only checked for a .ent
directive and didn't actually check the code emitted. Fixed this and added a
check for llvm.bswap.i32 on MIPS64 at the same time.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 11:00:04 +00:00
Chandler Carruth
2530cd31f4 [ARM64] Fix materialization of an fp128 zero immediate. There currently
is not a pattern to lower this with clever instructions that zero the
register, so restrict the zero immediate legality special case to f64
and f32 (the only two sizes which fmov seems to directly support). Fixes
backend errors when building code such as libxml.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205161 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 00:02:10 +00:00
Hal Finkel
111bcf9b59 Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack
When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
vector and then load the piece we want. This is fine in isolation, but
generating a new store (and corresponding stack slot) for each extraction ends
up producing code of poor quality. When we scalarize a vector operation (using
SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
for each element in the vector. This used to generate one stored copy of the
vector for each element in the vector. Now we search the uses of the vector for
a suitable store before generating a new one, which results in much more
efficient scalarization code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205153 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-30 15:10:18 +00:00
Hal Finkel
ee8e48d4c9 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205146 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-30 13:22:59 +00:00
NAKAMURA Takumi
a60d50cb9f Suppress llvm/test/CodeGen/ARM64 for targeting pecoff. ARM64 is unaware of that.
FIXME: Could we support them?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205126 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-30 05:01:17 +00:00
Hal Finkel
7563821402 [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205106 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 16:04:40 +00:00
Tim Northover
7b837d8c75 ARM64: initial backend import
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.

Everything will be easier with the target in-tree though, hence this
commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 10:18:08 +00:00
Hal Finkel
44b2b9dc1a [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205075 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 05:29:01 +00:00
Akira Hatanaka
b3cb36026a [x86] Fix printing of register operands with q modifier.
Emit 32-bit register names instead of 64-bit register names if the target does
not have 64-bit general purpose registers.

<rdar://problem/14653996>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205067 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-28 23:28:07 +00:00
David Majnemer
f3e8b0575d X86: Disable IsLegalToCallImmediateAddr for Win32
WinCOFF cannot form PC relative relocations to support absolute
MCValues.  We should reenable this once WinCOFF supports emission of
IMAGE_REL_I386_REL32 relocations.

This fixes PR19272.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205058 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-28 21:40:47 +00:00
Hal Finkel
0e11c017a9 [PowerPC] Fix VSX permutation isel
Not only did I invert the indices when I wrote the code, but I also did the
same thing when I wrote the regression test. Oops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205046 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-28 20:24:55 +00:00
Hal Finkel
c9de9e60b9 [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205041 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-28 19:58:11 +00:00
Hal Finkel
e2ee98ab16 [PowerPC] Use a small cleanup pass to remove VSX self copies
As explained in r204976, because of how the allocation of VSX registers
interacts with the call-lowering code, we sometimes end up generating self VSX
copies. Specifically, things like this:
  %VSL2<def> = COPY %F2, %VSL2<imp-use,kill>
(where %F2 is really a sub-register of %VSL2, and so this copy is a nop)

This adds a small cleanup pass to remove these prior to post-RA scheduling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204980 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 23:12:31 +00:00
Hal Finkel
6bdc4ebedd [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204971 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 22:22:48 +00:00