Commit Graph

12998 Commits

Author SHA1 Message Date
Hans Wennborg
b499b73e30 Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230341 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-24 16:19:29 +00:00
Michael Kuperstein
2379e8a2ee [x32] x32 should use ebx as the base pointer.
This fixes the original issue in PR22655, but not the secondary one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230334 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-24 15:27:13 +00:00
Reed Kotler
aecbb87ee8 Beginning of alloca implementation for Mips fast-isel
Summary: Begin to add various address modes; including alloca.

Test Plan: Make sure there are no regressions in test-suite at O0/02 in mips32r1/r2

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: echristo, rfuhler, llvm-commits

Differential Revision: http://reviews.llvm.org/D6426

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230300 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-24 02:36:45 +00:00
David Majnemer
fbdee9f0c0 X86: Only use 'lea' in Win64 epilogues if a frame pointer exists
We can only use 'add' in epilogues, 'lea' is not permitted unless we've
established a frame pointer in the prologue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230286 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-24 00:11:32 +00:00
Sanjoy Das
8d16a81c33 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230280 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 23:22:58 +00:00
Sanjoy Das
69048edf8a Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230279 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 23:13:22 +00:00
Andrea Di Biagio
770e106ed6 [X86] Teach how to custom lower double-to-half conversions under fast-math.
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.

Added test CodeGen/X86/fastmath-float-half-conversion.ll

Differential Revision: http://reviews.llvm.org/D7832


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230276 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 22:59:02 +00:00
Sanjoy Das
7ebbc8de2f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230275 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 22:55:13 +00:00
David Majnemer
ad6622575c X86: Use a smaller 'mov' instruction for stack probe calls
Prologue emission, in some cases, requires calls to a stack probe helper
function.  The amount of stack to probe is passed as a register
argument in the Win64 ABI but the instruction sequence used is
pessimistic: it assumes that the number of bytes to probe is greater
than 4 GB.

Instead, select a more appropriate opcode depending on the number of
bytes we are going to probe.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230270 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 21:50:30 +00:00
David Majnemer
d71e4c6218 X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible
'mov' and 'lea' are equivalent when the displacement applied with 'lea'
is zero.  However, 'mov' should encode smaller.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230269 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 21:50:27 +00:00
Bruno Cardoso Lopes
a7db376a63 [X86][MMX] Fix test to reflect current codegen
This test failed in several buildbots, a bit unclear how that happen
since this was the previous behavior before r230248.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230258 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 20:57:46 +00:00
Andrew Kaylor
595050a793 Adding test for Windows EH frame variable remapping.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230250 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 20:04:51 +00:00
Andrew Kaylor
1d10231766 Remap frame variables for native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7770



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230249 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 20:01:56 +00:00
Bruno Cardoso Lopes
ee7b509aa3 Revert "[X86][MMX] Add MMX instructions to foldable tables"
This reverts commit r230226 since it breaks win buildbots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230248 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 19:53:37 +00:00
Daniel Sanders
b50b4e2d36 [mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled.
Summary:
-mno-odd-spreg prohibits the use of odd-numbered single-precision floating
point registers. However, vector insert/extract was still using them when
manipulating the subregisters of an MSA register. Fixed this by ensuring
that insertion/extraction is only performed on even-numbered vector
registers when -mno-odd-spreg is given.

Reviewers: vmedic, sstankovic

Reviewed By: sstankovic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7672

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230235 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 17:22:16 +00:00
Bruno Cardoso Lopes
01312dd0b4 [X86] Add specific mtriple in order to appease builbots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230229 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 15:33:40 +00:00
Bruno Cardoso Lopes
77d2363908 [X86][MMX] Add MMX instructions to foldable tables
Teach the peephole optimizer to work with MMX instructions by adding
entries into the foldable tables. This covers folding opportunities not
handled during isel.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230226 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes
c606f3a3cb [X86][MMX] Support folding loads in psll, psrl and psra intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230225 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 15:23:14 +00:00
Bruno Cardoso Lopes
6916c75fba [X86][MMX] Add tests for pslli, psrli and psrai intrinsics
Add tests to cover the RR form of the pslli, psrli and psrai intrinsics.
In the next commit, the loads are going to be folded and the
instructions use the RM form.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230224 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 15:23:06 +00:00
Elena Demikhovsky
fdafc8fd5e AVX-512: recommitted 229837 + bugfix + test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230223 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-23 15:12:31 +00:00
Simon Pilgrim
66c960350c [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230177 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-22 18:17:28 +00:00
Matt Arsenault
29f97a6c46 R600/SI: Use v_madmk_f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230149 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 21:29:10 +00:00
Matt Arsenault
c490f78e53 R600/SI: Try to use v_madak_f32
This is a code size optimization when the constant
only has one use.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230148 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 21:29:07 +00:00
Simon Pilgrim
b430a06e94 [X86][SSE] Added shuffle based integer zero extension tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 21:25:16 +00:00
David Majnemer
e95985d3a0 Win64: Stack alignment constraints aren't applied during SET_FPREG
Stack realignment occurs after the prolog, not during, for Win64.
Because of this, don't factor in the maximum stack alignment when
establishing a frame pointer.

This fixes PR22572.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230113 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 01:04:47 +00:00
Rafael Espindola
c093973970 Use short names for jumptable sections.
Also refactor code to remove some duplication.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230087 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 23:28:28 +00:00
Matt Arsenault
16fc5e9c0f R600/SI: Remove v_sub_f64 pseudo
The expansion code does the same thing. Since
the operands were not defined with the correct
types, this has the side effect of fixing operand
folding since the expanded pseudo would never use
SGPRs or inline immediates.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230072 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 22:10:45 +00:00
Matt Arsenault
bbb748eece R600: Use new fmad node.
This enables a few useful combines that used to only
use fma.

Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230071 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 22:10:41 +00:00
Jozef Kolek
b2e79a8e69 Reversed revision 229706. The reason is regression, which is caused by the
usage of instruction ADDU16 by CodeGen. For this instruction an improper
register is allocated, i.e. the register that is not from register set defined
for the instruction.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230053 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 20:26:52 +00:00
Andrea Di Biagio
3583d23018 [X86][FastIsel] Teach how to select float-half conversion intrinsics.
This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and
intrinsic 'convert_to_fp16'.
If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion,
and VCVTPH2PSrr for a half-float conversion.

Differential Revision: http://reviews.llvm.org/D7673


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230043 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 19:37:14 +00:00
Kit Barton
3e00ca983c I incorrectly marked the VORC instruction as isCommutable when I added it.
This fix removes the VORC instruction definition from the isCommutable block.

Phabricator review: http://reviews.llvm.org/D7772


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230020 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 15:54:58 +00:00
Hal Finkel
2c5f9584ba [PowerPC] Loop Data Prefetching for the BG/Q
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.

I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.

Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229966 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 05:08:21 +00:00
Chandler Carruth
efbbaefea5 [x86] Remove the old vector shuffle lowering code and its flag.
The new shuffle lowering has been the default for some time. I've
enabled the new legality testing by default with no really blocking
regressions. I've fuzz tested this very heavily (many millions of fuzz
test cases have passed at this point). And this cleans up a ton of code.
=]

Thanks again to the many folks that helped with this transition. There
was a lot of work by others that went into the new shuffle lowering to
make it really excellent.

In case you aren't using a diff algorithm that can handle this:
  X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229964 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 04:25:04 +00:00
Chandler Carruth
07ef8904ad [x86] Now that the new vector shuffle legality is enabled and everything
is going well, remove the flag and the code for the old legality tests.

This is the first step toward removing the entire old vector shuffle
lowering. *Much* more code to delete coming up next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229963 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 03:59:35 +00:00
Chandler Carruth
38749b8e07 [x86] Make the new vector shuffle legality test on by default, which
reflects the fact that the x86 backend can in fact lower any shuffle you
want it to with reasonably high code quality.

My recent work on the new vector shuffle has made this regress *very*
little. The diff in the test cases makes me very, very happy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229958 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 03:05:47 +00:00
Chandler Carruth
4fd100f9e9 [x86] Clean up a couple of test cases with the new update script. Split
one test case that is only partially tested in 32-bits into two test
cases so that the script doesn't generate massive spews of tests for the
cases we don't care about.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229955 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 02:44:13 +00:00
Chandler Carruth
90b8e791ac Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation
This doesn't pass 'ninja check-llvm' for me. Lots of tests, including
the ones updated, fail with crashes and other explosions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229952 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 02:15:36 +00:00
Reid Kleckner
49ab3a626a EH: Prune unreachable resume instructions during Dwarf EH preparation
Today a simple function that only catches exceptions and doesn't run
destructor cleanups ends up containing a dead call to _Unwind_Resume
(PR20300). We can't remove these dead resume instructions during normal
optimization because inlining might introduce additional landingpads
that do have cleanups to run. Instead we can do this during EH
preparation, which is guaranteed to run after inlining.

Fixes PR20300.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7744

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229944 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 01:00:19 +00:00
Eric Christopher
8c4bb575e1 Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics."
The instructions were being generated on architectures that don't support avx512.

This reverts commit r229837.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229942 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 00:45:28 +00:00
Ahmed Bougacha
5898fc70ec [ARM] Re-re-apply VLD1/VST1 base-update combine.
This re-applies r223862, r224198, r224203, and r224754, which were
reverted in r228129 because they exposed Clang misalignment problems
when self-hosting.

The combine caused the crashes because we turned ISD::LOAD/STORE nodes
to ARMISD::VLD1/VST1_UPD nodes.  When selecting addressing modes, we
were very lax for the former, and only emitted the alignment operand
(as in "[r1:128]") when it was larger than the standard alignment of
the memory type.

However, for ARMISD nodes, we just used the MMO alignment, no matter
what.  In our case, we turned ISD nodes to ARMISD nodes, and this
caused the alignment operands to start being emitted.

And that's how we exposed alignment problems that were ignored before
(but I believe would have been caught with SCTRL.A==1?).

To fix this, we can just mirror the hack done for ISD nodes:  only
take into account the MMO alignment when the access is overaligned.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

rdar://19717869, rdar://14062261.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229932 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 23:52:41 +00:00
Sanjay Patel
4dad5f2731 add X86 load folding tests for unary math ops
X86 load folding is fragile; eg, the tests here
don't work without AVX even though they should. This
is because we have a mix of tablegen patterns that have
been added over time, and we have a load folding table
used by the peephole optimizer that has to be kept in 
sync with the ever-changing ISA and tablegen defs.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229870 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 16:59:11 +00:00
Chandler Carruth
b7012af85f [x86] Delete still more piles of complex code now that we have a good
systematic lowering of v8i16.

This required a slight strategy shift to prefer unpack lowerings in more
places. While this isn't a cut-and-dry win in every case, it is in the
overwhelming majority. There are only a few places where the old
lowering would probably be a touch faster, and then only by a small
margin.

In some cases, this is yet another significant improvement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229859 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 15:21:57 +00:00
Chandler Carruth
c57e90422f [x86] Teach the unpack lowering how to lower with an initial unpack in
addition to lowering to trees rooted in an unpack.

This saves shuffles and or registers in many various ways, lets us
handle another class of v4i32 shuffles pre SSE4.1 without domain
crosses, etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229856 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 15:06:13 +00:00
Chandler Carruth
7f583a4201 [x86] Dramatically improve v8i16 shuffle lowering by not using its
terribly complex partial blend logic.

This code path was one of the more complex and bug prone when it first
went in and it hasn't faired much better. Ultimately, with the simpler
basis for unpack lowering and support bit-math blending, this is
completely obsolete. In the worst case without this we generate
different but equivalent instructions. However, in many cases we
generate much better code. This is especially true when blends or pshufb
is available.

This does expose one (minor) weakness of the unpack lowering that I'll
try to address.

In case you were wondering, this is actually a big part of what I've
been trying to pull off in the recent string of commits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229853 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 14:08:24 +00:00
Chandler Carruth
943b2ca2de [x86] Remove the final fallback in the v8i16 lowering that isn't really
needed, and significantly improve the SSSE3 path.

This makes the new strategy much more clear. If we can blend, we just go
with that. If we can't blend, we try to permute into an unpack so
that we handle cases where the unpack doing the blend also simplifies
the shuffle. If that fails and we've got SSSE3, we now call into
factored-out pshufb lowering code so that we leverage the fact that
pshufb can set up a blend for us while shuffling. This generates great
code, especially because we *know* we don't have a fast blend at this
point. Finally, we fall back on decomposing into permutes and blends
because we do at least have a bit-math-based blend if we need to use
that.

This pretty significantly improves some of the v8i16 code paths. We
never need to form pshufb for the single-input shuffles because we have
effective target-specific combines to form it there, but we were missing
its effectiveness in the blends.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229851 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 13:56:49 +00:00
Chandler Carruth
c3d7858505 [x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing
them into permutes and a blend with the generic decomposition logic.

This works really well in almost every case and lets the code only
manage the expansion of a single input into two v8i16 vectors to perform
the actual shuffle. The blend-based merging is often much nicer than the
pack based merging that this replaces. The only place where it isn't we
end up blending between two packs when we could do a single pack. To
handle that case, just teach the v2i64 lowering to handle these blends
by digging out the operands.

With this we're down to only really random permutations that cause an
explosion of instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 13:15:12 +00:00
Chandler Carruth
3d4542ce3d [x86] Remove the insanely over-aggressive unpack lowering strategy for
v16i8 shuffles, and replace it with new facilities.

This uses precise patterns to match exact unpacks, and the new
generalized unpack lowering only when we detect a case where we will
have to shuffle both inputs anyways and they terminate in exactly
a blend.

This fixes all of the blend horrors that I uncovered by always lowering
blends through the vector shuffle lowering. It also removes *sooooo*
much of the crazy instruction sequences required for v16i8 lowering
previously. Much cleaner now.

The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when
it would be marginally nicer to use pshufb+pshufb+por. However, the
difference there is *tiny*. In many cases its a win because we re-use
the pshufb mask. In others, we get to avoid the pshufb entirely. I've
left a FIXME, but I'm dubious we can really do better than this. I'm
actually pretty happy with this lowering now.

For SSE2 this exposes some horrors that were really already there. Those
will have to fixed by changing a different path through the v16i8
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229846 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 12:10:37 +00:00
Jozef Kolek
bb539d3b4c [mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator
Differential Revision: http://reviews.llvm.org/D7611


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229845 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 11:51:32 +00:00
Elena Demikhovsky
675d06d1d0 AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229837 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 10:48:04 +00:00
Chandler Carruth
ac2b1a1bb3 [x86] Add support for bit-wise blending and use it in the v8 and v16
lowering paths. I'm going to be leveraging this to simplify a lot of the
overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes.

Sadly, this isn't profitable on v4i32 and v2i64. There, the float and
double blending instructions for pre-SSE4.1 are actually pretty good,
and we can't beat them with bit math. And once SSE4.1 comes around we
have direct blending support and this ceases to be relevant.

Also, some of the test cases look odd because the domain fixer
canonicalizes these to floating point domain. That's OK, it'll use the
integer domain when it matters and some day I may be able to update
enough of LLVM to canonicalize the other way.

This restores almost all of the regressions from teaching x86's vselect
lowering to always use vector shuffle lowering for blends. The remaining
problems are because the v16 lowering path is still doing crazy things.
I'll be re-arranging that strategy in more detail in subsequent commits
to finish recovering the performance here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229836 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 10:46:52 +00:00
Chandler Carruth
a8fb39af83 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 10:36:19 +00:00
Peter Collingbourne
7d3b145da4 llvm-mc: Use Target::createNullStreamer to fix crashes on target-specific asm directives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229798 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 00:45:04 +00:00
Chandler Carruth
00c954ffc4 [x86] Merge checks for a recently added test case that is the same on
all SSE variants and AVX variants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229770 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 23:20:49 +00:00
Reid Kleckner
f89d9b1c75 Add an IR-to-IR test for dwarf EH preparation using opt
This tests the simple resume instruction elimination logic that we have
before making some changes to it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229768 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 23:17:41 +00:00
Reid Kleckner
ae09ebc540 dos2unix the WinEH file and tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 19:52:46 +00:00
Andrew Kaylor
a4976167c4 Adding implementation to outline C++ catch handlers for native Windows 64 exception handling.
Differential Revision: http://reviews.llvm.org/D7363



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229715 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 18:31:51 +00:00
Jozef Kolek
2032d755e7 [mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator
Differential Revision: http://reviews.llvm.org/D7609


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229706 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 17:33:56 +00:00
Daniel Sanders
7eedd07d5e [mips] Add backend support for Mips32r[35] and Mips64r[35].
Summary:
These ISA's didn't add any instructions so they are almost identical to
Mips32r2 and Mips64r2. Even the ELF e_flags are the same, However the ISA
revision in .MIPS.abiflags is 3 or 5 respectively instead of 2.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: tomatabacu, llvm-commits, atanasyan

Differential Revision: http://reviews.llvm.org/D7381


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229695 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 16:24:50 +00:00
Kit Barton
31840a62af This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions.
Phabricator review: http://reviews.llvm.org/D7616

Commiting on Nemanja's behalf.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229694 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 16:21:46 +00:00
Vasileios Kalintiris
0563ea452c [mips] Avoid redundant sign extension of the result of binary bitwise instructions.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7581

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229675 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 14:57:05 +00:00
Bradley Smith
4fe6d075d5 [ARM] Add missing M/R class CPUs
Add some of the missing M and R class Cortex CPUs, namely:

Cortex-M0+ (called Cortex-M0plus for GCC compatibility)
Cortex-M1
SC000
SC300
Cortex-R5


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229660 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 10:33:30 +00:00
Michael Kuperstein
9571ea6620 Fixes two issue in SimplifyDemandedBits of sext_in_reg:
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.

Patch by Gil Rapaport <gil.rapaport@intel.com>

Differential Revision: http://reviews.llvm.org/D6949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229659 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 09:43:40 +00:00
Chandler Carruth
4e8a4638e9 [x86] Refactor the bit shift code the same as I just did the byte shift
code.

While this didn't have the miscompile (it used MatchLeft consistently)
it missed some cases where it could use right shifts. I've added a test
case Craig Topper came up with to exercise the right shift matching.

This code is really identical between the two. I'm going to merge them
next so that we don't keep two copies of all of this logic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229655 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 09:19:58 +00:00
Ulrich Weigand
bebd59c74b [SystemZ] Support all TLS access models - CodeGen part
The current SystemZ back-end only supports the local-exec TLS access model.
This patch adds all required CodeGen support for the other TLS models, which
means in particular:

- Expand initial-exec TLS accesses by loading TLS offsets from the GOT
  using @indntpoff relocations.

- Expand general-dynamic and local-dynamic accesses by generating the
  appropriate calls to __tls_get_offset.  Note that this routine has
  a non-standard ABI and requires loading the GOT pointer into %r12,
  so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node.

- Add a new platform-specific optimization pass to remove redundant
  __tls_get_offset calls in the local-dynamic model (modeled after
  the corresponding X86 pass).

- Add test cases verifying all access models and optimizations.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229654 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 09:13:27 +00:00
Daniel Jasper
66bd6852bb Remove experimental options to control machine block placement.
This reverts r226034. Benchmarking with those flags has not revealed
anything interesting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229648 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 08:18:07 +00:00
Elena Demikhovsky
87483ed180 AVX-512: Added support for FP instructions with embedded rounding mode.
By Asaf Badouh <asaf.badouh@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229645 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 07:59:20 +00:00
Craig Topper
052d754ccb [X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229643 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 07:45:43 +00:00
Chandler Carruth
c9520b48ae [x86] Rewrite the byte shift detection to not use boolean variables to
track state.

I didn't like this in the code review because the pattern tends to be
error prone, but I didn't see a clear way to rewrite it. Turns out that
there were bugs here, I found them when fuzz testing our shuffle
lowering for correctness on x86.

The core of the problem is that we need to consistently test all our
preconditions for the same directionality of shift and the same input
vector. Instead, formulate this as two predicates (one doesn't depend on
the input in any way), pass things like the directionality and input
vector as inputs, and loop over the alternatives.

This fixes a pattern of very rare miscompiles coming out of this code.
Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz
testing. The new code is over half a million test runs with no failures
yet. I've also fuzzed every other function in the lowering code with
over 3.5 million test cases and not discovered any other miscompiles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229642 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 07:13:48 +00:00
Craig Topper
ed42dcef75 [X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229640 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 06:24:44 +00:00
Matt Arsenault
2422768a8a R600/SI: Add missing offset operand to buffer bothen
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229605 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 02:04:38 +00:00
Matt Arsenault
fe524d5902 R600/SI: Add missing soffset operand to global atomics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229604 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 02:04:35 +00:00
Andrea Di Biagio
b3ff6a88b6 [X86][FastIsel] Teach how to select scalar integer to float/double conversions.
This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to 
float conversion, and how to select a (V)CVTSI2SDrr for an integer to double
conversion.

Added test 'fast-isel-int-float-conversion.ll'.

Differential Revision: http://reviews.llvm.org/D7698


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229589 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 23:40:58 +00:00
Rafael Espindola
3b75cfe179 Add r228939 back with a fix.
The problem in the original patch was not switching back to .text after printing
an eh table.

Original message:

On ELF, put PIC jump tables in a non executable section.

Fixes PR22558.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229586 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 23:34:51 +00:00
Rafael Espindola
54b2025420 Add a test showing the problem in r228939.
If an EH table is printed in between the function and the jump table we would
fail to switch back to the text section to print the jump table.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229580 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 23:21:46 +00:00
Simon Pilgrim
cbc2ca5ec9 [X86][SSE] Generalised unpckl/unpckh shuffle matching
Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves.

Differential Revision: http://reviews.llvm.org/D7564

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229571 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 22:24:32 +00:00
Sanjay Patel
ef23f042ce use a triple instead of a cpu; less builbot sadness
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229563 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 21:59:54 +00:00
Rafael Espindola
51beb495fc Add testcases I missed in r229541.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229542 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 20:50:39 +00:00
Sanjay Patel
166769cfb4 make basic block label matching more flexible for less sad buildbots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229535 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 20:29:31 +00:00
Tom Stellard
ec5b9ab433 R600/SI: Fix asam errors in SIFoldOperands
We were trying to fold into implicit uses, which led to out of bounds
access of the MCInstrDesc::OpInfo arrray.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229533 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 20:11:54 +00:00
Sanjay Patel
544843cee1 prevent folding a scalar FP load into a packed logical FP instruction (PR22371)
Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. 
That's what the hardware packed logical FP instructions define: 128-bit memory operands.
There are no scalar versions of these instructions...because this is x86.

Generating the wrong code (folding a scalar load into a 128-bit load) is still possible
using the peephole optimization pass and the load folding tables. We won't completely
solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other
places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl()
to make sure it isn't changing the size of the load.

Differential Revision: http://reviews.llvm.org/D7474


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229531 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 20:08:21 +00:00
Sanjay Patel
1115b2c27e Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229511 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 16:54:32 +00:00
Tom Stellard
7a7153e5ee R600/SI: Extend private extload pattern to include zext loads
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229507 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 16:36:00 +00:00
Andrea Di Biagio
16389eee3f [X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll.
Flag -fast-isel-abort is required in order to verify that X86FastISel
never fails to select FPExt (float-to-double) and FPTrunc (double-to-float).
No Functional change intended.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229489 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 12:25:49 +00:00
Elena Demikhovsky
199f58a198 AVX-512: changes in intel_ocl_bi calling conventions
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229482 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 09:20:12 +00:00
Michael Kuperstein
e275542046 [X86] Combine vector anyext + and into a vector zext
Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements.
Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND.
This combine only covers a subset of the cases, but it's a start.

Differential Revision: http://reviews.llvm.org/D7666

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229480 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 08:22:51 +00:00
Eric Christopher
fb031eee53 Move ABI handling and 64-bitness to the PowerPC target machine.
This required changing how the computation of the ABI is handled
and how some of the checks for ABI/target are done.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229471 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 06:45:15 +00:00
Chandler Carruth
1e357351be [x86] Teach the unpack lowering to try wider element unpacks.
This allows it to match still more places where previously we would have
to fall back on floating point shuffles or other more complex lowering
strategies.

I'm hoping to replace some of the hand-rolled unpack matching with this
routine is it gets more and more clever.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229463 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 02:12:24 +00:00
Hal Finkel
4ba3a67430 Specify arch in test/CodeGen/X86/float-conv-elim.ll
This test was failing on non-x86 hosts because it specified a cpu of x86_64,
but not an architecture. x86_64 is obviously not a valid cpu on all
architectures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 00:11:19 +00:00
Hal Finkel
ba51ae6864 [PowerPC] Support non-direct-sub/superclass VSX copies
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).

Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229457 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 23:46:30 +00:00
Cameron McInally
cdddfe0cb3 [AVX512] Make 512b vector floating point rounds legal on AVX512.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229445 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 22:15:42 +00:00
Simon Pilgrim
0638f4e115 [X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain
Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches.

Differential Revision: http://reviews.llvm.org/D7600

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229439 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 21:50:56 +00:00
Mehdi Amini
2deb1d0b54 SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229438 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 21:47:58 +00:00
Craig Topper
4031c08c87 [X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229431 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 20:52:07 +00:00
Andrew Trick
4f7d60c1ea AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229413 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 18:10:47 +00:00
Chandler Carruth
cbe6ecfc81 [x86] Add a generic unpack-targeted lowering technique. This can be used
to generically lower blends and is particularly nice because it is
available frome SSE2 onward. This removes a lot of the remaining domain
crossing blends in SSE2 code.

I'm hoping to replace some of the "interleaved" lowering hacks with
something closer to this which should be more principled. First, this
needs to learn how to detect and use other interleavings besides that of
the natural type provided. That will be a follow-up patch though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229378 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 12:28:18 +00:00
Chandler Carruth
e62fbca6b7 [x86] Switch this test to use checks generated by my update script. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229377 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 12:23:22 +00:00
Chandler Carruth
29679ccc12 [x86] Add initial basic support for forming blends of v16i8 vectors.
This blend instruction is ... really lame. The register usage is insane.
As a consequence this is probably only *barely* better than 2 pshufbs
followed by a por, and that mostly because it only has to read from
a single memory location.

However, this doesn't fix as much as I kind of expected, so more to go.
Pretty sure that the ordering and delegation of v16i8 is just really,
really bad.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229373 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 10:58:23 +00:00
Chandler Carruth
ab497238cb [x86] Add some more test cases for i8 vector blends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229372 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 10:51:49 +00:00
Craig Topper
74b9ad3485 [X86] Add support for lowering shuffles to 256-bit PALIGNR instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229359 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 06:29:06 +00:00
Craig Topper
abdf58f7f9 [X86] Remove some hard tab characters from tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229358 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 06:29:02 +00:00
Chandler Carruth
454c3997b4 [x86] Teach the 128-bit vector shuffle lowering routines to take
advantage of the existence of a reasonable blend instruction.

The 256-bit vector shuffle lowering has leveraged the general technique
of decomposed shuffles and blends for quite some time, but this never
made it back into the 128-bit code, and there are a large number of
patterns where this is substantially better. For example, this removes
almost all domain crossing in vector shuffles that involve some blend
and some permutation with SSE4.1 and later. See the massive reduction
in 'shufps' for integer test cases in this commit.

This isn't perfect yet for a few reasons:

1) The v8i16 shuffle lowering continues to plague me. We don't always
   form an unpack-based blend when that would be better. But the wins
   pretty drastically outstrip the losses here.
2) The v16i8 shuffle lowering is just a disaster here. I never went and
   implemented blend support here for some terrible reason. I'll do
   that next probably. I've not updated it for now.

More variations on this technique are coming as well -- we don't
shuffle-into-unpack or shuffle-into-palignr, both of which would also be
profitable.

Note that some test cases grow significantly in the number of
instructions, but I expect to actually be faster. We use
pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are
very likely to pipeline well (two ports on most modern intel chips) and
the blend is a *very* fast instruction. The domain switch penalty will
essentially always be more than a blend instruction, which is the only
increase in tree height.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229350 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 01:52:02 +00:00
Chandler Carruth
aa35e012a3 [x86] Clean up a few test cases with the update script. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229349 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 01:39:50 +00:00
Simon Pilgrim
f4e056ac2a Added (still inefficient) shuffle test case for PR21138
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229321 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 18:21:39 +00:00
Simon Pilgrim
afcb895fe1 Added some test cases of missed opportunities to use unpckl/unpckh shuffles
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229313 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 15:07:45 +00:00
Simon Pilgrim
28f299b62d [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.

It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions.

Differential Revision: http://reviews.llvm.org/D7596

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229311 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 13:19:52 +00:00
Chandler Carruth
3e93916175 [x86] Add the test case from PR22412, we now get this right even with
the new vector shuffle legality.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229310 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 12:45:05 +00:00
Chandler Carruth
fbde8bffba [x86] Teach the decomposed shuffle/blend lowering to use an early blend
when that will allow it to lower with a single permute instead of
multiple permutes.

It tries to detect when it will only have to do a single permute in
either case to maximize folding of loads and such.

This cuts a *lot* of the avx2 shuffle permute counts in half. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229309 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 12:42:15 +00:00
Chandler Carruth
72753f87f2 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229308 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 12:18:12 +00:00
Chandler Carruth
52f1b6dbed [x86] Stop shuffling zero vectors. =]
I was somewhat surprised this pattern really came up, but it does. It
seems better to just directly handle it than try to special case every
place where we end up forming a shuffle that devolves to a shuffle of
a zero vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 10:34:52 +00:00
Chandler Carruth
46d3e580ed [x86] When splitting 256-bit vectors into 128-bit vectors, don't extract
subvectors from buildvectors. That doesn't really make any sense and it
breaks all of the down-stream matching of buildvectors to cleverly lower
shuffles.

With this, we now get the shift-based lowering of 256-bit vector
shuffles with AVX1 when we split them into 128-bit vectors. We also do
much better on the zero-extension patterns, although there remains quite
a bit of room for improvement here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229299 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 10:12:02 +00:00
Chandler Carruth
4d2dfa703e [x86] Update some tests with the latest version of my script and llc.
This mostly adds some shuffle decode comments and cleans up indentation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229296 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 09:26:15 +00:00
Chandler Carruth
62ba2b29d8 [x86] Add a slight variation on some of the other generic shuffle
lowerings -- one which decomposes into an initial blend followed by
a permute.

Particularly on newer chips, blends are handled independently of
shuffles and so this is much less bottlenecked on the single port that
floating point shuffles are executed with on Intel.

I'll be adding this lowering to a bunch of other code paths in
subsequent commits to handle still more places where we can effectively
leverage blends when they're available in the ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229292 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 08:26:30 +00:00
Chandler Carruth
8e8b9bd42f [x86] Add a test case for PR22390 which was a dup of PR22377 and fixed
by r229285. This is a nice different test case though, so I'd like to
have the extra testing of these kinds of patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229286 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 07:05:50 +00:00
Chandler Carruth
5ee516549f [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229285 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 07:01:10 +00:00
Chandler Carruth
9bb943b185 [x86] Switch a collection of tests explicitly to the new vector shuffle
legality test (essentially, everything is legal).

I'm planning to make this the default shortly, but I'd like to fix
a collection of the bugs it exposes first, and this will let me easily
test them. It also showcases both the improvements and a few of the
regressions triggered by the change. The biggest improvements by far are
the significantly reduced shuffling and domain crossing in the combining
test case. The biggest regressions are missing some clever blending
patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229284 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 06:37:21 +00:00
Chandler Carruth
0294b517ac [x86] Remove the now-default-on flag for the new vector shuffle lowering
strategy from a bunch of tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229283 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 06:20:51 +00:00
Chandler Carruth
da0198de41 [x86] Teach my test updating script about another quirk of the printed
asm and port the mmx vector shuffle test to it.

Not thrilled with how it handles the stack manipulation logic, but I'm
much less bothered by that than I am by updating the test manually. =]
If anyone wants to teach the test checks management script about stack
adjustment patterns, that'd be cool too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229268 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 00:08:01 +00:00
Simon Pilgrim
6d5ee8a8b5 [X86][XOP] Enable commutation for XOP instructions
Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes.

This patch also sets the SSE domains of all the XOP instructions.

Differential Revision: http://reviews.llvm.org/D7646

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229267 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 22:40:46 +00:00
Andrea Di Biagio
47cd120a18 [optnone] Skip pass Constant Hoisting on optnone functions.
Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that
pass Constant Hoisting is not run on optnone functions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229258 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 15:11:48 +00:00
Simon Pilgrim
fba633f4b6 [X86] Ensure integer domain on scalar load/store stack folding tests. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229257 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 14:10:44 +00:00
Matt Arsenault
8a44761afe R600/SI: Implement correct f64 fdiv
This version passes the OpenCL conformance test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229239 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 04:30:08 +00:00
Matt Arsenault
fd31a769ce R600/SI: Use complex operand folding for div_scale
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229238 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 04:24:28 +00:00
Matt Arsenault
febc8e20a5 R600/SI: Add tests for div_fmas with inline immediate operands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229237 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 04:22:02 +00:00
Matt Arsenault
9295d69bea R600/SI: Fix implicit vcc operand to v_div_fmas_*
This should allow finally fixing the f64 fdiv implementation.

Test is disabled for VI since there seems to be a problem with one
of the buffer load instructions on it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229236 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 04:22:00 +00:00
Matthias Braun
821ec14add Revert "On ELF, put PIC jump tables in a non executable section."
This reverts commit r228939.

The commit broke something in the output of exception handling tables on
darwin x86-64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229203 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 01:16:54 +00:00
Sanjay Patel
fa1b3ba1f0 [SSE/AVX] Use multiclasses to reduce the mass of scalar math patterns; NFCI
This takes the preposterous number of patterns in this section
that were last added to in r219033 down to just plain obnoxious.

With a little more work, we might get this down to just comical.

I've added more test cases to the existing file that checks these
patterns, but it seems that some of these patterns simply don't
exist with today's shuffle lowering.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229158 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 21:52:42 +00:00
Reid Kleckner
f5c2138cf2 Fix R600 test deadlock on Windows by giving FileCheck an argument
llc would hang trying to write output to a full pipe that FileCheck
wasn't reading. FileCheck wasn't reading from stdin because it needs a
file as a positional argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229157 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 21:27:28 +00:00
Matt Arsenault
1751616522 R600/SI: Allow f64 inline immediates in i64 operands
This requires considering the size of the operand when
checking immediate legality.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229135 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 19:05:03 +00:00
Matt Arsenault
d14e5ec25d R600/SI: Minor test scheduling fixes
This prevents these from failing in a later commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229134 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 19:04:56 +00:00
Jozef Kolek
85e08ed8a4 [mips][microMIPS] Delay slot filler: Replace the microMIPS JR with the JRC
This patch adds functionality in MIPS delay slot filler such as if delay slot
filler have to put NOP instruction into the delay slot of microMIPS JR
instruction, then instead of emitting NOP this instruction is replaced by
compact jump instruction JRC.

Differential Revision: http://reviews.llvm.org/D7522


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229128 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 17:51:27 +00:00
Andrea Di Biagio
59d115311a [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229105 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 14:15:48 +00:00
James Molloy
c76bff0715 [SimplifyCFG] Be more aggressive
Up the phi node folding threshold from a cheap "1" to a meagre "2".

Update tests for extra added selects and slight code churn.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229099 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 10:48:30 +00:00
Chandler Carruth
00ae03a747 Revert a series of commits starting at r228886 which is triggering some
regressions for LLDB on Linux. Rafael indicated on lldb-dev that we
should just go ahead and revert these but that he wasn't at a computer.
The patches backed out are as follows:

r228980: Add support for having multiple sections with the name and ...
r228889: Invert the section relocation map.
r228888: Use the existing SymbolTableIndex intsead of doing a lookup.
r228886: Create the Section -> Rel Section map when it is first needed.

These patches look pretty nice to me, so hoping its not too hard to get
them re-instated. =D

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229080 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 07:52:39 +00:00
Craig Topper
f3455f13a2 [X86] Add support for parsing and printing the mnemonic aliases for the XOP VPCOM instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229078 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 07:42:25 +00:00
Craig Topper
d4f1c60bf5 Fix probable typo in test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229070 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 06:07:27 +00:00
Craig Topper
db9343fb40 [X86] Remove int_x86_sse2_psll_dq_bs and int_x86_sse2_psrl_dq_bs intrinsics. The builtins aren't used by clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229069 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 06:07:24 +00:00
Rafael Espindola
2fa06b171b Add support for having multiple sections with the same name and comdat.
Using this in combination with -ffunction-sections allows LLVM to output a .o
file with mulitple sections named .text. This saves space by avoiding long
unique names of the form .text.<C++ mangled name>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228980 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 23:29:51 +00:00
David Majnemer
73a92d5136 X86: Don't crash if we can't decode the pshufb mask
Constant pool entries are uniqued by their contents regardless of their
type.  This means that a pshufb can have a shuffle mask which isn't a
simple array of bytes.

The code path which attempts to decode the mask didn't check for
failure, causing PR22559.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228979 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 23:26:26 +00:00
Simon Pilgrim
6911b3bc37 Ensure integer domain on general shuffle stack folding tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228972 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 22:47:45 +00:00
David Blaikie
c3dc5bac73 Remove typedef of a pointer type used in a gep to simplify migration of geps to a typeless-pointer future.
I'd modify my migration tool to account for this, but this is the only
instance of a typedef'd pointer type to a gep I found in the whole test
suite, so it didn't seem worthwhile.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228970 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 22:45:25 +00:00
Hal Finkel
8452d01224 [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228969 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 22:43:52 +00:00
Reed Kotler
36068aae42 Add bulk of returning of values to Mips fast-isel
Summary:
Implement the bulk of returning values in Mips fast-isel



Test Plan:
reatabi.ll

Passes test-suite at -O0,-O2 and with mips32r2 and mips32r1.





Reviewers: dsanders

Reviewed By: dsanders

Subscribers: llvm-commits, aemerson, rfuhler

Differential Revision: http://reviews.llvm.org/D5920

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228958 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 21:05:12 +00:00
Rafael Espindola
c3c5d7c2d6 On ELF, put PIC jump tables in a non executable section.
Fixes PR22558.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228939 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 17:46:49 +00:00
Rafael Espindola
8eeedf74d3 Put each jump table in an independent section if the function is too.
This allows the linker to GC both, fixing pr22557.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228937 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 17:16:46 +00:00
Michael Kuperstein
fb107d8bf0 [X86] Call frame optimization - allow stack-relative movs to be folded into a push
Since we track esp precisely, there's no reason not to allow this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228924 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 14:17:35 +00:00
Elena Demikhovsky
f41b8e3e49 AVX-512: Fixed the "test" operation for i1 type
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.

There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228916 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 08:40:34 +00:00
Michael Kuperstein
fd98d3be55 [X86] A heuristic to estimate the size impact for converting stack-relative parameter movs to pushes
This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size.
We go over all calls in the MachineFunction and compute:
a) For each callsite that can not use pushes, the penalty of not having a reserved call frame.
b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack).

Differential Revision: http://reviews.llvm.org/D7561

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228915 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 08:36:35 +00:00
Ahmed Bougacha
9e9bde9b54 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228911 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 06:15:29 +00:00
Hal Finkel
091acf253b [PowerPC] Mark jumps as expensive (using using CR bits)
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".

P7 test-suite speedups (no regressions):
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.626647% +/- 0.323583%
MultiSource/Benchmarks/Olden/power/power
	-18.2821% +/- 8.06481%

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228895 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 01:02:52 +00:00
Tom Stellard
293dfe59a5 R600/SI: Disable subreg liveness
This is temporary while we try to fix a crash in the register coalescer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228861 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 18:24:53 +00:00
Simon Pilgrim
d606d6bfe1 [X86][SSE] Added dual vector truncation tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228857 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 18:14:35 +00:00
Tom Stellard
946f5e91ef R600/SI: Fix -march in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228848 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 17:11:48 +00:00
Sanjay Patel
cb2ff33a8a fixed to test features, not CPUs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228836 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 15:00:41 +00:00
Sanjay Patel
00fb386b23 fixed to test features, not CPUs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228835 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 15:00:19 +00:00
Sanjay Patel
57fb1850e2 fixed to test features, not CPUs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228834 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 14:58:25 +00:00
Marek Olsak
c0021e43ea R600/SI: Enable a lot of existing tests for VI (squashed commits)
This is a union of these commits:

* R600/SI: Enable more tests for VI which need no changes

* R600/SI: Enable V_BCNT tests for VI
    Differences:
    - v_bcnt_..._e32 -> _e64
    - s_load_dword* inline offset is in bytes instead of dwords

* R600/SI: Enable all tests for VI which use S_LOAD_DWORD
    The inline offset is changed from dwords to bytes.

* R600/SI: Enable LDS tests for VI
    Differences:
    - the s_load_dword inline offset changed from dwords to bytes
    - the tests checked very little on CI, so they have been fixed to check all
      instructions that "SI" checked

* R600/SI: Enable lshr tests for VI

* R600/SI: Fix divrem64 tests
    - "v_lshl_64" was missing "b" before "64"
    - added VI-NOT checks

* R600/SI: Enable the SI.tid test for VI

* R600/SI: Enable the frem test for VI
    Also, the frem_f64 checking is added for CI-VI.

* R600/SI: Add VI tests for rsq.clamped

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228830 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 14:26:46 +00:00
James Molloy
1777ae7a05 Make buildbots better.
This testcase change was associated incorrectly to a followup commit in my git tree, not the base commit. Sorry!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228827 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 12:24:09 +00:00
James Molloy
4de471dd0a [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228826 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 12:15:41 +00:00
Tom Stellard
9f5d593c1f R600/SI: Store immediate offsets > 12-bits in soffset
This will save us from having to extend these offsets to 64-bits
and storing them in a pair of vgprs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228776 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-11 00:34:35 +00:00
Petar Jovanovic
56e9c5f92b Fix makeLibCall argument (signed) in SoftenFloatRes_XINT_TO_FP function
The isSigned argument of makeLibCall function was hard-coded to false
(unsigned). This caused zero extension on MIPS64 soft float.
As the result SingleSource/Benchmarks/Stanford/FloatMM test and
SingleSource/UnitTests/2005-07-17-INT-To-FP test failed. 
The solution was to use the proper argument.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7292


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228765 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 23:30:14 +00:00
David Majnemer
f2138c2df8 X86: @llvm.frameaddress should defer to SelectionDAG for Win CFI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228754 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 22:00:34 +00:00
David Majnemer
420f72a301 X86: Make @llvm.frameaddress work correctly with Windows unwind codes
Simply loading or storing the frame pointer is not sufficient for
Windows targets.  Instead, create a synthetic frame object that we will
lower later.  References to this synthetic object will be replaced with
the correct reference to the frame address.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228748 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 21:22:05 +00:00
Daniel Jasper
363b645818 Fix overly prescriptive test that broken on Mac after r228725.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228742 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 20:49:05 +00:00
Bill Schmidt
49b3971b70 [PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers.  The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction.  The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.

Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc.  It has also been verified with the
--no-tls-optimize flag workaround removed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228725 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 19:09:05 +00:00
David Majnemer
3163865f01 X86: Emit Win64 SaveXMM opcodes at the right offset in the right order
Walk the instructions marked FrameSetup and consider any stores of XMM
registers to the stack as needing a SaveXMM opcode.

This fixes PR22521.

Differential Revision: http://reviews.llvm.org/D7527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228724 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 19:01:47 +00:00
Paul Robinson
a932cb6d09 Explicitly initialize a flag in a default constructor.
Works around a Visual C++ issue.

Patch by Douglas Yung!



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228699 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 15:30:02 +00:00
Bradley Smith
cec93b661d [ARM] Add armv6s[-]m as an alias to armv6[-]m
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228696 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 15:15:08 +00:00
Simon Pilgrim
c99d58d6c1 [X86][AVX2] Missing AVX2 memory folding instructions
Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and verpmq patterns)

Differential Revision: http://reviews.llvm.org/D7492

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228688 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 13:22:57 +00:00
Simon Pilgrim
8bcc093da5 [X86][XOP] Added XOP memory folding patterns + tests
This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc.

Note: Many of the XOP instructions have multiple table entries as it can fold loads from different sources.

Differential Revision: http://reviews.llvm.org/D7484

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228685 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 12:57:17 +00:00
Andrea Di Biagio
bd1729e5d4 [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX.
This patch teaches X86FastISel how to select AVX instructions for scalar
float/double convert operations.

Before this patch, X86FastISel always selected legacy SSE instructions
for FPExt (from float to double) and FPTrunc (from double to float).

For example:
\code
  define double @foo(float %f) {
    %conv = fpext float %f to double
    ret double %conv
  }
\end code

Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is
legacy SSE:
  cvtss2sd %xmm0, %xmm0

With this patch, X86FastIsel selects a VCVTSS2SDrr instead:
  vcvtss2sd %xmm0, %xmm0, %xmm0

Added test fast-isel-fptrunc-fpext.ll to check both the register-register and
the register-memory float/double conversion variants.

Differential Revision: http://reviews.llvm.org/D7438


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228682 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 12:04:41 +00:00
Nick Lewycky
3c5236ae68 Remove non-test files that appear to have been accidentally committed in r228641.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228657 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 02:39:17 +00:00
Chandler Carruth
1c7c2e8650 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228656 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 02:25:56 +00:00
David Majnemer
69114ee016 X86: Emit an ABI compliant prologue and epilogue for Win64
Win64 has specific contraints on what valid prologues and epilogues look
like.  This constraint is born from the flexibility and descriptiveness
of Win64's unwind opcodes.

Prologues previously emitted by LLVM could not be represented by the
unwind opcodes, preventing operations powered by stack unwinding to
successfully work.

Differential Revision: http://reviews.llvm.org/D7520

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228641 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 00:57:42 +00:00
Colin LeMahieu
6194244842 [Hexagon] Factoring classes out of store patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228602 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 20:33:46 +00:00
Sanjay Patel
93411cf4f8 fixed to test features, not CPUs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228581 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 17:17:09 +00:00
Kit Barton
f60b0de42a This change implements the following three logical vector operations:
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.


Phabricator review: http://reviews.llvm.org/D7469


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228580 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 17:03:18 +00:00
Akira Hatanaka
522cf235e9 Fix a bug in DemoteRegToStack where a reload instruction was inserted into the
wrong basic block.

This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.

Also, hoist up the code which computes the insertion point to the only place
that need that computation.

rdar://problem/15978721


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228566 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 06:38:23 +00:00
Sanjay Patel
4616d7dd2f fix test attributes; this is an SSE2 test, not a Nehalem test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228546 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:14:27 +00:00
Sanjay Patel
751e3f1f80 fix test attributes; this is an x86-64 test, not a Nehalem test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228545 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:10:40 +00:00
Sanjay Patel
714f3d3a0f fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228544 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 21:05:03 +00:00
Sanjay Patel
e755d452e0 fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228541 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:50:58 +00:00
Sanjay Patel
78547012ac fix test attributes; these are x86-64 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228536 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:05:53 +00:00
Sanjay Patel
8d32999929 fix test attributes; these are MMX tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228535 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 20:01:12 +00:00
Sanjay Patel
7596cf2b66 fix test attributes; these are SSE2 tests, not Nehalem tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228534 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 19:50:55 +00:00
Sanjay Patel
c3803c8bc2 generalize test; nothing Nehalem-specific here
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228532 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 19:38:25 +00:00
Simon Pilgrim
c92ffedc5c [X86][AVX2] AVX2 broadcast + permute memory folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228528 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 18:33:13 +00:00
Tim Northover
848d812931 ARM & AArch64: teach LowerVSETCC that output type size may differ from input.
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.

Fortunately this is easy enough, just extend or truncate the naturally
compared result.

I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.

Should fix PR21645.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228518 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-08 00:50:47 +00:00
Simon Pilgrim
437265ee96 [X86][AVX2] AVX2 integer stack folding tests.
This adds tests for the remaining AVX2 instructions that currently support memory folding.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228513 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 23:28:16 +00:00
Simon Pilgrim
2134ae7f38 [X86][AVX] Added missing stack folding support + test for vptest ymm instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228509 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 21:44:06 +00:00
Simon Pilgrim
710e70bb70 [X86][SSE] Added missing stack folding tests for (v)mpsadbw instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228506 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 21:20:11 +00:00
Simon Pilgrim
3281412d2a [X86] Force fp stack folding tests to keep to specific domain.
General boolean instructions (AND, ANDN, OR, XOR) need to use a specific domain instruction (and not just the default).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228495 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 16:14:55 +00:00
Simon Pilgrim
bf4a435d0a [X86][AVX2] More AVX2 integer stack folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228494 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 16:07:27 +00:00
David Majnemer
fdac306a12 MC: Emit COFF section flags in the "proper" order
COFF section flags are not idempotent:
  'rd' will make a read-write section because 'd' implies write
  'dr' will make a read-only section because 'r' disables write

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228490 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 08:26:40 +00:00
Hal Finkel
05bd43dc6e [PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228488 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 07:32:58 +00:00
Hal Finkel
9ce4011708 [PowerPC] Fixup incomplete revert of test/CodeGen/PowerPC/tls-pic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:30:06 +00:00
Simon Pilgrim
148482dd6b [X86][AVX2] Begun adding AVX2 integer stack folding tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228462 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:12:15 +00:00
Hal Finkel
9168f717c9 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:07:40 +00:00
Reid Kleckner
6dc42dd2da Don't dllexport declarations
Fixes PR22488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228411 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 17:59:49 +00:00
Michel Danzer
7097d17da0 R600/SI: Amend a test to ensure WQM is enabled for LDS in pixel shaders
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228374 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 02:51:29 +00:00
Michel Danzer
971f0f0071 R600/SI: Don't enable WQM for V_INTERP_* instructions v2
Doesn't seem necessary anymore. I think this was mostly compensating for
not enabling WQM for texture sampling instructions.

v2: Add test coverage
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228373 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 02:51:25 +00:00