Commit Graph

12114 Commits

Author SHA1 Message Date
Juergen Ributzka
9952c922c2 Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).
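
A minimal sketch of the choice described in the note above, assuming a simplified model in which the 64-bit form always tests bits 32..63 and the 32-bit form tests bits 0..31. The names `TestBranchForm`, `TestBranch`, `selectTestBranch` and `effectiveBit` are hypothetical and are not FastISel's actual API.

```cpp
#include <cassert>

// Hypothetical model types for illustration only.
enum class TestBranchForm { W32, X64 };

struct TestBranch {
  TestBranchForm form; // register width the instruction reads
  unsigned imm;        // low five bits of the bit index
};

// Pick the instruction form for testing bit `bit` (0..63) of a 64-bit value.
TestBranch selectTestBranch(unsigned bit) {
  assert(bit < 64 && "bit index out of range");
  if (bit < 32)
    return {TestBranchForm::W32, bit};    // test the 32-bit subregister
  return {TestBranchForm::X64, bit - 32}; // top bit of the index is implied
}

// Bit index that this model would actually test for a given choice.
unsigned effectiveBit(TestBranch tb) {
  return tb.form == TestBranchForm::X64 ? tb.imm + 32 : tb.imm;
}
```

In this model, asking for bit 3 yields the 32-bit form, while bit 40 yields the 64-bit form with a low field of 8.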

Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:59:35 +00:00
Matt Arsenault
28233d3a63 R600/SI: Fix printing of clamp and omod
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 19:49:48 +00:00
Reed Kotler
8a6f79e58d Add numeric extend, truncate to mips fast-isel
Summary:
 Add numeric extend, truncate to mips fast-isel

 Reactivates D4827



Test Plan:
fpext.ll
loadstoreconv.ll

Reviewers: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D5251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218681 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 16:30:13 +00:00
Robert Khasanov
8acdc5232d [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}.
Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218670 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 12:15:52 +00:00
Robert Khasanov
175ff01f0f [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of these intrinsics when the mask is v2i1 or v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))
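
As a rough illustration of what the pattern above computes, here is a scalar sketch of a masked two-lane equality compare. The function name `maskedCmpEqQ128` is hypothetical. The upper six result bits come from `undef` in the lowering; this model sets them to zero, which is only one permissible choice.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a masked 2 x i64 equality compare (illustrative only).
std::uint8_t maskedCmpEqQ128(const std::array<std::int64_t, 2> &a,
                             const std::array<std::int64_t, 2> &b,
                             std::uint8_t mask) {
  std::uint8_t result = 0;
  for (unsigned i = 0; i < 2; ++i) {
    bool equal = a[i] == b[i];      // per-lane compare result
    bool enabled = (mask >> i) & 1; // low two bits of the incoming mask
    if (equal && enabled)           // the 'and' in the pattern
      result |= static_cast<std::uint8_t>(1u << i);
  }
  // Bits 2..7 model the undef part of insert_subvector; zero is an assumption.
  return result;
}

// Small usage check for the model above.
void checkMaskedCmp() {
  std::array<std::int64_t, 2> a = {{1, 2}};
  std::array<std::int64_t, 2> b = {{1, 3}};
  assert(maskedCmpEqQ128(a, b, 0x3) == 0x1); // only lane 0 matches
  assert(maskedCmpEqQ128(a, b, 0x2) == 0x0); // lane 0 is masked off
}
```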


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:41:54 +00:00
Robert Khasanov
cfa5724d50 [AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW.
Added new operand type for intrinsics (IIT_V64)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218668 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:32:22 +00:00
Robert Khasanov
58da66b2bf [AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ.
Added CMP_MASK intrinsic type


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218667 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 11:19:50 +00:00
Chandler Carruth
4abb04a65c [x86] Revert r218588, r218589, and r218600. These patches were pursuing
a flawed direction and causing miscompiles. Read on for details.

Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.

In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.

However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.
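
The re-slicing hazard described above can be sketched with a scalar model, assuming a blend that keys on the high bit of each condition element. The helper names `blendLane32` and `blendAsBytes` are hypothetical. With a sign-extended condition (all ones or all zeros) both granularities agree; with only the high bit set, the byte-granular blend picks the wrong bytes.

```cpp
#include <cassert>
#include <cstdint>

// Blend one 32-bit lane: the high bit of `cond` selects `t` over `f`.
std::uint32_t blendLane32(std::uint32_t cond, std::uint32_t t, std::uint32_t f) {
  return (cond & 0x80000000u) ? t : f;
}

// The same lane re-sliced into four byte blends, each keyed on the high bit
// of the corresponding condition byte.
std::uint32_t blendAsBytes(std::uint32_t cond, std::uint32_t t, std::uint32_t f) {
  std::uint32_t out = 0;
  for (unsigned i = 0; i < 4; ++i) {
    std::uint32_t byteMask = 0xFFu << (8 * i);
    bool pick = ((cond >> (8 * i)) & 0x80u) != 0;
    out |= (pick ? t : f) & byteMask;
  }
  return out;
}

// Usage check: sign-extended conditions survive re-slicing, high-bit-only
// conditions do not.
void checkReslice() {
  const std::uint32_t t = 0x11223344u, f = 0xAABBCCDDu;
  assert(blendAsBytes(0xFFFFFFFFu, t, f) == blendLane32(0xFFFFFFFFu, t, f));
  assert(blendAsBytes(0x00000000u, t, f) == blendLane32(0x00000000u, t, f));
  assert(blendLane32(0x80000000u, t, f) == t);
  assert(blendAsBytes(0x80000000u, t, f) == 0x11BBCCDDu); // mixed bytes
}
```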

There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.

My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
   converts a VSELECT with a constant condition operand into
   a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
   entirely on the DAG combine to catch cases where we can match to an
   immediate-controlled blend instruction.

One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 02:52:28 +00:00
Chandler Carruth
52b072d73f [x86] Add some vector-register broadcast operations to the 256-bit v4
tests which were missing them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218657 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 02:32:36 +00:00
Matt Arsenault
cbb188bffc R600: Fix broken check lines, missing scalar case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218655 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 01:05:29 +00:00
Juergen Ributzka
a0af4b0271 [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.

Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.

This fixes rdar://problem/18495928.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218653 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 00:49:58 +00:00
Eric Christopher
6a2169eb6f Add soft-float to the key for the subtarget lookup in the TargetMachine
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.

Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 21:57:54 +00:00
Matt Arsenault
49cbc1891b R600/SI: Also fix fsub + fadd a, a to mad combines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218609 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 14:59:38 +00:00
Matt Arsenault
a5f45d5444 R600/SI: Fix using mad with multiplies by 2
These turn into fadds, so combine them into the target
mad node.

fadd (fadd (a, a), b) -> mad 2.0, a, b
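
A small sketch of why the rewrite above is value-preserving for ordinary inputs: `a + a` and `2.0 * a` are both exact, so the final addition rounds the same way. The function `madModel` is a hypothetical, unfused stand-in for the target's mad, and overflow edge cases are ignored here.

```cpp
#include <cassert>

// Hypothetical unfused model of mad: a * b + c.
float madModel(float a, float b, float c) { return a * b + c; }

// The original form: two fadds.
float viaFadds(float a, float b) { return (a + a) + b; }

// The combined form: one mad with a constant 2.0 multiplier.
float viaMad(float a, float b) { return madModel(2.0f, a, b); }

// Usage check on values that are exactly representable.
void checkMadCombine() {
  assert(viaFadds(1.5f, 0.25f) == 3.25f);
  assert(viaMad(1.5f, 0.25f) == 3.25f);
}
```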

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218608 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 14:59:34 +00:00
Chandler Carruth
8ac2f142a8 [x86] Make the new vector shuffle lowering lower blends as VSELECT
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.

One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.

The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers think this would
look better with VSELECT when it has constant operands dumping over to
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 09:57:07 +00:00
Chandler Carruth
d23f1883d3 [x86] Delete a bunch of really bad and totally unnecessary code in the
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.

Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.

This also exposes that I was too aggressive in avoiding domain crossing
in r218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.

The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.

The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218589 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 02:01:20 +00:00
Chandler Carruth
8e93ce1780 [x86] Add the dispatch skeleton to the new vector shuffle lowering for
AVX-512.

There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.

Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218585 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 00:37:27 +00:00
Chandler Carruth
b61dfec824 [x86] Teach the new vector shuffle lowering to fall back on AVX-512
vectors.

Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218583 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 23:53:10 +00:00
Chandler Carruth
4f4280469c [x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2
lowerings.

This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.

Anyways, thanks to Hal for helping me sort out what these *should* be.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218582 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 23:23:55 +00:00
Chandler Carruth
3f40848670 [x86] Fix a really silly bug that I introduced fixing another bug in the
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.

Have I mentioned that I loathe implicit conversions recently? :: sigh ::

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 06:11:04 +00:00
Chandler Carruth
21b69296fb [x86] Fix yet another bug in the new vector shuffle lowering's handling
of widening masks.

We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.
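
The widening rule above can be sketched as follows. This is an illustrative model and not the code in the backend: the sentinel constants `kUndef` and `kZero` and the function `widenMask` are assumed names, and the concrete values -1 and -2 are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed sentinel values, for illustration only.
constexpr int kUndef = -1;
constexpr int kZero = -2;

// Try to merge each adjacent pair of mask elements into one element of a mask
// over elements twice as wide. Returns false if any pair cannot be merged.
bool widenMask(const std::vector<int> &mask, std::vector<int> &wide) {
  assert(mask.size() % 2 == 0 && "mask must have an even number of elements");
  wide.clear();
  for (std::size_t i = 0; i < mask.size(); i += 2) {
    int lo = mask[i], hi = mask[i + 1];
    if (lo == kZero || hi == kZero) {
      // A zeroed element only widens if its partner is zeroed or undef.
      if (lo >= 0 || hi >= 0)
        return false;
      wide.push_back(kZero);
    } else if (lo == kUndef && hi == kUndef) {
      wide.push_back(kUndef);
    } else if (lo == kUndef) {
      if (hi % 2 != 1) return false; // hi must be the odd half of a pair
      wide.push_back(hi / 2);
    } else if (hi == kUndef) {
      if (lo % 2 != 0) return false; // lo must be the even half of a pair
      wide.push_back(lo / 2);
    } else {
      if (lo % 2 != 0 || hi != lo + 1) return false;
      wide.push_back(lo / 2);
    }
  }
  return true;
}
```

For example, `{kZero, kUndef, 2, 3}` widens to `{kZero, 1}`, while `{kZero, 1, 2, 3}` is rejected because the zeroed element is paired with a real element.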

Also clean up the code here by ordering the checks in a more logical way
and by using the symbolic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218575 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-28 03:30:25 +00:00
James Molloy
aada52189e [AArch64] Redundant store instructions should be removed as dead code
If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.

This problem is found in spec2006-197.parser.

For example,
  stur    w10, [x11, #-4]
  stur    w10, [x11, #-4]
Then one of the two stur instructions can be removed.

Patch by David Xu!



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218569 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 17:02:54 +00:00
Chandler Carruth
72c3b07dfd [x86] Fix terrible bugs everywhere in the new vector shuffle lowering
and in the target shuffle combining when trying to widen vector
elements.

Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.

There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.

I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 04:42:44 +00:00
Chandler Carruth
8470b5b812 [x86] Flip the sentinel values used in the target shuffle mask decoding
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.

This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific sentinel values used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-27 04:42:39 +00:00
Sanjay Patel
676af35b38 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
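
To illustrate the "estimate plus refinement steps" idea, here is a minimal sketch using Newton-Raphson iterations for the reciprocal square root. The function `roughRsqrtEstimate` is a hypothetical stand-in for a hardware estimate instruction (it uses the well-known bit-level approximation), and the number of steps a real target needs is an assumption left to the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in for a hardware estimate instruction: a deliberately rough
// approximation of 1/sqrt(x).
float roughRsqrtEstimate(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float e;
  std::memcpy(&e, &bits, sizeof e);
  return e;
}

// Refine the estimate with `steps` Newton-Raphson iterations:
//   e' = e * (1.5 - 0.5 * x * e * e)
float refinedRsqrt(float x, unsigned steps) {
  assert(x > 0.0f && "sketch only handles positive finite inputs");
  float e = roughRsqrtEstimate(x);
  for (unsigned i = 0; i < steps; ++i)
    e = e * (1.5f - 0.5f * x * e * e);
  return e;
}

// z = y / sqrt(x) rewritten as z = y * rsqrt(x).
float divBySqrt(float y, float x, unsigned steps) {
  return y * refinedRsqrt(x, steps);
}
```

Each iteration roughly squares the relative error, which is why a small fixed number of steps is enough to make a coarse estimate usable.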

Differential Revision: http://reviews.llvm.org/D5484



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 23:01:47 +00:00
Chandler Carruth
0a31a52b91 [x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic
that managed to elude all of my fuzz testing historically. =/

Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 20:41:45 +00:00
Matt Arsenault
5435c66a33 R600/SI: Add strict check lines to div_scale tests.
This has weird operand requirements so it's worthwhile
to have very strict checks for its operands.

Add different combinations of SGPR operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218535 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:55:11 +00:00
Matt Arsenault
d991d2217b R600/SI Allow same SGPR to be used for multiple operands
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.

This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
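
A rough sketch of the selection policy described above, assuming a model in which an instruction may read at most one distinct SGPR and every other SGPR operand must be copied to a VGPR. The `Operand` type and `countMovesKeepingBestSGPR` are hypothetical names, not the SIInstrInfo interface.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical operand description for illustration.
struct Operand {
  bool isSGPR;
  unsigned reg;
};

// Keep the SGPR used by the most operands and count the operands that would
// need a copy. `keptReg` is only written if at least one SGPR is present.
unsigned countMovesKeepingBestSGPR(const std::vector<Operand> &ops,
                                   unsigned &keptReg) {
  std::map<unsigned, unsigned> uses;
  unsigned sgprOperands = 0;
  for (const Operand &op : ops) {
    if (op.isSGPR) {
      ++uses[op.reg];
      ++sgprOperands;
    }
  }
  unsigned best = 0;
  for (const auto &entry : uses) {
    if (entry.second > best) {
      best = entry.second;
      keptReg = entry.first;
    }
  }
  return sgprOperands - best;
}

// Usage check: operands (s0, s1, s1) need one move if s1 is kept, whereas
// keeping the first SGPR seen (s0) would need two.
void checkSgprChoice() {
  std::vector<Operand> ops = {{true, 0}, {true, 1}, {true, 1}};
  unsigned kept = ~0u;
  assert(countMovesKeepingBestSGPR(ops, kept) == 1);
  assert(kept == 1);
}
```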

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:55:03 +00:00
Matt Arsenault
aed12d4bad R600/SI: Partially move operand legalization to post-isel hook.
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:59 +00:00
Matt Arsenault
8a70e28114 R600/SI: Don't move operands that are required to be SGPRs
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:52 +00:00
Matt Arsenault
26b2a7834e R600/SI: Fix using wrong operand indices when commuting
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.

When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:54:43 +00:00
Chandler Carruth
7929a210d5 [x86] In the new vector shuffle lowering, when trying to do another
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:24:26 +00:00
Chandler Carruth
7164a4ae0a [x86] Fix a large collection of bugs that crept in as I fleshed out the
AVX support.

New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[

These were all detected by fuzz-testing. (I <3 fuzz testing.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 17:11:02 +00:00
Robert Khasanov
26ba182fdf [AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables.
Added lowering tests for these instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218508 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 09:48:50 +00:00
David Xu
abf5bf221f Revert patch of r218493, delete the test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 02:40:54 +00:00
David Xu
c41ae2a5c4 Redundant store instructions should be removed as dead code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 02:02:09 +00:00
Eric Christopher
55a90ab4ef Add the first backend support for on demand subtarget creation
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.

Things to note:

a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.

b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.

c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:44:08 +00:00
Matt Arsenault
deaa9d8c72 R600: Avoid repeated check lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:12:36 +00:00
Matt Arsenault
584886c0bb R600/SI: Fix emitting trailing whitespace after s_waitcnt
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 01:09:46 +00:00
Adam Nemet
2f3ccfc257 [AVX512] Make vextract*x4/vinsert*x4 tests check for the index as well
Extend test so that it provides coverage for the next commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218479 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:48:47 +00:00
Matt Arsenault
3011a602be R600: Fix some missing conversion testcases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218474 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:16:18 +00:00
Matt Arsenault
556ae0484a Remove duplicated RUN lines in middle of test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:16:14 +00:00
Bruno Cardoso Lopes
f4230250a1 [MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable.  This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.
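
The selection rule can be sketched as a comparator. This is a simplified model, assuming that a lower block frequency is preferred when profile data exists for both candidates and that loop depth is the fallback; the `Successor` type and `preferForSinking` are hypothetical names rather than MachineSink's own.

```cpp
#include <cassert>

// Hypothetical summary of a successor block for illustration.
struct Successor {
  unsigned loopDepth;
  bool hasFrequency;            // whether profile information is available
  unsigned long long frequency; // meaningful only if hasFrequency is true
};

// Returns true if `a` is a better block to sink into than `b`.
bool preferForSinking(const Successor &a, const Successor &b) {
  if (a.hasFrequency && b.hasFrequency)
    return a.frequency < b.frequency; // colder block wins (assumed direction)
  return a.loopDepth < b.loopDepth;   // otherwise prefer the shallower loop
}

// Usage check: profile data can override what loop depth alone would choose.
void checkSinkPreference() {
  Successor deepButCold = {2, true, 10};
  Successor shallowButHot = {1, true, 1000};
  assert(preferForSinking(deepButCold, shallowButHot));
  Successor noProfileShallow = {1, false, 0};
  Successor noProfileDeep = {2, false, 0};
  assert(preferForSinking(noProfileShallow, noProfileDeep));
}
```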

Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup on average on x86-64 darwin.

<rdar://problem/18021659>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218472 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 23:14:26 +00:00
Tom Stellard
29d48e6a49 R600/SI: Add support for global atomic add
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218457 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 18:30:26 +00:00
Robin Morisset
79826e015e Lower idempotent RMWs to fence+load
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).

This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
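
At the source level, the two forms involved look roughly like this. It is a sketch using `std::atomic`: the function names are hypothetical, the memory orderings chosen are an assumption, and whether the two forms are interchangeable under concurrency depends on the target, as noted above.

```cpp
#include <atomic>
#include <cassert>

// Idempotent RMW: adding zero leaves memory unchanged but still reads it.
int readViaRmw(std::atomic<int> &x) {
  return x.fetch_add(0, std::memory_order_seq_cst);
}

// The replacement shape: a fence followed by a plain atomic load.
int readViaFenceLoad(std::atomic<int> &x) {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return x.load(std::memory_order_seq_cst);
}

// Single-threaded usage check: both forms observe the same value and neither
// modifies it.
void checkIdempotentRmw() {
  std::atomic<int> x{7};
  assert(readViaRmw(x) == 7);
  assert(readViaFenceLoad(x) == 7);
  assert(x.load() == 7);
}
```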

Details of the optimization are discussed in D5091.

Test Plan: make check-all + a new test

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5422

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 17:27:43 +00:00
Sid Manning
733681d3bd Add missing attributes !cmp.[eq,gt,gtu] instructions.
These instructions do not indicate they are extendable or the
number of bits in the extendable operand.  Rename to match
architected names.  Add a testcase for the intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 13:09:54 +00:00
Daniel Sanders
03fe69e90d [mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle structs correctly on big-endian N32/N64 return values.
Summary:
The N32/N64 ABIs require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.
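
The big-endian shift can be illustrated with a small model. This sketch assumes a struct of 1 to 8 bytes held in a single 64-bit register; `packStructBigEndian` and `storeBigEndian64` are hypothetical helpers, the latter modelling what a big-endian 'sd' does to memory.

```cpp
#include <cassert>
#include <cstdint>

// Pack `size` struct bytes (1..8) into a 64-bit register image so that a
// big-endian 64-bit store puts byte 0 of the struct at the lowest address.
std::uint64_t packStructBigEndian(const unsigned char *bytes, unsigned size) {
  assert(size >= 1 && size <= 8 && "sketch handles one register only");
  std::uint64_t reg = 0;
  for (unsigned i = 0; i < size; ++i)
    reg = (reg << 8) | bytes[i]; // struct occupies the low bits so far
  return reg << (8 * (8 - size)); // shift it into the upper bits
}

// Model of a big-endian 64-bit store: the most significant byte goes to the
// lowest address.
void storeBigEndian64(std::uint64_t reg, unsigned char *out) {
  for (unsigned i = 0; i < 8; ++i)
    out[i] = static_cast<unsigned char>(reg >> (8 * (7 - i)));
}
```

For a three-byte struct `{0xAA, 0xBB, 0xCC}` the register image is `0xAABBCC0000000000`, and storing it places `0xAA` at offset 0.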

We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5286

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 12:15:05 +00:00
Chandler Carruth
4b667ee436 [x86] Teach the new vector shuffle lowering to use AVX2 instructions for
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.

Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think it's better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.

Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.

This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:

1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
   apply

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 11:03:55 +00:00
Chandler Carruth
05901d80ba [x86] Teach the new vector shuffle lowering a fancier way to lower
256-bit vectors with lane-crossing.

Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218446 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 10:21:15 +00:00
Chandler Carruth
2e8d2c727c [x86] Fix an oversight in the v8i32 path of the new vector shuffle
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.

This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.

For reference, the reason that we check for the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218442 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 04:10:27 +00:00
Chandler Carruth
e3bb4bb2d5 [x86] Implement AVX2 support for v32i8 in the new vector shuffle
lowering.

This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218440 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:52:12 +00:00
Chandler Carruth
1d63231455 [x86] More tweaks to the v32i8 test cases.
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218439 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:44:39 +00:00
Chandler Carruth
a87d04a759 [x86] Re-work a bunch of the v32i8 test cases to actually involve byte
shuffles rather than word shuffles.

As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218438 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 02:20:02 +00:00
Chandler Carruth
ef673b3c73 [x86] Fix the v16i16 blend logic I added in the prior commit and add the
missing test cases for it.

Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix that next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218434 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 01:13:38 +00:00
Akira Hatanaka
0253523c92 [X86,AVX] Add an isel pattern for X86VBroadcast.
This fixes PR21050 and rdar://problem/18434607.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218431 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 00:26:15 +00:00
Chandler Carruth
bdecfeb723 [x86] Implement v16i16 support with AVX2 in the new vector shuffle
lowering.

This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.

Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218430 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 00:24:19 +00:00
Moritz Roth
8c4e64af8a [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 16:35:50 +00:00
Chandler Carruth
10cd8098a7 [x86] Teach the instruction lowering to add comments describing constant
pool data being loaded into a vector register.

The comments take the form of:

  # ymm0 = [a,b,c,d,...]
  # xmm1 = <x,y,z...>

The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.
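
A minimal sketch of the comment format described above, written in Python
rather than the C++ of the actual patch. The helper name
`format_constant_comment` and its parameters are hypothetical, and `None`
is assumed here to stand for an undef element.

```python
def format_constant_comment(reg, elements, is_constant_vector=False):
    """Render a comment such as '# ymm0 = [1,2,u,4]' (hypothetical helper)."""

    def render(element):
        if element is None:
            return "u"              # undef elements print as the letter 'u'
        if isinstance(element, float):
            return repr(element)    # floating point values print as floats
        return str(int(element))    # integers print in decimal

    # []s for generic sequential data, <>s for ConstantVector loads.
    opener, closer = ("<", ">") if is_constant_vector else ("[", "]")
    body = ",".join(render(e) for e in elements)
    return f"# {reg} = {opener}{body}{closer}"


print(format_constant_comment("ymm0", [1, 2, None, 4]))
print(format_constant_comment("xmm1", [0.5, None], is_constant_vector=True))
```

Keeping the whole mask on one line is what makes it practical to match with
a single FileCheck pattern.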

My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.

Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 09:39:41 +00:00
Matt Arsenault
0bb38df86c R600/SI: Fix weird CHECK-DAG usage
This prevents these from failing in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218356 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 02:14:26 +00:00
Tom Stellard
81c6c9690a R600/SI: Enable selecting SALU inside branches
We can do this now that the FixSGPRLiveRanges pass is working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218353 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:33:28 +00:00
Chandler Carruth
6717f9d907 [x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with
the native AVX2 instructions.

Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:24:44 +00:00
Chandler Carruth
8415f84e49 [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.
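
A minimal sketch of what treating undef as undef looks like when detecting
a repeated per-lane pattern. This is an illustration in Python, not the code
from the patch: the function name `repeated_lane_mask` is hypothetical, only
single-input masks are handled, and undef is assumed to be encoded as -1.

```python
def repeated_lane_mask(mask, lane_size):
    """Return the per-lane mask if every 128-bit lane repeats it, else None.

    mask      -- shuffle mask, one source index per element, -1 for undef
    lane_size -- number of elements in one 128-bit lane
    """
    repeated = [-1] * lane_size
    for i, m in enumerate(mask):
        if m < 0:
            continue  # undef matches anything; never compare it numerically
        if m // lane_size != i // lane_size:
            return None  # element crosses a 128-bit lane boundary
        local = m % lane_size   # index relative to the start of the lane
        slot = i % lane_size    # position within the lane being filled
        if repeated[slot] < 0:
            repeated[slot] = local
        elif repeated[slot] != local:
            return None  # lanes disagree on a defined element
    return repeated


# v8f32: both lanes swap adjacent pairs; undef entries are filled from the
# other lane instead of being read as the value -1.
print(repeated_lane_mask([-1, 0, 3, 2, 5, -1, 7, 6], 4))
```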

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:03:57 +00:00
Robin Morisset
73ce2886b1 Fix swift-atomics testcase
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218341 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 23:18:01 +00:00
Chandler Carruth
30ce74b5e3 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:39:02 +00:00
Chandler Carruth
798f2849c3 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:14:14 +00:00
Robin Morisset
30e7514d01 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functional change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:59:25 +00:00
Robin Morisset
58bca6e8ec [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomics as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:46:49 +00:00
Chandler Carruth
7024c7e949 [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:16:12 +00:00
Chandler Carruth
4850be49a3 [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 11:15:19 +00:00
Chandler Carruth
8f637786d8 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymmetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional difference from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 10:08:29 +00:00
Sanjay Patel
c4ef4e47c2 tighten up checks
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK-NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218284 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:46:44 +00:00
Sanjay Patel
90969b9ee0 remove unnecessary labels; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:52:53 +00:00
Juergen Ributzka
af989653e0 [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1).
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.

This should fix a bug found by Chad.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218275 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:08:53 +00:00
Chandler Carruth
56c7cfe41f [x86] Introduce tests covering the gamut of 256-bit vector shuffling.
These are just test cases, no actual code yet. This establishes the
baseline fallback strategy we're starting from on AVX2 and the expected
lowering we use on AVX1.

Also, these test cases are very much generated. I've manually crafted
the specific pattern set that I'm hoping will be useful at exercising
the lowering code, but I've not (and could not) manually verify *all* of
these. I've spot checked and they seem legit to me.

As with the rest of vector shuffling, at a certain point the only really
useful way to check the correctness of this stuff is through fuzz
testing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218267 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 20:25:08 +00:00
Sanjay Patel
6539887847 Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.
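
As a rough worked example of that tradeoff (a sketch in Python, not code
from the patch): a splat only needs one element in the constant pool, so the
data saved is the vector size minus the element size. The helper names below
are hypothetical, and the 5-byte figure is the worst case quoted above.

```python
MAX_EXTRA_INSTRUCTION_BYTES = 5  # worst-case broadcast overhead quoted above


def constant_pool_bytes_saved(vector_bytes, element_bytes):
    """Constant pool bytes saved by storing one element instead of the vector."""
    return vector_bytes - element_bytes


def worst_case_net_bytes_saved(vector_bytes, element_bytes):
    """Net size win after paying the worst-case instruction overhead."""
    saved = constant_pool_bytes_saved(vector_bytes, element_bytes)
    return saved - MAX_EXTRA_INSTRUCTION_BYTES


# 128-bit vector of 64-bit elements: the smallest saving.
print(constant_pool_bytes_saved(16, 8), worst_case_net_bytes_saved(16, 8))
# 256-bit vector of 8-bit elements: the largest saving.
print(constant_pool_bytes_saved(32, 1), worst_case_net_bytes_saved(32, 1))
```

Even in the least favorable case the broadcast is a net size win, which is
why it is only a question of optimizing for size rather than speed.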

Differential Revision: http://reviews.llvm.org/D5347



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:54:01 +00:00
Akira Hatanaka
73c604b290 Fix test case committed in r218242 to appease buildbot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218261 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:07:20 +00:00
Tom Stellard
e1bc40b1e6 Revert "R600/SI: Add support for global atomic add"
This reverts commit r218254.

The global_atomics.ll test fails with asserts disabled.  For some reason,
the compiler fails to produce the atomic no return variants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 16:44:04 +00:00
Tom Stellard
6d625ad495 R600/SI: Add support for global atomic add
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218254 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 15:35:35 +00:00
Pavel Chupin
25c57d5cfe [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 13:11:35 +00:00
Robert Lougher
2ee97f03a4 Fix assert when decoding PSHUFB mask
The PSHUFB mask decode routine used to assert if the mask index was out of
range (<0 or greater than the size of the vector).  The problem is, we can
legitimately have a PSHUFB with a large index using intrinsics.  The
instruction only uses the least significant 4 bits.  This change removes the
assert and masks the index to match the instruction behaviour.
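
A minimal sketch of the masking behaviour for the 128-bit form, in Python
rather than the C++ of the decode routine. The names `decode_pshufb_mask`
and `ZERO` are hypothetical. The zeroing on a set high bit comes from the
instruction's definition, not from this commit message.

```python
ZERO = "zero"  # hypothetical sentinel for a byte the instruction zeroes


def decode_pshufb_mask(raw_bytes):
    """Decode PSHUFB control bytes into source byte indices (128-bit form)."""
    decoded = []
    for b in raw_bytes:
        b &= 0xFF  # treat the control value as an unsigned byte
        if b & 0x80:
            decoded.append(ZERO)      # high bit set: result byte is zeroed
        else:
            decoded.append(b & 0x0F)  # only the low 4 bits select a byte
    return decoded


# 17 is out of range for a 16-byte vector, but decodes to index 1 instead
# of tripping an assertion.
print(decode_pshufb_mask([0, 17, 0x7F, 0x80]))
```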


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 11:54:38 +00:00
Chandler Carruth
ec35919c9a [x86] Move the AVX v4i64 test cases down to group them together.
Increasingly I don't want to mix the integer and floating point tests,
especially with AVX where they are handled quite differently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218233 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 03:05:23 +00:00
Chandler Carruth
de95c380c7 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely to be correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 00:32:15 +00:00
Chandler Carruth
37bb4b0365 [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:46:13 +00:00
Chandler Carruth
7da57cf5b4 [x86] Add a bunch of test cases where we have different shuffle patterns
in the high and low 128-bit lanes of a v8f32 vector.

No functionality change yet, but wanted to set up the baseline for my
next patch which will make these quite a bit better. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218224 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:32:42 +00:00
Chandler Carruth
974542d7d8 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
asymmetry in the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:35:14 +00:00
Chandler Carruth
1a5f7f54f4 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:49:46 +00:00
Chandler Carruth
6ef31b0079 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.
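
A minimal sketch of the masks being recognized, in Python rather than the
C++ of the patch. The helper name `unpck_mask` is hypothetical. Indices
follow the shufflevector convention: values below the element count select
from the first input and the rest select from the second.

```python
def unpck_mask(num_elements, lane_size, high=False):
    """Build the shuffle mask matching UNPCKL/UNPCKH repeated in every lane."""
    half = lane_size // 2
    offset = half if high else 0  # UNPCKH interleaves the upper half of a lane
    mask = []
    for lane_start in range(0, num_elements, lane_size):
        for j in range(half):
            a = lane_start + offset + j
            mask.append(a)                 # element from the first input
            mask.append(num_elements + a)  # same position in the second input
    return mask


# v8f32 has two 128-bit lanes of four floats each.
print(unpck_mask(8, 4))             # the UNPCKLPS pattern
print(unpck_mask(8, 4, high=True))  # the UNPCKHPS pattern
```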

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:20:44 +00:00
Chandler Carruth
7a94357b04 [x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in
preparation for enhancing their support in the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218212 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:13:11 +00:00
Chandler Carruth
7922d3e39a [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:01:19 +00:00
Chandler Carruth
e4cb9d5f25 [x86] Regenerate this test case now that I've improved my script for
generating the test cases to format things more consistently and
actually catch all the operand sequences that should be elided in favor
of the asm comments. No actual changes here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218210 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:51:33 +00:00
Chandler Carruth
29720a4bad [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:17:55 +00:00
Chandler Carruth
0dd52092d0 [x86] Add some more comprehensive tests for v4f64 blending.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:12:19 +00:00
Chandler Carruth
57191b0b48 [x86] Re-generate a bunch of the v4f64 test cases with my new script.
This expands the integer cases to cover the fact that AVX2 moves their
lane-crossing shuffles into the integer domain. It also adds proper
support for AVX2 run lines and the "ALL" group when it doesn't matter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218206 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:07:41 +00:00
Chandler Carruth
291140b112 [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an AVX vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:22 +00:00
Chandler Carruth
1ca1e33c3a [x86] Add some more test cases covering specific blend patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218200 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:01:26 +00:00
Chandler Carruth
b7ef7f97a8 [x86] Add the beginnings of some tests for our v8f32 shuffle lowering
under AVX.

This really just documents the current state of the world. I'm going to
try to flesh it out to cover any test cases I plan to improve prior to
improving them so that the delta made by changes is actually visible to
code reviewers.

This is made easier by the fact that I now have a script to automate the
process of producing test cases including the check lines. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218199 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 08:49:27 +00:00
Chandler Carruth
ae464b2ba1 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 22:09:27 +00:00
Chandler Carruth
479d0ba62b [x86] Add an AVX run to the 128-bit v2 tests, teach them to have
a generic SSE and AVX mode in addition to a specific AVX1 test path, and
flesh out the AVX tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218192 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:26:41 +00:00
David Majnemer
182c8ff6c0 Update tests which broke from r218189
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218191 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 21:18:43 +00:00
Chandler Carruth
9c7ffd20df [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 20:52:07 +00:00
Chandler Carruth
cc727ec92a [x86] Start moving to a fancier check syntax to reduce the need for
duplication of check lines. The idea is to have broad sets of
compilation modes that will frequently diverge without having to always
and immediately explode to the precise ISA feature set.

While this already helps due to VEX encoded differences, it will help
much more as I teach the new shuffle lowering about more of the new VEX
encoded instructions which can still be used to implement 128-bit
shuffles.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218188 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 18:36:39 +00:00
Chandler Carruth
c16105b078 [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 04:15:22 +00:00
Chandler Carruth
cc62abbe39 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:32:25 +00:00
Peter Collingbourne
87f7e75e58 Fix crash with an insertvalue that produces an empty object.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218171 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 00:10:47 +00:00
Matt Arsenault
1a505ebae4 R600: Un-xfail a test which passes with pass disabled
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218165 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:20 +00:00
Matt Arsenault
ea3a0242f4 R600/SI: Un-xfail tests which work now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 23:02:18 +00:00
Matt Arsenault
c58ab80f78 R600/SI: Un xfail a test that works now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 22:42:40 +00:00
Juergen Ributzka
faf93a6e0c [FastIsel][AArch64] Fix a think-o in address computation.
When looking through sign/zero-extensions the code would always assume there is
such an extension instruction and use the wrong operand for the address.

There was also a minor issue in the handling of 'AND' instructions. I
accidentally used a 'cast' instead of a 'dyn_cast'.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218161 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 22:23:46 +00:00
Chandler Carruth
dc58d1e099 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case added I refused to add before this because it was
sooooo many instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218143 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 20:00:32 +00:00
Matt Arsenault
c14f7630e0 R600/SI: Fix test to prepare for scheduler
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218131 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 18:11:16 +00:00
Hal Finkel
c404e8208c Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
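
A minimal sketch of the gating described above, in Python rather than the
C++ of DAGCombine. The function `should_form_fma` and its parameters are
hypothetical. Only the callback name enableAggressiveFMAFusion comes from
this commit.

```python
def should_form_fma(fmul_use_count, enable_aggressive_fma_fusion=False):
    """Decide whether to fuse an FMUL feeding an FADD into an FMA.

    By default the FMUL must have exactly one use, so fusing never leaves a
    second multiply behind. With aggressive fusion enabled the check is
    skipped and the FMUL may stay alive for its other users.
    """
    return enable_aggressive_fma_fusion or fmul_use_count == 1


print(should_form_fma(1))                                     # default target
print(should_form_fma(2))                                     # default target
print(should_form_fma(2, enable_aggressive_fma_fusion=True))  # e.g. PowerPC
```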

Patch by Olivier Sallenave, thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 11:42:56 +00:00
Chandler Carruth
89436b4160 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 09:45:21 +00:00
Chandler Carruth
3e990c1e5b [x86] Actually test the SSE2 lowering for most of the zext-ish shuffles.
Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2
ones because the shuffles are so absurd it's not worth transcribing
them. Will try to fix them to be sane and then check them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218114 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:51:06 +00:00
Chandler Carruth
ec1f7b1c87 [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218112 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:37:44 +00:00
Chandler Carruth
330aa6fd6b [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218102 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 06:07:49 +00:00
Jiangning Liu
61519cd699 Optimize sext/zext insertion algorithm in back-end.
With this optimization, we no longer always insert zext for values crossing
basic blocks; instead we insert sext if the users of a value crossing basic
blocks prefer a signed predicate.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 05:30:35 +00:00
Hans Wennborg
2ee31bcdee Fix an it's vs. its typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218093 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 01:14:56 +00:00
Matt Arsenault
bd2b96a12d R600: Better fix for bug 20982
Just do the left shift as unsigned to avoid the UB.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 00:42:06 +00:00
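As a rough illustration of the approach described in bd2b96a12d (doing the left shift as unsigned), the sketch below shifts the unsigned bit pattern and converts back. The helper name `shiftLeftAsUnsigned` is hypothetical and is not taken from the R600 sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the R600 sources): shift the unsigned bit
// pattern so that a negative Value never reaches a signed left shift.
inline int32_t shiftLeftAsUnsigned(int32_t Value, unsigned Amount) {
  assert(Amount < 32 && "shift amount out of range");
  // The conversion back to int32_t wraps modulo 2^32 (guaranteed since
  // C++20, implementation-defined before that).
  return static_cast<int32_t>(static_cast<uint32_t>(Value) << Amount);
}
```

The unsigned shift is well defined for every input, which is the property the commit relies on.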
Chandler Carruth
9b676fd6f2 [x86] Extend this test to cover SSE4.1. Nothing interesting here, but
paves the way for subsequent changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218091 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 00:30:24 +00:00
Quentin Colombet
65edced76b [ARM] Do not perform a tail call when the caller returns several values.
The fix is slightly different than x86 (see r216117) because the number of values
attached to a return can vary even for a single returned value (e.g., f64 yields
two returned values).

<rdar://problem/18352998>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218076 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 21:17:50 +00:00
Robin Morisset
5052940c27 Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
Summary:
This patch was originally in D5304 (I could not find a way to reopen that revision).
It was accepted, committed and broke the build bots because the overloading of
the constructor of ArrayRef for braced initializer lists is not supported by all
toolchains. I then reverted it, and propose this fixed version that uses a plain
C array instead in makeDMB (that array is then converted implicitly to an
ArrayRef, but that is not behind an ifdef). Could someone confirm whether
initialization lists for plain C arrays are supported by every toolchain used
to build LLVM? Otherwise I can just initialize the array in the old way:
args[0] = ...; .. ; args[5] = ...;

Below is the description of the original patch:
```
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.
```

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D5386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218066 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 18:56:04 +00:00
Matt Arsenault
e08e52528b R600: Bug 20982 - Avoid undefined left shift of negative value
I'm not sure what the hardware actually does, so don't
bother trying to fold it for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218057 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 15:52:26 +00:00
Chandler Carruth
72f0d9515e [x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.

There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 09:00:25 +00:00
Chandler Carruth
088aa097d5 [x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new
vector shuffle lowering. This will be needed for up-coming palignr
tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218037 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 08:33:04 +00:00
Juergen Ributzka
f789dac2dd Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."
Reverting it until I have time to investigate a regression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218035 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 08:07:40 +00:00
Juergen Ributzka
ef48b51126 Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.
When folding the intrinsic flag into the branch or select we also have to
consider whether the intrinsic got simplified, because that changes the
flag we have to check for.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:26:26 +00:00
Juergen Ributzka
e7fba004ce [FastISel][AArch64] Simplify XALU multiplies.
Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218033 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:04:54 +00:00
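One case where the simplification in e7fba004ce plausibly applies is a multiply by 2, since `X * 2` and `X + X` produce the same value and the same overflow flag. That this is the case the commit targets is an assumption; the message only says "when possible". The sketch below models the signed 32-bit intrinsics with hypothetical helpers built on 64-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct Checked {
  int32_t Value;  // wrapped 32-bit result
  bool Overflow;  // true when the exact result does not fit in 32 bits
};

// Narrow an exact 64-bit result to 32 bits and record whether it fit.
inline Checked narrow(int64_t Wide) {
  bool Fits = Wide >= std::numeric_limits<int32_t>::min() &&
              Wide <= std::numeric_limits<int32_t>::max();
  return {static_cast<int32_t>(static_cast<uint32_t>(Wide)), !Fits};
}

// Models of sadd.with.overflow and smul.with.overflow on i32.
inline Checked saddWithOverflow(int32_t A, int32_t B) {
  return narrow(static_cast<int64_t>(A) + B);
}
inline Checked smulWithOverflow(int32_t A, int32_t B) {
  return narrow(static_cast<int64_t>(A) * B);
}

// Assumed simplification: compute X * 2 as X + X. The asserts document that
// both forms agree on the value and on the overflow flag.
inline Checked smulByTwo(int32_t X) {
  Checked ViaAdd = saddWithOverflow(X, X);
  assert(ViaAdd.Overflow == smulWithOverflow(X, 2).Overflow);
  assert(ViaAdd.Value == smulWithOverflow(X, 2).Value);
  return ViaAdd;
}
```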
Juergen Ributzka
4b6f00ad18 [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218032 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 07:04:49 +00:00
Juergen Ributzka
22b557d942 [FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address.
Small optimization in 'simplifyAddress'. When the offset cannot be encoded in
the load/store instruction, then we need to materialize the address manually.
The add instruction can encode a wider range of immediates than the load/store
instructions. This change tries to fold the offset into the add instruction
first before materializing the offset in a register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 05:40:47 +00:00
Juergen Ributzka
ffbd4879eb [FastISel][AArch64] Fold 'AND' instruction during the address computation.
The 'AND' instruction could be used to keep only the lower 32 bits of a register.
If this is done inside an address computation we might be able to fold the
instruction into the memory instruction itself.

and  x1, x1, #0xffffffff   ---> ldrb x0, [x0, w1, uxtw]
ldrb x0, [x0, x1]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 05:40:41 +00:00
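The fold in ffbd4879eb works because an AND with `0xffffffff` and a UXTW register extension compute the same index. A minimal sketch of that equivalence follows; the function names are hypothetical and stand in for the two instruction forms.

```cpp
#include <cassert>
#include <cstdint>

// Index as computed by the explicit AND: keep bits 0..31, clear bits 32..63.
inline uint64_t maskedIndex(uint64_t Index) { return Index & 0xffffffffull; }

// Index as computed by a UXTW-extended register operand: take the low
// 32-bit half (w1) and zero-extend it to 64 bits.
inline uint64_t uxtwIndex(uint64_t Index) {
  return static_cast<uint64_t>(static_cast<uint32_t>(Index));
}

// Load through the folded form; the assert documents why the fold is sound.
inline uint8_t loadByteFolded(const uint8_t *Base, uint64_t Index) {
  assert(maskedIndex(Index) == uxtwIndex(Index));
  return Base[uxtwIndex(Index)];
}
```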
Chandler Carruth
49ab1a424d [x86] Add an SSSE3 run to the v4 shuffle test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218028 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 04:38:32 +00:00
Chandler Carruth
3ff76847ba [x86] Initial step of teaching the new vector shuffle lowering about
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.

I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.

I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 04:11:29 +00:00
Juergen Ributzka
710fc316fb [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218010 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 02:44:13 +00:00
Samuel Antao
6693d0de3e Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. As a consequence, boolean values are returned incorrectly. This patch fixes the problem by also evaluating the sign extension information for constants and forwarding it to PPCMaterializeInt, which uses it to drive the sign extension during materialization.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 23:25:06 +00:00
Juergen Ributzka
7516444a26 [FastISel][AArch64] Custom lower sdiv by power-of-2.
Emit an optimized instruction sequence for sdiv by power-of-2 depending on the
exact flag.

This fixes rdar://problem/18224511.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217986 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 21:55:55 +00:00
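To show why the exact flag matters in 7516444a26, here is a sketch of signed division by a power of two using shifts. It is the textbook bias-then-shift technique, not necessarily the instruction sequence FastISel emits. It assumes `>>` on a negative value is an arithmetic shift (guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// Divide X by 2^Log2, rounding toward zero like sdiv.
inline int32_t sdivByPowerOf2(int32_t X, unsigned Log2, bool IsExact) {
  assert(Log2 >= 1 && Log2 <= 30 && "divisor must be 2^1 .. 2^30");
  // With the exact flag there is no remainder, so a plain arithmetic
  // shift already gives the right answer.
  if (IsExact)
    return X >> Log2;
  // Otherwise a negative dividend needs a bias of 2^Log2 - 1 so that the
  // shift rounds toward zero instead of toward negative infinity.
  int32_t Bias = X < 0 ? static_cast<int32_t>((1u << Log2) - 1) : 0;
  return (X + Bias) >> Log2;
}
```

Without the bias, `-7 >> 1` would give -4 rather than the -3 that sdiv requires.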
Juergen Ributzka
580875d39d [FastISel][AArch64] Simplify mul to shift when possible.
This is related to rdar://problem/18369687.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217980 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 20:35:41 +00:00
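The mul-to-shift simplification in 580875d39d rests on the identity `X * 2^k == X << k`. A small sketch, assuming the "when possible" condition is that the constant operand is a power of two (the helper names are illustrative only):

```cpp
#include <cassert>
#include <cstdint>

inline bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Shift amount k for a constant of the form 2^k.
inline unsigned log2OfPowerOf2(uint64_t V) {
  assert(isPowerOf2(V) && "expected a power of two");
  unsigned Shift = 0;
  while (V > 1) {
    V >>= 1;
    ++Shift;
  }
  return Shift;
}

// Multiply by a constant, using a shift when the constant allows it.
inline uint64_t mulByConstant(uint64_t X, uint64_t C) {
  if (isPowerOf2(C))
    return X << log2OfPowerOf2(C);
  return X * C;  // general case: keep the multiply
}
```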
Alexey Samsonov
dc4eb3d6dc Exclude known and bugzilled failures from UBSan bootstrap
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217979 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 20:17:52 +00:00
Juergen Ributzka
46d6fd2908 [FastISel][AArch64] Fold mul into add/sub and logical operations.
Try to fold the multiply into the add/sub or logical operations (when
possible).

This is related to rdar://problem/18369687.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217978 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 19:51:38 +00:00
Juergen Ributzka
5461af97bc [FastISel][AArch64] Fold mul into the address computation of memory operations.
Teach 'computeAddress' to also fold multiplies into the address computation
(when possible).

This fixes rdar://problem/18369443.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217977 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 19:19:31 +00:00
Robin Morisset
e2ff4e489b Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors"
It is breaking the build on the buildbots but works fine on my machine, I revert
while trying to understand what happens (it appears to depend on the compiler used
to build, I probably used a C++11 feature that is not perfectly supported by some
of the buildbots).

This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217973 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 18:09:13 +00:00
Juergen Ributzka
07c9ae576c [FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ.
This takes advantage of the CBZ and CBNZ instructions to further optimize the
common null check pattern into a single instruction.

This is related to rdar://problem/18358882.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217972 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 18:05:34 +00:00
Juergen Ributzka
17e0ee5078 [FastISel][AArch64] Improve branch selection to support all FP conditions.
This adds the last two missing floating-point condition codes (FCMP_UEQ and
FCMP_ONE) also to the branch selection. In these two cases an additional branch
instruction is required.

This also adds unit tests to check all the different condition codes.

This is related to rdar://problem/18358882.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217966 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 17:46:47 +00:00
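The two predicates added in 17e0ee5078 each combine two simpler tests, which is one way to read the need for an additional branch. The sketch below spells out the LLVM IR semantics of the two predicates in plain C++. How the tests map to AArch64 condition codes is not shown and is not claimed here.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// FCMP_ONE: ordered and not equal. Two tests are needed: less OR greater.
inline bool fcmpONE(double A, double B) { return A < B || A > B; }

// FCMP_UEQ: unordered or equal. Again two tests: equal OR unordered.
inline bool fcmpUEQ(double A, double B) {
  bool Result = A == B || std::isnan(A) || std::isnan(B);
  assert(Result == !fcmpONE(A, B) && "UEQ is the negation of ONE");
  return Result;
}
```

Any comparison involving a NaN is unordered, so `fcmpUEQ` is true and `fcmpONE` is false for such inputs.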
Robin Morisset
30486fa3de [ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors
Summary:
I had only tested this code for ARMv7 and ARMv8. This patch adds several
fallback paths if the processor does not support dmb ish:
- dmb sy if a cortex-M with support for dmb
- mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB)
These fallback paths were chosen based on the code for fence seq_cst.

Thanks to luqmana for having noticed this bug.

Test Plan: Added more cases to atomic-load-store.ll + make check-all

Reviewers: jfb, t.p.northover, luqmana

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5304

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217965 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 17:41:16 +00:00
Matt Arsenault
507636288f R600/SI: Change formatting of printed FP immediates
Only 1 decimal place should be printed for inline immediates.
Other constants should be hex constants.

Does not include f64 tests because folding those inline
immediates currently does not work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217964 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 17:32:13 +00:00
Pavel Chupin
780f7e2168 [x32] Fix function indirect calls
Summary: Zero-extend register to 64-bit for callq/jmpq.

Test Plan: 3 tests added

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217942 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 07:09:23 +00:00
Quentin Colombet
9fe79b48b8 [CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting
instructions when truncate, sext, or zext were created. Fix that.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217926 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 22:36:07 +00:00
Juergen Ributzka
c9bc145e31 [FastISel][AArch64] Add vector support to argument lowering.
Lower the first 8 vector arguments too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217850 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:25:30 +00:00
Chandler Carruth
bad2c13aae [x86] As a follow-up to r217819, don't check for VSELECT legality now
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.

The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:24:42 +00:00
Chandler Carruth
cba9d1273a [x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and
ADDSUBPD nodes out of blends of adds and subs.

This allows us to actually form these instructions with SSE3 rather than
only forming them when we had both SSE3 for the ADDSUB instructions and
SSE4.1 for the blend instructions. ;] Kind-of important.

I've adjusted the CPU requirements on one of the tests to demonstrate
this kicking in nicely for an SSE3 cpu configuration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217848 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:15:20 +00:00
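The pattern matched in cba9d1273a can be pictured with scalar code: ADDSUBPS subtracts in the even lanes and adds in the odd lanes, which is the same as blending a full subtract with a full add. The sketch below uses plain arrays rather than intrinsics, and all names are illustrative.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

inline V4 add(const V4 &A, const V4 &B) {
  return {A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3]};
}
inline V4 sub(const V4 &A, const V4 &B) {
  return {A[0] - B[0], A[1] - B[1], A[2] - B[2], A[3] - B[3]};
}

// Lane I is taken from Y when bit I of Mask is set, otherwise from X.
inline V4 blend(const V4 &X, const V4 &Y, unsigned Mask) {
  assert(Mask < 16 && "only four lanes");
  V4 R{};
  for (unsigned I = 0; I < 4; ++I)
    R[I] = (Mask >> I) & 1 ? Y[I] : X[I];
  return R;
}

// ADDSUBPS semantics: subtract in even lanes, add in odd lanes.
inline V4 addsub(const V4 &A, const V4 &B) {
  return {A[0] - B[0], A[1] + B[1], A[2] - B[2], A[3] + B[3]};
}

// The blend-of-sub-and-add form that such a combine would recognise.
inline V4 blendOfSubAndAdd(const V4 &A, const V4 &B) {
  return blend(sub(A, B), add(A, B), 0xA);
}
```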
Juergen Ributzka
c0f00e90d2 [FastISel][AArch64] Add missing test case for previous commit.
This adds the missing test case for the previous commit:
Allow handling of vectors during return lowering for little endian machines.

Sorry for the noise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217847 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 23:47:57 +00:00
Juergen Ributzka
df445d7af2 [FastISel][AArch64] Lower sin/cos/pow to runtime lib calls.
Also lower sin/cos/pow to runtime lib calls.

This fixes rdar://problem/18343468.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217839 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 22:33:06 +00:00
Rafael Espindola
d41a46e942 Add back tests for empty function in SPARC and PowerPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217834 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 22:11:07 +00:00
Juergen Ributzka
323445f706 [FastISel][AArch64] Add lowering support for frem.
This lowers frem to a runtime libcall inside fast-isel.

The test case also checks the CallLoweringInfo bug that was exposed by this
change.

This fixes rdar://problem/18342783.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217833 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 22:07:49 +00:00
Juergen Ributzka
86bdc1efbe [FastISel][AArch64] Improve floating-point compare support.
Add support for the last two missing fcmp condition codes: UEQ and ONE.

This fixes rdar://problem/18341575.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217823 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 20:47:16 +00:00
Reed Kotler
34ad085eec Add mips32 r1 to the list of supported targets for Mips fast-isel
Summary:
Expand list of supported targets for Mips to include mips32 r1.
Previously it only included r2. More patches are coming where there is
a difference but in the current patches as pushed upstream, r1 and r2
are equivalent.

Test Plan:
simplestorefp1.ll

add new build bots at mips to test this flavor at both -O0 and -O2

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D5306

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217821 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 20:30:25 +00:00
NAKAMURA Takumi
85deed0525 llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217806 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 19:00:31 +00:00
Rafael Espindola
d58cb55353 Add a triple to fix the bots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217805 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 18:54:41 +00:00
Rafael Espindola
3f0ce4fa18 Fix a lot of confusion around inserting nops on empty functions.
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.

When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).

Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 18:32:58 +00:00
Quentin Colombet
49e423ca30 [CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!

Thanks Joerg Sonnenberger for the test case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 18:26:58 +00:00
Akira Hatanaka
348e9e7b6d [X86] Fix a bug in X86's peephole optimization.
Peephole optimization was folding MOVSDrm, which is a zero-extending double
precision floating point load, into ADDPDrr, which is a SIMD add of two packed
double precision floating point values.

(before)
%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21

(after)
%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20

X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this
from happening. However the check wasn't being conducted for loads from stack
objects. This commit factors out the logic into a new function and uses it for
checking that loads from stack slots are not zero-extending loads.

rdar://problem/18236850


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217799 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 18:23:52 +00:00
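The miscompile fixed in 348e9e7b6d can be modelled without any intrinsics. MOVSD from memory loads one double and zeroes the upper lane, whereas ADDPD with a memory operand reads both lanes. The sketch below treats a `V2` as 16 bytes of stack memory, and all names are illustrative.

```cpp
#include <array>
#include <cassert>

using V2 = std::array<double, 2>;

// MOVSD from memory: load lane 0 and zero lane 1.
inline V2 movsdLoad(const V2 &Mem) { return {Mem[0], 0.0}; }

// ADDPD with two register operands.
inline V2 addpd(const V2 &A, const V2 &B) { return {A[0] + B[0], A[1] + B[1]}; }

// ADDPD with a folded memory operand: reads the full 16 bytes.
inline V2 addpdMem(const V2 &A, const V2 &Mem) {
  return {A[0] + Mem[0], A[1] + Mem[1]};
}

inline void demoFold() {
  const V2 A{1.0, 2.0}, Slot{5.0, 99.0};  // 99.0 is unrelated data next to the scalar
  assert((addpd(A, movsdLoad(Slot)) == V2{6.0, 2.0}));  // unfolded: lane 1 adds zero
  assert((addpdMem(A, Slot) == V2{6.0, 101.0}));        // folded: lane 1 picks up 99.0
}
```

The two results differ in lane 1, which is why a zero-extending load must not be folded into the packed add.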
Matt Arsenault
f1b16047b7 R600/SI: Prefer selecting more e64 instruction forms.
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217789 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 17:15:02 +00:00
Matt Arsenault
6fc71a0cfc R600/SI: Make sure double vector fmul is tested
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217787 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 17:04:54 +00:00
Matt Arsenault
e626ee51b6 R600/SI: Add some mubuf testcases.
I noticed some odd looking cases where addr64 wasn't set
when storing to a pointer in an SGPR. This seems to be intentional,
and partially tested already.

The documentation seems to describe addr64 in terms of which registers
addressing modifiers come from, but I would expect to always need
addr64 when using 64-bit pointers. If no offset is applied,
it makes sense to not need to worry about doing a 64-bit add
for the final address. A small immediate offset can be applied,
so is it OK to not have addr64 set if a carry is necessary when adding
the base pointer in the resource to the offset?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217785 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 16:48:01 +00:00
Matt Arsenault
d189a0407d R600/SI: Add preliminary support for flat address space
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217777 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 15:41:53 +00:00
Chandler Carruth
c5371836a5 [x86] Begin emitting PBLENDW instructions for integer blend operations
when SSE4.1 is available.

This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.

This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.

Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 12:40:54 +00:00
Chandler Carruth
9277ad2d36 [x86] Add an explicit SSE3 run to this test and flesh out a bunch of
missing specific checks.

While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217765 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 11:40:20 +00:00
Chandler Carruth
2fdec16fbe [x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS
instructions from the relevant shuffle patterns.

This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.

With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217761 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 11:26:25 +00:00
Chandler Carruth
08780d4c1d [x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP
instructions when it finds an appropriate pattern.

These are lovely instructions, and it's a shame to not use them. =] They
are fast, and can have loads folded into their operands, etc.

I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217758 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 11:15:23 +00:00
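For reference, MOVSLDUP duplicates the even elements (shuffle mask 0,0,2,2) and MOVSHDUP duplicates the odd ones (mask 1,1,3,3). The sketch below shows one way a matcher for 08780d4c1d could test a single-input v4f32 mask. Treating -1 as an undef lane that matches anything is an assumption, and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Mask4 = std::array<int, 4>;  // lane -> source element, -1 means undef

// True when every defined lane of M agrees with Pattern.
inline bool matchesPattern(const Mask4 &M, const Mask4 &Pattern) {
  for (int I = 0; I < 4; ++I) {
    assert(M[I] >= -1 && M[I] < 4 && "single-input v4f32 mask expected");
    if (M[I] != -1 && M[I] != Pattern[I])
      return false;
  }
  return true;
}

inline bool isMOVSLDUPMask(const Mask4 &M) {
  return matchesPattern(M, Mask4{0, 0, 2, 2});
}
inline bool isMOVSHDUPMask(const Mask4 &M) {
  return matchesPattern(M, Mask4{1, 1, 3, 3});
}
```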
Chandler Carruth
04402a6c13 [x86] Undo a flawed transform I added to form UNPCK instructions when
AVX is available, and generally tidy up things surrounding UNPCK
formation.

Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.

This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.

This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.

Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217755 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 10:35:41 +00:00
Chandler Carruth
a6cc351c5b [x86] Teach the new vector shuffle lowering to use 'punpcklwd' and
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.

While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.

When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.

Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217752 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 09:02:37 +00:00
Chandler Carruth
e610c324e1 [x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD.
These are super simple. They even take precedence over crazy
instructions like INSERTPS because they have very high throughput on
modern x86 chips.

I still have to teach the integer shuffle variants about this to avoid
so many domain crossings. However, due to the particular instructions
available, that's a touch more complex and so a separate patch.

Also, the backend doesn't seem to realize it can commute blend
instructions by negating the mask. That would help remove a number of
copies here. Suggestions on how to do this welcome, it's an area I'm
less familiar with.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217744 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:43:33 +00:00
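The commutation mentioned in e610c324e1 (swapping the blend operands and negating the mask) can be checked on a scalar model. This sketch assumes the immediate-blend convention, where a set mask bit selects the lane from the second operand. The names are illustrative.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// Lane I comes from B when bit I of Mask is set, otherwise from A.
inline V4 blend(const V4 &A, const V4 &B, unsigned Mask) {
  assert(Mask < 16 && "only four lanes");
  V4 R{};
  for (unsigned I = 0; I < 4; ++I)
    R[I] = (Mask >> I) & 1 ? B[I] : A[I];
  return R;
}

// Mask to use after swapping the two operands: invert the four lane bits.
inline unsigned commuteBlendMask(unsigned Mask) {
  assert(Mask < 16 && "only four lanes");
  return ~Mask & 0xFu;
}
```

With this, `blend(A, B, M)` and `blend(B, A, commuteBlendMask(M))` select the same lanes, so either operand can be the one that gets overwritten.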
NAKAMURA Takumi
0309c5d4bc llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit -mtriple=x86_64-unknown to avoid incompatibility with win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217742 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:39:01 +00:00
Chandler Carruth
59def2bffa [x86] Add an SSE41 mode to this test. Nothing interesting here, it's the
same as SSE3.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217741 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:28:12 +00:00
Chandler Carruth
151a867774 [x86] Switch this test to use an ALL prefix with special SSE2 and SSE3
variants where significant.

This will make it more obvious what is happening when we start using
blends in SSE41.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217740 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:19:37 +00:00
Chandler Carruth
85e37090e8 [x86] Add some test cases where we should emit blendpd in SSE4.1. No
actual change yet though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217739 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 23:15:52 +00:00
Chandler Carruth
33957173a7 [x86] Teach the vector combiner that picks a canonical shuffle form to
support transforming the forms from the new vector shuffle lowering to
use 'movddup' when appropriate.

A bunch of the cases where we actually form 'movddup' don't actually
show up in the test results because something even later than DAG
legalization maps them back to 'unpcklpd'. If this shows back up as
a performance problem, I'll probably chase it down, but it is at least
an encoded size loss. =/

To make this work, also always do this canonicalizing step for floating
point vectors where the baseline shuffle instructions don't provide any
free copies of their inputs. This also causes us to canonicalize
unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space
win.

There is one test which is "regressed" by this: extractelement-load.
There, in the test case where the optimization being tested *fails*, the
exact instruction pattern which results is slightly different. This
should probably be fixed by having the appropriate extract formed
earlier in the DAG, but that would defeat the purpose of the test.... If
this test case is critically important for anyone, please let me know
and I'll try to work on it. The prior behavior was actually contrary to
the comment in the test case and seems likely to have been an accident.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217738 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 22:41:37 +00:00
Matt Arsenault
a0ba49844c R600/SI: Fix broken check lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217736 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-14 18:32:05 +00:00
Juergen Ributzka
5bf1f01c15 [FastISel][AArch64] Add support for non-native types for logical ops.
Extend the logical ops selection to also support non-native types such as i1,
i8, and i16.

Fixes rdar://problem/18330589.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-13 23:46:28 +00:00
Chad Rosier
4fb3a966d0 [AArch64] Enable post-RA MI scheduler.
Phabricator Revision: http://reviews.llvm.org/D5278
Patch by Sanjin Sijaric!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 17:40:39 +00:00
NAKAMURA Takumi
35d9566e68 llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. It was incompatible with Win32 x64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217683 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 15:10:56 +00:00
Bill Schmidt
183704cb08 Address comments on r217622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 14:26:36 +00:00
Benjamin Kramer
66feb63c3c Legalizer: Use the scalar bit width when promoting bit counting instrs on
vectors.

e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fix up
the result by 32 bits, not 64. PR20917.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217671 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 12:50:27 +00:00
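The fix-up described in 66feb63c3c follows from simple arithmetic. Zero-extending an i32 element to i64 adds exactly 64 - 32 = 32 leading zeros, so the promoted count must be reduced by the difference of the scalar widths, not the vector widths. A scalar sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Reference count-leading-zeros on 64 bits; returns 64 for an input of 0.
inline unsigned ctlz64(uint64_t V) {
  unsigned Count = 0;
  for (uint64_t Bit = 1ull << 63; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++Count;
  return Count;
}

// ctlz on an i32 element computed in the promoted i64 type.
inline unsigned ctlz32ViaPromotion(uint32_t V) {
  unsigned Wide = ctlz64(static_cast<uint64_t>(V));
  assert(Wide >= 32 && "zero-extension adds 32 leading zeros");
  return Wide - (64 - 32);  // subtract the scalar width difference
}
```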
Matt Arsenault
86ffcddf42 R600/SI: Fix off by 1 error in used register count
The register numbers start at 0, so if only 1 register
was used, this was reported as 0.
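
A minimal sketch of the counting rule, assuming the reported count is derived from the highest 0-based register index in use; `used_register_count` is a hypothetical name, not the function in the backend.

```python
def used_register_count(used_indices):
    """Number of registers to report for a set of 0-based register indices."""
    if not used_indices:
        return 0
    # Indices start at 0, so the count is the highest index plus one.
    return max(used_indices) + 1

only_first = used_register_count([0])
several = used_register_count([0, 1, 5])
```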

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217636 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 22:51:37 +00:00
Quentin Colombet
dcc0e7eaa1 [CodeGenPrepare] Teach the addressing mode matcher how to promote zext.
I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2.
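
The identity can be checked on bit patterns. The sketch below is a model in plain integers, not the matcher itself; `zext`, `sext`, and `mask` are illustrative helpers, and it assumes `ty` is strictly wider than the type of `a`, so the zero-extended value always has a clear sign bit.

```python
def mask(bits):
    return (1 << bits) - 1

def zext(value, from_bits, to_bits):
    """Zero-extend: keep the low from_bits, upper bits become 0."""
    assert from_bits <= to_bits
    return value & mask(from_bits)

def sext(value, from_bits, to_bits):
    """Sign-extend: replicate bit (from_bits - 1) into the upper bits."""
    assert from_bits <= to_bits
    value &= mask(from_bits)
    sign = 1 << (from_bits - 1)
    return ((value ^ sign) - sign) & mask(to_bits)

# a: i8, ty: i16, ty2: i32
two_step = [sext(zext(a, 8, 16), 16, 32) for a in range(256)]
one_step = [zext(a, 8, 32) for a in range(256)]
```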


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217629 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 21:22:14 +00:00
Bill Schmidt
d4604dca94 Add missing colon to RUN line...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217623 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 20:13:52 +00:00
Bill Schmidt
24ebd0edd6 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 20:10:03 +00:00
Matt Arsenault
e7d5fc9e53 Add triple to test to fix bots
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217612 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 17:50:20 +00:00
Brad Smith
da9bce2e13 Provide an implementation of getNoopForMachoTarget for SPARC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 17:40:51 +00:00
Matt Arsenault
31b1bdbd95 Add DAG combine for shl + add of constants.
Do
 (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.

This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
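
The rewrite is sound because shifting left distributes over addition modulo the register width. A small check, modelling 32-bit wrapping arithmetic with Python integers (the function names are illustrative only):

```python
MASK32 = 0xFFFFFFFF

def shl_of_add(x, c1, c2):
    # (shl (add x, c1), c2)
    return (((x + c1) & MASK32) << c2) & MASK32

def add_of_shl(x, c1, c2):
    # (add (shl x, c2), c1 << c2)
    return (((x << c2) & MASK32) + ((c1 << c2) & MASK32)) & MASK32

samples = [(0, 1, 1), (5, 3, 4), (0xFFFFFFFF, 1, 3), (0x12345678, 0x9ABC, 7)]
pairs = [(shl_of_add(*s), add_of_shl(*s)) for s in samples]
```

Note that `c1 << c2` may be a larger immediate than `c1`, which is the legality concern mentioned above.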

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 17:34:19 +00:00
Adam Nemet
49f31255be [AVX512] Fix miscompile for unpack
r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
between the low and the high 256 bits of src1 into the low part of the
destination and another unpack of the low and high 256 bits of src2 into the
high part of the destination.

I don't think that's how unpack works.  AVX512 unpack simply has more 128-bit
lanes but other than that it works the same way as AVX.  So in each 128-bit lane,
we're always interleaving certain parts of both operands rather than different
parts of one of the operands.

E.g. for this:
__v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
__v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16,
                                    24, 17, 25, 20, 28, 21, 29);

we generated punpcklps (notice how the elements of a and b are not interleaved
in the shuffle).  In turn, c was set to this:

  0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29

Obviously this should have just returned the mask vector of the shuffle
vector.
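
To make the mismatch concrete, here is a small list-based model of the two operations. It is a sketch only: `unpcklps` models the per-128-bit-lane interleave of the low halves of both operands, and `shufflevector` models indexing into the concatenation of `a` and `b`; neither is the backend code.

```python
def unpcklps(a, b, lane_elems=4):
    """Per 128-bit lane, interleave the low half of a with the low half of b."""
    out = []
    for base in range(0, len(a), lane_elems):
        for i in range(lane_elems // 2):
            out.append(a[base + i])
            out.append(b[base + i])
    return out

def shufflevector(a, b, mask):
    """Select elements from the concatenation of a and b by index."""
    both = list(a) + list(b)
    return [both[i] for i in mask]

a = list(range(16))
b = list(range(16, 32))
mask = [0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29]

generated = unpcklps(a, b)            # what the miscompiled code computed
wanted = shufflevector(a, b, mask)    # what the shuffle asks for
```

Because `a` and `b` hold 0..31 in order, the correct result is the mask itself, while the unpack produces the interleaved sequence quoted above.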

I mostly reverted this change and made sure the original AVX code worked
for 512-bit vectors as well.

Also updated the tests because they matched the logic from the code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 16:51:10 +00:00
Sanjay Patel
558cd3c6a2 Add triple and remove hashes to account for buildbot differences in comment strings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217601 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 16:08:44 +00:00
Sanjay Patel
04bb0e721f Combine fmul vector FP constants when unsafe math is allowed.
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

That patch allowed combining of splatted vector FP constants that are multiplied.

This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.

This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.
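
A short illustration of why this fold needs unsafe math: reassociating the multiplies can change the result. The sketch models vectors as Python lists of doubles; the values are chosen for the example and are not taken from the tests.

```python
import math

def fmul(v, w):
    """Element-wise multiply of two equal-length vectors."""
    return [a * b for a, b in zip(v, w)]

x = [1e308, 3.0]
c1 = [2.0, 4.0]     # non-uniform (non-splat) constant vectors
c2 = [0.5, 0.25]

unfolded = fmul(fmul(x, c1), c2)   # (x * c1) * c2
folded = fmul(x, fmul(c1, c2))     # x * (c1 * c2), constants combined first
```

In the first lane the unfolded form overflows to infinity before being scaled back, while the folded form returns `x` unchanged, so the two are only interchangeable when such differences are permitted.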

Differential Revision: http://reviews.llvm.org/D5254



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217599 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:45:27 +00:00
Aaron Watry
1ff44854f6 R600: Test local atomics for evergreen
Now that the operations are all implemented, we can test this sub-arch here.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217595 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 15:02:52 +00:00
Tilmann Scheller
c1df48dde2 [ARM] Add Thumb-2 code size optimization regression test for LSR (register).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217582 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:45:50 +00:00
Tilmann Scheller
171bd26061 [ARM] Add Thumb-2 code size optimization regression test for LSR (immediate).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217581 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:42:17 +00:00
Arnaud A. de Grandmaison
d1c83953b9 [AArch64] Reenable the PBQP test now that the leak issue has been fixed.
David Blaikie's commits r217563 & r217564, which added shared_ptr to the
CostPool have fixed some memory leak issues exposed by the PBQP with
coalescing constraints.

The sanitizer bot was failing because of those leaks. Now that the leaks
are gone, we can reenable the aarch64/pbqp test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217580 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:39:52 +00:00
Tilmann Scheller
993f84c9bc [ARM] Add Thumb-2 code size optimization regression test for LSL (register).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217579 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:33:39 +00:00
Tilmann Scheller
98ff94a759 [ARM] Add Thumb2 code size optimization regression test for LSL (immediate).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:29:42 +00:00
Chandler Carruth
3c1808f628 [x86] Fixup r217565 which baked in an assumption about the function
name that breaks on some platforms. This part of the test just doesn't
matter...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217575 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 10:21:25 +00:00
David Xu
65aac0f8e3 Build correct vector filled with undef nodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 05:10:28 +00:00
Chandler Carruth
463a096177 [x86] FileCheck-ize this test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217565 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 00:13:35 +00:00
Matt Arsenault
5ee5d45e7e R600/SI: Fix losing chain when fixing reg class of loads.
The lost chain resulted in earlier side-effecting nodes
being deleted.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 23:26:19 +00:00
Matt Arsenault
257e85e7c2 R600: Custom lower frem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 21:44:27 +00:00
Arnaud A. de Grandmaison
50196a89d1 [AArch64] Temporarily deactivate the PBQP test, while I investigate some leaks in the allocator
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217531 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 18:40:18 +00:00
Arnaud A. de Grandmaison
438669ca81 [AArch64] Add experimental PBQP support
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.

By default, the PBQP allocator is not used, unless explicitly requested
on the command line with "-aarch64-pbqp".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 14:06:10 +00:00
Asiri Rathnayake
3babc141b2 [AArch64] Use a constant pool load for weak symbol references when
using static relocation model and small code model.

Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be changed so that in static relocation model we use a constant pool
load instead.

Patch from: Keith Walker

Reviewers: Renato Golin, Tim Northover

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217503 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 13:54:38 +00:00
Tim Northover
01dbae1163 ARM: don't size-reduce STMs using the LR register.
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.

Patch by Sergey Dmitrouk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217498 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 12:53:28 +00:00
Job Noorman
77f923cfc1 Drop the W postfix on the 16-bit registers.
This ensures the inline assembly register constraints are properly recognised in
TargetLowering::getRegForInlineAsmConstraint.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217479 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 06:58:14 +00:00
Pavel Chupin
586994a74e [x32] Emit callq for CALLpcrel32
Summary:
In AT&T notation, calls for both x86_64 and x32 should be printed as
callq in assembly. It's only a matter of the correct mnemonic; the object
output is ok.

Test Plan: trivial test added

Reviewers: nadav, dschuff, craig.topper

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5213

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217435 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-09 11:54:12 +00:00
Renato Golin
ccfbbaca3f ARM: Negative offset support problem
This patch is to permit a negative offset usage for a non frame access.

Patch by Igor Oblakov.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217431 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-09 09:57:59 +00:00
Bob Wilson
086832979b Set trunc store action to Expand for all X86 targets.
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack.
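
The overwrite can be modelled with a byte buffer. This is a sketch of the memory effect only, using Python's `struct` module with an assumed layout of two adjacent 4-byte little-endian float slots; it is not how the backend represents stack slots.

```python
import struct

def store_pair(bad):
    buf = bytearray(8)
    struct.pack_into("<f", buf, 4, 7.0)       # neighbouring value in bytes 4..7
    if bad:
        # Store of a full double into the 4-byte float slot: writes 8 bytes.
        struct.pack_into("<d", buf, 0, 1.5)
    else:
        # Proper truncating store: convert to float, write 4 bytes.
        struct.pack_into("<f", buf, 0, 1.5)
    slot = struct.unpack_from("<f", buf, 0)[0]
    neighbour = struct.unpack_from("<f", buf, 4)[0]
    return slot, neighbour

good = store_pair(bad=False)
clobbered = store_pair(bad=True)
```

With the full-width store, the neighbouring slot no longer holds 7.0 and the float slot itself holds the low half of the double's bit pattern rather than 1.5.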

Patch by Luqman Aden!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217410 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-09 01:13:36 +00:00
Hans Wennborg
4cd53531fd Fast-ISel: Remove dead code after falling back from selecting call instructions (PR20863)
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.

Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.

This removes some redundant code at -O0, and more importantly fixes PR20863.

Differential Revision: http://reviews.llvm.org/D5249

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217401 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-08 20:24:10 +00:00
Matt Arsenault
ef4bb30475 R600/SI: Replace LDS atomics with no return versions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217379 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-08 15:07:31 +00:00
Chad Rosier
b30d031de4 [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-08 14:43:48 +00:00
Chandler Carruth
8ceea90956 [x86] Revert my over-eager commit in r217332.
I hadn't actually run all the tests yet and these combines have somewhat
surprisingly far reaching effects.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217333 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-07 12:37:11 +00:00
Chandler Carruth
e328c5ea83 [x86] Tweak the rules surrounding 0,0 and 1,1 v2f64 shuffles and add
support for MOVDDUP which is really important for matrix multiply style
operations that do lots of non-vector-aligned load and splats.

The original motivation was to add support for MOVDDUP as the lack of it
regresses matmul_f64_4x4 by 5% or so. However, all of the rules here
were somewhat suspicious.

First, we should always be using the floating point domain shuffles,
regardless of how many copies we have to make as a movapd is *crazy*
faster than the domain switching cost on some chips. (Mostly because
movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick
of the PSHUF instructions, there is no need to avoid canonicalizing on
UNPCK variants, so do that canonicalizing. This also ensures we have the
chance to form MOVDDUP. =]

Second, we assume SSE2 support when doing any vector lowering, and given
that we should just use UNPCKLPD and UNPCKHPD as they can operate on
registers or memory. If vectors get spilled or come from memory at all
this is going to allow the load to be folded into the operation. If we
want to optimize for encoding size (the only difference, and only
a 2 byte difference) it should be done *much* later, likely after RA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-07 12:02:14 +00:00
Matt Arsenault
6a712e709d R600/SI: Relax a few tests to help enable scheduler
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217320 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-06 20:44:41 +00:00
Matt Arsenault
360ed46f68 R600/SI: Fix broken check lines.
Fix missing check, and hardcoded register numbers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217318 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-06 20:37:56 +00:00
Chandler Carruth
7cd7154421 [x86] Fix a pretty horrible bug and inconsistency in the x86 asm
parsing (and latent bug in the instruction definitions).

This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.

The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:

  insertps $192, %xmm0, %xmm1
  insertps $-64, %xmm0, %xmm1

These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.

The reason why the first instruction failed to be handled is because
prior to r136287 the operands were marked 'i32i8imm' which forces them to
be sign-extendable. Clearly, that won't work for 192 in a single byte.
However, making them zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.
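
The point about 192 and -64 can be checked with plain integer arithmetic. The helper names below are hypothetical and only model the range checks described above, not the assembler's operand classes.

```python
def encode_imm8(value):
    """Encode an immediate into one byte, accepting signed or unsigned forms."""
    if not -128 <= value <= 255:
        raise ValueError("immediate does not fit in 8 bits")
    return value & 0xFF

def fits_sign_extended_i8(value):
    # The constraint described for 'i32i8imm': must be a sign-extended byte.
    return -128 <= value <= 127

def fits_zero_extended_i8(value):
    # An "unsigned only" constraint.
    return 0 <= value <= 255

same_byte = encode_imm8(192) == encode_imm8(-64)
```

Here 192 fails the sign-extended check and -64 fails the zero-extended check, while a plain 8-bit immediate accepts both and yields the byte 0xC0 in either case.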

Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.

The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.

In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.

I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217310 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-06 10:00:01 +00:00
Chandler Carruth
469c73bc27 [x86] Fix an embarrassing bug in the INSERTPS formation code. The mask
computation was totally wrong, but somehow it didn't really show up with
llc.

I've added an assert that triggers on multiple existing test cases and
updated one of them to show the correct value.

There appear to still be more bugs lurking around insertps's mask. =/
However, note that this only really impacts the new vector shuffle
lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217289 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 23:19:45 +00:00
Sanjay Patel
52af82df95 Allow vector fsub ops with constants to get the same optimizations as scalars.
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217286 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 22:26:22 +00:00
Rafael Espindola
eaa85e2027 Revert "Disable the fix for pr20793 because of a gnu ld bug."
This reverts commit r217211.

Both the bfd ld and gold outputs were valid. They were using a Rela relocation,
so the value present in the relocated location was not used, which caused me
to misread the output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217264 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 18:03:38 +00:00
Matt Arsenault
89a7e3ec3e R600/SI: Use same complex patterns for DS atomics
This fixes hitting the same negative base offset problem
that was already fixed for regular loads and stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217256 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 16:24:58 +00:00
Jan Vesely
286f644bce R600: Fix FROUND
round halfway cases away from zero
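
For reference, a scalar sketch of the rounding rule named here (ties go away from zero), written independently of the R600 lowering. `round_half_away` is an illustrative name, and the sketch assumes a finite input.

```python
import math

def round_half_away(x):
    """Round to the nearest integer; exact halfway cases go away from zero."""
    t = math.trunc(x)              # integer part, rounded toward zero
    if abs(x - t) >= 0.5:          # the fractional part is exact for doubles
        t += 1 if x > 0 else -1
    return float(t)

ties = [round_half_away(v) for v in (0.5, 1.5, 2.5, -0.5, -2.5)]
others = [round_half_away(v) for v in (2.4, -2.4, 2.6, -2.6)]
```

This differs from round-half-to-even, which Python's built-in `round` uses and which would map 2.5 to 2.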

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217250 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 14:26:54 +00:00
Tom Stellard
7cda2d0666 R600/SI: Use S_ADD_U32 and S_SUB_U32 for low half of 64-bit operations
https://bugs.freedesktop.org/show_bug.cgi?id=83416
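
As background, a 64-bit add split into 32-bit halves looks like the sketch below: the low half is an unsigned 32-bit add that produces a carry, and the high half consumes it. This models the arithmetic only and makes no claim about the exact instruction sequence emitted.

```python
M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF

def add64_via_halves(a, b):
    lo = (a & M32) + (b & M32)                 # unsigned add of the low halves
    carry = lo >> 32                           # 0 or 1
    hi = ((a >> 32) + (b >> 32) + carry) & M32 # high halves plus carry-in
    return (hi << 32) | (lo & M32)

cases = [(1, 2), (0xFFFFFFFF, 1), (0x1_FFFFFFFF, 0x2_00000001), (M64, 1)]
results = [add64_via_halves(a, b) for a, b in cases]
```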

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217248 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 14:07:59 +00:00
Chandler Carruth
c1c5dcf069 [x86] Factor out the zero vector insertion logic in the new vector
shuffle lowering for integer vectors and share it from v4i32, v8i16, and
v16i8 code paths.

Ironically, the SSE2 v16i8 code for this is now better than the SSSE3!
=] Will have to fix the SSSE3 code next to just using a single pshufb.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217240 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 10:36:31 +00:00
Jiangning Liu
b20b9bf9fd [AArch64] Add pass to enable additional comparison optimizations by CSE.
Patch by Sergey Dmitrouk.

This pass tries to make consecutive compares of values use same operands to
allow CSE pass to remove duplicated instructions. For this it analyzes
branches and adjusts comparisons with immediate values by converting:

GE -> GT
GT -> GE
LT -> LE
LE -> LT

and adjusting immediate values appropriately. It basically corrects two
immediate values towards each other to make them equal.
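
The four conversions preserve the comparison result for integers when the immediate moves by one in the right direction. The sketch below checks that on a small range; it ignores overflow at the ends of the immediate range and any encoding limits, and the table and helper names are made up for the illustration.

```python
import operator

# (new condition, immediate delta) for each original condition.
ADJUST = {"GE": ("GT", -1), "GT": ("GE", +1), "LT": ("LE", -1), "LE": ("LT", +1)}
COMPARE = {"GE": operator.ge, "GT": operator.gt, "LT": operator.lt, "LE": operator.le}

def adjust(cond, imm):
    new_cond, delta = ADJUST[cond]
    return new_cond, imm + delta

def evaluate(cond, x, imm):
    return COMPARE[cond](x, imm)

# "x > 9" becomes "x >= 10", so it can share a compare with "x < 10".
rewritten = adjust("GT", 9)
equivalent = all(
    evaluate(cond, x, imm) == evaluate(*adjust(cond, imm)[:1], x, adjust(cond, imm)[1])
    for cond in ADJUST
    for imm in range(-8, 9)
    for x in range(-16, 17)
)
```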



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217220 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 02:55:24 +00:00
Rafael Espindola
6cf4a0f506 Disable the fix for pr20793 because of a gnu ld bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-05 00:14:12 +00:00
Rafael Espindola
295a0088db Fix pr20793.
With this patch the third field of llvm.global_ctors is also used on ELF.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 23:03:58 +00:00
Tim Northover
8dcac5d77a AArch64: fix vector-immediate BIC/ORR on big-endian devices.
Follow up to r217138, extending the logic to other NEON-immediate instructions.
As before, the instruction already performs the correct operation and we're
just using a different type for convenience, so we want a true nop-cast.

Patch by Asiri Rathnayake.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217159 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 15:05:24 +00:00
Tim Northover
dfe4e3e706 AArch64: fix big-endian immediate materialisation
We were materialising big-endian constants using DAG nodes with types different
from what was requested, followed by a bitcast. This is fine on little-endian
machines where bitcasting is a nop, but we need a slightly different
representation for big-endian. This adds a new set of NVCAST (natural-vector
cast) operations which are always nops.

Patch by Asiri Rathnayake.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217138 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 09:46:14 +00:00
Chandler Carruth
ae98867126 [x86] Teach the new v4i32 shuffle lowering some more tricks to recognize
vzext patterns and insert-element patterns that for SSE4 have dedicated
instructions.

With this we can enable the experimental mode in a regression test that
happens to cover some of the past set of issues. You can see that the
new logic does significantly better here on the floating point cases.

A follow-up to this change and the previous ones will hoist the logic
into helpers so it can be shared across element type sizes as in this
particular case it generalizes cleanly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217136 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 09:26:30 +00:00
Juergen Ributzka
cd72c216cd Revert r216803 "[MachineSinking] Clear kill flag of all operands at all their uses."
This reverts commit r216803, because it might have broken the buildbot.
The issue is tracked in PR20842.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 02:07:36 +00:00
Juergen Ributzka
68a4ab08b3 [FastISel][AArch64] Add target-specific lowering for logical operations.
This change adds support for immediate and shift-left folding into logical
operations.

This fixes rdar://problem/18223183.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217118 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 01:29:18 +00:00
Chandler Carruth
fa2dfaedf2 [x86] Teach the new vector shuffle lowering about the zero masking
abilities of INSERTPS which are really powerful and come up in very
important contexts such as forming diagonal matrices, etc.

With this I ended up being able to remove the somewhat weird helper
I added for INSERTPS because we can collapse the entire state to a no-op
mask. Added a bunch of tests for inserting into a zero-ish vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217117 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-04 01:13:48 +00:00
Matt Arsenault
c9cc488dfe R600/SI: Try to keep i32 mul on SALU
Also fix bug this exposed where when legalizing an immediate
operand, a v_mov_b32 would be created with a VSrc dest register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217108 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 23:24:35 +00:00
Chandler Carruth
699fd1909e [x86] Teach the new vector shuffle lowering about the simplest of
'insertps' patterns.

This replaces two shuffles with a single insertps in very common cases.
My next patch will extend this to leverage the zeroing capabilities of
insertps which will allow it to be used in a much wider set of cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217100 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 22:48:34 +00:00
Chandler Carruth
36cf5d68be [x86] Add an SSE4.1 mode to this test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217072 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 20:39:06 +00:00
Chandler Carruth
87508f1d87 [x86] Make this test check everything for both SSE2 and AVX1 modes,
using a common 'all' prefix for the common test output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217063 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 19:39:10 +00:00
Lang Hames
07ad198d6c Add a regression test to sanity check the PBQP allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217057 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 18:04:10 +00:00
Tom Stellard
ce4caf146f R600/SI: Add a pattern for i64 and in a branch
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217041 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 15:22:41 +00:00
Renato Golin
218805d21d Check-label a bit more specific
Sometimes, the .file could be reordered and it'd identify the ldr in the filename as a bad match.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217037 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 13:32:08 +00:00
Alexander Potapenko
fac68d2d70 Fix PR20800: correctly calculate the offset of the subq instruction when generating compact unwind info.
This CL replaces the constant DarwinX86AsmBackend.PushInstrSize with a method
that lets the backend account for different sizes of "push %reg"
instructions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217020 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 07:11:34 +00:00
Juergen Ributzka
847547086d Reapply r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR."
This reapplies r216805 with a fix to a copy-paste error, which resulted in an
incorrect register class.

Original commit message:
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.

This fixes rdar://problem/18183707.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 07:07:10 +00:00
Juergen Ributzka
dd7a7107c1 [FastISel][AArch64] Add target-dependent instruction selection for Add/Sub.
There is already target-dependent instruction selection support for Adds/Subs to
support compares and the intrinsics with overflow check. This takes advantage of
the existing infrastructure to also support Add/Sub, which allows the folding of
immediates, sign-/zero-extends, and shifts.

This fixes rdar://problem/18207316.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217007 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 01:38:36 +00:00
Renato Golin
418103c4d4 Missing test from r216989
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216990 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:46:18 +00:00
Renato Golin
ddcf3bd0a0 Only emit movw on ARMv6T2+
Fix PR18364.

Patch by Dimitry Andric.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216989 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:45:13 +00:00
Juergen Ributzka
79ec2ed417 [FastISel][AArch64] Use the target-dependent selection code for shifts first.
This uses the target-dependent selection code for shifts first, which allows us
to create better code for shifts with immediates and sign-/zero-extend folding.

Vector types are not handled yet and the code falls back to target-independent
instruction selection for these cases.

This fixes rdar://problem/17907920.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216985 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:33:57 +00:00
Robin Morisset
76b55cc4b1 [X86] Allow atomic operations using immediates to avoid using a register
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.

Similarly, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. And the same applies to inc/dec.

This second point matches the first issue identified in
  http://llvm.org/bugs/show_bug.cgi?id=17281

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216980 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:16:29 +00:00
Matt Arsenault
2aab51a118 R600/SI: Relax some ordering in tests.
This will help with enabling misched

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216971 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 21:45:50 +00:00
Matt Arsenault
9c21df64a4 R600/SI: Fix hardcoded register numbers in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216944 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 20:43:07 +00:00
Matt Arsenault
f471c483e6 R600/SI: Add failing testcase.
This is broken when 64-bit add is only partially
moved to the VALU.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216933 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 19:12:31 +00:00
Matt Arsenault
f7a3c7e705 Fix interference caused by fmul 2, x -> fadd x, x
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216932 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 19:02:53 +00:00
Reid Kleckner
f93099eb1c CodeGen: Handle va_start in the entry block
Also fix a small copy-paste bug in X86ISelLowering where Chain should
have been used in place of DAG.getEntryToken().

Fixes PR20828.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 18:42:44 +00:00
Hal Finkel
2633f795c6 Enable splitting indexing from loads with TargetConstants
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216908 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 16:05:23 +00:00
Rafael Espindola
1e556a80ff Replace -use-init-array with -use-ctors.
We have been using .init-array for most systems for quite some time,
but tools like llc are still defaulting to .ctors because the old
option was never changed.

This patch makes llc default to .init-array and changes the option to
be -use-ctors.

Clang is not affected by this. It has its own fancier logic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216905 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 13:54:53 +00:00
David Xu
4e2b661005 Merge Extend and Shift into a UBFX
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 09:33:56 +00:00
Hal Finkel
3da41a28a1 Revert "Revert '[DAGCombiner] Split up an indexed load if only the base pointer value is live'"
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.

This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.

Unfortunately, I don't yet have a small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)

Original commit message (by Adam Nemet):

Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.

This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part).  See the testcase.

In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off.  This is the
CommitTargetLoweringOpt piece.

I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.

Fixes <rdar://problem/16031651>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216898 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 06:24:04 +00:00
Jingyue Wu
88350bf61d Fix a typo in comments in r216862, NFC
PR20766 -> PR20776. Thanks Roman Divacky for the catch!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216883 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 14:55:04 +00:00
Tilmann Scheller
4016a9ea4a [ARM] Add Thumb-2 code size optimization regression test for EOR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216881 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 12:59:34 +00:00
Tilmann Scheller
9e6f09d7ce [ARM] Add Thumb-2 code size optimization regression test for BIC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216880 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 12:53:29 +00:00
Jingyue Wu
d43e6df10b [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.

Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.

Reviewers: eliben, meheff, Jiangning

Reviewed By: Jiangning

Subscribers: jholewinski, aemerson, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4814



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-01 03:47:25 +00:00
Juergen Ributzka
bcbae3d680 Revert r216805 "[MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR."
I think this broke the build bot. Reverting it for now until I have time to take a closer look.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216813 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-30 06:16:26 +00:00
Juergen Ributzka
4e92383b67 [MachineCombiner][AArch64] Use the correct register class for MADD, SUB, and OR.
Select the correct register class for the various instructions that are
generated when combining instructions and constrain the registers to the
appropriate register class.

This fixes rdar://problem/18183707.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216805 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 23:48:09 +00:00
Juergen Ributzka
e7f301e079 [FastISel][AArch64] Use the correct register class for branches.
Also constrain the register class for branches.

This fixes rdar://problem/18181496.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216804 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 23:48:06 +00:00
Juergen Ributzka
bc420a0cc1 [MachineSinking] Clear kill flag of all operands at all their uses.
When sinking an instruction it might be moved past the original last use of one
of its operands. This last use has the kill flag set and the verifier will
obviously complain about this.

Before Machine Sinking (AArch64):
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...

After Machine Sinking:
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>

This fix clears all the kill flags in all instructions that use the same operands
as the instruction that is being sunk.
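
A toy model of that fix, for illustration only. The Operand and Instr classes
and the sink helper are hypothetical and do not mirror the real
MachineInstr API; the sketch assumes the flag is also cleared on the sunk
instruction's own operands.

```python
from dataclasses import dataclass

@dataclass
class Operand:
    reg: str
    kill: bool = False

@dataclass
class Instr:
    opcode: str
    defs: list
    uses: list  # list of Operand

def sink(block, index, new_index):
    """Move block[index] to new_index, clearing kill flags on its operands."""
    sunk_regs = {op.reg for op in block[index].uses}
    # Conservatively clear the kill flag at every use of any register
    # that the sunk instruction reads.
    for instr in block:
        for op in instr.uses:
            if op.reg in sunk_regs:
                op.kill = False
    block.insert(new_index, block.pop(index))
    return block

block = [
    Instr("ASRVXr", ["vreg3"], [Operand("vreg1"), Operand("vreg2", kill=True)]),
    Instr("SUBSXrs", ["XZR"], [Operand("vreg4"), Operand("vreg1", kill=True)]),
]
sink(block, 0, 1)
print([i.opcode for i in block])
```

After the call, SUBSXrs comes first and no longer marks vreg1 as killed, so
the later ASRVXr can still read it.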

This fixes rdar://problem/18180996.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216803 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 23:48:03 +00:00
Robin Morisset
217b38e19a Fix typos in comments, NFC
Summary: Just fixing comments, no functional change.

Test Plan: N/A

Reviewers: jfb

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D5130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216784 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:53:01 +00:00
Reid Kleckner
9436574d1b musttail: Forward regparms of variadic functions on x86_64
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).

Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.

Reviewers: majnemer

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5060

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216780 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:42:08 +00:00
Reid Kleckner
dae28732f4 Verifier: Don't reject varargs callee cleanup functions
We've rejected these kinds of functions since r28405 in 2006 because
it's impossible to lower the return of a callee cleanup varargs
function. However there are lots of legal ways to leave such a function
without returning, such as aborting. Today we can leave a function with
a musttail call to another function with the correct prototype, and
everything works out.

I'm removing the verifier check declaring that a normal return from such
a function is UB.

Reviewed By: nlewycky

Differential Revision: http://reviews.llvm.org/D5059

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216779 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:25:28 +00:00
Louis Gerbarg
6393b3a677 Remove spurious mask operations from AArch64 add->compares on 16 and 8 bit values
This patch checks for DAG patterns that are an add or a sub followed by a
compare on 16 and 8 bit inputs. Since AArch64 does not support those types
natively they are legalized into 32 bit values, which means that mask operations
are inserted into the DAG to emulate overflow behaviour. In many cases those
masks do not change the result of the processing and just introduce a dependent
operation, often in the middle of a hot loop.

This patch detects the relevant DAG patterns and then tests to see if the
transforms are equivalent with and without the mask, removing the mask if
possible. The exact mechanism of this patch was discussed in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074444.html
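
As a rough illustration of what "equivalent with and without the mask" means,
the sketch below brute-forces the question for an equality compare on a
zero-extended narrow input. It is not the patch's algorithm; the function name
mask_is_redundant and the restriction to equality compares are assumptions
made for the example.

```python
def mask_is_redundant(add_const: int, cmp_const: int, bits: int = 8) -> bool:
    """Check whether masking after the add can change an equality compare."""
    mask = (1 << bits) - 1
    for a in range(1 << bits):                 # every zero-extended input
        wide = (a + add_const) & 0xFFFFFFFF    # the add as done in 32 bits
        if ((wide & mask) == cmp_const) != (wide == cmp_const):
            return False                       # the mask matters for this input
    return True

print(mask_is_redundant(1, 5))   # a + 1 == 5 has the same answer either way
print(mask_is_redundant(1, 0))   # a = 255 wraps to 0 only when masked
```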

There is a reasonably good chance there are missed opportunities due to similar
(but not identical) DAG patterns that could be funneled into this test, adding
them should be simple if we see test cases.

Tests included.

rdar://13754426

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216776 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 21:00:22 +00:00
Reid Kleckner
1469e29334 X86: Fix conflict over ESI between base register and rep;movsl
The new solution is to not use this lowering if there are any dynamic
allocas in the current function. We know up front if there are dynamic
allocas, but we don't know if we'll need to create stack temporaries
with large alignment during lowering. Conservatively assume that we will
need such temporaries.

Reviewed By: hans

Differential Revision: http://reviews.llvm.org/D5128

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216775 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 20:50:31 +00:00
Juergen Ributzka
d8835d09ec [FastISel][AArch64] Fix an incorrect kill flag due to a bug in SelectTrunc.
When we select a trunc instruction we don't emit any code if the type is already
i32 or smaller. This is because the instruction that uses the truncated value
will deal with it.

This behavior can incorrectly transfer a kill flag, which was meant for the
result of the truncate, onto the source register.

%2 = trunc i32 %1 to i16
... = ... %2                -> ... = ... vreg1 <kill>
... = ... %1                   ... = ... vreg1

This commit fixes this by emitting a COPY instruction, so that the result and
source register are distinct virtual registers.

This fixes rdar://problem/18178188.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216750 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 17:58:16 +00:00
Tilmann Scheller
59758c4337 [ARM] Add Thumb-2 code size optimization test for ASR (register).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216746 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 17:19:00 +00:00
Tilmann Scheller
b1424d72ca [ARM] Add Thumb-2 code size optimization test for ASR (immediate).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216744 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 17:02:28 +00:00
Matt Arsenault
f4d57e7874 R600/SI: Use mad for fsub + fmul
We can use a negate source modifier to match
this for fsub.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216735 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 16:01:14 +00:00
Tim Northover
1e77dc84c4 AArch64: only try to get operand of a known node.
A bug in r216725 meant we tried to discover the type of a SETCC before
confirming the node actually was a SETCC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216734 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 15:34:58 +00:00
Jingyue Wu
87a2b36cf6 [NVPTX] Make the alignment an explicit argument to ldu/ldg
Summary:
Instead of specifying the alignment as metadata which may be destroyed by
transformation passes, make the alignment the second argument to ldu/ldg
intrinsic calls.

Test Plan:
ldu-ldg.ll
ldu-i8.ll
ldu-reg-plus-offset.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: meheff, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D5093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216731 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 15:30:20 +00:00
Tilmann Scheller
c5484a2704 [ARM] Make Thumb-2 code size optimization test more strict.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216729 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 15:13:35 +00:00
Tilmann Scheller
f238c1844e [ARM] Add a first test for the Thumb-2 code size optimization pass.
While working on a Thumb-2 code size optimization I just realized that we don't have any regression tests for it.

So here's a first test case, I plan to increase the coverage over time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216728 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 15:04:40 +00:00
Tim Northover
1f70cb9c14 AArch64: skip select/setcc combine in complex case.
In an llvm-stress generated test, we were trying to create a v0iN type and
asserting when that failed. This case could probably be handled by the
function, but not without added complexity and the situation it arises in is
sufficiently odd that there's probably no benefit anyway.

Should fix PR20775.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216725 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 13:05:18 +00:00
Robert Khasanov
37e671e894 [SKX] Enable lowering of integer CMP operations.
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.

Reviewed by Elena Demikhovsky.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 08:46:04 +00:00
Job Noorman
d2323fc295 Do not assume the value passed to memset is an i32.
The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.

This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216716 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 08:23:53 +00:00
Jiangning Liu
3cd73a5ded [AArch64] Fix some failures exposed by value type v4f16 and v8f16.
1) Add some missing bitcast patterns for v8f16.
2) Add type promotion for operand of ld/st operations.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 01:31:42 +00:00
Juergen Ributzka
cf45151b2c [FastISel][AArch64] Don't fold instructions that are not in the same basic block.
This fix checks first if the instruction to be folded (e.g. sign-/zero-extend,
or shift) is in the same machine basic block as the instruction we are folding
into.

Not doing so can result in incorrect code, because the value might not be
live-out of the basic block, where the value is defined.

This fixes rdar://problem/18169495.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216700 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 00:19:21 +00:00
Jim Grosbach
0d34b1ed26 AArch64: More correctly constrain target vector extend lowering.
The AArch64 target lowering for [zs]ext of vectors is set up to handle
input simple types and expects the generic SDag path to do something reasonable
with anything that's not a simple type. The code, however, was only
checking that the result type was a simple type and assuming that
implied that the source type would also be a simple type. That's not a
valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>"
demonstrate. The fix is to simply explicitly validate the source type
as well as the result type.

PR20791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216689 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 22:08:28 +00:00
Rafael Espindola
4bb535027e On MachO, don't put non-private constants in mergeable sections.
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the contents of the
section.

Since private GVs are the ones that get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216682 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 20:13:31 +00:00
Sanjay Patel
cf9661c6f9 Fix a logic bug in x86 vector codegen: sext (zext (x) ) != sext (x) (PR20472).
Remove a block of code from LowerSIGN_EXTEND_INREG() that was added with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=177421

And caused:
http://llvm.org/bugs/show_bug.cgi?id=20472 (more analysis here)
http://llvm.org/bugs/show_bug.cgi?id=18054

The testcases confirm that we (1) don't remove a zext op that is necessary and (2) generate
a pmovz instead of punpck if SSE4.1 is available. Although pmovz is 1 byte longer, it allows 
folding of the load, and so saves 3 bytes overall.
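
The inequality in the title can be seen with plain integer arithmetic. The
zext and sext helpers below are written for this example and model i8 and i16
values as Python ints; they are not LLVM functions.

```python
def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low from_bits bits, upper bits become zero."""
    return value & ((1 << from_bits) - 1)

def sext(value: int, from_bits: int) -> int:
    """Sign-extend the low from_bits bits to a signed Python int."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign

x = 0xFF                       # an i8 with the sign bit set
print(sext(x, 8))              # sext i8 directly: prints -1
print(sext(zext(x, 8), 16))    # zext to i16 first, then sext: prints 255
```

Dropping the inner zext therefore changes the result whenever the narrow
value has its top bit set.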

Differential Revision: http://reviews.llvm.org/D4909



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216679 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 18:59:22 +00:00
David Xu
5ca793561e Generate CMN when comparing a short int with minus
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216651 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 04:59:53 +00:00
Chandler Carruth
da3a293313 [x86] Clean up some tests to use FileCheck and combine two into a single
file.

Changing code that is covered by these tests is just too hard to debug
currently, and now it will be clear the nature of the changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216643 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 03:41:28 +00:00
Juergen Ributzka
4a76317ebb [FastISel] Undo phi node updates when falling-back to SelectionDAG.
The included test case would fail, because the MI PHI node would have two
operands from the same predecessor.

This problem occurs when a switch instruction couldn't be selected, which always
happens because there is no default switch support for FastISel to begin with.

The problem was that FastISel would first add the operand to the PHI nodes and
then fall-back to SelectionDAG, which would then in turn add the same operands
to the PHI nodes again.

This fix removes these duplicate PHI node operands by resetting the
PHINodesToUpdate to its original state before FastISel tried to select the
instruction.
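
The checkpoint-and-rollback idea can be sketched as follows. This is a
simplified model and not the FastISel code: select_with_fallback, failing_fast
and slow are hypothetical names, and the PHI update list is modelled as a
plain Python list.

```python
def select_with_fallback(phi_updates, fast_select, slow_select, instr):
    """Try fast_select; on failure undo its PHI updates, then run slow_select."""
    checkpoint = len(phi_updates)      # state before the fast path runs
    if fast_select(instr, phi_updates):
        return "fast"
    del phi_updates[checkpoint:]       # drop operands the fast path added
    slow_select(instr, phi_updates)
    return "slow"

def failing_fast(instr, updates):
    updates.append(("phi0", "bb1"))    # records an operand, then gives up
    return False

def slow(instr, updates):
    updates.append(("phi0", "bb1"))    # the fallback records it again

updates = []
path = select_with_fallback(updates, failing_fast, slow, "switch")
print(path, updates)
```

Without the rollback the list would hold the same (PHI, predecessor) pair
twice, which corresponds to the duplicate operand described above.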

This fixes <rdar://problem/18155224>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 02:06:55 +00:00
Juergen Ributzka
d24494d672 [FastISel]
Currently instructions are folded very aggressively for AArch64 into the memory
operation, which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

This fix teaches hasTrivialKill to not only check the LLVM IR that the value has
a single use, but also to check if the register that represents that value has
already been used. This can happen when the instruction with the use was folded
into another instruction (in this particular case a load instruction).

This fixes rdar://problem/18142857.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 00:09:46 +00:00
Juergen Ributzka
a26b1bdcc8 Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation."
Quentin pointed out that this is not the correct approach and there is a better and easier solution.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 23:09:40 +00:00
Juergen Ributzka
1f5263e43f [FastISel][AArch64] Don't fold instructions too aggressively into the memory operation.
Currently instructions are folded very aggressively into the memory operation,
which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

If the computed address is used by only memory operations in the same basic
block, then it is safe to fold them. This is because all memory operations will
fold the address computation and the original computation will never be emitted.

This fixes rdar://problem/18142857.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216629 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 22:52:33 +00:00
Juergen Ributzka
ccf53013cd [FastISel][AArch64] Fix simplify address when the address comes from a shift.
When the address comes directly from a shift instruction then the address
computation cannot be folded into the memory instruction, because the zero
register is not available as a base register. Simplify address needs to emit the
shift instruction and use the result as base register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 21:38:33 +00:00
Juergen Ributzka
d445e4acdb [FastISel][AArch64] Use the zero register for stores.
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216617 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 21:04:52 +00:00
Oliver Stannard
5e487f8dc7 Teach the AArch64 backend about v4f16 and v8f16
This teaches the AArch64 backend to deal with the operations on v4f16 and
v8f16 that are exposed by NEON intrinsics, plus the add, sub, mul and div
operations.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216555 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 16:16:04 +00:00
Chandler Carruth
7e3dc40fab [x86] Fix a regression introduced with r213897 for 32-bit targets where
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.

This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:39:47 +00:00
Chandler Carruth
963a5e6c61 [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.

However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by producing bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.

The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least makes the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.

Original commit message for r215611 which covers the rest of the change:
  [SDAG] Fix a case where we would iteratively legalize a node during
  combining by replacing it with something else but not re-process the
  node afterward to remove it.

  In a truly remarkable stroke of bad luck, this would (in the test case
  attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition
  to deleting the dead nodes more quickly we also ensure that if it
  *stops* being dead for any reason it makes it back through the
  legalizer. Without this, the test case will end up failing during
  instruction selection due to an and node with a type we don't have an
  instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:22:16 +00:00
Elena Demikhovsky
fe0c6ead85 AVX-512: Added intrinsic for VMOVSS store form with mask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216530 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 07:38:43 +00:00
Juergen Ributzka
fc03e72b4f [FastISel][AArch64] Fix address simplification.
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix now also folds the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.

This fixes rdar://problem/18141718.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216511 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 00:58:30 +00:00
Juergen Ributzka
836f4bd090 [FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 00:58:26 +00:00
Yi Kong
2282afa6cc ARM: Add patterns for dbg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-26 12:47:26 +00:00
Chad Rosier
373fc00835 [AArch32] Add patterns for VCVT{A,N,P,M}.
Patterns for lowering libm calls to VCVT{A,N,P,M} are also included.
Phabricator Revision: http://reviews.llvm.org/D5033

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216388 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 16:56:33 +00:00
Hal Finkel
7ca2a7d742 [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 23:21:04 +00:00
Chad Rosier
8eb867e97d Revert "ARM: improve RTABI 4.2 conformance on Linux"
This reverts commit r215862 due to nightly failures.  Will work on getting a
reduced test case, but I wanted to get our bots green in the meantime.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216325 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 18:29:43 +00:00
Chandler Carruth
bfed08e41f [x86] Start fixing a really subtle and terrible form of miscompile in
these DAG combines.

The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing
a node with its operand, you can cause its uses to CSE to itself, which
then causes their uses to become your uses which causes them to be
picked up by the RAUW. For nodes that are determined to be "no-ops",
this is "fine". But if the RAUW is one of several steps to enact
a transformation, this causes the DAG to really silently eat and discard
nodes that you would never expect. It took days for me to actually
pinpoint a test case triggering this and a really frustrating amount of
time to even comprehend the bug because I never even thought about the
ability of RAUW to iteratively consume nodes due to CSE-ing them into
itself.

To fix this, we have to build up a brand-new chain of operations any
time we are combining across (potentially) intervening nodes. But once
the logic is added to do this, another issue surfaces: CombineTo eagerly
deletes the one node combined, *but no others*. This is... really
frustrating. If deleting it makes its operands become dead, those
operand nodes often won't go onto the worklist in the
order you would want -- they're already on it and not near the top. That
means things higher on the worklist will get combined prior to these
dead nodes being GCed out of the worklist, and if the chain is long, the
immediate users won't be enough to re-detect where the root of the chain
is that became single-use again after deleting the dead nodes. The
better way to do this is to never immediately delete nodes, and instead
to just enqueue them so we can recursively delete them. The
combined-from node is typically not on the worklist anyways by virtue of
having been popped off.... But that in turn breaks other tests that
*require* CombineTo to delete unused nodes. :: sigh ::

Fortunately, there is a better way. This whole routine should have been
returning the replacement rather than using CombineTo which is quite
hacky. Switch to that, and all the pieces fall together.

I suspect the same kind of miscompile is possible in the half-shuffle
folding code, and potentially the recursive folding code. I'll be
switching those over to a pattern more like this one for safety's sake
even though I don't immediately have any test cases for them. Note that
the only way I got a test case for this instance was with *heavily* DAG
combined 256-bit shuffle sequences generated by my fuzzer. ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216319 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 10:25:15 +00:00
Nick Lewycky
f591b9c33e Revert r215611 because it caused the infinite loop in bug 20736. There is a reduced testcase in that bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216307 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 00:45:03 +00:00
Reid Kleckner
d89c0abc07 ARM / x86_64 varargs: Don't save regparms in prologue without va_start
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.

Most of the test suite changes are adding va_start calls to existing
tests to keep things working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 21:59:26 +00:00
Tom Stellard
f50f927d65 R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216279 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:49:35 +00:00
Tom Stellard
ec4cb3346d R600/SI: Use a ComplexPattern for DS loads and stores
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:49:33 +00:00
Quentin Colombet
c3f2ad0879 [ARM] Move the implementation of the target hooks related to copy-related
instructions from ARMInstrInfo to ARMBaseInstrInfo.
That way, thumb mode can also benefit from the advanced copy optimization.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216274 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:05:22 +00:00
Sasa Stankovic
cc59c3f335 [mips] Don't use odd-numbered float registers for double arguments for fastcc
calling convention if FP is 64-bit and +nooddspreg is used.

Differential Revision: http://reviews.llvm.org/D4981.diff


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216262 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 09:23:22 +00:00
Juergen Ributzka
5e34dffb9c [FastISel][AArch64] Add support for variable shift.
This adds the missing variable shift support for value type i8, i16, and i32.

This fixes <rdar://problem/18095685>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 23:06:07 +00:00
Juergen Ributzka
5d6365c80c [FastISel][AArch64] Use the correct register class to make the MI verifier happy.
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.

Also cleanup the code to use the FastEmitInst_* method whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216225 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:57:57 +00:00
Tom Stellard
fdbf61d00d R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216220 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:41:00 +00:00
Tom Stellard
5f52739370 R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function
This fixes a crash in an ocl conformance test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:40:58 +00:00
Quentin Colombet
ad3c6289b6 [AArch64] Run a peephole pass right after AdvSIMD pass.
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216200 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 18:10:07 +00:00
Moritz Roth
a6afad8b33 Thumb1 load/store optimizer: Improve code to materialize new base register.
There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
the latter supports using different source and destination registers, so
whenever we materialize a new base register (at a certain offset) we'd do
so by moving the base register value to the new register and then adding in
place. This patch changes the code to use a single tADDi3 if the offset is
small enough to fit in 3 bits.
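
The choice described above can be sketched as follows. This is a minimal illustration, not the optimizer's code: the helper name `materialize_new_base` and the textual assembly it returns are hypothetical, and the 0..255 range for the in-place add is an assumption based on the 8-bit immediate of tADDi8.

```python
from typing import List


def materialize_new_base(dst: str, base: str, offset: int) -> List[str]:
    """Return a Thumb1-style sequence computing dst = base + offset (sketch)."""
    if not 0 <= offset <= 255:
        raise ValueError("offset does not fit an 8-bit immediate")
    if offset <= 7:
        # tADDi3: 3-bit immediate, distinct source and destination registers.
        return [f"adds {dst}, {base}, #{offset}"]
    # tADDi8: 8-bit immediate but adds in place, so copy the base first.
    return [f"mov {dst}, {base}", f"adds {dst}, #{offset}"]
```

For an offset of 4 the sketch yields a single instruction; for 16 it falls back to the move-then-add pair.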

Differential Revision: http://reviews.llvm.org/D5006

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 17:11:03 +00:00
Juergen Ributzka
69ec09b61f [FastISel][AArch64] Remove redundant test.
These tests and many more are already covered by fast-isel-addressing-modes.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216186 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 16:40:05 +00:00
Jonathan Roelofs
4c3be1aa0f Add a thread-model knob for lowering atomics on baremetal & single threaded systems
http://reviews.llvm.org/D4984


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216182 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 14:35:47 +00:00
Benjamin Kramer
daada81e5c DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 13:28:02 +00:00
Oliver Stannard
760a46522a [ARM] Enable DP copy, load and store instructions for FPv4-SP
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.

This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216172 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 12:50:31 +00:00
Robert Khasanov
fec1abaeab [x86] Added _addcarry_ and _subborrow_ intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:43:43 +00:00
Robert Khasanov
10dacc4b52 [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:27:00 +00:00
Jiangning Liu
150cef218f Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216147 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 01:59:30 +00:00
Quentin Colombet
e817bdd304 [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216144 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 00:19:16 +00:00
Jonathan Roelofs
506ed4d4a5 Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216138 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 23:38:50 +00:00
Sanjay Patel
89305e5345 Don't prevent a vselect of constants from becoming a single load (PR20648).
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.

Differential Revision: http://reviews.llvm.org/D4934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith
4641d5dbdf X86: Add missing triples from r216119
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:58:59 +00:00
Duncan P. N. Exon Smith
5012f1db20 X86: Align the stack on word boundaries in LowerFormalArguments()
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
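
The rounding the ABI text asks for is the usual align-up computation. A minimal sketch, assuming a power-of-two alignment (8 for the AMD64 eightbyte, 4 for the i386 word); the helper name `align_to` is hypothetical and is not taken from the patch:

```python
def align_to(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)


# 12 bytes of fixed arguments: the first variadic slot must start at 16 on
# x86-64 (eightbyte alignment) and at 12 on i386 (word alignment).
first_vararg_x86_64 = align_to(12, 8)
first_vararg_i386 = align_to(12, 4)
```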

Patch by Thomas Jablin!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:40:59 +00:00
Keno Fischer
4b1cddbaf0 Do not insert a tail call when returning multiple values on X86
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if its return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.

Test Plan: Test case from the original bug report is included.

Reviewers: rafael

Reviewed By: rafael

Subscribers: rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D4968

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216117 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:00:37 +00:00
Sanjay Patel
3deb3e32b2 critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308)
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.

This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.

The test case is a reduced version of the code from the bug report.

Differential Revision: http://reviews.llvm.org/D4977



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216114 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 18:03:00 +00:00
Quentin Colombet
dcd3cbea54 [PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of
the isRegSequence property.

This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.

Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov	d0, r2, r3
vmov	d1, r0, r1
vmov	r0, s0
vmov	r1, s2
udiv	r0, r1, r0
vmov	r1, s1
vmov	r2, s3
udiv	r1, r2, r1
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

into:
udiv	r0, r0, r2
udiv	r1, r1, r3
vmov.32	d16[0], r0
vmov.32	d16[1], r1
vmov	r0, r1, d16
bx	lr

This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.

With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.

The goals of these two optimizations are slightly different: one rewrites the
operand of the instruction (#1), the other kills off the non-generic instruction
and replaces it by a (sequence of) generic instruction(s).

Both optimizations rely on the ValueTracker introduced in r212100.

The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.

Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).

Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.

This is related to <rdar://problem/12702965>.

Review: http://reviews.llvm.org/D4874


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216088 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 17:41:48 +00:00
Juergen Ributzka
5273295751 [FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare.
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.

Related to <rdar://problem/17913111>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216073 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 16:34:15 +00:00
Jiangning Liu
e04455d72b Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216066 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 12:05:15 +00:00
Pavel Chupin
aadaac228d [x32] Fix FrameIndex check in SelectLEA64_32Addr
Summary:
Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new
lea-5.ll case.
Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in
ESP/EBP case.

Test Plan: lea tests modified to include x32/nacl and new test added

Reviewers: nadav, dschuff, t.p.northover

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4929

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216065 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 11:59:22 +00:00
Yi Kong
40f9d11ccc ARM: Fix codegen for rbit intrinsic
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.

The bug was originally introduced in r211057.

Differential Revision: http://reviews.llvm.org/D4980

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216064 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 10:40:20 +00:00
Juergen Ributzka
3ef392c4e2 [FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0.
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.

This is related to <rdar://problem/18027157>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216040 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 01:10:36 +00:00
Juergen Ributzka
ae9a7964ef [FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding.
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture supports. This includes better immediate support,
shift folding, and sign-/zero-extend folding.

This fixes <rdar://problem/17913111>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216033 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 22:29:55 +00:00
Juergen Ributzka
1d58f989d4 [FastISel][AArch64] Extend floating-point materialization test.
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.

Related to <rdar://problem/18027157>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216019 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 20:35:07 +00:00
Juergen Ributzka
06bb1ca1e0 Reapply [FastISel][AArch64] Add support for more addressing modes (r215597).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allow shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For Example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]
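
To see why the folded forms are equivalent, the address arithmetic can be modelled directly. This is a rough sketch of the register-offset address computation only; the function names and the `extend` parameter are hypothetical and do not correspond to anything in ComputeAddress:

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(base: int, index: int, shift: int = 0, extend: str = "") -> int:
    """Model [base, index, <extend|lsl> #shift] as a 64-bit address."""
    if extend == "sxtw":
        index = sign_extend_32(index)
    elif extend == "uxtw":
        index &= 0xFFFFFFFF
    return (base + (index << shift)) & MASK64
```

With `shift=3` and no extend this matches the `lsl x1, x1, #3` followed by `ldr x0, [x0, x1]` sequence; with `extend="sxtw"` it matches the three-instruction sequence that starts with `sxtw`.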

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216013 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:17 +00:00
Juergen Ributzka
96b1e70c66 Reapply [FastISel][X86] Add large code model support for materializing floating-point constants (r215595).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216012 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:13 +00:00
Juergen Ributzka
9c23685dd2 Reapply [FastISel][X86] Use XOR to materialize the "0" value (r215594).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216011 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:10 +00:00
Juergen Ributzka
e8757c5dbb Reapply [FastISel][X86] Emit more efficient instructions for integer constant materialization (r215593).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This mostly affects the i64 value type, which always resulted in a 15byte
movabsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
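
A sketch of the kind of check this describes. The selection order below (zero-extending 32-bit move first, then sign-extended 32-bit immediate, then the full 64-bit form) is an assumption for illustration, and `pick_mov_for_i64` is a hypothetical name, not a function in the patch:

```python
def pick_mov_for_i64(imm: int) -> str:
    """Pick a mov flavour for a 64-bit constant (illustrative only)."""
    bits = imm & ((1 << 64) - 1)  # view the constant as a 64-bit pattern
    if bits < (1 << 32):
        # A 32-bit move zero-extends into the upper half of the register.
        return "movl"
    signed = bits - (1 << 64) if bits >> 63 else bits
    if -(1 << 31) <= signed < (1 << 31):
        # 64-bit move with a sign-extended 32-bit immediate.
        return "movq"
    # Only constants that need all 64 bits pay for the long encoding.
    return "movabsq"
```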

This fixes <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216010 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:06 +00:00
Juergen Ributzka
78f686d37c Reapply [FastISel][AArch64] Make use of the zero register when possible (r215591).
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.

Original commit message:
This change now materializes the value "0" from the zero register.
The zero register can be folded by several instructions, so no
materialization is needed at all.

Fixes <rdar://problem/17924413>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216009 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:44:02 +00:00
Juergen Ributzka
f08cddcf56 Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588).
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.

I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regressions.

Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first whether it
wants to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
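
The reordering amounts to a two-step dispatch. A minimal sketch, assuming each hook returns `None` when it declines; the names `target_hook` and `generic_hook` are hypothetical stand-ins, not the FastISel method names:

```python
from typing import Callable, Optional

Hook = Callable[[int], Optional[str]]


def materialize_constant(value: int, target_hook: Hook, generic_hook: Hook) -> Optional[str]:
    """Ask the target first; use the target-independent path only as a fallback."""
    reg = target_hook(value)
    if reg is not None:
        return reg
    return generic_hook(value)


# Hypothetical hooks: the target only knows how to handle zero cheaply.
def zero_only_target(value: int) -> Optional[str]:
    return "zero-register" if value == 0 else None


def generic(value: int) -> Optional[str]:
    return f"generic-mov({value})"
```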

Related to <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216006 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 19:05:24 +00:00
Juergen Ributzka
8841fb5f25 [FastISel][AArch64] Fix a few BuildMI callsites where the result register was added as an operand register.
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.

Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215997 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 17:41:53 +00:00
Akira Hatanaka
6290308366 [X86, X87 stackifier] Do not mark an operand of a debug instruction as kill.
<rdar://problem/16952634>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215962 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-19 02:09:57 +00:00
Oliver Stannard
eb922109f9 Teach the AArch64 backend to handle f16
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.
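
Promotion means the arithmetic itself runs in single precision and only the result is rounded back to half precision. A small model of that for fadd, using the standard `struct` module's `e` (binary16) and `f` (binary32) formats; the helper names are hypothetical and this says nothing about the instructions the backend emits:

```python
import struct


def round_to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fadd_f16(a: float, b: float) -> float:
    """Add two f16 values by promoting to f32, then truncating the result."""
    wide = round_to_f32(round_to_f32(a) + round_to_f32(b))
    return round_to_f16(wide)
```

For example, `fadd_f16(2048.0, 1.0)` gives `2048.0`, because 2049 is not representable in half precision and the tie rounds to the even neighbour.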



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215891 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-18 14:22:39 +00:00
Oliver Stannard
802d420792 [ARM,AArch64] Do not tail-call to an externally-defined function with weak linkage
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215890 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-18 12:42:15 +00:00
Elena Demikhovsky
9735ccb7ea AVX-512: Fixed a bug in emitting compare for MVT:i1 type.
Added a test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215889 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-18 11:59:06 +00:00
Saleem Abdulrasool
f15492fd72 ARM: improve RTABI 4.2 conformance on Linux
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI.  It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.

The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.

The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-17 22:51:02 +00:00
NAKAMURA Takumi
0ac5626b56 llvm/test/CodeGen/X86/fmul-combines.ll: Appease Windows x64. <4 x float> is passed by stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215821 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 22:28:37 +00:00
Matt Arsenault
5f8a9ae17c Fix fmul combines with constant splat vectors
Fixes things like fmul x, 2 -> fadd x, x

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215820 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 10:14:19 +00:00
Chandler Carruth
a3805f1c73 [x86] Teach lots of the new vector shuffle lowering to use UNPCK
instructions for blend operations at 128 bits. This was a serious hole
in our prior blend lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215819 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 09:42:15 +00:00
Andrea Di Biagio
89cea3c36b [DAGCombiner] Improve the folding of target independent shuffles to Undef.
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. If so, immediately fold that pair of shuffles to Undef.

The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.

Example:
  %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
  %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
  shufps $-123, %xmm1, %xmm0
  pshufd $-46, %xmm0, %xmm0

With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.
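
The fold can be checked by composing the masks from the example by hand. A minimal sketch, assuming the outer shuffle's second operand is undef and using -1 for an undef lane; the function names are hypothetical and this is not the DAGCombiner code:

```python
from typing import List

UNDEF = -1


def combine_with_undef_rhs(inner_mask: List[int], outer_mask: List[int]) -> List[int]:
    """Mask of shuffle(shuffle(A, B, inner), undef, outer) in terms of A and B."""
    n = len(inner_mask)
    combined = []
    for idx in outer_mask:
        if idx == UNDEF or idx >= n:
            # Lane is already undef, or reads from the undef second operand.
            combined.append(UNDEF)
        else:
            combined.append(inner_mask[idx])
    return combined


def is_all_undef(mask: List[int]) -> bool:
    return all(idx == UNDEF for idx in mask)


# Masks of %1, %2 and %3 from the example above.
m1, m2, m3 = [4, 1, 1, 6], [0, 4, 1, 6], [1, 5, 3, 3]
m12 = combine_with_undef_rhs(m1, m2)
m123 = combine_with_undef_rhs(m12, m3)
```

Here `m12` keeps only lanes 0 and 2 defined, and `m3` reads exactly the other lanes, so `m123` is undef in every position.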

Added new test cases to test 'combine-vec-shuffle-5.ll'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215797 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 00:29:44 +00:00
Hal Finkel
5dc48ac04a [PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).

Fixes PR20280.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215795 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 00:17:05 +00:00
Chad Rosier
cc921d6f41 [AArch32] Add support for FP rounding operations for ARMv8/AArch32.
Phabricator Revision: http://reviews.llvm.org/D4935

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215772 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 21:38:16 +00:00
Matt Arsenault
c86e55eb6e R600/SI: Move all fabs / fneg handling to patterns
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215749 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 18:42:22 +00:00
Matt Arsenault
0498d07255 R600/SI: Use source modifiers for f64 fneg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215748 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 18:42:18 +00:00
Matt Arsenault
c882fc78fe R600/SI: Use source modifier for f64 fabs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 18:42:15 +00:00
Matt Arsenault
34ef4cd65b R600/SI: Fix offset folding in some cases with shifted pointers.
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.

This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, accessing the first
element of the second object is directly indexed by a shifted pointer.
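
The rewrite is sound because a left shift distributes over addition in wrapping integer arithmetic. A quick check of the identity on 32-bit values (the 32-bit width and the function names are assumptions made for illustration):

```python
MASK32 = 0xFFFFFFFF


def shl_of_add(x: int, c1: int, c2: int) -> int:
    """(shl (add x, c1), c2) on 32-bit wrapping integers."""
    return (((x + c1) & MASK32) << c2) & MASK32


def add_of_shl(x: int, c1: int, c2: int) -> int:
    """(add (shl x, c2), c1 << c2) on 32-bit wrapping integers."""
    return (((x << c2) & MASK32) + ((c1 << c2) & MASK32)) & MASK32
```

In the second form the constant `c1 << c2` is a plain immediate, which is what lets it be folded into the addressing mode's offset field.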

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215739 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:49:05 +00:00
Chandler Carruth
92ee945e2e [x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions
where applicable for blending.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215737 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:42:00 +00:00
Matt Arsenault
5bc44c7603 R600/SI: Add intrinsic for ldexp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215734 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:30:25 +00:00
Juergen Ributzka
e2bb4f981b [FastISel][ARM] Fix unit test from r215682.
Thanks Jim for finding this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215733 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:23:20 +00:00
Matt Arsenault
ed76ca720b R600/SI: Implement isLegalAddressingMode
The default assumes that a 16-bit signed offset is used.
LDS instructions use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.

More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.
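
The difference between the two assumptions is just the accepted immediate range. A small sketch of the two range checks; the helper names are hypothetical and this is not the isLegalAddressingMode signature:

```python
def is_int16(value: int) -> bool:
    """True if value fits a 16-bit signed immediate."""
    return -(1 << 15) <= value < (1 << 15)


def is_uint16(value: int) -> bool:
    """True if value fits a 16-bit unsigned immediate."""
    return 0 <= value < (1 << 16)
```

An offset of -4 passes the signed check but not the unsigned one, while 40000 passes only the unsigned check, so a target with unsigned offsets needs its own test.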

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:17:07 +00:00
Moritz Roth
d84561bf69 ARM: Fix and re-enable load/store optimizer for Thumb1.
In a previous iteration of the pass, we would try to compensate for
writeback by updating later instructions and/or inserting a SUBS to
reset the base register if necessary.
Since such a SUBS sets the condition flags it's not generally safe to do
this. For now, only merge LDR/STRs if there is no writeback to the base
register (LDM that loads into the base register) or the base register is
killed by one of the merged instructions. These cases are clear wins
both in terms of instruction count and performance.

Also add three new test cases, and update the existing ones accordingly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215729 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 17:00:30 +00:00
Amara Emerson
cef3ad6720 [AArch64] Narrow arguments passed in wrong position on the stack in
big-endian mode.

Patch by Asiri Rathnayake.

Differential Revision: http://reviews.llvm.org/D4922

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215716 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 14:29:57 +00:00
Bill Schmidt
44beebe8de [PPC64] Add test case for r215685.
I had deferred adding this test case until I could get it down to a
reasonable size.  That's done now.

Thanks,
Bill


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215711 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 13:51:57 +00:00
Chandler Carruth
12e69a0267 [x86] Add the initial skeleton of type-based dispatch for AVX vectors in
the new shuffle lowering and an implementation for v4 shuffles.

This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code for the test cases included to the
existing vector shuffle lowering. There are a few cases that are
entertainingly better. ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215702 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 11:01:40 +00:00
Chandler Carruth
886f0101a7 [x86] Fix the very broken formation of vpunpck instructions in the
target-specific shuffle DAG combines.

We were recognizing the paired shuffles backwards. This code needs to be
replaced anyways as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up, this is the minimal fix to the
behavior.

In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215690 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 03:54:49 +00:00
Chandler Carruth
477f28c48d [x86] Fix PR20540 where the x86 shuffle DAG combiner had completely
broken logic for merging shuffle masks in the face of SM_SentinelZero
mask operands.

While these are '-1' they don't mean 'undef' the way '-1' means in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much more clear. In the rest it was really weirdly broken.

Also, rewrite the entire merging thing to be a more direct operation
with a single loop and just doing math to map the indices through the
various masks.

Also add a bunch of asserts to try to make it extremely clear what the
different masks can possibly look like.

Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.

Thanks to several different people for sending test cases, and for
Robert Khasanov for an initial attempt at fixing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215687 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 02:43:18 +00:00
Juergen Ributzka
266ecacfaa [FastISel][ARM] Fall-back to constant pool loads when materializing an i32 constant.
FastEmit_i won't always succeed in materializing an i32 constant and may just fail.
This would trigger a fall-back to SelectionDAG, which is really not necessary.

This fix will first fall-back to a constant pool load to materialize the constant
before giving up for good.

This fixes <rdar://problem/18022633>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215682 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 23:29:49 +00:00
Juergen Ributzka
6398a7f5fd Revert several FastISel commits to track down a buildbot error.
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 19:56:28 +00:00
Adam Nemet
41c5e687ed [AVX512] Add test for FMA masking intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 17:13:33 +00:00
Adam Nemet
90eb948fc9 [AVX512] Switch FMA intrinsics to the masking version
This does the renaming and updates the lowering logic.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215664 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 17:13:30 +00:00
Juergen Ributzka
14bc045838 Revert "[FastISel][AArch64] Add support for more addressing modes."
This reverts commit r215597, because it might have broken the build bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215659 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 17:10:54 +00:00
Sanjay Patel
9615d702ad optimize vector fneg of bitcasted integer value
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
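
A small sketch of why this works, assuming 32-bit IEEE-754 lanes: negation only flips the sign bit, so an fneg of a bitcasted integer is an XOR with the sign mask, and for a constant integer the result can be computed ahead of time. The helper names are hypothetical.

```python
import struct

SIGN_BIT_F32 = 0x80000000  # sign bit of a 32-bit IEEE-754 float

def fneg_bits(bits):
    """Negate a float given as its integer bit pattern."""
    return bits ^ SIGN_BIT_F32

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float (the 'bitcast')."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fneg_vector_of_bitcast(int_lanes):
    """Apply the sign-bit flip lane by lane, as a vector fneg would."""
    return [fneg_bits(b) for b in int_lanes]

lanes = [0x3F800000, 0xC0000000]   # bit patterns of 1.0 and -2.0
negated = fneg_vector_of_bitcast(lanes)
```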

This patch is very similar to a fabs patch committed at r214892.

Differential Revision: http://reviews.llvm.org/D4852



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215646 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 15:15:28 +00:00
Toma Tabacu
0b2081a05a [mips] Improve robustness of some tests.
Summary:
This is done by removing some hardcoded registers like $at or expecting a single digit register to be selected.

Contains work done by Matheus Almeida.

Reviewers: matheusalmeida, dsanders

Reviewed By: dsanders

Subscribers: tomatabacu

Differential Revision: http://reviews.llvm.org/D4227

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 13:10:48 +00:00
Chandler Carruth
cad1711154 [x86] Begin stubbing out the AVX support in the new vector shuffle
lowering scheme.

Currently, this just directly bails to the fallback path of splitting
the 256-bit vector into two 128-bit vectors, operating there, and then
joining the results back together. While the results are far from
perfect, they are *shockingly* good for what we're doing here. I'll be
layering the rest of the functionality on top of this piece by piece and
updating tests as I go.

Note that 256-bit vectors in this mode are still somewhat WIP. While
I think the code paths that I'm adding here are clean and good-to-go,
there are still a lot of 128-bit assumptions that I'll need to stomp out
as I march through the functional spread here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215637 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 12:13:59 +00:00
Chandler Carruth
369e0ef67d [SDAG] Fix a bug in the DAG combiner where we would fail to return the
input node after manually adding it to the worklist and using CombineTo.

Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....

But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.

I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 08:18:34 +00:00
Chandler Carruth
14ee003f1a [SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.

In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-14 01:07:37 +00:00
Akira Hatanaka
d0ddfb0896 [AArch64, fast-isel] Fall back to SelectionDAG to select tail calls.
Certain functions such as objc_autoreleaseReturnValue have to be called as
tail-calls even at -O0. Since normal fast-isel doesn't emit calls as tail calls,
we have to fall back to SelectionDAG to select calls that are marked as tail.

<rdar://problem/17991614>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215600 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 23:23:58 +00:00
Juergen Ributzka
8c9a0319bb [FastISel][AArch64] Add support for more addressing modes.
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allow shifts and sign-/zero-extensions to be folded into
the memory operation itself.

For example:
  lsl x1, x1, #3     --> ldr x0, [x0, x1, lsl #3]
  ldr x0, [x0, x1]

  sxtw x1, w1
  lsl x1, x1, #3     --> ldr x0, [x0, x1, sxtw #3]
  ldr x0, [x0, x1]
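
The equivalence behind the second example can be checked with a small model of the address arithmetic. This is only a sketch: registers are modeled as 64-bit integers and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def sxtw(w):
    """Sign-extend the low 32 bits of w to 64 bits (two's complement)."""
    w &= 0xFFFFFFFF
    return (w | ~0xFFFFFFFF) & MASK64 if w & 0x80000000 else w

def address_unfolded(base, index_w, shift):
    x1 = sxtw(index_w)             # sxtw x1, w1
    x1 = (x1 << shift) & MASK64    # lsl  x1, x1, #shift
    return (base + x1) & MASK64    # ldr  x0, [x0, x1]

def address_folded(base, index_w, shift):
    # ldr x0, [x0, x1, sxtw #shift] computes the same address in one step.
    return (base + (sxtw(index_w) << shift)) & MASK64
```

Both forms compute the same effective address, which is why the extend and shift can be folded into the memory operation.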

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215597 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:53:29 +00:00
Juergen Ributzka
b677a877c8 [FastISel][X86] Add large code model support for materializing floating-point constants.
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.

Fixes <rdar://problem/17674628>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215595 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:25:35 +00:00
Juergen Ributzka
0701e5d43b [FastISel][X86] Use XOR to materialize the "0" value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215594 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:22:17 +00:00
Juergen Ributzka
f245d9aa77 [FastISel][X86] Emit more efficient instructions for integer constant materialization.
This mostly affects the i64 value type, which always resulted in a 15byte
movabsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
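
A rough sketch of the kind of immediate check described above. The returned strings are informal descriptions rather than LLVM opcode names, and the exact set of cases in the real code is an assumption.

```python
def pick_mov(imm):
    """Return an informal name for a plausible encoding of a 64-bit constant."""
    imm &= (1 << 64) - 1
    if imm <= 0xFFFFFFFF:
        # A 32-bit register write zero-extends into the full 64-bit register.
        return "mov r32, imm32"
    if imm >= (1 << 64) - (1 << 31):
        # The value is a negative number that fits a sign-extended 32-bit immediate.
        return "mov r64, simm32"
    # Everything else needs the full 64-bit immediate.
    return "movabsq r64, imm64"
```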

This fixes <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215593 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:18:11 +00:00
Juergen Ributzka
dc408e8069 [FastISel][AArch64] Make use of the zero register when possible.
This change now materializes the value "0" from the zero register.
The zero register can be folded by several instructions, so no
materialization is needed at all.

Fixes <rdar://problem/17924413>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215591 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:13:14 +00:00
Juergen Ributzka
eb1c51f8b3 [FastISel] Let the target decide first if it wants to materialize a constant.
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.

On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Some
very funny floating-point materialization could be observed too.

On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.

On ARM it would generate unnecessary mov instructions or not use mvn.

This change simply changes the order and always asks the target first if it
wants to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.

Related to <rdar://problem/17420988>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215588 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 22:08:02 +00:00
Juergen Ributzka
047423787c [FastISel][ARM] Use MOVT/MOVW if the subtarget requests it.
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215584 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 21:42:19 +00:00
Matt Arsenault
bd949eea85 R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215564 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 18:14:11 +00:00
Andrea Di Biagio
05a76eb9f2 [DAGCombiner] Improved target independent vector shuffle combine rule.
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
  shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.

With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
  ret <4 x i32> %2
}

Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
  shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
  movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
  movaps %xmm1, %xmm0

Now we produce:
  movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.
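
The rule can be replayed on the @test function above with a small model of shufflevector. This is a sketch only: `compose` and `commute` are hypothetical helper names, and mask legality is not modeled.

```python
UNDEF = -1

def shuffle(a, b, mask):
    """Model shufflevector: indices select from the concatenation a ++ b."""
    both = list(a) + list(b)
    return [None if m == UNDEF else both[m] for m in mask]

def compose(m1, m2):
    """Mask for shuffle(shuffle(x, y, m1), undef, m2) expressed on (x, y)."""
    n = len(m1)
    return [UNDEF if m == UNDEF or m >= n else m1[m] for m in m2]

def commute(mask, n):
    """Rewrite a mask for swapped operands: (x, y) becomes (y, x)."""
    return [m if m == UNDEF else (m + n if m < n else m - n) for m in mask]

A = ["a0", "a1", "a2", "a3"]
B = ["b0", "b1", "b2", "b3"]
undef = [None] * 4

inner = shuffle(B, A, [1, 2, 6, 7])
expected = shuffle(inner, undef, [2, 3, 2, 3])
m3 = commute(compose([1, 2, 6, 7], [2, 3, 2, 3]), 4)
folded = shuffle(A, undef, m3)
```

Here the composed mask only references the second operand, so it is the commuted mask `m3` whose legality matters.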



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215555 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 16:09:40 +00:00
Robert Khasanov
232202439a [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215536 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 10:46:00 +00:00
Elena Demikhovsky
4c97c1420b AVX-512: Fixed a bug in shufflevector lowering.
PALIGNR instruction does not exist in AVX-512F set.
Added a test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215526 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 07:58:43 +00:00
Chandler Carruth
6bb093bbe7 [x86] Rewrite a core part of the new vector shuffle lowering to handle
one pesky test case correctly.

This test case caused the old code to infloop oscillating between solving
the low-half and the high-half. The 'side balancing' part of
single-input v8 shuffle lowering didn't handle the one pattern which can
cause it to oscillate. Fortunately the fuzz testing found this case.
Unfortunately it was *terrible* to handle. I'm really sorry for the
amount and density of the code here, I'd love suggestions on how to
simplify it. I feel like there *must* be a simpler form here, but after
a lot of days I've not found it. This is the only one I've found that
even works. I've added the one pesky test case along with some nice
comments explaining the core problem that we have to solve here.

So far this has survived approximately 32k test cases. More strenuous
fuzzing commencing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215519 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 01:25:45 +00:00
Hal Finkel
e693d3c558 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).
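
A sketch of why the represented range has to be larger than the value size. It assumes a 16-byte vector access whose address has its low four bits ignored; the exact offset and size stored in IntrinsicInfo are not taken from the patch, and the function names are hypothetical.

```python
VEC_BYTES = 16  # assumed Altivec vector size in bytes

def actual_access(p):
    """Bytes touched when the address is rounded down to an aligned address."""
    start = p & ~(VEC_BYTES - 1)
    return start, start + VEC_BYTES

def conservative_range(p):
    """A range around pointer p covering every possible rounding of p."""
    return p - (VEC_BYTES - 1), p + VEC_BYTES

lo, hi = conservative_range(100)
size = hi - lo
```

Under these assumptions the conservative range spans 31 bytes, almost twice the 16 bytes actually loaded or stored.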

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215512 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 01:15:40 +00:00
Hal Finkel
695e914c03 Fix classof for ISD::INTRINSIC_W_CHAIN and INTRINSIC_VOID
Unfortunately, our use of the SDNode class hierarchy for INTRINSIC_W_CHAIN and
INTRINSIC_VOID nodes is somewhat broken right now. These nodes sometimes are
used for memory intrinsics (those with MachineMemOperands), and sometimes not.
When not, the nodes are not created as instances of MemIntrinsicSDNode, but
rather created as some other subclass of SDNode using DAG::getNode. When they
are memory intrinsics, they are created using DAG::getMemIntrinsicNode as
instances of MemIntrinsicSDNode. MemIntrinsicSDNode is a subclass of
MemSDNode, but prior to r214452, we had a non-self-consistent setup whereby
MemIntrinsicSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID would
return true but MemSDNode::classof on INTRINSIC_W_CHAIN and INTRINSIC_VOID
would return false. In r214452, MemSDNode::classof was changed to return true
for INTRINSIC_W_CHAIN and INTRINSIC_VOID, which is now self-consistent. The
problem is that neither the pre-r214452 logic nor the post-r214452 logic is
really right. The truth is that not all INTRINSIC_W_CHAIN and INTRINSIC_VOID
nodes are instances of MemIntrinsicSDNode (or MemSDNode for that matter), and
the return value from classof needs to reflect that. This was broken before
r214452 (because MemIntrinsicSDNode::classof always returned true), and was
broken afterward (because MemSDNode::classof also always returned true), and
will now be correct.

The minimal solution is to grab one of the SubclassData bits (there is one left
for MemIntrinsicSDNode nodes) and use it to store whether or not a particular
INTRINSIC_W_CHAIN or INTRINSIC_VOID is really an instance of
MemIntrinsicSDNode or not. Doing this allows both MemIntrinsicSDNode::classof
and MemSDNode::classof to return the correct answer for the underlying object
for both the memory-intrinsic and non-memory-intrinsic cases.
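
A simplified model of the flag-bit approach, not the real SDNode classes. The `is_mem_intrinsic` attribute stands in for the spare SubclassData bit, and the opcode set is reduced to what the example needs.

```python
INTRINSIC_W_CHAIN = "INTRINSIC_W_CHAIN"
INTRINSIC_VOID = "INTRINSIC_VOID"
LOAD = "LOAD"

class Node:
    def __init__(self, opcode, is_mem_intrinsic=False):
        self.opcode = opcode
        # Stand-in for the spare SubclassData bit described above.
        self.is_mem_intrinsic = is_mem_intrinsic

def mem_intrinsic_classof(node):
    """True only for intrinsic nodes that were created as memory intrinsics."""
    return (node.opcode in (INTRINSIC_W_CHAIN, INTRINSIC_VOID)
            and node.is_mem_intrinsic)

def mem_classof(node):
    """Ordinary memory nodes, plus intrinsics that carry the flag."""
    return node.opcode == LOAD or mem_intrinsic_classof(node)

plain = Node(INTRINSIC_W_CHAIN)                        # built via getNode
memory = Node(INTRINSIC_VOID, is_mem_intrinsic=True)   # built via getMemIntrinsicNode
```

With the flag consulted in both checks, the two classof answers stay consistent with each other and with how the node was created.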

This fixes the problem that r214452 created in the SelectionDAGDumper (thanks
to Matt Arsenault for pointing it out).

Because PowerPC does not implement getTgtMemIntrinsic, this change breaks
test/CodeGen/PowerPC/unal-altivec-wint.ll. I've XFAILed it for now, and will
fix it in a follow-up commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215511 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 01:15:37 +00:00
Adam Nemet
4c9467ea5d [AVX512] Verify the code generated for the intrinsic _mm512_broadcastsd_pd
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 00:30:05 +00:00
Adam Nemet
6ea7f36872 [AVX512] Handle valign masking intrinsic via C++ lowering
I think that this will scale better in most cases than adding a Pat<> for each
mapping from the intrinsic DAG to the instruction (i.e. rri, rrik, rrikz).  We
can just lower to the SDNode and have the resulting DAG be matched by the DAG
patterns.

Alternatively (long term), we could keep the Pat<>s but generate them via the
new AVX512_masking multiclass.  The difficulty is that in order to formulate
that we would have to concatenate DAGs.  Currently this is only supported if
the operators of the input DAGs are identical.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-12 21:13:12 +00:00
Jan Vesely
3c57820bbb R600: Use optimized 24bit path in udivrem
v2: drop enum keyword
    use correct extension mode
    don't bother computing the sign in unsigned case

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215462 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-12 17:31:20 +00:00
Jan Vesely
b40562c0ec R600: Use i24 optimized path for SREM
v2: add tests
    rename LowerSDIV24 to LowerSDIVREM24
    handle the rem part in this function

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215460 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-12 17:31:17 +00:00
Gerolf Hoflehner
392f7d970c [MachineCombiner] Fix for ICE bug 20598
The combiner ignored DBG nodes when checking
the uses of a virtual register.

It combined a sequence like
   %vreg1 = madd %vreg2, %vreg3,...
   DBG_VALUE (%vreg1 ...)
   %vreg4 = add %vreg1,...
to
  %vreg4 = madd %vreg2, %vreg3

leaving behind a dangling DBG_VALUE with
a definition. This triggered an assertion
in the MachineTraceMetrics.cpp module.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215431 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-12 07:54:12 +00:00
Michael J. Spencer
4935833df5 [x86] Fold extract_vector_elt of a load into the Load's address computation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215409 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 23:49:33 +00:00
Tom Stellard
13f4476c55 R600/SI: Add a ComplexPattern for selecting MUBUF _OFFSET variant
This saves us from having to copy a 64-bit 0 value into VGPRs for
BUFFER_* instruction which only have a 12-bit immediate offset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215399 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 22:18:17 +00:00
Tom Stellard
68e9ebbe44 R600/SI: Add check for low 32 bits of encoding to mubuf tests
There are no variable values like registers encoded in the low 32 bits of MUBUF
instructions, so it is relatively easy to check these bits, and it will
help prevent us from introducing encoding bugs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215397 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 22:18:11 +00:00
Tom Stellard
728d0e4218 R600/SI: Clear lds bit on MUBUF instructions used for private stores
This bit was left uninitialized, which was causing some random failures
of piglit tests.

NOTE: This is a candidate for the 3.5 branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215396 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 22:18:09 +00:00
Tom Stellard
0df264a0fd R600/SI: Fix broken test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215395 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 22:18:05 +00:00
Quentin Colombet
7f4f923aa5 [AArch64] Fix register allocator assigning the same register for base and wback in
pre/post-index load and store.

Patch by Steven Wu <stevenwu@apple.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215390 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 21:39:53 +00:00
Saleem Abdulrasool
6c2be4ff95 ARM: try harder to detect non-IT eligible instructions
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block.  We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances.  The current list is formed by manual
inspection of the ARMv6M ARM.

The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable).  This
results in code gen changes.

Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the v6M ARM would be much easier to use
than the v7 or v8!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215382 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 20:13:25 +00:00
Sanjay Patel
7c0fa0cfab Correct a missing RUN line in the ARM codegen test for fneg ops. We should also explicitly specify +/-neonfp.
The bug was introduced at r99570 when use of "-arm-use-neon-fp" was removed.

Differential Revision: http://reviews.llvm.org/D4846



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 19:04:28 +00:00
Oliver Stannard
17ef00ea94 ARM: __gnu_h2f_ieee and __gnu_f2h_ieee always use the soft-float calling convention
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 09:12:32 +00:00
Jiangning Liu
0679d2d0a4 In Machine CSE pass, the source register of a COPY machine instruction can
be propagated to all its users, and this propagation could increase the 
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215344 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-11 05:17:19 +00:00
Petar Jovanovic
97b0c63f6b Add support for scalarizing cttz_zero_undef
Follow up to r214266. Add missing case in ScalarizeVectorResult() for
cttz_zero_undef.

Differential Revision: http://reviews.llvm.org/D4813


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215330 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-10 22:49:54 +00:00
Saleem Abdulrasool
3e5734dc38 ARM: correct isPredicable for MULS in Thumb mode
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode.  Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.

The If Conversion pass uses the TII isPredicable method to ensure that it can
transform a Basic Block.  However, because we only check for IT handling on
Thumb2 functions, we may miss some cases.  Even then, it only validates that the
CPSR is not *live* rather than it is not accessed.  This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.

This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable.  Addresses PR20555.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215328 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-10 22:20:37 +00:00
Tom Stellard
4e8a136db8 R600/SI: Custom lower CONCAT_VECTORS
This will lower them using register copies rather than loads and stores
to the stack.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215270 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 01:06:56 +00:00
Tom Stellard
f1ba587963 R600/SI: Update concat_vectors.ll to check for scratch usage
These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.

With this change I was able to uncover one broken test which will
be fixed in a future commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215269 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 01:06:53 +00:00
Joerg Sonnenberger
f0b70e2fbc Provide an implementation of getNoopForMachoTarget for PPC, otherwise
empty functions will assert in the MC object writer.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215238 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 19:13:23 +00:00
Juergen Ributzka
0980ef248e [FastISel][X86] Fix INC/DEC optimization (r215230)
I accidentally also used INC/DEC for unsigned arithmetic which doesn't work,
because INC/DEC don't set the required flag which is used for the overflow
check.
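
A small flag model illustrating the problem, using 8-bit values for brevity. INC leaves the carry flag untouched, so an unsigned overflow check that reads the carry flag misses the wrap, while the signed check on the overflow flag still works. The function names are hypothetical.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def add_flags(a, b):
    """Model ADD: returns (result, carry_flag, overflow_flag)."""
    full = (a & MASK) + (b & MASK)
    result = full & MASK
    cf = full > MASK
    # Signed overflow: operands share a sign that differs from the result's.
    of = bool(~(a ^ b) & (a ^ result) & SIGN)
    return result, cf, of

def inc_flags(a, cf_before):
    """Model INC: same result and OF as ADD a, 1, but CF keeps its old value."""
    result, _, of = add_flags(a, 1)
    return result, cf_before, of
```

For example, `add_flags(0xFF, 1)` reports a carry, while `inc_flags(0xFF, False)` wraps to zero without one.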

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215237 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 18:47:04 +00:00
Juergen Ributzka
cbda4b32c6 [FastISel][X86] Use INC/DEC when possible for {sadd|ssub}.with.overflow intrinsics.
This is a small peephole optimization to emit INC/DEC when possible.

Fixes <rdar://problem/17952308>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215230 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 17:21:37 +00:00
Daniel Sanders
a19fc6deb8 [mips] Invert the abicalls feature bit to be noabicalls so that it's possible for -mno-abicalls to take effect.
Also added the testcase that should have been in r215194.

This behaviour has surprised me a few times now. The problem is that the
generated MipsSubtarget::ParseSubtargetFeatures() contains code like this:

   if ((Bits & Mips::FeatureABICalls) != 0) IsABICalls = true;

so '-abicalls' means 'leave it at the default' and '+abicalls' means 'set it to
true'. In this case, (and the similar -modd-spreg case) I'd like the code to be

  IsABICalls = (Bits & Mips::FeatureABICalls) != 0;

or possibly:

   if ((Bits & Mips::FeatureABICalls) != 0)
     IsABICalls = true;
   else
     IsABICalls = false;

and preferably arrange for 'Bits & Mips::FeatureABICalls' to be true by default
(on some triples).
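
The asymmetry can be modeled in a few lines. This is a sketch: the bit positions and function names are hypothetical, and only the "set to true when the bit is present" pattern is taken from the generated code quoted above.

```python
FEATURE_ABICALLS = 1 << 0      # hypothetical bit positions
FEATURE_NOABICALLS = 1 << 1

def parse_abicalls(bits, default):
    """Old scheme: the feature bit can only set the flag, never clear it."""
    is_abicalls = default
    if bits & FEATURE_ABICALLS:
        is_abicalls = True
    return is_abicalls

def parse_noabicalls(bits):
    """Inverted scheme: abicalls is on unless the 'no' bit is present."""
    no_abicalls = False
    if bits & FEATURE_NOABICALLS:
        no_abicalls = True
    return not no_abicalls
```

With a default of true, the old scheme has no input that yields false, whereas the inverted bit can turn abicalls off.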



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 15:47:17 +00:00
Jiangning Liu
3b85f30319 [AArch64] Fix a type conversion bug for analyzing compare.
The bug can cause spec2006/483.xalancbmk failure.

Patched by David Xu.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215206 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 14:19:29 +00:00
Daniel Sanders
1952807c9c [mips] Remove reason for XFAIL from a test that isn't actually XFAILed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215201 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 12:58:17 +00:00
James Molloy
3a106a2813 [AArch64] Add an FP load balancing pass for Cortex-A57
For best-case performance on Cortex-A57, we should try to use a balanced mix of odd and even D-registers when performing a critical sequence of independent, non-quadword FP/ASIMD floating-point multiply or multiply-accumulate operations.

This pass attempts to detect situations where the register allocation may adversely affect this load balancing and to change the registers used so as to better utilize the CPU.

Ideally we'd just take each multiply or multiply-accumulate in turn and allocate it alternating even or odd registers. However, multiply-accumulates are most efficiently performed in the same functional unit as their accumulation operand. Therefore this pass tries to find maximal sequences ("Chains") of multiply-accumulates linked via their accumulation operand, and assign them all the same "color" (oddness/evenness).

This optimization affects S-register and D-register floating point multiplies and FMADD/FMAs, as well as vector (floating point only) muls and FMADD/FMA. Q register instructions (and 128-bit vector instructions) are not affected.
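
A deliberately naive sketch of the chain idea: multiply-accumulates are grouped by their accumulation operand, and every instruction in a chain gets the same color. Alternating colors between chains is an assumption made for illustration; the real pass chooses colors based on the actual balance, and the data layout here is hypothetical.

```python
def color_chains(instrs):
    """instrs: list of (dest, acc) pairs; acc is the accumulation operand
    register of a multiply-accumulate, or None for a plain multiply."""
    chain_of = {}   # register -> chain id
    colors = []     # chain id -> "even" or "odd"
    result = []
    for dest, acc in instrs:
        if acc is not None and acc in chain_of:
            chain = chain_of[acc]        # extend the accumulator's chain
        else:
            chain = len(colors)          # start a new chain
            colors.append("even" if chain % 2 == 0 else "odd")
        chain_of[dest] = chain
        result.append((dest, colors[chain]))
    return result

colored = color_chains([("d0", None), ("d1", None), ("d2", "d0"), ("d3", "d1")])
```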



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215199 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 12:33:21 +00:00
Tim Northover
bbdf1e0432 AArch64: stop trying to take control of all UnknownArch triples.
This short-circuited our error reporting for incorrectly specified
target triples (you'd get AArch64 code instead).

Should fix PR20567.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215191 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 08:27:44 +00:00
Patrik Hagglund
cf403861a3 [pr19635] Revert most of r170537, and add new testcase.
Patch provided by Andrey Kuharev.

Sorry, r170537 was obviously wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 08:21:19 +00:00
Adam Nemet
a8e1cda622 [AVX512] Add zero-masking variant to AVX512_masking multiclass
This completes one item from the todo-list of r215125 "Generate masking
instruction variants with tablegen".

The AddedComplexity is needed just like for the k variant.

Added a codegen test based on valignq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215173 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 23:53:38 +00:00
Adam Nemet
690499ed49 [AVX512] Add codegen test for the masking variant of valign
The AddedComplexity is needed just like in avx512_perm_3src.  There may be a
bug in the complexity computation...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215168 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 23:18:18 +00:00
Akira Hatanaka
43f6ce9289 [stack protector] Look through bitcasts to get global variable
__stack_chk_guard.

Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.

%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)

Original test case provided by Ana Pazos.

This fixes PR20558.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215167 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 23:08:24 +00:00
Adrian Prantl
7f48f056f7 Make these regexes stricter by disallowing any additional characters in the output.
Thanks to dblaikie for pointing this out!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215166 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 23:04:07 +00:00
Adrian Prantl
2ead89ae61 Reflow this comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215160 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 22:44:24 +00:00
Reed Kotler
cf76da912c fix materialization of one bit constants and global values which are accessed through
a base GOT entry.

Summary:
get tip of tree mips fast-isel to pass test-suite

Two bugs were fixed:

1) One-bit booleans were treated as 1-bit signed integers, so the literal '1' could become sign-extended.
2) MIPS uses the GOT for PIC, but in certain cases (string constants, for example) many items can be referenced from the same GOT entry, and this case was not handled properly.
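
The first bug can be illustrated with a small sketch. This is not the fast-isel code; the helper names sign_extend and zero_extend are hypothetical and only show why a 1-bit '1' turns into -1 when it is treated as signed:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit


def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


# A 1-bit boolean 'true' has only its sign bit set.
as_signed = sign_extend(1, 1)    # -1: every upper bit becomes 1
as_unsigned = zero_extend(1, 1)  # 1: the upper bits stay 0
```

Materializing a boolean constant with zero extension keeps the literal '1' intact.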

Test Plan: test-suite

Reviewers: dsanders

Reviewed By: dsanders

Subscribers: mcrosier

Differential Revision: http://reviews.llvm.org/D4801

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215155 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 22:09:01 +00:00
Gerolf Hoflehner
e4fa341dde MachineCombiner Pass for selecting faster instruction sequence on AArch64
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers

The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215151 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 21:40:58 +00:00
Akira Hatanaka
70b56056a1 [Branch probability] Recompute branch weights of tail-merged basic blocks.
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomputes the weights of
tail-merged blocks using the following formula:

branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))

bb is a block that is in the set of merged blocks.
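
A minimal sketch of the formula above, assuming block frequencies and branch probabilities are available as plain dictionaries. The function name and data layout are hypothetical and are not taken from BranchFolderPass:

```python
def merged_branch_weight(merged_blocks, block_frequency, branch_probability, successor):
    """Sum block_frequency(bb) * branch_probability(bb -> successor) over merged blocks."""
    return sum(
        block_frequency[bb] * branch_probability.get((bb, successor), 0.0)
        for bb in merged_blocks
    )


# Two blocks merged into one; both can branch to successor "j".
freq = {"bb1": 10, "bb2": 30}
prob = {("bb1", "j"): 0.5, ("bb2", "j"): 0.25, ("bb2", "k"): 0.75}
weight_j = merged_branch_weight(["bb1", "bb2"], freq, prob, "j")  # 10*0.5 + 30*0.25
```

A block with no edge to the successor contributes zero, which is why the lookup defaults to 0.0.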

<rdar://problem/16256423>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215135 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 19:30:13 +00:00
Chandler Carruth
0e89fbb120 [x86] Fix another miscompile found through fuzz testing the new vector
shuffle lowering.

This is closely related to the previous one. Here we failed to use the
source offset when swapping in the other case -- where we end up
swapping the *final* shuffle. The cause of this bug is a bit different:
I simply wasn't thinking about the fact that this mask is actually
a slice of a wide mask and thus has numbers that need SourceOffset
applied. Simple fix. Would be even more simple with an algorithm-y thing
to use here, but correctness first. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215095 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 10:37:35 +00:00
Chandler Carruth
0651861b7b [x86] Fix another miscompile in the new vector shuffle lowering found
via the fuzz tester.

Here I missed an offset when round-tripping a value through a shuffle
mask. I got it right 2 lines below. See a problem? I do. ;] I'll
probably be adding a little "swap" algorithm which accepts a range and
two values and swaps those values where they occur in the range. Don't
really have a name for it, let me know if you do.
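
One possible shape for such a helper, as a sketch only. The name swap_values is made up here, and the real version would presumably operate on a C++ range rather than a Python list:

```python
def swap_values(seq, a, b):
    """Swap every occurrence of a with b (and b with a) in seq, in place."""
    for i, v in enumerate(seq):
        if v == a:
            seq[i] = b
        elif v == b:
            seq[i] = a
    return seq


mask = [0, 2, 1, 2]
swap_values(mask, 1, 2)  # mask is now [0, 1, 2, 1]
```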

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215094 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 10:14:27 +00:00
Chandler Carruth
b3364512fc [x86] Fix another miscompile in the new vector shuffle lowering found
through the new fuzzer.

This one is great: bad operator precedence led the modulus to happen at
the wrong point. All the asserts didn't fire because there were usually
the right values past the end of the 4 element region we were looking
at. Probably could have gotten a crash here with ASan + fuzzing, but the
correctness tests pinpointed this really nicely.
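
The exact expression is not quoted in this message, so the following is only a generic illustration of this class of bug. The modulus operator binds tighter than addition, in Python as in C++, so the wrap-around applies to one operand instead of the sum:

```python
region = [10, 11, 12, 13]  # the 4-element region being indexed
base, i = 4, 1             # hypothetical values, chosen to expose the bug

wrong_index = base + i % 4    # parsed as base + (i % 4)
right_index = (base + i) % 4  # modulus applied to the whole sum

in_bounds = right_index < len(region)
out_of_bounds = wrong_index >= len(region)
```

With the wrong grouping the index lands past the end of the region, which matches the description of reading values beyond the 4 elements being examined.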

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 09:45:02 +00:00
Pavel Chupin
5d8c984e54 [x32] Use ebp/esp as frame and stack pointer
Summary:
Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack
pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still
require a 64-bit register, so use the 64-bit MachineFramePtr where required.

X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that
both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing
this issue here as well by making isTarget64BitLP64 false.

Also mark hasReservedSpillSlot unreachable on X86. See inlined comments.

Test Plan: Add one new simple test and upgrade 2 existing with x32 target case.

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D4617

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215091 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 09:41:19 +00:00
Chandler Carruth
15d82b7d33 [x86] Fix a miscompile in the new shuffle lowering found through the new
fuzz testing.

The function which tested for adjacency did what it said on the tin, but
when I called it, I wanted it to do something more thorough: I wanted to
know if the *pairs* of shuffle elements were adjacent and started at
0 mod 2. In one place I had the decency to try to test for this, but in
the other it was completely skipped, miscompiling this test case. Fix
this by making the helper actually do what I wanted it to do everywhere
I called it (and removing the now redundant code in one place).

I *really* dislike the name "canWidenShuffleElements" for this
predicate. If anyone can come up with a better name, please let me know.
The other name I thought about was "canWidenShuffleMask" but is it
really widening the mask to reduce the number of lanes shuffled? I don't
know. Naming things is hard.
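
The stricter predicate described above can be sketched as follows. This is an illustration rather than the code in the X86 backend: it assumes an even-length mask of non-negative indices and ignores undef elements.

```python
def pairs_are_adjacent(mask):
    """Weaker check: each pair of mask entries is adjacent, nothing more."""
    return all(mask[i + 1] == mask[i] + 1 for i in range(0, len(mask), 2))


def can_widen_shuffle_elements(mask):
    """Stricter check: each pair is adjacent and starts at 0 mod 2."""
    return all(
        mask[i] % 2 == 0 and mask[i + 1] == mask[i] + 1
        for i in range(0, len(mask), 2)
    )


aligned = [0, 1, 4, 5]     # pairs start at even indices
misaligned = [1, 2, 4, 5]  # first pair is adjacent but starts at an odd index
```

Only the stricter check rejects the misaligned mask, since a pair starting at an odd index straddles two of the wider elements.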

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215089 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 08:11:31 +00:00
Sanjay Patel
b9736caa6a Fix a test that has no checks.
X86 doesn't have fneg, so check for xor.

Differential Revision: http://reviews.llvm.org/D4812


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214992 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 20:45:30 +00:00
Matt Arsenault
60178b180f R600: Cleanup fadd and fsub tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214991 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 20:27:55 +00:00
Reid Kleckner
7911f2db78 Add a triple to this test to get the right IR mangling
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214982 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 18:09:15 +00:00
Reid Kleckner
5d04e520c0 Don't count inreg params when mangling fastcall functions
This is consistent with MSVC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214981 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 18:09:04 +00:00
Reid Kleckner
9688239469 Round up the size of byval arguments to MinAlign
Otherwise we can end up with an argument frame size that is not a
multiple of stack slot size, which is very awkward.

This fixes PR20547, which was a bug in x86_64 Sys V vararg handling.
However, it's much easier to test this with x86 callee-cleanup
functions, which previously ended in "retl $6" instead of "retl $8".
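
The rounding itself is the usual align-up computation. A sketch, assuming a 4-byte stack slot for the 32-bit x86 example; the function name is hypothetical:

```python
def round_up_to_alignment(size, align):
    """Round size up to the next multiple of align (align must be positive)."""
    return (size + align - 1) // align * align


# A 6-byte byval argument occupies a full 8 bytes of argument frame.
frame_bytes = round_up_to_alignment(6, 4)
```

Sizes that are already a multiple of the alignment are left unchanged.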

This does affect behavior of all backends, but it presumably fixes the
same bug in all of them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214980 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 17:57:23 +00:00
Robert Khasanov
ec4188bad7 [AVX512] Added load/store instructions to Register2Memory opcode tables.
Added lowering tests for load/store.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214972 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 15:40:34 +00:00
James Molloy
e0243fb42d [AArch64] Add a testcase for r214957.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214965 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 13:31:32 +00:00
Tim Northover
2c0d42ac9a ARM: do not generate BLX instructions on Cortex-M CPUs.
Particularly on MachO, we were generating "blx _dest" instructions on M-class
CPUs, which don't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214959 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 11:13:14 +00:00
Tim Northover
08828a979a ARM-MachO: materialize callee address correctly on v4t.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214958 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 11:13:06 +00:00
Chandler Carruth
a341a8070a [x86] Fix two independent miscompiles in the process of getting the same
test case to actually generate correct code.

The primary miscompile fixed here is that we weren't correctly handling
in-place elements in one half of a single-input v8i16 shuffle when
moving a dword of elements from that half to the other half. Sometimes,
we would clobber the in-place elements in forming the dword to move
across halves.

The fix to this involves forcibly marking the in-place inputs even when
there is no need to gather them into a dword, and to much more carefully
re-arrange the elements when grouping them into a dword to move across
halves. With these two changes we would generate correct shuffles for
the test case, but found another miscompile. There are also some random
perturbations of the generated shuffle pattern in SSE2. It looks like
a wash; more instructions in some cases, fewer in others.

The second miscompile would corrupt the results into nonsense. This is
a buggy pattern in one of the added DAG combines. Mapping elements
through a PSHUFD when pairing redundant half-shuffles is *much* harder
than this code makes it out to be -- it requires reasoning about *all*
of where the input is used in the PSHUFD, not just one part of where it
is used. Plus, we can't combine a half shuffle *into* a PSHUFD but the
code didn't guard against it. I think this was just a bad idea and I've
just removed that aspect of the combine. No tests regress as
a consequence so seems OK.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214954 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 10:16:36 +00:00
Adam Nemet
2b9b50379b [X86] Fixes commit r214890 to match the posted patch
This was another fallout from my local rebase where something went wrong :(

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214951 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 07:13:12 +00:00
Matt Arsenault
85dc7da6f3 R600: Increase nearby load scheduling threshold.
This partially fixes weird looking load scheduling
in memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214943 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 00:29:49 +00:00