11020 Commits

Author SHA1 Message Date
Matt Arsenault
ff71f3dac1 R600: Run private-memory test with and without alloca promote
The unpromoted path still needs to be tested since we can't
always promote to using LDS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212894 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-13 02:18:06 +00:00
Saleem Abdulrasool
01c06d7954 AArch64: add support for llvm.aarch64.hint intrinsic
This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to
support the various hint intrinsic functions in the ACLE.

Add an optional pattern field that permits the subclass to specify the pattern
that matches the selection.  The intrinsic pattern is set as mayLoad, mayStore,
so overload the value for the definition of the hint instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212883 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-12 21:20:49 +00:00
Matt Arsenault
cc6a418600 R600: Add missing tests for some intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212870 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-12 00:36:19 +00:00
Ulrich Weigand
edb27188b4 [PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
        %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
        %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement.  However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 17:19:31 +00:00
Marek Olsak
438e1f2ad8 R600/SI: Use i32 vectors for resources and samplers
This affects new intrinsics only.

What surprises me is that v32i8 still works.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212831 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 17:11:52 +00:00
Marek Olsak
5a35fdcafb R600/SI: add sample and image intrinsics exposing all instruction fields
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.

When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212830 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 17:11:46 +00:00
Oliver Stannard
cb047f2a74 ARM: Allow __fp16 as a function arg or return type for AArch64
ACLE 2.0 allows __fp16 to be used as a function argument or return
type. This enables this for AArch64.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212812 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 13:33:46 +00:00
Quentin Colombet
52c50f5ec5 [X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI.
Also add a few comments.

<rdar://problem/17581756>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212808 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 12:08:23 +00:00
Jan Vesely
865527a09b R600: Implement float to long/ulong
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering

v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
    add comment about using FP_TO_SINT for uints

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212773 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 22:40:21 +00:00
Zoran Jovanovic
fc6a659676 [mips] Emit two CFI offset directives per double precision SDC1/LDC1
instead of just one for FR=1 registers
Differential Revision: http://reviews.llvm.org/D4310


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212769 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 22:23:30 +00:00
Andrea Di Biagio
4d3b6131c7 Extend the test coverage in combine-vec-shuffle-2.ll adding some negative tests.
Add test cases where we don't expect to trigger the combine optimizations
introduced at revision 212748.

No functional change intended.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212756 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 18:59:41 +00:00
Matt Arsenault
b730c3d28d Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.""
Don't try to convert the select condition type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212750 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 18:21:04 +00:00
Andrea Di Biagio
3996e5290a [DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches the DAGCombiner how to fold shuffles according to the
following new rules:
  1. shuffle(shuffle(x, y), undef) -> x
  2. shuffle(shuffle(x, y), undef) -> y
  3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef)
  4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef)

The backend avoids to combine shuffles according to rules 3. and 4. if
the resulting shuffle does not have a legal mask. This is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence of
target specific dag nodes during vector legalization.

Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers
the new rules when combining shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212748 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 18:04:55 +00:00
Akira Hatanaka
60079c18bc [X86] Mark pseudo instruction TEST8ri_NOEREX as hasSIdeEffects=0.
Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable
macro-fusion.

<rdar://problem/15680770>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 18:00:53 +00:00
Zoran Jovanovic
0ef47114d3 [mips] Added FPXX modeless calling convention.
Differential Revision: http://reviews.llvm.org/D4293


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212726 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 15:36:12 +00:00
Tim Northover
6b0ac2aa02 AArch64: correctly fast-isel i8 & i16 multiplies
We were asking for a register for type i8 or i16 which caused an assert.

rdar://problem/17620015

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212718 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 14:18:46 +00:00
Daniel Sanders
24a071b8c5 [mips] Add support for -modd-spreg/-mno-odd-spreg
Summary:
When -mno-odd-spreg is in effect, 32-bit floating point values are not
permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit
floating point comparison results from being written to odd registers.

This option has three purposes:
* It allows support for certain MIPS implementations such as loongson-3a that
  do not allow the use of odd registers for single precision arithmetic.
* When using -mfpxx, -mno-odd-spreg is the default and this allows us to
  statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1
  instructions to/from odd registers are guaranteed not to appear for any
  reason. Once this has been established, the user can then re-enable
  -modd-spreg to regain the use of all 32 single-precision registers.
* When using -mfp64 and -mno-odd-spreg together, an O32 extension named
  O32 FP64A is used as the ABI. This is intended to provide almost all
  functionality of an FR=1 processor but can also be executed on a FR=0 core
  with the assistance of a hardware compatibility mode which emulates FR=0
  behaviour on an FR=1 processor.

* Added '.module oddspreg' and '.module nooddspreg' each of which update
  the .MIPS.abiflags section appropriately
* Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller
  doesn't have to remember to do it.
* MipsABIFlags now calculates the flags1 and flags2 member on demand rather
  than trying to maintain them in the same format they will be emitted in.

There is one portion of the -mfp64 and -mno-odd-spreg combination that is not
implemented yet. Moves to/from odd-numbered double-precision registers must not
use mtc1. I will fix this in a follow-up.

Differential Revision: http://reviews.llvm.org/D4383


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 13:38:23 +00:00
Chandler Carruth
cdbdfa28d1 [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous
to the zero-extend-vector-inreg node introduced previously for the same
purpose: manage the type legalization of widened extend operations,
especially to support the experimental widening mode for x86.

I'm adding both because sign-extend is expanded in terms of any-extend
with shifts to propagate the sign bit. This removes the last
fundamental scalarization from vec_cast2.ll (a test case that hit many
really bad edge cases for widening legalization), although the trunc
tests in that file still appear scalarized because the the shuffle
legalization is scalarizing. Funny thing, I've been working on that.

Some initial experiments with this and SSE2 scenarios is showing
moderately good behavior already for sign extension. Still some work to
do on the shuffle combining on X86 before we're generating optimal
sequences, but avoiding scalarization is a huge step forward.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212714 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 12:32:32 +00:00
NAKAMURA Takumi
e93de7fb73 llvm/test/CodeGen/X86/shift-parts.ll: FileCheck-ize. (from r212640)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212709 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 11:37:39 +00:00
NAKAMURA Takumi
5290734b8f Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."
This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212708 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 11:37:28 +00:00
Chandler Carruth
a4e7f05a28 [x86] Add another combine that is particularly useful for the new vector
shuffle lowering: match shuffle patterns equivalent to an unpcklwd or
unpckhwd instruction.

This allows us to use generic lowering code for v8i16 shuffles and match
the unpack pattern late.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 11:09:29 +00:00
Daniel Sanders
b0b3161567 Make it possible for ints/floats to return different values from getBooleanContents()
Summary:
On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer
comparisons return 0 or 1.

Updated the various uses of getBooleanContents. Two simplifications had to be
disabled when float and int boolean contents differ:
- ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially
  discoverable (i.e. when the condition of the VSELECT is a SETCC node).
- visitVSELECT (select C, 0, 1) -> (xor C, 1).
  Come to think of it, this one could test for the common case of 'C'
  being a SETCC too.

Preserved existing behaviour for all other targets and updated the affected
MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low'
variable was counting in the wrong direction because it thought it could simply
add the result of the comparison.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4389


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212697 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 10:18:12 +00:00
Chandler Carruth
977aab501d [x86] Expand the target DAG combining for PSHUFD nodes to be able to
combine into half-shuffles through unpack instructions that expand the
half to a whole vector without messing with the dword lanes.

This fixes some redundant instructions in splat-like lowerings for
v16i8, which are now getting to be *really* nice.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212695 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 09:57:36 +00:00
Chandler Carruth
dc90a3ab8f [x86] Tweak the v16i8 single input special case lowering for shuffles
that splat i8s into i16s.

Previously, we would try much too hard to arrange a sequence of i8s in
one half of the input such that we could unpack them into i16s and
shuffle those into place. This isn't always going to be a cheaper i8
shuffle than our other strategies. The case where it is always going to
be cheaper is when we can arrange all the necessary inputs into one half
using just i16 shuffles. It happens that viewing the problem this way
also makes it much easier to produce an efficient set of shuffles to
move the inputs into one half and then unpack them.

With this, our splat code gets one step closer to being not terrible
with the new experimental lowering strategy. It also exposes two
combines missing which I will add next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212692 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 09:16:40 +00:00
Chandler Carruth
95b14b00da [x86] Initial improvements to the new shuffle lowering for v16i8
shuffles specifically for cases where a small subset of the elements in
the input vector are actually used.

This is specifically targetted at improving the shuffles generated for
trunc operations, but also helps out splat-like operations.

There is still some really low-hanging fruit here that I want to address
but this is a huge step in the right direction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 04:34:06 +00:00
Hao Liu
a3c15c19b8 [AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212677 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 03:41:50 +00:00
Matt Arsenault
425ef825a6 R600/SI: Add support for llvm.convert.{to|from}.fp16
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-10 03:22:20 +00:00
David Blaikie
eff0c67b6a Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information.
Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson
reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped
provide a reduced reproduction, though the failure wasn't too hard to
guess, and even easier with the example to confirm.

The assertion that the subprogram metadata associated with an
llvm::Function matches the scope data referenced by the DbgLocs on the
instructions in that function is not valid under LTO. In LTO, a C++
inline function might exist in multiple CUs and the subprogram metadata
nodes will refer to the same llvm::Function. In this case, depending on
the order of the CUs, the first intance of the subprogram metadata may
not be the one referenced by the instructions in that function and the
assertion will fail.

A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the
assertion removed and a comment added to explain this situation.

Original commit message:

If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.

While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.

Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.

Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212649 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 21:02:41 +00:00
Matt Arsenault
3e8ed89484 Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine.
Do this if the truncate is free and the select is legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 19:12:07 +00:00
Jim Grosbach
a3edd6a038 AArch64: Better codegen for storing to __fp16.
Storing will generally be immediately preceded by rounding from an f32
or f64, so make sure to match those patterns directly to convert into the
FPR16 register class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-from-f64 path
which was first converting to f32 and then to f16 from there.

rdar://17594379

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212638 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 18:55:52 +00:00
Chandler Carruth
4c27c85cde [x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were
not widening the input type to the node sufficiently to let the ext take
place in a register.

This would in turn result in a mysterious bitcast assertion failure
downstream. First change here is to add back the helpful assert I had in
an earlier version of the code to catch this immediately.

Next change is to add support to the type legalization to detect when we
have widened the operand either too little or too much (for whatever
reason) and find a size-matched legal vector type to convert it to
first. This can also fail so we get a new fallback path, but that seems
OK.

With this, we no longer crash on vec_cast2.ll when using widening. I've
also added the CHECK lines for the zero-extend cases here. We still need
to support sign-extend and trunc (or something) to get plausible code
for the other two thirds of this test which is one of the regression
tests that showed the most scalarization when widening was
force-enabled. Slowly closing in on widening being a viable legalization
strategy without it resorting to scalarization at every turn. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212614 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 12:36:54 +00:00
Benjamin Kramer
4afbd3e941 X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2.
Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out
as we have to blend two vectors. While there remove unecessary cross-lane moves
from the shuffles so the backend can lower it to palignr instead of vperm.

Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212611 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 11:12:39 +00:00
Chandler Carruth
ce184e95f9 [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening
vector types to be legal and a ZERO_EXTEND node is encountered.

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Differential Revision: http://reviews.llvm.org/D4405

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:58:18 +00:00
Daniel Sanders
1285b3130b [mips][mips64r6] Correct select patterns that have the condition or true/false values backwards
Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others).

Differential Revision: http://reviews.llvm.org/D4388


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212608 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:47:26 +00:00
Daniel Sanders
388704618e [mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions
Summary:
It seems we accidentally read the wrong column of the table MIPS64r6 spec
and used the names for c.cond.fmt instead of cmp.cond.fmt.

Differential Revision: http://reviews.llvm.org/D4387


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212607 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:40:20 +00:00
Daniel Sanders
f08bcb9b97 [mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6.

Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4269


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212605 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:21:59 +00:00
Daniel Sanders
7c2ef822f7 [mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6)
Summary:
RET, and RET_MM have been replaced by a pseudo named PseudoReturn.
In addition a version with a 64-bit GPR named PseudoReturn64 has been
added.

Instruction selection for a return matches RetRA, which is expanded post
register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter,
this PseudoReturn/PseudoReturn64 are emitted as:
- (JALR64 $zero, $rs) on MIPS64r6
- (JALR $zero, $rs) on MIPS32r6
- (JR_MM $rs) on microMIPS
- (JR $rs) otherwise

On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid
development and review (specifically, to ensure all cases of jr are
updated), these aliases are temporarily named 'r6.jr' instead of 'jr'.
A follow up patch will change them back to the correct mnemonic.

Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect
jump, and removed it from its definition of a call.
Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's
doesn't appear to account for any MIPS64-specifics.

The return instruction created as part of eh_return expansion is now expanded
using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6
('jalr $zero, $rs').

Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in
expandEhReturn().

Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders

Reviewed By: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4268


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212604 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:16:07 +00:00
Chandler Carruth
98eac0a244 [x86] Re-apply a variant of the x86 side of r212324 now that the rest
has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.

The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.

This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
  (where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
  now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
  a correct check of whether the splat is the only user of a loaded
  value, checking that the splat actually crosses multiple lanes before
  using a broadcast, and handling broadcasts of non-constant splats.

As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-09 10:06:58 +00:00
Jim Grosbach
05bb7c5045 AArch64: Better codegen for loading from __fp16.
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.

This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.

rdar://17594379

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 23:28:48 +00:00
Andrea Di Biagio
b8245a4599 [DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal.
This patch teaches how to fold a shuffle according to rule:
  shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)

We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.

This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.

That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212539 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 15:22:29 +00:00
Vladimir Medic
ffbc2a1325 Mips.abiflags is a new implicitly generated section that will be present on all new modules. The section contains a versioned data structure which represents essentially information to allow a program loader to determine the requirements of the application. This patch implements mips.abiflags section and provides test cases for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212519 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 08:59:22 +00:00
Chandler Carruth
25b7d54e7f [x86,SDAG] Sink the logic for folding shuffles of splats more
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.

This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.

First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.

Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.

Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.

But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 08:45:38 +00:00
Andrea Di Biagio
cfb83b7bac [x86] Fix assertion failure caused by a wrong combine of PSHUFD nodes with different types.
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.

Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.

define <4 x float> @test1(<4 x float> %V) {
  %1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  %2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
  ret <4 x float> %2
}



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212498 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 23:25:23 +00:00
Juergen Ributzka
1154be8198 [FastISel][X86] Fix smul.with.overflow.i8 lowering.
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.

This fixes <rdar://problem/17549300>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 21:52:21 +00:00
Louis Gerbarg
e7f8191b18 Allow AArch64FastISel to degrade graceully in the presence of an MVT::i128
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:

  typedef unsigned int uint128_t __attribute__((mode(TI)));
  typedef int sint128_t __attribute__((mode(TI)));

This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.

Tests included.

rdar://17516686

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 21:37:51 +00:00
Ulrich Weigand
50e72958aa [PowerPC] Fix testcase regression
Use -mcpu to avoid different codegen depending on host platform.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212478 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:41:54 +00:00
Ulrich Weigand
bf7bfe3549 [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212476 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:26:41 +00:00
Chandler Carruth
7fcb422bb2 [x86] Revert r212324 which was too aggressive w.r.t. allowing undef
lanes in vector splats.

The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.

Original commit message for r212324:
  [x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
  any constant, constant FP, or undef splat and to tolerate any undef
  lanes in a splat, then replace all uses of isSplatVector in X86's
  lowering with it.

  This fixes issues where undef lanes in an otherwise splat vector would
  prevent the splat logic from firing. It is a touch more awkward to use
  this interface, but it is much more accurate. Suggestions for better
  interface structuring welcome.

  With this fix, the code generated with the widening legalization
  strategy for widen_cast-4.ll is *dramatically* improved as the special
  lowering strategies for a v16i8 SRA kick in even though the high lanes
  are undef.

  We also get a slightly different choice for broadcasting an aligned
  memory location, and use vpshufd instead of vbroadcastss. This looks
  like a minor win for pipelining and domain crossing, but a minor loss
  for the number of micro-ops. I suspect its a wash, but folks can
  easily tweak the lowering if they want.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212475 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:03:32 +00:00
Matt Arsenault
0e1619e77c R600: Fix mishandling of load / store chains.
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 18:34:45 +00:00
Chandler Carruth
36ad61f4ea [x86] Teach the new vector shuffle lowering code to handle what is
essentially a DAG combine that never gets a chance to run.

We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.

This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212444 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 09:06:58 +00:00