Commit Graph

13922 Commits

Author SHA1 Message Date
Alex Lorenz
31512fe6ce MIR Parser: Use source locations for MBB naming errors.
This commit changes the type of the field 'Name' in the struct
'yaml::MachineBasicBlock' from 'std::string' to 'yaml::StringValue'. This change
allows the MIR parser to report errors related to the MBB name with the proper
source locations.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241718 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 20:22:20 +00:00
Krzysztof Parzyszek
a307401165 [Hexagon] Implement commoning of GetElementPtr instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241714 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 19:22:28 +00:00
Reid Kleckner
92ea0775b7 [SEH] Add missing test case from previous realignment commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241700 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 18:09:39 +00:00
Reid Kleckner
f0999f3b02 [SEH] Ensure that empty __except blocks have their own BB
The 32-bit lowering assumed that WinEHPrepare had this invariant.
WinEHPrepare did it for C++, but not SEH. The result was that we would
insert calls to llvm.x86.seh.restoreframe in normal basic blocks, which
corrupted the frame pointer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241699 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 18:08:52 +00:00
James Y Knight
8eb1aaac9c [SPARC] Cleanup handling of the Y/ASR registers.
- Implement copying ASR to/from GPR regs.
- Mark ASRs as non-allocatable, so it won't try to arbitrarily use
  them inappropriately.
- Instead of inserting explicit WRASR/RDASR nodes in the MUL/DIV
  routines, just do normal register copies.
- Also...mark div as using Y, not just writing it.

Added a test case with some code which previously died with an
assertion failure (with -O0), or produced wrong code (otherwise).

(Third time's the charm?)

Differential Revision: http://reviews.llvm.org/D10401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241686 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 16:25:12 +00:00
Krzysztof Parzyszek
e7f45f66a7 [Hexagon] Generate "insert" instructions more aggressively
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241683 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 14:47:34 +00:00
Krzysztof Parzyszek
5d447e9c2a Revert 241681: causes Windows builds to fail
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241682 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 14:34:13 +00:00
Krzysztof Parzyszek
ea2273d00c [Hexagon] Generate "insert" instructions more aggressively
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241681 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 14:22:27 +00:00
Simon Pilgrim
796a06d4eb [X86][SSE] Added (V)ROUNDSD + (V)ROUNDSS stack folding support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241671 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-08 08:07:57 +00:00
Reid Kleckner
39ee70ca76 [WinEH] Make llvm.x86.seh.restoreframe work for stack realignment prologues
The incoming EBP value points to the end of a local stack allocation, so
we can use that to restore ESI, the base pointer. Once we do that, we
can use local stack allocations. If we know we need stack realignment,
spill the original frame pointer in the prologue and reload it after
restoring ESI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241648 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 23:45:58 +00:00
Reid Kleckner
4fe74caa61 [WinEH] Add localaddress intrinsic instead of using frameaddress
Clang uses this for SEH finally. The new intrinsic will produce the
right value when stack realignment is required.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241643 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 23:23:03 +00:00
Arnold Schwaighofer
39fe55270a Add more nvcasts
Tim Northover has told me that they can occur when the compiler cleverly
constructs constants - as demonstrated in the test case.

rdar://21703486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241641 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 23:13:18 +00:00
Reid Kleckner
8f32e5f0d6 Rename llvm.frameescape and llvm.framerecover to localescape and localrecover
Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.

These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.

Naming suggestions at this point are welcome, I'm happy to re-run sed.

Reviewers: majnemer, nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11011

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241633 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 22:25:32 +00:00
Alex Lorenz
78bc2545c9 MIR Serialization: Serialize the 'dead' register machine operand flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241624 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 20:34:53 +00:00
Arnold Schwaighofer
2b88d93a2e Add CHECK lines to test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241619 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 19:26:31 +00:00
Arnold Schwaighofer
f869ca86f1 Add a pattern for a nvcast from v2f64 -> v4f32
Since the NvCast is generated by the selection process the concerns about
endianess and bit reversal don't apply.

rdar://21703486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241611 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 18:31:55 +00:00
Akira Hatanaka
75a855e853 Fix test case to unbreak build.
This commit changes the target arch to fix the test case commited in r241566
that was failing on ninja-x64-msvc-RA-centos6. Also add checks to make sure
the callee's address is loaded to blx's operand. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241588 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 14:45:12 +00:00
Akira Hatanaka
a744879a65 [ARM] Define a subtarget feature and use it to decide whether long calls should
be emitted.

This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.

Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).

rdar://problem/21529937

Differential Revision: http://reviews.llvm.org/D9364


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241566 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 06:54:42 +00:00
Alex Lorenz
d0ef9f3115 MIR Parser: Verify the implicit machine register operands.
This commit verifies that the parsed machine instructions contain the implicit
register operands as specified by the MCInstrDesc. Variadic and call
instructions aren't verified.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10781


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241537 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-07 02:08:46 +00:00
Dan Gohman
4214e961d7 [WebAssembly] Create a CodeGen unittest directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241520 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 23:14:57 +00:00
Alex Lorenz
4ec0f60807 MIR Serialization: Serialize the implicit register flag.
This commit serializes the implicit flag for the register machine operands. It
introduces two new keywords into the machine instruction syntax: 'implicit' and
'implicit-def'. The 'implicit' keyword is used for the implicit register
operands, and the 'implicit-def' keyword is used for the register operands that
have both the implicit and the define flags set.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10709


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241519 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 23:07:26 +00:00
Simon Pilgrim
315fd86400 [X86][AVX] Add support for shuffle decoding of vperm2f128/vperm2i128 with zero'd lanes
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.

As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.

Differential Revision: http://reviews.llvm.org/D10593

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241516 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 22:46:46 +00:00
Sanjay Patel
75a2ce3271 [x86] extend machine combiner reassociation optimization to SSE scalar adds
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.

With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.

Differential Revision: http://reviews.llvm.org/D10975



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241515 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 22:35:29 +00:00
Simon Pilgrim
6970be03d1 [X86][SSE] Vectorized i64 uniform constant SRA shifts
This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.

Differential Revision: http://reviews.llvm.org/D9645

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241514 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 22:35:19 +00:00
Reid Kleckner
1249487852 [WinEH] Add some test cases I forgot to add to previous commits
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241510 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 21:13:53 +00:00
Reid Kleckner
e23370402c [WinEH] Insert the EH code load before the block terminator
The previous code put the load after the terminator, leading to invalid
IR and downstream crashes. This caused http://crbug.com/506446.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241509 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 21:13:43 +00:00
Simon Pilgrim
3ecdd44e5d [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructions
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.

As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.

From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.

Differential Revision: http://reviews.llvm.org/D10146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241508 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 20:46:41 +00:00
Alex Lorenz
edfa571cbd llc: Add a 'run-pass' option.
This commit adds a 'run-pass' option to llc, which instructs the compiler to run
one specific code generation pass only.

Llc already has the 'start-after' and the 'stop-after' options, and this new
option complements the other two by making it easier to write tests that want
to invoke a single pass only.

Reviewers: Duncan P. N. Exon Smith

Differential Revision: http://reviews.llvm.org/D10776


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241476 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 17:44:26 +00:00
Matt Arsenault
6fe7acaaf8 AMDGPU/SI: Add debugging subtarget feature for DS offsets
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241462 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 16:01:58 +00:00
Simon Pilgrim
ff55c29f54 [X86][SSE] Added missing stack folding test for SQRTSD and SQRTSS instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241445 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 14:15:02 +00:00
Asaf Badouh
169ee3383c [X86][AVX512] Multiply Packed Unsigned Integers with Round and Scale
pmulhrsw

review:
http://reviews.llvm.org/D10948

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241443 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-06 14:03:40 +00:00
Simon Pilgrim
995b551ae7 [X86][SSE3] Just use an explicit SSE3 target attribute - not a cpu type.
Merged arch/target into a specific triple - we had i686 and x86_64 targets overriding each other....

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241410 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-05 19:06:32 +00:00
Simon Pilgrim
5fd4fe08f6 [X86][SSE2] Just use an explicit SSE2 target attribute - not a cpu type.
corei7 is capable of a lot more than just SSE2.... 

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241409 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-05 19:03:51 +00:00
Asaf Badouh
5047893c31 [x86][AVX512] add Multiply High Op
include encoding and intrinsics tests.

review
http://reviews.llvm.org/D10896

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241406 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-05 12:23:20 +00:00
Nemanja Ivanovic
8be316bf23 Add missing builtins to the PPC back end for ABI compliance (vol. 2)
This patch corresponds to review:
http://reviews.llvm.org/D10874

Back end portion of the second round of additions to altivec.h.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241398 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-05 06:03:51 +00:00
Simon Pilgrim
4606f6d8da [X86][SSE] Improved i8/i16 to f64 uint2fp vector conversions
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241394 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-04 15:33:34 +00:00
Simon Pilgrim
571beb683f [X86] Added 32-bit builds to fp<->int tests.
Ensure that i686 x87/SSE/SSE2 targets all build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241368 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-03 20:07:57 +00:00
NAKAMURA Takumi
3ec9de8dfd llvm/test/CodeGen/ARM/fnattr-trap.ll: Add -mtriple, to appease targeting *-win32.
LLVM ERROR: CPU: 'generic' does not support ARM mode execution!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241329 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-03 08:21:38 +00:00
Simon Pilgrim
2010d82c49 whitespace tidyup. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241326 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-03 08:02:12 +00:00
Simon Pilgrim
339c530319 [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt2)
Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64.

Differential Revision: http://reviews.llvm.org/D10589

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241325 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-03 08:01:36 +00:00
Simon Pilgrim
e3c6222c76 [X86][SSE] Sign extension for target vector sizes less than 128 bits (pt1)
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.

Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241323 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-03 07:51:01 +00:00
Nadav Rotem
6890be345e Fix an overly aggressive assertion in getCopyFromPartsVector.
The assertion in getCopyFromPartsVector assumed that the vector 'part' must
match the type of argument (arguments are potentially split into multiple
parts). However, in some cases the targets return a 'part' of the right size
but with a different type. We already handle this case correctly later on
and generate a bitcast. This commit just makes sure that we are actually
checking the property that we care about.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241312 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-02 23:23:52 +00:00
Akira Hatanaka
516286ff69 Use function attribute "trap-func-name" and remove TargetOptions::TrapFuncName.
This commit changes normal isel and fast isel to read the user-defined trap
function name from function attribute "trap-func-name" attached to llvm.trap or
llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to
use clang's command line option "-ftrap-function" for LTO and enable changing
the trap function name on a per-call-site basis.

Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the
trap function name should attach attribute "trap-func-name" to the call sites
of llvm.trap and llvm.debugtrap instead.

rdar://problem/21225723

Differential Revision: http://reviews.llvm.org/D10832


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241305 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-02 22:13:27 +00:00
Bill Schmidt
397fac95d5 [PPC64LE] Remove implicit-subreg restriction from VSX swap removal
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative.  We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations.  As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result.  This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.

I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch.  Test based on a
report from Anton Blanchard.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241290 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-02 19:01:22 +00:00
Bill Schmidt
a5a5a62fff [PPC64LE] Teach swap optimization about the doubleword splat idiom
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register.  We can implement this using xxspltd (a special form of
xxpermdi).  This patch teaches the swap optimization pass about this
idiom.

As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG.  Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register.  However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result.  In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.

The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element).  To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations.  That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region.  This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI.  Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.

A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241285 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-02 17:03:06 +00:00
Pawel Bylica
074d71dea6 Reapply r240291: Fix shl folding in DAG combiner.
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.

It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241254 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-02 11:44:54 +00:00
Quentin Colombet
a1a323c637 [TwoAddressInstructionPass] Try 3 Addr Conversion After Commuting.
TwoAddressInstructionPass stops after a successful commuting but 3 Addr
conversion might be good for some cases.
 
Consider:

int foo(int a, int b) {
  return a + b;
}

Before this commit, we emit:

addl	%esi, %edi
movl	%edi, %eax
ret

After this commit, we try 3 Addr conversion:

leal	(%rsi,%rdi), %eax
ret

Patch by Volkan Keles <vkeles@apple.com>!

Differential Revision: http://reviews.llvm.org/D10851


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241206 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-01 23:12:13 +00:00
Matthias Braun
3c76e5f588 Test for specific output in lit test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241200 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-01 22:34:59 +00:00
Jingyue Wu
e08f05f3a5 [NVPTX] expand extload/truncstore for vectors of floats
Summary:
According to PTX ISA:

For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.

So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.

As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]

This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.

Patched by Gang Hu. 

Test Plan: Test case attached.

Reviewers: jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D10876

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241191 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-01 21:32:42 +00:00
Jingyue Wu
8f2981cb40 [NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.

Patched by Xuetian Weng. 

Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.

Reviewers: jholewinski, jingyue

Reviewed By: jingyue

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D10853

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@241185 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-01 20:08:06 +00:00