Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187831 91177308-0d34-0410-b5e6-96231b3b80d8
* Added R600_Reg64 class
* Added T#Index#.XY registers definition
* Added v2i32 register reads from parameter and global space
* Added f32 and i32 elements extraction from v2f32 and v2i32
* Added v2i32 -> v2f32 conversions
Tom Stellard:
- Mark vec2 operations as expand. The addition of a vec2 register
class made them all legal.
Patch by: Dmitry Cherkassov
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187582 91177308-0d34-0410-b5e6-96231b3b80d8
build_vector is lowered to REG_SEQUENCE, which is something the register
allocator does a good job at optimizing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187397 91177308-0d34-0410-b5e6-96231b3b80d8
This patch prevents the following combine when the input vector is used more
than once.
insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx
=>
build_vector elt0, ..., NewEltIdx, ..., eltN
The reasons are:
- Building a vector may be expensive, so try to reuse the existing part of a
vector instead of creating a new one (think big vectors).
- elt0 to eltN now have two users instead of one. This may prevent some other
optimizations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187396 91177308-0d34-0410-b5e6-96231b3b80d8
This commit also implements these functions for R600 and removes a test
case that was relying on the buggy behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187007 91177308-0d34-0410-b5e6-96231b3b80d8
These are really the same address space in hardware. The only
difference is that CONSTANT_ADDRESS uses a special cache for faster
access. When we are unable to use the constant kcache for some reason
(e.g. smaller types or lack of indirect addressing) then the instruction
selector must use GLOBAL_ADDRESS loads instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187006 91177308-0d34-0410-b5e6-96231b3b80d8
This increases the number of opportunites we have for folding. With the
previous implementation we were unable to fold into any instructions
other than the first when multiple instructions were selected from a
single SDNode.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186919 91177308-0d34-0410-b5e6-96231b3b80d8
A side-effect of this is that now the compiler expects kernel arguments
to be 4-byte aligned.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186916 91177308-0d34-0410-b5e6-96231b3b80d8
I'm guessing the failure had something to do with the double precision
floating point constant used in the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186191 91177308-0d34-0410-b5e6-96231b3b80d8
Enough for the radeonsi driver to use it for calculating derivatives.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186012 91177308-0d34-0410-b5e6-96231b3b80d8
Test is not included as it is several 1000 lines long.
To test this functionnality, a test case must generate at least 2 ALU clauses,
where an ALU clause is ~110 instructions long.
NOTE: This is a candidate for the stable branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185943 91177308-0d34-0410-b5e6-96231b3b80d8