The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."
Patch by: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200773 91177308-0d34-0410-b5e6-96231b3b80d8
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.
Also add a pattern for (fneg (fabs ...)).
Fixes a bunch of bit encoding piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200743 91177308-0d34-0410-b5e6-96231b3b80d8
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200619 91177308-0d34-0410-b5e6-96231b3b80d8
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason. I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199918 91177308-0d34-0410-b5e6-96231b3b80d8
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vetex fetch clause instead of using a POP
instruction after it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199917 91177308-0d34-0410-b5e6-96231b3b80d8
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.
Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199916 91177308-0d34-0410-b5e6-96231b3b80d8
The unit test is now disabled on non-asserts builds.
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199905 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.
The -debug-only flag for llc doesn't appear to be available in
all build configurations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199845 91177308-0d34-0410-b5e6-96231b3b80d8
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries module 8 is either 7 or 0)
We choose to be conservative and always apply the work-around when the
number of sub-enries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199842 91177308-0d34-0410-b5e6-96231b3b80d8
v2: Add ftrunc->TRUNC pattern instead of replacing int_AMDGPU_trunc
v3: move ftrunc pattern next to TRUNC definition, it's available since R600
Patch By: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197783 91177308-0d34-0410-b5e6-96231b3b80d8
Different sized address spaces should theoretically work
most of the time now, and since 64-bit add is currently
disabled, using more 32-bit pointers fixes some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197659 91177308-0d34-0410-b5e6-96231b3b80d8
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.
v2:
- Fix encoding of Lane Mask
- Use correct register flags, so we don't overwrite the low dword
when restoring multi-dword registers.
v3:
- Register spilling seems to hang the GPU, so replace all shaders
that need spilling with a dummy shader.
v4:
- Fix *LANE definitions
- Change destination reg class for 32-bit SMRD instructions
v5:
- Remove small optimization that was crashing Serious Sam 3.
https://bugs.freedesktop.org/show_bug.cgi?id=68224https://bugs.freedesktop.org/show_bug.cgi?id=71285
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195880 91177308-0d34-0410-b5e6-96231b3b80d8
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195879 91177308-0d34-0410-b5e6-96231b3b80d8
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195514 91177308-0d34-0410-b5e6-96231b3b80d8
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.
NOTE: This is a candidate for the 3.4 branch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195398 91177308-0d34-0410-b5e6-96231b3b80d8
Test doesn't actually check the output. I need
to fix add i64 being matched for the addressing
calculations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195040 91177308-0d34-0410-b5e6-96231b3b80d8
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)
On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.
Patch by Micah Villmow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194783 91177308-0d34-0410-b5e6-96231b3b80d8
The LDS output queue is accessed via the OQAP register. The OQAP
register cannot be live across clauses, so if value is written to the
output queue, it must be retrieved before the end of the clause.
With the machine scheduler, we cannot statisfy this constraint, because
it lacks proper alias analysis and it will mark some LDS accesses as
having a chain dependency on vertex fetches. Since vertex fetches
require a new clauses, the dependency may end up spiltting OQAP uses and
defs so the end up in different clauses. See the lds-output-queue.ll
test for a more detailed explanation.
To work around this issue, we now combine the LDS read and the OQAP
copy into one instruction and expand it after register allocation.
This patch also adds some checks to the EmitClauseMarker pass, so that
it doesn't end a clause with a value still in the output queue and
removes AR.X and OQAP handling from the scheduler (AR.X uses and defs
were already being expanded post-RA, so the scheduler will never see
them).
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194755 91177308-0d34-0410-b5e6-96231b3b80d8
All shift operations will be selected as SALU instructions and then
if necessary lowered to VALU instructions in the SIFixSGPRCopies pass.
This allows us to do more operations on the SALU which will improve
performance and is also required for implementing private memory
using indirect addressing, since the private memory pointers must stay
in the scalar registers.
This patch includes some fixes from Matt Arsenault.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194625 91177308-0d34-0410-b5e6-96231b3b80d8
Print the range of registers used with a single letter prefix.
This better matches what the shader compiler produces and
is overall less obnoxious than concatenating all of the
subregister names together.
Instead of SGPR0, it will print s0. Instead of SGPR0_SGPR1,
it will print s[0:1] and so on.
There doesn't appear to be a straightforward way
to get the actual register info in the InstPrinter,
so this parses the generated name to print with the
new syntax.
The required test changes are pretty nasty, and register
matching regexes are now worse. Since there isn't a way to
add to a variable in FileCheck, some of the tests now don't
check the exact number of registers used, but I don't think that
will be a real problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194443 91177308-0d34-0410-b5e6-96231b3b80d8
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193215 91177308-0d34-0410-b5e6-96231b3b80d8
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
MOV instructions that use indirect addressing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193179 91177308-0d34-0410-b5e6-96231b3b80d8
We were calling llvm_unreachable() when failing to optimize the
branch into if case. However, it is still possible for us
to structurize the CFG by duplicating blocks even if this optimization
fails.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192813 91177308-0d34-0410-b5e6-96231b3b80d8
We can't enable the verifier for tests with SI_IF and SI_ELSE, because
these instructions are always followed by a COPY which copies their
result to the next basic block. This violates the machine verifier's
rule that non-terminators can not folow terminators.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@192366 91177308-0d34-0410-b5e6-96231b3b80d8
We were completely ignoring the unorder/ordered attributes of condition
codes and also incorrectly lowering seto and setuo.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191603 91177308-0d34-0410-b5e6-96231b3b80d8
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190575 91177308-0d34-0410-b5e6-96231b3b80d8