Instead add m0 as an implicit operand. This allows us to avoid using
the M0Reg register class and eliminates a number of unnecessary spills
when using s_sendmsg instructions. This impacts one shader in the
shader-db:
SGPRS: 48 -> 40 (-16.67 %)
VGPRS: 112 -> 108 (-3.57 %)
Code Size: 40132 -> 38796 (-3.33 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 2048 -> 0 (-100.00 %) bytes per wave
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237133 91177308-0d34-0410-b5e6-96231b3b80d8
changes:
Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO
Add location to getConstant
Drop comment about the ops being turned into expand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236240 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)"
Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO"
Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB"
Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll
on hexagon, nvptx, and r600. Revert while I investigate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234768 91177308-0d34-0410-b5e6-96231b3b80d8
v2: tighten the sub64 tests
v3: rename to CARRY/BORROW
v4: fixup test cmdline
add known bits computation
use sign extend instead of sub 0,x
better add test
v5: remove redundant break
move lowering to separate functions
fix comments
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewers: arsenm
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@234759 91177308-0d34-0410-b5e6-96231b3b80d8
This enables a few useful combines that used to only
use fma.
Also since v_mad_f32 apparently does not support denormals,
disable the existing cases that are custom handled if they are
requested.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230071 91177308-0d34-0410-b5e6-96231b3b80d8
This should allow finally fixing the f64 fdiv implementation.
Test is disabled for VI since there seems to be a problem with one
of the buffer load instructions on it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229236 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions can only take a limited input range, and return
the constant value 1 out of range. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.
v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics
Patch by Grigori Goronzy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213458 91177308-0d34-0410-b5e6-96231b3b80d8
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211637 91177308-0d34-0410-b5e6-96231b3b80d8
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211247 91177308-0d34-0410-b5e6-96231b3b80d8
Delete all unused ones, and add new AMDGPU named intrinsics for
the ones that are. Handle the old AMDIL names for comptability (although
remove their GCCBuiltin names) and add tests since there weren't any
for these before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210827 91177308-0d34-0410-b5e6-96231b3b80d8
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.
This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched. This occasionally
resulted in some instructions being incorrectly deleted from the
program.
v2:
- Fix bug with 64-bit mul
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205731 91177308-0d34-0410-b5e6-96231b3b80d8
Only implemented for R600 so far. SI is missing implementations of a
few callbacks used by the Indirect Addressing pass and needs code to
handle frame indices.
At the moment R600 only supports array sizes of 16 dwords or less.
Register packing of vector types is currently disabled, which means that a
vec4 is stored in T0_X, T1_X, T2_X, T3_X, rather than T0_XYZW. In order
to correctly pack registers in all cases, we will need to implement an
analysis pass for R600 that determines the correct vector width for each
array.
v2:
- Add support for i8 zext load from stack.
- Coding style fixes
v3:
- Don't reserve registers for indirect addressing when it isn't
being used.
- Fix bug caused by LLVM limiting the number of SubRegIndex
declarations.
v4:
- Fix 64-bit defines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174525 91177308-0d34-0410-b5e6-96231b3b80d8