Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184843 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184842 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184841 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184840 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184839 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184838 91177308-0d34-0410-b5e6-96231b3b80d8
Also add lit test for both cases on SI, and v2i32 for evergreen.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184837 91177308-0d34-0410-b5e6-96231b3b80d8
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space. Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.
https://bugs.freedesktop.org/show_bug.cgi?id=65873
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184822 91177308-0d34-0410-b5e6-96231b3b80d8
This should only make a difference in programs that use a lot of the
vector ALU instructions like BFI_INT and BIT_ALIGN. There is a slight
improvement in the phatk bitcoin mining kernel with this patch on
Evergreen (vector size == 1):
Before:
1173 Instruction Groups / 9520 dwords
After:
1167 Instruction Groups / 9510 dwords
Reviewed-by: Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184819 91177308-0d34-0410-b5e6-96231b3b80d8
Also add a v2i32 test to the existing v4i32 test.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184482 91177308-0d34-0410-b5e6-96231b3b80d8
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184481 91177308-0d34-0410-b5e6-96231b3b80d8
The custom lowering causes llc to crash with a segfault.
Ideally, the custom lowering can be fixed, but this allows
programs which load/store v2i32 to work without crashing.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184480 91177308-0d34-0410-b5e6-96231b3b80d8
Also add a seperate vector lit test file, since r600 doesn't seem to handle
v2i32 load/store yet, but we can test both for SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184021 91177308-0d34-0410-b5e6-96231b3b80d8
We were using RAT_INST_STORE_RAW, which seemed to work, but the docs
say this instruction doesn't exist for Cayman, so it's probably safer
to use a documented instruction instead.
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184015 91177308-0d34-0410-b5e6-96231b3b80d8
This is using a hint from AMD APP OpenCL Programming Guide with
empirically tweaked parameters.
I used Unigine Heaven 3.0 to determine best parameters on my system
(i7 2600/Radeon 6950/Kernel 3.9.4) the benchmark :
it went from 38.8 average fps to 39.6, which is ~3% gain.
(Lightmark 2008.2 gain is much more marginal: from 537 to 539)
There is no lit test provided as the parameter were determined
empirically and it it would be nearly impossiblet to find a test
program that check for optimal behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183593 91177308-0d34-0410-b5e6-96231b3b80d8
We weren't computing structure size correctly and we were relying on
the original alloca instruction to compute the offset, which isn't
always reliable.
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183568 91177308-0d34-0410-b5e6-96231b3b80d8
This should simplify the subtarget definitions and make it easier to
add new ones.
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183566 91177308-0d34-0410-b5e6-96231b3b80d8
Previously commited @183279 but tests were failing, reverted @183286
It was broken because @183336 was missing, now it's there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183343 91177308-0d34-0410-b5e6-96231b3b80d8