mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2025-01-01 15:33:33 +00:00
fc22bfd921
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221767 91177308-0d34-0410-b5e6-96231b3b80d8
37 lines
1.9 KiB
LLVM
37 lines
1.9 KiB
LLVM
; RUN: llc -mcpu=pwr8 -mattr=+vsx -O2 -mtriple=powerpc64-unknown-linux-gnu < %s > %t
|
|
; RUN: grep lxvw4x < %t | count 3
|
|
; RUN: grep lxvd2x < %t | count 3
|
|
; RUN: grep stxvw4x < %t | count 3
|
|
; RUN: grep stxvd2x < %t | count 3
|
|
|
|
@vsi = global <4 x i32> <i32 -1, i32 2, i32 -3, i32 4>, align 16
|
|
@vui = global <4 x i32> <i32 0, i32 1, i32 2, i32 3>, align 16
|
|
@vf = global <4 x float> <float -1.500000e+00, float 2.500000e+00, float -3.500000e+00, float 4.500000e+00>, align 16
|
|
@vsll = global <2 x i64> <i64 255, i64 -937>, align 16
|
|
@vull = global <2 x i64> <i64 1447, i64 2894>, align 16
|
|
@vd = global <2 x double> <double 3.500000e+00, double -7.500000e+00>, align 16
|
|
@res_vsi = common global <4 x i32> zeroinitializer, align 16
|
|
@res_vui = common global <4 x i32> zeroinitializer, align 16
|
|
@res_vf = common global <4 x float> zeroinitializer, align 16
|
|
@res_vsll = common global <2 x i64> zeroinitializer, align 16
|
|
@res_vull = common global <2 x i64> zeroinitializer, align 16
|
|
@res_vd = common global <2 x double> zeroinitializer, align 16
|
|
|
|
; Function Attrs: nounwind
|
|
define void @test1() {
|
|
entry:
|
|
%0 = load <4 x i32>* @vsi, align 16
|
|
%1 = load <4 x i32>* @vui, align 16
|
|
%2 = load <4 x i32>* bitcast (<4 x float>* @vf to <4 x i32>*), align 16
|
|
%3 = load <2 x double>* bitcast (<2 x i64>* @vsll to <2 x double>*), align 16
|
|
%4 = load <2 x double>* bitcast (<2 x i64>* @vull to <2 x double>*), align 16
|
|
%5 = load <2 x double>* @vd, align 16
|
|
store <4 x i32> %0, <4 x i32>* @res_vsi, align 16
|
|
store <4 x i32> %1, <4 x i32>* @res_vui, align 16
|
|
store <4 x i32> %2, <4 x i32>* bitcast (<4 x float>* @res_vf to <4 x i32>*), align 16
|
|
store <2 x double> %3, <2 x double>* bitcast (<2 x i64>* @res_vsll to <2 x double>*), align 16
|
|
store <2 x double> %4, <2 x double>* bitcast (<2 x i64>* @res_vull to <2 x double>*), align 16
|
|
store <2 x double> %5, <2 x double>* @res_vd, align 16
|
|
ret void
|
|
}
|