Mirror of https://github.com/c64scene-ar/llvm-6502.git, synced 2024-12-28 04:33:05 +00:00
Commit 737c2ac4fc
Because we've canonicalised on using LD1/ST1, every time we do a bitcast between vector types we must do an equivalent lane reversal.

Consider a simple memory load followed by a bitconvert then a store:

    v0 = load v2i32
    v1 = BITCAST v2i32 v0 to v4i16
    store v4i16 v1

In big endian mode every memory access has an implicit byte swap. LDR and STR do a 64-bit byte swap, whereas LD1/ST1 do a byte swap per lane - that is, they treat the vector as a sequence of elements to be byte-swapped. The two pairs of instructions are fundamentally incompatible; we've decided to use LD1/ST1 only, to simplify the compiler implementation.

LD1/ST1 perform the equivalent of a sequence of LDR/STR + REV, so the original code sequence becomes:

    v0 = load v2i32
    v1 = REV v2i32 v0               (implicit)
    v2 = BITCAST v2i32 v1 to v4i16
    v3 = REV v4i16 v2               (implicit)
    store v4i16 v3

But this is now broken - the value stored is different from the value loaded, due to lane reordering. To fix this, on every BITCAST we must perform two further REVs:

    v0 = load v2i32
    v1 = REV v2i32 v0               (implicit)
    v2 = REV v2i32 v1
    v3 = BITCAST v2i32 v2 to v4i16
    v4 = REV v4i16 v3
    v5 = REV v4i16 v4               (implicit)
    store v4i16 v5

This costs an extra two instructions, but in most cases the two REV instructions can be combined into one (see the sketches after this message). For example:

    (REV64_2s (REV64_4h X)) === (REV32_4h X)

There is also no 128-bit REV instruction; it must be synthesized with an EXT instruction.

Most bitconverts therefore require some sort of conversion. The only exceptions are:

  a) Identity conversions - vNfX <-> vNiX
  b) Single-lane-to-scalar - v1fX <-> fX or v1iX <-> iX

Even though there are hundreds of changed lines, I have fairly high confidence that they are correct. The change to add two REV instructions per bitcast was pretty mechanical, and once I'd done that I threw the resulting .td at a script I wrote, which combined the two REVs together (and added an EXT instruction, for f128) based on an instruction description I gave it. This was much less prone to error than doing it all manually - plus, my brain would not just have melted but would have vapourised.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208194 91177308-0d34-0410-b5e6-96231b3b80d8
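To make the combining concrete, here is a minimal sketch of what one of the resulting big-endian bitconvert patterns could look like in TableGen. The `IsBE` predicate, the `REV32v4i16` instruction name, and the `FPR64` register class are assumptions about the ARM64 .td files, not text quoted from this commit; treat it as an illustration of the folded `(REV64_2s (REV64_4h X)) === (REV32_4h X)` identity rather than the actual patch contents.

```
// Sketch only: a v2i32 <-> v4i16 bitconvert in big endian. The two
// per-lane REVs implied by LD1/ST1 on either side of the bitcast
// have been folded into a single REV32 over 16-bit lanes.
let Predicates = [IsBE] in
def : Pat<(v4i16 (bitconvert (v2i32 FPR64:$src))),
          (v4i16 (REV32v4i16 FPR64:$src))>;
```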
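For the 128-bit case, where no REV instruction exists, the same kind of pattern can splice in an EXT to swap the two 64-bit halves. Again a hedged sketch, assuming `EXTv16i8` and `REV64v4i32` as the instruction names and `FPR128` as the register class:

```
// Sketch only: bitconvert v4i32 -> f128 in big endian. REV64v4i32
// byte-swaps within each 64-bit half; EXTv16i8 with an immediate of
// 8 then swaps the halves, synthesizing the missing 128-bit REV.
let Predicates = [IsBE] in
def : Pat<(f128 (bitconvert (v4i32 FPR128:$src))),
          (f128 (EXTv16i8 (REV64v4i32 FPR128:$src),
                          (REV64v4i32 FPR128:$src), (i32 8)))>;
```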
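The identity exceptions from (a) need no REV or EXT at all; under the same naming assumptions, such a pass-through pattern might look like:

```
// Sketch only: vNfX <-> vNiX is a pure reinterpretation in either
// endianness, so the pattern simply forwards the register.
def : Pat<(v2f32 (bitconvert (v2i32 FPR64:$src))), (v2f32 FPR64:$src)>;
```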
AsmParser
Disassembler
InstPrinter
MCTargetDesc
TargetInfo
Utils
ARM64.h
ARM64.td
ARM64AddressTypePromotion.cpp
ARM64AdvSIMDScalarPass.cpp
ARM64AsmPrinter.cpp
ARM64BranchRelaxation.cpp
ARM64CallingConv.h
ARM64CallingConvention.td
ARM64CleanupLocalDynamicTLSPass.cpp
ARM64CollectLOH.cpp
ARM64ConditionalCompares.cpp
ARM64DeadRegisterDefinitionsPass.cpp
ARM64ExpandPseudoInsts.cpp
ARM64FastISel.cpp
ARM64FrameLowering.cpp
ARM64FrameLowering.h
ARM64InstrAtomics.td
ARM64InstrFormats.td
ARM64InstrInfo.cpp
ARM64InstrInfo.h
ARM64InstrInfo.td
ARM64ISelDAGToDAG.cpp
ARM64ISelLowering.cpp
ARM64ISelLowering.h
ARM64LoadStoreOptimizer.cpp
ARM64MachineFunctionInfo.h
ARM64MCInstLower.cpp
ARM64MCInstLower.h
ARM64PerfectShuffle.h
ARM64PromoteConstant.cpp
ARM64RegisterInfo.cpp
ARM64RegisterInfo.h
ARM64RegisterInfo.td
ARM64SchedA53.td
ARM64SchedCyclone.td
ARM64Schedule.td
ARM64SelectionDAGInfo.cpp
ARM64SelectionDAGInfo.h
ARM64StorePairSuppress.cpp
ARM64Subtarget.cpp
ARM64Subtarget.h
ARM64TargetMachine.cpp
ARM64TargetMachine.h
ARM64TargetObjectFile.cpp
ARM64TargetObjectFile.h
ARM64TargetTransformInfo.cpp
CMakeLists.txt
LLVMBuild.txt
Makefile