Add a bunch of notes from my journey thus far.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27170 91177308-0d34-0410-b5e6-96231b3b80d8
@@ -1,11 +1,5 @@
//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement TargetConstantVec, and set up PPC to custom lower ConstantVec into
TargetConstantVec's if it's one of the many forms that are algorithmically
computable using the spiffy altivec instructions.

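For example (a sketch; the instruction pair is illustrative), { 16, 16, 16, 16 }
is outside the -16..15 immediate range of a single vspltisw but is still
computable in two instructions:

    vspltisw v0, 8        ; v0 = { 8, 8, 8, 8 }
    vadduwm  v0, v0, v0   ; v0 = { 16, 16, 16, 16 }
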
//===----------------------------------------------------------------------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

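A sketch of the load half, modeled on the existing scalar cases in
PPCInstrInfo.cpp (hedged: the LVX operand layout assumed here should be
checked against PPCInstrInfo.td):

    unsigned PPCInstrInfo::isLoadFromStackSlot(MachineInstr *MI,
                                               int &FrameIndex) const {
      switch (MI->getOpcode()) {
      // ... existing scalar cases: LWZ, LFS, LFD, ... ...
      case PPC::LVX:
        // A reload should look like "lvx vD, 0, <fi>" (lvx addresses rA+rB).
        if (MI->getOperand(1).isImmediate() &&
            MI->getOperand(1).getImmedValue() == 0 &&
            MI->getOperand(2).isFrameIndex()) {
          FrameIndex = MI->getOperand(2).getFrameIndex();
          return MI->getOperand(0).getReg();
        }
        break;
      }
      return 0;
    }
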
@@ -31,8 +25,6 @@ void foo(void) {
Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

We need to codegen -0.0 vector efficiently (no constant pool load).

When -ffast-math is on, we can use 0.0.

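Why -0.0: under IEEE rules x + (-0.0) == x for every x, while (-0.0) + 0.0 is
+0.0, so a 0.0 addend corrupts the sign of an exact -0.0 product.  A sketch of
the sequence we want (register choices illustrative):

    vspltisw v3, -1          ; v3 = all ones in each word
    vslw     v3, v3, v3      ; each word << 31 -> 0x80000000 == -0.0f
    vmaddfp  v2, v0, v1, v3  ; v2 = v0*v1 + (-0.0): an exact multiply
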
//===----------------------------------------------------------------------===//

@@ -48,7 +40,109 @@ a load/store/lve*x sequence.
//===----------------------------------------------------------------------===//

There are a wide range of vector constants we can generate with combinations of
altivec instructions.  Examples:
 GCC does: "t=vsplti*, r = t+t" for constants it can't generate with one vsplti

 -0.0 (sign bit): vspltisw v0,-1 / vslw v0,v0,v0

//===----------------------------------------------------------------------===//

Missing intrinsics:

ds*
lve*
lvs*
lvx*
mf*
st*
vavg*
vexptefp
vlogefp
vmax*
vmhaddshs/vmhraddshs
vmin*
vmladduhm
vmr*
vmsum*
vmul*
vperm
vpk*
vr*
vsel (some aliases only accessible using builtins)
vsl* (except vsldoi)
vsr*
vsum*
vup*

//===----------------------------------------------------------------------===//

FABS/FNEG can be codegen'd with the appropriate and/xor of -0.0.

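Sketch, reusing the two-instruction -0.0 splat from above (registers
illustrative; the last two lines are alternatives, not a sequence):

    vspltisw v3, -1        ; build the per-word sign-bit mask 0x80000000
    vslw     v3, v3, v3
    vandc    v2, v2, v3    ; FABS: clear the sign bit (vandc is vA & ~vB)
    vxor     v2, v2, v3    ; FNEG: flip the sign bit
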
//===----------------------------------------------------------------------===//

For functions that use altivec AND have calls, we are VRSAVE'ing all
call-clobbered regs.

//===----------------------------------------------------------------------===//

VSPLTW and friends are expanded by the FE into insert/extract element ops.  Make
sure that the dag combiner puts them back together in the appropriate
vector_shuffle node and that this gets pattern matched appropriately.

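For reference, the kind of source that should come back out as a single vspltw
(assuming altivec.h and its vec_splat):

    #include <altivec.h>

    vector float splat_elt1(vector float a) {
      /* the FE emits extract + 4 inserts; this should fold to one vspltw */
      return vec_splat(a, 1);
    }
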
//===----------------------------------------------------------------------===//

Implement passing/returning vectors by value.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.

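The pattern in question, written with GNU C vector literals (names
illustrative):

    vector float mixed(float x) {
      /* three constant lanes and one variable lane */
      return (vector float){ 1.0f, 2.0f, x, 3.0f };
    }
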
//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a lve*x/vperm.  We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it.  If the value is already
in memory, this is a huge win.

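A sketch of the proposed sequence when the scalar is already in memory at r3
(registers illustrative):

    lvx    v2, 0, r3       ; load the 16-byte block containing the scalar
    lvsl   v3, 0, r3       ; permute control from the low bits of the address
    vperm  v2, v2, v2, v3  ; rotate the scalar's bytes down to element 0
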
//===----------------------------------------------------------------------===//

Do not generate the MFCR/RLWINM sequence for predicate compares when the
predicate compare is used immediately by a branch.  Just branch on the right
cond code on CR6.

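Roughly, for an all-elements-equal test (a sketch; registers, labels, and the
exact rlwinm fields are illustrative):

    ; today:
    vcmpequw. v2, v3, v4           ; dot form sets CR6
    mfcr      r3
    rlwinm    r3, r3, 25, 31, 31   ; isolate the CR6 "all true" bit
    cmpwi     r3, 0
    bne       LBB1_1

    ; desired:
    vcmpequw. v2, v3, v4
    blt       cr6, LBB1_1          ; CR6 LT set iff all elements compared true
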
//===----------------------------------------------------------------------===//

SROA should turn "vector unions" into the appropriate insert/extract element
instructions.

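One shape of "vector union" we mean (GNU C; names illustrative):

    union vu { vector float v; float f[4]; };

    float third_elt(vector float x) {
      union vu u;
      u.v = x;
      return u.f[2];   /* should become one extract-element, no stack traffic */
    }
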
//===----------------------------------------------------------------------===//

We need an LLVM 'shuffle' instruction, that corresponds to the VECTOR_SHUFFLE
node.

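One possible spelling (purely hypothetical syntax; the mask indices select from
the concatenation of the two source vectors):

    %r = shuffle <4 x float> %a, <4 x float> %b, <i32 0, i32 4, i32 1, i32 5>
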
//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants.  The verifier should enforce this constraint.

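A hypothetical spelling in the .td file (the ConstantArg property is invented
here for illustration; the ds* intrinsics above would need it for their stream
tag):

    def int_ppc_altivec_dss
      : Intrinsic<[],                // no result
                  [llvm_i32_ty],     // stream tag: must be a constant 0..3
                  [ConstantArg<0>]>; // hypothetical "operand 0 is constant"
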
//===----------------------------------------------------------------------===//

We should instcombine the lvx/stvx intrinsics into loads/stores if we know that
the loaded address is 16-byte aligned.

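Sketch of the load half in IR (syntax approximate; only valid because lvx's
implicit masking of the low 4 address bits is a no-op at 16-byte alignment):

    ; before, with %p known to be 16-byte aligned:
    %v = call <4 x i32> @llvm.ppc.altivec.lvx(ptr %p)
    ; after:
    %v = load <4 x i32>, ptr %p, align 16
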
//===----------------------------------------------------------------------===//

Instead of writing a pattern for type-agnostic operations (e.g. gen-zero, load,
store, and, ...) in every supported type, make legalize do the work.  We should
have a canonical type that we want operations changed to (e.g. v4i32 for
build_vector) and legalize should change non-identical types to these.  This is
similar to what it does for operations that are only supported in some types,
e.g. x86 cmov (not supported on bytes).

This would fix two problems:
1. Writing patterns multiple times.
2. Identical operations in different types are not getting CSE'd (e.g.
   { 0U, 0U, 0U, 0U } and { 0.0, 0.0, 0.0, 0.0 }; see the sketch below).
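
A sketch of what (2) should produce: both zero vectors become the same single
instruction, with the v4f32 use just a no-op reinterpretation of the v4i32
value:

    vxor v0, v0, v0   ; the one canonical zero, CSE'd across types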