llvm-6502/lib/Target/X86/README.txt

//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend.
//===---------------------------------------------------------------------===//

Add MUL2U and MUL2S nodes to represent a multiply that returns both the
Hi and Lo parts (combination of MUL and MULH[SU] into one node).  Add these to
X86, and make the dag combiner produce them when needed.  This will eliminate
one imul from the code generated for:

long long test(long long X, long long Y) { return X*Y; }

by using the EAX result from the mul.  We should add a similar node for
DIVREM.
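
A rough C sketch of why this helps, assuming a 32-bit target.  The names
mul2u_result, mul2u and mul64_from_halves are made up for illustration and are
not existing LLVM nodes or APIs.  The widening multiply of the two low words
produces both halves at once (as x86 mul leaves them in EDX:EAX), so only the
two cross terms need a further multiply each:

```c
#include <stdint.h>

/* Hypothetical stand-in for a MUL2U-style node: one 32x32 multiply that
   yields both halves of the 64-bit product. */
typedef struct { uint32_t lo, hi; } mul2u_result;

static mul2u_result mul2u(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)a * b;
    mul2u_result r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}

/* Low 64 bits of a 64x64 multiply, built from 32-bit pieces. */
static uint64_t mul64_from_halves(uint64_t x, uint64_t y) {
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
    mul2u_result ll = mul2u(xl, yl);          /* both halves are used */
    uint32_t hi = ll.hi + xl * yh + xh * yl;  /* cross terms: low 32 bits only */
    return ((uint64_t)hi << 32) | ll.lo;
}
```

The low 64 bits of the product are the same for signed and unsigned operands,
so the sketch also covers the long long case above.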

//===---------------------------------------------------------------------===//

This should be one DIV/IDIV instruction, not a libcall:

unsigned test(unsigned long long X, unsigned Y) {
        return X/Y;
}

This can be done trivially with a custom legalizer.  What about overflow
though?  (DIV faults when the quotient does not fit in 32 bits.)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14224
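
One possible way to sidestep the overflow, sketched in C.  This is an
assumption about how it could be handled, not necessarily what the bug report
above settles on.  div64by32 is a hypothetical model of the hardware DIV
(valid only when hi < d); udiv_trunc reduces the high word first, which costs
a second DIV only in the overflowing case:

```c
#include <stdint.h>

/* Models x86 "div r32": divides hi:lo by d.  Precondition: hi < d,
   otherwise the real instruction raises a divide error. */
static uint32_t div64by32(uint32_t hi, uint32_t lo, uint32_t d) {
    uint64_t n = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(n / d);
}

/* Computes (unsigned)(x / y) for y != 0. */
static uint32_t udiv_trunc(uint64_t x, uint32_t y) {
    uint32_t hi = (uint32_t)(x >> 32), lo = (uint32_t)x;
    if (hi >= y)
        hi %= y;  /* extra DIV; the quotient's high word is discarded anyway */
    return div64by32(hi, lo, y);
}
```

After the reduction hi < y holds, so the final quotient always fits in 32
bits and equals the low word of the full 64-bit quotient.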

//===---------------------------------------------------------------------===//

Need to add support for rotate instructions.
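
For reference, the usual C idioms that should end up as a single rol/ror.
This is only a sketch of the source pattern; the helper names rotl32 and
rotr32 are illustrative.  The masking keeps the shift counts in range so the
code is well defined for any n:

```c
#include <stdint.h>

/* Rotate left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* Rotate right by n bits (n taken modulo 32). */
static uint32_t rotr32(uint32_t x, unsigned n) {
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}
```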

//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer freep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }

//===---------------------------------------------------------------------===//

The FP stackifier needs to be global.  Also, it should handle simple
permutations to reduce the number of shuffle instructions, e.g. turning:

fld P	->		fld Q
fld Q			fld P
fxch

or:

fxch	->		fucomi
fucomi			jl X
jg X

//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html
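
As a reminder of the transformation itself (this is the plain binary
expansion, not the improved algorithm from the patch above), a C sketch with
illustrative helper names:

```c
#include <stdint.h>

/* x * 10 as two shifts and an add: 10 = 8 + 2. */
static uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }

/* x * 7 as one shift and a subtract: 7 = 8 - 1. */
static uint32_t mul7(uint32_t x) { return (x << 3) - x; }

/* Generic form: add x << i for every set bit i of the constant c. */
static uint32_t mul_shift_add(uint32_t x, uint32_t c) {
    uint32_t r = 0;
    for (unsigned shift = 0; c != 0; c >>= 1, shift++)
        if (c & 1)
            r += x << shift;
    return r;
}
```

The mul7 form shows why the naive expansion is not always best: using a
subtract needs two operations where the bit-by-bit version needs five.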

//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):
long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01109.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01128.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01136.html

Other useful ones would be ~0ULL >> X and ~0ULL << X.
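
A C sketch of what good 32-bit code for these could compute, assuming
0 <= x < 64.  The helper names are illustrative and the approach is not taken
from the patches linked above.  Bit 5 of the shift amount selects which half
receives the shifted value, so no 64-bit shift sequence is needed:

```c
#include <stdint.h>

/* 1LL << x, using only 32-bit shifts and selects. */
static uint64_t shl1_64(unsigned x) {
    uint32_t bit = (uint32_t)1 << (x & 31);
    uint32_t lo = (x & 32) ? 0 : bit;
    uint32_t hi = (x & 32) ? bit : 0;
    return ((uint64_t)hi << 32) | lo;  /* hi:lo would live in EDX:EAX */
}

/* ~0ULL >> x, same idea. */
static uint64_t ones_shr_64(unsigned x) {
    uint32_t part = 0xFFFFFFFFu >> (x & 31);
    uint32_t hi = (x & 32) ? 0 : part;
    uint32_t lo = (x & 32) ? part : 0xFFFFFFFFu;
    return ((uint64_t)hi << 32) | lo;
}
```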

Put some of my random notes somewhere public
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23897 91177308-0d34-0410-b5e6-96231b3b80d8
2005-10-23 19:52:42 +00:00