2005-10-23 19:52:42 +00:00
//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend.
//===---------------------------------------------------------------------===//

Add MUL2U and MUL2S nodes to represent a multiply that returns both the
Hi and Lo parts (a combination of MUL and MULH[SU] in one node). Add these to
X86, and make the dag combiner produce them when needed. This will eliminate one
imul from the code generated for:

  long long test(long long X, long long Y) { return X*Y; }

by using the EAX result from the mul. We should add a similar node for
DIVREM.

Another case is:

  long long test(int X, int Y) { return (long long)X*Y; }

... which should only be one imul instruction.

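A rough C sketch of what a MUL2U-style node would compute, and of how the
64-bit multiply above could reuse both halves of one widening multiply. The
struct and function names are made up for illustration; they are not existing
LLVM names, and this is not how the backend is implemented.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two results of a MUL2U node. */
typedef struct {
  uint32_t lo;
  uint32_t hi;
} mul2u_result;

/* One widening multiply yields both halves. On X86 a single mul leaves the
   low half in EAX and the high half in EDX. */
static mul2u_result mul2u(uint32_t x, uint32_t y) {
  uint64_t wide = (uint64_t)x * y;
  mul2u_result r = { (uint32_t)wide, (uint32_t)(wide >> 32) };
  return r;
}

/* 64x64 -> 64 multiply split into 32-bit pieces. Only the low 32 bits of the
   two cross terms are needed, so they are plain 32-bit multiplies. */
static uint64_t mul64_via_halves(uint64_t x, uint64_t y) {
  uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
  uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);
  mul2u_result low = mul2u(xl, yl);
  uint32_t hi = low.hi + xl * yh + xh * yl;
  return ((uint64_t)hi << 32) | low.lo;
}
```

The point is that the low product's Lo and Hi parts both come from the same
multiply, so no separate multiply is needed to recompute the low half.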
//===---------------------------------------------------------------------===//

This should be one DIV/IDIV instruction, not a libcall:

  unsigned test(unsigned long long X, unsigned Y) {
    return X/Y;
  }

This can be done trivially with a custom legalizer. What about overflow
though? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14224

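One way to make the overflow question concrete: DIV with a 32-bit operand
divides EDX:EAX and faults when the quotient does not fit in 32 bits, while
the C code above asks for a full 64-bit divide that is truncated afterwards.
A small sketch of the difference (the helper names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* The quotient of a 64/32 divide fits in 32 bits exactly when the high half
   of the dividend is smaller than the divisor. */
static bool div64by32_fits(uint64_t x, uint32_t y) {
  return y != 0 && (uint32_t)(x >> 32) < y;
}

/* What the C source means: divide at 64 bits, then truncate the result. */
static uint32_t div_truncating(uint64_t x, uint32_t y) {
  return (uint32_t)(x / y);
}
```

For X = 1ULL << 32 and Y = 1 the C function returns 0, but a bare DIV would
trap, so using a single DIV is only safe when the check above is known to hold.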
//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer freep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:

  double foo(double P, int *I) { return P+*I; }

//===---------------------------------------------------------------------===//

The FP stackifier needs to be global. Also, it should handle simple permutations
to reduce the number of shuffle instructions, e.g. turning:

  fld P      ->      fld Q
  fld Q              fld P
  fxch

or:

  fxch       ->      fucomi
  fucomi             jl X
  jg X

//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html

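For reference, the basic multiply -> shift/add rewrite looks like this in C.
The two constants are arbitrary examples chosen here, not cases taken from the
linked patch:

```c
#include <stdint.h>

/* x * 10 == x * 8 + x * 2 */
static uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }

/* x * 7 == x * 8 - x */
static uint32_t mul7(uint32_t x) { return (x << 3) - x; }
```

Both forms give the same wrapped 32-bit result as the plain multiply, including
when the shifted values overflow.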
//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):

  long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01109.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01128.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01136.html

Another useful one would be ~0ULL >> X and ~0ULL << X.

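A sketch of why these cases are cheap on a 32-bit target: with a known
constant on the left, each 32-bit half can be computed from one 32-bit shift
plus a select on bit 5 of the shift amount, with no general 64-bit shift
sequence. This assumes 0 <= x <= 63; the function names are hypothetical and
this is not necessarily what the linked patches do.

```c
#include <stdint.h>

/* 1LL << x, built from 32-bit halves. */
static uint64_t shl_one(unsigned x) {
  uint32_t bit = 1u << (x & 31);     /* position within one half */
  uint32_t lo = (x & 32) ? 0 : bit;  /* x < 32: bit lands in the low half */
  uint32_t hi = (x & 32) ? bit : 0;  /* x >= 32: bit lands in the high half */
  return ((uint64_t)hi << 32) | lo;
}

/* ~0ULL >> x, built from 32-bit halves. */
static uint64_t ones_shr(unsigned x) {
  uint32_t part = 0xFFFFFFFFu >> (x & 31);
  uint32_t hi = (x & 32) ? 0 : part;
  uint32_t lo = (x & 32) ? part : 0xFFFFFFFFu;
  return ((uint64_t)hi << 32) | lo;
}
```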
//===---------------------------------------------------------------------===//

Should support emission of the bswap instruction, probably by adding a new
DAG node for byte swapping. Also useful on PPC, which has byte-swapping loads.

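For illustration, this is the kind of source-level idiom such a node would
represent; the whole expression is a single bswap on X86. The function name
is chosen here for the example only.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value using shifts and masks. */
static uint32_t bswap32(uint32_t v) {
  return (v >> 24) |
         ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) |
         (v << 24);
}
```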
//===---------------------------------------------------------------------===//

Compile this:

  _Bool f(_Bool a) { return a!=1; }

into:

  movzbl %dil, %eax
  xorl $1, %eax
  ret

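The xor form is valid because a _Bool only holds 0 or 1, so comparing against
1 is the same as flipping the low bit. A quick check of that equivalence (the
function names are hypothetical):

```c
#include <stdbool.h>

/* The comparison as written in the source. */
static bool f_compare(bool a) { return a != 1; }

/* The form the desired code computes: flip the low bit. */
static bool f_xor(bool a) { return a ^ 1; }
```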
//===---------------------------------------------------------------------===//

Some isel ideas:

1. Dynamic programming based approach when compile time is not an
   issue.
2. Code duplication (addressing mode) during isel.
3. Other ideas from "Register-Sensitive Selection, Duplication, and
   Sequencing of Instructions".

//===---------------------------------------------------------------------===//

Should we promote i16 to i32 to avoid partial register update stalls?

//===---------------------------------------------------------------------===//

Leave any_extend as a pseudo instruction and hint to the register
allocator. Delay codegen until post register allocation.

//===---------------------------------------------------------------------===//

Add a target specific hook to the DAG combiner to handle SINT_TO_FP and
FP_TO_SINT when the source operand is already in memory.

//===---------------------------------------------------------------------===//

Check if load folding would add a cycle in the dag.