llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-11-15 04:08:07 +00:00


Evan Cheng e826a018b9 Added notes about a x86 isel deficiency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@25706 91177308-0d34-0410-b5e6-96231b3b80d8		2006-01-27 22:11:01 +00:00
.cvsignore	ignore generated files	2004-11-21 00:01:54 +00:00
Makefile	Added preliminary x86 subtarget support.	2006-01-26 09:53:06 +00:00
README.txt	Added notes about a x86 isel deficiency.	2006-01-27 22:11:01 +00:00
X86.h	Bye bye Pattern ISel, hello DAG ISel.	2006-01-27 21:26:54 +00:00
X86.td	x86 CPU detection and proper subtarget support	2006-01-27 08:10:46 +00:00
X86AsmPrinter.cpp	Use the shared asmprinter code for printing special llvm globals	2005-12-13 06:32:50 +00:00
X86AsmPrinter.h	Use the shared asmprinter code for printing special llvm globals	2005-12-13 06:32:50 +00:00
X86ATTAsmPrinter.cpp	Work around some x86 Darwin assembler bugs	2006-01-26 02:27:43 +00:00
X86ATTAsmPrinter.h	No longer track value types for asm printer operands, and remove them as	2005-11-30 18:54:35 +00:00
X86CodeEmitter.cpp	Unbreak the JIT with SSE	2006-01-27 18:27:18 +00:00
X86ELFWriter.cpp	Refactor things a bit to allow the ELF code emitter to run the X86 machine code emitter	2005-07-11 05:17:48 +00:00
X86FloatingPoint.cpp	Improve compatibility with VC2005, patch by Morten Ofstad!	2006-01-26 20:41:32 +00:00
X86InstrBuilder.h	* Remove trailing whitespace	2005-04-21 23:38:14 +00:00
X86InstrInfo.cpp	Properly split f32 and f64 into separate register classes for scalar sse fp	2005-10-14 22:06:00 +00:00
X86InstrInfo.h	Eliminate tabs and trailing spaces.	2005-07-27 05:53:44 +00:00
X86InstrInfo.td	x86 CPU detection and proper subtarget support	2006-01-27 08:10:46 +00:00
X86IntelAsmPrinter.cpp	Add explicit #includes of <iostream>	2006-01-22 23:41:00 +00:00
X86IntelAsmPrinter.h	Fix a typo in my latest change	2005-11-30 18:57:39 +00:00
X86ISelDAGToDAG.cpp	x86 CPU detection and proper subtarget support	2006-01-27 08:10:46 +00:00
X86ISelLowering.cpp	Bye bye Pattern ISel, hello DAG ISel.	2006-01-27 21:26:54 +00:00
X86ISelLowering.h	Remove TLI.LowerReturnTo, and just let targets custom lower ISD::RET for	2006-01-27 21:09:22 +00:00
X86ISelPattern.cpp	x86 CPU detection and proper subtarget support	2006-01-27 08:10:46 +00:00
X86JITInfo.cpp	Improve compatibility with VC2005, patch by Morten Ofstad!	2006-01-26 19:55:20 +00:00
X86JITInfo.h	turn off GOT on archs that didn't use it (not that it appeard to harm them much with it on)	2005-07-29 23:32:02 +00:00
X86PeepholeOpt.cpp	remove some never-completed and now-obsolete code.	2005-12-12 20:12:20 +00:00
X86RegisterInfo.cpp	Support for ADD_PARTS, SUB_PARTS, SHL_PARTS, SHR_PARTS, and SRA_PARTS.	2006-01-09 18:33:28 +00:00
X86RegisterInfo.h	Pass extra regclasses into spilling code	2005-09-30 01:29:42 +00:00
X86RegisterInfo.td	Remove the uses of STATUS flag register. Rely on node property SDNPInFlag,	2006-01-26 00:29:36 +00:00
X86Relocations.h	* Remove trailing whitespace	2005-04-21 23:38:14 +00:00
X86Subtarget.cpp	Added a temporary option -enable-x86-sse to enable sse support. It is used by	2006-01-27 21:49:34 +00:00
X86Subtarget.h	x86 CPU detection and proper subtarget support	2006-01-27 08:10:46 +00:00
X86TargetMachine.cpp	Bye bye Pattern ISel, hello DAG ISel.	2006-01-27 21:26:54 +00:00
X86TargetMachine.h	Add a new option to indicate we want the code generator to emit code quickly,not spending tons of time microoptimizing it. This is useful for an -O0style of build.	2005-11-08 02:11:51 +00:00

README.txt

//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend.
//===---------------------------------------------------------------------===//

Add MUL2U and MUL2S nodes to represent a multiply that returns both the
Hi and Lo parts (a combination of MUL and MULH[SU] in one node).  Add these to
X86, and make the DAG combiner produce them when needed.  This will eliminate one
imul from the code generated for:

long long test(long long X, long long Y) { return X*Y; }

by using the EAX result from the mul.  We should add a similar node for
DIVREM.

Another case is:

long long test(int X, int Y) { return (long long)X*Y; }

... which should only be one imul instruction.

//===---------------------------------------------------------------------===//

This should be one DIV/IDIV instruction, not a libcall:

unsigned test(unsigned long long X, unsigned Y) {
        return X/Y;
}

This can be done trivially with a custom legalizer.  What about overflow,
though?  DIV raises a divide error (#DE) if the quotient does not fit in 32
bits.  See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14224

//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }

//===---------------------------------------------------------------------===//

The FP stackifier needs to be global.  Also, it should handle simple permutations
to reduce the number of shuffle instructions, e.g. turning:

fld P	->		fld Q
fld Q			fld P
fxch

or:

fxch	->		fucomi
fucomi			jl X
jg X

Ideas:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html


//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg01590.html

//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):
long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01109.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01128.html
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01136.html

Other useful ones would be ~0ULL >> X and ~0ULL << X.

//===---------------------------------------------------------------------===//

Should support emission of the bswap instruction, probably by adding a new
DAG node for byte swapping.  Also useful on PPC which has byte-swapping loads.

//===---------------------------------------------------------------------===//

Compile this:
_Bool f(_Bool a) { return a!=1; }

into:
        movzbl  %dil, %eax
        xorl    $1, %eax
        ret

//===---------------------------------------------------------------------===//

Some isel ideas:

1. Dynamic programming based approach when compile time is not an
   issue.
2. Code duplication (addressing mode) during isel.
3. Other ideas from "Register-Sensitive Selection, Duplication, and
   Sequencing of Instructions".

//===---------------------------------------------------------------------===//

Should we promote i16 to i32 to avoid partial register update stalls?

//===---------------------------------------------------------------------===//

Leave any_extend as a pseudo instruction and hint to the register
allocator.  Delay codegen until after register allocation.

//===---------------------------------------------------------------------===//

Add a target specific hook to DAG combiner to handle SINT_TO_FP and
FP_TO_SINT when the source operand is already in memory.

//===---------------------------------------------------------------------===//

Check if load folding would add a cycle in the dag.

//===---------------------------------------------------------------------===//

Model X86 EFLAGS as a real register to avoid redundant cmp / test, e.g.

	cmpl $1, %eax
	setg %al
	testb %al, %al  # unnecessary
	jne .BB7

//===---------------------------------------------------------------------===//

Count leading zeros and count trailing zeros:

int clz(int X) { return __builtin_clz(X); }
int ctz(int X) { return __builtin_ctz(X); }

$ gcc t.c -S -o - -O3  -fomit-frame-pointer -masm=intel
clz:
        bsr     %eax, DWORD PTR [%esp+4]
        xor     %eax, 31
        ret
ctz:
        bsf     %eax, DWORD PTR [%esp+4]
        ret

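However, check what these do for an input of 0, where the result should be
32.  Our intrinsics are defined there; GCC's builtins aren't (BSR and BSF leave
their destination undefined when the source is 0).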

//===---------------------------------------------------------------------===//

Use push/pop instructions in prolog/epilog sequences instead of stores off
ESP (a certain code size win, and a perf win on some [which?] processors).

//===---------------------------------------------------------------------===//

Only use inc/neg/not instructions on processors where they are faster than
add/sub/xor.  They are slower on the P4 because they update only some of the
processor flags.

//===---------------------------------------------------------------------===//

Open-code rint, floor, ceil, and trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html

//===---------------------------------------------------------------------===//

Combine: a = sin(x), b = cos(x) into a,b = sincos(x).

//===---------------------------------------------------------------------===//

Solve this DAG isel folding deficiency:

int X, Y;

void fn1(void)
{
  X = X | (Y << 3);
}

compiles to

fn1:
	movl Y, %eax
	shll $3, %eax
	orl X, %eax
	movl %eax, X
	ret

The problem is the store's chain operand is not the load X but rather
a TokenFactor of the load X and load Y. This prevents the folding.