We used to compile:
double %test(uint %X) {
  %tmp.1 = cast uint %X to double ; <double> [#uses=1]
  ret double %tmp.1
}
into:
test:
sub %ESP, 8
mov %EAX, DWORD PTR [%ESP + 12]
mov %ECX, 0
mov DWORD PTR [%ESP], %EAX
mov DWORD PTR [%ESP + 4], %ECX
fild QWORD PTR [%ESP]
add %ESP, 8
ret
... which basically zero-extends the value to 8 bytes, then does an fild of
an 8-byte signed int.
Now we generate this:
test:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
shr %EAX, 31
fadd DWORD PTR [.CPItest_0 + 4*%EAX]
add %ESP, 4
ret
.section .rodata
.align 4
.CPItest_0:
.quad 5728578726015270912
This does a 32-bit signed integer load, then adds in an offset if the sign
bit of the integer was set.
It turns out that this is substantially faster than the preceding sequence.
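In C terms, the new sequence amounts to roughly the following (my reconstruction for illustration; the function name is made up, and it assumes the usual x86 two's complement behavior for the unsigned-to-signed cast):

double uint_to_double(unsigned x) {
  double d = (double)(int)x;   /* 32-bit signed load, matching the fild DWORD */
  if ((int)x < 0)              /* sign bit set, the 'shr %EAX, 31' case */
    d += 4294967296.0;         /* add 2^32 back; the nonzero constant-pool entry */
  return d;
}

For example, uint_to_double(3123456789u) first yields -1171510507.0 from the signed load and is then corrected to 3123456789.0 by the add.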
Consider this testcase:
unsigned a[2]={1,2};
volatile double G;
void main() {
  int i;
  for (i = 0; i < 100000000; ++i)
    G += a[i&1];
}
On zion (a P4 Xeon, 3GHz), this patch speeds up the testcase from 2.140s
to 0.94s.
On apoc, an Athlon MP 2100+, this patch speeds up the testcase from 1.72s
to 1.34s.
Note that the program takes 2.5s/1.97s on zion/apoc with GCC 3.3 -O3
-fomit-frame-pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17083 91177308-0d34-0410-b5e6-96231b3b80d8
%X = and Y, constantint
%Z = setcc %X, 0
instead of emitting:
and %EAX, 3
test %EAX, %EAX
je .LBBfoo2_2 # UnifiedReturnBlock
We now emit:
test %EAX, 3
je .LBBfoo2_2 # UnifiedReturnBlock
This triggers 581 times on 176.gcc for example.
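For reference, a C fragment of the sort that produces this pattern might look like the following (illustrative only; the function name and the constant 3 are taken from the emitted labels and code above):

int foo2(int x) {
  if ((x & 3) == 0)   /* %X = and Y, 3 ; %Z = setcc %X, 0 */
    return 1;
  return 0;
}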
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17080 91177308-0d34-0410-b5e6-96231b3b80d8
Consider this case:
int C[100];
int foo() {
  return C[4];
}
We now codegen:
foo:
mov %EAX, DWORD PTR [C + 16]
ret
instead of:
foo:
mov %EAX, OFFSET C
mov %EAX, DWORD PTR [%EAX + 16]
ret
Other impressive features may be coming later.
This patch is contributed by Jeff Cohen!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17011 91177308-0d34-0410-b5e6-96231b3b80d8
which prevented setcc's from being folded into branches. It appears that
conditional branchinst's CC operand is actually operand(2), not operand(0)
as we might expect. :(
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16859 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of generating:
t:
mov %EDX, DWORD PTR [%ESP + 4]
mov %ECX, 2
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
mov %EAX, %EDX
ret
Generate:
t:
mov %ECX, DWORD PTR [%ESP + 4]
*** mov %EAX, %ECX
cdq
and %ECX, 1
xor %ECX, %EDX
sub %ECX, %EDX
*** mov %EAX, %ECX
ret
Note that the two marked moves are redundant, and should be eliminated by the
register allocator, but aren't.
Compare this to GCC, which generates:
t:
mov %eax, DWORD PTR [%esp+4]
mov %edx, %eax
shr %edx, 31
lea %ecx, [%edx+%eax]
and %ecx, -2
sub %eax, %ecx
ret
or ICC 8.0, which generates:
t:
movl 4(%esp), %ecx #3.5
movl $-2147483647, %eax #3.25
imull %ecx #3.25
movl %ecx, %eax #3.25
sarl $31, %eax #3.25
addl %ecx, %edx #3.25
subl %edx, %eax #3.25
addl %eax, %eax #3.25
negl %eax #3.25
subl %eax, %ecx #3.25
movl %ecx, %eax #3.25
ret #3.25
We would be in great shape if not for the moves.
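For what it's worth, the new sequence is a branchless signed remainder by 2; in C it corresponds to roughly this (my reconstruction, assuming an arithmetic right shift for signed ints, which is what cdq provides):

int rem2(int x) {
  int sign = x >> 31;           /* cdq: 0 for non-negative x, -1 for negative */
  int bit  = x & 1;             /* and %ECX, 1 */
  return (bit ^ sign) - sign;   /* conditionally negate: the xor + sub pair */
}

E.g. rem2(-3) computes (1 ^ -1) - (-1) = -1, matching C's truncating '%'.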
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16763 91177308-0d34-0410-b5e6-96231b3b80d8
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16137 91177308-0d34-0410-b5e6-96231b3b80d8
improvements on instruction selection that account for register and frame
index bases.
Patch contributed by Jeff Cohen. Thanks Jeff!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16110 91177308-0d34-0410-b5e6-96231b3b80d8
We used to compile this LLVM function:
int %foo() {
  ret int cast (int** getelementptr (int** null, int 1) to int)
}
into:
foo:
mov %EAX, 0
lea %EAX, DWORD PTR [%EAX + 4]
ret
now we compile it into:
foo:
mov %EAX, 4
ret
This sequence is frequently generated by the MSIL front-end, and soon by the
malloc lowering pass and Java front-ends as well.
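For context, a rough C-level analogue of the constant expression being folded is the classic "size via a null pointer" idiom (my sketch; formally undefined in ISO C, but it is the shape of the constant expression such front-ends emit):

#include <stddef.h>

size_t pointer_size(void) {
  return (size_t)&((int **)0)[1];   /* folds to sizeof(int *), i.e. 4 on X86-32 */
}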
-Chris
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14834 91177308-0d34-0410-b5e6-96231b3b80d8
float as a truncation by going through memory. This truncation was being
skipped, which caused 175.vpr to fail after aggressive register promotion.
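For context, the source-level operation involved is just a narrowing FP cast; a minimal example (mine, not from the patch):

float truncate_to_float(double d) {
  float f = (float)d;   /* on X86 this rounds by storing through a 4-byte memory slot */
  return f;
}

The bug was in the backend's handling of that store, not in any particular source construct.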
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14473 91177308-0d34-0410-b5e6-96231b3b80d8
comparisons. In an 'isunordered' predicate, which looks like this at
the LLVM level:
%a = call bool %llvm.isnan(double %X)
%b = call bool %llvm.isnan(double %Y)
%COM = or bool %a, %b
We used to generate this code:
fxch %ST(1)
fucomip %ST(0), %ST(0)
setp %AL
fucomip %ST(0), %ST(0)
setp %AH
or %AL, %AH
With this patch, we generate this code:
fucomip %ST(0), %ST(1)
fstp %ST(0)
setp %AL
Which should make alkis happy. Tested as X86/compare_folding.llx:test1
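A C-level equivalent of the predicate being folded looks like this (sketch only; the commit operates on the llvm.isnan intrinsic form shown above):

int is_unordered(double x, double y) {
  return (x != x) || (y != y);   /* true iff either operand is NaN */
}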
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14148 91177308-0d34-0410-b5e6-96231b3b80d8
sized allocas in the entry block). Instead of generating code like this:
entry:
reg1024 = ESP+1234
... (much later)
*reg1024 = 17
Generate code that looks like this:
entry:
(no code generated)
... (much later)
t = ESP+1234
*t = 17
The advantage is that we DRAMATICALLY reduce the register pressure for these
silly temporaries (they were all being spilled to the stack, resulting in very
silly code). This is actually a manual implementation of rematerialization :)
I have a patch to fold the alloca address computation into loads & stores, which
will make this much better still, but just getting this right took way too much time
and I'm sleepy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13554 91177308-0d34-0410-b5e6-96231b3b80d8
compiling things like 'add long %X, 1'. The problem is that we were switching
the order of the operands for longs even though we can't fold them yet.
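A trivial example of the kind of input involved (illustrative; the function name is made up):

long long bump(long long x) {
  return x + 1;   /* 'add long %X, 1', lowered as two 32-bit ops on X86 */
}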
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13451 91177308-0d34-0410-b5e6-96231b3b80d8
In InsertFPRegKills(), just check the MachineBasicBlock for successors
instead of its corresponding BasicBlock.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13213 91177308-0d34-0410-b5e6-96231b3b80d8