comparisons. In an 'isunordered' predicate, which looks like this at
the LLVM level:
%a = call bool %llvm.isnan(double %X)
%b = call bool %llvm.isnan(double %Y)
%COM = or bool %a, %b
We used to generate this code:
fxch %ST(1)
fucomip %ST(0), %ST(0)
setp %AL
fucomip %ST(0), %ST(0)
setp %AH
or %AL, %AH
With this patch, we generate this code:
fucomip %ST(0), %ST(1)
fstp %ST(0)
setp %AL
Which should make alkis happy. Tested as X86/compare_folding.llx:test1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14148 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the code much simpler, and the two cases really do belong apart.
Once we do it, it's pretty obvious how flawed the logic was for A != A case,
so I fixed it (fixing PR369).
This also uses freeStackSlotAfter instead of inserting an fxchg then
popStackAfter'ing in the case where there is a dead result (unlikely, but
possible), producing better code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14139 91177308-0d34-0410-b5e6-96231b3b80d8
Get rid of separate numbering for LLVM BasicBlocks; use the automatically
generated MachineBasicBlock numbering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13567 91177308-0d34-0410-b5e6-96231b3b80d8
sized allocas in the entry block). Instead of generating code like this:
entry:
reg1024 = ESP+1234
... (much later)
*reg1024 = 17
Generate code that looks like this:
entry:
(no code generated)
... (much later)
t = ESP+1234
*t = 17
The advantage being that we DRAMATICALLY reduce the register pressure for these
silly temporaries (they were all being spilled to the stack, resulting in very
silly code). This is actually a manual implementation of rematerialization :)
I have a patch to fold the alloca address computation into loads & stores, which
will make this much better still, but just getting this right took way too much time
and I'm sleepy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13554 91177308-0d34-0410-b5e6-96231b3b80d8
compiling things like 'add long %X, 1'. The problem is that we were switching
the order of the operands for longs even though we can't fold them yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13451 91177308-0d34-0410-b5e6-96231b3b80d8
In InsertFPRegKills(), just check the MachineBasicBlock for successors
instead of its corresponding BasicBlock.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13213 91177308-0d34-0410-b5e6-96231b3b80d8
The iterator is pointing at the next instruction which should not disappear
when doing the load/store replacement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12954 91177308-0d34-0410-b5e6-96231b3b80d8
even when the "optimization" I added before is turned off. It generates this
extremely pointless code:
test:
fld QWORD PTR [%ESP + 4]
mov %AL, 0
test %AL, %AL
fcmove %ST(0), %ST(0)
ret
Good thing the optimizer will have removed this before code generation
anyway. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12939 91177308-0d34-0410-b5e6-96231b3b80d8
Fix several bugs in the intrinsics:
1. Make sure to copy the input registers before the instructions that use them
2. Make sure to copy the value returned by 'in' out of EAX into the register
it is supposed to be in.
This fixes assertions when using in/out and linear scan.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12896 91177308-0d34-0410-b5e6-96231b3b80d8
of the fucom[p][p] instructions. This allows us to code generate this function
bool %test(double %X, double %Y) {
%C = setlt double %Y, %X
ret bool %C
}
... into:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucomip %ST(1)
fstp %ST(0)
setb %AL
movsx %EAX, %AL
ret
where before we generated:
test:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [%ESP + 12]
fucompp
** fnstsw
** sahf
setb %AL
movsx %EAX, %AL
ret
The two marked instructions (which are the ones eliminated) are very bad,
because they serialize execution of the processor. These instructions are
available on the PPRO and later, but since we already use cmov's we aren't
losing any portability.
I retained the old code for the day when we decide we want to support back
to the 386.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12852 91177308-0d34-0410-b5e6-96231b3b80d8
If the source of the cast is a load, we can just use the source memory location,
without having to create a temporary stack slot entry.
Before we code generated this:
double %int(int* %P) {
%V = load int* %P
%V2 = cast int %V to double
ret double %V2
}
into:
int:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
add %ESP, 4
ret
Now we produce this:
int:
mov %EAX, DWORD PTR [%ESP + 4]
fild DWORD PTR [%EAX]
ret
... which is nicer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12846 91177308-0d34-0410-b5e6-96231b3b80d8
for mul and div.
Instead of generating this:
test_divr:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_divr_0]
fdivrp %ST(1)
ret
We now generate this:
test_divr:
fld QWORD PTR [%ESP + 4]
fdivr QWORD PTR [.CPItest_divr_0]
ret
This code desperately needs refactoring, which will come in the next
patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12841 91177308-0d34-0410-b5e6-96231b3b80d8
instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12840 91177308-0d34-0410-b5e6-96231b3b80d8
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12839 91177308-0d34-0410-b5e6-96231b3b80d8
1. If an incoming argument is dead, don't load it from the stack
2. Do not code gen noop copies at all (ie, cast int -> uint), not even to
a move. This should reduce register pressure for allocators that are
unable to coallesce away these copies in some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12835 91177308-0d34-0410-b5e6-96231b3b80d8
InstSelectSimple.cpp:
Change the checks for proper I/O port address size into an exit() instead
of an assertion. Assertions aren't used in Release builds, and handling
this error should be graceful (not that this counts as graceful, but it's
more graceful).
Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
Added the OpSize attribute to the 16 bit IN and OUT instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12786 91177308-0d34-0410-b5e6-96231b3b80d8
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit defintions to be printed in the Instruction
Printer.
Added the ability for RawFrm instruction to print implict uses and
defintions with correct comma output. This required adjustment to some
methods so that a leading comma would or would not be printed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12782 91177308-0d34-0410-b5e6-96231b3b80d8
Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against a constants a bit more efficiently
Folding comparisons changes code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
into code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
jne .LBB2 # PC rel: F
This speeds up 186.crafty by 6% with llc-ls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12702 91177308-0d34-0410-b5e6-96231b3b80d8
of the words of the constant is zeros. For example:
Y = and long X, 1234
now generates:
Yl = and Xl, 1234
Yh = 0
instead of:
Yl = and Xl, 1234
Yh = and Xh, 0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12685 91177308-0d34-0410-b5e6-96231b3b80d8
* In promote32, if we can just promote a constant value, do so instead of
promoting a constant dynamically.
* In visitReturn inst, actually USE the promote32 argument that takes a
Value*
The end result of this is that we now generate this:
test:
mov %EAX, 0
ret
instead of...
test:
mov %AX, 0
movzx %EAX, %AX
ret
for:
ushort %test() {
ret ushort 0
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12679 91177308-0d34-0410-b5e6-96231b3b80d8
the X86 does not support a full set of fp cmove instructions, so we can't always
fold the condition into the select. :( Yuck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12577 91177308-0d34-0410-b5e6-96231b3b80d8
an incoming value from a block, the selector would evaluate the constant
at the TOP of the block instead of at the end of the block. This made the
live range for the constant span the entire block, increasing register
pressure needlessly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12542 91177308-0d34-0410-b5e6-96231b3b80d8
folding load instructions into other instructions across free instruction
boundaries. Perhaps this will also fix the other strange failures?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12494 91177308-0d34-0410-b5e6-96231b3b80d8
testcase like this:
int %test(int* %P, int %A) {
%Pv = load int* %P
%B = add int %A, %Pv
ret int %B
}
We now generate:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
add %EAX, DWORD PTR [%ECX]
ret
Instead of:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
add %EAX, %ECX
ret
... saving one instruction, and often a register. Note that there are a lot
of other instructions that could use this, but they aren't handled. I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12204 91177308-0d34-0410-b5e6-96231b3b80d8
(16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
the interest of not breaking things any more than they already are, I'm going
to leave the constant alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12043 91177308-0d34-0410-b5e6-96231b3b80d8
of generating this code:
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
call Y
we now generate:
mov DWORD PTR [%ESP], 4
mov DWORD PTR [%ESP + 4], 123
call Y
Which hurts the eyes less. :)
Considering that register pressure around call sites is already high (with all
of the callee clobber registers n stuff), this may help a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12028 91177308-0d34-0410-b5e6-96231b3b80d8
1) For 8-bit registers try to use first the ones that are parts of the
same register (AL then AH). This way we only alias 2 16/32-bit
registers after allocating 4 8-bit variables.
2) Move EBX as the last register to allocate. This will cause less
spills to happen since we will have 8-bit registers available up to
register excaustion (assuming we use the allocation order). It
would be nice if we could push all of the 8-bit aliased registers
towards the end but we much prefer to keep callee saved register to
the end to avoid saving them on entry and exit of the function.
For example this gives a slight reduction of spills with linear scan
on 164.gzip.
Before:
11221 asm-printer - Number of machine instrs printed
975 spiller - Number of loads added
675 spiller - Number of stores added
398 spiller - Number of register spills
After:
11182 asm-printer - Number of machine instrs printed
952 spiller - Number of loads added
652 spiller - Number of stores added
386 spiller - Number of register spills
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11996 91177308-0d34-0410-b5e6-96231b3b80d8
their names more decriptive. A name consists of the base name, a
default operand size followed by a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11995 91177308-0d34-0410-b5e6-96231b3b80d8
parse. The name is now I (operand size)*. For example:
Im32 -> instruction with 32-bit memory operands.
Im16i8 -> instruction with 16-bit memory operands and 8 bit immediate
operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11970 91177308-0d34-0410-b5e6-96231b3b80d8
the size of the immediate and the memory operand on instructions that
use them. This resolves problems with instructions that take both a
memory and an immediate operand but their sizes differ (i.e. ADDmi32b).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11967 91177308-0d34-0410-b5e6-96231b3b80d8