instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12840 91177308-0d34-0410-b5e6-96231b3b80d8
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12839 91177308-0d34-0410-b5e6-96231b3b80d8
1. If an incoming argument is dead, don't load it from the stack
2. Do not code gen noop copies at all (ie, cast int -> uint), not even to
a move. This should reduce register pressure for allocators that are
unable to coallesce away these copies in some cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12835 91177308-0d34-0410-b5e6-96231b3b80d8
InstSelectSimple.cpp:
Change the checks for proper I/O port address size into an exit() instead
of an assertion. Assertions aren't used in Release builds, and handling
this error should be graceful (not that this counts as graceful, but it's
more graceful).
Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
Added the OpSize attribute to the 16 bit IN and OUT instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12786 91177308-0d34-0410-b5e6-96231b3b80d8
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
Added the ability for implicit defintions to be printed in the Instruction
Printer.
Added the ability for RawFrm instruction to print implict uses and
defintions with correct comma output. This required adjustment to some
methods so that a leading comma would or would not be printed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12782 91177308-0d34-0410-b5e6-96231b3b80d8
Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against a constants a bit more efficiently
Folding comparisons changes code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
into code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
jne .LBB2 # PC rel: F
This speeds up 186.crafty by 6% with llc-ls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12702 91177308-0d34-0410-b5e6-96231b3b80d8
of the words of the constant is zeros. For example:
Y = and long X, 1234
now generates:
Yl = and Xl, 1234
Yh = 0
instead of:
Yl = and Xl, 1234
Yh = and Xh, 0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12685 91177308-0d34-0410-b5e6-96231b3b80d8
* In promote32, if we can just promote a constant value, do so instead of
promoting a constant dynamically.
* In visitReturn inst, actually USE the promote32 argument that takes a
Value*
The end result of this is that we now generate this:
test:
mov %EAX, 0
ret
instead of...
test:
mov %AX, 0
movzx %EAX, %AX
ret
for:
ushort %test() {
ret ushort 0
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12679 91177308-0d34-0410-b5e6-96231b3b80d8
the X86 does not support a full set of fp cmove instructions, so we can't always
fold the condition into the select. :( Yuck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12577 91177308-0d34-0410-b5e6-96231b3b80d8
an incoming value from a block, the selector would evaluate the constant
at the TOP of the block instead of at the end of the block. This made the
live range for the constant span the entire block, increasing register
pressure needlessly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12542 91177308-0d34-0410-b5e6-96231b3b80d8
folding load instructions into other instructions across free instruction
boundaries. Perhaps this will also fix the other strange failures?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12494 91177308-0d34-0410-b5e6-96231b3b80d8
testcase like this:
int %test(int* %P, int %A) {
%Pv = load int* %P
%B = add int %A, %Pv
ret int %B
}
We now generate:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
add %EAX, DWORD PTR [%ECX]
ret
Instead of:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
add %EAX, %ECX
ret
... saving one instruction, and often a register. Note that there are a lot
of other instructions that could use this, but they aren't handled. I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12204 91177308-0d34-0410-b5e6-96231b3b80d8
of generating this code:
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
call Y
we now generate:
mov DWORD PTR [%ESP], 4
mov DWORD PTR [%ESP + 4], 123
call Y
Which hurts the eyes less. :)
Considering that register pressure around call sites is already high (with all
of the callee clobber registers n stuff), this may help a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12028 91177308-0d34-0410-b5e6-96231b3b80d8
their names more decriptive. A name consists of the base name, a
default operand size followed by a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11995 91177308-0d34-0410-b5e6-96231b3b80d8
scaled indexes. This allows us to compile GEP's like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
%Idx = cast int %Idx to long
%X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
ret int* %X
}
Into a single address computation:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
ret
Before it generated:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
shl %ECX, 3
add %EAX, %ECX
lea %EAX, DWORD PTR [%EAX + 4]
ret
This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On
bzip2 for example, we go from this:
10665 asm-printer - Number of machine instrs printed
40 ra-local - Number of loads/stores folded into instructions
1708 ra-local - Number of loads added
1532 ra-local - Number of stores added
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
2794 x86-peephole - Number of peephole optimization performed
to this:
9873 asm-printer - Number of machine instrs printed
41 ra-local - Number of loads/stores folded into instructions
1710 ra-local - Number of loads added
1521 ra-local - Number of stores added
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer - Number of machine instrs printed
2389 liveintervals - Number of identity moves eliminated after coalescing
2288 liveintervals - Number of interval joins performed
3522 liveintervals - Number of intervals after coalescing
5810 liveintervals - Number of original intervals
700 spiller - Number of loads added
487 spiller - Number of stores added
303 spiller - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
363 x86-peephole - Number of peephole optimization performed
to:
7982 asm-printer - Number of machine instrs printed
1759 liveintervals - Number of identity moves eliminated after coalescing
1658 liveintervals - Number of interval joins performed
3282 liveintervals - Number of intervals after coalescing
4940 liveintervals - Number of original intervals
635 spiller - Number of loads added
452 spiller - Number of stores added
288 spiller - Number of register spills
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
258 x86-peephole - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11820 91177308-0d34-0410-b5e6-96231b3b80d8
to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
double re, im;
complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %ECX
fld QWORD PTR [%EDX]
fld1
faddp %ST(1)
*** add %ECX, 8
fld QWORD PTR [%ECX]
fldz
faddp %ST(1)
*** mov %ECX, %EAX
fxch %ST(1)
fstp QWORD PTR [%ECX]
*** add %EAX, 8
fstp QWORD PTR [%EAX]
ret
Now it is compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
fld QWORD PTR [%ECX]
fld1
faddp %ST(1)
fld QWORD PTR [%ECX + 8]
fldz
faddp %ST(1)
fxch %ST(1)
fstp QWORD PTR [%EAX]
fstp QWORD PTR [%EAX + 8]
ret
Other programs should see similar improvements, across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11819 91177308-0d34-0410-b5e6-96231b3b80d8