instructions use. This doesn't change any functionality except that long
constant expressions of these operations will now magically start working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12840 91177308-0d34-0410-b5e6-96231b3b80d8
fld QWORD PTR [%ESP + 4]
fadd QWORD PTR [.CPItest_add_0]
instead of:
fld QWORD PTR [%ESP + 4]
fld QWORD PTR [.CPItest_add_0]
faddp %ST(1)
I also intend to do this for mul & div, but it appears that I have to
refactor a bit of code before I can do so.
This is tested by: test/Regression/CodeGen/X86/fp_constant_op.llx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12839 91177308-0d34-0410-b5e6-96231b3b80d8
1. If an incoming argument is dead, don't load it from the stack
2. Do not codegen noop copies at all (i.e., cast int -> uint), not even to
a move; see the sketch below. This should reduce register pressure for
allocators that are unable to coalesce away these copies in some cases.
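For illustration (a hypothetical function, not from an actual testcase), both
cases look something like this:
uint %example(int %Dead, int %X) {
%Y = cast int %X to uint    ; no-op copy: no move needs to be emitted
ret uint %Y                 ; %Dead is never used, so it is never loaded from the stack
}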
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12835 91177308-0d34-0410-b5e6-96231b3b80d8
InstSelectSimple.cpp:
Change the checks for proper I/O port address size into an exit() instead
of an assertion. Assertions aren't used in Release builds, and handling
this error should be graceful (not that this counts as graceful, but it's
more graceful).
Modified the generation of the IN/OUT instructions to have 0 arguments.
X86InstrInfo.td:
Added the OpSize attribute to the 16-bit IN and OUT instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12786 91177308-0d34-0410-b5e6-96231b3b80d8
I/O port instructions on x86. The specific code sequence is tailored to
the parameters and return value of the intrinsic call.
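For reference, the intrinsic calls involved look roughly like this (illustrative
only; the intrinsic names and prototypes here are assumptions, not quoted from
the patch):
%val = call ushort %llvm.readport(ushort %port)         ; read a 16-bit value from a port
call void %llvm.writeport(ushort %val, ushort %port)    ; write a 16-bit value to a port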
Added the ability for implicit definitions to be printed in the Instruction
Printer.
Added the ability for RawFrm instructions to print implicit uses and
definitions with correct comma output. This required adjusting some
methods so that a leading comma would or would not be printed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12782 91177308-0d34-0410-b5e6-96231b3b80d8
function prologues, and fix an off-by-one in visitCallInst that was
putting call args into the wrong registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12757 91177308-0d34-0410-b5e6-96231b3b80d8
have no good way of handling this until the code generator is improved.
We should probably just emit V9 instructions in the meantime.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12745 91177308-0d34-0410-b5e6-96231b3b80d8
Preliminary support for division. It's gross because you have to initialize
the "Y" register, which holds the top 32 bits of the dividend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12732 91177308-0d34-0410-b5e6-96231b3b80d8
Enable folding of long seteq/setne comparisons into branches and select instructions
Implement unfolded long relational comparisons against a constant a bit more efficiently
Folding comparisons changes code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
sete %CL
test %CL, %CL
je .LBB2 # PC rel: F
into code that looks like this:
mov %EAX, DWORD PTR [%ESP + 4]
mov %EDX, DWORD PTR [%ESP + 8]
mov %ECX, %EAX
or %ECX, %EDX
jne .LBB2 # PC rel: F
This speeds up 186.crafty by 6% with llc-ls.
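The source pattern being folded is roughly this (illustrative IR, not the
actual crafty code):
%c = seteq long %V, 0
br bool %c, label %T, label %F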
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12702 91177308-0d34-0410-b5e6-96231b3b80d8
of the words of the constant is zero. For example:
Y = and long X, 1234
now generates:
Yl = and Xl, 1234
Yh = 0
instead of:
Yl = and Xl, 1234
Yh = and Xh, 0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12685 91177308-0d34-0410-b5e6-96231b3b80d8
* In promote32, if we can just promote a constant value, do so instead of
promoting a constant dynamically.
* In visitReturnInst, actually USE the promote32 argument that takes a
Value*
The end result of this is that we now generate this:
test:
mov %EAX, 0
ret
instead of...
test:
mov %AX, 0
movzx %EAX, %AX
ret
for:
ushort %test() {
ret ushort 0
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12679 91177308-0d34-0410-b5e6-96231b3b80d8
have non-long indices for sequential types. In order to avoid trying to figure
out how the v9 backend works, we'll just hack it in the preselection pass.
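For example (an illustrative GEP, not from a testcase), an int index into an
array (a sequential type) is now legal:
%P = getelementptr [10 x int]* %A, long 0, int %i
and preselection is where such an index gets massaged into a form the V9
selector expects.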
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12647 91177308-0d34-0410-b5e6-96231b3b80d8
The methods for eliminating call-frame pseudo instrs and frame indices are still stubs.
Flesh out the emitPrologue method based on better ABI knowledge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12632 91177308-0d34-0410-b5e6-96231b3b80d8
Move lowerselect pass to come after preselection. Move machine
code construction and stack slots pass to come right before instruction
selection. This is to help fix perlbmk.
Update comments.
Make the sequence of passes in addPassesToJITCompile look more like
the sequence of passes in addPassesToEmitAssembly, including support
for -print-machineinstrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12614 91177308-0d34-0410-b5e6-96231b3b80d8
the X86 does not support a full set of fp cmove instructions, so we can't always
fold the condition into the select. :( Yuck.
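In IR terms the construct in question is a select on FP operands, e.g.
(illustrative):
%R = select bool %cond, double %A, double %B
which cannot always be lowered with a single FP conditional move on X86.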
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12577 91177308-0d34-0410-b5e6-96231b3b80d8
an incoming value from a block, the selector would evaluate the constant
at the TOP of the block instead of at the end of the block. This made the
live range for the constant span the entire block, increasing register
pressure needlessly.
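For example (illustrative), given a phi node like:
%P = phi int [ 7, %bb1 ], [ %X, %bb2 ]
the constant 7 is now materialized at the end of %bb1 rather than at its top,
so its live range no longer spans all of %bb1.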
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12542 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise, if you're in debugging mode, you get warnings for (apparently)
every immediate constant in the function during reg. allocation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12538 91177308-0d34-0410-b5e6-96231b3b80d8
folding load instructions into other instructions across free instruction
boundaries. Perhaps this will also fix the other strange failures?
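The hazard looks roughly like this (illustrative):
%V = load int* %P
free int* %P        ; folding the load into a later user would move the load past the free
%R = add int %V, 1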
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12494 91177308-0d34-0410-b5e6-96231b3b80d8
Make an explicit call to it from runOnFunction() if we know we're supposed to
write into the global. This is lame (esp. the const_cast), but it solves
the problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12291 91177308-0d34-0410-b5e6-96231b3b80d8
make the output more compact.
Divorce state-saving from the doFinalization method; for some reason it's not
getting called when I want it to, at Reoptimizer time. Put the guts in
PhyRegAlloc::finishSavingState(). Put an abort() in it so that I can be really
really sure that it's getting called.
Update comments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12286 91177308-0d34-0410-b5e6-96231b3b80d8
De-constify SaveStateToModule; we have to set both it and SaveRegAllocState
explicitly in the reoptimizer.
Make SaveRegAllocState an 'external location' option.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12278 91177308-0d34-0410-b5e6-96231b3b80d8
testcase like this:
int %test(int* %P, int %A) {
%Pv = load int* %P
%B = add int %A, %Pv
ret int %B
}
We now generate:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
add %EAX, DWORD PTR [%ECX]
ret
Instead of:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
add %EAX, %ECX
ret
... saving one instruction, and often a register. Note that there are a lot
of other instructions that could use this, but they aren't handled. I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12204 91177308-0d34-0410-b5e6-96231b3b80d8
been using the default target data layout object to lower malloc instructions,
causing us to allocate more memory than we needed! This could improve the
performance of the CBE-generated code substantially!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12088 91177308-0d34-0410-b5e6-96231b3b80d8
(16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
the interest of not breaking things any more than they already are, I'm going
to leave the constant alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12043 91177308-0d34-0410-b5e6-96231b3b80d8
of generating this code:
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
call Y
we now generate:
mov DWORD PTR [%ESP], 4
mov DWORD PTR [%ESP + 4], 123
call Y
Which hurts the eyes less. :)
Considering that register pressure around call sites is already high (with all
of the callee-clobbered registers and such), this may help a lot.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@12028 91177308-0d34-0410-b5e6-96231b3b80d8
1) For 8-bit registers try to use first the ones that are parts of the
same register (AL then AH). This way we only alias 2 16/32-bit
registers after allocating 4 8-bit variables.
2) Make EBX the last register to allocate. This will cause fewer
spills to happen since we will have 8-bit registers available up to
register exhaustion (assuming we use the allocation order). It
would be nice if we could push all of the 8-bit aliased registers
towards the end, but we much prefer to keep callee-saved registers at
the end to avoid saving them on entry and exit of the function.
For example, this gives a slight reduction in spills with linear scan
on 164.gzip.
Before:
11221 asm-printer - Number of machine instrs printed
975 spiller - Number of loads added
675 spiller - Number of stores added
398 spiller - Number of register spills
After:
11182 asm-printer - Number of machine instrs printed
952 spiller - Number of loads added
652 spiller - Number of stores added
386 spiller - Number of register spills
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11996 91177308-0d34-0410-b5e6-96231b3b80d8
their names more descriptive. A name consists of the base name, a
default operand size, and a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11995 91177308-0d34-0410-b5e6-96231b3b80d8
parse. The name is now I (operand size)*. For example:
Im32 -> instruction with 32-bit memory operands.
Im16i8 -> instruction with 16-bit memory operands and 8-bit immediate
operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11970 91177308-0d34-0410-b5e6-96231b3b80d8
the size of the immediate and the memory operand on instructions that
use them. This resolves problems with instructions that take both a
memory and an immediate operand whose sizes differ (e.g., ADDmi32b).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11967 91177308-0d34-0410-b5e6-96231b3b80d8
an 8-bit immediate. So mark the shifts that take immediates as taking
an 8-bit argument. The rest with the implicit use of CL are marked
appropriately.
A bug still exists:
def SHLDmri32 : I2A8 <"shld", 0xA4, MRMDestMem>, TB; // [mem32] <<= [mem32],R32 imm8
The immediate in the above instruction is 8-bit but the memory
reference is 32-bit. The printer prints this as an 8-bit reference
which confuses the assembler. Same with SHRDmri32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11931 91177308-0d34-0410-b5e6-96231b3b80d8
Functions with linkonce linkage are declared with weak linkage.
Global floating point constants used to represent unprintable values
(such as NaN and infinity) are declared static so that they don't interfere
with other CBE-generated translation units.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11884 91177308-0d34-0410-b5e6-96231b3b80d8
scaled indexes. This allows us to compile GEP's like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
%Idx = cast int %Idx to long
%X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
ret int* %X
}
Into a single address computation:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
ret
Before it generated:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
shl %ECX, 3
add %EAX, %ECX
lea %EAX, DWORD PTR [%EAX + 4]
ret
This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads and stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On
bzip2 for example, we go from this:
10665 asm-printer - Number of machine instrs printed
40 ra-local - Number of loads/stores folded into instructions
1708 ra-local - Number of loads added
1532 ra-local - Number of stores added
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
2794 x86-peephole - Number of peephole optimization performed
to this:
9873 asm-printer - Number of machine instrs printed
41 ra-local - Number of loads/stores folded into instructions
1710 ra-local - Number of loads added
1521 ra-local - Number of stores added
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer - Number of machine instrs printed
2389 liveintervals - Number of identity moves eliminated after coalescing
2288 liveintervals - Number of interval joins performed
3522 liveintervals - Number of intervals after coalescing
5810 liveintervals - Number of original intervals
700 spiller - Number of loads added
487 spiller - Number of stores added
303 spiller - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
363 x86-peephole - Number of peephole optimization performed
to:
7982 asm-printer - Number of machine instrs printed
1759 liveintervals - Number of identity moves eliminated after coalescing
1658 liveintervals - Number of interval joins performed
3282 liveintervals - Number of intervals after coalescing
4940 liveintervals - Number of original intervals
635 spiller - Number of loads added
452 spiller - Number of stores added
288 spiller - Number of register spills
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
258 x86-peephole - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11820 91177308-0d34-0410-b5e6-96231b3b80d8
to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
double re, im;
complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %ECX
fld QWORD PTR [%EDX]
fld1
faddp %ST(1)
*** add %ECX, 8
fld QWORD PTR [%ECX]
fldz
faddp %ST(1)
*** mov %ECX, %EAX
fxch %ST(1)
fstp QWORD PTR [%ECX]
*** add %EAX, 8
fstp QWORD PTR [%EAX]
ret
Now it is compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
fld QWORD PTR [%ECX]
fld1
faddp %ST(1)
fld QWORD PTR [%ECX + 8]
fldz
faddp %ST(1)
fxch %ST(1)
fstp QWORD PTR [%EAX]
fstp QWORD PTR [%EAX + 8]
ret
Other programs should see similar improvements, across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11819 91177308-0d34-0410-b5e6-96231b3b80d8
into a single LEA instruction. This should improve the code generated for
things like X->A.B.C[12].D.
The bigger benefit is still coming though. Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom. We should probably
never generate ADDri32's.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11817 91177308-0d34-0410-b5e6-96231b3b80d8
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11803 91177308-0d34-0410-b5e6-96231b3b80d8
block into MachineBasicBlock::getFirstTerminator().
This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions were added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11748 91177308-0d34-0410-b5e6-96231b3b80d8
use FP instructions. This reduces the number of instructions inserted in
176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which
is typical). This reduction speeds up the entire code generator. In the
case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that
sped up the most are the register allocator and the 2 live variable analysis
passes, which sped up by 2.3s, 1.3s, and 1.5s respectively. The asmprinter
pass also sped up because it doesn't print the instructions in comments :)
Note that this patch is likely to expose latent bugs in machine code passes,
because basic blocks can now be empty, where they never were before. I
cleaned out regalloclocal, but who knows about linscan :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11717 91177308-0d34-0410-b5e6-96231b3b80d8
switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11715 91177308-0d34-0410-b5e6-96231b3b80d8
(minor) benefits right now:
1. An extra dummy MOVrr32 is gone. This move would often be coalesced by
both allocators anyway.
2. The code now uses the gep_type_iterator to walk the gep, which should
future-proof it a bit. It still assumes that array indices are longs, though.
These don't really justify rewriting the code. The big benefit will come later
though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11710 91177308-0d34-0410-b5e6-96231b3b80d8
value is a physreg and one is a virtreg. For this reason, disable copy folding
entirely for physregs. Also, use the new isMoveInstr target hook which gives us
folding of FP moves as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11700 91177308-0d34-0410-b5e6-96231b3b80d8