1. If we are two-addressing a commutable instruction and the LHS is not the
last use of the variable, see if the instruction is the last use of the
RHS. If so, commute the instruction, allowing us to avoid a
register-register copy in many cases for common instructions like ADD, OR,
AND, etc on X86.
2. If #1 doesn't hold, and if this is an instruction that also existing in
3-address form, promote the instruction to a 3-address instruction to
avoid the register-register copy. We can do this for several common
instructions in X86, including ADDrr, INC, DEC, etc.
This patch implements test/Regression/CodeGen/X86/commute-two-addr.ll,
overlap-add.ll, and overlap-shift.ll when I check in the X86 support for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19245 91177308-0d34-0410-b5e6-96231b3b80d8
Make only one print method to avoid overloaded virtual warnings when \
compiled with -Woverloaded-virtual
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@18589 91177308-0d34-0410-b5e6-96231b3b80d8
* Do not put fixed registers into the unhandled set. This means they will
never find their way into the inactive, active, or handled sets, so we
can simplify a bunch of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17945 91177308-0d34-0410-b5e6-96231b3b80d8
the iterator hints we have to speed up overlaps(). This speeds linscan up
by about .2s (out of 8.7) on 175.vpr for PPC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17935 91177308-0d34-0410-b5e6-96231b3b80d8
* Eliminate the releaseMemory method, this is not an analysis
* Change the fixed, active, and inactive lists of intervals to maintain an
iterator for the current position in the interval. This allows us to do
constant time increments of the iterator instead of having to do a binary
search to find our liverange in our liveinterval all of the time, which
substantially speeds up cases where LiveIntervals have many LiveRanges
- which is very common for physical registers. On targets with many
physregs, this can make a noticable difference.
With a release build of LLC for PPC, this halves the time in
processInactiveIntervals and processActiveIntervals, from 1.5s to .75s.
This also lays the ground for more to come.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17933 91177308-0d34-0410-b5e6-96231b3b80d8
useful when you have a reference like:
int A[100];
void foo() { A[10] = 1; }
In this case, &A[10] is a single constant and should be treated as such.
Only MO_GlobalAddress and MO_ExternalSymbol are allowed to use this field, no
other operand type is.
This is another fine patch contributed by Jeff Cohen!!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17007 91177308-0d34-0410-b5e6-96231b3b80d8
The problem occurred when trying to reload this instruction:
MOV32mr %reg2326, 8, %reg2297, 4, %reg2295
The value of reg2326 was available in EBX, so it was reused from there, instead
of reloading it into EDX.
The value of reg2297 was available in EDX, so it was reused from there, instead
of reloading it into EDI.
The value of reg2295 was not available, so we tried reloading it into EBX, its
assigned register. However, we checked and saw that we already reloaded
something into EBX, so we chose what reg2326 was assigned to (EDX) and reloaded
into that register instead.
Unfortunately EDX had already been used by reg2297, so reloading into EDX
clobbered the value used by the reg2326 operand, breaking the program.
The fix for this is to check that the newly picked register is ok. In this
case we now find that EDX is already used and try using EDI, which succeeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17006 91177308-0d34-0410-b5e6-96231b3b80d8
it was a use, def, or both. This allows us to be less pessimistic in our
analysis of them. In practice, this doesn't make a big difference, but it
doesn't hurt either.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16632 91177308-0d34-0410-b5e6-96231b3b80d8
and delete them if they turn out to be dead. This is a useful little hack
that even speeds up some programs. For example, it speeds up Ptrdist/ks
from 17.53s to 15.59s, and 188.ammp from 149s to 146s.
This also speeds up llc :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16630 91177308-0d34-0410-b5e6-96231b3b80d8
generated code over the simple spiller. The new local spiller generates
substantially better code than the simple one in some cases, by reusing
values that are loaded out of stack slots and kept available in registers.
This primarily helps programs that are spilling a lot, and there is still
stuff that can be done to improve it. This patch makes the local spiller
the default, as it's only a tiny bit slower than the simple spiller (it
increases the runtime of llc by < 1%).
Here are some numbers with speedups.
Program #reuse old(s) new(s) Speedup
Povray: 3452, 16.87 -> 15.93 (5.5%)
177.mesa: 2176, 2.77 -> 2.76 (0%)
179.art: 35, 28.43 -> 28.01 (1.5%)
183.equake: 55, 61.44 -> 61.41 (0%)
188.ammp: 869, 174 -> 149 (15%)
164.gzip: 43, 40.73 -> 40.71 (0%)
175.vpr: 351, 18.54 -> 17.34 (6.5%)
176.gcc: 2471, 5.01 -> 4.92 (1.8%)
181.mcf 42, 79.30 -> 75.20 (5.2%)
186.crafty: 484, 29.73 -> 30.04 (-1%)
197.parser: 251, 10.47 -> 10.67 (-1%)
252.eon: 1501, 1.98 -> 1.75 (12%)
253.perlbm: 1183, 14.83 -> 14.42 (2.8%)
254.gap: 825, 7.46 -> 7.29 (2.3%)
255.vortex: 285, 10.51 -> 10.27 (2.3%)
256.bzip2: 63, 55.70 -> 55.20 (0.9%)
300.twolf: 830, 21.63 -> 22.00 (-1%)
PtrDist/ks 14, 32.75 -> 17.53 (46.5%)
Olden/tsp 46, 8.71 -> 8.24 (5.4%)
Free/distray 70, 1.09 -> 0.99 (9.2%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16629 91177308-0d34-0410-b5e6-96231b3b80d8
* Add const_iterator stuff
* Add a print method, which means that I can now call dump() from the
debugger.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16612 91177308-0d34-0410-b5e6-96231b3b80d8
two spillers produce perfectly identical code (at least on povray and eon),
but the simple spiller is substantially faster than the local spiller. Once
the local spiller is improved, we can switch back.
Switching cuts 5.2% off of the llc time for povray (about 1.3s).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16608 91177308-0d34-0410-b5e6-96231b3b80d8
use a simple vector. This speeds up -spiller=simple from taking 22s to taking
.1s on povray (debug build). This change does not modify the generated code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16607 91177308-0d34-0410-b5e6-96231b3b80d8
data structures). Fix the print method to send to the right ostream, not
always cerr. Delete typedefs that are only used once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16606 91177308-0d34-0410-b5e6-96231b3b80d8
Move include/Config and include/Support into include/llvm/Config,
include/llvm/ADT and include/llvm/Support. From here on out, all LLVM
public header files must be under include/llvm/.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16137 91177308-0d34-0410-b5e6-96231b3b80d8
lists. Instead of scanning the vector backwards, scan it forward and
swap each element we want to erase. Then at the end erase all removed
intervals at once. This doesn't save much: 0.08s out of 4s when
compiling 176.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16136 91177308-0d34-0410-b5e6-96231b3b80d8
Regression.CodeGen.Generic.2004-04-09-SameValueCoalescing.llx and the
code size problem.
This bug prevented us from doing most register coallesces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16031 91177308-0d34-0410-b5e6-96231b3b80d8
with emitting .xwords when not on an 8-byte boundary (.xword 0 is not the
same as 8 .byte 0's). Because we do not know when or when we are not aligned,
just emit bytes like the old V9 asmprinter did.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16006 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for targets that must spill certain physregs at certain locations.
Patch contributed by Nate Begeman, slightly hacked by me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15701 91177308-0d34-0410-b5e6-96231b3b80d8
MachineBasicBlock* as a parameter so that nxext() and prior() helper
functions can work naturally on it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15376 91177308-0d34-0410-b5e6-96231b3b80d8
These side-effects seem to make a difference when using llc -march=sparcv9
in Release mode (i.e., with -DNDEBUG); when they are left out, lots of
instructions just get dropped on the floor, because they never end up
in the schedule.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15339 91177308-0d34-0410-b5e6-96231b3b80d8
aggressively coallesce live ranges even if they overlap. Consider this LLVM
code for example:
int %test(int %X) {
%Y = mul int %X, 1 ;; Codegens to Y = X
%Z = add int %X, %Y
ret int %Z
}
The mul is just there to get a copy into the code stream. This produces
this machine code:
(0x869e5a8, LLVM BB @0x869b9a0):
%reg1024 = mov <fi#-2>, 1, %NOREG, 0 ;; "X"
%reg1025 = mov %reg1024 ;; "Y" (subsumed by X)
%reg1026 = add %reg1024, %reg1025
%EAX = mov %reg1026
ret
Note that the life times of reg1024 and reg1025 overlap, even though they
contain the same value. This results in this machine code:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
add %EAX, %ECX
ret
Another, worse case involves loops and PHI nodes. Consider this trivial loop:
testcase:
int %test2(int %X) {
entry:
br label %Loop
Loop:
%Y = phi int [%X, %entry], [%Z, %Loop]
%Z = add int %Y, 1
%cond = seteq int %Z, 100
br bool %cond, label %Out, label %Loop
Out:
ret int %Z
}
Because of interactions between the PHI elimination pass and the register
allocator, this got compiled to this code:
test2:
mov %ECX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
*** mov %EAX, %ECX
inc %EAX
cmp %EAX, 100
*** mov %ECX, %EAX
jne .LBBtest2_1
ret
Or on powerpc, this code:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r2, r3, 1
cmpwi cr0, r2, 100
*** or r3, r2, r2
bne cr0, .LBB_test2_1
*** or r3, r2, r2
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
With this improvement in place, we now generate this code for these two
testcases, which is what we want:
test:
mov %EAX, DWORD PTR [%ESP + 4]
add %EAX, %EAX
ret
test2:
mov %EAX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
inc %EAX
cmp %EAX, 100
jne .LBBtest2_1 # Loop
ret
Or on PPC:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r3, r3, 1
cmpwi cr0, r3, 100
bne cr0, .LBB_test2_1
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
Static numbers for spill code loads/stores/reg-reg copies (smaller is better):
em3d: before: 47/25/26 after: 44/22/24
164.gzip: before: 433/245/310 after: 403/231/278
175.vpr: before: 3721/2189/1581 after: 4144/2081/1423
176.gcc: before: 26195/8866/9235 after: 25942/8082/8275
186.crafty: before: 4295/2587/3079 after: 4119/2519/2916
252.eon: before: 12754/7585/5803 after: 12508/7425/5643
256.bzip2: before: 463/226/315 after: 482:241/309
Runtime perf number samples on X86:
gzip: before: 41.09 after: 39.86
bzip2: runtime: before: 56.71s after: 57.07s
gcc: before: 6.16 after: 6.12
eon: before: 2.03s after: 2.00s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15194 91177308-0d34-0410-b5e6-96231b3b80d8
same as the PHI use. This is not correct as the PHI use value is different
depending on which branch is taken. This fixes espresso with aggressive
coallescing, and perhaps others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15189 91177308-0d34-0410-b5e6-96231b3b80d8
LiveInterval>. This saves some space and removes the pointer
indirection caused by following the pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15167 91177308-0d34-0410-b5e6-96231b3b80d8
us back to taking about 10.5s on gcc, instead of taking 15.6s! The net result
is that my big patches have hand no significant effect on compile time or code
quality. heh.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15156 91177308-0d34-0410-b5e6-96231b3b80d8
Interval. This generalizes the isDefinedOnce mechanism that we used before
to help us coallesce ranges that overlap. As part of this, every logical
range with a different value is assigned a different number in the interval.
For example, for code that looks like this:
0 X = ...
4 X += ...
...
N = X
We now generate a live interval that contains two ranges: [2,6:0),[6,?:1)
reflecting the fact that there are two different values in the range at
different positions in the code.
Currently we are not using this information at all, so this just slows down
liveintervals. In the future, this will change.
Note that this change also substantially refactors the joinIntervalsInMachineBB
method to merge the cases for virt-virt and phys-virt joining into a single
case, adds comments, and makes the code a bit easier to follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15154 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix comment typeo
* add dump() methods
* add a few new methods like getLiveRangeContaining, removeRange & joinable
(which is currently the same as overlaps)
* Remove the unused operator==
Bigger change:
* In LiveInterval, instead of using a boolean isDefinedOnce to keep track of
if there are > 1 definitions in a particular interval, keep a counter,
NumValues to keep track of exactly how many there are.
* In LiveRange, add a new ValId element to indicate which of the numbered
values each LiveRange belongs to. We now no longer merge LiveRanges if
they are of differing value ID's even if they are neighbors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15152 91177308-0d34-0410-b5e6-96231b3b80d8
* Inline some functions
* Eliminate some comparisons from the release build
This is good for another .3 on gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15144 91177308-0d34-0410-b5e6-96231b3b80d8
want to insert a new range into the middle of the vector, then delete ranges
one at a time next to the inserted one as they are merged.
Instead, if the inserted interval overlaps, just start merging. The only time
we insert into the middle of the vector is when we don't overlap at all. Also
delete blocks of live ranges if we overlap with many of them.
This patch speeds up joining by .7 seconds on a large testcase, but more
importantly gets all of the range adding code into addRangeFrom.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15141 91177308-0d34-0410-b5e6-96231b3b80d8
will soon be renamed) into their own file. The new file should not emit
DEBUG output or have other side effects. The LiveInterval class also now
doesn't know whether its working on registers or some other thing.
In the future we will want to use the LiveInterval class and friends to do
stack packing. In addition to a code simplification, this will allow us to
do it more easily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15134 91177308-0d34-0410-b5e6-96231b3b80d8
Use an explicit LiveRange class to represent ranges instead of an std::pair.
This is a minor cleanup, but is really intended to make a future patch simpler
and less invasive.
Alkis, could you please take a look at LiveInterval::liveAt? I suspect that
you can add an operator<(unsigned) to LiveRange, allowing us to speed up the
upper_bound call by quite a bit (this would also apply to other callers of
upper/lower_bound). I would do it myself, but I still don't understand that
crazy liveAt function, despite the comment. :)
Basically I would like to see this:
LiveRange dummy(index, index+1);
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
dummy);
Turn into:
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
index);
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15130 91177308-0d34-0410-b5e6-96231b3b80d8
interfere. Because these intervals have a single definition, and one of them
is a copy instruction, they are always safe to merge even if their lifetimes
interfere. This slightly reduces the amount of spill code, for example on
252.eon, from:
12837 spiller - Number of loads added
7604 spiller - Number of stores added
5842 spiller - Number of register spills
18155 liveintervals - Number of identity moves eliminated after coalescing
to:
12754 spiller - Number of loads added
7585 spiller - Number of stores added
5803 spiller - Number of register spills
18262 liveintervals - Number of identity moves eliminated after coalescing
The much much bigger win would be to merge intervals with multiple definitions
(aka phi nodes) but this is not that day.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15124 91177308-0d34-0410-b5e6-96231b3b80d8
intervals need not be sorted anymore. Removing this redundant step
improves LiveIntervals running time by 5% on 176.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15106 91177308-0d34-0410-b5e6-96231b3b80d8
compilation of gcc:
* Use vectors instead of lists for the intervals sets
* Use a heap for the unhandled set to keep intervals always sorted and
makes insertions back to the heap very fast (compared to scanning a
list)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15103 91177308-0d34-0410-b5e6-96231b3b80d8
fortunately, they are easy to handle if we know about them. This patch fixes
some serious pessimization of code produced by the linscan register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15092 91177308-0d34-0410-b5e6-96231b3b80d8
is a simple change, but seems to improve code a little. For example, on
256.bzip2, we went from 75.0s -> 73.33s (2% speedup).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15004 91177308-0d34-0410-b5e6-96231b3b80d8
* vreg <-> vreg joining now works, enable it unconditionally when joining
is enabled (which is the default).
* Fix a serious pessimization of spill code where we were saying that a
spilled DEF operand was live into the subsequent instruction. This allows
for substantially better code when spilling starts to happen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14993 91177308-0d34-0410-b5e6-96231b3b80d8
order, causing the inactive list in the linearscan list to get unsorted, which
basically fuxored everything up severely.
These seems to fix the joiner, so with more testing I will enable it by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14992 91177308-0d34-0410-b5e6-96231b3b80d8
Heavily refactor handleVirtualRegisterDef, adding comments and making it more
efficient. It is also much easier to follow and convince ones self that it is
correct :)
Add -debug output to the joine, showing the result of joining the intervals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14989 91177308-0d34-0410-b5e6-96231b3b80d8
basic block clear()'s all of the operands lists, including phis. This
caused removePredecessor to get confused later. Because of this, we just
nuke (without prejudice) PHI nodes in unreachable blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14635 91177308-0d34-0410-b5e6-96231b3b80d8
pass is required to paper over problems in the code generator (primarily
live variables and its clients) which doesn't really have any well defined
semantics for unreachable code.
The proper solution to this problem is to have instruction selectors not
select blocks that are unreachable. Until we have a instruction selection
framework available for use, however, we can't expect all instruction
selector writers to do this. Until then, this pass should be used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14563 91177308-0d34-0410-b5e6-96231b3b80d8
instructions. Instead, keep a map of instructions -> MCFI objects in the
already sparc-specific class MachineFunctionInfo. This will slow down the
sparc backend a bit, but it does not penalize the rest of LLVM!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14438 91177308-0d34-0410-b5e6-96231b3b80d8
The vector may actually be empty if the register that we are marking as
recently used is not actually allocatable. This happens for physical registers
that are not allocatable, like the ST(x) registers on X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14195 91177308-0d34-0410-b5e6-96231b3b80d8
Work around for bogus errors: no those static functions don't need to
be executed to be initialized!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14040 91177308-0d34-0410-b5e6-96231b3b80d8
broke obsequi and a lot of other things. It all boiled down to MBB being
overloaded in an inner scope and me confusing it with the one in the outer
scope. Ugh!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13517 91177308-0d34-0410-b5e6-96231b3b80d8
MBBs start out as #-1. When a MBB is added to a MachineFunction, it
gets the next available unique MBB number. If it is removed from a
MachineFunction, it goes back to being #-1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13514 91177308-0d34-0410-b5e6-96231b3b80d8
in the basic block being processed. This fixes PhiElimination on kimwitu++
from taking 105s to taking a much more reasonable 0.6s (in a debug build).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13453 91177308-0d34-0410-b5e6-96231b3b80d8
than before. Because this is the case, we can compute the first non-phi
instruction once when de-phi'ing a block. This shaves ~4s off of
phielimination of _Z7yyparsev in kimwitu++ from 109s -> 105s. There are
still much more important gains to come.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13452 91177308-0d34-0410-b5e6-96231b3b80d8
when we see a read of a register. This is important in cases like:
AL = ...
AH = ...
= AX
The read of AX must make both the AL and AH defs live until the use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13444 91177308-0d34-0410-b5e6-96231b3b80d8
use MachineBasicBlocks. To do this, we traverse the Machine CFG instead of
the LLVM CFG, which is also *MUCH* more efficient by having fewer levels of
indirections and mappings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13301 91177308-0d34-0410-b5e6-96231b3b80d8
documentation that this module needs to be made independent of the
register file description of the current target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@13125 91177308-0d34-0410-b5e6-96231b3b80d8