* sumarray2d fixed: large fixed-size alloca
* make is now compileable
* Re-organized tests to fit them under proper headings
Patch by Nate Begeman.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15347 91177308-0d34-0410-b5e6-96231b3b80d8
These side-effects seem to make a difference when using llc -march=sparcv9
in Release mode (i.e., with -DNDEBUG); when they are left out, lots of
instructions just get dropped on the floor, because they never end up
in the schedule.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15339 91177308-0d34-0410-b5e6-96231b3b80d8
assumed that a constant on the RHS of a multiplication was either an
IntConstant or an FPConstant. It checked for an IntConstant and then,
if it did not find one, did a hard cast to an FPConstant. That code
would crash if the RHS were a ConstantExpr that was neither an
IntConstant nor an FPConstant. This version replaces the hard cast
with a dyn_cast. It performs the same way for IntConstants and
FPConstants but does nothing, instead of crashing, for constant
expressions.
The regression test for this change is 2004-07-27-ConstantExprMul.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15291 91177308-0d34-0410-b5e6-96231b3b80d8
functions in SparcV9InstrSelection and SparcV9PreSelection into regular
old global functions. As it happens, none of them really have anything
to do with TargetInstrInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15278 91177308-0d34-0410-b5e6-96231b3b80d8
used in the SparcV9 backend really have anything to do with
TargetInstrInfo, so we're converting them into regular old global
functions and moving their declarations to SparcV9InstrSelectionSupport.h.
(They're mostly used as helper functions for SparcV9InstrSelection.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15277 91177308-0d34-0410-b5e6-96231b3b80d8
understand, and more accurate to boot! This implements
GlobalModRef/purecse.ll over the previous impl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15260 91177308-0d34-0410-b5e6-96231b3b80d8
addi r1, r2, 0
addi r1, <frame index #n>, 0
so we must check for the second parameter being a register for this instruction
to be considered a reg-to-reg copy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15244 91177308-0d34-0410-b5e6-96231b3b80d8
* Implemented GEP folding
* Dynamically output global address stuff once per function
* Fix casting fp<->short/byte
Patch contributed by Nate Begeman.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15237 91177308-0d34-0410-b5e6-96231b3b80d8
(At[3] << 24) is an int type and it is being coerced to uint64_t, it was
getting sign extended, causing us to get FFFFFFFFxxxxxxxx constants all of
the time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15224 91177308-0d34-0410-b5e6-96231b3b80d8
glibc 'nan' function because the initializer is not a string. This breaks when used in a global
initializer. Try compiling this testcase for example:
%X = global float <some nan value>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15223 91177308-0d34-0410-b5e6-96231b3b80d8
- encode/decode target triple and dependent libraries
bug 401:
- fix encoding/decoding of FP values to be little-endian only
bug 402:
- initial (compatible) cut at 24-bit types instead of 32-bit
- reduce size of block headers by 50%
Other:
- cleanup Writer by consolidating to one compilation unit, rem. other files
- use a std::vector instead of std::deque so the buffer can be allocated
in multiples of 64KByte chunks rather than in multiples of some smaller
(default) number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15210 91177308-0d34-0410-b5e6-96231b3b80d8
a bug in DSE).
* Delete dead operand uses iteratively instead of recursively, using a
SetVector.
* Defer deletion of dead operand uses until the end of processing, which means
we don't have to bother with updating the AliasSetTracker. This speeds up
DSE substantially.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15204 91177308-0d34-0410-b5e6-96231b3b80d8
aggressively coallesce live ranges even if they overlap. Consider this LLVM
code for example:
int %test(int %X) {
%Y = mul int %X, 1 ;; Codegens to Y = X
%Z = add int %X, %Y
ret int %Z
}
The mul is just there to get a copy into the code stream. This produces
this machine code:
(0x869e5a8, LLVM BB @0x869b9a0):
%reg1024 = mov <fi#-2>, 1, %NOREG, 0 ;; "X"
%reg1025 = mov %reg1024 ;; "Y" (subsumed by X)
%reg1026 = add %reg1024, %reg1025
%EAX = mov %reg1026
ret
Note that the life times of reg1024 and reg1025 overlap, even though they
contain the same value. This results in this machine code:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, %EAX
add %EAX, %ECX
ret
Another, worse case involves loops and PHI nodes. Consider this trivial loop:
testcase:
int %test2(int %X) {
entry:
br label %Loop
Loop:
%Y = phi int [%X, %entry], [%Z, %Loop]
%Z = add int %Y, 1
%cond = seteq int %Z, 100
br bool %cond, label %Out, label %Loop
Out:
ret int %Z
}
Because of interactions between the PHI elimination pass and the register
allocator, this got compiled to this code:
test2:
mov %ECX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
*** mov %EAX, %ECX
inc %EAX
cmp %EAX, 100
*** mov %ECX, %EAX
jne .LBBtest2_1
ret
Or on powerpc, this code:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r2, r3, 1
cmpwi cr0, r2, 100
*** or r3, r2, r2
bne cr0, .LBB_test2_1
*** or r3, r2, r2
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
With this improvement in place, we now generate this code for these two
testcases, which is what we want:
test:
mov %EAX, DWORD PTR [%ESP + 4]
add %EAX, %EAX
ret
test2:
mov %EAX, DWORD PTR [%ESP + 4]
.LBBtest2_1:
inc %EAX
cmp %EAX, 100
jne .LBBtest2_1 # Loop
ret
Or on PPC:
_test2:
mflr r0
stw r0, 8(r1)
stwu r1, -60(r1)
.LBB_test2_1:
addi r3, r3, 1
cmpwi cr0, r3, 100
bne cr0, .LBB_test2_1
lwz r0, 68(r1)
mtlr r0
addi r1, r1, 60
blr 0
Static numbers for spill code loads/stores/reg-reg copies (smaller is better):
em3d: before: 47/25/26 after: 44/22/24
164.gzip: before: 433/245/310 after: 403/231/278
175.vpr: before: 3721/2189/1581 after: 4144/2081/1423
176.gcc: before: 26195/8866/9235 after: 25942/8082/8275
186.crafty: before: 4295/2587/3079 after: 4119/2519/2916
252.eon: before: 12754/7585/5803 after: 12508/7425/5643
256.bzip2: before: 463/226/315 after: 482:241/309
Runtime perf number samples on X86:
gzip: before: 41.09 after: 39.86
bzip2: runtime: before: 56.71s after: 57.07s
gcc: before: 6.16 after: 6.12
eon: before: 2.03s after: 2.00s
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15194 91177308-0d34-0410-b5e6-96231b3b80d8
same as the PHI use. This is not correct as the PHI use value is different
depending on which branch is taken. This fixes espresso with aggressive
coallescing, and perhaps others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15189 91177308-0d34-0410-b5e6-96231b3b80d8
LiveInterval>. This saves some space and removes the pointer
indirection caused by following the pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15167 91177308-0d34-0410-b5e6-96231b3b80d8
us back to taking about 10.5s on gcc, instead of taking 15.6s! The net result
is that my big patches have hand no significant effect on compile time or code
quality. heh.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15156 91177308-0d34-0410-b5e6-96231b3b80d8
Interval. This generalizes the isDefinedOnce mechanism that we used before
to help us coallesce ranges that overlap. As part of this, every logical
range with a different value is assigned a different number in the interval.
For example, for code that looks like this:
0 X = ...
4 X += ...
...
N = X
We now generate a live interval that contains two ranges: [2,6:0),[6,?:1)
reflecting the fact that there are two different values in the range at
different positions in the code.
Currently we are not using this information at all, so this just slows down
liveintervals. In the future, this will change.
Note that this change also substantially refactors the joinIntervalsInMachineBB
method to merge the cases for virt-virt and phys-virt joining into a single
case, adds comments, and makes the code a bit easier to follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15154 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix comment typeo
* add dump() methods
* add a few new methods like getLiveRangeContaining, removeRange & joinable
(which is currently the same as overlaps)
* Remove the unused operator==
Bigger change:
* In LiveInterval, instead of using a boolean isDefinedOnce to keep track of
if there are > 1 definitions in a particular interval, keep a counter,
NumValues to keep track of exactly how many there are.
* In LiveRange, add a new ValId element to indicate which of the numbered
values each LiveRange belongs to. We now no longer merge LiveRanges if
they are of differing value ID's even if they are neighbors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15152 91177308-0d34-0410-b5e6-96231b3b80d8
* Inline some functions
* Eliminate some comparisons from the release build
This is good for another .3 on gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15144 91177308-0d34-0410-b5e6-96231b3b80d8
want to insert a new range into the middle of the vector, then delete ranges
one at a time next to the inserted one as they are merged.
Instead, if the inserted interval overlaps, just start merging. The only time
we insert into the middle of the vector is when we don't overlap at all. Also
delete blocks of live ranges if we overlap with many of them.
This patch speeds up joining by .7 seconds on a large testcase, but more
importantly gets all of the range adding code into addRangeFrom.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15141 91177308-0d34-0410-b5e6-96231b3b80d8
(e.g., LICM) into FunctionPassManagers. The problem is that we were
using a C-style cast to cast required analysis passes to PassClass*, but
if it's a FunctionPassManager, and the required analysis pass is an
ImmutablePass, the types aren't really compatible, so the C-style cast
causes a crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15140 91177308-0d34-0410-b5e6-96231b3b80d8
will soon be renamed) into their own file. The new file should not emit
DEBUG output or have other side effects. The LiveInterval class also now
doesn't know whether its working on registers or some other thing.
In the future we will want to use the LiveInterval class and friends to do
stack packing. In addition to a code simplification, this will allow us to
do it more easily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15134 91177308-0d34-0410-b5e6-96231b3b80d8
Use an explicit LiveRange class to represent ranges instead of an std::pair.
This is a minor cleanup, but is really intended to make a future patch simpler
and less invasive.
Alkis, could you please take a look at LiveInterval::liveAt? I suspect that
you can add an operator<(unsigned) to LiveRange, allowing us to speed up the
upper_bound call by quite a bit (this would also apply to other callers of
upper/lower_bound). I would do it myself, but I still don't understand that
crazy liveAt function, despite the comment. :)
Basically I would like to see this:
LiveRange dummy(index, index+1);
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
dummy);
Turn into:
Ranges::const_iterator r = std::upper_bound(ranges.begin(),
ranges.end(),
index);
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15130 91177308-0d34-0410-b5e6-96231b3b80d8
interfere. Because these intervals have a single definition, and one of them
is a copy instruction, they are always safe to merge even if their lifetimes
interfere. This slightly reduces the amount of spill code, for example on
252.eon, from:
12837 spiller - Number of loads added
7604 spiller - Number of stores added
5842 spiller - Number of register spills
18155 liveintervals - Number of identity moves eliminated after coalescing
to:
12754 spiller - Number of loads added
7585 spiller - Number of stores added
5803 spiller - Number of register spills
18262 liveintervals - Number of identity moves eliminated after coalescing
The much much bigger win would be to merge intervals with multiple definitions
(aka phi nodes) but this is not that day.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15124 91177308-0d34-0410-b5e6-96231b3b80d8
* Don't allow negative immediates to users of unsigned immediates
* Fix long compares
* Support <const int>, op as a potential immediate candidate
* Fix sign extension of short and byte loads
* Fix and improve integer casts
* Fix passing of doubles as vararg functions
Patch contributed by Nate Begeman.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15109 91177308-0d34-0410-b5e6-96231b3b80d8
intervals need not be sorted anymore. Removing this redundant step
improves LiveIntervals running time by 5% on 176.gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15106 91177308-0d34-0410-b5e6-96231b3b80d8
compilation of gcc:
* Use vectors instead of lists for the intervals sets
* Use a heap for the unhandled set to keep intervals always sorted and
makes insertions back to the heap very fast (compared to scanning a
list)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15103 91177308-0d34-0410-b5e6-96231b3b80d8
can be improved in many ways. But: stop laughing, even with -basicaa it
deletes 15% of the stores in 252.eon :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15101 91177308-0d34-0410-b5e6-96231b3b80d8
fortunately, they are easy to handle if we know about them. This patch fixes
some serious pessimization of code produced by the linscan register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15092 91177308-0d34-0410-b5e6-96231b3b80d8
* Test for whether bits are shifted out during the optzn.
If so, the fold is illegal, though it can be handled explicitly for setne/seteq
This fixes the miscompilation of 254.gap last night, which was a latent bug
exposed by other optimizer improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15085 91177308-0d34-0410-b5e6-96231b3b80d8
* Generation of opcodes that take 16 bit immediates
* Rewrote multiply to be correct for 64 bit values
* Rewrote all the long handling to be correct for PowerPC
* Fix visitSelectInst() to define the upper register of the pair of regs
representing a long value
Patch contributed by Nate Begeman.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15083 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix functions that take more than 32 bytes of args
* Alignment of doubles in structs is 4 bytes, not 8
* Fix passing long args: rN = hi, rN+1 = lo
* Rewrite signed divide
* Rewrite Intrinsic::returnaddress
Patch courtesy of Nate Begeman.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15036 91177308-0d34-0410-b5e6-96231b3b80d8
actually care about. Someday when the cast instruction is gone, we can do
better here, but this will do for now. This implements
instcombine/cast.ll:test17/18 as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15018 91177308-0d34-0410-b5e6-96231b3b80d8
`-> asm printer updated to not print out those registers with the call instr
All of Shootout tests now work. Great thanks to Nate Begeman for the patch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15015 91177308-0d34-0410-b5e6-96231b3b80d8
* Fn args passed in registers are now recorded as used by the call instruction
`-> asm printer updated to not print out those registers with the call instr
* Stack frame layout in prolog/epilog fixed, spills and vararg fns now work
* float/double to signed int codegen now correct
* various single precision float codegen bugs fixed
* const integer multiply codegen fixed
* select and setcc blocks inserted into the correct place in machine CFG
* load of integer constant code optimized
All of Shootout tests now work. Great thanks to Nate Begeman for the patch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15014 91177308-0d34-0410-b5e6-96231b3b80d8
is a simple change, but seems to improve code a little. For example, on
256.bzip2, we went from 75.0s -> 73.33s (2% speedup).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@15004 91177308-0d34-0410-b5e6-96231b3b80d8
* vreg <-> vreg joining now works, enable it unconditionally when joining
is enabled (which is the default).
* Fix a serious pessimization of spill code where we were saying that a
spilled DEF operand was live into the subsequent instruction. This allows
for substantially better code when spilling starts to happen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14993 91177308-0d34-0410-b5e6-96231b3b80d8
order, causing the inactive list in the linearscan list to get unsorted, which
basically fuxored everything up severely.
These seems to fix the joiner, so with more testing I will enable it by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14992 91177308-0d34-0410-b5e6-96231b3b80d8
Heavily refactor handleVirtualRegisterDef, adding comments and making it more
efficient. It is also much easier to follow and convince ones self that it is
correct :)
Add -debug output to the joine, showing the result of joining the intervals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14989 91177308-0d34-0410-b5e6-96231b3b80d8
and a list of don'ts for the library. All so future maintainers don't
break the important contract this library has with its user: LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14979 91177308-0d34-0410-b5e6-96231b3b80d8
night compiling cfrac. It did not realize that code like this:
int G; int *H = &G;
takes the address of G.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@14973 91177308-0d34-0410-b5e6-96231b3b80d8