1. Functions do not make things incomplete, only variables
2. Constant global variables no longer need to be marked incomplete, because
we are guaranteed that the initializer for the global will be in the
graph we are hacking on now. This makes resolution of indirect calls happen
a lot more in the bu pass, supports things like vtables and the C counterparts
(giant constant arrays of function pointers), etc...
Testcase here: test/Regression/Analysis/DSGraph/constant_globals.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11852 91177308-0d34-0410-b5e6-96231b3b80d8
Make the incompleteness marker faster by looping directly over the globals
instead of over the scalars to find the globals
Fix a bug where we didn't mark a global incomplete if it didn't have any
outgoing edges. This wouldn't break any current clients but is still wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11848 91177308-0d34-0410-b5e6-96231b3b80d8
pair, and look up varargs in the execution stack every time, instead of
just pushing iterators (which can be invalidated during callFunction())
around. (union GenericValue now has a "pair of uints" member, to support
this mechanism.) Fixes Bug 234.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11845 91177308-0d34-0410-b5e6-96231b3b80d8
assume that if they don't intend to write to a global variable, that they
would mark it as constant. However, there are people that don't understand
that the compiler can do nice things for them if they give it the information
it needs.
This pass looks for blatently obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks. 253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be. Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls. Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.
176.gcc has a ton of strings and large tables that are marked constant, both
at compile time (38 of them) and at link time (48 more). Other benchmarks
give similar results, though it seems like big ones have disproportionally
more than small ones.
This pass is extremely quick and does good things. I'm going to enable it
in gccas & gccld. Not bad for 50 SLOC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11836 91177308-0d34-0410-b5e6-96231b3b80d8
scaled indexes. This allows us to compile GEP's like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
%Idx = cast int %Idx to long
%X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
ret int* %X
}
Into a single address computation:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
ret
Before it generated:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
shl %ECX, 3
add %EAX, %ECX
lea %EAX, DWORD PTR [%EAX + 4]
ret
This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On
bzip2 for example, we go from this:
10665 asm-printer - Number of machine instrs printed
40 ra-local - Number of loads/stores folded into instructions
1708 ra-local - Number of loads added
1532 ra-local - Number of stores added
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
2794 x86-peephole - Number of peephole optimization performed
to this:
9873 asm-printer - Number of machine instrs printed
41 ra-local - Number of loads/stores folded into instructions
1710 ra-local - Number of loads added
1521 ra-local - Number of stores added
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer - Number of machine instrs printed
2389 liveintervals - Number of identity moves eliminated after coalescing
2288 liveintervals - Number of interval joins performed
3522 liveintervals - Number of intervals after coalescing
5810 liveintervals - Number of original intervals
700 spiller - Number of loads added
487 spiller - Number of stores added
303 spiller - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
363 x86-peephole - Number of peephole optimization performed
to:
7982 asm-printer - Number of machine instrs printed
1759 liveintervals - Number of identity moves eliminated after coalescing
1658 liveintervals - Number of interval joins performed
3282 liveintervals - Number of intervals after coalescing
4940 liveintervals - Number of original intervals
635 spiller - Number of loads added
452 spiller - Number of stores added
288 spiller - Number of register spills
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
258 x86-peephole - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11820 91177308-0d34-0410-b5e6-96231b3b80d8
to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
double re, im;
complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %ECX
fld QWORD PTR [%EDX]
fld1
faddp %ST(1)
*** add %ECX, 8
fld QWORD PTR [%ECX]
fldz
faddp %ST(1)
*** mov %ECX, %EAX
fxch %ST(1)
fstp QWORD PTR [%ECX]
*** add %EAX, 8
fstp QWORD PTR [%EAX]
ret
Now it is compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
fld QWORD PTR [%ECX]
fld1
faddp %ST(1)
fld QWORD PTR [%ECX + 8]
fldz
faddp %ST(1)
fxch %ST(1)
fstp QWORD PTR [%EAX]
fstp QWORD PTR [%EAX + 8]
ret
Other programs should see similar improvements, across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11819 91177308-0d34-0410-b5e6-96231b3b80d8
into a single LEA instruction. This should improve the code generated for
things like X->A.B.C[12].D.
The bigger benefit is still coming though. Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom. We should probably
never generate ADDri32's.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11817 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix problem where we didn't check to see if a node pointer was null.
Though fclose(null) doesn't make a lot of sense, 300.twolf does it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11810 91177308-0d34-0410-b5e6-96231b3b80d8
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11803 91177308-0d34-0410-b5e6-96231b3b80d8
allocator.
The implementation is completely rewritten and now employs several
optimizations not exercised before. For example for 164.gzip we have
997 loads and 699 stores vs the 1221 loads and 880 stores we have
before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11798 91177308-0d34-0410-b5e6-96231b3b80d8
This case occurs many times in various benchmarks, especially when combined
with the previous patch. This allows it to get stuff like:
if (X == 4 || X == 3)
if (X == 5 || X == 8)
and
switch (X) {
case 4: case 5: case 6:
if (X == 4 || X == 5)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11797 91177308-0d34-0410-b5e6-96231b3b80d8
block into MachineBasicBlock::getFirstTerminator().
This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions where added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11748 91177308-0d34-0410-b5e6-96231b3b80d8
use FP instructions. This reduces the number of instructions inserted in
176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which
is typical). This reduction speeds up the entire code generator. In the
case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that
sped up the most are the register allocator and the 2 live variable analysis
passes, which sped up 2.3, 1.3, and 1.5s respectively. The asmprinter
pass also sped up because it doesn't print the instructions in comments :)
Note that this patch is likely to expose latent bugs in machine code passes,
because now basicblock can be empty, where they were never empty before. I
cleaned out regalloclocal, but who knows about linscan :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11717 91177308-0d34-0410-b5e6-96231b3b80d8
switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11715 91177308-0d34-0410-b5e6-96231b3b80d8
(minor) benefits right now:
1. An extra dummy MOVrr32 is gone. This move would often be coallesced by
both allocators anyway.
2. The code now uses the gep_type_iterator to walk the gep, which should future
proof it a bit. It still assumes that array indexes are Longs though.
These don't really justify rewriting the code. The big benefit will come later
though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11710 91177308-0d34-0410-b5e6-96231b3b80d8
value is a physreg and one is a virtreg. For this reason, disable copy folding
entirely for physregs. Also, use the new isMoveInstr target hook which gives us
folding of FP moves as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11700 91177308-0d34-0410-b5e6-96231b3b80d8
FIX MAJOR BUG, whereby we didn't merge null edges correctly. Correcting this
fixes poolallocation on 175.vpr, and possibly others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11695 91177308-0d34-0410-b5e6-96231b3b80d8
in a signal handler, allocating memory or doing other unsafe things is bad,
which means we should do it in a different process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11689 91177308-0d34-0410-b5e6-96231b3b80d8
BU propagation, clone the globals into the GG of EACH FUNCTION that finishes
processing! The GlobalsGraph *must* include all globals and effects from
all functions in the program. Fixing this makes pool allocation work better
on 175.vpr, but it still ultimately crashes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11686 91177308-0d34-0410-b5e6-96231b3b80d8
end of the BU and CBU passes. The globals will be marked incomplete, so it
doesn't matter if they are missing some info, and merging isn't guaranteed
to bring everything in anyway!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11684 91177308-0d34-0410-b5e6-96231b3b80d8
1. LiveIntervals now implement a 4 slot per instruction model. Load,
Use, Def and a Store slot. This is required in order to correctly
represent caller saved register clobbering on function calls,
register reuse in the same instruction (def resues last use) and
also spill code added later by the allocator. The previous
representation (2 slots per instruction) was insufficient and as a
result was causing subtle bugs.
2. Fixes in spill code generation. This was the major cause of
failures in the test suite.
3. Linear scan now has core support for folding memory operands. This
is untested and not enabled (the live interval update function does
not attempt to fold loads/stores in instructions).
4. Lots of improvements in the debugging output of both live intervals
and linear scan. Give it a try... it is beautiful :-)
In summary the above fixes all the issues with the recent reserved
register elimination changes and get the allocator very close to the
next big step: folding memory operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11654 91177308-0d34-0410-b5e6-96231b3b80d8
by operator<< on MachineInstr's, and looking up what register "24" is all of the
time was greatly annoying.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11623 91177308-0d34-0410-b5e6-96231b3b80d8
that need them. This is very useful on CISCy targets like the X86 because it
reduces the total spill pressure, and makes better use of it's (large)
instruction set. Though the X86 backend doesn't know how to rewrite many
instructions yet, this already makes a substantial difference on 176.gcc for
example:
Before:
Time:
8.0099 ( 31.2%) 0.0100 ( 12.5%) 8.0199 ( 31.2%) 7.7186 ( 30.0%) Local Register Allocator
Code quality:
734559 asm-printer - Number of machine instrs printed
111395 ra-local - Number of registers reloaded
79902 ra-local - Number of registers spilled
231554 x86-peephole - Number of peephole optimization performed
After:
Time:
7.8700 ( 30.6%) 0.0099 ( 19.9%) 7.8800 ( 30.6%) 7.7892 ( 30.2%) Local Register Allocator
Code quality:
733083 asm-printer - Number of machine instrs printed
2379 ra-local - Number of reloads fused into instructions
109046 ra-local - Number of registers reloaded
79881 ra-local - Number of registers spilled
230658 x86-peephole - Number of peephole optimization performed
So by fusing 2300 instructions, we reduced the static number of instructions
by 1500, and reduces the number of peepholes (and thus the work) by about 900.
This also clearly reduces the number of reload/spill instructions that are
emitted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11542 91177308-0d34-0410-b5e6-96231b3b80d8
nightly tests to be really messed up. The problem was that the new leakdetector
was depending on undefined behavior: the order of destruction of static objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11488 91177308-0d34-0410-b5e6-96231b3b80d8
hacks can be banished. Also, this gives us the opportunity to emit special code
for the setjmp/longjmps which alows the elimination of one GCC warning for every
setjmp/longjmp site (which is often THOUSANDS in C++ programs). Yaay!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11484 91177308-0d34-0410-b5e6-96231b3b80d8
prototypes, even if they don't precisely match what it would prefer to use.
This fixes: CBackend/2004-02-15-PreexistingExternals.llx compiling it into:
ltmp_0_30 = memcpy(l14_C, 4u, 17);
ltmp_1_30 = memcpy(((int *)l27_A), ((unsigned )(long)l27_B), ((int )123u));
instead of:
ltmp_0_30 = memcpy(l14_C, 4u, 17);
ltmp_1_27 = l43_memcpy(l27_A, l27_B, 123u);
Which does the wrong thing as you could imagine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11481 91177308-0d34-0410-b5e6-96231b3b80d8
MRegisterInfo::getNumRegs() instead of
MRegisterInfo::FirstVirtualRegister.
Also use MRegisterInfo::is{Physical,Virtual}Register where
appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11477 91177308-0d34-0410-b5e6-96231b3b80d8
initializers for constant structs and arrays take constant space, instead of
space proportinal to the number of elements. This reduces the memory usage of
the LLVM compiler by hundreds of megabytes when compiling some nasty SPEC95
benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11470 91177308-0d34-0410-b5e6-96231b3b80d8
'Constant', instead of specific subclass pointers. In the future, these will
return an instance of ConstantAggregateZero if all of the inputs are zeros.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11467 91177308-0d34-0410-b5e6-96231b3b80d8
that will be responsible for the creation of MachineFunctions and will
be required by all MachineFunctionPass passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11453 91177308-0d34-0410-b5e6-96231b3b80d8
implementation class. This makes the code simpler and allows for more
types to be added easily. It also implements caching for generic
objects (it was only available for llvm objects).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11452 91177308-0d34-0410-b5e6-96231b3b80d8
"expensive exceptions" controlled by an option. Also refactor and eliminate a bunch of cruft.
This is a temporary solution and causes millions of warnings to pour out of programs that use
exceptions, but it should fix the problem with sparc and the 'write' declaration (PR190).
Subsequent changes will make this stink much less
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11405 91177308-0d34-0410-b5e6-96231b3b80d8
operands as incomplete, though fopen is known to only read them. This just adds
fclose for symmetry, though it doesn't gain anything. This makes the dsgraphs for
181.mcf much more precise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11390 91177308-0d34-0410-b5e6-96231b3b80d8
allowed in invoke instructions. Thus, if we are inlining a call to an intrinsic
function into an invoke site, we don't need to turn the call into an invoke!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11384 91177308-0d34-0410-b5e6-96231b3b80d8
any attempts by LLI to use varargs (possibly left over from the introduction
of IntrinsicLowering??)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11370 91177308-0d34-0410-b5e6-96231b3b80d8
Rename SetMachineOperandConst's formal parameters to match other methods here.
Mark some methods as being used only by the SPARC back-end.
Fix a missing-paren bug in OutputValue().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11363 91177308-0d34-0410-b5e6-96231b3b80d8
MachineBasicBlock. Also change opcode to a short and numImplicitRefs
to an unsigned char so that overall MachineInstr's size stays the
same.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11357 91177308-0d34-0410-b5e6-96231b3b80d8
ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11340 91177308-0d34-0410-b5e6-96231b3b80d8
MO_VirtualRegister but if the register number is one of a physical
register is it considered as a physical register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11315 91177308-0d34-0410-b5e6-96231b3b80d8
more of a testcase for profiling information than anything that should reasonably
be used, but it's a starting point. When I have more time I will whip this into
better shape.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11311 91177308-0d34-0410-b5e6-96231b3b80d8
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11307 91177308-0d34-0410-b5e6-96231b3b80d8
passed into main, make sure they use the return value of the init call
instead of the one passed in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11262 91177308-0d34-0410-b5e6-96231b3b80d8
I will observe that the concept of using WriteAsOperand is completely broken,
but then we all knew that, didn't we?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11255 91177308-0d34-0410-b5e6-96231b3b80d8
in this for programs with lots of types (like the testcase in PR224).
The problem was that the type ID that the outer vector was using was not
very dense (as many types are getting resolved), so the vector is large
and gets reallocated a lot.
Since there are a lot of values in the program (the .ll file is 10M),
each reallocation has to copy the subvectors, which is also quite slow
(this wouldn't be a problem if C++ supported move semantics, but it
doesn't, at least not yet :(
Changing the outer data structure to a map speeds a release build of
llvm-as up from 11.21s to 5.13s on the testcase in PR224.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11244 91177308-0d34-0410-b5e6-96231b3b80d8
this speeds up a release llvm-as from 21.95s to 11.21s, because before it
would do an expensive traversal of the type-graph of every type resolved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11242 91177308-0d34-0410-b5e6-96231b3b80d8
type at the same time, resolve the upreferences to each other before resolving
it to the outer type. This shaves off some time from the testcase in PR224, from
25.41s -> 21.72s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11241 91177308-0d34-0410-b5e6-96231b3b80d8
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11238 91177308-0d34-0410-b5e6-96231b3b80d8
consistent across the various type classes, we can factor out a LOT more
almost-identical code. Also, add a couple of temporary statistics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11232 91177308-0d34-0410-b5e6-96231b3b80d8
all of the ad-hoc storage of contained types. This allows getContainedType to
not be virtual, and allows us to entirely delete the TypeIterator class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11230 91177308-0d34-0410-b5e6-96231b3b80d8
contains the type we are looking for, just search the immediately used types.
We can only do this because we keep the "current" type in the nesting level
as we decrement upreferences.
This change speeds up the testcase in PR224 from 50.4s to 22.08s, not
too shabby.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11221 91177308-0d34-0410-b5e6-96231b3b80d8
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11219 91177308-0d34-0410-b5e6-96231b3b80d8
This speeds up live variables a lot, from .60/.39s -> .47/.26s in LLC, for
the first/second pass respectively.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11216 91177308-0d34-0410-b5e6-96231b3b80d8
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11215 91177308-0d34-0410-b5e6-96231b3b80d8
1. The "work" was not in the assert, so it was punishing the optimized release
2. getNamedFunction is _very_ expensive in large programs. It is not designed
to be used like this, and was taking 7% of the execution time of the code
generator on perlbmk.
Since the assert "can never fail", I'm just killing it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11214 91177308-0d34-0410-b5e6-96231b3b80d8
This causes the JIT, or LLC'd program to print out a nice message, explaining
WHY the program aborted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11184 91177308-0d34-0410-b5e6-96231b3b80d8
removeDeadNodes is called, only call it at the end of the pass being run.
This saves 1.3 seconds running DSA on 177.mesa (5.3->4.0s), which is
pretty big. This is only possible because of the automatic garbage
collection done on forwarding nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11178 91177308-0d34-0410-b5e6-96231b3b80d8
DSGraphs while they are forwarding. When the last reference to the forwarding
node is dropped, the forwarding node is autodeleted. This should simplify
removeTriviallyDead nodes, and is only (efficiently) possible because we are
using an ilist of dsnodes now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11175 91177308-0d34-0410-b5e6-96231b3b80d8
slots each. As a concequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11141 91177308-0d34-0410-b5e6-96231b3b80d8
The problem is that the dominator update code didn't "realize" that it's
possible for the newly inserted basic block to dominate anything. Because
it IS possible, stuff was getting updated wrong.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11137 91177308-0d34-0410-b5e6-96231b3b80d8
complete rewrite of load-vn will make it a bit faster. This changes speeds up
the gcse pass (which uses load-vn) from 25.45s to 0.42s on the testcase in
PR209.
I've also verified that this gives the exact same results as the old one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11132 91177308-0d34-0410-b5e6-96231b3b80d8
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11118 91177308-0d34-0410-b5e6-96231b3b80d8
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11116 91177308-0d34-0410-b5e6-96231b3b80d8
before we delete the original call site, allowing slight simplifications of
code, but nothing exciting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11109 91177308-0d34-0410-b5e6-96231b3b80d8
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticable performance impact, but paves the way for
future improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11108 91177308-0d34-0410-b5e6-96231b3b80d8
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7 -> 3.38s on the PR209.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11107 91177308-0d34-0410-b5e6-96231b3b80d8
instead of a loop that is really inefficient with large basic blocks.
This speeds up the inliner pass on the testcase in PR209 from 13.8s to 2.24s
which still isn't exactly speedy, but is a lot better. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11105 91177308-0d34-0410-b5e6-96231b3b80d8
process them all as a group. This speeds up SRoA/mem2reg from 28.46s to
0.62s on the testcase from PR209.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11100 91177308-0d34-0410-b5e6-96231b3b80d8
Basically we store floating point values as their integral components, instead of relying
on the semantics of floating point < to differentiate between values. This is likely to
make the map search be faster anyway.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11064 91177308-0d34-0410-b5e6-96231b3b80d8
registers (not as the max number of registers).
Change toSpill from a std::set into a std::vector<bool>.
Use the reverse iterator adapter to do a reverse scan of allocatable
registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@11061 91177308-0d34-0410-b5e6-96231b3b80d8