more callee-saved registers and introduce copies. Only allows it if scheduling
a node above calls would end up lessen register pressure.
Call operands also has added ABI restrictions for register allocation, so be
extra careful with hoisting them above calls.
rdar://9329627
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130245 91177308-0d34-0410-b5e6-96231b3b80d8
This has two effects: 1. We never inflate to a larger register class than what
the sub-target can handle. 2. Completely unconstrained virtual registers get the
largest possible register class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130229 91177308-0d34-0410-b5e6-96231b3b80d8
fix bugs exposed by the gcc dejagnu testsuite:
1. The load may actually be used by a dead instruction, which
would cause an assert.
2. The load may not be used by the current chain of instructions,
and we could move it past a side-effecting instruction. Change
how we process uses to define the problem away.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@130018 91177308-0d34-0410-b5e6-96231b3b80d8
The basic issue here is that bottom-up isel is matching the branch
and compare, and was failing to fold the load into the branch/compare
combo. Fixing this (by allowing folding into any instruction of a
sequence that is selected) allows us to produce things like:
cmpb $0, 52(%rax)
je LBB4_2
instead of:
movb 52(%rax), %cl
cmpb $0, %cl
je LBB4_2
This makes the generated -O0 code run a bit faster, but also speeds up
compile time by putting less pressure on the register allocator and
generating less code.
This was one of the biggest classes of missing load folding. Implementing
this shrinks 176.gcc's c-decl.s (as a random example) by about 4% in (verbose-asm)
line count.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129656 91177308-0d34-0410-b5e6-96231b3b80d8
Change ELF systems to use CFI for producing the EH tables. This reduces the
size of the clang binary in Debug builds from 690MB to 679MB.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129571 91177308-0d34-0410-b5e6-96231b3b80d8
This is done by pushing physical register definitions close to their
use, which happens to handle flag definitions if they're not glued to
the branch. This seems to be generally a good thing though, so I
didn't need to add a target hook yet.
The primary motivation is to generate code closer to what people
expect and rule out missed opportunity from enabling macro-op
fusion. As a side benefit, we get several 2-5% gains on x86
benchmarks. There is one regression:
SingleSource/Benchmarks/Shootout/lists slows down be -10%. But this is
an independent scheduler bug that will be tracked separately.
See rdar://problem/9283108.
Incidentally, pre-RA scheduling is only half the solution. Fixing the
later passes is tracked by:
<rdar://problem/8932804> [pre-RA-sched] on x86, attempt to schedule CMP/TEST adjacent with condition jump
Fixes:
<rdar://problem/9262453> Scheduler unnecessary break of cmp/jump fusion
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129508 91177308-0d34-0410-b5e6-96231b3b80d8
It is common for large live ranges to have few basic blocks with register uses
and many live-through blocks without any uses. This approach grows the Hopfield
network incrementally around the use blocks, completely avoiding checking
interference for some through blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129188 91177308-0d34-0410-b5e6-96231b3b80d8
induction variable. The preRA scheduler is unaware of induction vars,
so we look for potential "virtual register cycles" instead.
Fixes <rdar://problem/8946719> Bad scheduling prevents coalescing
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@129100 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to always keep the smaller slot for an instruction which is what
we want when a register has early clobber defines.
Drop the UsingInstrs set and the UsingBlocks map. They are no longer needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128886 91177308-0d34-0410-b5e6-96231b3b80d8
inlined path for the common case.
Most basic blocks don't contain a call that may throw, so the last split point
os simply the first terminator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128874 91177308-0d34-0410-b5e6-96231b3b80d8
Correctly terminate the range of register DBG_VALUEs when the register is
clobbered or when the basic block ends.
The code is now ready to deal with variables that are sometimes in a register
and sometimes on the stack. We just need to teach emitDebugLoc to say 'stack
slot'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128327 91177308-0d34-0410-b5e6-96231b3b80d8
I have convinced myself that it can only happen when a phi value dies. When it
happens, allocate new virtual registers for the components.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127827 91177308-0d34-0410-b5e6-96231b3b80d8
Initially, slot indexes are quad-spaced. There is room for inserting up to 3
new instructions between the original instructions.
When we run out of indexes between two instructions, renumber locally using
double-spaced indexes. The original quad-spacing means that we catch up quickly,
and we only have to renumber a handful of instructions to get a monotonic
sequence. This is much faster than renumbering the whole function as we did
before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127023 91177308-0d34-0410-b5e6-96231b3b80d8