so the scheduler can't create new interferences on the copies
themselves. Prior to this fix, the scheduler could get stuck in a loop
creating copies.
Fixes PR9509.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@128164 91177308-0d34-0410-b5e6-96231b3b80d8
flexible.
If it returns a register class that's different from the input, then that's
the register class used for cross-register-class copies.
If it returns a register class that's the same as the input, then no
cross-register-class copies are needed (normal copies would do).
If it returns null, then it's not at all possible to copy registers of the
specified register class.
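To make the three cases concrete, here is a minimal, self-contained sketch of
such a hook; the register classes, names, and the main() driver are purely
illustrative and are not LLVM's actual API:
#include <cstdio>

// Illustrative stand-ins for register classes; not LLVM types.
struct RegClass { const char *Name; };
static RegClass GPR = {"GPR"};   // general-purpose registers
static RegClass CCR = {"CCR"};   // condition-code registers
static RegClass VEC = {"VEC"};   // pretend this class cannot be copied

// Hypothetical hook with the semantics described above.
const RegClass *getCrossCopyRegClass(const RegClass *RC) {
  if (RC == &CCR)
    return &GPR;    // different class: cross-register-class copies go via GPR
  if (RC == &VEC)
    return nullptr; // null: copying this class is not possible at all
  return RC;        // same class: a normal copy is all that's needed
}

int main() {
  const RegClass *Classes[] = {&GPR, &CCR, &VEC};
  for (const RegClass *RC : Classes) {
    const RegClass *Cross = getCrossCopyRegClass(RC);
    if (!Cross)
      std::printf("%s: not copyable\n", RC->Name);
    else if (Cross == RC)
      std::printf("%s: normal copy\n", RC->Name);
    else
      std::printf("%s: copy via %s\n", RC->Name, Cross->Name);
  }
  return 0;
}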
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127368 91177308-0d34-0410-b5e6-96231b3b80d8
with this before, since none of the register tracking or nightly tests
had unschedulable nodes.
This should probably be re-fixed with a special default Node that just
returns some "don't touch me" values.
Fixes PR9427.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127263 91177308-0d34-0410-b5e6-96231b3b80d8
This change uses the MaxReorderWindow for both height and depth, which
tends to limit the negative effects of high register pressure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127203 91177308-0d34-0410-b5e6-96231b3b80d8
regs. This is the only change in this checkin that may affect the
default scheduler. With better register tracking and heuristics, it
doesn't make sense to artificially lower the register limit so much.
Added -sched-high-latency-cycles and X86InstrInfo::isHighLatencyDef to
give the scheduler a way to account for div and sqrt on targets that
don't have an itinerary. It currently defaults to 10 (the actual
number doesn't matter much), but only takes effect on non-default
schedulers: list-hybrid and list-ilp.
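As a rough illustration of the kind of predicate this describes (the opcode
enum and the latencyFor helper are hypothetical, not the real X86InstrInfo
code):
#include <cstdio>

// Hypothetical opcode list standing in for a target's instruction enum.
enum Opcode { ADD, MUL, DIV, SQRT, LOAD };

// Marks div/sqrt-like defs as "high latency" on a target with no itinerary.
bool isHighLatencyDef(Opcode Opc) {
  switch (Opc) {
  case DIV:
  case SQRT:
    return true;
  default:
    return false;
  }
}

// The scheduler can then charge a flat cost, e.g. the -sched-high-latency-cycles
// value (default 10 per the description above), instead of a precise latency.
unsigned latencyFor(Opcode Opc, unsigned HighLatencyCycles = 10) {
  return isHighLatencyDef(Opc) ? HighLatencyCycles : 1u;
}

int main() {
  std::printf("div: %u cycles, add: %u cycle(s)\n",
              latencyFor(DIV), latencyFor(ADD));
  return 0;
}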
Added several heuristics that can be individually disabled for the
non-default sched=list-ilp mode. This helps us determine how much
better we can do on a given benchmark than the default
scheduler. Certain compute-intensive loops run much faster in this
mode with the right set of heuristics, and it doesn't seem to have
much negative impact elsewhere. Not all of the heuristics are needed,
but we still need to experiment to decide which should be disabled by
default for sched=list-ilp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@127067 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the PendingQueue, then disable it because it's not required for
the current schedulers' heuristics.
Fix the logic for the unused list-ilp scheduler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@126981 91177308-0d34-0410-b5e6-96231b3b80d8
precisely track pressure on a selection DAG, but we can at least keep
it balanced. This design accounts for various interesting aspects of
selection DAGs: register and subregister copies, glued nodes, dead
nodes, unused registers, etc.
Added SUnit::NumRegDefsLeft and ScheduleDAGSDNodes::RegDefIter.
Note: I disabled PrescheduleNodesWithMultipleUses when register
pressure is enabled, based on no evidence other than that I don't think
it makes sense to have both enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@124853 91177308-0d34-0410-b5e6-96231b3b80d8
DAG. Disable it with "-disable-sched-cycles".
For ARM, this enables a framework for modeling the cpu pipeline and
counting stalls. It also activates several heuristics to drive
scheduling based on the model. Scheduling is inherently imprecise at
this stage, and until spilling is improved it may defeat attempts to
schedule. However, this framework provides greater control over
tuning codegen.
Although the flag is not target-specific, it should have very little
effect on the default scheduler used by x86. The only two changes that
affect x86 are:
- scheduling a high-latency operation bumps the current cycle so independent
operations can have their latency covered; i.e., two independent 4-cycle
operations can produce results in 4 cycles, not 8 cycles (see the sketch
after this list).
- Two operations with equal register pressure impact and no
latency-based stalls on their uses will be prioritized by depth before height
(height is irrelevant if no stalls occur in the schedule below this point).
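A toy illustration of the cycle-bumping arithmetic from the first point; the
numbers are illustrative and this is not the scheduler's actual code:
#include <algorithm>
#include <cstdio>

int main() {
  // Two independent operations, each with a 4-cycle latency.
  unsigned LatA = 4, LatB = 4;
  // Bumping the current cycle lets the second op issue while the first is
  // still in flight, so their latencies overlap.
  unsigned Overlapped = std::max(LatA, LatB);   // results ready after 4 cycles
  // Without the bump the latencies are effectively stacked back to back.
  unsigned Serialized = LatA + LatB;            // results ready after 8 cycles
  std::printf("overlapped: %u cycles, serialized: %u cycles\n",
              Overlapped, Serialized);
  return 0;
}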
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123971 91177308-0d34-0410-b5e6-96231b3b80d8
flags. They are still not enabled in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123969 91177308-0d34-0410-b5e6-96231b3b80d8
Added a check for already live regs before claiming HighRegPressure.
Fixed a few cases of checking the wrong number of successors.
Added some tracing until these heuristics are better understood.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123892 91177308-0d34-0410-b5e6-96231b3b80d8
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
prioritize nodes by their depth rather than height. As long as it
doesn't stall, height is irrelevant. Depth represents the critical
path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
adding them to the available queue.
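A minimal sketch of that isReady-style filtering; the SUnit-like struct, the
ready-cycle field, and the releasePending helper are illustrative, not the
actual hybrid_ls_rr_sort implementation:
#include <vector>

// Illustrative node: ReadyCycle is the first cycle at which it stops stalling.
struct SUnitLike { unsigned ReadyCycle; };

// A node is admitted to the available queue only once the current cycle has
// caught up with its ready cycle.
bool isReady(const SUnitLike &SU, unsigned CurCycle) {
  return SU.ReadyCycle <= CurCycle;
}

// Nodes that are not yet ready wait in a pending queue; as the cycle advances
// they are released into the available queue.
void releasePending(std::vector<SUnitLike> &Pending,
                    std::vector<SUnitLike> &Available, unsigned CurCycle) {
  for (auto It = Pending.begin(); It != Pending.end();) {
    if (isReady(*It, CurCycle)) {
      Available.push_back(*It);
      It = Pending.erase(It);
    } else {
      ++It;
    }
  }
}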
Related Cleanup: most of the register reduction routines do not need
to be templates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123468 91177308-0d34-0410-b5e6-96231b3b80d8
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allow gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about its SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
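To make the scoreboard idea concrete, here is a hedged, self-contained sketch;
the window size, method names, and shift direction are illustrative choices,
not the real ScoreboardHazardRecognizer:
#include <array>

struct Scoreboard {
  static constexpr unsigned Depth = 8;  // look-ahead window (MaxLookAhead-style)
  std::array<bool, Depth> Busy{};       // Busy[i]: unit reserved i cycles away

  // A hazard exists if any cycle the candidate needs is already reserved.
  bool hasHazard(unsigned StartCycle, unsigned Cycles) const {
    for (unsigned C = StartCycle; C < StartCycle + Cycles && C < Depth; ++C)
      if (Busy[C])
        return true;
    return false;
  }

  // Reserve the cycles an issued instruction occupies.
  void reserve(unsigned StartCycle, unsigned Cycles) {
    for (unsigned C = StartCycle; C < StartCycle + Cycles && C < Depth; ++C)
      Busy[C] = true;
  }

  // Bottom-up analogue of advancing the cycle: the nearest slot expires and
  // the rest of the window shifts one cycle closer.
  void recedeCycle() {
    for (unsigned C = 0; C + 1 < Depth; ++C)
      Busy[C] = Busy[C + 1];
    Busy[Depth - 1] = false;
  }
};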
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122541 91177308-0d34-0410-b5e6-96231b3b80d8
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122472 91177308-0d34-0410-b5e6-96231b3b80d8
Imagine we see:
EFLAGS = inst1
EFLAGS = inst2 EFLAGS
gpr = inst3 EFLAGS
Previously, we would refuse to schedule inst2 because it clobbers
the EFLAGS of the predecessor. However, it also uses the EFLAGS
of the predecessor, so it is safe to emit. SDep edges already ensure
the right ordering anyway.
This fixes 2 testsuite crashes with the X86 patch I'm going to
commit next.
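A hedged sketch of that rule; the node representation and helper below are
illustrative, not the scheduler's actual clobber check:
#include <set>

// Illustrative node representation: which physregs a candidate reads and
// which it overwrites.
struct NodeLike {
  std::set<unsigned> PhysRegUses;
  std::set<unsigned> PhysRegClobbers;
};

// Only a clobber *without* a matching use of the live physreg blocks
// scheduling. A node like inst2 above, which both reads and redefines EFLAGS,
// is fine because the dependence (SDep) edges already enforce the order.
bool clobberBlocksScheduling(const NodeLike &N, unsigned LivePhysReg) {
  return N.PhysRegClobbers.count(LivePhysReg) &&
         !N.PhysRegUses.count(LivePhysReg);
}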
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122211 91177308-0d34-0410-b5e6-96231b3b80d8
1. Fix the pre-ra scheduler so it doesn't try to push instructions above calls to
"optimize for latency". Call instructions don't have the right latency and
this is more likely to introduce spills.
2. Fix the if-converter cost function. For ARM, it should use instruction latencies,
not # of micro-ops, since multi-latency instructions are completely executed
even when the predicate is false. Also, some instructions will be "slower"
when they are predicated due to the register def becoming an implicit input.
rdar://8598427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@118135 91177308-0d34-0410-b5e6-96231b3b80d8
operand and one of them has a single use that is a live-out copy, favor the
one that is live out. Otherwise it will be difficult to eliminate the copy
if the instruction is a loop induction variable update, e.g.:
BB:
sub r1, r3, #1
str r0, [r2, r3]
mov r3, r1
cmp
bne BB
=>
BB:
str r0, [r2, r3]
sub r3, r3, #1
cmp
bne BB
This fixed the recent 256.bzip2 regression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@117675 91177308-0d34-0410-b5e6-96231b3b80d8
appropriate for targets without detailed instruction itineraries.
The scheduler schedules for increased instruction-level parallelism in
low register pressure situations; it schedules to reduce register pressure
when the register pressure becomes high.
On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.
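A hedged sketch of the mode switch described above; the candidate fields and
the comparison are illustrative only, not the hybrid scheduler's real priority
function:
// Illustrative candidate summary: how scheduling it changes register pressure
// and how much latency/depth it helps cover.
struct Candidate {
  int PressureDelta;   // negative means scheduling this reduces pressure
  unsigned Depth;      // latency/depth the candidate helps to cover
};

// Returns true if A should be picked before B.
bool preferA(const Candidate &A, const Candidate &B, bool HighPressure) {
  if (HighPressure) {
    // Pressure is high: prefer whichever candidate lowers it more.
    if (A.PressureDelta != B.PressureDelta)
      return A.PressureDelta < B.PressureDelta;
  } else {
    // Pressure is low: schedule for ILP, i.e. cover the longer latency first.
    if (A.Depth != B.Depth)
      return A.Depth > B.Depth;
  }
  return false;   // no preference either way
}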
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109300 91177308-0d34-0410-b5e6-96231b3b80d8
it's too late to start backing off aggressive latency scheduling when most
of the registers are in use, so the threshold should be a bit tighter.
- Correctly handle live-outs and extract_subreg, etc.
- Enable register pressure aware scheduling by default for hybrid scheduler.
For ARM, this is almost always a win on # of instructions. It's runtime
neutral for most of the tests. But for some kernels with high register
pressure it can be a huge win, e.g. 464.h264ref reduced the number of spills
by 54 and sped up by 20%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109279 91177308-0d34-0410-b5e6-96231b3b80d8
of getPhysicalRegisterRegClass with it.
If we want to make a copy (or estimate its cost), it is better to use the
smallest class, as more efficient operations might be possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@107140 91177308-0d34-0410-b5e6-96231b3b80d8