The PPC::EXTSW instruction preserves the low 32 bits of its input, just
like some of the x86 instructions. Use it to reduce register pressure
when the low 32 bits have multiple uses.
This requires a small change to PeepholeOptimizer since EXTSW takes a
64-bit input register.
This is related to PR5997.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158743 91177308-0d34-0410-b5e6-96231b3b80d8
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158223 91177308-0d34-0410-b5e6-96231b3b80d8
The pass itself works well, but the something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc. this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158206 91177308-0d34-0410-b5e6-96231b3b80d8
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, invalid preheader DebugLoc is not used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158204 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a full itinerary for IBM's PPC64 A2 embedded core. These
cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153842 91177308-0d34-0410-b5e6-96231b3b80d8
Dynamic linking on PPC64 has had problems since we had to move the top-down
hazard-detection logic post-ra. For dynamic linking to work there needs to be
a nop placed after every call. It turns out that it is really hard to guarantee
that nothing will be placed in between the call (bl) and the nop during post-ra
scheduling. Previous attempts at fixing this by placing logic inside the
hazard detector only partially worked.
This is now fixed in a different way: call+nop codegen-only instructions. As far
as CodeGen is concerned the pair is now a single instruction and cannot be split.
This solution works much better than previous attempts.
The scoreboard hazard detector is also renamed to be more generic, there is currently
no cpu-specific logic in it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153816 91177308-0d34-0410-b5e6-96231b3b80d8
and MCSubtargetInfo.
- Added methods to update subtarget features (used when targets automatically
detect subtarget features or switch modes).
- Teach X86Subtarget to update MCSubtargetInfo features bits since the
MCSubtargetInfo layer can be shared with other modules.
- These fixes .code 16 / .code 32 support since mode switch is updated in
MCSubtargetInfo so MC code emitter can do the right thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134884 91177308-0d34-0410-b5e6-96231b3b80d8
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134021 91177308-0d34-0410-b5e6-96231b3b80d8
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122541 91177308-0d34-0410-b5e6-96231b3b80d8
operands.
Hopefully this fixes the llvm-gcc-powerpc-darwin9 buildbot. It really shouldn't
since missing memoperands should not affect correctness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108540 91177308-0d34-0410-b5e6-96231b3b80d8
The only folding these load/store architectures can do is converting COPY into a
load or store, and the target independent part of foldMemoryOperand already
knows how to do that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@108099 91177308-0d34-0410-b5e6-96231b3b80d8
addresses a longstanding deficiency noted in many FIXMEs scattered
across all the targets.
This effectively moves the problem up one level, replacing eleven
FIXMEs in the targets with eight FIXMEs in CodeGen, plus one path
through FastISel where we actually supply a DebugLoc, fixing Radar
7421831.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@106243 91177308-0d34-0410-b5e6-96231b3b80d8
registers. Currently it is not so marked, which leads to
VCMPEQ instructions that feed into it getting deleted.
If it is so marked, local RA complains about this sequence:
vreg = MCRF CR0
MFCR <kill of whatever preg got assigned to vreg>
All current uses of this instruction are only interested in
one of the 8 CR registers, so redefine MFCR to be a normal
unary instruction with a CR input (which is emitted only as
a comment). That avoids all problems. 7739628.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@104238 91177308-0d34-0410-b5e6-96231b3b80d8
This is possible because F8RC is a subclass of F4RC. We keep FMRSD around so
fextend has a pattern.
Also allow folding of memory operands on FMRSD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@97275 91177308-0d34-0410-b5e6-96231b3b80d8