a lot throughout many programs. In particular, specfp triggers it a bunch for
constant FP nodes when you have code like cond ? 1.0 : -1.0.
If the PPC ISel exposed the loads implicit in pic references to external globals,
we would be able to eliminate a load in cases like this as well:
%X = external global int
%Y = external global int
int* %test4(bool %C) {
%G = select bool %C, int* %X, int* %Y
ret int* %G
}
Note that this breaks things that use SrcValue's (see the fixme), but since nothing
uses them yet, this is ok.
Also, simplify some code to use hasOneUse() on an SDOperand instead of hasNUsesOfValue directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23781 91177308-0d34-0410-b5e6-96231b3b80d8
you could be AND'ing with the result of a shift that shifts out all the
bits you care about, in addition to a constant.
Also, move over an add/sub_parts fold from legalize to the dag combiner,
where it works for things other than constants. Woot!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23720 91177308-0d34-0410-b5e6-96231b3b80d8
out, where after the first CombineTo() call, the node the second CombineTo
wishes to replace may no longer exist.
Fix a very real bug with the truncated load optimization on little endian
targets, which do not need a byte offset added to the load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23704 91177308-0d34-0410-b5e6-96231b3b80d8
like turning:
_foo:
fctiwz f0, f1
stfd f0, -8(r1)
lwz r2, -4(r1)
rlwinm r3, r2, 0, 16, 31
blr
into
_foo:
fctiwz f0,f1
stfd f0,-8(r1)
lhz r3,-2(r1)
blr
Also removed an unncessary constraint from sra -> srl conversion, which
should take care of hte only reason we would ever need to handle sra in
MaskedValueIsZero, AFAIK.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23703 91177308-0d34-0410-b5e6-96231b3b80d8
location, replace them with a new store of the last value. This occurs
in the same neighborhood in 197.parser, speeding it up about 1.5%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23691 91177308-0d34-0410-b5e6-96231b3b80d8
multiple results.
Use this support to implement trivial store->load forwarding, implementing
CodeGen/PowerPC/store-load-fwd.ll. Though this is the most simple case and
can be extended in the future, it is still useful. For example, it speeds
up 197.parser by 6.2% by avoiding an LSU reject in xalloc:
stw r6, lo16(l5_end_of_array)(r2)
addi r2, r5, -4
stwx r5, r4, r2
- lwzx r5, r4, r2
- rlwinm r5, r5, 0, 0, 30
stwx r5, r4, r2
lwz r2, -4(r4)
ori r2, r2, 1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23690 91177308-0d34-0410-b5e6-96231b3b80d8
creating a new vreg and inserting a copy: just use the input vreg directly.
This speeds up the compile (e.g. about 5% on mesa with a debug build of llc)
by not adding a bunch of copies and vregs to be coallesced away. On mesa,
for example, this reduces the number of intervals from 168601 to 129040
going into the coallescer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23671 91177308-0d34-0410-b5e6-96231b3b80d8
previous copy elisions and we discover we need to reload a register, make
sure to use the regclass of the original register for the reload, not the
class of the current register. This avoid using 16-bit loads to reload 32-bit
values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23645 91177308-0d34-0410-b5e6-96231b3b80d8
store r12 -> [ss#2]
R3 = load [ss#1]
use R3
R3 = load [ss#2]
R4 = load [ss#1]
and turn it into this code:
store R12 -> [ss#2]
R3 = load [ss#1]
use R3
R3 = R12
R4 = R3 <- oops!
The problem was that promoting R3 = load[ss#2] to a copy missed the fact that
the instruction invalidated R3 at that point.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23638 91177308-0d34-0410-b5e6-96231b3b80d8
with the dag combiner. This speeds up espresso by 8%, reaching performance
parity with the dag-combiner-disabled llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23636 91177308-0d34-0410-b5e6-96231b3b80d8
dead node elim and dag combiner passes where the root is potentially updated.
This fixes a fixme in the dag combiner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23634 91177308-0d34-0410-b5e6-96231b3b80d8
that testcase still does not pass with the dag combiner. This is because
not all forms of br* are folded yet.
Also, when we combine a node into another one, delete the node immediately
instead of waiting for the node to potentially come up in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23632 91177308-0d34-0410-b5e6-96231b3b80d8
Since calls return more than one value, don't bail if one of their uses
happens to be a node that's not an MVT::Other when following the chain
from CALLSEQ_START to CALLSEQ_END.
Once we've found a CALLSEQ_START, we can just return; there's no need to
tail-recurse further up the graph.
Most importantly, just because something only has one use doesn't mean we
should use it's one use to follow from start to end. This faulty logic
caused us to follow a chain of one-use FP operations back to a much earlier
call, putting a cycle in the graph from a later start to an earlier end.
This is a better fix that reverting to the workaround committed earlier
today.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23620 91177308-0d34-0410-b5e6-96231b3b80d8
Neither of us have yet figured out why this code is necessary, but stuff
breaks if its not there. Still tracking this down...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23617 91177308-0d34-0410-b5e6-96231b3b80d8
large basic blocks because it was purely recursive. This switches it to an
iterative/recursive hybrid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23596 91177308-0d34-0410-b5e6-96231b3b80d8
For instructions that define multiple results, use the right regclass
to define the result, not always the rc of result #0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23580 91177308-0d34-0410-b5e6-96231b3b80d8
2. Added node groups to handle flagged nodes.
3. Started weaning simple scheduling off existing emitter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23566 91177308-0d34-0410-b5e6-96231b3b80d8
Though I have done extensive testing, it is possible that this will break
things in configs I can't test. Please let me know if this causes a problem
and I'll fix it ASAP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23504 91177308-0d34-0410-b5e6-96231b3b80d8
for testing and will require target machine info to do a proper scheduling.
The simple scheduler can be turned on using -sched=simple (defaults
to -sched=none)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23455 91177308-0d34-0410-b5e6-96231b3b80d8
This happens all the time on PPC for bool values, e.g. eliminating a xori
in inverted-bool-compares.ll.
This should be added to the dag combiner as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23403 91177308-0d34-0410-b5e6-96231b3b80d8
when possible, avoiding the load (and avoiding the copy if the value is already
in the right register).
This patch came about when I noticed code like the following being generated:
store R17 -> [SS1]
...blah...
R4 = load [SS1]
This was causing an LSU reject on the G5. This problem was due to the register
allocator folding spill code into a reg-reg copy (producing the load), which
prevented the spiller from being able to rewrite the load into a copy, despite
the fact that the value was already available in a register. In the case
above, we now rip out the R4 load and replace it with a R4 = R17 copy.
This speeds up several programs on X86 (which spills a lot :) ), e.g.
smg2k from 22.39->20.60s, povray from 12.93->12.66s, 168.wupwise from
68.54->53.83s (!), 197.parser from 7.33->6.62s (!), etc. This may have a larger
impact in some cases on the G5 (by avoiding LSU rejects), though it probably
won't trigger as often (less spilling in general).
Targets that implement folding of loads/stores into copies should implement
the isLoadFromStackSlot hook to get this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23388 91177308-0d34-0410-b5e6-96231b3b80d8
select (x < y), 1, 0 -> (x < y) incorrectly: the setcc returns i1 but the
select returned i32. Add the zero extend as needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23301 91177308-0d34-0410-b5e6-96231b3b80d8
are, simplify logic, and cause things to not be nested as deeply. This also
uses MRI->areAliases instead of an explicit loop.
No functionality change, just code cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23296 91177308-0d34-0410-b5e6-96231b3b80d8
only add a reload live range once for the instruction. This is one step
towards fixing a regalloc pessimization that Nate notice, but is later undone
by the spiller (so no code is changed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23293 91177308-0d34-0410-b5e6-96231b3b80d8
This restores all of stanford to being identical with and without the dag
combiner with the add folding turned off in sd.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23258 91177308-0d34-0410-b5e6-96231b3b80d8
we were losing a node, causing an assertion to fail. Now we eagerly delete
discovered CSE's, and provide an optional vector to keep track of these
discovered equivalences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23255 91177308-0d34-0410-b5e6-96231b3b80d8