Generalize r188163 to cope with return types other than MVT::i32, just
as the existing visitMemCmpCall code did. I've split this out into a
subroutine so that it can be used for other upcoming patches.
I also noticed that I'd used the wrong API to record the out chain.
It's a load that uses DAG.getRoot() rather than getRoot(), so the out
chain should go on PendingLoads. I don't have a testcase for that because
we don't do any interesting scheduling on z yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188540 91177308-0d34-0410-b5e6-96231b3b80d8
r188163 used CLC to implement memcmp. Code that compares the result
directly against zero can test the CC value produced by CLC, but code
that needs an integer result must use IPM. The sequence I'd used was:
ipm <reg>
sll <reg>, 2
sra <reg>, 30
but I'd forgotten that this inverts the order, so that CC==1 ("less")
becomes an integer greater than zero, and CC==2 ("greater") becomes
an integer less than zero. This sequence should only be used if the
CLC arguments are reversed to compensate. The problem then is that
the branch condition must also be reversed when testing the CLC
result directly.
Rather than do that, I went for a different sequence that works with
the natural CLC order:
ipm <reg>
srl <reg>, 28
rll <reg>, <reg>, 31
One advantage of this is that it doesn't clobber CC. A disadvantage
is that any sign extension to 64 bits must be done separately,
rather than being folded into the shifts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188538 91177308-0d34-0410-b5e6-96231b3b80d8
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8
For now this is restricted to fixed-length comparisons with a length
in the range [1, 256], as for memcpy() and MVC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188163 91177308-0d34-0410-b5e6-96231b3b80d8
This follows the same lines as the integer code. In the end it seemed
easier to have a second 4-bit mask in TSFlags to specify the compare-like
CC values. That eats one more TSFlags bit than adding a CCHasUnordered
would have done, but it feels more concise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187883 91177308-0d34-0410-b5e6-96231b3b80d8
This patch just uses a peephole test for "add; compare; branch" sequences
within a single block. The IR optimizers already convert loops to
decrement-and-branch-on-nonzero form in some cases, so even this
simplistic test triggers many times during a clang bootstrap and
projects/test-suite run. It looks like there are still cases where we
need to more strongly prefer branches on nonzero though. E.g. I saw a
case where a loop that started out with a check for 0 ended up with a
check for -1. I'll try to look at that sometime.
I ended up adding the Reference class because MachineInstr::readsRegister()
doesn't check for subregisters (by design, as far as I could tell).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187723 91177308-0d34-0410-b5e6-96231b3b80d8
This also fixes a bug in the predication of LR to LOCR: I'd forgotten
that with these in-place instruction builds, the implicit operands need
to be added manually. I think this was latent until now, but is tested
by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by
int-cmp-45.c.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187573 91177308-0d34-0410-b5e6-96231b3b80d8
Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own,
but it exposes more opportunities for CC reuse (the next patch).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187571 91177308-0d34-0410-b5e6-96231b3b80d8
The loop optimizers were assuming that scales > 1 were OK. I think this
is actually a bug in TargetLoweringBase::isLegalAddressingMode(),
since it seems to be trying to reject anything that isn't r+i or r+r,
but it has no default case for scales other than 0, 1 or 2. Implementing
the hook for z means that z can no longer test any change there though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187497 91177308-0d34-0410-b5e6-96231b3b80d8
Extend r187495 to conditional loads. I split this out because the
easiest way seemed to be to force a particular operand order in
SystemZISelDAGToDAG.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187496 91177308-0d34-0410-b5e6-96231b3b80d8
System z branches have a mask to select which of the 4 CC values should
cause the branch to be taken. We can invert a branch by inverting the mask.
However, not all instructions can produce all 4 CC values, so inverting
the branch like this can lead to some oddities. For example, integer
comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater).
If an integer EQ is reversed to NE before instruction selection,
the branch will test for 1 or 2. If instead the branch is reversed
after instruction selection (by inverting the mask), it will test for
1, 2 or 3. Both are correct, but the second isn't really canonical.
This patch therefore keeps track of which CC values are possible
and uses this when inverting a mask.
Although this is mostly cosmestic, it fixes undefined behavior
for the CIJNLH in branch-08.ll. Another fix would have been
to mask out bit 0 when generating the fused compare and branch,
but the point of this patch is that we shouldn't need to do that
in the first place.
The patch also makes it easier to reuse CC results from other instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187495 91177308-0d34-0410-b5e6-96231b3b80d8
r187116 moved compare-and-branch generation from the instruction-selection
pass to the peephole optimizer (via optimizeCompare). It turns out that even
this is a bit too early. Fused compare-and-branch instructions don't
interact well with predication, where a CC result is needed. They also
make it harder to reuse the CC side-effects of earlier instructions
(not yet implemented, but the subject of a later patch).
Another problem was that the AnalyzeBranch family of routines weren't
handling compares and branches, so we weren't able to reverse the fused
form in cases where we would reverse a separate branch. This could have
been fixed by extending AnalyzeBranch, but given the other problems,
I've instead moved the fusing to the long-branch pass, which is also
responsible for the opposite transformation: splitting out-of-range
compares and branches into separate compares and long branches.
I've added a test for the AnalyzeBranch problem. A test for the
predication problem is included in the next patch, which fixes a bug
in the choice of CC mask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187494 91177308-0d34-0410-b5e6-96231b3b80d8
r186399 aggressively used the RISBG instruction for immediate ANDs,
both because it can handle some values that AND IMMEDIATE can't,
and because it allows the destination register to be different from
the source. I realized later while implementing the distinct-ops
support that it would be better to leave the choice up to
convertToThreeAddress() instead. The AND IMMEDIATE form is shorter
and is less likely to be cracked.
This is a problem for 32-bit ANDs because we assume that all 32-bit
operations will leave the high word untouched, whereas RISBG used in
this way will either clear the high word or copy it from the source
register. The patch uses the z196 instruction RISBLG for this instead.
This means that z10 will be restricted to NILL, NILH and NILF for
32-bit ANDs, but I think that should be OK for now. Although we're
using z10 as the base architecture, the optimization work is going
to be focused more on z196 and zEC12.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187492 91177308-0d34-0410-b5e6-96231b3b80d8
Before the patch we took advantage of the fact that the compare and
branch are glued together in the selection DAG and fused them together
(where possible) while emitting them. This seemed to work well in practice.
However, fusing the compare so early makes it harder to remove redundant
compares in cases where CC already has a suitable value. This patch
therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of
functions instead.
No behavioral change intended, but it paves the way for a later patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187116 91177308-0d34-0410-b5e6-96231b3b80d8
As with the stores, these instructions can trap when the condition is false,
so they are only used for things like (cond ? x : *ptr).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187112 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions are allowed to trap even if the condition is false,
so for now they are only used for "*ptr = (cond ? x : *ptr)"-style
constructs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187111 91177308-0d34-0410-b5e6-96231b3b80d8
The insn definitions themselves crept into r186689, sorry.
This should be the last of the distinct-ops instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186690 91177308-0d34-0410-b5e6-96231b3b80d8
Follows the same lines as r186686, but much more limited, since we only
use ADD LOGICAL for multi-i64 additions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186689 91177308-0d34-0410-b5e6-96231b3b80d8
The atomic tests assume the two-operand forms, so I've restricted them to z10.
Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why
using convertToThreeAddress() is better than exposing the three-operand forms
first and then converting back to two operands where possible (which is what
I'd originally tried). Using the three-operand form first stops us from
taking advantage of NG, OG and XG for spills.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186683 91177308-0d34-0410-b5e6-96231b3b80d8
The original code only folded SRA into ROTATE ... SELECTED BITS
if there was no outer shift. This patch splits out that check
and generalises it slightly. The extra cases aren't really that
interesting, but this is paving the way for RNSBG support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186571 91177308-0d34-0410-b5e6-96231b3b80d8
Another patch in the series to make more use of R.SBG. This one extends
r186072 and r186073 to handle cases where the AND is inside the shift.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186399 91177308-0d34-0410-b5e6-96231b3b80d8
This update was done with the following bash script:
find test/CodeGen -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
done
sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
mv $TEMP $NAME
fi
done
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186280 91177308-0d34-0410-b5e6-96231b3b80d8
Normal (sext (setcc ...)) sequences are optimised into
(select_cc ..., -1, 0) by DAGCombiner::visitSIGN_EXTEND.
However, this is deliberately not done for vectors, and after
vector type legalization we have (sext_inreg (setcc ...)) instead.
I wondered about trying to extend DAGCombiner to handle this case too,
but it seemed to be a loss on some other targets I tried, even those for
which SETCC isn't "legal" and SELECT_CC is.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186149 91177308-0d34-0410-b5e6-96231b3b80d8
If the source of these instructions is spilled we should load the destination.
If the destination is spilled we should store the source.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186147 91177308-0d34-0410-b5e6-96231b3b80d8
RISBG can handle some ANDs for which no AND IMMEDIATE exists.
It also acts as a three-operand AND for some cases where an
AND IMMEDIATE could be used instead.
It might be worth adding a pass to replace RISBG with AND IMMEDIATE
in cases where the register operands end up being the same and where
AND IMMEDIATE is smaller.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186072 91177308-0d34-0410-b5e6-96231b3b80d8
Look for patterns of the form (store (load ...), ...) in which the two
locations are known not to partially overlap. (Identical locations are OK.)
These sequences are better implemented by MVC unless either the load or
the store could use RELATIVE LONG instructions.
The testcase showed that we weren't using LHRL and LGHRL for extload16,
only sextloadi16. The patch fixes that too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185919 91177308-0d34-0410-b5e6-96231b3b80d8
Use "STC;MVC" for memsets that are too big for two STCs or MV...Is yet
small enough for a single MVC. As with memcpy, I'm leaving longer cases
till later.
The number of tests might seem excessive, but f33 & f34 from memset-04.ll
failed the first cut because I'd not added the "?:" on the calculation
of Size1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185918 91177308-0d34-0410-b5e6-96231b3b80d8
Use MVC for memcpy in cases where a single MVC is enough. Using MVC is
a win for longer copies too, but I'll leave that for later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185802 91177308-0d34-0410-b5e6-96231b3b80d8
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring. Extend it to cope with single instructions
that copy from one frame index to another.
The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185705 91177308-0d34-0410-b5e6-96231b3b80d8
The stack coloring pass renumbered frame indexes with a loop of the form:
for each frame index FI
for each instruction I that uses FI
for each use of FI in I
rename FI to FI'
This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2. The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.
In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.
This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn. There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.
MachineMemOperands are more tricky because they can be shared between
instructions. The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs. We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185703 91177308-0d34-0410-b5e6-96231b3b80d8
...now that the problem that prompted the restriction has been fixed.
The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185701 91177308-0d34-0410-b5e6-96231b3b80d8
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.
The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout. I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185698 91177308-0d34-0410-b5e6-96231b3b80d8
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>. Use it to cut down on the number of spill loads.
Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.
This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch. Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185529 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).
The "32" in "SDIVREM32" just refers to the second operand. The first operand
of all *DIVREM*s is a GR128.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185435 91177308-0d34-0410-b5e6-96231b3b80d8
Try to use MVC when spilling the destination of a simple load or the source
of a simple store. As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet. spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.
I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...
Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}. It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185434 91177308-0d34-0410-b5e6-96231b3b80d8