If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed.
This problem is found in spec2006-197.parser.
For example,
stur w10, [x11, #-4]
stur w10, [x11, #-4]
Then one of the two stur instructions can be removed.
Patch by David Xu!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218569 91177308-0d34-0410-b5e6-96231b3b80d8
and in the target shuffle combining when trying to widen vector
elements.
Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.
There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.
I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.
This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
that managed to elude all of my fuzz testing historically. =/
Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8
This has weird operand requirements so it's worthwhile
to have very strict checks for its operands.
Add different combinations of SGPR operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218535 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.
This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.
When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8
AVX support.
New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[
These were all detected by fuzz-testing. (I <3 fuzz testing.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.
Things to note:
a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.
b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.
c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable. This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.
Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup in average on x86-64 darwin.
<rdar://problem/18021659>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218472 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).
This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
Details of the optimization are discussed in D5091.
Test Plan: make check-all + a new test
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5422
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions do not indicate they are extendable or the
number of bits in the extendable operand. Rename to match
architected names. Add a testcase for the intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.
We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5286
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218451 91177308-0d34-0410-b5e6-96231b3b80d8
v4f64 and v8f32 shuffles when they are lane-crossing. We have fully
general lane-crossing permutation functions in AVX2 that make this easy.
Part of this also changes exactly when and how these vectors are split
up when we don't have AVX2. This isn't always a win but it usually is
a win, so on the balance I think its better. The primary regressions are
all things that just need to be fixed anyways such as modeling when
a blend can be completely accomplished via VINSERTF128, etc.
Also, this highlights one of the few remaining big features: we do
a really poor job of inserting elements into AVX registers efficiently.
This completes almost all of the big tricks I have in mind for AVX2. The
only things left that I plan to add:
1) element insertion smarts
2) palignr and other fairly specialized lowerings when they happen to
apply
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218449 91177308-0d34-0410-b5e6-96231b3b80d8
256-bit vectors with lane-crossing.
Rather than immediately decomposing to 128-bit vectors, try flipping the
256-bit vector lanes, shuffling them and blending them together. This
reduces our worst case shuffle by a pretty significant margin across the
board.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218446 91177308-0d34-0410-b5e6-96231b3b80d8
lowering where it only used the mask of the low 128-bit lane rather than
the entire mask.
This allows the new lowering to correctly match the unpack patterns for
v8i32 vectors.
For reference, the reason that we check for the the entire mask rather
than checking the repeated mask is because the repeated masks don't
abide by all of the invariants of normal masks. As a consequence, it is
safer to use the full mask with functions like the generic equivalence
test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218442 91177308-0d34-0410-b5e6-96231b3b80d8
lowering.
This completes the basic AVX2 feature support, but there are still some
improvements I'd like to do to really get the last mile of performance
here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218440 91177308-0d34-0410-b5e6-96231b3b80d8
I made a mistake in the previous commit and produced the wrong pattern.
Fix that. Also make one more shuffle pattern byte-based rather than
word-based, and add two more blend patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218439 91177308-0d34-0410-b5e6-96231b3b80d8
shuffles rather than word shuffles.
As you might guess, these were built starting from the word shuffle test
cases and I failed to properly port a bunch of them and left them as
widened word shuffle test cases. We still have a couple of tests that
check our ability to widen shuffles, but now we will test the actual
byte shuffle quite a bit better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218438 91177308-0d34-0410-b5e6-96231b3b80d8
missing test cases for it.
Unsurprisingly, without test cases, there were bugs here. Surprisingly,
this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV.
It isn't wired to anything. Oops. I'll fix than next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218434 91177308-0d34-0410-b5e6-96231b3b80d8
lowering.
This also implements the fancy blend lowering for v16i16 using AVX2 and
teaches the X86 backend to print shuffle masks for 256-bit PSHUFB
and PBLENDW instructions. It also makes the mask decoding correct for
PBLENDW instructions. The yaks, they are legion.
Tests are updated accordingly. There are some missing tests for the
VBLENDVB lowering, but I'll add those in a follow-up as this commit has
accumulated enough cruft already.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218430 91177308-0d34-0410-b5e6-96231b3b80d8
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.
This is effectively a (heavily bug-fixed) rewrite of r208992.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8
pool data being loaded into a vector register.
The comments take the form of:
# ymm0 = [a,b,c,d,...]
# xmm1 = <x,y,z...>
The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.
My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.
Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218377 91177308-0d34-0410-b5e6-96231b3b80d8
the native AVX2 instructions.
Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.
Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
This testcase was not testing what it meant: because there were only two checks for
dmb {{ish}} in the second function, it could have missed a bug where one of the three
required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added
CHECK-LABELs to make it a bit less brittle.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218341 91177308-0d34-0410-b5e6-96231b3b80d8
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.
Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.
Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.
Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).
Test Plan: make check-all (lots of tests for this functionality already exist)
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5404
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.
I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.
Test Plan: new test + make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
VPBLENDD where appropriate even on 128-bit vectors.
According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.
Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.
A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.
The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
trick that I missed.
VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.
However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.
There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
We manage to generate all of the matching instructions (and a lot more) via
the reciprocal optimization function - even if we completely remove the square
root optimization. With CHECK_NEXT, we assure that we're executing the
expected square root optimization paths and not generating extra insts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218284 91177308-0d34-0410-b5e6-96231b3b80d8