into unblended shuffles and a blend.
This is the consistent fallback for the lowering paths that have fast
blend operations available, and its getting quite repetitive.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218399 91177308-0d34-0410-b5e6-96231b3b80d8
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.
This is effectively a (heavily bug-fixed) rewrite of r208992.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218382 91177308-0d34-0410-b5e6-96231b3b80d8
pool data being loaded into a vector register.
The comments take the form of:
# ymm0 = [a,b,c,d,...]
# xmm1 = <x,y,z...>
The []s are used for generic sequential data and the <>s are used for
specifically ConstantVector loads. Undef elements are printed as the
letter 'u', integers in decimal, and floating point values as floating
point values. Suggestions on improving the formatting or other aspects
of the display are very welcome.
My primary use case for this is to be able to FileCheck test masks
passed to vector shuffle instructions in-register. It isn't fantastic
for that (no decoding special zeroing semantics or other tricks), but it
at least puts the mask onto an instruction line that could reasonably be
checked. I've updated many of the new vector shuffle lowering tests to
leverage this in their test cases so that we're actually checking the
shuffle masks remain as expected.
Before implementing this, I tried a *bunch* of different approaches.
I looked into teaching the MCInstLower code to scan up the basic block
and find a definition of a register used in a shuffle instruction and
then decode that, but this seems incredibly brittle and complex.
I talked to Hal a lot about the "right" way to do this: attach the raw
shuffle mask to the instruction itself in some form of unencoded
operands, and then use that to emit the comments. I still think that's
the optimal solution here, but it proved to be beyond what I'm up for
here. In particular, it seems likely best done by completing the
plumbing of metadata through these layers and attaching the shuffle mask
in metadata which could have fully automatic dropping when encoding an
actual instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218377 91177308-0d34-0410-b5e6-96231b3b80d8
attempt didn't work out so well. It looks like it will be much better
for introducing extra logic to find a shuffle mask if the finding logic
is totally separate. This also makes it easy to sink the opcode logic
completely out of the routine so we don't re-dispatch across it.
Still no functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218363 91177308-0d34-0410-b5e6-96231b3b80d8
asm. This can be somewhat expensive and there is no reason to do it
outside of tests or debugging sessions. I'm also likely to make it
significantly more expensive to support more styles of shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218362 91177308-0d34-0410-b5e6-96231b3b80d8
from the MachineInstr into the caller which is already doing a switch
over the instruction.
This will make it more clear how to compute different operands to feed
the comment selection for example.
Also, in a drive-by-fix, don't append an empty comment string (which is
a no-op ultimately).
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218361 91177308-0d34-0410-b5e6-96231b3b80d8
vector shuffles.
This is just the beginning by hoisting it into its own function and
making use of early exit to dramatically simplify the flow of the
function. I'm going to be incrementally refactoring this until it is
a bit less magical how this applies to other instructions, and I can
teach it how to dig a shuffle mask out of a register. Then I plan to
hook it up to VPERMD so we get our mask comments for it.
No functionality changed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218357 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug that is uncovered by a future commit and will
be tested by the test/CodeGen/R600/sgpr-control-flow.ll test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218352 91177308-0d34-0410-b5e6-96231b3b80d8
The previous implementation was extending the live range of SGPRs
by modifying the live intervals directly. This was causing a lot
of machine verification errors when the machine scheduler was enabled.
The new implementation adds pseudo instructions with implicit uses to
extend the live ranges of SGPRs, which works much better.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218351 91177308-0d34-0410-b5e6-96231b3b80d8
These registers can be allocated and used like other 32-bit registers,
but it seems like a likely source for bugs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218350 91177308-0d34-0410-b5e6-96231b3b80d8
Correctly handle special registers: EXEC, EXEC_LO, EXEC_HI, VCC_LO,
VCC_HI, and M0. The previous implementation would assertion fail
when passed these registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218349 91177308-0d34-0410-b5e6-96231b3b80d8
VGPRs are spilled to LDS. This still needs more testing, but
we need to at least enable it at -O0, because the fast register
allocator spills all registers that are live at the end of blocks
and without this some future commits will break the
flat-address-space.ll test.
v2: Only calculate thread id once
v3: Move insertion of spill instructions to
SIRegisterInfo::eliminateFrameIndex()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218348 91177308-0d34-0410-b5e6-96231b3b80d8
the native AVX2 instructions.
Note that the test case is really frustrating here because VPERMD
requires the mask to be in the register input and we don't produce
a comment looking through that to the constant pool. I'm going to
attempt to improve this in a subsequent commit, but not sure if I will
succeed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.
Fortunately, this almost always DTRT and disabled detecting repeated
patterns. But not always. =/ This patch introduces a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.
Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
e.g., add w1, w2, w3, lsl #(2 - 1)
This sort of thing comes up in pre-processed assembly playing macro games.
Still validate that it's an assembly time constant. The early exit error check
was just a bit overzealous and disallowed a left paren.
rdar://18430542
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218336 91177308-0d34-0410-b5e6-96231b3b80d8
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.
Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
There are new register classes VCSrc_* which represent operands that
can take an SGPR, VGPR or inline constant. The VSrc_* class is now used
to represent operands that can take an SGPR, VGPR, or a 32-bit
immediate.
This allows us to have more accurate checks for legality of
immediates, since before we had no way to distinguish between operands
that supported any 32-bit immediate and operands which could only
support inline constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218334 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.
Only one functionnal change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).
Test Plan: make check-all (lots of tests for this functionality already exist)
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5404
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.
I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.
Test Plan: new test + make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.
In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.
If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.
This should not cause any functionnal change so the existing tests are used
and not modified.
Test Plan: make check-all, benefits from existing tests of atomics on ARM
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5179
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218329 91177308-0d34-0410-b5e6-96231b3b80d8
This was overlooked in r218320, which removed the relocation headers for other
targets. Thanks to Ulrich Weigand for catching it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218327 91177308-0d34-0410-b5e6-96231b3b80d8
VPBLENDD where appropriate even on 128-bit vectors.
According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.
Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
Fix a null pointer dereference when trying to swap the endianness of
fixups in the .eh_frame section in the AArch64 backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218311 91177308-0d34-0410-b5e6-96231b3b80d8
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.
A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.
The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
trick that I missed.
VPERMILPS has a non-immediate memory operand mode that allows it to do
asymetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.
However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional differenc from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.
There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218282 91177308-0d34-0410-b5e6-96231b3b80d8
Shift-left immediate with sign-/zero-extensions also works for boolean values.
Update the assert and the test cases to reflect that fact.
This should fix a bug found by Chad.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218275 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes a couple of issues. One is ensuring that AOK_Label rewrite
rules have a lower priority than AOK_Skip rules, as AOK_Skip needs to
be able to skip the brackets properly. The other part of the fix ensures
that we don't overwrite Identifier when looking up the identifier, and
that we use the locally available information to generate the AOK_Label
rewrite in ParseIntelIdentifier. Doing that in CreateMemForInlineAsm
would be problematic since the Start location there may point to the
beginning of a bracket expression, and not necessarily the beginning of
an identifier.
This also means that we don't need to carry around the InternlName field,
which helps simplify the code.
Test Plan: This will be tested on the clang side.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5445
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218270 91177308-0d34-0410-b5e6-96231b3b80d8
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels,
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.
The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.
Differential Revision: http://reviews.llvm.org/D5347
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r218254.
The global_atomics.ll test fails with asserts disabled. For some reason,
the compiler fails to produce the atomic no return variants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218257 91177308-0d34-0410-b5e6-96231b3b80d8
BypassSlowDiv is used by codegen prepare to insert a run-time
check to see if the operands to a 64-bit division are really 32-bit
values and if they are it will do 32-bit division instead.
This is not useful for R600, which has predicated control flow since
both the 32-bit and 64-bit paths will be executed in most cases. It
also increases code size which can lead to more instruction cache
misses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218252 91177308-0d34-0410-b5e6-96231b3b80d8
ISD::MUL and ISD:UMULO are the same except that UMULO sets an overflow
bit. Since we aren't using the overflow bit, we should use ISD::MUL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218251 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.
Test Plan: tests updated with x32 target
Reviewers: nadav, rafael, dschuff
Subscribers: llvm-commits, zinovy.nis
Differential Revision: http://reviews.llvm.org/D5245
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8