Currently, if you use a MultiArg<> option, then printing out the help/usage
message will cause an assert. This fixes getOptionHelpName() to work with
MultiArg options.
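A minimal, self-contained sketch of the shape of the fix (toy types, not the
actual llvm::opt classes): the help name must repeat the value placeholder
once per expected argument, instead of asserting on any option that consumes
more than one value.

  #include <string>

  struct Option {
    std::string Name;     // e.g. "-foo"
    unsigned NumArgs;     // > 1 for a MultiArg<> option
    std::string MetaVar;  // e.g. "<value>"
  };

  std::string getOptionHelpName(const Option &O) {
    std::string Help = O.Name;
    for (unsigned I = 0; I != O.NumArgs; ++I)
      Help += " " + O.MetaVar;  // one placeholder per consumed value
    return Help;
  }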
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215770 91177308-0d34-0410-b5e6-96231b3b80d8
We were setting the comdat when functions were copied in the initial pass, but
not when they were linked lazily, i.e., only once we found out that they were
needed.
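A minimal sketch of the propagation using LLVM's C++ API (Src, Dst, and DstM
are assumed names for the source function, destination function, and
destination module; the real linker logic is more involved):

  if (const Comdat *SC = Src->getComdat()) {
    Comdat *DC = DstM.getOrInsertComdat(SC->getName());
    DC->setSelectionKind(SC->getSelectionKind());
    Dst->setComdat(DC);  // needed on the lazy path as well
  }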
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215765 91177308-0d34-0410-b5e6-96231b3b80d8
The floating-point value positive zero (+0.0) is a valid immediate value
according to isFPImmLegal. As a result, AArch64 FastISel went ahead and
used the immediate version of fmov to materialize the constant.
The problem is that the immediate version of fmov cannot encode an immediate
for positive zero. Instead, an fmov from the zero register was supposed to be
used in this case.
This fix adds handling for this special case and uses fmov from the zero
register to materialize a positive zero (negative zeroes go to the constant
pool).
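A self-contained illustration of the property the fix relies on (not the
FastISel code itself): +0.0 is the all-zero bit pattern, so it can be produced
by a copy from the zero register, while -0.0 has the sign bit set and cannot
be encoded as an fmov immediate.

  #include <cstdint>
  #include <cstring>

  static bool isPositiveZero(double V) {
    uint64_t Bits;
    std::memcpy(&Bits, &V, sizeof Bits);
    return Bits == 0;  // matches +0.0 only; -0.0 is 0x8000000000000000
  }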
There is no test case for this, because this code is currently dead. It will be
enabled in a future commit and I will add a test case in a separate commit
after that.
This fixes <rdar://problem/18027157>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215753 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This reapplies r215582 without any modifications. The refactoring wasn't
responsible for the buildbot failures.
Original commit message:
Cleanup and prepare constant materialization code for future commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215752 91177308-0d34-0410-b5e6-96231b3b80d8
MSVC gives this awesome diagnostic:
..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument
..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1'
..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl'
Using an anonymous namespace makes the problem go away.
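A self-contained reduction of the workaround (hypothetical names): MSVC
rejects a file-static function, which has internal linkage, as a non-type
template argument, but accepts the same function once it lives in an
anonymous namespace.

  template <bool (&Func)(int)> struct Caller {
    bool operator()(int X) const { return Func(X); }
  };

  namespace {
  bool isPositiveImpl(int X) { return X > 0; }  // was 'static' before
  }

  Caller<isPositiveImpl> IsPositive;  // no longer triggers C2971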
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215744 91177308-0d34-0410-b5e6-96231b3b80d8
Ordinarily (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
is only done if the add has one use. If the resulting constant
add can be folded into an addressing mode, force this to happen
for the pointer operand.
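For example, with c1 = 4 and c2 = 2:
  (shl (add x, 4), 2) -> (add (shl x, 2), 16)
and the 16 can then be folded into the load/store's immediate offset instead
of needing a separate add.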
This ends up happening a lot because of how LDS objects are allocated.
Since the globals are allocated next to each other, accessing the first
element of the second object is directly indexed by a shifted pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215739 91177308-0d34-0410-b5e6-96231b3b80d8
As Jim pointed out, this assert isn't really needed to test for correctness,
because the code right afterwards does the same check and falls back to
SelectionDAG, as intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215735 91177308-0d34-0410-b5e6-96231b3b80d8
The default assumes that a 16-bit signed offset is used.
LDS instructions use a 16-bit unsigned offset, so it wasn't
being used in some cases where it was assumed a negative offset
could be used.
More should be done here, but first isLegalAddressingMode needs
to gain an addressing mode argument. For now, copy most of the rest
of the default implementation with the immediate offset change.
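A minimal sketch of the changed check (hypothetical helper name, not the
actual isLegalAddressingMode code): a DS/LDS immediate offset is a 16-bit
unsigned field, so negative offsets are never encodable.

  #include <cstdint>

  static bool isLegalLDSImmOffset(int64_t Offset) {
    return Offset >= 0 && Offset < (1 << 16);  // unsigned 16-bit range
  }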
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215732 91177308-0d34-0410-b5e6-96231b3b80d8
In a previous iteration of the pass, we would try to compensate for
writeback by updating later instructions and/or inserting a SUBS to
reset the base register if necessary.
Since such a SUBS sets the condition flags, it's not generally safe to do
this. For now, only merge LDR/STRs if there is no writeback to the base
register (an LDM that loads into the base register counts as writeback) or
the base register is
killed by one of the merged instructions. These cases are clear wins
both in terms of instruction count and performance.
Also add three new test cases, and update the existing ones accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215729 91177308-0d34-0410-b5e6-96231b3b80d8
This adds some code back that was deleted in r92053. The location of the
last merged memory operation needs to be kept up to date, since MemOps
may be in a different order than in the original instruction stream to
allow merging (since registers need to be in ascending order). Also
simplify the logic that determines BaseKill by using the equivalent
findRegisterUseOperandIdx call instead.
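Roughly, the simplification amounts to this (assuming the MachineInstr API in
this tree, with MI and Base as the instruction and its base register):

  // -1 means MI has no killing use of Base.
  bool BaseKill = MI->findRegisterUseOperandIdx(Base, /*isKill=*/true) != -1;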
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215728 91177308-0d34-0410-b5e6-96231b3b80d8
We actually need to return the register into which we materialized the constant
and not just "true" for success. This code is currently partially dead, which
is why it hasn't triggered any failures yet. Once I change the order of the
constant materialization, this code will be fully exercised.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215727 91177308-0d34-0410-b5e6-96231b3b80d8
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.
Patch by Björn Steinbrink!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215723 91177308-0d34-0410-b5e6-96231b3b80d8
I had deferred adding this test case until I could get it down to a
reasonable size. That's done now.
Thanks,
Bill
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215711 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches the new shuffle lowering to use blend instructions, with an
implementation for v4 shuffles.
This allows us to handle non-half-crossing shuffles directly for v4
shuffles, both integer and floating point. This currently misses places
where we could perform the blend via UNPCK instructions, but otherwise
generates equally good or better code than the existing vector shuffle
lowering for the test cases included. There are a few cases that are
entertainingly better. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215702 91177308-0d34-0410-b5e6-96231b3b80d8
This decodes BLENDPS, BLENDPD, and PBLENDW instructions into pretty shuffle
comments.
These will be used in my next commit as part of test cases for AVX
shuffles which can directly use blend in more places.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215701 91177308-0d34-0410-b5e6-96231b3b80d8
These are system-only instructions for CPUs with virtualization
extensions, allowing a hypervisor easy access to all of the various
different AArch32 registers.
rdar://problem/17861345
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215700 91177308-0d34-0410-b5e6-96231b3b80d8
This reworks how the elements of a shuffle mask are handled and simplifies
how it works. No functionality changed now that the bug that was here has
been fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215696 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug in the target-specific shuffle DAG combines: we were
recognizing the paired shuffles backwards. This code needs to be
replaced anyway, as we have the same functionality elsewhere, but I'll
do the refactoring in a follow-up; this is the minimal fix to the
behavior.
In addition to fixing miscompiles with the new vector shuffle lowering,
it also causes the canonicalization to kick in much better, selecting
the smaller encoding variants in lots of places in the new AVX path.
This still isn't quite ideal as we don't need both the shufpd and the
punpck instructions, but that'll get fixed in a follow-up patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215690 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes broken logic for merging shuffle masks in the face of
SM_SentinelZero mask operands.
While these are '-1', they don't mean 'undef' the way '-1' does in the
pre-legalized shuffle masks. Instead, they mean that the shuffle
operation is forcibly zeroing that lane. Reflect this and explicitly
handle it in a bunch of places. In one place the effect is equivalent
but much clearer. In the rest it was really weirdly broken.
Also, rewrite the entire merging to be a more direct operation,
with a single loop that just does the math to map the indices through the
various masks.
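Schematically, the merge loop now looks like this (simplified sketch with
Mask, RootMask, and Result as assumed names; the real code also handles undef
lanes): a zeroing lane stays a zeroing lane instead of being remapped like an
index.

  for (int i = 0, e = Mask.size(); i != e; ++i) {
    if (Mask[i] == SM_SentinelZero) {
      Result[i] = SM_SentinelZero;  // forced zero, not undef
      continue;
    }
    Result[i] = RootMask[Mask[i]];  // map the index through the root mask
  }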
Also add a bunch of asserts to try to make it extremely clear what the
different masks can possibly look like.
Finally, add some comments to clarify that we're merging shuffle masks
*up* here rather than *down* as we do everywhere else, and thus the
logic is quite confusing.
Thanks to several different people for sending test cases, and to
Robert Khasanov for an initial attempt at a fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215687 91177308-0d34-0410-b5e6-96231b3b80d8
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2). However, this pattern doesn't explicitly record that
it modifies that register. This patch adds the missing dependency.
It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw it recently, in a single scenario
when building a self-hosted clang. It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place. LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.
Therefore we don't usually see a problem. However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address. This is the
code sequence:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
Note that the back-to-back stack adjustments are the same size! The
back end is smart enough to recognize this and optimize them away:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.
With the accompanying patch, %X2 is represented as an implicit def:
BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
LDinto_toc 24, %X1, %X2<imp-def,dead>
ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
%vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
%vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39
So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.
I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport. I'll work on whittling down a
test case.
Have we missed the boat for 3.5 at this point?
Thanks,
Bill
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215685 91177308-0d34-0410-b5e6-96231b3b80d8
FastEmit_i won't always succeed in materializing an i32 constant; it can
simply fail. That failure would trigger a fall-back to SelectionDAG, which is
really not necessary. This fix first falls back to a constant pool load to
materialize the constant before giving up for good.
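A sketch of the new flow (materializeViaConstantPool is a hypothetical helper
name; FastEmit_i is the existing FastISel hook, and CI the constant being
materialized):

  unsigned Reg = FastEmit_i(VT, VT, ISD::Constant, CI->getZExtValue());
  if (Reg == 0)
    Reg = materializeViaConstantPool(CI);  // second chance before giving up
  return Reg;  // 0 means fall back to SelectionDAG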
This fixes <rdar://problem/18022633>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215682 91177308-0d34-0410-b5e6-96231b3b80d8
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only to those that might
access memory, because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.
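In rough outline, using LLVM's C++ API (CS is the inlined call site;
InlinedInsts is an assumed name for the cloned instructions):

  if (MDNode *M = CS.getInstruction()->getMetadata(LLVMContext::MD_noalias))
    for (Instruction &I : InlinedInsts)
      if (I.mayReadOrWriteMemory())
        I.setMetadata(LLVMContext::MD_noalias,
                      MDNode::concatenate(
                          I.getMetadata(LLVMContext::MD_noalias), M));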
This should complete the enhancements requested in PR20500.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215676 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. This will be used by the new FMA intrinsic lowering
code.
We can probably add NO_EXC here as well; I am just not too familiar with this
part of AVX512 yet. We can add that later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215662 91177308-0d34-0410-b5e6-96231b3b80d8