has settled without incident, removing the x86-specific and overly
strict 'isVectorSplat' routine in favor of generic and more powerful
splat detection.
The primary motivation and result of this is that the x86 backend can
now see through splats which contain undef elements. This is essential
if we are using a widening form of legalization and I've updated a test
case to also run in that mode as before this change the generated code
for the test case was completely scalarized.
This version of the patch much more carefully handles the undef lanes.
- We aren't overly conservative about them in the shift lowering
(where we will never use the splat itself).
- One place where the splat would have been re-used by the existing code
now explicitly constructs a new constant splat that will be safe.
- The broadcast lowering is much more reasonable with undefs by doing
a correct check of whether the splat is the only user of a loaded
value, checking that the splat actually crosses multiple lanes before
using a broadcast, and handling broadcasts of non-constant splats.
As a consequence of the last bullet, the weird usage of vpshufd instead
of vbroadcast is gone, and we actually can lower an AVX splat with
vbroadcastss where before we emitted a really strange pattern of
a vector load and a manual splat across the vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This test ensures that we can correctly specify a full Windows path to the clang ASAN runtime libraries. This is in preparation to fix PR20246.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212580 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow the "-s" flag to implemented in the future as it
is in darwin’s nm(1) to list symbols only in the specified section.
Given a LGTM by Shankar Easwaran who originally implemented
the support for lvm-nm’s -print-armap and archive map symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212576 91177308-0d34-0410-b5e6-96231b3b80d8
Loading will generally extend to an f32 or an 64, so make sure
to match those patterns directly to load into the FPR16 register
class directly rather than going through the integer GPRs.
This also eliminates an extra step in the convert-to-f64 path
which was first converting to f32 and then to f64 from there.
rdar://17594379
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212573 91177308-0d34-0410-b5e6-96231b3b80d8
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.
Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.
This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.
Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212572 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.
A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212550 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches how to fold a shuffle according to rule:
shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2)
We do this only if the resulting mask M2 is legal; this is to avoid introducing
illegal shuffles that are potentially expanded into a sub-optimal sequence
of target specific dag nodes.
This patch has the advantage of being target independent, since it works on ISD
nodes. Therefore, all targets (not only x86) can take advantage of this rule.
The idea behind this patch is that most shuffle pairs can be safely combined
before we run the legalizer on vector operations. This allows us to
combine/simplify dag nodes earlier in the process and not only immediately
before instruction selection stage.
That said. This patch is not meant to replace any existing target specific
combine rules; backends might still introduce new shuffles during legalization
stage. Also, this rule is very simple and avoids to aggressively optimize
shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212539 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Follow on to r212519 to improve the encapsulation and limit the scope of the enums.
Also merged two very similar parser functions, fixed a bug where ASE's
were not being reported, and marked CPR1's as being 128-bit when MSA is
enabled.
Differential Revision: http://reviews.llvm.org/D4384
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212522 91177308-0d34-0410-b5e6-96231b3b80d8
aggressively from the x86 shuffle lowering to the generic SDAG vector
shuffle formation code.
This code already tried to fold away shuffles of splats! It just had
lots of bugs and couldn't handle the case my new x86 shuffle lowering
needed.
First, it failed to correctly compute whether N2 was undef because it
pre-computed this, then did transformations which could *make* N2 undef,
then failed to ever re-consider the precomputed state.
Second, it didn't look through bitcasts at all, even in the safe cases
where they are just element-type bitcasts with no change to the number
of elements.
Third, it didn't handle all-zero bit casts nicely the way my code in the
x86 side of things did, which is essential to getting good zext-shuffle
lowerings.
But all of these are generic. I just ported the code down to this layer
and fixed the surrounding bugs. Tests exercising this in the x86 backend
still pass and some silly code in widen_cast-6.ll gets better. I updated
that test to be a bit more precise but it's still pretty unclear what
the value of the test is in this day and age.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212517 91177308-0d34-0410-b5e6-96231b3b80d8
As destination k0 is allowed but not as predicate/writemask.
I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212504 91177308-0d34-0410-b5e6-96231b3b80d8
When combining a sequence of two PSHUFD dag nodes into a single PSHUFD,
make sure that we assign the correct type to the resulting PSHUFD.
X86ISD::PSHUFD dag nodes can be either MVT::v4i32 or MVT::v4f32.
Before this change, an assertion failure was triggered in method
'DAGCombinerInfo::CombineTo' when trying to combine the shuffles from the test
below into a single PSHUFD.
define <4 x float> @test1(<4 x float> %V) {
%1 = shufflevector <4 x float> %V, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
%2 = shufflevector <4 x float> %1, <4 x float> undef, <4 x i32> <i32 3, i32 0, i32 2, i32 1>
ret <4 x float> %2
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212498 91177308-0d34-0410-b5e6-96231b3b80d8
Add custom lowering code for signed multiply instruction selection, because the
default FastISel instruction selection for ISD::MUL will use unsigned multiply
for the i8 type and signed multiply for all other types. This would set the
incorrect flags for the overflow check.
This fixes <rdar://problem/17549300>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212493 91177308-0d34-0410-b5e6-96231b3b80d8
Currently AArch64FastISel crashes if it tries to extend an integer into an
MVT::i128. This can happen by creating 128 bit integers like so:
typedef unsigned int uint128_t __attribute__((mode(TI)));
typedef int sint128_t __attribute__((mode(TI)));
This patch makes EmitIntExt check for their presence and then falls back to
SelectionDAG.
Tests included.
rdar://17516686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212492 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.
The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212490 91177308-0d34-0410-b5e6-96231b3b80d8
Arguments passed as "byval align" should get the specified alignment
in the parameter save area. There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.
This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
of argument processing, using a new helper routine
CalculateStackSlotAlignment (this avoids having to update
PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
MinReservedArea must equal ArgOffset after argument processing,
so there's no use in computing it twice.
[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212476 91177308-0d34-0410-b5e6-96231b3b80d8
lanes in vector splats.
The core problem here is that undef lanes can't *unilaterally* be
considered to contribute to splats. Their handling needs to be more
cautious. There is also a reported failure of the nightly testers
(thanks Tobias!) that may well stem from the same core issue. I'm going
to fix this theoretical issue, factor the APIs a bit better, and then
verify that I don't see anything bad with Tobias's reduction from the
test suite before recommitting.
Original commit message for r212324:
[x86] Generalize BuildVectorSDNode::getConstantSplatValue to work for
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can
easily tweak the lowering if they want.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212475 91177308-0d34-0410-b5e6-96231b3b80d8
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212473 91177308-0d34-0410-b5e6-96231b3b80d8
Generate entire ASan asm instrumentation in MC without
relying on runtime helper functions.
Patch by Yuri Gorshenin.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212455 91177308-0d34-0410-b5e6-96231b3b80d8
essentially a DAG combine that never gets a chance to run.
We might typically expect DAG combining to remove shuffles-of-splats and
other similar patterns, but we don't get a chance to run the DAG
combiner when we recursively form sub-shuffles during the lowering of
a shuffle. So instead hand-roll a really important combine directly into
the lowering code to detect shuffles-of-splats, especially shuffles of
an all-zero splat which needn't even have the same element width, etc.
This lets the new vector shuffle lowering handle shuffles which
implement things like zero-extension really nicely. This will become
even more important when I wire the legalization of zero-extension to
vector shuffles with the new widening legalization strategy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212444 91177308-0d34-0410-b5e6-96231b3b80d8
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.
So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212443 91177308-0d34-0410-b5e6-96231b3b80d8
This completes the handling for DLL import storage symbols when lowering
instructions. A DLL import storage symbol must have an additional load
performed prior to use. This is applicable to variables and functions.
This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering). For a variable symbol, no such thunk can be
accommodated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212431 91177308-0d34-0410-b5e6-96231b3b80d8
It is not clear if llvm.global_ctors should or should not be in llvm.metadata,
but in practice it is not and we need to ignore it for LTO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212351 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The tests in this directory are intended to test a single IR instruction
with as few dependencies on other instructions as possible. The aim is to
be very confident that each LLVM-IR instruction is implemented correctly and
with the optimal sequence of instructions, as well as to make it easy to tell
what is tested, and make it easier to bring up new ISA revisions in the
future. This gives us a good foundation on which to test bigger things.
These particular tests will allow testing that MIPS32r6/MIPS64r6 generate
the correct return instruction for returns, calls, and indirect branches.
This will be a bit tricky since the assembly text is identical but the
instruction is actually different. On MIPS32r6/MIPS64r6 'jr $rs' has been
removed in favour of the equivalent 'jalr $zero, $rs'. 'jr $rs' remains as
an alias for 'jalr $zero, $rs'.
Differential Revision: http://reviews.llvm.org/D4266
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212345 91177308-0d34-0410-b5e6-96231b3b80d8
This is useful for functions that are not actually available externally but
referenced by a vtable of some kind. Clang emits functions like this for the MS
ABI.
PR20182.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212337 91177308-0d34-0410-b5e6-96231b3b80d8
The linker relies on relocation type info (e.g. is it a branch?) to perform the
correct actions, so we should keep that even when we end up using a scattered
relocation for whatever reason.
rdar://problem/17553104
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212333 91177308-0d34-0410-b5e6-96231b3b80d8
There were two issues here:
1. At the very least, scattered relocations cannot use the same code to
determine the corresponding symbol being referred to. For some reason we
pretend there is no symbol, even when one actually exists in the symtab, so to
match this behaviour getRelocationSymbol should simply return symbols_end for
scattered relocations.
2. Printing "-" when we can't get a symbol (including the scattered case, but
not exclusively), isn't that helpful. In both cases there *is* interesting
information in that field, so we should print it. As hex will do.
Small part of rdar://problem/17553104
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212332 91177308-0d34-0410-b5e6-96231b3b80d8
We have detected a documentation bug in the encoding tables of the released
MIPS64r6 specification that has resulted in the wrong encodings being used for
these instructions in LLVM. This commit corrects them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212330 91177308-0d34-0410-b5e6-96231b3b80d8
any constant, constant FP, or undef splat and to tolerate any undef
lanes in a splat, then replace all uses of isSplatVector in X86's
lowering with it.
This fixes issues where undef lanes in an otherwise splat vector would
prevent the splat logic from firing. It is a touch more awkward to use
this interface, but it is much more accurate. Suggestions for better
interface structuring welcome.
With this fix, the code generated with the widening legalization
strategy for widen_cast-4.ll is *dramatically* improved as the special
lowering strategies for a v16i8 SRA kick in even though the high lanes
are undef.
We also get a slightly different choice for broadcasting an aligned
memory location, and use vpshufd instead of vbroadcastss. This looks
like a minor win for pipelining and domain crossing, but a minor loss
for the number of micro-ops. I suspect its a wash, but folks can easily
tweak the lowering if they want.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212324 91177308-0d34-0410-b5e6-96231b3b80d8
Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes.
Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty.
Maximum nop length is limited to 7 bytes when used for padding on Silvermont.
For other x86 processors max nop length remains unchanged 15 bytes.
Differential Revision: http://reviews.llvm.org/D4374
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212321 91177308-0d34-0410-b5e6-96231b3b80d8
When INT_MIN is the numerator in a sdiv, we would not properly handle
overflow when calculating the bounds of possible values; abs(INT_MIN) is
not a meaningful number.
Instead, check and handle INT_MIN by reasoning that the largest value is
INT_MIN/-2 and the smallest value is INT_MIN.
This fixes PR20199.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212307 91177308-0d34-0410-b5e6-96231b3b80d8