The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223126 91177308-0d34-0410-b5e6-96231b3b80d8
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,
* For VFPv2, it is implementation defined as to whether the sign of zero is
preserved.
* For VFPv3 and above, the sign of zero is always preserved when a denormal
is flushed to zero.
When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.
Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223110 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.
Test Plan: weak-linkage.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D6455
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223077 91177308-0d34-0410-b5e6-96231b3b80d8
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6308
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223075 91177308-0d34-0410-b5e6-96231b3b80d8
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack. That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).
Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6307
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223074 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223059 91177308-0d34-0410-b5e6-96231b3b80d8
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223049 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.
This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.
Reviewers: willschm, uweigand, hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6410
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222996 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.
This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222903 91177308-0d34-0410-b5e6-96231b3b80d8
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.
* It's less work.
* Some vendors may legitimately have case-sensitive checks for these
attributes which would fail on LLVM generated object files.
* There could be locale issues with uppercasing.
The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see
http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133
This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.
Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222882 91177308-0d34-0410-b5e6-96231b3b80d8
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:
1. R_AARCH64_NONE is defined to be zero rather than 256
R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.
2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21
I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.
Tested with check-all with no regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222821 91177308-0d34-0410-b5e6-96231b3b80d8
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.
The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.
http://reviews.llvm.org/D6378
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222820 91177308-0d34-0410-b5e6-96231b3b80d8
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.
Differential Revision: http://reviews.llvm.org/D6409
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222796 91177308-0d34-0410-b5e6-96231b3b80d8
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.
This commit fixes this and also adds inverse versions of the conditon to catch
more cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222722 91177308-0d34-0410-b5e6-96231b3b80d8
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222710 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:
1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.
This caused a crash, since the source value for the insertps ends-up uninitialized.
Differential Revision: http://reviews.llvm.org/D6377
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
has a remarkably unique and efficient lowering.
While we get this some of the time already, we miss a few cases and
there wasn't a principled reason we got it. We should at least test
this. v8 already has tests for this pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222607 91177308-0d34-0410-b5e6-96231b3b80d8
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.
This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222583 91177308-0d34-0410-b5e6-96231b3b80d8
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.
Differential Revision: http://reviews.llvm.org/D3566
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222580 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222544 91177308-0d34-0410-b5e6-96231b3b80d8
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222539 91177308-0d34-0410-b5e6-96231b3b80d8
a bunch more improvements.
Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222537 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222536 91177308-0d34-0410-b5e6-96231b3b80d8
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222533 91177308-0d34-0410-b5e6-96231b3b80d8
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222525 91177308-0d34-0410-b5e6-96231b3b80d8
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222504 91177308-0d34-0410-b5e6-96231b3b80d8