Canonicalize access to function attributes to use the simpler API.
getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
=> getFnAttribute(Kind)
getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
=> hasFnAttribute(Kind)
Also, add `Function::getFnStackAlignment()`, and canonicalize:
getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
=> getFnStackAlignment()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229208 91177308-0d34-0410-b5e6-96231b3b80d8
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.
Differential Revision: http://reviews.llvm.org/D7571
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228911 91177308-0d34-0410-b5e6-96231b3b80d8
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228841 91177308-0d34-0410-b5e6-96231b3b80d8
nodes when folding bitcasts of constants.
We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.
This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.
(Also as a note, this does not appear to be in 3.6, no backport needed)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228656 91177308-0d34-0410-b5e6-96231b3b80d8
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."
That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).
The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.
It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up. If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion. At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.
This optimization is enabled unconditionally on X86.
Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).
Also note that the existing combine that forms extloads is now also
enabled on legal vectors. This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.
Differential Revision: http://reviews.llvm.org/D6904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228325 91177308-0d34-0410-b5e6-96231b3b80d8
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227272 91177308-0d34-0410-b5e6-96231b3b80d8
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).
The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector.
Instead of producing this:
vmovaps %xmm0, (%rdi)
vextractf128 $1, %ymm0, 16(%rdi)
This patch merges the 128-bit stores into a single 256-bit store:
vmovups %ymm0, (%rdi)
Differential Revision: http://reviews.llvm.org/D7208
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227242 91177308-0d34-0410-b5e6-96231b3b80d8
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698.
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.
The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.
This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.
Differential Revision: http://reviews.llvm.org/D6850
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226845 91177308-0d34-0410-b5e6-96231b3b80d8
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
Fixed recommit of r226811.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226816 91177308-0d34-0410-b5e6-96231b3b80d8
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.
Differential Revision: http://reviews.llvm.org/D7093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226811 91177308-0d34-0410-b5e6-96231b3b80d8
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226808 91177308-0d34-0410-b5e6-96231b3b80d8
Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size(8 bits).
Added a test provided by David (dag@cray.com)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226805 91177308-0d34-0410-b5e6-96231b3b80d8
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.
In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.
The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.
This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.
This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.
I've included a test case.
Radar link: rdar://problem/19287012
Fix for PR 21943.
From: Fiona Glaser <fglaser@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225974 91177308-0d34-0410-b5e6-96231b3b80d8
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225828 91177308-0d34-0410-b5e6-96231b3b80d8
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225492 91177308-0d34-0410-b5e6-96231b3b80d8
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.
Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225485 91177308-0d34-0410-b5e6-96231b3b80d8
type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8
Right now in DAG Combine check the validity of the returned type
only when -debug is given on the command line. However usually
the test cases in the validation does not use -debug.
An Assert build should always check this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224779 91177308-0d34-0410-b5e6-96231b3b80d8
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.
This fixes PR15872 and provides a partial fix for PR21711.
Differential Revision: http://reviews.llvm.org/D6678
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224429 91177308-0d34-0410-b5e6-96231b3b80d8
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224084 91177308-0d34-0410-b5e6-96231b3b80d8
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
Differential Revision: http://reviews.llvm.org/D6407
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223349 91177308-0d34-0410-b5e6-96231b3b80d8
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222536 91177308-0d34-0410-b5e6-96231b3b80d8
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222522 91177308-0d34-0410-b5e6-96231b3b80d8
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222334 91177308-0d34-0410-b5e6-96231b3b80d8
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.
This is a follow-up to D6210/r221693.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222123 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches the DAGCombiner how to combine shuffles according to rules:
shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222090 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221693 91177308-0d34-0410-b5e6-96231b3b80d8