This commit moves `MDLocation`, finishing off PR21433. There's an
accompanying clang commit for frontend testcases. I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.
This changes the schema for `DebugLoc` and `DILocation` from:
!{i32 3, i32 7, !7, !8}
to:
!MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)
Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226048 91177308-0d34-0410-b5e6-96231b3b80d8
Propagate whether `MDNode`s are 'distinct' through the other types of IR
(assembly and bitcode). This adds the `distinct` keyword to assembly.
Currently, no one actually calls `MDNode::getDistinct()`, so these nodes
only get created for:
- self-references, which are never uniqued, and
- nodes whose operands are replaced that hit a uniquing collision.
The concept of distinct nodes is still not quite first-class, since
distinct-ness doesn't yet survive across `MapMetadata()`.
Part of PR22111.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225474 91177308-0d34-0410-b5e6-96231b3b80d8
Now that `Metadata` is typeless, reflect that in the assembly. These
are the matching assembly changes for the metadata/value split in
r223802.
- Only use the `metadata` type when referencing metadata from a call
intrinsic -- i.e., only when it's used as a `Value`.
- Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
when referencing it from call intrinsics.
So, assembly like this:
define @foo(i32 %v) {
call void @llvm.foo(metadata !{i32 %v}, metadata !0)
call void @llvm.foo(metadata !{i32 7}, metadata !0)
call void @llvm.foo(metadata !1, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{metadata !3}, metadata !0)
ret void, !bar !2
}
!0 = metadata !{metadata !2}
!1 = metadata !{i32* @global}
!2 = metadata !{metadata !3}
!3 = metadata !{}
turns into this:
define @foo(i32 %v) {
call void @llvm.foo(metadata i32 %v, metadata !0)
call void @llvm.foo(metadata i32 7, metadata !0)
call void @llvm.foo(metadata i32* @global, metadata !0)
call void @llvm.foo(metadata !3, metadata !0)
call void @llvm.foo(metadata !{!3}, metadata !0)
ret void, !bar !2
}
!0 = !{!2}
!1 = !{i32* @global}
!2 = !{!3}
!3 = !{}
I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines). I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.
This is part of PR21532.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224257 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).
Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219741 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219010 91177308-0d34-0410-b5e6-96231b3b80d8
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218914 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.
Reviewed by Arnold Schwaighofer`
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214445 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.
Unfortunately, this meant that:
1. The loop vectorizer had logic that essentially duplicated that in BasicAA
for aliasing based on identified objects.
2. The loop vectorizer could not partition the space of dependency checks
based on information only easily available from within AA (TBAA metadata is
currently the prime example).
This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:
void foo(int *a, float *b) {
for (int i = 0; i < 1600; ++i)
a[i] = b[i];
}
This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.
This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.
When pointer locations are added to the AliasSetTracker, two things are done:
1. The location size is set to UnknownSize (otherwise you'd not catch
inter-iteration dependencies)
2. For instructions in blocks that would need to be predicated, TBAA is
removed (because the metadata might have a control dependency on the condition
being speculated).
For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213486 91177308-0d34-0410-b5e6-96231b3b80d8
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
Reviewed by: Arnold Schwaighofer
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213110 91177308-0d34-0410-b5e6-96231b3b80d8
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.
The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212634 91177308-0d34-0410-b5e6-96231b3b80d8
[LLVM part]
These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix. Metadata name
changes:
llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*
This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata.
Patch by Mark Heffernan.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211710 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This new debug emission kind supports emitting line location
information in all instructions, but stops code generation
from emitting debug info to the final output.
This mode is useful when the backend wants to track source
locations during code generation, but it does not want to
produce debug info. This is currently used by optimization
remarks (-pass-remarks, -pass-remarks-missed and
-pass-remarks-analysis).
To prevent debug info emission, DIBuilder never inserts the
annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled.
Reviewers: echristo, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4234
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211609 91177308-0d34-0410-b5e6-96231b3b80d8
The old method used by X86TTI to determine partial-unrolling thresholds was
messy (because it worked by testing target features), and also would not
correctly identify the target CPU if certain target features were disabled.
After some discussions on IRC with Chandler et al., it was decided that the
processor scheduling models were the right containers for this information
(because it is often tied to special uop dispatch-buffer sizes).
This does represent a small functionality change:
- For generic x86-64 (which uses the SB model and, thus, will get some
unrolling).
- For AMD cores (because they still currently use the SB scheduling model)
- For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump
the default threshold to 50; we're working on a test case for this).
Otherwise, nothing has changed for any other targets. The logic, however, has
been moved into BasicTTI, so other targets may now also opt-in to this
functionality simply by setting LoopMicroOpBufferSize in their processor
model definitions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208289 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.
Added tests to exercise some loop remarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207574 91177308-0d34-0410-b5e6-96231b3b80d8
This provides an initial implementation of getUnrollingPreferences for x86.
getUnrollingPreferences is used by the generic (concatenation) unroller, which
is distinct from the unrolling done by the loop vectorizer. Many modern x86
cores have some kind of uop cache and loop-stream detector (LSD) used to
efficiently dispatch small loops, and taking full advantage of this requires
unrolling small loops (small here means 10s of uops).
These caches also have limits on the number of taken branches in the loop, and
so we also cap the loop unrolling factor based on the maximum "depth" of the
loop. This is currently calculated with a partial DFS traversal (partial
because it will stop early if the path length grows too much). This is still an
approximation, and one that is both conservative (because it does not account
for branches eliminated via block placement) and optimistic (because it is only
recording the maximum depth over minimum paths). Nevertheless, because the
loops that fit in these uop caches are so small, it is not clear how much the
details matter.
The original set of patches posted for review produced the following test-suite
performance results (from the TSVC benchmark) at that time:
ControlLoops-dbl - 13% speedup
ControlLoops-flt - 15% speedup
Reductions-dbl - 7.5% speedup
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205348 91177308-0d34-0410-b5e6-96231b3b80d8
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.
However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205264 91177308-0d34-0410-b5e6-96231b3b80d8
Pretty obvious follow-on to r205159 to also handle conversion from double
besides float.
Fixes <rdar://problem/16373208>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205253 91177308-0d34-0410-b5e6-96231b3b80d8
There is no direct AVX instruction to convert to unsigned. I have some ideas
how we may be able to do this with three vector instructions but the current
backend just bails on this to get it scalarized.
See the comment why we need to adjust the cost returned by BasicTTI.
The test is a bit roundabout (and checks assembly rather than bit code) because
I'd like it to work even if at some point we could vectorize this conversion.
Fixes <rdar://problem/16371920>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205159 91177308-0d34-0410-b5e6-96231b3b80d8
and v4i64->v4f64.
The new costs match what we did for SSE2 and reflect the reality of our codegen.
<rdar://problem/16381225>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204884 91177308-0d34-0410-b5e6-96231b3b80d8
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.
I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200530 91177308-0d34-0410-b5e6-96231b3b80d8
vectorizer, placing it behind an off-by-default flag.
It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.
I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200294 91177308-0d34-0410-b5e6-96231b3b80d8
cold loops as-if they were being optimized for size.
Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.
The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:
1) Don't disable the entire pass when the target is lacking vector
registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
non-if-converted loops. This is trivial for the unroller but hard for
the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
various places that make cost tradeoffs (very likely only the
unroller makes sense here, and then only when dealing with loops that
are small enough for unrolling to not completely blow out the LSD).
I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.
One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200219 91177308-0d34-0410-b5e6-96231b3b80d8
Test is platform independent, but I don't want to force vector-width, or
that could spoil the pragma test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196539 91177308-0d34-0410-b5e6-96231b3b80d8
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.
Fixes PR18049.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195787 91177308-0d34-0410-b5e6-96231b3b80d8
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.
Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.
The test case will only fail when running with libgmalloc.
radar://15498655
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195162 91177308-0d34-0410-b5e6-96231b3b80d8
When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).
In order to test this, I added a flag to opt (-disable-loop-unrolling) to force
disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also,
this fixes a small bug in opt where the loop vectorizer was enabled only after
the pass manager populated the queue of passes (the global_alias.ll test needed
a slight update to the RUN line as a result of this fix).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189499 91177308-0d34-0410-b5e6-96231b3b80d8
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8
This update was done with the following bash script:
find test/Transforms -name "*.ll" | \
while read NAME; do
echo "$NAME"
if ! grep -q "^; *RUN: *llc" $NAME; then
TEMP=`mktemp -t temp`
cp $NAME $TEMP
sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
while read FUNC; do
sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
done
mv $TEMP $NAME
fi
done
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186268 91177308-0d34-0410-b5e6-96231b3b80d8
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
by making the root of additional loop metadata.
- Loop::isAnnotatedParallel now looks for llvm.loop and associated
llvm.mem.parallel_loop_access
- document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
- document llvm.vectorizer.* metadata
- add utility class LoopVectorizerHints for getting/setting loop metadata
- use llvm.vectorizer.width=1 to indicate already vectorized instead of
already_vectorized
- update existing tests that used llvm.loop.parallel and
llvm.vectorizer.already_vectorized
Reviewed by: Nadav Rotem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182802 91177308-0d34-0410-b5e6-96231b3b80d8
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.
Patch by Daisuke Takahashi!
Fixes PR15758.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180196 91177308-0d34-0410-b5e6-96231b3b80d8