getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227730 91177308-0d34-0410-b5e6-96231b3b80d8
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
This patch was generated by a clang tidy checker that is being open sourced.
The documentation of that checker is the following:
/// The emptiness of a container should be checked using the empty method
/// instead of the size method. It is not guaranteed that size is a
/// constant-time function, and it is generally more efficient and also shows
/// clearer intent to use empty. Furthermore some containers may implement the
/// empty method but not implement the size method. Using empty whenever
/// possible makes it easier to switch to another container in the future.
Patch by Gábor Horváth!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226161 91177308-0d34-0410-b5e6-96231b3b80d8
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220341 91177308-0d34-0410-b5e6-96231b3b80d8
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.
Patch by Björn Steinbrink!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215723 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213864 91177308-0d34-0410-b5e6-96231b3b80d8
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206844 91177308-0d34-0410-b5e6-96231b3b80d8
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203364 91177308-0d34-0410-b5e6-96231b3b80d8
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202821 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202168 91177308-0d34-0410-b5e6-96231b3b80d8
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201827 91177308-0d34-0410-b5e6-96231b3b80d8
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.
The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.
With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.
This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in
X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
between non-uniform and uniform constant operands.
Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200892 91177308-0d34-0410-b5e6-96231b3b80d8
can be used by both the new pass manager and the old.
This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.
The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.
Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199104 91177308-0d34-0410-b5e6-96231b3b80d8
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.
Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.
But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8
When computing the use set of a store, we need to add the store to the write
set prior to iterating over later instructions. Otherwise, if there is a later
aliasing load of that store, that load will not be tagged as a use, and bad
things will happen.
trackUsesOfI still adds later dependent stores of an instruction to that
instruction's write set, but it never sees the original instruction, and so
when tracking uses of a store, the store must be added to the write set by the
caller.
Fixes PR16834.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188329 91177308-0d34-0410-b5e6-96231b3b80d8
After the recent data-structure improvements, a couple of debugging statements
were broken (printing pointer values).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176791 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.
bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175397 91177308-0d34-0410-b5e6-96231b3b80d8
Several functions and variable names used the term 'tree' to refer
to what is actually a DAG. Correcting this mistake will, hopefully,
prevent confusion in the future.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175278 91177308-0d34-0410-b5e6-96231b3b80d8
For some basic blocks, it is possible to generate many candidate pairs for
relatively few pairable instructions. When many (tens of thousands) of these pairs
are generated for a single instruction group, the time taken to generate and
rank the different vectorization plans can become quite large. As a result, we now
cap the number of candidate pairs within each instruction group. This is done by
closing out the group once the threshold is reached (set now at 3000 pairs).
Although this will limit the overall compile-time impact, this may not be the best
way to achieve this result. It might be better, for example, to prune excessive
candidate pairs after the fact the prevent the generation of short, but highly-connected
groups. We can experiment with this in the future.
This change reduces the overall compile-time slowdown of the csa.ll test case in
PR15222 to ~5x. If 5x is still considered too large, a lower limit can be
used as the default.
This represents a functionality change, but only for very large inputs
(thus, there is no regression test).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175251 91177308-0d34-0410-b5e6-96231b3b80d8
All instances of std::multimap have now been replaced by
DenseMap<K, std::vector<V> >, and this yields a speedup of 5% on the
csa.ll test case from PR15222.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175216 91177308-0d34-0410-b5e6-96231b3b80d8
This is another commit on the road to removing std::multimap from
BBVectorize. This gives an ~1% speedup on the csa.ll test case
in PR15222.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175215 91177308-0d34-0410-b5e6-96231b3b80d8
When building the pairable-instruction dependency map, don't search
past the last pairable instruction. For large blocks that have been
divided into multiple instruction groups, searching past the last
instruction in each group is very wasteful. This gives a 32% speedup
on the csa.ll test case from PR15222 (when using 50 instructions
in each group).
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174915 91177308-0d34-0410-b5e6-96231b3b80d8
This map is queried only for instructions in pairs of pairable
instructions; so make sure that only pairs of pairable
instructions are added to the map. This gives a 3.5% speedup
on the csa.ll test case from PR15222.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174914 91177308-0d34-0410-b5e6-96231b3b80d8
This eliminates one more linear search over a range of
std::multimap entries. This gives a 22% speedup on the
csa.ll test case from PR15222.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174893 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the last of the linear searches over ranges of std::multimap
iterators, giving a 7% speedup on the doduc.bc input from PR15222.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174859 91177308-0d34-0410-b5e6-96231b3b80d8
This is another cleanup aimed at eliminating linear searches
in ranges of std::multimap.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174858 91177308-0d34-0410-b5e6-96231b3b80d8
Profiling suggests that getInstructionTypes is performance-sensitive,
this cleans up some double-casting in that function in favor of
using dyn_cast.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174857 91177308-0d34-0410-b5e6-96231b3b80d8
By itself, this does not have much of an effect, but only because in the default
configuration the full cycle checks are used only for small problem sizes.
This is part of a general cleanup of uses of iteration over std::multimap
ranges only for the purpose of checking membership.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174856 91177308-0d34-0410-b5e6-96231b3b80d8
This is a follow-up to the cost-model change in r174713 which splits
the cost of a memory operation between the address computation and the
actual memory access. In r174713, this cost is always added to the
memory operation cost, and so BBVectorize will do the same.
Currently, this new cost function is used only by ARM, and I don't
have any ARM test cases for BBVectorize. Assistance in generating some
good ARM test cases for BBVectorize would be greatly appreciated!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174743 91177308-0d34-0410-b5e6-96231b3b80d8
When flipping the pair of subvectors that form a vector, if the
vector length is 2, we can use the SK_Reverse shuffle kind to get
more-accurate cost information. Also we can use the SK_ExtractSubvector
shuffle kind to get accurate subvector extraction costs.
The current cost model implementations don't yet seem complex enough
for this to make a difference (thus, there are no test cases with this
commit), but it should help in future.
Depending on how the various targets optimize and combine shuffles in
practice, we might be able to get more-accurate costs by combining the
costs of multiple shuffle kinds. For example, the cost of flipping the
subvector pairs could be modeled as two extractions and two subvector
insertions. These changes, however, should probably be motivated
by specific test cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173621 91177308-0d34-0410-b5e6-96231b3b80d8
This could be simplified further, but Hal has a specific feature for
ignoring TTI, and so I preserved that.
Also, I needed to use it because a number of tests fail when switching
from a null TTI to the NoTTI nonce implementation. That seems suspicious
to me and so may be something that you need to look into Hal. I worked
it by preserving the old behavior for these tests with the flag that
ignores all target info.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171722 91177308-0d34-0410-b5e6-96231b3b80d8