This adds a new module pass that runs in O2 compiles just before GlobalDCE,
unless we are preparing for LTO.
The pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The frontend indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
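A minimal sketch of the transformation described above, assuming the usual
LLVM C++ API (an illustration only, not the pass's actual code):

  #include "llvm/IR/Module.h"
  using namespace llvm;

  // Turn every available_externally definition into a bare declaration.
  static void dropAvailableExternally(Module &M) {
    for (Function &F : M)
      if (F.hasAvailableExternallyLinkage())
        F.deleteBody(); // also resets the linkage to external
    for (GlobalVariable &GV : M.globals())
      if (GV.hasAvailableExternallyLinkage()) {
        GV.setInitializer(nullptr); // now just a declaration
        GV.setLinkage(GlobalValue::ExternalLinkage);
      }
  }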
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239480 91177308-0d34-0410-b5e6-96231b3b80d8
`StripDebug` was only used by tools/opt/opt.cpp in
`AddStandardLinkPasses()`, but opt.cpp adds the same pass based on its
command-line flag before it calls `AddStandardLinkPasses()`. Stripping
debug info twice isn't very useful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232765 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.
This patch is not exactly NFC: for instance, some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value for the DataLayout argument in many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.
I turned as many pointers to DataLayout into references as I could; this
helped in figuring out all the places where a nullptr could come up.
I initially had a local version of this patch broken into over 30
independent commits, but some later commits were cleaning the API and
touching parts of the code modified in the previous commits, so it
seemed cleaner without the intermediate states.
Test Plan:
Reviewers: echristo
Subscribers: llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231740 91177308-0d34-0410-b5e6-96231b3b80d8
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
     Byte Offset
     0123456789ABCDEF
Bit
 7   HHHHHHHHHIIIIIII
 6   GGGGGGGGGGJJJJJJ
 5   FFFFFFFFFFFKKKKK
 4   EEEEEEEEEEEELLLL
 3   DDDDDDDDDDDDDMMM
 2   CCCCCCCCCCCCCCNN
 1   BBBBBBBBBBBBBBBO
 0   AAAAAAAAAAAAAAAA
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
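As a self-contained illustration of those two expressions (a hypothetical
helper, not code from the patch):

  #include <cstdint>

  // Test bit `x` of the set whose bits start at byte `byteOffset`
  // with per-byte mask `mask` in the 16-byte array.
  static inline bool testBit(const uint8_t bits[16], unsigned byteOffset,
                             uint8_t mask, unsigned x) {
    return (bits[byteOffset + x] & mask) != 0;
  }

  // bit x of A: testBit(bits, 0, 0x01, x)
  // bit x of I: testBit(bits, 9, 0x80, x)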
This uses the LPT (longest processing time first) multiprocessor scheduling
algorithm to lay out the bits efficiently.
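For reference, LPT is the classic greedy heuristic: sort the "jobs" (bit set
widths) in decreasing order, then always place the next job on the currently
least-loaded "machine" (row of the byte array). A hedged sketch of the idea,
not the patch's code:

  #include <algorithm>
  #include <functional>
  #include <vector>

  // Returns, for each job in sorted order, the row it was assigned to.
  std::vector<size_t> lptAssign(std::vector<unsigned> Sizes, size_t Rows) {
    std::sort(Sizes.begin(), Sizes.end(), std::greater<unsigned>());
    std::vector<unsigned> Load(Rows, 0);
    std::vector<size_t> Row(Sizes.size());
    for (size_t i = 0; i != Sizes.size(); ++i) {
      size_t Best = std::min_element(Load.begin(), Load.end()) - Load.begin();
      Row[i] = Best;
      Load[Best] += Sizes[i];
    }
    return Row;
  }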
Saves ~450KB of instructions in a recent build of Chromium.
Differential Revision: http://reviews.llvm.org/D7954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231043 91177308-0d34-0410-b5e6-96231b3b80d8
This change aligns globals to the next highest power of 2 bytes, up to a
maximum of 128. This makes it more likely that we will be able to compress
bit sets with a greater alignment. In many more cases, we can now take
advantage of a new optimization also introduced in this patch that removes
bit set checks if the bit set is all ones.
The 128 byte maximum was found to provide the best tradeoff between instruction
overhead and data overhead in a recent build of Chromium. It allows us to
remove ~2.4MB of instructions at the cost of ~250KB of data.
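The rounding rule itself is tiny; a hedged sketch (hypothetical helper, not
the patch's code):

  #include <cstdint>

  // Round a global's size up to the next power of two, capped at 128 bytes.
  static uint64_t globalAlignment(uint64_t Size) {
    uint64_t Align = 1;
    while (Align < Size && Align < 128)
      Align <<= 1;
    return Align;
  }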
Differential Revision: http://reviews.llvm.org/D7873
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230540 91177308-0d34-0410-b5e6-96231b3b80d8
The builder is based on a layout algorithm that tries to keep members of
small bit sets together. The new layout compresses Chromium's bit sets to
around 15% of their original size.
Differential Revision: http://reviews.llvm.org/D7796
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230394 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.
The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic, and an LTO lowering pass that lays out the
globals and builds the bit sets; it also documents the new feature.
Differential Revision: http://reviews.llvm.org/D7288
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230054 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations that hid the
'legacy' namespace for the old pass manager.
This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase, and people seemed fine with
it, so I'm making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.
The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229094 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the population of the target-specific analyses back into the
LTO code generator.
The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.
This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227576 91177308-0d34-0410-b5e6-96231b3b80d8
This refactors TargetLibraryInfo to work with the new pass manager, to
support the actual uses of it. =]
When I ported instcombine to the new pass manager I discovered that it
didn't work because TLI wasn't available in the right places. This is
a somewhat surprising and/or subtle aspect of the new pass manager
design that came up before but I think is useful to be reminded of:
While the new pass manager *allows* a function pass to query a module
analysis, it requires that the module analysis is already run and cached
prior to the function pass manager starting up, possibly with
a 'require<foo>' style utility in the pass pipeline. This is an
intentional hurdle because using a module analysis from a function pass
*requires* that the module analysis is run prior to entering the
function pass manager. Otherwise the other functions in the module could
be in who-knows-what state, etc.
A somewhat surprising consequence of this design decision (at least to
me) is that you have to design a function pass that leverages
a module analysis to do so as an optional feature. Even if that means
your function pass does no work in the absence of the module analysis,
you have to handle that possibility and remain conservatively correct.
This is a natural consequence of things being able to invalidate the
module analysis and us being unable to re-run it. And it's a generally
good thing because it lets us reorder passes arbitrarily without
breaking correctness, etc.
This ends up causing problems in one case: what if we have a module
analysis that is *definitionally* impossible to invalidate? In the
places this might come up, the analysis is usually also definitionally
trivial to run even while other transformation passes run on the module,
regardless of the state of anything. And so, it follows that it is
natural to have a hard requirement on such analyses from a function
pass.
It turns out that TargetLibraryInfo is just such an analysis, and
InstCombine has a hard requirement on it.
The approach I've taken here is to produce an analysis that models this
flexibility by making it both a module and a function analysis. This
exposes the fact that it is in fact safe to compute at any point. We can
even make it a valid CGSCC analysis at some point if that is useful.
However, we don't want to have a copy of the actual target library info
state for each function! This state is specific to the triple. The
somewhat direct and blunt approach here is to turn TLI into a pimpl,
with the state and mutators in the implementation class and the query
routines primarily in the wrapper. Then the analysis can lazily
construct and cache the implementations, keyed on the triple, and
on-demand produce wrappers of them for each function.
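In outline, the split looks something like this (a simplified sketch of the
shape, not the actual declarations):

  // State and mutators live in the Impl; one Impl per triple.
  class TargetLibraryInfoImpl {
    // Availability tables for the known library functions, derived from
    // the target triple; mutators let a target mark functions unavailable.
  };

  // The wrapper is what passes query; handing one out per function costs
  // a single pointer, since it just refers to the shared Impl.
  class TargetLibraryInfo {
    const TargetLibraryInfoImpl *Impl;
  public:
    explicit TargetLibraryInfo(const TargetLibraryInfoImpl &I) : Impl(&I) {}
    // bool has(LibFunc F) const;  // answered by consulting *Impl
  };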
One minor annoyance is that we will end up with a wrapper for each
function in the module. While this is a bit wasteful (one pointer per
function) it seems tolerable. And it has the advantage of ensuring that
we pay the absolute minimum synchronization cost to access this
information should we end up with a nice parallel function pass manager
in the future. We could look into trying to mark when analysis results
are especially cheap to recompute and more eagerly GC-ing the cached
results, or we could look at supporting a variant of analyses whose
results are specifically *not* cached and expected to just be used and
discarded by the consumer. Either way, these seem like incremental
enhancements that should happen when we start profiling the memory and
CPU usage of the new pass manager and not before.
The other minor annoyance is that if we end up using the TLI in both
a module pass and a function pass, those will be produced by two
separate analyses, and thus will point to separate copies of the
implementation state. While a minor issue, I dislike this and would like
to find a way to cleanly allow a single analysis instance to be used
across multiple IR unit managers. But I don't have a good solution to
this today, and I don't want to hold up all of the work waiting to come
up with one. This too seems like a reasonable thing to incrementally
improve later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226981 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now, to get usable behavior out of opt, one no longer needs the
funny-looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214919 91177308-0d34-0410-b5e6-96231b3b80d8
This extension point allows adding passes that perform peephole optimizations
similar to the instruction combiner. These passes will be inserted after
each instance of the instruction combiner pass.
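Usage looks something like this (a sketch; createMyPeepholePass is a
hypothetical factory for your own pass):

  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/Pass.h"
  #include "llvm/Transforms/IPO/PassManagerBuilder.h"
  using namespace llvm;

  Pass *createMyPeepholePass(); // hypothetical pass factory

  void buildPipeline(legacy::PassManager &MPM) {
    PassManagerBuilder PMB;
    PMB.OptLevel = 2;
    // Run our peephole pass after every instance of instcombine:
    PMB.addExtension(
        PassManagerBuilder::EP_Peephole,
        [](const PassManagerBuilder &, legacy::PassManagerBase &PM) {
          PM.add(createMyPeepholePass());
        });
    PMB.populateModulePassManager(MPM);
  }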
Differential Revision: http://reviews.llvm.org/D3905
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209595 91177308-0d34-0410-b5e6-96231b3b80d8
The intended behaviour is to force vectorization in the presence of the
flag (whether it turns vectorization on or off), and to continue behaving
as expected in its absence. Tests were added to make sure all the
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.
This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.
The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.
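For illustration, the kind of source-level hint this metadata is meant to
carry might look like the following (the spelling is hypothetical here,
since the Clang side lands separately):

  void saxpy(float *x, float *y, float a, int n) {
    #pragma clang loop vectorize(enable)   // hypothetical spelling
    for (int i = 0; i < n; ++i)
      y[i] += a * x[i];
  }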
Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196537 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.
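A frontend would flip the new member before populating the pass managers;
a minimal sketch, assuming the member is named RerollLoops:

  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/Transforms/IPO/PassManagerBuilder.h"
  using namespace llvm;

  void buildPipeline(legacy::PassManager &MPM) {
    PassManagerBuilder PMB;
    PMB.OptLevel = 2;
    PMB.RerollLoops = true; // assumed member name; off by default
    PMB.populateModulePassManager(MPM);
  }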
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194966 91177308-0d34-0410-b5e6-96231b3b80d8
This gives the old pass manager files a legacy prefix in the right
directory and uses forwarding headers in the old locations to paper over
the name change for most clients during the transitional period.
No functionality changed here! This is just clearing some space to
reduce renaming churn later on with a new system.
Even when the new stuff starts to go in, it is going to be hidden behind
a flag and off-by-default as it is still WIP and under development.
This patch is specifically designed so that very little out-of-tree code
has to change. I'm going to work as hard as I can to keep that the case.
Only direct forward declarations of the PassManager class are impacted
by this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194324 91177308-0d34-0410-b5e6-96231b3b80d8
This adds an experimental option to run the loop vectorizer in a phase
after the inliner and its CGSCC pass manager. This should insulate the
inlining decisions from the vectorization decisions; however, it may have
both compile time and code size problems, so it is just an experimental
option right now.
Adding this based on a discussion with Arnold; it seems at least worth
having this flag so we can run some experiments to see if this strategy
is workable. It may solve some of the regressions seen
with the loop vectorizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184698 91177308-0d34-0410-b5e6-96231b3b80d8
This commit completely removes what is left of the simplify-libcalls
pass. All of the functionality has now been migrated to the instcombine
and functionattrs passes. The following C API functions are now NOPs:
1. LLVMAddSimplifyLibCallsPass
2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184459 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the cache-maintenance methods from the inline cost
interfaces. These methods were used in the old inline cost system where
there was a persistent cache that had to be updated, invalidated, and
cleared. We're now doing more direct computations that don't require
this intricate dance. Even if we resume some level of caching, it would
almost certainly have a simpler and more narrow interface than this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153813 91177308-0d34-0410-b5e6-96231b3b80d8
This rewrites the inline cost analysis to operate on a per-callsite walk
of the called function's instructions, in breadth-first order over the
potentially reachable set of basic blocks.
This is a major shift in how inline cost analysis works to improve the
accuracy and rationality of inlining decisions. A brief outline of the
algorithm this moves to (a compressed code sketch follows the outline):
- Build a simplification mapping based on the callsite arguments to the
function arguments.
- Push the entry block onto a worklist of potentially-live basic blocks.
- Pop the first block off of the *front* of the worklist (for
breadth-first ordering) and walk its instructions using a custom
InstVisitor.
- For each instruction's operands, re-map them based on the
simplification mappings available for the given callsite.
- Compute any simplification possible of the instruction after
re-mapping, and store that back into the simplification mapping.
- Compute any bonuses, costs, or other impacts of the instruction on the
cost metric.
- When the terminator is reached, replace any conditional value in the
terminator with any simplifications from the mapping we have, and add
any successors which are not proven to be dead from these
simplifications to the worklist.
- Pop the next block off of the front of the worklist, and repeat.
- As soon as the cost of inlining exceeds the threshold for the
callsite, stop analyzing the function in order to bound cost.
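A compressed sketch of that walk (all names hypothetical, with stubs
standing in for the real visitor and simplification logic):

  #include <deque>
  #include <set>
  #include <vector>

  struct Block;                     // stand-in for a basic block
  int visitAndSimplify(Block *BB);  // hypothetical: visit each instruction,
                                    // fold through the argument mapping,
                                    // and return the accumulated cost
  std::vector<Block *> liveSuccessors(Block *BB); // hypothetical: successors
                                                  // not proven dead

  // Breadth-first, threshold-bounded cost walk described in the outline.
  bool shouldInline(Block *Entry, int Threshold) {
    int Cost = 0;
    std::deque<Block *> Worklist{Entry};
    std::set<Block *> Visited{Entry};
    while (!Worklist.empty()) {
      Block *BB = Worklist.front();
      Worklist.pop_front();                  // front => breadth-first order
      Cost += visitAndSimplify(BB);
      if (Cost > Threshold)
        return false;                        // bound the cost of analysis
      for (Block *S : liveSuccessors(BB))    // dead paths are never visited
        if (Visited.insert(S).second)
          Worklist.push_back(S);
    }
    return true;
  }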
The primary goal of this algorithm is to perfectly handle dead code
paths. We do not want any code in trivially dead code paths to impact
inlining decisions. The previous metric was *extremely* flawed here, and
would always subtract the average cost of two successors of
a conditional branch when it was proven to become an unconditional
branch at the callsite. There was no handling of wildly different costs
between the two successors, which would cause inlining when the path
actually taken was too large, and no inlining when the path actually
taken was trivially simple. There was also no handling of the code
*path*, only the immediate successors. These problems vanish completely
now. See the added regression tests for the shiny new features -- we
skip recursive function calls, SROA-killing instructions, and high cost
complex CFG structures when dead at the callsite being analyzed.
Switching to this algorithm required refactoring the inline cost
interface to accept the actual threshold rather than simply returning
a single cost. The resulting interface is pretty bad, and I'm planning
to do lots of interface cleanup after this patch.
Several other refactorings fell out of this, but I've tried to minimize
them for this patch. =/ There is still more cleanup that can be done
here. Please point out anything that you see in review.
I've worked really hard to try to mirror at least the spirit of all of
the previous heuristics in the new model. It's not clear that they are
all correct any more, but I wanted to minimize the change in this single
patch, it's already a bit ridiculous. One heuristic that is *not* yet
mirrored is to allow inlining of functions with a dynamic alloca *if*
the caller has a dynamic alloca. I will add this back, but I think the
most reasonable way requires changes to the inliner itself rather than
just the cost metric, and so I've deferred this for a subsequent patch.
The test case is XFAIL-ed until then.
As mentioned in the review mail, this seems to make Clang run about 1%
to 2% faster in -O0, but makes its binary size grow by just under 4%.
I've looked into the 4% growth, and it can be fixed, but requires
changes to other parts of the inliner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@153812 91177308-0d34-0410-b5e6-96231b3b80d8
This removes a redundant set of functions and instead directly queries
the function information which this set was representing.
This simplifies the interface of the inline cost analysis, and makes the
always-inline pass significantly more efficient.
Previously, always-inline would first make a single set of every
function in the module *except* those marked with the always-inline
attribute. It would then query this set at every call site to see if the
function was a member of the set, and if so, refuse to inline it. This
is quite wasteful. Instead, simply check the function attribute directly
when looking at the callsite.
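In modern spelling, the per-callsite check amounts to something like this
(a sketch, not the patch's exact code):

  #include "llvm/IR/Function.h"
  using namespace llvm;

  // Query the attribute directly on the callee instead of testing
  // membership in a precomputed set.
  static bool isAlwaysInline(const Function *Callee) {
    return Callee && Callee->hasFnAttribute(Attribute::AlwaysInline);
  }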
The normal inliner also had similar redundancy. It added every function
in the module with the noinline attribute to its set to ignore, even
though inside the cost analysis function we *already tested* the
noinline attribute and produced the same result.
The only tricky part of removing this is that we have to be able to
correctly remove only the functions inlined by the always-inline pass
when finalizing, which requires a bit of a hack. Still, much less of
a hack than the set of all non-always-inline functions was. While I was
touching this function, I switched a heavy-weight set to a vector with
sort+unique. The algorithm already had a two-phase insert and removal
pattern, we were just needlessly paying the uniquing cost on every
insert.
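The sort+unique idiom referred to here is the standard one:

  #include <algorithm>
  #include <vector>

  // Two-phase pattern: insert freely (duplicates allowed), then pay the
  // uniquing cost once at the end instead of on every insert.
  void sortUnique(std::vector<int> &V) {
    std::sort(V.begin(), V.end());
    V.erase(std::unique(V.begin(), V.end()), V.end());
  }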
This probably speeds up some compiles by a small amount (-O0 compiles
with lots of always-inline, so potentially heavy libc++ users), but I've
not tried to measure it.
I believe there is no functional change here, but yell if you spot one.
None are intended.
Finally, the direction this is going in is to greatly simplify the
inline cost query interface so that we can replace its implementation
with a much more clever one. Along the way, all the APIs get simplified,
so it seems incrementally good.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@152903 91177308-0d34-0410-b5e6-96231b3b80d8
Inline attributes are optimization hints, but at -O0 we're not
optimizing. This becomes a problem
when the alwaysinline attribute is abused.
rdar://10921594
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151429 91177308-0d34-0410-b5e6-96231b3b80d8
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149468 91177308-0d34-0410-b5e6-96231b3b80d8
Before 3.0, I'd like to add a mechanism for automatically loading a set of plugins from a config file. API suggestions welcome...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137717 91177308-0d34-0410-b5e6-96231b3b80d8
This move may not be ideal,
but it solves a layering violation since things in Support are not supposed to
use things in Transforms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136726 91177308-0d34-0410-b5e6-96231b3b80d8