Add new API for converting temporaries that may self-reference.
Self-referencing nodes are not allowed to be uniqued, so sending them
into `replaceWithUniqued()` is dangerous (and this commit adds
assertions that prevent it).
`replaceWithPermanent()` has similar semantics to `get()` followed by
calls to `replaceOperandWith()`. In particular, if there's a
self-reference, it returns a distinct node; otherwise, it returns a
uniqued one. Like `replaceWithUniqued()` and `replaceWithDistinct()`
(well, it calls out to them) it mutates the temporary node in place if
possible, only calling `replaceAllUsesWith()` on a uniquing collision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228726 91177308-0d34-0410-b5e6-96231b3b80d8
On Windows, we now use RaiseException to generate the kind of trap we require (one which calls our vectored exception handler), and fall back to using a volatile write to simulate a trap elsewhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228691 91177308-0d34-0410-b5e6-96231b3b80d8
std::strings) rather than StringRefs in JITSymbol get-address lambda.
Capturing a StringRef by-value is still effectively capturing a reference, which
is no good here because the referenced string may be gone by the time the lambda
is being evaluated the original value may be gone. Make sure to capture a
std::string instead.
No test case: This bug doesn't manifest under OrcMCJITReplacement, since it
keeps IR modules (from which the StringRefs are sourced) alive permanently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228676 91177308-0d34-0410-b5e6-96231b3b80d8
This allows all CMake projects, as well as C++ code, to detect if
and when DIA SDK is available for use so that we can enable the
DIA-based PDB reader implementation.
Differential Revision: http://reviews.llvm.org/D7457
Reviewed By: Chandler Carruth
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228669 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed this fields were never used in r228607, but I neglected to
propagate that into `MDTemplateParameter` until now. This really should
have been done before commit in r228640; sorry for the churn.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228652 91177308-0d34-0410-b5e6-96231b3b80d8
Add specialized debug info metadata nodes that match the `DIDescriptor`
wrappers (used by `DIBuilder`) closely. Assembly and bitcode support to
follow soon (it'll mostly just be obvious), but this sketches in today's
schema. This is the first big commit (well, the only *big* one aside
from the testcase changes that'll come when I move this into place) for
PR22464.
I've marked a bunch of obvious changes as `TODO`s in the source; I plan
to make those changes promptly after this hierarchy is moved underneath
`DIDescriptor`, but for now I'm aiming mostly to match the status quo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228640 91177308-0d34-0410-b5e6-96231b3b80d8
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.
Part of PR22495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228631 91177308-0d34-0410-b5e6-96231b3b80d8
I just realized that the specialized metadata node patch I'm about to
commit won't compile on old compilers. Bump `hash_combine()`'s support
for non-variadic templates to 18 (I tested this by reversing the logic
in the #ifdef).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228629 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It's important that our users immediately know what gc.safepoint_poll
is. Also fix the style of the declaration of CreateGCStatepoint, in
preparation for another change that will wrap it.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7517
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228626 91177308-0d34-0410-b5e6-96231b3b80d8
`DIExpression` deals with `uint64_t`, so it doesn't make sense that
`createExpression()` is created from `int64_t`. Switch to `uint64_t` to
unify them.
I've temporarily left in the `int64_t` version, which forwards to the
`uint64_t` version. I'll delete it once I've updated the callers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228619 91177308-0d34-0410-b5e6-96231b3b80d8
5 minutes is an eternity, so try to strike a better balance between
waiting long enough for any reasonable module build and not so long that
users kill the process because they think it's hanging.
Also give the client a way to delete the lock file after a timeout.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228603 91177308-0d34-0410-b5e6-96231b3b80d8
As far as I can tell r228568 was the right workaround, and r228567 was
unnecessary. If reverting this causes problems on the bots I'll reinstate it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228585 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently gcc-4.7.2 is touchy about 'this' appearing in a lambda capture list
along with other captures. I've rewritten my captures to try to avoid the issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228567 91177308-0d34-0410-b5e6-96231b3b80d8
This patch refactors a key piece of the Orc APIs: It removes the
*::getSymbolAddress and *::lookupSymbolAddressIn methods, which returned target
addresses (uint64_ts), and replaces them with *::findSymbol and *::findSymbolIn
respectively, which return instances of the new JITSymbol type. Unlike the old
methods, calling findSymbol or findSymbolIn does not cause the symbol to be
immediately materialized when found. Instead, the symbol will be materialized
if/when the getAddress method is called on the returned JITSymbol. This allows
us to query for the existence of symbols without actually materializing them. In
the future I expect more information to be attached to the JITSymbol class, for
example whether the returned symbol is a weak or strong definition. This will
allow us to properly handle weak symbols and multiple definitions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228557 91177308-0d34-0410-b5e6-96231b3b80d8
Dumping a symbol often requires access to data that isn't inside
the symbol hierarchy, but which is only accessible through the
top-level session. This patch is a pure interface change to give
symbols a reference to the session.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228542 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7490
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228525 91177308-0d34-0410-b5e6-96231b3b80d8
Gather and Scatter are new introduced intrinsics, comming after recently implemented masked load and store.
This is the first patch for Gather and Scatter intrinsics. It includes only the syntax, parsing and verification.
Gather and Scatter intrinsics allow to perform multiple memory accesses (read/write) in one vector instruction.
The intrinsics are not target specific and will have the following syntax:
Gather:
declare <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1> <mask>, <16 x i32> <passthru>)
declare <8 x float> @llvm.masked.gather.v8f32(<8 x float*><vector of ptrs>, i32 <alignment>, <8 x i1> <mask>, <8 x float><passthru>)
Scatter:
declare void @llvm.masked.scatter.v8i32(<8 x i32><vector value to be stored> , <8 x i32*><vector of ptrs> , i32 <alignment>, <8 x i1> <mask>)
declare void @llvm.masked.scatter.v16i32(<16 x i32> <vector value to be stored> , <16 x i32*> <vector of ptrs>, i32 <alignment>, <16 x i1><mask> )
Vector of ptrs - a set of source/destination addresses, to load/store the value.
Mask - switches on/off vector lanes to prevent memory access for switched-off lanes
vector of ptrs, value and mask should have the same vector width.
These are code examples where gather / scatter should be used and will allow function vectorization
;void foo1(int * restrict A, int * restrict B, int * restrict C) {
; for (int i=0; i<SIZE; i++) {
; A[i] = B[C[i]];
; }
;}
;void foo3(int * restrict A, int * restrict B) {
; for (int i=0; i<SIZE; i++) {
; A[B[i]] = i+5;
; }
;}
Tests will come in the following patches, with CodeGen and Vectorizer.
http://reviews.llvm.org/D7433
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228521 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements a few of the optional suggestions from the
initial patch comitting libpdb. In particular, it implements a
virtual function out of line for each of the concrete classes.
A few other minor cleanups exist as well, such as using override
instead of virtual, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228516 91177308-0d34-0410-b5e6-96231b3b80d8
Composing DenseMaps and SmallVectors is still somewhat suboptimal,
but this at least halves the size of the vector elements. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228497 91177308-0d34-0410-b5e6-96231b3b80d8
This resolves the strange effect that emplace_back is only available
when the type contained in the vector is not trivially copyable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228496 91177308-0d34-0410-b5e6-96231b3b80d8
Use definition file for `DW_VIRTUALITY_*`. Add a `DW_VIRTUALITY_max`
both for ease of testing and for future use by the `LLParser`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228473 91177308-0d34-0410-b5e6-96231b3b80d8
This change resubmits the patch that broke the build, this time
without unittests. The unittests will be submitted separately
after the problem has been addressed:
--Original Commit Message--
Create lib/DebugInfo/PDB.
This patch creates a platform-independent interface to a PDB reader.
There is currently no implementation of this interface, which will
be provided in future patches. This defines the basic object model
which any implementation must conform to.
Reviewed by: David Blaikie
Differential Revision: http://reviews.llvm.org/D7356
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228435 91177308-0d34-0410-b5e6-96231b3b80d8
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228434 91177308-0d34-0410-b5e6-96231b3b80d8
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.
It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).
Example:
float foo(float *a, float b) {
float r;
if (a[1] * b)
r = /* a lot of expensive computations */;
else
r = 1;
return r;
}
float boo(float *a) {
return foo(a, 0.0);
}
Without this patch, we don't inline 'foo' into 'boo'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228432 91177308-0d34-0410-b5e6-96231b3b80d8
This patch creates a platform-independent interface to a PDB reader.
There is currently no implementation of this interface, which will
be provided in future patches. This defines the basic object model
which any implementation must conform to.
Reviewed by: David Blaikie
Differential Revision: http://reviews.llvm.org/D7356
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228428 91177308-0d34-0410-b5e6-96231b3b80d8
This was a trivial think-o, but it's in a method of a templated class
and doesn't have any callers yet, so the compiler let it pass. I hope
to add a unit test to cover this soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228425 91177308-0d34-0410-b5e6-96231b3b80d8
by using a segment set.
The patch addresses a compile-time performance regression in the LiveIntervals
analysis pass (see http://llvm.org/bugs/show_bug.cgi?id=18580). This regression
is especially critical when compiling long functions. Our analysis had shown
that the most of time is taken for generation of live intervals for physical
registers. Insertions in the middle of the array of live ranges cause quadratic
algorithmic complexity, which is apparently the main reason for the slow-down.
Overview of changes:
- The patch introduces an additional std::set<Segment>* member in LiveRange for
storing segments in the phase of initial creation. The set is used if this
member is not NULL, otherwise everything works the old way.
- The set of operations on LiveRange used during initial creation (i.e. used by
createDeadDefs and extendToUses) have been reimplemented to use the segment
set if it is available.
- After a live range is created the contents of the set are flushed to the
segment vector, because the set is not as efficient as the vector for the
later uses of the live range. After the flushing, the set is deleted and
cannot be used again.
- The set is only for live ranges computed in
LiveIntervalAnalysis::computeLiveInRegUnits() and getRegUnit() but not in
computeVirtRegs(), because I did not bring any performance benefits to
computeVirtRegs() and for some examples even brought a slow down.
Patch by Vaidas Gasiunas <vaidas.gasiunas@sap.com>
Differential Revision: http://reviews.llvm.org/D6013
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228421 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow it to be shared with the new Loop Distribution pass.
getFirstInst is currently duplicated across LoopVectorize.cpp and
LoopAccessAnalysis.cpp. This is a short-term work-around until we figure out
a better solution.
NFC. (The code moved is adjusted a bit for the name of the Loop member and
that PtrRtCheck is now a reference rather than a pointer.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228418 91177308-0d34-0410-b5e6-96231b3b80d8
Since testing the function indirectly is tricky, introduce a direct
print-memderefs pass, in the same spirit as print-memdeps, which prints
dereferenceability information matched by FileCheck.
Differential Revision: http://reviews.llvm.org/D7075
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228369 91177308-0d34-0410-b5e6-96231b3b80d8
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."
That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).
The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.
It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up. If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion. At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.
This optimization is enabled unconditionally on X86.
Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).
Also note that the existing combine that forms extloads is now also
enabled on legal vectors. This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.
Differential Revision: http://reviews.llvm.org/D6904
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228325 91177308-0d34-0410-b5e6-96231b3b80d8
The node is still defined oddly so that the
address spaces are not operands and not accessible
from tablegen, but as-is this can now be used to write
a ComplexPattern with an addrspacecast root node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228270 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: t.p.northover, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D6936
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228263 91177308-0d34-0410-b5e6-96231b3b80d8
Add some API to `APSInt` to make it easier to compare with `int64_t`.
- `APSInt::compareValues(APSInt, APSInt)` returns 1, -1 or 0 for
greater, lesser, or equal, doing the right thing for mismatched
"has-sign" and bitwidths. This is just like `isSameValue()` (and is
now the implementation of it).
- `APSInt::get(int64_t)` gets a signed `APSInt`.
- `operator<(int64_t)`, etc., are implemented trivially via `get()`
and `compareValues()`.
- Also added `APSInt::getUnsigned(uint64_t)` to make it easier to test
`compareValues()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228239 91177308-0d34-0410-b5e6-96231b3b80d8
This used to do something when we modeled the Cygwin and MinGW
environments as distinct OSs, but now it is not needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228229 91177308-0d34-0410-b5e6-96231b3b80d8
In case CSE reuses a previoulsy unused register the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.
This fixes PR22439 and the corresponding rdar://19694987
Differential Revision: http://reviews.llvm.org/D7395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228178 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change allows users to create SpecialCaseList objects from
multiple local files. This is needed to implement a proper support
for -fsanitize-blacklist flag (allow users to specify multiple blacklists,
in addition to default blacklist, see PR22431).
DFSan can also benefit from this change, as DFSan instrumentation pass now
accepts ABI-lists both from -fsanitize-blacklist= and -mllvm -dfsan-abilist flags.
Go bindings are fixed accordingly.
Test Plan: regression test suite
Reviewers: pcc
Subscribers: llvm-commits, axw, kcc
Differential Revision: http://reviews.llvm.org/D7367
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228155 91177308-0d34-0410-b5e6-96231b3b80d8
This pass is responsible for figuring out where to place call safepoints and safepoint polls. It doesn't actually make the relocations explicit; that's the job of the RewriteStatepointsForGC pass (http://reviews.llvm.org/D6975).
Note that this code is not yet finalized. Its moving in tree for incremental development, but further cleanup is needed and will happen over the next few days. It is not yet part of the standard pass order.
Planned changes in the near future:
- I plan on restructuring the statepoint rewrite to use the functions add to the IRBuilder a while back.
- In the current pass, the function "gc.safepoint_poll" is treated specially but is not an intrinsic. I plan to make identifying the poll function a property of the GCStrategy at some point in the near future.
- As follow on patches, I will be separating a collection of test cases we have out of tree and submitting them upstream.
- It's not explicit in the code, but these two patches are introducing a new state for a statepoint which looks a lot like a patchpoint. There's no a transient form which doesn't yet have the relocations explicitly represented, but does prevent reordering of memory operations. Once this is in, I need to update actually make this explicit by reserving the 'unused' argument of the statepoint as a flag, updating the docs, and making the code explicitly check for such a thing. This wasn't really planned, but once I split the two passes - which was done for other reasons - the intermediate state fell out. Just reminds us once again that we need to merge statepoints and patchpoints at some point in the not that distant future.
Future directions planned:
- Identifying more cases where a backedge safepoint isn't required to ensure timely execution of a safepoint poll.
- Tweaking the insertion process to generate easier to optimize IR. (For example, investigating making SplitBackedge) the default.
- Adding opt-in flags for a GCStrategy to use this pass. Once done, add this pass to the actual pass ordering.
Differential Revision: http://reviews.llvm.org/D6981
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228090 91177308-0d34-0410-b5e6-96231b3b80d8
Creating empty and expansion regions is awkward with the current API.
Expose static methods to make this simpler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228075 91177308-0d34-0410-b5e6-96231b3b80d8
Also re-implements the `dwarf::Tag` enumerator. I've moved the mock
tags into the enumerator since there's no other way to do this. Really
they shouldn't be used at all (they're just a hack to identify
`MDNode`s, but we have a class hierarchy for that now).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228030 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,
LLVM unrolls
#pragma unroll
foo (int i = 0; i < 3; ++i) {
sum += foo((b + i) * s);
}
into
sum += foo(b * s);
sum += foo((b + 1) * s);
sum += foo((b + 2) * s);
However, no optimizations yet reduce the internal redundancy of the three
expressions:
b * s
(b + 1) * s
(b + 2) * s
With SLSR, LLVM can optimize these three expressions into:
t1 = b * s
t2 = t1 + s
t3 = t2 + s
This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.
Test Plan: test/StraightLineStrengthReduce/slsr.ll
Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick
Reviewed By: jholewinski, atrick
Subscribers: karthikthecool, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7310
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228016 91177308-0d34-0410-b5e6-96231b3b80d8
lto_codegen_compile_optimized. Also add lto_api_version.
Before this commit, we can only dump the optimized bitcode after running
lto_codegen_compile, but it includes some impacts of running codegen passes,
one example is StackProtector pass. We will get assertion failure when running
llc on the optimized bitcode, because StackProtector is effectively run twice.
After splitting lto_codegen_compile, the linker can choose to dump the bitcode
before running lto_codegen_compile_optimized.
lto_api_version is added so ld64 can check for runtime-availability of the new
API.
rdar://19565500
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228000 91177308-0d34-0410-b5e6-96231b3b80d8
LoopVectorizationLegality::{getNumLoads,getNumStores} should forward to
LoopAccessAnalysis now.
Thanks to Takumi for noticing this!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227992 91177308-0d34-0410-b5e6-96231b3b80d8
The PBQP::RegAlloc::MatrixMetadata class assumes that matrices have at least two
rows/columns (for the spill option plus at least one physreg). This patch
ensures that that invariant is met by pre-spilling vregs that have no physreg
options so that no node (and no corresponding edges) need be added to the PBQP
graph.
This fixes a bug in an out-of-tree target that was identified by Jonas Paulsson.
Thanks for tracking this down Jonas!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227942 91177308-0d34-0410-b5e6-96231b3b80d8
This is still kind of a weird API, but dropping the (partial) update
of the passed in CoverageMappingRecord makes it a little easier to
understand and use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227900 91177308-0d34-0410-b5e6-96231b3b80d8
Allow `GenericDebugNode` construction directly from `MDString`, rather
than requiring `StringRef`s. I've refactored the `StringRef`
constructors to use these. There's no real functionality change here,
except for exposing the lower-level API.
The purpose of this is to simplify construction of string operands when
reading bitcode. It's unnecessarily indirect to parse an `MDString` ID,
lookup the `MDString` in the bitcode reader list, get the `StringRef`
out of that, and then have `GenericDebugNode::getImpl()` use
`MDString::get()` to acquire the original `MDString`. Instead, this
allows the bitcode reader to directly pass in the `MDString`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227848 91177308-0d34-0410-b5e6-96231b3b80d8
Move debug-info-centred `Metadata` subclasses into their own
header/source file. A couple of private template functions are needed
from both `Metadata.cpp` and `DebugInfoMetadata.cpp`, so I've moved them
to `lib/IR/MetadataImpl.h`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227835 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to the C++14 void specializations of these templates, useful as
a stop-gap until LLVM switches to '14.
Example use-cases in tblgen because I saw some functors that looked like
they could be simplified/refactored.
Reviewers: dexonsmith
Differential Revision: http://reviews.llvm.org/D7324
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227828 91177308-0d34-0410-b5e6-96231b3b80d8
finalization time.
As currently implemented, RuntimeDyldELF requires the original object
file to be avaible when relocations are being resolved. This patch
ensures that the ObjectLinkingLayer preserves it until then. In the
future RuntimeDyldELF should be rewritten to remove this requirement, at
which point this patch can be reverted.
Regression test cases for Orc (which include coverage of this bug) will
be committed shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227778 91177308-0d34-0410-b5e6-96231b3b80d8
Other than moving code and adding the boilerplate for the new files, the code
being moved is unchanged.
There are a few global functions that are shared with the rest of the
LoopVectorizer. I moved these to the new module as well (emitLoopAnalysis,
stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used
by emitLoopAnalysis. There is probably room for further improvement in this
area.
I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with
emitOptimizationRemarkAnalysis. This will obviously have to change.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227756 91177308-0d34-0410-b5e6-96231b3b80d8
VectorUtils.h needs to be included in LoopAccessAnalysis.cpp for
getIntrinsicIDForCall but hasVectorInstrinsicScalarOpd is not used by this
module.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227753 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
(Re-commit of r227728)
Differential Revision: http://reviews.llvm.org/D6789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227752 91177308-0d34-0410-b5e6-96231b3b80d8
now that we have a correct and cached subtarget specific to the
function.
Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.
The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we haved a cached subtarget, use it.
This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227738 91177308-0d34-0410-b5e6-96231b3b80d8
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.
Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.
This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227737 91177308-0d34-0410-b5e6-96231b3b80d8
TargetIRAnalysis access path directly rather than implementing getTTI.
This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.
As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but its trivial for
them to add it now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227735 91177308-0d34-0410-b5e6-96231b3b80d8
null.
For some reason some of the original TTI code supported a null target
machine. This seems to have been legacy, and I made matters worse when
refactoring this code by spreading that pattern further through the
various targets.
The TargetMachine can't actually be null, and it doesn't make sense to
support that use case. I've now consistently removed it and removed all
of the code trying to cope with that situation. This is probably good,
as several targets *didn't* cope with it being null despite the null
default argument in their constructors. =]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227734 91177308-0d34-0410-b5e6-96231b3b80d8
terms of the new pass manager's TargetIRAnalysis.
Yep, this is one of the nicer bits of the new pass manager's design.
Passes can in many cases operate in a vacuum and so we can just nest
things when convenient. This is particularly convenient here as I can
now consolidate all of the TargetMachine logic on this analysis.
The most important change here is that this pushes the function we need
TTI for all the way into the TargetMachine, and re-creates the TTI
object for each function rather than re-using it for each function.
We're now prepared to teach the targets to produce function-specific TTI
objects with specific subtargets cached, etc.
One piece of feedback I'd love here is whether its worth renaming any of
this stuff. None of the names really seem that awesome to me at this
point, but TargetTransformInfoWrapperPass is particularly ... odd.
TargetIRAnalysisWrapper might make more sense. I would want to do that
rename separately anyways, but let me know what you think.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227731 91177308-0d34-0410-b5e6-96231b3b80d8
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227730 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the transformation introduced in r223757 into a separate MI pass.
This allows it to cover many more cases (not only cases where there must be a
reserved call frame), and perform rudimentary call folding. It still doesn't
have a heuristic, so it is enabled only for optsize/minsize, with stack
alignment <= 8, where it ought to be a fairly clear win.
Differential Revision: http://reviews.llvm.org/D6789
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227728 91177308-0d34-0410-b5e6-96231b3b80d8
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227726 91177308-0d34-0410-b5e6-96231b3b80d8
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227725 91177308-0d34-0410-b5e6-96231b3b80d8
over declarations.
This is both quite unproductive and causes things to crash, for example
domtree would just assert.
I've added a declaration and a domtree run to the basic high-level tests
for the new pass manager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227724 91177308-0d34-0410-b5e6-96231b3b80d8
produce it.
This adds a function to the TargetMachine that produces this analysis
via a callback for each function. This in turn faves the way to produce
a *different* TTI per-function with the correct subtarget cached.
I've also done the necessary wiring in the opt tool to thread the target
machine down and make it available to the pass registry so that we can
construct this analysis from a target machine when available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227721 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.
This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.
Test Plan: test/CodeGen/NVPTX/nounroll.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7041
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227703 91177308-0d34-0410-b5e6-96231b3b80d8
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.
This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.
I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.
With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227685 91177308-0d34-0410-b5e6-96231b3b80d8
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for adding PDB support to LLVM, this moves the
DWARF parsing code to its own subdirectory under DebugInfo, and
renames LLVMDebugInfo to LLVMDebugInfoDWARF.
This is purely a mechanical / build system change.
Differential Revision: http://reviews.llvm.org/D7269
Reviewed by: Eric Christopher
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227586 91177308-0d34-0410-b5e6-96231b3b80d8
analyses back into the LTO code generator.
The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.
This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227576 91177308-0d34-0410-b5e6-96231b3b80d8
incarnation of target transform info.
This is in preparation for starting to redesign TTI to be amenable to
the new PM world.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227525 91177308-0d34-0410-b5e6-96231b3b80d8
between the linker's TLS optimizations and Clang's TLS code generation.
For now, Clang has been changed to disable linker TLS optimizations
until it (and LLVM more generally) are emitting TLS code sequences
compatible with the old bugs found in the linkers. That's a better fix
to handle bootstrapping on that platform.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227511 91177308-0d34-0410-b5e6-96231b3b80d8
Any code creating an MCSectionELF knows ELF and already provides the flags.
SectionKind is an abstraction used by common code that uses a plain
MCSection.
Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227476 91177308-0d34-0410-b5e6-96231b3b80d8
to get a powerpc64 host so that I can reproduce and test this, but it
only impacts that platform so trying the only other realistic option.
According to Ulrich, who debugged this initially, initial-exec is likely
to be sufficient for our needs and not subject to this bug. Will watch
the build bots to see.
If this doesn't work, I'll be forced to cut a really ugly pthread-based
approach into the primary user (our stack trace printing) as that user
cannot use the ThreadLocal implementation due to lifetime issues.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227414 91177308-0d34-0410-b5e6-96231b3b80d8
entirely when threads are not enabled. This should allow anyone who
needs to bootstrap or cope with a host loader without TLS support to
limp along without threading support.
There is still some bug in the PPC TLS stuff that is not worked around.
I'm getting access to a machine to reproduce and debug this further.
There is some chance that I'll have to add a terrible workaround for
PPC.
There is also some problem with iOS, but I have no ability to really
evaluate what the issue is there. I'm leaving it to folks maintaining
that platform to suggest a path forward -- personally I don't see any
useful path forward that supports threading in LLVM but does so without
support for *very basic* TLS. Note that we don't need more than some
pointers, and we don't need constructors, destructors, or any of the
other fanciness which remains widely unimplemented.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227411 91177308-0d34-0410-b5e6-96231b3b80d8
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.
Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors. Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception. Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7216
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227405 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by: Igor Laevsky <igor@azulsystems.com>
"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."
Differential Revision: http://reviews.llvm.org/D7157
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227390 91177308-0d34-0410-b5e6-96231b3b80d8
Parsed DIEs are stored in a vector and that makes it easy to get their
indices. Having easy access to a DIE's index makes it possible to use
arrays or vectors to efficiently store/access DIE related information.
There's no test for that new functionality (I don't see how to test
it standalone), but it'll be used in a subsequent dsymutil commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227381 91177308-0d34-0410-b5e6-96231b3b80d8
This is a refactoring to restructure the single user of performCustomLowering as a specific lowering pass and remove the custom lowering hook entirely.
Before this change, the LowerIntrinsics pass (note to self: rename!) was essentially acting as a pass manager, but without being structured in terms of passes. Instead, it proxied calls to a set of GCStrategies internally. This adds a lot of conceptual complexity (i.e. GCStrategies are stateful!) for very little benefit. Since there's been interest in keeping the ShadowStackGC working, I extracting it's custom lowering pass into a dedicated pass and just added that to the pass order. It will only run for functions which opt-in to that gc.
I wasn't able to find an easy way to preserve the runtime registration of custom lowering functionality. Given that no user of this exists that I'm aware of, I made the choice to just remove that. If someone really cares, we can look at restoring it via dynamic pass registration in the future.
Note that despite the large diff, none of the lowering code actual changes. I added the framing needed to make it a pass and rename the class, but that's it.
Differential Revision: http://reviews.llvm.org/D7218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227351 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The primary goal of this patch is to remove the need for MarkOptionsChanged(). That goal is accomplished by having addOption and removeOption properly sort the options.
This patch puts the new add and remove functionality on a CommandLineParser class that is a placeholder. Some of the functionality in this class will need to be merged into the OptionRegistry, and other bits can hopefully be in a better abstraction.
This patch also removes the RegisteredOptionList global, and the need for cl::Option objects to be linked list nodes.
The changes in CommandLineTest.cpp are required because these changes shift when we validate that options are not duplicated. Before this change duplicate options were only found during certain cl API calls (like cl::ParseCommandLine). With this change duplicate options are found during option construction.
Reviewers: dexonsmith, chandlerc, pete
Reviewed By: pete
Subscribers: pete, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7132
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227345 91177308-0d34-0410-b5e6-96231b3b80d8
querying of the pass registry.
The pass manager relies on the static registry of PassInfo objects to
perform all manner of its functionality. I don't understand why it does
much of this. My very vague understanding is that this registry is
touched both during static initialization *and* while each pass is being
constructed. As a consequence it is hard to make accessing it not
require a acquiring some lock. This lock ends up in the hot path of
setting up, tearing down, and invaliditing analyses in the legacy pass
manager.
On most systems you can observe this as a non-trivial % of the time
spent in 'ninja check-llvm'. However, I haven't really seen it be more
than 1% in extreme cases of compiling more real-world software,
including LTO.
Unfortunately, some of the GPU JITs are seeing this taking essentially
all of their time because they have very small IR running through
a small pass pipeline very many times (at least, this is the vague
understanding I have of it).
This patch tries to minimize the cost of looking up PassInfo objects by
leveraging the fact that the objects themselves are immutable and they
are allocated separately on the heap and so don't have their address
change. It also requires a change I made the last time I tried to debug
this problem which removed the ability to de-register a pass from the
registry. This patch creates a single access path to these objects
inside the PMTopLevelManager which memoizes the result of querying the
registry. This is somewhat gross as I don't really know if
PMTopLevelManager is the *right* place to put it, and I dislike using
a mutable member to memoize things, but it seems to work.
For long-lived pass managers this should completely eliminate
the cost of acquiring locks to look into the pass registry once the
memoized cache is warm. For 'ninja check' I measured about 1.5%
reduction in CPU time and in total time on a machine with 32 hardware
threads. For normal compilation, I don't know how much this will help,
sadly. We will still pay the cost while we populate the memoized cache.
I don't think it will hurt though, and for LTO or compiles with many
small functions it should still be a win. However, for tight loops
around a pass manager with many passes and small modules, this will help
tremendously. On the AArch64 backend I saw nearly 50% reductions in time
to complete 2000 cycles of spinning up and tearing down the pipeline.
Measurements from Owen of an actual long-lived pass manager show more
along the lines of 10% improvements.
Differential Revision: http://reviews.llvm.org/D7213
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227299 91177308-0d34-0410-b5e6-96231b3b80d8
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
- the predicate is olt or uge
- the first operand is provably not less than zero
- the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..
http://reviews.llvm.org/D6972
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227298 91177308-0d34-0410-b5e6-96231b3b80d8