The reading of 64 bit values could still be optimized, but at least this cuts
down on the number of virtual calls to fetch more data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221865 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r221836.
The tests are asserting on some buildbots. This also reverts the
test part of r221837 as it relies on dwarfdump dumping the
accelerator tables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221842 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids an issue where AtEndOfStream mistakenly returns true at the /start/ of
a stream.
(In the rare case that the size is known and actually 0, the slow path will still
handle it correctly.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221840 91177308-0d34-0410-b5e6-96231b3b80d8
The class used for the dump only allows to dump for the moment, but
it can (and will) be easily extended to support search also.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221836 91177308-0d34-0410-b5e6-96231b3b80d8
Currently FormValues are only used for attributes of DIEs and thus
uers always have a CU lying around when calling into the FormValue
API.
Accelerator tables encode their information using the same Forms
as the attributes, thus it is natural to use DWARFFormValue to
extract/dump them. There is no CU in that case though. Allow the
API to be called with a null CU arguemnt by making the RelocMap
lookup conditional on the CU pointer validity. And document this
new behvior in the header. (Test coverage for this use of the API
comes in the DwarfAccelTable support patch)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221835 91177308-0d34-0410-b5e6-96231b3b80d8
One of them (__memcpy_chk) was already there, the others were checked
by comparing function names.
Note that the fortified libfuncs are now part of TLI, but are always
available, because they aren't generated, only optimized into the
non-checking versions.
Differential Revision: http://reviews.llvm.org/D6179
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221817 91177308-0d34-0410-b5e6-96231b3b80d8
Returning more information will allow BitstreamReader to be simplified a bit
and changed to read 64 bits at a time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221794 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Large-model was added first. With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This
generates more optimal code, for shared libraries with less than about 16380
data objects.
Test Plan: Test cases added or updated
Reviewers: joerg, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, mcrosier, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D5399
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221791 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.
New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td. These are patterned after the similar
intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.
At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp. This will open up more
optimization opportunities while still allowing the correct
instructions to be generated. (Similar code exists for aligned
Altivec loads and stores.)
The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().
There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number. It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.
I ended up having to modify vsx-fma-m.ll. It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it. The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands. Commutativity allows
either to be chosen. If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result. Hopefully this doesn't change often.
There is a companion patch for Clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221767 91177308-0d34-0410-b5e6-96231b3b80d8
Every MemoryObject is a StreamableMemoryObject since the removal of
StringRefMemoryObject, so just merge the two.
I will clean up the MemoryObject interface in the upcoming commits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221766 91177308-0d34-0410-b5e6-96231b3b80d8
A subtle bug was found where attempting to copy a non-const function_ref
lvalue would actually invoke the generic forwarding constructor (as it
was a closer match - being T& rather than the const T& of the implicit
copy constructor). In the particular case this lead to a dangling
function_ref member (since it had referenced the function_ref passed by
value to its ctor, rather than the outer function_ref that was still
alive)
SFINAE the converting constructor to not be considered if the copy
constructor is available and demonstrate that this causes the copy to
refer to the original functor, not to the function_ref it was copied
from. (without the code change, the test would fail as Y would be
referencing X and Y() would see the result of the mutation to X, ie: 2)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221753 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t>
instead of a MemoryObject.
Even on X86 there is a maximum size an instruction can have. Given
that, it seems way simpler and more efficient to just pass an ArrayRef
to the disassembler instead of a MemoryObject and have it do a virtual
call every time it wants some extra bytes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221751 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change moves asan-coverage instrumentation
into a separate Module pass.
The other part of the change in clang introduces a new flag
-fsanitize-coverage=N.
Another small patch will update tests in compiler-rt.
With this patch no functionality change is expected except for the flag name.
The following changes will make the coverage instrumentation work with tsan/msan
Test Plan: Run regression tests, chromium.
Reviewers: nlewycky, samsonov
Reviewed By: nlewycky, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6152
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221718 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, we're going to separate metadata from the Value hierarchy. See
PR21532.
This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221711 91177308-0d34-0410-b5e6-96231b3b80d8
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893
Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221709 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.
This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.
Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.
Review: http://reviews.llvm.org/D4167
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221708 91177308-0d34-0410-b5e6-96231b3b80d8
Referencing one symbol from another in the same section does not
generally require a relocation. However, the MS linker has a feature
called /INCREMENTAL which enables incremental links. It achieves this
by creating thunks to the actual function and redirecting all
relocations to point to the thunk.
This breaks down with the old scheme if you have a function which
references, say, itself. On x86_64, we would use %rip relative
addressing to reference the start of the function from out current
position. This would lead to miscompiles because other references might
reference the thunk instead, breaking function pointer equality.
This fixes PR21520.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221678 91177308-0d34-0410-b5e6-96231b3b80d8
This adds const to a few methods that already return const references or
creates a const version when they reterun non-const references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221666 91177308-0d34-0410-b5e6-96231b3b80d8
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.
It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.
The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221548 91177308-0d34-0410-b5e6-96231b3b80d8
I.E., there is no value is having
void foo() override = 0;
If it is override it is already present in a base class. Since it is pure,
some other class will have to implement it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221537 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In addition to the usual f128 workaround, it was also necessary to provide
a means of accessing ArgListEntry::IsFixed.
Reviewers: theraven, vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6111
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221518 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Teach llvm-symbolizer about PowerPC64 ELF function descriptors. Symbols in the .opd section point to function descriptors, the first word of which is a pointer to the real function. For the purposes of symbolizing we pretend that the symbol points directly to the function.
This is enough to get decent function names in stack traces for unoptimized binaries, which fixes the sanitizer print-stack-trace test on PowerPC64 Linux.
Reviewers: kcc, willschm, samsonov
Reviewed By: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6110
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221514 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This makes PIC levels a Module flag attribute, which can be queried by the
backend. The flag is named `PIC Level`, and can have a value of:
0 - Backend-default
1 - Small-model (-fpic)
2 - Large-model (-fPIC)
These match the `-pic-level' command line argument for clang, and the value of the
preprocessor macro `__PIC__'.
Test Plan:
New flags tests specific for the 'PIC Level' module flag.
Tests to be added as part of a future commit for PowerPC, which will use this new API.
Reviewers: rafael, echristo
Reviewed By: rafael, echristo
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D5882
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221510 91177308-0d34-0410-b5e6-96231b3b80d8
The ELF symbol `st_other` field might contain additional flags besides
visibility ones. This patch implements support for some MIPS specific
flags.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221491 91177308-0d34-0410-b5e6-96231b3b80d8
Imported declarations can be DIGlobalVariables which aren't a DIScope. Today
clang (unknowingly I believe) shoehorns these into a DIScope and it all works
just because we never access the fields.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221466 91177308-0d34-0410-b5e6-96231b3b80d8
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`. To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change. This
is part of PR21433.
Note that there's a follow-up patch to clang for the API change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221375 91177308-0d34-0410-b5e6-96231b3b80d8
Change `NamedMDNode::addOperand()` to take a `Value *` instead of an
`MDNode *`. This is part of PR21433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221359 91177308-0d34-0410-b5e6-96231b3b80d8
Commit 220932 caused crash when building clang-tblgen on aarch64 debian target,
so it's blocking all daily tests.
The std::call_once implementation in pthread has bug for aarch64 debian.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221331 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
- coalescing (or any other "side effect" of reg alloc) is negative, and
instead of being derived from a spill cost, they use the block
frequency info.
- spill costs are in the [MinSpillCost:+inf( range
- register or interference costs are in [0.0:MinSpillCost( or +inf
The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.
The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221292 91177308-0d34-0410-b5e6-96231b3b80d8
We shouldn't put this kind of attribute stuff in DataTypes.h.
Leave the END_WITH_NULL name for now so I can update clang without
making build spam.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221215 91177308-0d34-0410-b5e6-96231b3b80d8
the tombstone or empty keys of a DenseMap<int64_t, T>. This patch
fixes the issue (and adds a tests case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221214 91177308-0d34-0410-b5e6-96231b3b80d8
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.
This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221203 91177308-0d34-0410-b5e6-96231b3b80d8
Unconditional noexcept support was added in the VS 2013 Nov CTP. Given
that there have been three CTPs since then, I don't think we need
careful macro magic to target that specific tech preview. Instead,
target the major release version number of 1900, which corresponds to
the as-yet unreleased VS "14".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221169 91177308-0d34-0410-b5e6-96231b3b80d8
Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of
`MDNode` to one of `Value`. Part of PR21433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221167 91177308-0d34-0410-b5e6-96231b3b80d8
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):
A symbol table entry with STB_LOCAL binding that is defined relative to one
of a group's sections, and that is contained in a symbol table section that is
not part of the group, must be discarded if the group members are discarded.
References to this symbol table entry from outside the group are not allowed.
This means that we need to use the function symbol for the relocation, not a
temporary symbol.
There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221150 91177308-0d34-0410-b5e6-96231b3b80d8
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221117 91177308-0d34-0410-b5e6-96231b3b80d8
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.
Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.
This fixes PR21445.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221069 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
CustomCallingConv is simply a CallingConv that tablegen should not generate the
implementation for. It allows regular CallingConv's to delegate to these custom
functions. This is (currently) necessary for Mips and we cannot use CCCustom
without having to adapt to the different API that CCCustom uses.
This brings us a bit closer to being able to remove
MipsCC::analyzeCallOperands and MipsCC::analyzeFormalArguments in favour of
the common implementation.
No functional change to the targets.
Depends on D3341
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: vmedic, llvm-commits
Differential Revision: http://reviews.llvm.org/D5965
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221052 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch extends the 'show' and 'merge' commands in llvm-profdata to handle
sample PGO formats. Using the 'merge' command it is now possible to convert
one sample PGO format to another.
The only format that is currently not working is 'gcc'. I still need to
implement support for it in lib/ProfileData.
The changes in the sample profile support classes are needed for the
merge operation.
Reviewers: bogner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6065
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221032 91177308-0d34-0410-b5e6-96231b3b80d8
Change `Instruction::getAllMetadata()` to modify a vector of `Value`
instead of `MDNode` and update call sites. This is part of PR21433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221027 91177308-0d34-0410-b5e6-96231b3b80d8
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.
Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221024 91177308-0d34-0410-b5e6-96231b3b80d8
Add `Instruction::getMDNode()` that casts to `MDNode` before changing
`Instruction::getMetadata()` to return `Value`. This avoids adding
`cast_or_null<MDNode>` boiler-plate throughout the code.
Part of PR21433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221023 91177308-0d34-0410-b5e6-96231b3b80d8
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.
As a workaround, add an explicit way to get the MachineInstr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221017 91177308-0d34-0410-b5e6-96231b3b80d8
We have to use _MSC_FULL_VER here as CTP 2 and earlier didn't define
noexcept to my knowledge.
Fixes build error in lib/Support/Error.cpp when inheriting from
std::error_category, which has a noexcept virtual method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221013 91177308-0d34-0410-b5e6-96231b3b80d8
The getBinary and getBuffer method now return ordinary pointers of appropriate
const-ness. Ownership is transferred by calling takeBinary(), which returns a
pair of the Binary and a MemoryBuffer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221003 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions. This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.
The LLVM portion of this is quite straightforward. There is a
companion patch for Clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220988 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
%in1 = load <2 x i32>* %addr1, align 8
%extract = extractelement <2 x i32> %in1, i32 1
%out = or i32 %extract, 1
store i32 %out, i32* %dest, align 4
ret void
}
As it is, this IR generates the following assembly on armv7:
vldr d16, [r0] @vector load
vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles
orr r0, r0, #1 @ scalar bitwise or
str r0, [r1] @ scalar store
bx lr
Whereas we could generate much faster code:
vldr d16, [r0] @ vector load
vorr.i32 d16, #0x1 @ vector bitwise or
vst1.32 {d16[1]}, [r1:32] @ vector extract + store
bx lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8
Do a better job classifying symbols. This increases the consistency
between the COFF handling code and the ELF side of things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220952 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds an llvm_call_once which is a wrapper around std::call_once on platforms where it is available and devoid of bugs. The patch also migrates the ManagedStatic mutex to be allocated using llvm_call_once.
These changes are philosophically equivalent to the changes added in r219638, which were reverted due to a hang on Win32 which was the result of a bug in the Windows implementation of std::call_once.
Reviewers: aaron.ballman, chapuni, chandlerc, rnk
Reviewed By: rnk
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D5922
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220932 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.
The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.
Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.
There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.
The new test checks that reading the same profile encoded as text or
raw, produces the same results.
Reviewers: bogner, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6000
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220915 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This helps llvm-objdump -r to print out the symbol name along
with the relocation type on x86. Adjust existing tests from checking
for "Unknown" to check for the symbol now.
Test Plan: Adjusted test/Object tests.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5987
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220866 91177308-0d34-0410-b5e6-96231b3b80d8
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.
Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.
On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D5943
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220745 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed that it was untested, and forcing it on caused some tests to fail:
LLVM :: Linker/metadata-a.ll
LLVM :: Linker/prefixdata.ll
LLVM :: Linker/type-unique-odr-a.ll
LLVM :: Linker/type-unique-simple-a.ll
LLVM :: Linker/type-unique-simple2-a.ll
LLVM :: Linker/type-unique-simple2.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/unnamed-addr1-a.ll
LLVM :: Linker/visibility1.ll
If it is to be resurrected, it has to be fixed and we should probably have a
-preserve-source command line option in llvm-mc and run tests with and without
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220741 91177308-0d34-0410-b5e6-96231b3b80d8
sets as keys into a cache of interference matrice values in the Interference
constraint adder.
Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220688 91177308-0d34-0410-b5e6-96231b3b80d8
These just delegate to the underlying vector type in the MapVector.
Also just add in some sanity unittests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220687 91177308-0d34-0410-b5e6-96231b3b80d8
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220652 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently unique_ptr'ifying NodeMetadata exposed an issue in VS where it
occasionally tries to synthesize copy constructors instead of moves. Hopefully
explicitly deleting the copy constructor and defining the move constructor will
fix this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220649 91177308-0d34-0410-b5e6-96231b3b80d8
It broke the Windows build:
[1/19] Building CXX object lib\CodeGen\CMakeFiles\LLVMCodeGen.dir\RegAllocPBQP.cpp.obj
C:\bb-win7\ninja-clang-i686-msc17-R\llvm-project\llvm\include\llvm/CodeGen/RegAllocPBQP.h(132) : error C2248: 'std::unique_ptr<_Ty>::unique_ptr' : cannot access private member declared in class 'std::unique_ptr<_Ty>'
with
[
_Ty=unsigned int []
]
D:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\memory(1600) : see declaration of 'std::unique_ptr<_Ty>::unique_ptr'
with
[
_Ty=unsigned int []
]
This diagnostic occurred in the compiler generated function 'llvm::PBQP::RegAlloc::NodeMetadata::NodeMetadata(const llvm::PBQP::RegAlloc::NodeMetadata &)'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220645 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. This just brings things more in-line with coding
standards, and makes ValuePool's functionality clearer (it's not tied to pooling
costs, and we may want to use it to hold other things in the future).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220641 91177308-0d34-0410-b5e6-96231b3b80d8
To do this, change the representation of lazy loaded functions.
The previous representation cannot differentiate between a function whose body
has been removed and one whose body hasn't been read from the .bc file. That
means that in order to drop a function, the entire body had to be read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220580 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
In post-commit review of r219442, Rafael pointed out that the comment style
of the newly introduced helper didn't follow LLVM's coding standard.
Modernize the whole file to the new standards.
Differential Revision: http://reviews.llvm.org/D5918
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220467 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220451 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently when emitting a label, a new data fragment is created for it if the
current fragment isn't a data fragment.
This change instead enqueues the label and attaches it to the next fragment
(e.g. created for the next instruction) if possible.
When bundle alignment is not enabled, this has no functionality change (it
just results in fewer extra fragments being created). For bundle alignment,
previously labels would point to the beginning of the bundle padding instead
of the beginning of the emitted instruction. This was not only less efficient
(e.g. jumping to the nops instead of past them) but also led to miscalculation
of the address of the GOT (since MC uses a label difference rather than
emitting a "." symbol).
Fixes https://code.google.com/p/nativeclient/issues/detail?id=3982
Test Plan: regression test attached
Reviewers: jvoung, eliben
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D5915
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220439 91177308-0d34-0410-b5e6-96231b3b80d8
When a call to a double-precision libm function has fast-math semantics
(via function attribute for now because there is no IR-level FMF on calls),
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.
We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.
I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.
Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.
This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850
Differential Revision: http://reviews.llvm.org/D5893
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220390 91177308-0d34-0410-b5e6-96231b3b80d8
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220341 91177308-0d34-0410-b5e6-96231b3b80d8
This requires incorporating __GNUC_PATCHLEVEL__ into our prerequisite
check, and renaming our __GNUC_PREREQ to LLVM_GNUC_PREREQ, since it is
now functionally different.
Patch by Chilledheart!
Differential Revision: http://reviews.llvm.org/D5879
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220332 91177308-0d34-0410-b5e6-96231b3b80d8
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220321 91177308-0d34-0410-b5e6-96231b3b80d8
It is just too easy to use a virtual register intead of a NodeId without a
compiler warning. This does not fix the fundamental problem, i.e. both
have the same underlying types, but increases the likelyhood to detect it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220303 91177308-0d34-0410-b5e6-96231b3b80d8
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220277 91177308-0d34-0410-b5e6-96231b3b80d8
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220256 91177308-0d34-0410-b5e6-96231b3b80d8
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types:
+ MD_nontemporal = 9, // "nontemporal"
+ MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+ MD_nonnull = 11 // "nonnull"
I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220248 91177308-0d34-0410-b5e6-96231b3b80d8
be BigEndian so the default can continue to be zero-initialized.
This is one of the prerequisites to making DataLayout a constant and
always available part of every module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220193 91177308-0d34-0410-b5e6-96231b3b80d8
this header to remove numerous formatting inconsistencies that impede
making simple changes here without large diffs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220192 91177308-0d34-0410-b5e6-96231b3b80d8
implementation.
This is good for a ~6% reduction in total compile time on the nightly test suite
when running with -regalloc=pbqp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220183 91177308-0d34-0410-b5e6-96231b3b80d8
This operation is analogous to its counterpart in DenseMap: It allows lookup
via cheap-to-construct keys (provided that getHashValue and isEqual are
implemented for the cheap key-type in the DenseMapInfo specialization).
Thanks to Chandler for the review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220168 91177308-0d34-0410-b5e6-96231b3b80d8
The only difference from r219829 is using
getOrCreateSectionSymbol(*ELFSec)
instead of
GetOrCreateSymbol(ELFSec->getSectionName())
in ELFObjectWriter which causes us to use the correct section symbol even if
we have multiple sections with the same name.
Original messages:
r219829:
Correctly handle references to section symbols.
When processing assembly like
.long .text
we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.
This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.
The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.
r219835:
Allow forward references to section symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220021 91177308-0d34-0410-b5e6-96231b3b80d8
Revert "Correctly handle references to section symbols."
Revert "Allow forward references to section symbols."
Rui found a regression I am debugging.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220010 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Backends can use setInsertFencesForAtomic to signal to the middle-end that
montonic is the only memory ordering they can accept for
stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger
ordering to fences + monotonic accesses is currently living in
SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it
for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already
exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it
does the same transformation for every backend that requires it
- As a result it is plain *unsound*, as it was apparently designed for ARM.
It happens to mostly work for the other targets because they are extremely
conservative, but Power for example had to switch to AtomicExpand to be
able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound ! This is noted
in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the litterature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0:
x.store(1);
Thread 1:
y.store(1);
Thread 2:
r1 = x.load();
r2 = y.load();
Thread 3:
r3 = y.load();
r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst.
But if they are lowered to monotonic accesses, no amount of fences can prevent it..
This patch does three things (I could cut it into parts, but then some of them
would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in
terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder.
As we saw above, this is unsound, but the best that can be done without knowing
the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default
implementation (that exactly replicates the logic of SelectionDAGBuilder, so no
functional change)
- it finally erase this logic from SelectionDAGBuilder as it is dead-code.
Ideally, each target would define its own override for emitLeading/TrailingFence
using target-specific fences, but I do not know the Sparc/Mips/XCore memory model
well enough to do this, and they appear to be dealing fine with the ARM-inspired
default expansion for now (probably because they are overly conservative, as
Power was). If anyone wants to compile fences more agressively on these
platforms, the long comment should make it clear why he should first override
emitLeading/TrailingFence.
Test Plan: make check-all, no functional change
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219957 91177308-0d34-0410-b5e6-96231b3b80d8
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().
In the simplest case, this:
y = sqrt(x * x);
becomes this:
y = fabs(x);
This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.
Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.
Differential Revision: http://reviews.llvm.org/D5787
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219944 91177308-0d34-0410-b5e6-96231b3b80d8
Clang CodeGen had a utility function for creating pointer alignment assumptions
using the @llvm.assume intrinsic. This functionality will also be needed by the
inliner (to preserve function-argument alignment attributes when inlining), so
this moves the utility function into IRBuilder where it can be used both by
Clang CodeGen and also other LLVM-level code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219875 91177308-0d34-0410-b5e6-96231b3b80d8
This CL introduces MachOObjectFile::getUuid(). This function returns an ArrayRef to the object file's UUID, or an empty ArrayRef if the object file doesn't contain an LC_UUID load command.
The new function is gonna be used by llvm-symbolizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219866 91177308-0d34-0410-b5e6-96231b3b80d8
Store `User::NumOperands` (and `MDNode::NumOperands`) in `Value`.
On 64-bit host architectures, this reduces `sizeof(User)` and all
subclasses by 8, and has no effect on `sizeof(Value)` (or, incidentally,
on `sizeof(MDNode)`).
On 32-bit host architectures, this increases `sizeof(Value)` by 4.
However, it has no effect on `sizeof(User)` and `sizeof(MDNode)`, so the
only concrete subclasses of `Value` that actually see the increase are
`BasicBlock`, `Argument`, `InlineAsm`, and `MDString`. Moreover, I'll
be shocked and confused if this causes a tangible memory regression.
This has no functionality change (other than memory footprint).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219845 91177308-0d34-0410-b5e6-96231b3b80d8
A follow-up commit will modify the memory-layout of `Value`, `User`, and
`MDNode`. First fix the comments to be doxygen-friendly (and to follow
the coding standards).
- Use "\brief" instead of "repeatedName -".
- Add a brief intro where it was missing.
- Remove duplicated comments from source files (and a couple of
noisy/trivial comments altogether).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219844 91177308-0d34-0410-b5e6-96231b3b80d8
When processing assembly like
.long .text
we were creating a new undefined symbol .text. GAS on the other hand would
handle that as a reference to the .text section.
This patch implements that by creating the section symbols earlier so that
they are visible during asm parsing.
The patch also updates llvm-readobj to print the symbol number in the relocation
dump so that the test can differentiate between two sections with the same name.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219829 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.
Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.
These changes make the bundle aligment simpler to use in the compiler, and
also better match the corresponding support in GNU as.
Reviewers: jvoung, eliben
Differential Revision: http://reviews.llvm.org/D5801
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219811 91177308-0d34-0410-b5e6-96231b3b80d8
A number of comment cleanups:
- Remove duplicated function and class names from comments.
- Remove duplicated comments from source file (some of which were
out-of-sync).
- Move any unduplicated comments from source file to header.
- Remove some noisy comments entirely (e.g., a comment for
`DIDescriptor::print()` saying "print descriptor" just gets in the
way of reading the code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219801 91177308-0d34-0410-b5e6-96231b3b80d8
Remove a strange round-trip through named metadata to assign preserved
local variables to their subprograms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219798 91177308-0d34-0410-b5e6-96231b3b80d8
Peephole optimization that generates a single conditional branch
for csinc-branch sequences like in the examples below. This is
possible when the csinc sets or clears a register based on a condition
code and the branch checks that register. Also the condition
code may not be modified between the csinc and the original branch.
Examples:
1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44
to b.<invCC>
2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44
to b.<CC>
rdar://problem/18506500
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219742 91177308-0d34-0410-b5e6-96231b3b80d8
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).
Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219741 91177308-0d34-0410-b5e6-96231b3b80d8
Eliminate library calls and intrinsic calls to fabs when the input
is a squared value.
Note that no unsafe-math / fast-math assumptions are needed for
this optimization.
Differential Revision: http://reviews.llvm.org/D5777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219717 91177308-0d34-0410-b5e6-96231b3b80d8
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219710 91177308-0d34-0410-b5e6-96231b3b80d8
v2: Add SI lowering
Add test
v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219705 91177308-0d34-0410-b5e6-96231b3b80d8
This goes with the earlier commit to remove the static destructor from ManagedStatic.cpp by controlling the allocation and de-allocation of the mutex.
Summary: This is part of the ongoing work to remove static constructors and destructors.
Reviewers: chandlerc, rnk
Reviewed By: rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5473
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219640 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new llvm_call_once function which is used by the ManagedStatic implementation to safely initialize a global to avoid static construction and destruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219638 91177308-0d34-0410-b5e6-96231b3b80d8
We have a transform that changes:
(x lshr C1) udiv C2
into:
x udiv (C2 << C1)
However, it is unsafe to do so if C2 << C1 discards any of C2's bits.
This fixes PR21255.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219634 91177308-0d34-0410-b5e6-96231b3b80d8
has been modular since r206822, and excluding it was leading to workarounds
such as the one in r219592, which this change removes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219593 91177308-0d34-0410-b5e6-96231b3b80d8
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.
NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219573 91177308-0d34-0410-b5e6-96231b3b80d8
routines and fix all of the bugs they expose.
I hit a test case that crashed even without these asserts due to passing
a non-exiting latch to the ExitingBlock parameter of the trip count
computation machinery. However, when I add the nice asserts, it turns
out we have plenty of coverage of these bugs, they just didn't manifest
in crashers.
The core problem seems to stem from an assumption that the latch *is*
the exiting block. While this is often true, and somewhat the "normal"
way to think about loops, it isn't necessarily true. The correct way to
call the trip count routines in a *generic* fashion (that is, without
a particular exit in mind) is to just use the loop's single exiting
block if it has one. The trip count can't be computed generically unless
it does. This works great for the loop vectorizer. The loop unroller
actually *wants* to select the latch when it has to chose between
multiple exits because for unrolling it is the latch trips that matter.
But if this is the desire, it needs to explicitly guard for non-exiting
latches and check for the generic trip count in that case.
I've added the asserts, and added convenience APIs for querying the trip
count generically that check for a single exit block. I've kept the APIs
consistent between computing trip count and trip multiples.
Thansk to Mark for the help debugging and tracking down the *right* fix
here!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219550 91177308-0d34-0410-b5e6-96231b3b80d8
In fact, symbolization is now expected to work only on Linux and
FreeBSD/NetBSD, where we have dl_iterate_phdr and can learn the
main executable name without argv0 (it will be possible on BSD systems
after http://reviews.llvm.org/D5693 lands). #ifdef-out the code for
all the rest Unix systems.
Reviewed in http://reviews.llvm.org/D5610
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219534 91177308-0d34-0410-b5e6-96231b3b80d8
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.
ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219517 91177308-0d34-0410-b5e6-96231b3b80d8
DW_AT_specification and DW_AT_abstract_origin resolving was only performed
on subroutine DIEs because it used the getSubroutineName method. Introduce
a more generic getName() and use it to dump the reference attributes.
Testcases have been updated to check the printed names instead of the offsets
except when the name could be ambiguous.
Reviewers: dblaikie, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5625
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219506 91177308-0d34-0410-b5e6-96231b3b80d8
We, I suppose naïvely, believed the COFF specification with regard to
auxiliary symbol records which defined sections: they specified that the
symbol value should be zero. However, dumpbin and MinGW's objdump do
not consider the symbol value as a restriction. Relaxing this allows us
to properly dump MinGW linked executables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219479 91177308-0d34-0410-b5e6-96231b3b80d8
to what we actually want ilogb implementation. This makes everything
*much* easier to deal with and is actually what we want when using it
anyways.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219474 91177308-0d34-0410-b5e6-96231b3b80d8
code using it more readable.
Also add a copySign static function that works more like the standard
function by accepting the value and sign-carying value as arguments.
No interesting logic here, but tests added to cover the basic API
additions and make sure they do something plausible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219453 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the PBQPBuilder class and its subclasses and replaces them
with a composable constraints class: PBQPRAConstraint. This allows constraints
that are only required for optimisation (e.g. coalescing, soft pairing) to be
mixed and matched.
This patch also introduces support for target writers to supply custom
constraints for their targets by overriding a TargetSubtargetInfo method:
std::unique_ptr<PBQPRAConstraints> getCustomPBQPConstraints() const;
This patch should have no effect on allocations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219421 91177308-0d34-0410-b5e6-96231b3b80d8
While getSectionContents was updated to do the right thing,
getSectionSize wasn't. Move the logic to getSectionSize and leverage it
from getSectionContents.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219391 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the Pat<>'s for the intrinsics. These are necessary because we
don't lower these intrinsics to SDNodes but match them directly. See the
rational in the previous commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219362 91177308-0d34-0410-b5e6-96231b3b80d8
This change modifies fatal signal handler used in LLVM tools.
Now it attempts to find llvm-symbolizer binary and communicates
with it in order to turn instruction addresses into
function/file/line info entries. This should significantly improve
stack traces readability in Debug builds.
This feature only works on selected platforms (including Darwin
and Linux). If the symbolization fails for some reason, signal
handler will fallback to the original behavior.
Reviewed in http://reviews.llvm.org/D5610
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219354 91177308-0d34-0410-b5e6-96231b3b80d8
One of many steps to generalize subprogram emission to both the DWO and
non-DWO sections (to emit -gmlt-like data under fission). Once the
functions are pushed down into DwarfCompileUnit some of the data
structures will be pushed at least into DwarfFile so that they can be
unique per-file, allowing emission to both files independently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219345 91177308-0d34-0410-b5e6-96231b3b80d8
There are two methods in SectionRef that can fail:
* getName: The index into the string table can be invalid.
* getContents: The section might point to invalid contents.
Every other method will always succeed and returning and std::error_code just
complicates the code. For example, a section can have an invalid alignment,
but if we are able to get to the section structure at all and create a
SectionRef, we will always be able to read that invalid alignment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219314 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We currently emit an DW_AT_APPLE_property_attribute with a value that is a
bitfield describing the various attributes applied to an ObjectiveC property.
While trying to add testing to one of my dwarfdump patches that would pretty
print that, I realized this information looks totally broken and has maybe
never been correct.
As with every DWARF info, we have some enum in Dwarf.h that describes this
attribute (enum ApplePropertyAttributes). It seems however that the attribute
value is set from another definition of these flags in Sema/DeclSpec.h (enum
ObjCPropertyAttributeKind). And these 2 enums aren't in sync.
This patch updates the Dwarf.h values to the ones we are (and have been for
a very long time) emitting. We change some publicly (and even documented
in SourceLevelDebugging.rst) values, but I doubt this could be an issue as
the information has been wrong for so long...
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5653
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219311 91177308-0d34-0410-b5e6-96231b3b80d8
After 4 years there is still no normalization library. We do support
disassembly and relocations, so it doesn't look like we need it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219308 91177308-0d34-0410-b5e6-96231b3b80d8
DWARF in COFF utilizes several relocations. Implement support for them
in RelocVisitor to support llvm-dwarfdump.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219280 91177308-0d34-0410-b5e6-96231b3b80d8
propagate. Also use the TargetSubtargetInfo and the MachineFunction
and move TargetRegisterInfo query closer to uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219273 91177308-0d34-0410-b5e6-96231b3b80d8
mach-o supports "fat" files which are a header/table-of-contents followed by a
concatenation of mach-o files built for different architectures. Currently,
MemoryBuffer has no easy way to map a subrange (slice) of a file which lld
will need to select a mach-o slice of a fat file. The new function provides
an easy way to map a slice of a file into a MemoryBuffer. Test case included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219260 91177308-0d34-0410-b5e6-96231b3b80d8
getOpenFileSlice gets passed the map size, so it makes no sense to say that
the size is volatile. The code will not even compute the size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219226 91177308-0d34-0410-b5e6-96231b3b80d8
On this file we had a mix of
* Twine
* const char *
* StringRef
The two that make sense are
* const Twine & (caller convenience)
* consc char * (that is what will eventually be passed to open.
Given that sys::fs::openFileForRead takes a "const Twine &", I picked that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219224 91177308-0d34-0410-b5e6-96231b3b80d8
Most Unix-like operating systems guarantee that the file descriptor is
closed after a call to close(2), even if close comes back with EINTR.
For these systems, calling close _again_ will either do nothing or close
some other file descriptor open(2)'d by another thread. (Linux)
However, some operating systems do not have this behavior. They require
at least another call to close(2) before guaranteeing that the
descriptor is closed. (HP-UX)
And some operating systems have an unpredictable blend of the two
behaviors! (xnu)
Avoid this disaster by blocking all signals before we call close(2).
This ensures that a signal will not be delivered to the thread and
close(2) will not give us back EINTR. We restore the signal mask once
the operation is done.
N.B. This isn't a problem on Windows, it doesn't have a notion of EINTR
because signals always get delivered to dedicated signal handling
threads.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219189 91177308-0d34-0410-b5e6-96231b3b80d8
It's possible to start a program with one (or all) of the standard file
descriptors closed. Subsequent open system calls will give the program
a low-numbered file descriptor.
This is problematic because we may believe we are writing to standard
out instead of a file.
Introduce Process::FixupStandardFileDescriptors, a helper function to
remap standard file descriptors to /dev/null if they were closed before
the program started.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219170 91177308-0d34-0410-b5e6-96231b3b80d8
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:
Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
then the high bits must conform to getBooleanContents.
So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.
Patch by deadal nix!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219141 91177308-0d34-0410-b5e6-96231b3b80d8
output of the llvm-dwarfdump and llvm-objdump report the endianness
used when the object files were generated.
Patch by Charlie Turner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219110 91177308-0d34-0410-b5e6-96231b3b80d8
group's interface to all of the implementations of that analysis group.
The groups themselves can and do manage this anyways, the pass registry
needn't involve itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219097 91177308-0d34-0410-b5e6-96231b3b80d8
pass registry.
This style of registry is somewhat questionable, but it being
non-monotonic is crazy. No one is (or should be) unloading DSOs with
passes and unregistering them here. I've checked with a few folks and
I don't know of anyone using this functionality or any important use
case where it is necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219096 91177308-0d34-0410-b5e6-96231b3b80d8
Fix http://llvm.org/PR21158 by adding a cast to unsigned long long,
so the comparison would be between two unsigned long longs instead
of bool and unsigned long long.
if (getAsUnsignedInteger(*this, Radix, ULLVal) ||
static_cast<unsigned long long>(static_cast<T>(ULLVal)) != ULLVal)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219065 91177308-0d34-0410-b5e6-96231b3b80d8
C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219014 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash. The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).
Original commit message follows.
--
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219010 91177308-0d34-0410-b5e6-96231b3b80d8
In the X86 backend, matching an address is initiated by the 'addr' complex
pattern and its friends. During this process we may reassociate and-of-shift
into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
shift into the scale of the address.
However as demonstrated by the testcase, this can trigger CSE of not only the
shift and the AND which the code is prepared for but also the underlying load
node. In the testcase this node is sitting in the RecordedNode and MatchScope
data structures of the matcher and becomes a deleted node upon CSE. Returning
from the complex pattern function, we try to access it again hitting an assert
because the node is no longer a load even though this was checked before.
Now obviously changing the DAG this late is bending the rules but I think it
makes sense somewhat. Outside of addresses we prefer and-of-shift because it
may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
example because it create a non-canonical node). We currently don't recognize
addresses during DAGCombiner where arguably this canonicalization should be
performed. On the other hand, having this in the matcher allows us to cover
all the cases where an address can be used in an instruction.
I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
for initiating the recursive CSE on users
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
is not strictly necessary since the shift is hooked into the visited user. Of
course it's safer to keep the DAG consistent at all times (e.g. for accurate
number of uses, etc.).
So rather than changing the fundamentals, I've decided to continue along the
previous patches and detect the CSE. This patch installs a very targeted
DAGUpdateListener for the duration of a complex-pattern match and updates the
matching state accordingly. (Previous patches used HandleSDNode to detect the
CSE but that's not practical here). The listener is only installed on X86.
I tested that there is no measurable overhead due to this while running
through the spec2k BC files with llc. The only thing we pay for is the
creation of the listener. The callback never ever triggers in spec2k since
this is a corner case.
Fixes rdar://problem/18206171
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219009 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The register names t4-t7 are not available in the N32 and N64 ABIs.
This patch prints a warning, when those names are used in N32/64,
along with a fix-it with the correct register names.
Patch by Vasileios Kalintiris
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5272
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218989 91177308-0d34-0410-b5e6-96231b3b80d8
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.
See PR18996.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218981 91177308-0d34-0410-b5e6-96231b3b80d8
This patch defines a new iterator for the imported symbols.
Make a change to COFFDumper to use that iterator to print
out imported symbols and its ordinals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218915 91177308-0d34-0410-b5e6-96231b3b80d8
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString. Integers are stringified and
a `\0` character is used as a separator.
Part of PR17891.
Note: I've attached my testcases upgrade scripts to the PR. If I've
just broken your out-of-tree testcases, they might help.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218914 91177308-0d34-0410-b5e6-96231b3b80d8
Every time we were adding or removing an expression when generating a
coverage mapping we were doing a linear search to try and deduplicate
the list. The indices in the list are important, so we can't just
replace it by a DenseMap entirely, but an auxilliary DenseMap for fast
lookup massively improves the performance issues I was seeing here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218892 91177308-0d34-0410-b5e6-96231b3b80d8
When the flag is given, the command prints out the COFF import table.
Currently only the import table directory will be printed.
I'm going to make another patch to print out the imported symbols.
The implementation of import directory entry iterator in
COFFObjectFile.cpp was buggy. This patch fixes that too.
http://reviews.llvm.org/D5569
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218891 91177308-0d34-0410-b5e6-96231b3b80d8
When I was preparing r218879 for commit, I removed an early return
that I decided was just noise. It wasn't. This is r218879 no-crash
edition.
This reverts commit r218881, reapplying r218879.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218887 91177308-0d34-0410-b5e6-96231b3b80d8
The Terms vector here represented a polynomial of of all possible
counters, and is used to simplify expressions when generating coverage
mapping. There are a few problems with this:
1. Keeping the vector as a member is wasteful, since we clear it every
time we use it.
2. Most expressions refer to a subset of the counters, so we end up
iterating over a large number of zeros doing nothing a lot of the
time.
This updates the user of the vector to store the terms locally, and
uses a sort and combine approach so that we only operate on counters
that are actually used in a given expression. For small cases this
makes very little difference, but in cases with a very large number of
counted regions this is a significant performance fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218879 91177308-0d34-0410-b5e6-96231b3b80d8
in the future to attach useful information about the PBQP graph (e.g. the
associated MachineFunction, pointers to regalloc passes) to the graph itself,
making that information accessible to the solver. This should also allow the
PBQPBuilder interface to be simplified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218848 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r218820. It turns out that Adrian has an
outstanding SROA patch that uses this.
I've updated it to forward to `createExpression()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218828 91177308-0d34-0410-b5e6-96231b3b80d8
I neglected to update `DIBuilder::createPieceExpression()` in r218797,
which I noticed while rebasing a patch for PR17891. On closer
inspection, it looks like dead code.
If there are any downstream users of this, you should transition to the
more general `createExpression()`. Or, we can add this back, but then
it should just forward to `createExpression()`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218820 91177308-0d34-0410-b5e6-96231b3b80d8
`DIExpression`'s elements are 64-bit integers that are stored as
`ConstantInt`. The accessors already encapsulate the storage. This
commit updates the `DIBuilder` API to also encapsulate that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218797 91177308-0d34-0410-b5e6-96231b3b80d8
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218787 91177308-0d34-0410-b5e6-96231b3b80d8
r206400 and r209442 added remarks that are disabled by default.
However, if a diagnostic handler is registered, the remarks are sent
unfiltered to the handler. This is the right behaviour for clang, since
it has its own filters.
However, the diagnostic handler exposed in the LTO API receives only the
severity and message. It doesn't have the information to filter by pass
name. For LTO, disabled remarks should be filtered by the producer.
I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool`
argument indicating whether to respect the built-in filters. This
defaults to `false`, so other consumers don't have a behaviour change,
but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`.
To make this behaviour testable, I added a `-use-diagnostic-handler`
command-line option to `llvm-lto`.
This fixes PR21108.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218784 91177308-0d34-0410-b5e6-96231b3b80d8
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8
member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess).
The functionality this provides is very specific to RTDyldMemoryManager, so it
makes sense to keep it in that class to avoid accidental re-use.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218741 91177308-0d34-0410-b5e6-96231b3b80d8
This can be used for in-place initialization of non-moveable types.
For compilers that don't support variadic templates, only up to four
arguments are supported. We can always add more, of course, but this
should be good enough until we move to a later MSVC that has full
support for variadic templates.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
Reviewed by David Blaikie.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218732 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.
The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:
__global__ void foo(int a, int b, int c, int d, int e, int n,
const int *input, int *output) {
int sum = 0;
for (int i = 0; i < n; ++i)
sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
*output = sum;
}
The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:
sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218711 91177308-0d34-0410-b5e6-96231b3b80d8
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
(i8 (int_x86_avx512_mask_pcmpeq_q_128
(v2i64 %a), (v2i64 %b), (i8 %mask))) ->
(i8 (bitcast
(v8i1 (insert_subvector undef,
(v2i1 (and (PCMPEQM %a, %b),
(extract_subvector
(v8i1 (bitcast %mask)), 0))), 0))))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8
The contract of this function seems problematic (fallback in either
direction seems like it could produce bugs in one client or another),
but here's some tests for its current behavior, at least. See the
commit/review thread of r218187 for more discussion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218626 91177308-0d34-0410-b5e6-96231b3b80d8
This takes a single argument convertible to T, and
- if the Optional has a value, returns the existing value,
- otherwise, constructs a T from the argument and returns that.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218618 91177308-0d34-0410-b5e6-96231b3b80d8
llvm::huge_valf is defined in a header file, so it is initialized
multiple times in every compiled unit upon program startup.
With non-VC compilers huge_valf is set to a HUGE_VALF which the
compiler can probably optimize out.
With VC numeric_limits<float>::infinity() does not return a number
but a runtime structure member which therotically may change
between calls so the compiler does not optimize out the
initialization and it happens many times. It can be easily seen by
placing a breakpoint on the initialization line.
This patch moves llvm::huge_valf initialization to a source file
instead of the header.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218567 91177308-0d34-0410-b5e6-96231b3b80d8