While working on a Thumb-2 code size optimization I just realized that we don't have any regression tests for it.
So here's a first test case, I plan to increase the coverage over time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216728 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r215862 due to nightly failures. Will work on getting a
reduced test case, but I wanted to get our bots green in the meantime.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216325 91177308-0d34-0410-b5e6-96231b3b80d8
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.
Most of the test suite changes are adding va_start calls to existing
tests to keep things working.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8
instruction from ARMInstrInfo to ARMBaseInstrInfo.
That way, thumb mode can also benefit from the advanced copy optimization.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216274 91177308-0d34-0410-b5e6-96231b3b80d8
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.
This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216172 91177308-0d34-0410-b5e6-96231b3b80d8
advanced copy optimization.
This is the final step patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.
<rdar://problem/12702965>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216144 91177308-0d34-0410-b5e6-96231b3b80d8
the isRegSequence property.
This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.
Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.
With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.
The goals of these two optimizations are slightly different: one rewrite the
operand of the instruction (#1), the other kills off the non-generic instruction
and replace it by a (sequence of) generic instruction(s).
Both optimizations rely on the ValueTracker introduced in r212100.
The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instruction. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.
Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).
Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>.
Review: http://reviews.llvm.org/D4874
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216088 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.
The bug was originally introduced in r211057.
Differential Revision: http://reviews.llvm.org/D4980
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216064 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.
I run it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regeressions.
Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216006 91177308-0d34-0410-b5e6-96231b3b80d8
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215890 91177308-0d34-0410-b5e6-96231b3b80d8
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI. It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.
The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.
The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215862 91177308-0d34-0410-b5e6-96231b3b80d8
FastEmit_i won't always succeed to materialize an i32 constant and just fail.
This would trigger a fall-back to SelectionDAG, which is really not necessary.
This fix will first fall-back to a constant pool load to materialize the constant
before giving up for good.
This fixes <rdar://problem/18022633>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215682 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215673 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
This patch is very similar to a fabs patch committed at r214892.
Differential Revision: http://reviews.llvm.org/D4852
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215646 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215588 91177308-0d34-0410-b5e6-96231b3b80d8
This change is also in preparation for a future change to make sure that
the constant materialization uses MOVT/MOVW when available and not a load
from the constant pool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215584 91177308-0d34-0410-b5e6-96231b3b80d8
For many Thumb-1 register register instructions, setting the CPSR is not
permitted inside an IT block. We would not correctly flag those instructions.
The previous change to identify this scenario was insufficient as it did not
actually catch all the instances. The current list is formed by manual
inspection of the ARMv6M ARM.
The change to the Thumb2 IT block test is due to the fact that the new more
stringent checking of the MIs results in the If Conversion pass being prevented
from executing (since not all the instructions in the BB are predicable). This
results in code gen changes.
Thanks to Tim Northover for pointing out that the previous patch was
insufficient and hinting that the use of the v6M ARM would be much easier to use
than the v7 or v8!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215382 91177308-0d34-0410-b5e6-96231b3b80d8
By default, LLVM uses the "C" calling convention for all runtime
library functions. The half-precision FP conversion functions use the
soft-float calling convention, and are needed for some targets which
use the hard-float convention by default, so must have their calling
convention explicitly set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215348 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM ARM states that CPSR may not be updated by a MUL in thumb mode. Due to
an ordering of Thumb 2 Size Reduction and If Conversion, we would end up
generating a THUMB MULS inside an IT block.
The If Conversion pass uses the TTI isPredicable method to ensure that it can
transform a Basic Block. However, because we only check for IT handling on
Thumb2 functions, we may miss some cases. Even then, it only validates that the
CPSR is not *live* rather than it is not accessed. This corrects the handling
for that particular case since the same restriction does not hold on the vast
majority of the instructions.
This does prevent the IfConversion optimization from kicking in in certain
cases, but generating correct code is more valuable. Addresses PR20555.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215328 91177308-0d34-0410-b5e6-96231b3b80d8
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomutes the weights of
tail-merged blocks using the following formula:
branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))
bb is a block that is in the set of merged blocks.
<rdar://problem/16256423>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215135 91177308-0d34-0410-b5e6-96231b3b80d8
Particularly on MachO, we were generating "blx _dest" instructions on M-class
CPUs, which don't actually exist. They happen to get fixed up by the linker
into valid "bl _dest" instructions (which is why such a massive issue has
remained largely undetected), but we shouldn't rely on that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214959 91177308-0d34-0410-b5e6-96231b3b80d8
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.
The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214939 91177308-0d34-0410-b5e6-96231b3b80d8
Without the 13th field, the "emission kind" field defaults to 0 (which
is not equal to either of the values of the emission kind enum (1 ==
full debug info, 2 == line tables only)).
In this particular instance, the comparison with "FullDebugInfo" was
done when adding elements to the ranges list - so for these test cases
no values were added to the ranges list.
This got weirder when emitting debug_loc entries as the addresses should
be relative to the range of the CU if the CU has only one range (the
reasonable assumption is that if we're emitting debug_loc lists for a CU
that CU has at least one range - but due to the above situation, it has
zero) so the ranges were emitted relative to the start of the section
rather than relative to the start of the CU's singular range.
Fix these tests by accounting for the difference in the description of
debug_loc entries (in some cases making the test ignorant to these
differences, in others adding the extra label difference expression,
etc) or the presence/absence of high/low_pc on the CU, and add the 13th
field to their CUs to enable proper "full debug info" emission here.
In a future commit I'll fix up a bunch of other test cases that are not
so rigorously depending on this behavior, but still doing similarly
weird things due to the missing 13th field.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214937 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r214893, re-applying r214881 with the test case relaxed a bit to
satiate the build bots.
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214928 91177308-0d34-0410-b5e6-96231b3b80d8
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.
So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64
Instead of generating something like this:
movabsq (constant pool loadi of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)
We should generate:
mov (put constant value in return register)
I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()
is the same thing as:
IntVT.isInteger()
Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.
For more background, please see:
http://reviews.llvm.org/D4770
And:
http://llvm.org/bugs/show_bug.cgi?id=20354
Differential Revision: http://reviews.llvm.org/D4785
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214892 91177308-0d34-0410-b5e6-96231b3b80d8
POP on armv4t cannot be used to change thumb state (unilke later non-m-class
architectures), therefore we need a different return sequence that uses 'bx'
instead:
POP {r3}
ADD sp, #offset
BX r3
This patch also fixes an issue where the return value in r3 would get clobbered
for functions that return 128 bits of data. In that case, we generate this
sequence instead:
MOV ip, r3
POP {r3}
ADD sp, #offset
MOV lr, r3
MOV r3, ip
BX lr
http://reviews.llvm.org/D4748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214881 91177308-0d34-0410-b5e6-96231b3b80d8
It's a bit of a tradeoff, since llvm-dwarfdump doesn't print the name of
the global symbol being used as an address in the addressing mode, but
this avoids the dependence on hardcoded set labels that keep changing
(5+ commits over the last few years that each update the set label as it
changes due to other, unrelated differences in output). This could've,
instead, been changed to match the set name then match the name in the
string pool but that would present other issues (needing to skip over
the sets that weren't of interest, etc) and checking that the addresses
(granted, without relocations applied - so it's not the whole story)
match in the two variable location descriptions seems sufficient and
fairly stable here.
There are a few similar other tests with similar label dependence that
I'll update soonish.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214878 91177308-0d34-0410-b5e6-96231b3b80d8
expanding pseudo LOAD_STATCK_GUARD using instructions that are normally used
in pic mode. This patch fixes the bug.
<rdar://problem/17886592>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214614 91177308-0d34-0410-b5e6-96231b3b80d8
This is a followup patch for r214366, which added the same behavior to the
AArch64 and X86 FastISel code. This fix reproduces the already existing
behavior of SelectionDAG in FastISel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214531 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch we had
@a = weak global ...
but
@b = alias weak ...
The patch changes aliases to look more like global variables.
Looking at some really old code suggests that the reason was that the old
bison based parser had a reduction for alias linkages and another one for
global variable linkages. Putting the alias first avoided the reduce/reduce
conflict.
The days of the old .ll parser are long gone. The new one parses just "linkage"
and a later check is responsible for deciding if a linkage is valid in a
given context.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214355 91177308-0d34-0410-b5e6-96231b3b80d8
We need to make sure we use the softened version of all appropriate operands in
the libcall, or things go horribly wrong. This may entail actually executing a
1-stage softening.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214175 91177308-0d34-0410-b5e6-96231b3b80d8
address of the stack guard was being spilled to the stack.
Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register.
<rdar://problem/12475629>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213967 91177308-0d34-0410-b5e6-96231b3b80d8
* Add CUs to the named CU node
* Add missing DW_TAG_subprogram nodes
* Add llvm::Functions to the DW_TAG_subprogram nodes
This cleans up the tests so that they don't break under a
soon-to-be-made change that is more strict about such things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213951 91177308-0d34-0410-b5e6-96231b3b80d8
assembly instructions.
This is necessary to ensure ARM assembler switches to Thumb mode before it
starts assembling the file level inline assembly instructions at the beginning
of a .s file.
<rdar://problem/17757232>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213924 91177308-0d34-0410-b5e6-96231b3b80d8
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.
This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.
This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS codegeneration is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).
I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.
With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).
Differential Revision: http://reviews.llvm.org/D4638
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213898 91177308-0d34-0410-b5e6-96231b3b80d8
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213754 91177308-0d34-0410-b5e6-96231b3b80d8
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213729 91177308-0d34-0410-b5e6-96231b3b80d8
insertions.
The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).
This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.
Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.
Differential Revision: http://reviews.llvm.org/D4616
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213727 91177308-0d34-0410-b5e6-96231b3b80d8
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.
For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,
%0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic
%1 = extractvalue { i8, i1 } %0, 1
Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.
If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213569 91177308-0d34-0410-b5e6-96231b3b80d8
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill. This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4. The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.
Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.
Correct these to permit proper lowering of __builtin_alloca at -O0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213442 91177308-0d34-0410-b5e6-96231b3b80d8
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213372 91177308-0d34-0410-b5e6-96231b3b80d8
The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted.
The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.
This fixes PR20323.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213369 91177308-0d34-0410-b5e6-96231b3b80d8
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213248 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes an issue where a local value is defined before and used after an
inline asm call with side effects.
This fix simply flushes the local value map, which updates the insertion point
for the inline asm call to be above any previously defined local values.
This fixes <rdar://problem/17694203>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213203 91177308-0d34-0410-b5e6-96231b3b80d8
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213078 91177308-0d34-0410-b5e6-96231b3b80d8
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.
Patch by Valerii Hiora.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212922 91177308-0d34-0410-b5e6-96231b3b80d8
This completes the handling for DLL import storage symbols when lowering
instructions. A DLL import storage symbol must have an additional load
performed prior to use. This is applicable to variables and functions.
This is particularly important for non-function symbols as it is possible to
handle function references by emitting a thunk which performs the translation
from the unprefixed __imp_ symbol to the proper symbol (although, this is a
non-optimal lowering). For a variable symbol, no such thunk can be
accommodated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212431 91177308-0d34-0410-b5e6-96231b3b80d8
Adds support for __builtin_arm_isb. Also corrects DSB and ISB instructions
modelling by adding has-side-effects property.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212276 91177308-0d34-0410-b5e6-96231b3b80d8
Most of this is just tests that were silently succeeding in spite of
schema changes I made over a year ago. Cleaning them up as they lead to
failures in a change I'm working on/will come soon.
test/DebugInfo/2010-01-19-DbgScope.ll was removed as it tested miscoping
where a DebugLoc described a location not in the current function. The
test case doesn't describe why this is a valid situation and should be
supported, so I'm removing it and shortly going to commit changes that
make this firmly unsupported/assert-fail.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211628 91177308-0d34-0410-b5e6-96231b3b80d8
ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit
operations should work as normal, but 64-bit ones are almost certainly
doomed.
Patch by Phoebe Buckheister.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211042 91177308-0d34-0410-b5e6-96231b3b80d8
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210917 91177308-0d34-0410-b5e6-96231b3b80d8
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.
As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.
By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.
Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.
Summary for out of tree users:
------------------------------
+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210903 91177308-0d34-0410-b5e6-96231b3b80d8
Windows on ARM uses COFF/PE which is intrinsically position independent. For
the case of 32-bit immediates, use a pair-wise relocation as otherwise we may
exceed the range of operators. This fixes a code generation crash when using
-Oz when targeting Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210814 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210640 91177308-0d34-0410-b5e6-96231b3b80d8
The armv7-windows-itanium environment is nearly identical to the MSVC ABI. It
has a few divergences, mostly revolving around the use of the Itanium ABI for
C++. VLA support is one of the extensions that are amongst the set of the
extensions.
This adds support for proper VLA emission for this environment. This is
somewhat similar to the handling for __chkstk emission on X86 and the large
stack frame emission for ARM. The invocation style for chkstk is still
controlled via the -mcmodel flag to clang.
Make an explicit note that this is an extension.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210489 91177308-0d34-0410-b5e6-96231b3b80d8
COFF/PE, so the relocation model is never static. Loosen the assertion
accordingly. The relocation can still be emitted properly, as it will be
converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader
taking the base relocation into account. This is necessary to permit the
emission of long calls which can be controlled via the -mlong-calls option in
the driver.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210399 91177308-0d34-0410-b5e6-96231b3b80d8
It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables.
This also adds backend support for generating the jump-instruction tables on ARM and X86.
Note that since the jumptable attribute creates a second function pointer for a
function, any function marked with jumptable must also be marked with unnamed_addr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210280 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.
This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like
@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
i32 ptrtoint (i32* @bar to i32)) to i32*)
An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).
Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210062 91177308-0d34-0410-b5e6-96231b3b80d8
Unordered is strictly weaker than monotonic, so if the latter doesn't have any
barriers then the former certainly shouldn't.
rdar://problem/16548260
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209901 91177308-0d34-0410-b5e6-96231b3b80d8
Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr,
followed by a wide push of the remaining registers if there are any. AAPCS uses
a single push.w instruction.
It turns out that, on average, enough registers get pushed that code is smaller
in the AAPCS prologue, which is a nice property for M-class programmers. They
also have other options available for back-traces, so can hopefully deal with
the fact that FP & LR aren't adjacent in memory.
rdar://problem/15909583
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209895 91177308-0d34-0410-b5e6-96231b3b80d8
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.
When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.
This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.
I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209883 91177308-0d34-0410-b5e6-96231b3b80d8
Cortex-M4 only has single-precision floating point support, so any LLVM
"double" type will have been split into 2 i32s by now. Fortunately, the
consecutive-register framework turns out to be precisely what's needed to
reconstruct the double and follow AAPCS-VFP correctly!
rdar://problem/17012966
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209650 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic permits the emission of platform specific undefined sequences.
ARM has reserved the 0xde opcode which takes a single integer parameter (ignored
by the CPU). This permits the operating system to implement custom behaviour on
this trap. The llvm.arm.undefined intrinsic is meant to provide a means for
generating the target specific behaviour from the frontend. This is
particularly useful for Windows on ARM which has made use of a series of these
special opcodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209390 91177308-0d34-0410-b5e6-96231b3b80d8
Although the previous code would construct a bundle and add the correct elements
to it, it would not finalise the bundle. This resulted in the InternalRead
markers not being added to the MachineOperands nor, more importantly, the
externally visible defs to the bundle itself. So, although the bundle was not
exposing the def, the generated code would be correct because there was no
optimisations being performed. When optimisations were enabled, the post
register allocator would kick in, and the hazard recognizer would reorder
operations around the load which would define the value being operated upon.
Rather than manually constructing the bundle, simply construct and finalise the
bundle via the finaliseBundle call after both MIs have been emitted. This
improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T
relocations are emitted.
The changes to the other tests are the result of the bundle generation
preventing the scheduler from hoisting the moves across the loads. The net
effect of the generated code is equivalent, but, is much more identical to what
is actually being lowered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209267 91177308-0d34-0410-b5e6-96231b3b80d8
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209123 91177308-0d34-0410-b5e6-96231b3b80d8
Windows on ARM uses R11 for the frame pointer even though the environment is a
pure Thumb-2, thumb-only environment. Replicate this behaviour to improve
Windows ABI compatibility. This register is used for fast stack walking, and
thus is part of the Windows ABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209085 91177308-0d34-0410-b5e6-96231b3b80d8
WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF
for any non-MachO platform. This was a missed site when the original change for
target format support for Windows on ARM was done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209057 91177308-0d34-0410-b5e6-96231b3b80d8
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209007 91177308-0d34-0410-b5e6-96231b3b80d8
DIBuilder maintains this invariant and the current DwarfDebug code could
end up doing weird things if it contained declarations (such as putting
the definition DIE inside a CU that contained the declaration - this
doesn't seem like a good idea, so rather than adding logic to handle
this case we'll just ban in for now & cross that bridge if we come to
it later).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209004 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208978 91177308-0d34-0410-b5e6-96231b3b80d8
Add some Windows on ARM specific library calls. These are provided by msvcrt,
and can be used to perform integer to floating-point conversions (and
vice-versa) mirroring similar functions in the RTABI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208949 91177308-0d34-0410-b5e6-96231b3b80d8
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208934 91177308-0d34-0410-b5e6-96231b3b80d8
If the function has the landingpad instruction, then the
handlerdata should be emitted even if the function has
nouwnind attribute. Otherwise, following code will not
work:
void test1() noexcept {
try {
throw_exception();
} catch (...) {
log_unexpected_exception();
}
}
Since the cantunwind was incorrectly emitted and the
LSDA is not available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208791 91177308-0d34-0410-b5e6-96231b3b80d8
The commit r208166 will cause some regression on ARM EHABI.
This fix has been committed in r208715, and an assertion failure
test case has been committed in r208770.
This commit further extends the unittest so that the actual
value in the handlerdata will be checked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208790 91177308-0d34-0410-b5e6-96231b3b80d8
This commit was already commited as revision rL208689 and discussd in
phabricator revision D3704.
But the test file was crashing on OS X and windows.
I fixed the test file in the same way as in rL208340.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208711 91177308-0d34-0410-b5e6-96231b3b80d8
Windows on ARM only supports thumb mode execution, so we have to
explicitly pick some non-Windows OS to test ARM mode codegen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208638 91177308-0d34-0410-b5e6-96231b3b80d8
The current patterns for REV16 misses mostn __builtin_bswap16() due to
legalization promoting the operands to from load/stores toi32s and then
truncing/extending them. This patch adds new patterns that catch the resultant
DAGs and codegens them to rev16 instructions. Tests included.
rdar://15353652
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208620 91177308-0d34-0410-b5e6-96231b3b80d8
Windows on ARM only supports thumb mode execution, so we have to
explicitly pick some non-Windows OS to test ARM mode codegen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208448 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support to ARM for custom lowering of the
llvm.{u|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful
for handling idiomatic saturating math functions as generated by
InstCombineCompare.
Test cases included.
rdar://14853450
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208435 91177308-0d34-0410-b5e6-96231b3b80d8
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208413 91177308-0d34-0410-b5e6-96231b3b80d8
Handle lowering of global addresses for PIC mode compilation on Windows. Always
use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and
is a pure Thumb environment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208385 91177308-0d34-0410-b5e6-96231b3b80d8
This tightens up r208351 to ensure that a store is fed with the
correct value.
Thanks to Quentin Colombet for spotting this!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208368 91177308-0d34-0410-b5e6-96231b3b80d8
When building on Windows, the default target is Windows. Windows on ARM does
not support ARM mode compilation, resulting in test failures. Simply specify a
triple to ensure that we are testing the correct behaviour.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208340 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM::BLX instruction is an ARM mode instruction. The Windows on ARM target
is limited to Thumb instructions. Correctly use the thumb mode tBLXr
instruction. This would manifest as an errant write into the object file as the
instruction is 4-bytes in length rather than 2. The result would be a corrupted
object file that would eventually result in an executable that would crash at
runtime.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208152 91177308-0d34-0410-b5e6-96231b3b80d8
remove it from the list of unspilled registers. Otherwise the following
attempt to keep the stack aligned by picking an extra GPR register to
spill will not work as it picks up r11.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208129 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
Windows on ARM does not conform to AEABI. However, memset would be emitted
using the AEABI signature, resulting in inverted parameters. Handle this
special case appropriately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207943 91177308-0d34-0410-b5e6-96231b3b80d8
This introduces the stack lowering emission of the stack probe function for
Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where
any page allocation which crosses a page boundary of the following guard page
will cause a page fault. This page fault must be handled by the kernel to
ensure that the page is faulted in. If this does not occur and a write access
any memory beyond that, the page fault will go unserviced, resulting in an
abnormal program termination.
The watermark for the stack probe appears to be at 4080 bytes (for
accommodating the stack guard canaries and stack alignment) when SSP is
enabled. Otherwise, the stack probe is emitted on the page size boundary of
4096 bytes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207615 91177308-0d34-0410-b5e6-96231b3b80d8
IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
relocation is not split up and reordered. When expanding the mov32imm
pseudo-instruction, create a bundle if the machine operand is referencing an
address. This helps ensure that the relocatable address load is not reordered
by subsequent passes.
Unfortunately, this only partially handles the case as the Constant Island Pass
occurs after the instructions are unbundled and does not properly handle
bundles. That is a more fundamental issue with the pass itself and beyond the
scope of this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207608 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions. This
functionality can now be represented as @llvm.arm.hint(i32 5).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207246 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207242 91177308-0d34-0410-b5e6-96231b3b80d8
The 'CHECK: add' line was occasionally matching against the filename,
breaking the subsequent CHECK-NOT. Also use CHECK-LABEL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206936 91177308-0d34-0410-b5e6-96231b3b80d8
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206874 91177308-0d34-0410-b5e6-96231b3b80d8
handles Intrinsic::trap if TargetOptions::TrapFuncName is set.
This fixes a bug in which the trap function was not taken into consideration
when a program was compiled without optimization (at -O0).
<rdar://problem/16291933>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206323 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.
This fixes PR18705 and <rdar://problem/15991090>.
Differential Revision: http://reviews.llvm.org/D3363
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206194 91177308-0d34-0410-b5e6-96231b3b80d8
The current memory-instruction optimization logic in CGP, which sinks parts of
the address computation that can be adsorbed by the addressing mode, does this
by explicitly converting the relevant part of the address computation into
IR-level integer operations (making use of ptrtoint and inttoptr). For most
targets this is currently not a problem, but for targets wishing to make use of
IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a
problem for two reasons:
1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr
2. In cases where type-punning was used, and BasicAA was used
to override TBAA, BasicAA may no longer do so. (this had forced us to disable
all use of TBAA in CodeGen; something which we can now enable again)
This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by
default (except for those targets that use AA during CodeGen), and so aside
from some PowerPC subtargets and SystemZ, there should be no change in
behavior. We may be able to switch completely away from the ptrtoint/inttoptr
sinking on all targets, but further testing is required.
I've doubled-up on a number of existing tests that are sensitive to the
address sinking behavior (including some store-merging tests that are
sensitive to the order of the resulting ADD operations at the SDAG level).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206092 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the -segmented-stacks command line flag in favor of a
per-function "split-stack" attribute.
Patch by Luqman Aden and Alex Crichton!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205997 91177308-0d34-0410-b5e6-96231b3b80d8
More updating of tests to be explicit about the target triple rather than
relying on the default target triple supporting ARM mode.
Indicate to lit that object emission is not yet available for Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205545 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default. This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.
Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205541 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing this via ComputeMaskedBits has two advantages:
+ It actually works. DAGISel doesn't deal with the chains properly
in the previous pattern-based solution, so they never trigger.
+ The information can be used in other DAG combines, as well as the
trivial "get rid of truncs". For example if the trunc is in a
different basic block.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205540 91177308-0d34-0410-b5e6-96231b3b80d8