This patch adds support to ARM for custom lowering of the
llvm.{u|s}add.with.overflow.i32 intrinsics for i32/i64. This is particularly useful
for handling idiomatic saturating math functions as generated by
InstCombineCompare.
Test cases included.
rdar://14853450
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208435 91177308-0d34-0410-b5e6-96231b3b80d8
When using the ARM AAPCS, HFAs (Homogeneous Floating-point Aggregates) must
be passed in a block of consecutive floating-point registers, or on the stack.
This means that unused floating-point registers cannot be back-filled with
part of an HFA, however this can currently happen. This patch, along with the
corresponding clang patch (http://reviews.llvm.org/D3083) prevents this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208413 91177308-0d34-0410-b5e6-96231b3b80d8
Handle lowering of global addresses for PIC mode compilation on Windows. Always
use the movw/movt load to load the address as Windows on ARM requires ARMv7+ and
is a pure Thumb environment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208385 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208104 91177308-0d34-0410-b5e6-96231b3b80d8
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207315 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207143 91177308-0d34-0410-b5e6-96231b3b80d8
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206874 91177308-0d34-0410-b5e6-96231b3b80d8
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.
At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206485 91177308-0d34-0410-b5e6-96231b3b80d8
Implementing this via ComputeMaskedBits has two advantages:
+ It actually works. DAGISel doesn't deal with the chains properly
in the previous pattern-based solution, so they never trigger.
+ The information can be used in other DAG combines, as well as the
trivial "get rid of truncs". For example if the trunc is in a
different basic block.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205540 91177308-0d34-0410-b5e6-96231b3b80d8
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205525 91177308-0d34-0410-b5e6-96231b3b80d8
add operation since extract_vector_elt can perform an extend operation. Get the input lane
type from the vector on which we're performing the vpaddl operation on and extend or
truncate it to the output type of the original add node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205523 91177308-0d34-0410-b5e6-96231b3b80d8
Update the subtarget information for Windows on ARM. This enables using the MC
layer to target Windows on ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205459 91177308-0d34-0410-b5e6-96231b3b80d8
We've already got versions without the barriers, so this just adds IR-level
support for generating the new v8 ones.
rdar://problem/16227836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204813 91177308-0d34-0410-b5e6-96231b3b80d8
Use the options in the ARMISelLowering to control whether tail calls are
optimised or not. Previously, this option was entirely ignored on the ARM
target and only honoured on x86.
This option is mostly useful in profiling scenarios. The default remains that
tail call optimisations will be applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203577 91177308-0d34-0410-b5e6-96231b3b80d8
This option is from 2010, designed to work around a linker issue on Darwin for
ARM. According to grosbach this is no longer an issue and this option can
safely be removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203576 91177308-0d34-0410-b5e6-96231b3b80d8
ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no
need for any code to handle them specially.
There should be no functionality change so no tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203567 91177308-0d34-0410-b5e6-96231b3b80d8
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203559 91177308-0d34-0410-b5e6-96231b3b80d8
NaCl's ARM ABI uses 16 byte stack alignment, so set that in
ARMSubtarget.cpp.
Using 16 byte alignment exposes an issue in code generation in which a
varargs function leaves a 4 byte gap between the values of r1-r3 saved
to the stack and the following arguments that were passed on the
stack. (Previously, this code only needed to support 4 byte and 8
byte alignment.)
With this issue, llc generated:
varargs_func:
sub sp, sp, #16
push {lr}
sub sp, sp, #12
add r0, sp, #16 // Should be 20
stm r0, {r1, r2, r3}
ldr r0, .LCPI0_0 // Address of va_list
add r1, sp, #16
str r1, [r0]
bl external_func
Fix the bug by checking for "Align > 4". Also simplify the code by
using OffsetToAlignment(), and update comments.
Differential Revision: http://llvm-reviews.chandlerc.com/D2677
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201497 91177308-0d34-0410-b5e6-96231b3b80d8
Similarly to the vshrn instructions, these are simple zext/sext + trunc
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201093 91177308-0d34-0410-b5e6-96231b3b80d8
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201085 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200271 91177308-0d34-0410-b5e6-96231b3b80d8
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8
The already allocatable DPair superclass contains odd-even D register
pair in addition to the even-odd pairs in the QPR register class. There
is no reason to constrain the set of D register pairs that can be used
for NEON values. Any NEON instructions that require a Q register will
automatically constrain the register class to QPR.
The allocation order for DPair begins with the QPR registers, so
register allocation is unlikely to change much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199186 91177308-0d34-0410-b5e6-96231b3b80d8
The ARM backend has been using most of the MachO related subtarget
checks almost interchangeably, and since the only target it's had to
run on has been IOS (which is all three of MachO, Darwin and IOS) it's
worked out OK so far.
But we'd like to support embedded targets under the "*-*-none-macho"
triple, which means everything starts falling apart and inconsistent
behaviours emerge.
This patch should pick a reasonably sensible set of behaviours for the
new triple (and any others that come along, with luck). Some choices
were debatable (notably FP == r7 or r11), but we can revisit those
later when deficiencies become apparent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198617 91177308-0d34-0410-b5e6-96231b3b80d8
This moves the check up into the parent class so that all targets can use it
without having to copy (and keep in sync) the same error message.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198579 91177308-0d34-0410-b5e6-96231b3b80d8
__builtin_returnaddress requires that the value passed into is be a constant.
However, at -O0 even a constant expression may not be converted to a constant.
Emit an error message intead of crashing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198531 91177308-0d34-0410-b5e6-96231b3b80d8
Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for
such selection, it needs to inverse cc and swap op1 and op2. To inverse
cc, both L/G and E bits should be flipped.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197615 91177308-0d34-0410-b5e6-96231b3b80d8
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8
These are used by MachO only at the moment, and (much like the existing
MOVW/MOVT set) work around the fact that the labels used in the actual
instructions often contain PC-dependent components, which means that repeatedly
materialising the same global can't be CSEed.
With small modifications, it could be adapted to how ELF finds the address of
_GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196090 91177308-0d34-0410-b5e6-96231b3b80d8
These are handled almost identically to static mode (and ELF's global address
materialisation), except that a symbol may have "$non_lazy_ptr" appended. This
can be handled by passing appropriate flags along with the instruction instead
of using entirely separate pseudo-instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195655 91177308-0d34-0410-b5e6-96231b3b80d8
We used to perform an invalid operation on an MVT and crash, which wasn't much
fun.
Patch by Oliver Stannard.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194714 91177308-0d34-0410-b5e6-96231b3b80d8