Commit Graph

7694 Commits

Author SHA1 Message Date
Reid Kleckner
e1a4787d5d Work around bugs in MSVC "14" CTP 3's conversion logic
It appears to ignore or find ambiguous MachineInstrBuilder's conversion
operators that allow conversion to MachineInstr* and
MachineBasicBlock::bundle_iterator.

As a workaround, add an explicit way to get the MachineInstr.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221017 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-31 23:19:46 +00:00
Quentin Colombet
9b6ca9304c [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-31 17:52:53 +00:00
Oliver Stannard
b1d8e7e77c [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220671 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-27 09:23:02 +00:00
Renato Golin
06b11e36e5 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-23 15:31:50 +00:00
Oliver Stannard
9982879c4e [Thumb2] Improve disassembly of memory hints
Currently, the ARM disassembler will disassemble the Thumb2 memory hint
instructions (PLD, PLDW and PLI), even for targets which do not have
these instructions. This patch adds the required checks to the
disassmebler.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220472 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-23 08:52:58 +00:00
Akira Hatanaka
be9668675a [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220470 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-23 04:17:05 +00:00
Jyoti Allur
8546076401 [Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220379 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-22 10:41:14 +00:00
Oliver Stannard
00e0b8a016 [ARM] NEON 32-bit scalar moves are also available in VFPv2
The 32-bit variants of the NEON scalar<->GPR move instructions are
also available in VFPv2. The 8- and 16-bit variants do require NEON.

Note that the checks in the test file are all -DAG because they are
checking a mixture of stdout and stderr, and the ordering is not
guaranteed.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220288 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-21 11:49:14 +00:00
Oliver Stannard
d75e7ad0c8 [Thumb2] LDRS?[BH] cannot load to the PC
The Thumb2 LDRS?[BH] instructions are not valid when the destination
register is the PC (these encodings are used for preload hints).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-21 09:14:15 +00:00
Tim Northover
4e399f4500 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220236 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 21:28:41 +00:00
Oliver Stannard
e7c9c44387 [Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M
These instructions are related to the v7[AR] exception model, and are
not defined on v7M.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220204 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 15:37:35 +00:00
Oliver Stannard
19d010b851 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220196 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 11:30:35 +00:00
Oliver Stannard
508c39393a [Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex
This function can, for some offsets from the SP, split one instruction
into two. Since it re-uses the original instruction as the first
instruction of the result, we need ensure its result register is not
marked as dead before we use it in the second instruction.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220194 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 11:00:18 +00:00
Bob Wilson
efed41c621 Use triple predicate functions instead of checking values directly. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220155 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-19 00:39:30 +00:00
Akira Hatanaka
1cdebe50c1 ARM: Fix a bug which was causing convergence failure in constant-island pass.
The bug is in ARMConstantIslands::createNewWater where the upper bound of the
new water split point is computed:

// This could point off the end of the block if we've already got constant
// pool entries following this block; only the last one is in the water list.
// Back past any possible branches (allow for a conditional and a maximally
// long unconditional).
if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
  BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
  DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
}

The split point is supposed to be somewhere between the machine instruction that
loads from the constant pool entry and the end of the basic block, before branch
instructions. The code above is fine if the basic block is large enough and
there are a sufficient number of instructions following the machine instruction.
However, if the machine instruction is near the end of the basic block,
BaseInsertOffset can point to the machine instruction or another instruction
that precedes it, and this can lead to convergence failure.

This commit fixes this bug by ensuring BaseInsertOffset is larger than the
offset of the instruction following the constant-loading instruction.

rdar://problem/18581150


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220015 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-17 01:31:47 +00:00
Rafael Espindola
90ce9f70e2 Simplify handling of --noexecstack by using getNonexecutableStackSection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219799 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-15 16:12:52 +00:00
Tim Northover
d3458577a9 ARM: drop check for triple that's no longer used.
Early attempts to support AAPCS bare metal MachO targets based the decision on
the CPU being compiled for. This was not a particularly great idea and we've
got a better option now, but this check remained.

No functional change for any target we care about.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-15 01:05:01 +00:00
Tim Northover
7419c9c0c0 ARM: remove ARM/Thumb distinction for preferred alignment.
Thumb1 has legitimate reasons for preferring 32-bit alignment of types
i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be
a multiple of 4. However, this is a trade-off betweem code size and RAM usage;
the DataLayout string is not the best place to represent it even if desired.

So this patch removes the extra Thumb requirements, hopefully making ARM and
Thumb completely compatible in this respect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219734 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 22:12:17 +00:00
Tim Northover
32d728fbb9 ARM: allow misaligned local variables in Thumb1 mode.
There's no hard requirement on LLVM to align local variable to 32-bits, so the
Thumb1 frame handling needs to be able to deal with variables that are only
naturally aligned without falling over.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219733 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 22:12:14 +00:00
Tim Northover
eddeac0b8c ARM: set preferred aggregate alignment to 32 universally.
Before, ARM and Thumb mode code had different preferred alignments, which could
lead to some rather unexpected results. There's justification for reducing it
from the default 64-bits (wasted space), but I don't think there is for going
below 32-bits.

There's no actual ABI change here, just to reassure people.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219719 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 20:57:26 +00:00
Eric Christopher
0ba4483d01 Grab the subtarget info off of the MachineFunction rather than
indirecting through the TargetMachine.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219674 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 08:44:19 +00:00
Eric Christopher
9accb10855 Include map into the A15SDOptimizer rather than pick it up
transitively from the DFAPacketizer via TargetInstrInfo.h.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219652 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 01:13:51 +00:00
Renato Golin
d0c745a9f0 Adds support for the Cortex-A17 to the ARM backend
Patch by Matthew Wahab.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219606 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-13 10:22:19 +00:00
Benjamin Kramer
fedd0e2a21 MC: Bit pack MCSymbolData.
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-11 15:07:21 +00:00
Benjamin Kramer
07200c4b6c Remove a compiler bug workaround from 2007. The affected versions of gcc are long gone.
NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219433 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 19:50:39 +00:00
Bob Wilson
8e673570b9 Use triple's isiOS() and isOSDarwin() methods.
These methods are already used in lots of places. This makes things more
consistent. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219386 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 05:43:30 +00:00
Renato Golin
1e059a88f8 Emit unaligned access build attribute for ARM
Patch by Charlie Turner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219301 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-08 12:26:22 +00:00
Renato Golin
ef2ed3f465 Refactor isThumb1Only() && isMClass() into a predicate called isV6M()
This must be enforced for all v6M cores, not just the cortex-m0,
irregardless of the user-specified alignment.

Patch by Charlie Turner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-08 12:26:16 +00:00
Renato Golin
3828392790 Simplify switch statement in ARM subtarget align access
This switch can be reduced to a simpler if/else statement.

Patch by Charlie Turner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219299 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-08 12:26:13 +00:00
Eric Christopher
b7cd35b171 Cache TargetLowering on SelectionDAGISel and update previous
calls to getTargetLowering() with the cached variable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219284 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-08 07:32:17 +00:00
NAKAMURA Takumi
844eeb3741 ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219172 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-06 23:48:04 +00:00
Tim Northover
ca6c95df36 ARM: silence unused variable warning
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219128 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-06 17:26:36 +00:00
Tim Northover
969587d62d ARM: remove dead InstPrinting code
This instruction form is handled by different AsmOperands now, so the code is
completely dead (and wrong anyway).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219127 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-06 17:10:13 +00:00
Eric Christopher
e6f32be8df Add subtarget caches to aarch64, arm, ppc, and x86.
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219106 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-06 06:45:36 +00:00
Benjamin Kramer
63688e622c Eliminate some deep std::vector copies. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218999 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 18:33:16 +00:00
Renato Golin
b157cb7afd Revert 202433 - Provide a target override for the latest regalloc heuristic
That commit was introduced in order to help investigate a problem in ARM
codegen breaking from commit 202304 (Add a limit to the heuristic that register
allocates instructions in local order). Recent analisys indicated that the
problem no longer exists, so I'm reverting this change.

See PR18996.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218981 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 12:20:53 +00:00
Eric Christopher
59cacc9dec constify TargetMachine argument.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218930 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 00:17:59 +00:00
Eric Christopher
1340986490 We can grab the options struct from the TargetMachine, no need to
pass it down in the constructor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 00:10:03 +00:00
Tim Northover
472f2a056d ARM: allow copying of CPSR when all else fails.
As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.

rdar://problem/18011155

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218789 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 19:21:03 +00:00
Oliver Stannard
9d7038c437 [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218763 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 13:13:18 +00:00
Oliver Stannard
ff18b9ff38 [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218747 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 09:02:17 +00:00
Oliver Stannard
017c6111a8 [Thumb2] ldrexd and strexd are not defined on v7M
The Thumb2 ldrexd and strexd instructions are not defined for
M-class architectures.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218603 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 10:57:29 +00:00
Renato Golin
6215f78195 Elide repeated register operand in Thumb1 instructions
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.

Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.

Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.

The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.

Patch by Ranjeet Singh.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218521 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 16:14:29 +00:00
Tom Stellard
8361c84894 ARM: Remove unneeded check for MI->hasPostISelHook()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218459 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 18:59:23 +00:00
Renato Golin
6765c34b0c Add aliases for VAND imm to VBIC ~imm
On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with
the same type size. Adding that logic to the parser, and generating VBIC
instructions from VAND asm files.

This patch also fixes the validation routines for NEON splat immediates which
were wrong.

Fixes PR20702.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218450 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 11:31:24 +00:00
Oliver Stannard
f220c5387b [Thumb2] BXJ should be undefined for v7M, v8A
The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not
defined for v7M or v8A. It is defined for all other Thumb2-supporting
architectures (v6T2, v7A and v7R).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218445 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-25 10:02:05 +00:00
Moritz Roth
8c4e64af8a [Thumb] Make load/store optimizer less conservative.
If it's safe to clobber the condition flags, we can do a few extra things:
it's then possible to reset the base register writeback using a SUBS, so
we can try to merge even if the base register isn't dead after the merged
instruction.

This is effectively a (heavily bug-fixed) rewrite of r208992.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 16:35:50 +00:00
Oliver Stannard
43c6b6be8f [Thumb] 32-bit encodings of 'cps' are not valid for v7M
v7M only allows the 16-bit encoding of the 'cps' (Change Processor
State) instruction, and does not have the 32-bit encoding which is
valid from v6T2 onwards.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218382 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 14:20:01 +00:00
Robin Morisset
fd4c3c983e Add AtomicExpandPass::bracketInstWithFences, and use it whenever getInsertFencesForAtomic would trigger in SelectionDAGBuilder
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.

In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it is earlier in the pipeline
than SelectionDAGBuilder, it triggers and lowers atomic accesses to atomic so
SelectionDAGBuilder does not add barriers anymore on ARM.

If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.

This should not cause any functionnal change so the existing tests are used
and not modified.

Test Plan: make check-all, benefits from existing tests of atomics on ARM

Reviewers: jfb, t.p.northover

Subscribers: aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D5179

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218329 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:31:14 +00:00
Robin Morisset
8439e5e4c4 Just add a fixme about a possibly faster implementation of some atomic loads on some ARM processors
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218326 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:33:21 +00:00