Commit Graph

11434 Commits

Author SHA1 Message Date
Robert Khasanov
37e671e894 [SKX] Enable lowering of integer CMP operations.
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.

Reviewed by Elena Demikhovsky.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 08:46:04 +00:00
Job Noorman
d2323fc295 Do not assume the value passed to memset is an i32.
The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.

This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216716 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 08:23:53 +00:00
Jiangning Liu
3cd73a5ded [AArch64] Fix some failures exposed by value type v4f16 and v8f16.
1) Add some missing bitcast patterns for v8f16.
2) Add type promotion for operand of ld/st operations.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216706 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 01:31:42 +00:00
Juergen Ributzka
cf45151b2c [FastISel][AArch64] Don't fold instructions that are not in the same basic block.
This fix checks first if the instruction to be folded (e.g. sign-/zero-extend,
or shift) is in the same machine basic block as the instruction we are folding
into.

Not doing so can result in incorrect code, because the value might not be
live-out of the basic block, where the value is defined.

This fixes rdar://problem/18169495.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216700 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-29 00:19:21 +00:00
Jim Grosbach
0d34b1ed26 AArch64: More correctly constrain target vector extend lowering.
The AArch64 target lowering for [zs]ext of vectors is set up to handle
input simple types and expects the generic SDag path to do something reasonable
with anything that's not a simple type. The code, however, was only
checking that the result type was a simple type and assuming that
implied that the source type would also be a simple type. That's not a
valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>"
demonstrate. The fix is to simply explicitly validate the source type
as well as the result type.

PR20791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216689 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 22:08:28 +00:00
Rafael Espindola
4bb535027e On MachO, don't put non-private constants in mergeable sections.
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the context of the
section.

Since private GVs are the ones the get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216682 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 20:13:31 +00:00
Sanjay Patel
cf9661c6f9 Fix a logic bug in x86 vector codegen: sext (zext (x) ) != sext (x) (PR20472).
Remove a block of code from LowerSIGN_EXTEND_INREG() that was added with:
http://llvm.org/viewvc/llvm-project?view=revision&revision=177421

And caused:
http://llvm.org/bugs/show_bug.cgi?id=20472 (more analysis here)
http://llvm.org/bugs/show_bug.cgi?id=18054

The testcases confirm that we (1) don't remove a zext op that is necessary and (2) generate
a pmovz instead of punpck if SSE4.1 is available. Although pmovz is 1 byte longer, it allows 
folding of the load, and so saves 3 bytes overall.

Differential Revision: http://reviews.llvm.org/D4909



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216679 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 18:59:22 +00:00
David Xu
5ca793561e Generate CMN when comparing a short int with minus
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216651 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 04:59:53 +00:00
Chandler Carruth
da3a293313 [x86] Clean up some tests to use FileCheck and combine two into a single
file.

Changing code that is covered by these tests is just too hard to debug
currently, and now it will be clear the nature of the changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216643 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 03:41:28 +00:00
Juergen Ributzka
4a76317ebb [FastISel] Undo phi node updates when falling-back to SelectionDAG.
The included test case would fail, because the MI PHI node would have two
operands from the same predecessor.

This problem occurs when a switch instruction couldn't be selected. This happens
always, because there is no default switch support for FastISel to begin with.

The problem was that FastISel would first add the operand to the PHI nodes and
then fall-back to SelectionDAG, which would then in turn add the same operands
to the PHI nodes again.

This fix removes these duplicate PHI node operands by reseting the
PHINodesToUpdate to its original state before FastISel tried to select the
instruction.

This fixes <rdar://problem/18155224>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 02:06:55 +00:00
Juergen Ributzka
d24494d672 [FastISel]
Currently instructions are folded very aggressively for AArch64 into the memory
operation, which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

This fix teaches hasTrivialKill to not only check the LLVM IR that the value has
a single use, but also to check if the register that represents that value has
already been used. This can happen when the instruction with the use was folded
into another instruction (in this particular case a load instruction).

This fixes rdar://problem/18142857.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216634 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 00:09:46 +00:00
Juergen Ributzka
a26b1bdcc8 Revert "[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation."
Quentin pointed out that this is not the correct approach and there is a better and easier solution.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216632 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 23:09:40 +00:00
Juergen Ributzka
1f5263e43f [FastISel][AArch64] Don't fold instructions too aggressively into the memory operation.
Currently instructions are folded very aggressively into the memory operation,
which can lead to the use of killed operands:
  %vreg1<def> = ADDXri %vreg0<kill>, 2
  %vreg2<def> = LDRBBui %vreg0, 2
  ... = ... %vreg1 ...

This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.

If the computed address is used by only memory operations in the same basic
block, then it is safe to fold them. This is because all memory operations will
fold the address computation and the original computation will never be emitted.

This fixes rdar://problem/18142857.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216629 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 22:52:33 +00:00
Juergen Ributzka
ccf53013cd [FastISel][AArch64] Fix simplify address when the address comes from a shift.
When the address comes directly from a shift instruction then the address
computation cannot be folded into the memory instruction, because the zero
register is not available as a base register. Simplify addess needs to emit the
shift instruction and use the result as base register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 21:38:33 +00:00
Juergen Ributzka
d445e4acdb [FastISel][AArch64] Use the zero register for stores.
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216617 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 21:04:52 +00:00
Oliver Stannard
5e487f8dc7 Teach the AArch64 backend about v4f16 and v8f16
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216555 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 16:16:04 +00:00
Chandler Carruth
7e3dc40fab [x86] Fix a regression introduced with r213897 for 32-bit targets where
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.

This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:39:47 +00:00
Chandler Carruth
963a5e6c61 [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.

However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.

The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.

Original commit message for r215611 which covers the rest of the chang:
  [SDAG] Fix a case where we would iteratively legalize a node during
  combining by replacing it with something else but not re-process the
  node afterward to remove it.

  In a truly remarkable stroke of bad luck, this would (in the test case
  attached) end up getting some other node combined into it without ever
  getting re-processed. By adding it back on to the worklist, in addition
  to deleting the dead nodes more quickly we also ensure that if it
  *stops* being dead for any reason it makes it back through the
  legalizer. Without this, the test case will end up failing during
  instruction selection due to an and node with a type we don't have an
  instruction pattern for.

It took many million runs of the shuffle fuzz tester to find this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 11:22:16 +00:00
Elena Demikhovsky
fe0c6ead85 AVX-512: Added intrinsic for VMOVSS store form with mask.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216530 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 07:38:43 +00:00
Juergen Ributzka
fc03e72b4f [FastISel][AArch64] Fix address simplification.
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix folds now also the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.

This fixes rdar://problem/18141718.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216511 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 00:58:30 +00:00
Juergen Ributzka
836f4bd090 [FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-27 00:58:26 +00:00
Yi Kong
2282afa6cc ARM: Add patterns for dbg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-26 12:47:26 +00:00
Chad Rosier
373fc00835 [AArch32] Add patterns for VCVT{A,N,P,M}.
Patterns for lowering libm calls to VCVT{A,N,P,M} are also included.
Phabricator Revision: http://reviews.llvm.org/D5033

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216388 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 16:56:33 +00:00
Hal Finkel
7ca2a7d742 [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 23:21:04 +00:00
Chad Rosier
8eb867e97d Revert "ARM: improve RTABI 4.2 conformance on Linux"
This reverts commit r215862 due to nightly failures.  Will work on getting a
reduced test case, but I wanted to get our bots green in the meantime.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216325 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 18:29:43 +00:00
Chandler Carruth
bfed08e41f [x86] Start fixing a really subtle and terrible form of miscompile in
these DAG combines.

The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing
a node with its operand, you can cause its uses to CSE to itself, which
then causes their uses to become your uses which causes them to be
picked up by the RAUW. For nodes that are determined to be "no-ops",
this is "fine". But if the RAUW is one of several steps to enact
a transformation, this causes the DAG to really silently eat an discard
nodes that you would never expect. It took days for me to actually
pinpoint a test case triggering this and a really frustrating amount of
time to even comprehend the bug because I never even thought about the
ability of RAUW to iteratively consume nodes due to CSE-ing them into
itself.

To fix this, we have to build up a brand-new chain of operations any
time we are combining across (potentially) intervening nodes. But once
the logic is added to do this, another issue surfaces: CombineTo eagerly
deletes the one node combined, *but no others*. This is... really
frustrating. If deleting it makes its operands become dead, those
operand nodes often won't go onto the worklist in the
order you would want -- they're already on it and not near the top. That
means things higher on the worklist will get combined prior to these
dead nodes being GCed out of the worklist, and if the chain is long, the
immediate users won't be enough to re-detect where the root of the chain
is that became single-use again after deleting the dead nodes. The
better way to do this is to never immediately delete nodes, and instead
to just enqueue them so we can recursively delete them. The
combined-from node is typically not on the worklist anyways by virtue of
having been popped off.... But that in turn breaks other tests that
*require* CombineTo to delete unused nodes. :: sigh ::

Fortunately, there is a better way. This whole routine should have been
returning the replacement rather than using CombineTo which is quite
hacky. Switch to that, and all the pieces fall together.

I suspect the same kind of miscompile is possible in the half-shuffle
folding code, and potentially the recursive folding code. I'll be
switching those over to a pattern more like this one for safety's sake
even though I don't immediately have any test cases for them. Note that
the only way I got a test case for this instance was with *heavily* DAG
combined 256-bit shuffle sequences generated by my fuzzer. ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216319 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 10:25:15 +00:00
Nick Lewycky
f591b9c33e Revert r215611 because it caused the infinite loop in bug 20736. There is a reduced testcase in that bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216307 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 00:45:03 +00:00
Reid Kleckner
d89c0abc07 ARM / x86_64 varargs: Don't save regparms in prologue without va_start
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.

Most of the test suite changes are adding va_start calls to existing
tests to keep things working.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 21:59:26 +00:00
Tom Stellard
f50f927d65 R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216279 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:49:35 +00:00
Tom Stellard
ec4cb3346d R600/SI: Use a ComplexPattern for DS loads and stores
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216278 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:49:33 +00:00
Quentin Colombet
c3f2ad0879 [ARM] Move the implementation of the target hooks related to copy-related
instruction from ARMInstrInfo to ARMBaseInstrInfo.
That way, thumb mode can also benefit from the advanced copy optimization.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216274 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 18:05:22 +00:00
Sasa Stankovic
cc59c3f335 [mips] Don't use odd-numbered float registers for double arguments for fastcc
calling convention if FP is 64-bit and +nooddspreg is used.

Differential Revision: http://reviews.llvm.org/D4981.diff


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216262 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-22 09:23:22 +00:00
Juergen Ributzka
5e34dffb9c [FastISel][AArch64] Add support for variable shift.
This adds the missing variable shift support for value type i8, i16, and i32.

This fixes <rdar://problem/18095685>.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216242 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 23:06:07 +00:00
Juergen Ributzka
5d6365c80c [FastISel][AArch64] Use the correct register class to make the MI verifier happy.
This is mostly achieved by providing the correct register class manually,
because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and
MVT::i64.

Also cleanup the code to use the FastEmitInst_* method whenever possible. This
makes sure that the operands' register class is properly constrained. For all
the remaining cases this adds the missing constrainOperandRegClass calls for
each operand.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216225 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:57:57 +00:00
Tom Stellard
fdbf61d00d R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216220 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:41:00 +00:00
Tom Stellard
5f52739370 R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function
This fixes a crash in an ocl conformance test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 20:40:58 +00:00
Quentin Colombet
ad3c6289b6 [AArch64] Run a peephole pass right after AdvSIMD pass.
The AdvSIMD pass may produce copies that are not coalescer-friendly. The
peephole optimizer knows how to fix that as demonstrated in the test case.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216200 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 18:10:07 +00:00
Moritz Roth
a6afad8b33 Thumb1 load/store optimizer: Improve code to materialize new base register.
There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only
the latter supports using different source and destination registers, so
whenever we materialize a new base register (at a certain offset) we'd do
so by moving the base register value to the new register and then adding in
place. This patch changes the code to use a single tADDi3 if the offset is
small enough to fit in 3 bits.

Differential Revision: http://reviews.llvm.org/D5006

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 17:11:03 +00:00
Juergen Ributzka
69ec09b61f [FastISel][AArch64] Remove redundant test.
These tests and many more are already covered by fast-isel-addressing-modes.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216186 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 16:40:05 +00:00
Jonathan Roelofs
4c3be1aa0f Add a thread-model knob for lowering atomics on baremetal & single threaded systems
http://reviews.llvm.org/D4984


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216182 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 14:35:47 +00:00
Benjamin Kramer
daada81e5c DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 13:28:02 +00:00
Oliver Stannard
760a46522a [ARM] Enable DP copy, load and store instructions for FPv4-SP
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.

This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216172 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 12:50:31 +00:00
Robert Khasanov
fec1abaeab [x86] Added _addcarry_ and _subborrow_ intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216164 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:43:43 +00:00
Robert Khasanov
10dacc4b52 [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216162 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 09:27:00 +00:00
Jiangning Liu
150cef218f Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216147 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 01:59:30 +00:00
Quentin Colombet
e817bdd304 [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the
advanced copy optimization.

This is the final step patch toward transforming:
udiv    r0, r0, r2
udiv    r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16
bx      lr

into:
udiv    r0, r0, r2
udiv    r1, r1, r3
bx      lr

Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1

and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov    r0, r1, d16

into simple generic GPR copies that the coalescer managed to remove.

<rdar://problem/12702965>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216144 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 00:19:16 +00:00
Jonathan Roelofs
506ed4d4a5 Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence
On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to
be avoided. This patch trades simplicity for implementation time at the expense
of performance... As they say: correctness first, then performance.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few
ideas on how to make this better.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216138 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 23:38:50 +00:00
Sanjay Patel
89305e5345 Don't prevent a vselect of constants from becoming a single load (PR20648).
Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648

This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.

This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.

Differential Revision: http://reviews.llvm.org/D4934


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith
4641d5dbdf X86: Add missing triples from r216119
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:58:59 +00:00
Duncan P. N. Exon Smith
5012f1db20 X86: Align the stack on word boundaries in LowerFormalArguments()
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly.  The controlling sentence is, "The size of each argument gets
rounded up to eightbytes.  Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area."  For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.

Patch by Thomas Jablin!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-20 19:40:59 +00:00