2207 Commits

Author SHA1 Message Date
Rafael Espindola
33f5127540 Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226038 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 20:55:48 +00:00
Tim Northover
09bec94c16 ARM: add test for crc32 instructions in CodeGen.
Somehow we seem to have ended up without any actual tests of the
CodeGen side. Easy enough to fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225930 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:43:33 +00:00
Adrian Prantl
f89325d832 Debug info: Factor out the creation of DWARF expressions from AsmPrinter
into a new class DwarfExpression that can be shared between AsmPrinter
and DwarfUnit.

This is the first step towards unifying the two entirely redundant
implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change — Testcases were updated because asm comments
that used to be on two lines now appear on the same line, which is
actually preferable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225706 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 22:19:22 +00:00
Kristof Beyls
d1cee9b3bc Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.

Producing an aligned stack pointer is done by zero-ing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows to
encode an immediate that can zero out up to a maximum of the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack aligments by
using the BFC instruction instead, when the BFC instruction is
available.  When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225446 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 15:09:14 +00:00
Charlie Turner
7fbbc81d65 [ARM] Add missing Tag_DIV_use tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225348 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 11:37:40 +00:00
Charlie Turner
b99b8ffb7f Emit the build attribute Tag_conformance.
Claim conformance to version 2.09 of the ARM ABI.

This build attribute must be emitted first amongst the build attributes when
written to an object file. This is to simplify conformance detection by
consumers.

Change-Id: If9eddcfc416bc9ad6e5cc8cdcb05d0031af7657e

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225166 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-05 13:12:17 +00:00
Saleem Abdulrasool
97f8f69a7f ARM: permit tail calls to weak externals on COFF
Weak externals are resolved statically, so we can actually generate the tail
call on PE/COFF targets without breaking the requirements.  It is questionable
whether we want to propagate the current behaviour for MachO as the requirements
are part of the ARM ELF specifications, and it seems that prior to the SVN
r215890, we would have tail'ed the call.  For now, be conservative and only
permit it on PE/COFF where the call will always be fully resolved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225119 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 21:35:00 +00:00
Ahmed Bougacha
bc47ceef43 [ARM] Don't break alignment when combining base updates into load/stores.
r223862/r224203 tried to also combine base-updating load/stores.
There was a mistake there: the alignment was added as is as an operand to
the ARMISD::VLD/VST node.  However, the VLD/VST selection logic doesn't care
about less-than-standard alignment attributes.
For example, no matter the alignment of a v2i64 load (say 1), SelectVLD picks
VLD1q64 (because of the memory type).  But VLD1q64 ("vld1.64 {dXX, dYY}") is
8-aligned, per ARMARMv7a 3.2.1.
For the 1-aligned load, what we really want is VLD1q8.

This commit introduces bitcasts if necessary, and changes the vld/vst type to
one whose standard alignment matches the original load/store alignment.

Differential Revision: http://reviews.llvm.org/D6759


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-23 06:07:31 +00:00
Rafael Espindola
b646a4b0b8 Convert a few tests to FileCheck. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-22 13:29:46 +00:00
Eric Christopher
c559ba7251 Add a new string member to the TargetOptions struct for the name
of the abi we should be using. For targets that don't use the
option there's no change, otherwise this allows external users
to set the ABI via string and avoid some of the -backend-option
pain in clang.

Use this option to move the ABI for the ARM port from the
Subtarget to the TargetMachine and update the testcases
accordingly since it's no longer valid to set via -mattr.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 02:20:58 +00:00
Eric Christopher
360cbd4839 Model ARM backend ABI selection after the front end code doing the
same. This will change the "bare metal" ABI from APCS to AAPCS.

The only difference between the front and back end code is that
the code for Triple::GNU was added for environment. That will migrate
to the front end shortly.

Tests updated with the ABI they were originally testing in the case
of bare metal (e.g. -mtriple armv7) or with a -gnu for arm-linux
triples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-18 02:08:45 +00:00
Bradley Smith
a9d9f7eae8 [ARM] Prevent PerformVCVTCombine from combining a vmul/vcvt with 8 lanes
This would result in a crash since the vcvt used does not support v8i32 types.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-16 10:59:27 +00:00
Duncan P. N. Exon Smith
1ef70ff39b IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-15 19:07:53 +00:00
Ahmed Bougacha
780a093afb Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-13 23:22:12 +00:00
Renato Golin
1e173b7139 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224198 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-13 20:23:18 +00:00
Charlie Turner
2a3c63a58f Emit Tag_ABI_FP_16bit_format build attribute.
The __fp16 type is unconditionally exposed. Since -mfp16-format is not yet
supported, there is not a user switch to change this behaviour. This build
attribute should capture the default behaviour of the compiler, which is to
expose the IEEE 754 version of __fp16.

When -mfp16-format is emitted, that will be the way to control the value of
this build attribute.

Change-Id: I8a46641ff0fd2ef8ad0af5f482a6d1af2ac3f6b0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-12 11:59:18 +00:00
Tim Northover
19734e7811 ARM: correctly expand LDR-lit based globals.
Quite a major error here: the expansions for the Pseudos with and without
folded load were mixed up. Fortunately it only affects ARM-mode, when not using
movw/movt, on Darwin. I'm guessing no-one actually uses that combination.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223986 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 23:40:50 +00:00
Ahmed Bougacha
605c40341b [ARM] Combine base-updating/post-incrementing vector load/stores.
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-10 00:07:37 +00:00
Ahmed Bougacha
5898fd4260 [ARM] Make testcase more explicit. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223841 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 22:08:57 +00:00
Ahmed Bougacha
a31421e2dc [ARM] Also support v2f64 vld1/vst1.
It was missing from the VLD1/VST1 handling logic, even though the
corresponding instructions exist (same form as v2i64).

In preparation for a future patch.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 21:25:00 +00:00
Charlie Turner
1610d6e878 Add missing FP build attribute tests.
The test file test/CodeGen/ARM/build-attributes.ll was missing several
floating-point build attribute tests. The intention of this commit is that for
each CPU / architecture currently tested, there are now tests that make sure
the following attributes are sufficiently checked,

  * Tag_ABI_FP_rounding
  * Tag_ABI_FP_denormal
  * Tag_ABI_FP_exceptions
  * Tag_ABI_FP_user_exceptions
  * Tag_ABI_FP_number_model

Also in this commit, the -unsafe-fp-math flag has been augmented with the full
suite of flags Clang sends to LLVM when you pass -ffast-math to Clang. That is,
`-unsafe-fp-math' has been changed to `-enable-unsafe-fp-math -disable-fp-elim
-enable-no-infs-fp-math -enable-no-nans-fp-math -fp-contract=fast'

Change-Id: I35d766076bcbbf09021021c0a534bf8bf9a32dfc

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223454 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 08:22:47 +00:00
Jonathan Roelofs
3d47855394 Fix thumbv4t indirect calls
So there are a couple of issues with indirect calls on thumbv4t. First, the most
'obvious' instruction, 'blx' isn't available until v5t. And secondly, the
next-most-obvious sequence: 'mov lr, pc; bx rN' doesn't DTRT in thumb code
because the saved off pc has its thumb bit cleared, so when the callee returns
we end up in ARM mode.... yuck.

The solution is to 'bl' to a nearby landing pad with a 'bx rN' in it.

We could cut down on code size by sharing the landing pads between call sites
that are close enough, but for the moment let's do correctness first and look at
performance later.


Patch by: Iain Sandoe

http://reviews.llvm.org/D6519


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223380 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 19:34:50 +00:00
Charlie Turner
10cae8e352 Emit ABI_FP_rounding attribute.
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. This option currently does not appear to disable
transformations and optimizations that assume default floating point rounding
behavior, AFAICT, but the intention should be recorded in the build attributes,
regardless of what the compiler actually does with the intention.

Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223218 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 08:12:26 +00:00
Charlie Turner
78f9ab5f7c Add tests for default value of Tag_ABI_FP_rounding.
Change-Id: I051866d073fc6ce87ce3e693a3762da6d81f4393

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223217 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 07:59:50 +00:00
Charlie Turner
364f2f3fcf Emit Tag_ABI_FP_denormal correctly in fast-math mode.
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,

  * For VFPv2, it is implementation defined as to whether the sign of zero is
    preserved.
  * For VFPv3 and above, the sign of zero is always preserved when a denormal
    is flushed to zero.

When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.

Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223110 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 08:22:29 +00:00
Reid Kleckner
03c735b42c Parse 'ghccc' in .ll files as the GHC convention (cc 10)
Previously we just used "cc 10" in the .ll files, but that isn't very
human readable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223076 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 21:04:44 +00:00
Tim Northover
f7f88095a3 ARM: lower tail calls correctly when using GHC calling convention.
Patch by Ben Gamari.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223055 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 17:46:39 +00:00
Charlie Turner
72ba1af89c Stop uppercasing build attribute data.
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.

  * It's less work.
  * Some vendors may legitimately have case-sensitive checks for these
    attributes which would fail on LLVM generated object files.
  * There could be locale issues with uppercasing.

The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see

http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133

This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.

Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222882 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-27 12:13:56 +00:00
Renato Golin
18e5ce0188 Fix ARM triple parsing
The triple parser should only accept existing architecture names
when the triple starts with armv, armebv, thumbv or thumbebv.

Patch by Gabor Ballabas.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222129 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-17 14:08:57 +00:00
Oliver Stannard
8f832fce3b [Thumb1] Re-write emitThumbRegPlusImmediate
This was motivated by a bug which caused code like this to be
miscompiled:
  declare void @take_ptr(i8*)
  define void @test() {
    %addr1.32 = alloca i8
    %addr2.32 = alloca i32, i32 1028
    call void @take_ptr(i8* %addr1)
    ret void
  }

This was emitting the following assembly to get the value of %addr1:
  add r0, sp, #1020
  add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather tha SP+1028:
  add r0, sp, #1020
  add r0, sp, #8

This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222125 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-17 11:18:10 +00:00
Oliver Stannard
d9d2703b71 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222123 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-17 10:49:31 +00:00
Tim Northover
52e76186d3 ARM: refactor .cfi_def_cfa_offset emission.
We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.

It's easier to just add any stack-adjusting instruction to a list and emit them
together.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222057 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-14 22:45:33 +00:00
Tim Northover
d96893fd3d ARM: correctly calculate the offset of FP in its push.
When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.

The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222056 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-14 22:45:31 +00:00
Tim Northover
9aa6fd59b3 ARM: simplify test.
The test's DWARF stubs were there just to trigger the emission of .cfi
directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind
info, so it's easier to just use that triple.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222055 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-14 22:45:23 +00:00
Tim Northover
064da63fcb ARM: avoid duplicating branches during constant islands.
We were using a naive heuristic to determine whether a basic block already had
an unconditional branch at the end. This mostly corresponded to reality
(assuming branches got optimised) because there's not much point in a branch to
the next block, but could go wrong.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221904 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 17:58:51 +00:00
Tim Northover
5bd311bf17 ARM: add @llvm.arm.space intrinsic for testing ConstantIslands.
Creating tests for the ConstantIslands pass is very difficult, since it depends
on precise layout details. Having the ability to precisely inject a number of
bytes into the stream helps greatly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221903 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 17:58:48 +00:00
Tom Roeder
63dea2c952 Add Forward Control-Flow Integrity.
This commit adds a new pass that can inject checks before indirect calls to
make sure that these calls target known locations. It supports three types of
checks and, at compile time, it can take the name of a custom function to call
when an indirect call check fails. The default failure function ignores the
error and continues.

This pass incidentally moves the function JumpInstrTables::transformType from
private to public and makes it static (with a new argument that specifies the
table type to use); this is so that the CFI code can transform function types
at call sites to determine which jump-instruction table to use for the check at
that site.

Also, this removes support for jumptables in ARM, pending further performance
analysis and discussion.

Review: http://reviews.llvm.org/D4167



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221708 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 21:08:02 +00:00
Oliver Stannard
659b1491b8 LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221693 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 17:36:01 +00:00
Lang Hames
0ea3b243b0 [RegAlloc] Remove reference to the trivial spiller in test case.
This test case was never actually testing the trivial spiller: the -spiller
option has not been hooked up for a while now.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221475 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-06 19:24:18 +00:00
Rafael Espindola
4396b44b9a Use FileCheck in a few tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221459 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-06 15:05:51 +00:00
Tim Northover
cafa378fe0 ARM: try to add extra CS-register whenever stack alignment >= 8.
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221321 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-05 00:27:20 +00:00
Tim Northover
1f771b80c0 ARM/Dwarf: correctly align stack before callee-saved VPRs
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.

This had a couple of potential issues:
  + The .cfi directives we emit misplaced dN because they were based on
    PrologEpilogInserter's calculation.
  + Unaligned stores can be less efficient.
  + Unaligned stores can actually fault (likely only an issue in niche cases,
    but possible).

This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221320 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-05 00:27:13 +00:00
Charlie Turner
c3606b6b2e Remove the cortex-a9-mp CPU.
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.

LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.

This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.

Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221166 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-03 17:38:00 +00:00
Quentin Colombet
9b6ca9304c [CodeGenPrepare] Move extractelement close to store if they can be combined.
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.


** Context **

Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.


** Motivating Example **

Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}

As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr

Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr

Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.


** Proposed Solution **

To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.

Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).

The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.

For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.

[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.

Differential Revision: http://reviews.llvm.org/D5921

<rdar://problem/14170854>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-31 17:52:53 +00:00
Tim Northover
487dfd6e80 ARM: test default values for TAG_CPU_unaligned_access attribute.
It should be on for every target that supports unaligned accesses (e.g. not
v6m).

Patch by Charlie Turner.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220912 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-30 17:05:44 +00:00
Oliver Stannard
b1d8e7e77c [ARM] Select VMAXNM and VMINNM regardless of operand order
Currently, the ARM backend will select the VMAXNM and VMINNM for these C
expressions:
  (a < b) ? a : b
  (a > b) ? a : b
but not these expressions:
  (a > b) ? b : a
  (a < b) ? b : a

This patch allows all of these expressions to be matched.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220671 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-27 09:23:02 +00:00
Renato Golin
06b11e36e5 Do not emit intermediate register for zero FP immediate
This updates check for double precision zero floating point constant to allow
use of instruction with immediate value rather than temporary register.
Currently "a == 0.0", where "a" is of "double" type generates:

vmov.i32        d16, #0x0
vcmpe.f64       d0, d16

With this change it becomes:

vcmpe.f64        d0, #0

Patch by Sergey Dmitrouk.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-23 15:31:50 +00:00
Akira Hatanaka
be9668675a [ARM, stack protector] If supported, use armv7 instructions.
This commit enables using movt/movw to load the stack guard address:

movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8))
movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8))
ldr r0, [pc, r0]

Previously a pc-relative load was emitted:

ldr r0, LCPI0_0
ldr r0, [pc, r0]

rdar://problem/18740489


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220470 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-23 04:17:05 +00:00
Tim Northover
4e399f4500 ARM: rework Thumb1 frame index rewriting
The previous code had a few problems, motivating the choices here.

1. It could create instructions clobbering CPSR, but the incoming MachineInstr
   didn't reflect this. A potential source of corruption. This is why the patch
   has a new PseudoInst for before lowering.
2. Similarly, there was some code to handle the incoming instruction not being
   ARMCC::AL, but this would have caused massive problems if it was actually
   invoked when a complex offset needing more than one instruction was requested.
3. It wasn't designed to handle unaligned pointers (or offsets). These should
   probably be minimised anyway, but the code needs to deal with them properly
   regardless.
4. It had some rather dubious ad-hoc code to avoid calling
   emitThumbRegPlusImmediate, a function which should be designed to do precisely
   this job.

We seem to cover the common cases correctly now, and hopefully can enhance
emitThumbRegPlusImmediate to handle any extra optimisations we need to add in
future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220236 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 21:28:41 +00:00
Oliver Stannard
19d010b851 [ARM] Do not select SMULW[BT] or SMLAW[BT]
The current instruction selection patterns for SMULW[BT] and SMLAW[BT]
are incorrect. These instructions multiply a 32-bit and a 16-bit value
(both signed) and return the top 32 bits of the 48-bit result. This
preserves the 16 bits of overflow, whereas the patterns they currently
match truncate the result to 16 bits then sign extend.

To select these instructions, we would need to match an ISD::SMUL_LOHI,
a sign extend, two shifts and an or. There is no way to match SMUL_LOHI
in an instruction pattern as it defines multiple values, so this would
have to be done in C++. I have raised
http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct
selection of these instructions.

This fixes http://llvm.org/bugs/show_bug.cgi?id=19396



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220196 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-20 11:30:35 +00:00