Commit Graph

25286 Commits

Author SHA1 Message Date
Juergen Ributzka
4fa6ecc26f [FastISel][AArch64] Fix return type in FastLowerCall.
I used the wrong method to obtain the return type inside FinishCall. This fix
simply uses the return type from FastLowerCall, which we already determined to
be a valid type.

Reduced test case from Chad. Thanks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 20:03:13 +00:00
Justin Holewinski
03f160f9d3 [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213784 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 18:46:03 +00:00
Robert Khasanov
8b832c1c39 [SKX] Added missed test files for rev 213757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213780 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 18:17:49 +00:00
Saleem Abdulrasool
e2c63ff3c9 AsmParser: remove deprecated LLIR support
linker_private and linker_private_weak were deprecated in 3.5.  Remove support
for them now that the 3.5 branch has been created.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213777 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 18:09:31 +00:00
Robert Khasanov
f47bcb31e0 [SKX] Fix lowercase "error:" in rev 213757
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213774 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 17:42:13 +00:00
Justin Holewinski
6b50a7d179 [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes.  Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213773 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 17:40:45 +00:00
Mark Heffernan
e8d7ebcd5a In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213772 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 17:31:37 +00:00
Chad Rosier
67c325e9f0 [AArch64] Lower sdiv x, pow2 using add + select + shift.
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2

However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213758 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 14:57:52 +00:00
Robert Khasanov
3922da8ae8 [SKX] Enabling mask instructions: encoding, lowering
KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213757 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 14:49:42 +00:00
Tim Northover
afb1938c39 ARM: spot SBFX-compatbile code expressed with sign_extend_inreg
We were assuming all SBFX-like operations would have the shl/asr form, but
often when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 13:59:12 +00:00
Tim Northover
52aa80b068 ARM: add patterns for [su]xta[bh] from just a shift.
Although the final shifter operand is a rotate, this actually only matters for
the half-word extends when the amount == 24. Otherwise folding a shift in is
just as good.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213753 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 13:59:07 +00:00
Tilmann Scheller
3b867c9c8e [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions.
The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213750 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 13:03:47 +00:00
Tim Northover
8b6257629a AArch64: remove "arm64_be" support in favour of "aarch64_be".
There really is no arm64_be: it was a useful fiction to test big-endian support
while both backends existed in parallel, but now the only platform that uses
the name (iOS) doesn't have a big-endian variant, let alone one called
"arm64_be".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213748 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 12:58:11 +00:00
Tilmann Scheller
0f7b2e9db0 [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions.
The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213745 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 12:38:17 +00:00
Andrea Di Biagio
bf0fb36d72 Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions".
This chang fully reverts r211771.
That revision added a canonicalization rule which has the potential to causes a
combine-cycle in the target-independent canonicalizing DAG combine.

The plan is to move the logic that forms target specific addsub nodes as part of
the lowering of shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213736 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 11:20:24 +00:00
Chandler Carruth
5a37cccc7e [x86] Clean up a test case to use check labels and spell out the exact
instruction sequences with CHECK-NEXT for these test cases.

This notably exposes how absolutely horrible the generated code is for
several of these test cases, and will make any future updates to the
test as our vector instruction selection gets better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213732 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 09:11:48 +00:00
Tilmann Scheller
bd69956ecd [ARM] Add regression test for the earlyclobber constraint of ARM STRB.
The constraint was added in r213369.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213730 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 08:39:50 +00:00
Tilmann Scheller
fbf7b85869 [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions.
The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted.

The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213729 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 08:12:51 +00:00
Chandler Carruth
a6425604c2 [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate
insertions.

The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).

This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist to directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.

A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.

I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.

Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.

Differential Revision: http://reviews.llvm.org/D4616

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213727 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 07:08:53 +00:00
Nick Lewycky
1bda2d9386 We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213726 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 06:24:49 +00:00
NAKAMURA Takumi
b8d3634313 Rework to let RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s pass on win32.
FIXME: "llvm-rtdyld -verify -check" is still sensitive to path separator.
Fix searching StubMap to be tolerant of both '/' and '\\' on Win32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213723 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 04:32:21 +00:00
NAKAMURA Takumi
11e84aefe0 Suppress a test on win32 for now, llvm/test/ExecutionEngine/RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s.
FIXME: Fix searching StubMap with '/' and '\\' on Win32.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213721 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 04:05:58 +00:00
NAKAMURA Takumi
4c72ce8cff RuntimeDyld/X86/MachO_x86-64_PIC_relocations.s: Use %/T here, or sed(1) would be confused with dos path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213720 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-23 04:05:46 +00:00
Juergen Ributzka
b9b8d09137 XFAIL the test on MIPS
Not sure how to debug this one without a MIPS machine. Any takers?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 23:15:01 +00:00
Juergen Ributzka
7edf396977 [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
This commit modifies the existing call lowering functions to be used as the
FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

This enables patchpoint intrinsic lowering for AArch64.

This fixes <rdar://problem/17733076>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213704 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 23:14:58 +00:00
Juergen Ributzka
be8c68d72d [AArch64] Use CHECK-LABEL in ARM64 ABI unit tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213703 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 23:14:54 +00:00
Lang Hames
daf061cf05 [MCJIT] Refactor and add stub inspection to the RuntimeDyldChecker framework.
This patch introduces a 'stub_addr' builtin that can be used to find the address
of the stub for a given (<file>, <section>, <symbol>) tuple. This address can be
used both to verify the contents of stubs (by loading from the returned address)
and to verify references to stubs (by comparing against the returned address).

Example (1) - Verifying stub contents:

Load 8 bytes (assuming a 64-bit target) from the stub for 'x' in the __text
section of f.o, and compare that value against the addres of 'x'.

# rtdyld-check: *{8}(stub_addr(f.o, __text, x) = x

Example (2) - Verifying references to stubs:

Decode the immediate of the instruction at label 'l', and verify that it's
equal to the offset from the next instruction's PC to the stub for 'y' in the
__text section of f.o (i.e. it's the correct PC-rel difference).

# rtdyld-check: decode_operand(l, 4) = stub_addr(f.o, __text, y) - next_pc(l)
l:
        movq    y@GOTPCREL(%rip), %rax

Since stub inspection requires cooperation with RuntimeDyldImpl this patch
pimpl-ifies RuntimeDyldChecker. Its implementation is moved in to a new class,
RuntimeDyldCheckerImpl, that has access to the definition of RuntimeDyldImpl.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213698 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 22:47:39 +00:00
Juergen Ributzka
04e8cc79cd [RuntimeDyld][MachO][AArch64] Add a helper function for encoding addends in instructions.
Factor out the addend encoding into a helper function and simplify the
processRelocationRef.

Also add a few simple rtdyld tests. More tests to come once GOTs can be tested too.

Related to <rdar://problem/17768539>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213689 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 21:42:55 +00:00
Suyog Sarda
c9ea25fc51 This patch implements optimization as mentioned in PR19753: Optimize comparisons with "ashr/lshr exact" of a constanst.
It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch.
Added code for lshr as well as non-exact right shifts.

It implements : 
(icmp eq/ne (ashr/lshr const2, A), const1)" ->
(icmp eq/ne A, Log2(const2/const1)) ->
(icmp eq/ne A, Log2(const2) - Log2(const1))

Differential Revision: http://reviews.llvm.org/D4068
 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213678 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 19:19:36 +00:00
Suyog Sarda
3326ee444a Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A | B)"
Patch idea by Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4618



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213677 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 18:30:54 +00:00
Suyog Sarda
1a1b1f708d Added InstCombine Transform for patterns:
"((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"

Original Patch credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4591



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213676 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 18:09:41 +00:00
Hal Finkel
b3b2aac5be Make use of the align parameter attribute for all pointer arguments
We previously supported the align attribute on all (pointer) parameters, but we
only used it for byval parameters. However, it is completely consistent at the
IR level to treat 'align n' on all pointer parameters as an alignment
assumption on the pointer, and now we wll. Specifically, this causes
computeKnownBits to use the align attribute on all pointer parameters, not just
byval parameters. I've also added an explicit parameter attribute test for this
to test/Bitcode/attributes.ll.

And I've updated the LangRef to document the align parameter attribute (as it
turns out, it was not documented at all previously, although the byval
documentation mentioned that it could be used).

There are (at least) two benefits to doing this:
 - It allows enhancing alignment based on the pointer alignment after inlining callees.
 - It allows simplification of pointer arithmetic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213670 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 16:58:55 +00:00
Tim Northover
50f2f1434c X86: drop relocations on __eh_frame sections globally.
Without this, we produce non-extern relocations when targeting older OS X
versions that ld64 can't cope with in the particular context of __eh_frame
sections (who'd want generic relocation-processing anyway?).

This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be
needed when targeting such platforms with a modern version of LLVM, but this is
probably the case anyway and a reasonable requirement.

PR20212, rdar://problem/17544795

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213665 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 15:47:09 +00:00
Suyog Sarda
578c74e35d This patch implements transform for pattern "(A | B) ^ (~A) -> (A | ~B)".
Patch Credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4588



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213662 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 15:37:39 +00:00
Peter Zotov
acdcb3773d [OCaml] Don't truncate constants over 32 bits in Llvm.const_int.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213655 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 13:55:20 +00:00
Sasa Stankovic
f8b83e39c5 [mips] Fix two patterns that select i32's (for MIPS32r6) / i64's (for MIPS64r6)
from setne comparison with an i32.

The patterns that are fixed:
  * (select (i32 (setne i32, immZExt16)), i32, i32) (for MIPS32r6)
  * (select (i32 (setne i32, immZExt16)), i64, i64) (for MIPS64r6)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213653 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 13:36:02 +00:00
Elena Demikhovsky
bf348c4e46 AVX-512: Fixed intrinsic of VSQRTPS/PD instructions.
I set number and types of parameters according to GCC intrinsics.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 11:07:31 +00:00
Richard Smith
434f0e3548 Revert of r213521. This change introduced a non-hermetic test (depending on a
file not in the test/ area). Backing out now so that this test isn't part of
the 3.5 branch.

Original commit message: "TableGen: Allow AddedComplexity values to be negative
[...]"


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213596 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-22 02:32:12 +00:00
Mark Heffernan
bc7f1aba2d Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213588 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 23:11:03 +00:00
Eli Bendersky
07efe87198 Add some tests for NVPTX lowering of cmpxchg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213586 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 22:54:44 +00:00
David Blaikie
529165298e Revert "Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information."
This reverts commit r212649 while I investigate/reduce/etc PR20367.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213581 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 20:45:59 +00:00
Logan Chien
8c4cf40507 Replace the result usages while legalizing cmpxchg.
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.

For example, in the following sequence, both i8 and i1
might be illegal in some target, e.g. armv5, mipsel, mips64el,

    %0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic
    %1 = extractvalue { i8, i1 } %0, 1

Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.

If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213569 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 17:33:44 +00:00
Tom Stellard
9787e8c76b R600/SI: Add instruction shrinking pass
This pass converts 64-bit instructions to 32-bit when possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213561 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 16:55:33 +00:00
Tom Stellard
05388f25d7 R600/SI: Clean up some of the unused REGISTER_{LOAD,STORE} code
There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213552 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 15:45:06 +00:00
Tom Stellard
3280804237 R600/SI: Use scratch memory for large private arrays
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213551 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 15:45:01 +00:00
Daniel Sanders
2479def756 [mips] Do not emit '.module fp=...' unless we really need to.
We now emit this value when we need to contradict the default value. This
restores support for binutils 2.24.

When a suitable binutils has been released we can resume unconditionally
emitting .module directives. This is preferable to omitting the .module
directives since the .module directives protect against, for example,
accidentally assembling FP32 code with -mfp64 and producing an unusuable object.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213548 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 15:25:24 +00:00
Tom Stellard
b664d47cb0 R600/SI: Store constant initializer data in constant memory
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.

This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213530 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 14:01:14 +00:00
Tom Stellard
54a2540fee R600/SI: Use VALU for i1 XOR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 14:01:10 +00:00
Daniel Sanders
6816d66d99 [mips] Add MipsOptionRecord abstraction and use it to implement .reginfo/.MIPS.options
This abstraction allows us to support the various records that can be placed in
the .MIPS.options section in the future. We currently use it to record register
usage information (the ODK_REGINFO record in our ELF64 spec).

Each .MIPS.options record should subclass MipsOptionRecord and provide an
implementation of EmitMipsOptionRecord.

Patch by Matheus Almeida and Toma Tabacu



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213522 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 13:30:55 +00:00
Tom Stellard
4950ae5960 TableGen: Allow AddedComplexity values to be negative
This is useful for cases when stand-alone patterns are preferred to the
patterns included in the instruction definitions.  Instead of requiring
that stand-alone patterns set a larger AddedComplexity value, which
can be confusing to new developers, the allows us to reduce the
complexity of the included patterns to achieve the same result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213521 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 13:28:54 +00:00
Daniel Sanders
8ecdc6d34f [mips] Do not emit '.module [no]oddspreg' unless we really need to.
We now emit this directive when we need to contradict the default value (e.g.
-mno-odd-spreg is given) or an option changed the default value (e.g. -mfpxx
is given).

This restores support for the currently available head of binutils. However,
at this point binutils 2.24 is still not sufficient since it does not support
'.module fp=...'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213511 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 10:45:47 +00:00
Chandler Carruth
80a15c2ff3 FileCheck-ize a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213508 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 09:23:21 +00:00
Tim Northover
f8d927f22b CodeGen: emit IR-level f16 conversion intrinsics as fptrunc/fpext
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:

  fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
  fpext -> FP16_TO_FP (if f16 illegal) -> libcall

Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).

As a result we can remove some redundant AArch64 patterns.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213507 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 09:13:56 +00:00
Andrea Di Biagio
19e39c6f12 [DAGCombiner] Improve the shuffle-vector folding logic.
Canonicalize shuffles according to rules:
 *  shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
 *  shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
 *  shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)

This patch helps identifying more shuffle pairs that could be combined reusing
the already existing rules in the DAGCombiner.

Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
shuffles are now folded into a single shuffle node by the DAGCombiner.
Added more test cases to 'combine-vec-shuffle-4.ll'.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 07:30:54 +00:00
Ulrich Weigand
d4542a8cdc [PowerPC] ELFv2 aggregate passing support
This patch adds infrastructure support for passing array types
directly.  These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type).  The details of the
array type being used inform the back-end about ABI-relevant
properties.  Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
  GPRs/stack slots  (for float / vector / integer element types,
  respectively)
- what the alignment requirements of the parameter are when passed in
  GPRs/stack slots  (8 for float / 16 for vector / the element type
  size for integer element types) -- this corresponds to the
  "byval align" field

Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
  or vector aggregates in FPRs and VRs (this is similar to the ARM
  homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
  that fit fully in registers without using the "byval" mechanism

The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types.  The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.

As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs.  This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs.  Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.

Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs.  The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.

This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST).  That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.

The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates.  (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 00:13:26 +00:00
Ulrich Weigand
970c019d02 [PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2.   In ELFv2 instead,
each of the CR fields is represented by its own CFI record.  The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving.  That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.

Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 00:03:18 +00:00
Ulrich Weigand
7fc5011e8d [PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213490 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:43:15 +00:00
Ulrich Weigand
edfd4f18bc [PowerPC] ELFv2 function call changes
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention.  In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions).  Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.

In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it.  Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.

This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
  definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
  anywhere (this is done EmitFunctionBodyStart; note that this cannot
  be done in EmitFunctionBodyStart because the global entry point
  prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
  mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
  descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
  to tell the linker to create the appropriate version of PLT stubs.  

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:31:44 +00:00
Ulrich Weigand
76fcace66e [MC] Pass MCSymbolData to needsRelocateWithSymbol
As discussed in a previous checking to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.

Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller.  This patch
therefore simply extends the needsRelocateWithSymbol by a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.

In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:15:06 +00:00
Hal Finkel
160f9b9c10 [LoopVectorize] Use AA to partition potential dependency checks
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.

Unfortunately, this meant that:
  1. The loop vectorizer had logic that essentially duplicated that in BasicAA
     for aliasing based on identified objects.
  2. The loop vectorizer could not partition the space of dependency checks
     based on information only easily available from within AA (TBAA metadata is
     currently the prime example).

This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:

void foo(int *a, float *b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i];
}

This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.

This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.

When pointer locations are added to the AliasSetTracker, two things are done:
  1. The location size is set to UnknownSize (otherwise you'd not catch
     inter-iteration dependencies)
  2. For instructions in blocks that would need to be predicated, TBAA is
     removed (because the metadata might have a control dependency on the condition
     being speculated).

For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213486 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:07:52 +00:00
Ulrich Weigand
5ee5fc4c47 [PowerPC] ELFv2 MC support for .localentry directive
A second binutils feature needed to support ELFv2 is the .localentry
directive.  In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls).  The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address.  That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.

The .localentry directive instructs the assembler to set those fields
to encode a particular offset.  This is typically used by a function
prologue sequence like this:

func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

Note that according to the ABI, when calling the global entry point,
r12 must be set to point the global entry point address itself; while
when calling the local entry point, r2 must be set to point to the TOC
base.  The two instructions between the global and local entry point in
the above example translate the first requirement into the second.

This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.

In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead.  The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly.  This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).

Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol.  For the same reason as above, this
may be wrong when the target symbol has two entry points.  The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker.  This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points.  This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213485 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:06:03 +00:00
Ulrich Weigand
0d9bcaacd1 [PowerPC] ELFv2 MC support for .abiversion directive
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213484 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 22:56:57 +00:00
Ulrich Weigand
e4b2165648 [PowerPC] Fix FrameIndex handling in SelectAddressRegImm
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node.  This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate.  This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.

The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213482 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 22:26:40 +00:00
Matt Arsenault
83a385a57b R600: Add missing test for concat_vectors
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 07:13:17 +00:00
Matt Arsenault
eb957bfe32 R600/SI: Remove dead code and add missing tests.
This probably was killed by some generic DAGCombiner
improvements in checking the TargetBooleanContents instead
of just 1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213471 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 06:11:02 +00:00
Matt Arsenault
18ecf3fff3 R600/SI: implement range reduction for sin/cos
These instructions can only take a limited input range, and return
the constant value 1 out of range. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.

v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics

Patch by Grigori Goronzy

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213458 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-19 18:44:39 +00:00
Hal Finkel
2350e9f6b7 [LoopVectorize] Propagate known metadata to vectorized instructions
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).

Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213452 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-19 13:33:16 +00:00
Andrea Di Biagio
5bc21c3b57 [x86] Fix wrong shuffle mask in test 'combine-vec-shuffle-3.ll'. No functional change.
Function @test3c should check that the DAGCombiner is able to fold a pair of
shuffles into a new shuffle with a permute mask of <6,7,2,3>. However, one of
the shuffles in @test3c had a wrong permute mask; this prevented the DAGCombiner
from folding the shuffles into the expected result.
Now that the shuffle mask is fixed, the backend correctly folds the two shuffles
in function @test3c into a single movhlps instruction.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213451 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-19 07:52:58 +00:00
Hal Finkel
7c11695a23 Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes
When we have a parameter (or call site return) with a dereferenceable
attribute, it can specify the size of an array pointed to by that parameter. If
we have a value for which we can accumulate a constant offset to such a
parameter, then we can use that offset in a direct comparison with the size
specified by the dereferenceable attribute.

This enables us to handle cases like this:

  int foo(int a[static 3]) {
    return a[2]; /* this is always dereferenceable */
  }

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213447 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-19 03:25:16 +00:00
Saleem Abdulrasool
b0a5225e6f ARM: correct WoA __builtin_alloca handling on O0
When performing a dynamic stack adjustment without optimisations, we would mark
SP as def and R4 as kill.  This occurred as part of the expansion of a
WIN__CHKSTK SDNode which indicated the proper handling of SP and R4.  The result
would be that we would double define SP as part of an operation, which is
obviously incorrect.

Furthermore, the VTList for the chain had an incorrect parameter type of i32
instead of Other.

Correct these to permit proper lowering of __builtin_alloca at -O0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213442 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-19 01:29:51 +00:00
Hal Finkel
d644d17dd4 [PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213427 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 23:29:49 +00:00
Mark Heffernan
354f2afffd Remove unroll pragma metadata after it is used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213412 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 21:04:33 +00:00
Eli Bendersky
bd9c6c0773 Add tests for atomic adds on floats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213406 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 20:11:26 +00:00
Eli Bendersky
6a33571ba9 Use CHECK-LABEL where appropriate in this test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213398 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 19:32:09 +00:00
Gerolf Hoflehner
d94715e273 MergedLoadStoreMotion pass
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213396 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 19:13:09 +00:00
David Peixotto
12f33da20b MC: support different sized constants in constant pools
On AArch64 the pseudo instruction ldr <reg>, =... supports both
32-bit and 64-bit constants. Add support for 64 bit constants for
the pools to support the pseudo instruction fully.

Changes the AArch64 ldr-pseudo tests to use 32-bit registers and
adds tests with 64-bit registers.

Patch by Janne Grunau!

Differential Revision: http://reviews.llvm.org/D4279



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213387 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 16:05:14 +00:00
Hal Finkel
11af4b49b2 Add a dereferenceable attribute
This attribute indicates that the parameter or return pointer is
dereferenceable. Practically speaking, loads from such a pointer within the
associated byte range are safe to speculatively execute. Such pointer
parameters are common in source languages (C++ references, for example).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213385 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 15:51:28 +00:00
Tim Northover
e72ff8829e AArch64: implement efficient f16 bitcasts
Because i16 is illegal, there's no native DAG method to
represent a bitcast to or from an f16 type. This meant LLVM was
inserting a stack store/load pair which is really not ideal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213378 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:07:05 +00:00
Tim Northover
b41b1d4bac NVPTX: support fpext/fptrunc to and from f16.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213377 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:43 +00:00
Tim Northover
7714a60ed1 R600: support fpext/fptrunc operations to and from f16.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213376 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:37 +00:00
Tim Northover
1a8bcdb72e AArch64: support f16 extend/trunc operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213375 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:31 +00:00
Tim Northover
e683321270 X86: support fpext/fptrunc operations to and from 16-bit floats.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213374 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:25 +00:00
Tim Northover
4413539ee4 ARM: support legalisation of "fptrunc ... to half" operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213373 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 13:01:19 +00:00
Tim Northover
0afed03229 CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added
when it's needed.  But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.

Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213372 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 12:41:46 +00:00
Tilmann Scheller
7cd0201f02 [ARM] Add earlyclobber constraint to pre/post-indexed ARM STR instructions.
The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted.

The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed.

This fixes PR20323.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213369 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 12:05:49 +00:00
Tim Northover
f811b2895f R600: rename misleading fp16 test.
This test is actually going in the opposite direction to what the
filename and function name suggested.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213358 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 08:43:30 +00:00
Tim Northover
cc03227446 R600: support f16 -> f64 conversion intrinsic.
Unfortunately, we don't seem to have a direct truncation, but the
extension can be legally split into two operations so we should
support that.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213357 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 08:43:24 +00:00
Tim Northover
7bbf5786d7 NVPTX: support direct f16 <-> f64 conversions via intrinsics.
Clang may well start emitting these soon, and while it may not be
directly relevant for OpenCL or GLSL, the instructions were just
sitting there waiting to be used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213356 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 08:30:10 +00:00
Matt Arsenault
a32c319741 R600: Implement TTI:getPopcntSupport
The test is just copied from X86, and I don't know of a better
way to test it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213351 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 06:07:13 +00:00
Jim Grosbach
c6058e2462 X86: Constant fold converting vector setcc results to float.
Since the result of a SETCC for X86 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
  UNARYOP(AND(VECTOR_CMP(x,y), constant))
      --> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).

This implements the transform where UNARYOP is [su]int_to_fp.

For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}

Before this change, the SSE code is generated as:
LCPI0_0:
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .long 1                       ## 0x1
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _foo
  .align  4, 0x90
_foo:                                   ## @foo
  cmpeqps %xmm1, %xmm0
  andps LCPI0_0(%rip), %xmm0
  cvtdq2ps  %xmm0, %xmm0
  retq

After, the code is improved to:
LCPI0_0:
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .long 1065353216              ## float 1.000000e+00
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _foo
  .align  4, 0x90
_foo:                                   ## @foo
  cmpeqps %xmm1, %xmm0
  andps LCPI0_0(%rip), %xmm0
  retq

The cvtdq2ps has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via the ModRM operand of andps.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213342 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 00:40:56 +00:00
Jim Grosbach
f4e104f5eb AArch64: Constant fold converting vector setcc results to float.
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
  UNARYOP(AND(VECTOR_CMP(x,y), constant))
      --> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).

This implements the transform where UNARYOP is [su]int_to_fp.

For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
  %cmp = fcmp oeq <4 x float> %val, %test
  %ext = zext <4 x i1> %cmp to <4 x i32>
  %result = sitofp <4 x i32> %ext to <4 x float>
  ret <4 x float> %result
}

Before this change, the code is generated as:
  fcmeq.4s  v0, v0, v1
  movi.4s v1, #0x1        // Integer splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  scvtf.4s  v0, v0        // Convert each lane to f32.
  ret

After, the code is improved to:
  fcmeq.4s  v0, v0, v1
  fmov.4s v1, #1.00000000 // f32 splat value.
  and.16b v0, v0, v1      // Mask lanes based on the comparison.
  ret

The svvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.

Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so leaving them for another time when it becomes
appropriate.

rdar://17693791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213341 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 00:40:52 +00:00
Michael J. Spencer
f2c19f622f Revert "[x86] Fold extract_vector_elt of a load into the Load's address computation."
There's a bug where this can create cycles in the DAG. It will take a bit
to fix, so I'm backing it out for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 00:15:50 +00:00
Kevin Enderby
c61aa0219e Add printing of Mach-O stabs in llvm-nm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213327 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 22:47:16 +00:00
Alexey Samsonov
30ea42931a [ASan] Don't instrument load/stores with !nosanitize metadata.
This is used to avoid instrumentation of instructions added by UBSan
in Clang frontend (see r213291). This fixes PR20085.

Reviewed in http://reviews.llvm.org/D4544.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213292 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 18:48:12 +00:00
Justin Holewinski
11ae250ec9 [NVPTX] Improve handling of FP fusion
We now consider the FPOpFusion flag when determining whether
to fuse ops.  We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213287 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 18:10:09 +00:00
Zinovy Nis
345cb09610 [BUG] Due to a typo introduced in r199933 and r200027 two tests for FMA were never even started.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213283 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 17:14:35 +00:00
Adam Nemet
6ae2941874 [X86] AVX512: Add disassembler support for compressed displacement
There are two parts here.  First is to modify tablegen to adjust the encoding
type ENCODING_RM with the scaling factor.

The second is to use the new encoding types to compute the correct
displacement in the decoder.

Fixes <rdar://problem/17608489>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 17:04:56 +00:00
Adam Nemet
30cced119b [TableGen] Allow shift operators to take bits<n>
Convert the operand to int if possible, i.e. if the value is properly
initialized.  (I suppose there is further room for improvement here to also
peform the shift if the uninitialized bits are shifted out.)

With this little change we can now compute the scaling factor for compressed
displacement with pure tablegen code in the X86 backend.  This is useful
because both the X86-disassembler-specific part of tablegen and the assembler
need this and TD is the natural sharing place.

The patch also adds the missing documentation for the shift and add operator.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213277 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 17:04:27 +00:00
Justin Holewinski
07bc0b6ae6 [NVPTX] Add missing .v4 qualifier on vector store instruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213276 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 16:58:56 +00:00
Saleem Abdulrasool
7be793b8cd MC: correct DWARF header for PE/COFF assembly input
The header contains an offset to the DWARF abbreviations for the CU.  The offset
must be section relative for COFF and absolute for others.  The non-assembly
code path for the DWARF header generation already had the correct emission for
the headers.  This corrects just the assembly path.  Due to the invalid
relocation, processing of the debug information would halt previously on the
first assembly input as the associated abbreviations would be out of range as
they would have the location increased by image base and the section offset.

This address PR20332.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213275 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 16:27:44 +00:00