Commit Graph

12680 Commits

Ramkumar Ramachandra
e67f5de1f7 overloaded-intrinsic-name: exercise anyptr on struct
No other test I know shows how struct names are mangled in overloaded
intrinsic functions.

Differential Revision: http://reviews.llvm.org/D7037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227229 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 20:03:08 +00:00
Marek Olsak
fd55bcd060 R600/SI: Enable all tests that pass on VI without changes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227214 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-27 17:27:15 +00:00
Simon Pilgrim
44513da617 [X86][SSE] Float comparisons can sometimes be safely commuted
For ordered, unordered, equal and not-equal tests, packed float and double comparison instructions can be safely commuted without affecting the results. This patch checks the comparison mode of the (v)cmpps + (v)cmppd instructions and commutes the result if it can.

Differential Revision: http://reviews.llvm.org/D7178

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:29:24 +00:00
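A minimal C sketch (not from the patch; names are illustrative) of why the EQ, NEQ, ORD and UNORD predicates are safe to commute: each of these comparisons is symmetric in its two operands, even when NaNs are involved, so swapping the sources of (v)cmpps/(v)cmppd with these modes cannot change the result.

  #include <assert.h>
  #include <math.h>
  #include <stdbool.h>

  static bool cmp_eq(double a, double b)    { return a == b; }                 /* CMPEQ    */
  static bool cmp_neq(double a, double b)   { return a != b; }                 /* CMPNEQ   */
  static bool cmp_ord(double a, double b)   { return !isnan(a) && !isnan(b); } /* CMPORD   */
  static bool cmp_unord(double a, double b) { return isnan(a) || isnan(b); }   /* CMPUNORD */

  int main(void) {
    double vals[] = { 0.0, 1.0, -1.0, NAN };
    for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 4; ++j) {
        double a = vals[i], b = vals[j];
        /* Swapping the operands never changes the result for these predicates. */
        assert(cmp_eq(a, b) == cmp_eq(b, a));
        assert(cmp_neq(a, b) == cmp_neq(b, a));
        assert(cmp_ord(a, b) == cmp_ord(b, a));
        assert(cmp_unord(a, b) == cmp_unord(b, a));
      }
    return 0;
  }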
Simon Pilgrim
3ba85ab23a [X86][PCLMUL] Enable commutation for PCLMUL instructions
Patch to allow (v)pclmulqdq to be commuted - swaps the src registers and inverts the immediate (low/high) src mask.

Differential Revision: http://reviews.llvm.org/D7180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227141 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:00:18 +00:00
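An illustrative C model (assumptions: GCC/Clang's unsigned __int128; the helper names are made up) of why the commutation is valid: imm8 bit 0 selects a 64-bit half of the first source and bit 4 selects a half of the second, and carry-less multiplication itself is commutative, so swapping the sources while swapping those two immediate bits preserves the result.

  #include <assert.h>
  #include <stdint.h>

  typedef unsigned __int128 u128;

  /* Carry-less (GF(2)) multiply of two 64-bit values. */
  static u128 clmul64(uint64_t a, uint64_t b) {
    u128 r = 0;
    for (int i = 0; i < 64; ++i)
      if (b & (1ULL << i))
        r ^= (u128)a << i;
    return r;
  }

  /* Software model of PCLMULQDQ: imm bit 0 picks the qword of src1, bit 4 of src2. */
  static u128 pclmulqdq_model(const uint64_t src1[2], const uint64_t src2[2], int imm) {
    return clmul64(src1[imm & 1], src2[(imm >> 4) & 1]);
  }

  int main(void) {
    uint64_t a[2] = { 0x123456789abcdef0ULL, 0x0fedcba987654321ULL };
    uint64_t b[2] = { 0xdeadbeefcafebabeULL, 0x0123456789abcdefULL };
    int imms[4] = { 0x00, 0x01, 0x10, 0x11 };
    for (int k = 0; k < 4; ++k) {
      int imm = imms[k];
      int swapped = ((imm & 1) << 4) | ((imm >> 4) & 1);  /* swap imm bits 0 and 4 */
      assert(pclmulqdq_model(a, b, imm) == pclmulqdq_model(b, a, swapped));
    }
    return 0;
  }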
Simon Pilgrim
38c35f3e2c Line endings fix. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227138 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:28:32 +00:00
Matt Arsenault
b33118d503 R600: Cleanup or test
Fix broken check lines, use multiple check prefixes,
add an additional test for i1 or.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227137 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:16:10 +00:00
Simon Pilgrim
1d34ec14e9 Line endings fix. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227136 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 21:15:42 +00:00
Bruno Cardoso Lopes
0f3d4650f7 [x86][MMX] Rename and cleanup tests: arith, intrinsics and shuffle
- Rename mmx-builtins to mmx-intrinsics to match other intrinsic test naming.
- Remove tests that duplicate functionality from mmx-intrinsics.ll.
- Move arith related tests to mmx-arith.ll.
- MMX related shuffle goes to vector-shuffle-mmx.ll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227130 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 20:06:51 +00:00
Justin Holewinski
33eac3ee53 [NVPTX] Generate a more optimal sequence for select of i1
Instead of creating a pattern like "(p && a) || ((!p) && b)",
just expand the i8 operands to i32 and perform the selp on them.

Fixes PR22246

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227123 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:52:20 +00:00
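A small C sketch (illustrative only) of the equivalence being exploited above: for a boolean predicate p, the expression (p && a) || (!p && b) is just a select, so widening the narrow operands and emitting a single selp produces the same value.

  #include <assert.h>
  #include <stdbool.h>

  int main(void) {
    for (int p = 0; p < 2; ++p)
      for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b) {
          bool pattern = (p && a) || (!p && b);  /* old expansion             */
          int  widened = p ? a : b;              /* selp on widened operands  */
          assert((int)pattern == widened);
        }
    return 0;
  }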
Justin Holewinski
48e872230d [NVPTX] Handle floating-point conversion patterns that are not explicitly ordered or unordered
Fixes PR22322

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:11:20 +00:00
Alex Rosenberg
8da9a6686a Use a different encoding for debugtrap on PS4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:09:27 +00:00
Sanjay Patel
956d6f0cf5 Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221)
This patch fixes the following miscompile:

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
  %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %a0 = extractelement <2 x double> %0, i32 0
  %conv = fptrunc double %a0 to float
  %a1 = extractelement <2 x double> %0, i32 1
  %conv3 = fptrunc double %a1 to float
  tail call void @callee2(float %conv, float %conv3) nounwind
  ret void
}

Current codegen:

sqrtsd	%xmm0, %xmm1        ## high element of %xmm1 is undef here
xorps	%xmm0, %xmm0
cvtsd2ss	%xmm1, %xmm0
shufpd	$1, %xmm1, %xmm1
cvtsd2ss	%xmm1, %xmm1 ## operating on undef value
jmp	_callee

This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) 
which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).

All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); 
this should be the final patch needed to resolve that bug.

Differential Revision: http://reviews.llvm.org/D6885



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227111 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 18:42:16 +00:00
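A hedged intrinsics sketch (requires SSE2; uses the standard <emmintrin.h> wrapper, not code from the patch) of the semantics the new modeling captures: the low lane receives the square root, while the high lane is copied from the tied operand, so it is a real, defined value rather than undef.

  #include <assert.h>
  #include <emmintrin.h>

  int main(void) {
    __m128d a = _mm_set_pd(3.0, 4.0);   /* high = 3.0, low = 4.0 */
    /* Low lane: sqrt of the low element; high lane: copied from the first operand. */
    __m128d r = _mm_sqrt_sd(a, a);
    double lo = _mm_cvtsd_f64(r);
    double hi = _mm_cvtsd_f64(_mm_unpackhi_pd(r, r));
    assert(lo == 2.0);
    assert(hi == 3.0);  /* defined, not undef: tied to the destination register */
    return 0;
  }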
Eric Christopher
fcd3c4065d Move the Mips target to storing the ABI in the TargetMachine rather
than on MipsSubtargetInfo.

This required a bit of massaging in the MC level to handle this since
MC is a) largely a collection of disparate classes with no hierarchy,
and b) there's no overarching equivalent to the TargetMachine, instead
only the subtarget via MCSubtargetInfo (which is the base class of
TargetSubtargetInfo).

We're now storing the ABI in both the TargetMachine level and in the
MC level because the AsmParser and the TargetStreamer both need to
know what ABI we have to parse assembly and emit objects. The target
streamer has a pointer to the one in the asm parser and is updated
when the asm parser is created. This is fragile as the FIXME comment
notes, but shouldn't be a problem in practice since we always
create an asm parser before attempting to emit object code via the
assembler. The TargetMachine now contains the ABI so that the DataLayout
can be constructed dependent upon ABI.

All testcases have been updated to use the -target-abi command line
flag so that we can set the ABI without using a subtarget feature.

Should be no change visible externally here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227102 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 17:33:46 +00:00
Sanjay Patel
dbc6dda771 fix line-endings; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227095 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 17:21:36 +00:00
Vasileios Kalintiris
536bce219d [mips] Enable arithmetic and binary operations for the i128 data type.
Summary:
This patch adds support for some operations that were missing from
128-bit integer types (add/sub/mul/sdiv/udiv... etc.). With these
changes we can support the __int128_t and __uint128_t data types
from C/C++.

Depends on D7125

Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7143

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227089 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 12:33:22 +00:00
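A minimal C sketch of the C-level types these i128 operations back (assumes a compiler providing __int128, e.g. Clang/GCC on a 64-bit target):

  #include <assert.h>

  int main(void) {
    __uint128_t a = ((__uint128_t)1 << 100) + 7;   /* needs 128-bit shift/add */
    __uint128_t b = 12345;
    __uint128_t sum  = a + b;                      /* i128 add  */
    __uint128_t prod = b * b;                      /* i128 mul  */
    __uint128_t quot = a / b;                      /* i128 udiv */
    assert(sum - b == a);
    assert(prod == (__uint128_t)12345 * 12345);
    assert(quot * b + a % b == a);
    return 0;
  }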
Vasileios Kalintiris
71ec66e7fd [mips] Add tests for bitwise binary and integer arithmetic operators.
Reviewers: dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7125

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227087 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 12:04:40 +00:00
Vasileios Kalintiris
823e8548a0 Revert "[mips] Fix assertion on i128 addition/subtraction on MIPS64"
This reverts commit r227003. Support for addition/subtraction and
various other operations for the i128 data type will be added in a
future commit based on the review D7143.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227082 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 09:53:30 +00:00
Craig Topper
1656cc8ddb [X86] Change comparison immediate type to i8 in test cases for AVX512 floating point comparisons. The type was already changed in the definitions and was being auto-upgraded to the new type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227064 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 23:26:12 +00:00
Craig Topper
fd176682b9 [X86] Use i8 immediate for comparison type on AVX512 packed integer instructions. This matches floating point equivalents. Includes autoupgrade support to convert old code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227063 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 23:26:02 +00:00
Bill Schmidt
9dbb6a4f63 [PowerPC] Revert ppc64le-aggregates.ll test changes from r227053
It appears we have different behavior with and without -mcpu=pwr8 even
with ppc64le defaulting to POWER8.  The failure appears as follows:

/home/bb/cmake-llvm-x86_64-linux/llvm-project/llvm/test/CodeGen/PowerPC/ppc64le-aggregates.ll:268:14: error: expected string not found in input
; CHECK-DAG: lfs 1, 0([[REG]])
             ^
<stdin>:497:11: note: scanning from here
 ld 3, .LC1@toc@l(3)
          ^
<stdin>:497:11: note: with variable "REG" equal to "3"
 ld 3, .LC1@toc@l(3)
          ^
<stdin>:514:2: note: possible intended match here
 lfs 1, 0(4)
 ^

Reverting this particular test case change.  Nemanja, please have a look
at the reason for the failure.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227055 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 18:18:54 +00:00
Bill Schmidt
c536eed4d8 [PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8
Test by Nemanja Ivanovic.

Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.

Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.

In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227053 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 18:05:42 +00:00
Elena Demikhovsky
717d41d8c3 AVX-512: Changes in operations on masks registers for KNL and SKX
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store sizes of v8i1, v4i1 and v2i1 are changed to 8 bits



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227043 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 12:47:15 +00:00
Alexei Starovoitov
fb490166f4 bpf: add missing lit.local.cfg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227009 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 18:20:52 +00:00
Alexei Starovoitov
4fe85c7548 BPF backend
Summary:
V8->V9:
- cleanup tests

V7->V8:
- addressed feedback from David:
- switched to range-based 'for' loops
- fixed formatting of tests

V6->V7:
- rebased and adjusted AsmPrinter args
- CamelCased .td, fixed formatting, cleaned up names, removed unused patterns
- diffstat: 3 files changed, 203 insertions(+), 227 deletions(-)

V5->V6:
- addressed feedback from Chandler:
- reinstated full verbose standard banner in all files
- fixed variables that were not in CamelCase
- fixed names of #ifdef in header files
- removed redundant braces in if/else chains with single statements
- fixed comments
- removed trailing empty line
- dropped debug annotations from tests
- diffstat of these changes:
  46 files changed, 456 insertions(+), 469 deletions(-)

V4->V5:
- fix setLoadExtAction() interface
- clang-formated all where it made sense

V3->V4:
- added CODE_OWNERS entry for BPF backend

V2->V3:
- fix metadata in tests

V1->V2:
- addressed feedback from Tom and Matt
- removed top level change to configure (now everything via 'experimental-backend')
- reworked error reporting via DiagnosticInfo (similar to R600)
- added few more tests
- added cmake build
- added Triple::bpf
- tested on linux and darwin

V1 cover letter:
---------------------
Recently Linux gained a "universal in-kernel virtual machine" which is called
eBPF or extended BPF. The name comes from "Berkeley Packet Filter", since the
new instruction set is based on it.
This patch adds a new backend that emits the extended BPF instruction set.

The concept and development are covered by the following articles:
http://lwn.net/Articles/599755/
http://lwn.net/Articles/575531/
http://lwn.net/Articles/603983/
http://lwn.net/Articles/606089/
http://lwn.net/Articles/612878/

One of use cases: dtrace/systemtap alternative.

bpf syscall manpage:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe

instruction set description and differences vs classic BPF:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/filter.txt

Short summary of instruction set:
- 64-bit registers
  R0      - return value from in-kernel function, and exit value for BPF program
  R1 - R5 - arguments from BPF program to in-kernel function
  R6 - R9 - callee saved registers that in-kernel function will preserve
  R10     - read-only frame pointer to access stack
- two-operand instructions like +, -, *, mov, load/store
- implicit prologue/epilogue (invisible stack pointer)
- no floating point, no simd

Short history of extended BPF in kernel:
interpreter in 3.15, x64 JIT in 3.16, arm64 JIT, verifier, bpf syscall in 3.18, more to come in the future.

It's a very small and simple backend.
There is no support for global variables, arbitrary function calls, floating point, varargs,
exceptions, indirect jumps, arbitrary pointer arithmetic, alloca, etc.
From the C front-end point of view it's very restricted. This is done on purpose, since the kernel
rejects all programs that it cannot prove safe. It rejects programs with loops
and with memory accesses via arbitrary pointers. When the kernel accepts a program, it is
guaranteed that the program will terminate and will not crash the kernel.

This patch implements all 'must have' bits. There are several things on the TODO list,
so this is not the end of development.
Most of the code is boilerplate, copy-pasted from other backends.
The only odd things are the lack of < and <= instructions, the specialized load_byte intrinsics
and 'compare and goto' as a single instruction.
The current instruction set is fixed, but more instructions can be added in the future.

Signed-off-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>

Subscribers: majnemer, chandlerc, echristo, joerg, pete, rengolin, kristof.beyls, arsenm, t.p.northover, tstellarAMD, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D6494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227008 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 17:51:26 +00:00
Daniel Sanders
f945a40f9d [mips] Fix assertion on i128 addition/subtraction on MIPS64
Summary:
In addition to the included tests, this fixes
test/CodeGen/Generic/i128-addsub.ll on a mips64 host.

Reviewers: atanasyan, sagar, vmedic

Reviewed By: vmedic

Subscribers: sdkie, llvm-commits

Differential Revision: http://reviews.llvm.org/D6610

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227003 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 12:58:10 +00:00
Andrea Di Biagio
b981dc4a21 [DAG] Fix wrong canonicalization performed on shuffle nodes.
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.

Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227002 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 11:54:29 +00:00
Quentin Colombet
af1cd03764 [AArch64][LoadStoreOptimizer] Form LDPSW when possible.
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.

<rdar://problem/19583480>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226978 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-24 01:25:54 +00:00
Tom Stellard
5b37a2e5ff R600/SI: Emit .hsa.version section for amdhsa OS
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226970 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 23:59:08 +00:00
Reid Kleckner
339591e0a9 Fix assertion when C++ EH filters are present in functions using SEH
Should fix PR22305.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226969 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 23:51:25 +00:00
Bruno Cardoso Lopes
807360ab08 [x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector
Handle the poor codegen for i64/x86mmx->v2i64 (%mm -> %xmm) moves. Instead of
using stack store/load pair to do the job, use scalar_to_vector directly, which
in the MMX case can use movq2dq. This was the current behavior prior to
improvements for vector legalization of extloads in r213897.

This commit fixes the regression and as a side-effect also remove some
unnecessary shuffles.

In the new attached testcase, we go from:

pshufw  $-18, (%rdi), %mm0
movq    %mm0, -8(%rsp)
movq    -8(%rsp), %xmm0
pshufd  $-44, %xmm0, %xmm0
movd    %xmm0, %eax
...

To:

pshufw  $-18, (%rdi), %mm0
movq2dq %mm0, %xmm0
movd    %xmm0, %eax
...

Differential Revision: http://reviews.llvm.org/D7126
rdar://problem/19413324

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226953 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 22:44:16 +00:00
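A hedged intrinsics sketch (requires MMX and SSE2 on x86-64; illustrative only, using the standard <emmintrin.h>/<mmintrin.h> wrappers) of the %mm -> %xmm move that can now be selected directly as movq2dq instead of going through a stack store/load pair:

  #include <assert.h>
  #include <emmintrin.h>  /* SSE2: _mm_movpi64_epi64 (movq2dq) */
  #include <mmintrin.h>   /* MMX */

  int main(void) {
    __m64 m = _mm_cvtsi64_m64(0x1122334455667788LL);  /* value in an MMX register      */
    __m128i v = _mm_movpi64_epi64(m);                 /* %mm -> low half of %xmm       */
    _mm_empty();                                      /* leave MMX state before FP use */
    assert(_mm_cvtsi128_si64(v) == 0x1122334455667788LL);
    return 0;
  }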
Tom Stellard
511a3c71fc R600/SI: Move i64 -> v2i32 load promotion into AMDGPUDAGToDAGISel::Select()
We used to do this promotion during DAG legalization, but this
caused an infinite loop in ExpandUnalignedLoad() because it assumed
that i64 loads were legal if i64 was a legal type.

It also seems better to report i64 loads as legal, since they actually
are and we were just promoting them to simplify our tablegen files.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226945 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 22:05:45 +00:00
Reid Kleckner
26ba4c13a7 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226920 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 18:49:01 +00:00
Jyoti Allur
245caec9b3 This patch fixes an issue with lowering the pattern below:
_foo:
        smull   r0, r1, r1, r0
        smull   r2, r3, r3, r2
        adds    r0, r2, r0
        adc     r1, r3, r1
        bx      lr

to

_foo:
        smull   r0, r1, r1, r0
        smlal   r0, r1, r3, r2
        bx      lr


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226904 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 09:10:03 +00:00
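A small C sketch (illustrative, not from the patch) of the source-level pattern involved: two widening 32x32->64 multiplies summed together, which the improved lowering can emit as one smull followed by one smlal instead of smull/smull/adds/adc.

  #include <assert.h>
  #include <stdint.h>

  /* Sum of two widening multiplies: candidate for smull + smlal on ARM. */
  static int64_t mac64(int32_t a, int32_t b, int32_t c, int32_t d) {
    return (int64_t)a * b + (int64_t)c * d;
  }

  int main(void) {
    assert(mac64(100000, 300000, -7, 9) == 30000000000LL - 63);
    return 0;
  }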
Craig Topper
d05a6aa4e6 [x86] Change u8imm operands to always print as unsigned. This makes shuffle masks and the like make way more sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226902 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-23 08:00:59 +00:00
Jan Vesely
1d07592ec7 R600: Try to use lower types for 64bit division if possible
v2: add and enable tests for SI

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226881 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 23:42:43 +00:00
Simon Pilgrim
316b43f7df [X86][AVX] Added (V)MOVDDUP / (V)MOVSLDUP / (V)MOVSHDUP memory folding + tests.
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.

Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226873 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:39:59 +00:00
Simon Pilgrim
6377361399 Line endings fixes. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226872 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:27:37 +00:00
Simon Pilgrim
c7d6e9b0f9 [X86][SSE] Simplified PSUBUS tests
Removed loops from PSUBUS tests - this ensures folding is tested. Also renamed the SSE2 tests to SSSE3 to match the cpu.

This is a follow up commit agreed in http://reviews.llvm.org/D7094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226871 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 22:19:58 +00:00
Ramkumar Ramachandra
230796b278 Intrinsics: introduce llvm_any_ty aka ValueType Any
Specifically, gc.result benefits from this greatly. Instead of:

gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...

We now have a gc.result.* that can specialize to literally any type.

Differential Revision: http://reviews.llvm.org/D7020

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226857 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 20:14:38 +00:00
Sanjay Patel
05d5e213c4 merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.

Differential Revision: http://reviews.llvm.org/D6850
 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226845 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 18:21:26 +00:00
Michael Kuperstein
a52ddfa930 [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226816 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 13:07:28 +00:00
Michael Kuperstein
8fc1c3a619 Revert r226811, MSVC accepts code sane compilers don't.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226814 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:48:07 +00:00
Michael Kuperstein
0a979a09ae [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226811 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:37:23 +00:00
Elena Demikhovsky
2785766bc8 Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when, after vectorization, we have the type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional effort to expand loads and truncate stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 12:07:59 +00:00
Elena Demikhovsky
cdce03426d Fixed a bug in narrowing store operation.
The type MVT::i1 became legal in KNL, but a store operation can't be narrowed to this type,
since the size of the VT (1 bit) is not equal to its actual store size (8 bits).

Added a test provided by David (dag@cray.com)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226805 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 09:39:08 +00:00
Reid Kleckner
73f671a60e SEH: Finish writing the catch-all test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226768 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 02:31:09 +00:00
Reid Kleckner
0d056fd4c3 Win64 SEH: Emit the constant 1 for catch-all into xdata
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226767 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-22 02:27:44 +00:00
Simon Pilgrim
3f6acdd265 [X86][SSE] Missing SSE/AVX1 memory folding integer instructions
Added most of the missing integer vector folding patterns for SSE (to SSE42) and AVX1.

The most useful of these are probably the i32/i64 extraction, i8/i16/i32/i64 insertions, zero/sign extension, unsigned saturation subtractions, i64 subtractions and the variable mask blends (pblendvb) - others include CLMUL, SSE42 string comparisons and bit tests.

Differential Revision: http://reviews.llvm.org/D7094



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:43:30 +00:00
Tim Northover
f5f8a3e6a6 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226740 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:17:19 +00:00
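A quick C check (illustrative) of the bitwise identity behind the fold: for any X, M and N, (X & M) | (X & N) == X & (M | N), because AND distributes over OR.

  #include <assert.h>
  #include <stdint.h>

  int main(void) {
    uint32_t tests[][3] = {
      { 0xDEADBEEFu, 0x0000FFFFu, 0xFFFF0000u },
      { 0x12345678u, 0x0F0F0F0Fu, 0x00FF00FFu },
      { 0xFFFFFFFFu, 0x00000000u, 0x80000001u },
    };
    for (int i = 0; i < 3; ++i) {
      uint32_t x = tests[i][0], m = tests[i][1], n = tests[i][2];
      assert(((x & m) | (x & n)) == (x & (m | n)));
    }
    return 0;
  }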
Matt Arsenault
85661f76e3 R600: Add checks for urem/srem by a constant
Make sure this uses the faster expansion using magic constants
to avoid the full division path.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226734 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:56:15 +00:00
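A C sketch of the kind of expansion the checks expect (the magic constant is the standard divide-by-5 reciprocal; this is an illustration, not the backend's code): an unsigned division by a constant becomes a multiply-high plus shift, and the remainder follows from the quotient without any real division instruction.

  #include <assert.h>
  #include <stdint.h>

  /* n / 5 via multiply-high with magic constant 0xCCCCCCCD and shift 2. */
  static uint32_t udiv5(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDull) >> 34);
  }

  static uint32_t urem5(uint32_t n) {
    return n - udiv5(n) * 5;   /* remainder derived from the quotient */
  }

  int main(void) {
    uint32_t samples[] = { 0u, 1u, 4u, 5u, 6u, 123456789u, 4294967295u };
    for (int i = 0; i < 7; ++i) {
      assert(udiv5(samples[i]) == samples[i] / 5);
      assert(urem5(samples[i]) == samples[i] % 5);
    }
    return 0;
  }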
Simon Pilgrim
4269590166 [X86][SSE] Added support for SSE3 lane duplication shuffle instructions
This patch adds shuffle matching for the SSE3 MOVDDUP, MOVSLDUP and MOVSHDUP instructions. The big benefit is that they save many single-source shuffles from needing (pre-AVX) dual-source instructions such as SHUFPD/SHUFPS, which cause extra moves and prevent load folds.

Adding these instructions uncovered an issue in XFormVExtractWithShuffleIntoLoad, which crashed on single operand shuffle instructions (now fixed). It also involved fixing getTargetShuffleMask to correctly identify these instructions as unary shuffles.

Also adds a missing tablegen pattern for MOVDDUP.

Differential Revision: http://reviews.llvm.org/D7042



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226716 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:44:35 +00:00
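A hedged intrinsics sketch (requires SSE3; the names are the standard <pmmintrin.h> intrinsics, not code from the patch) of the single-source duplication shuffles these instructions implement:

  #include <assert.h>
  #include <pmmintrin.h>  /* SSE3: movddup / movsldup / movshdup */

  int main(void) {
    __m128  f = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);   /* lanes low..high: 1,2,3,4 */
    __m128d d = _mm_set_pd(9.0, 8.0);                 /* lanes low..high: 8,9     */

    __m128  lo = _mm_moveldup_ps(f);   /* movsldup: duplicate even lanes -> 1,1,3,3 */
    __m128  hi = _mm_movehdup_ps(f);   /* movshdup: duplicate odd lanes  -> 2,2,4,4 */
    __m128d dd = _mm_movedup_pd(d);    /* movddup:  duplicate low lane   -> 8,8     */

    float flo[4], fhi[4]; double ddd[2];
    _mm_storeu_ps(flo, lo); _mm_storeu_ps(fhi, hi); _mm_storeu_pd(ddd, dd);
    assert(flo[0] == 1.0f && flo[1] == 1.0f && flo[2] == 3.0f && flo[3] == 3.0f);
    assert(fhi[0] == 2.0f && fhi[1] == 2.0f && fhi[2] == 4.0f && fhi[3] == 4.0f);
    assert(ddd[0] == 8.0  && ddd[1] == 8.0);
    return 0;
  }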
Matt Arsenault
50c3bc9956 R600: Add missing tests for i64 srem
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226713 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:43:19 +00:00
Jonathan Roelofs
cab5680f6c Fix load-store optimizer on thumbv4t
Thumbv4t does not have lo->lo copies other than MOVS,
and that can't be predicated. So emit MOVS when needed
and bail if there's a predicate.

http://reviews.llvm.org/D6592


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226711 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 22:39:43 +00:00
Matt Arsenault
305228cc0b R600/SI: Custom lower fround
This fixes it for SI. It also removes the pattern
used previously for Evergreen for f32. I'm not sure
if the new R600 output is better or not, but it uses
one fewer instruction if BFI is available.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226682 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 18:18:25 +00:00
Colin LeMahieu
62b9c33e13 [Hexagon] Converting multiply and accumulate with immediate intrinsics to patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226681 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 18:13:15 +00:00
Ahmed Bougacha
34288d885e [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.
Now that we can fully specify extload legality, we can declare them
legal for the PMOVSX/PMOVZX instructions.  This for instance enables
a DAGCombine to fire on code such as
  (and (<zextload-equivalent> ...), <redundant mask>)
to turn it into:
  (zextload ...)
as seen in the testcase changes.

There is one regression, in widen_load-2.ll: we're no longer able
to do store-to-load forwarding with illegal extload memory types.
This will be addressed separately.

Differential Revision: http://reviews.llvm.org/D6533


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226676 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 17:07:06 +00:00
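A small C illustration (not the DAGCombine itself) of why the mask becomes redundant once the extending load is expressed directly: a value zero-extended from i8 already has its upper bits clear, so a following AND with 0xFF changes nothing.

  #include <assert.h>
  #include <stdint.h>

  int main(void) {
    uint8_t bytes[] = { 0x00, 0x7F, 0x80, 0xFF };
    for (int i = 0; i < 4; ++i) {
      uint32_t zext = bytes[i];           /* zextload i8 -> i32                      */
      assert((zext & 0xFFu) == zext);     /* (and (zextload ...), 0xFF) is redundant */
    }
    return 0;
  }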
Tim Northover
c49e57ade1 Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226665 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:48:52 +00:00
Tim Northover
004d725549 AArch64: add backend option to reserve x18 (platform register)
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.

From: Andrew Turner <andrew@fubar.geek.nz>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226664 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:43:31 +00:00
Tim Northover
47f47f5d2a DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226663 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:43:28 +00:00
Michael Kuperstein
0b4244ade1 [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses in x32 mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226661 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 14:44:05 +00:00
Simon Pilgrim
650d4f00ae [X86][AVX] Simplified diff between AVX1 and SSE42 fp stack folding tests. NFC.
Changed the AVX1 tests register spill tail call to return a xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226623 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 00:02:13 +00:00
Simon Pilgrim
8608e5bbc7 [X86][SSE] Added SSE/AVX1 integer stack folding tests.
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226622 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:54:17 +00:00
Simon Pilgrim
32f0438ada [X86][SSE] Added SSE fp stack folding tests.
Some folding patterns + tests are missing (marked as TODO) - these will be added in a future patch for review.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226621 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:50:18 +00:00
Simon Pilgrim
bddfe2660e [X86][AVX] Renamed AVX1 fp stack folding tests. NFC.
The SSE42 version of the AVX1 float stack folding tests will be added shortly; this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226620 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 23:45:50 +00:00
Colin LeMahieu
0d9733d596 [Hexagon] Adding intrinsics for doubleword ALU operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226606 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 20:45:05 +00:00
Colin LeMahieu
50e4abc0ee [Hexagon] Removing unnecessary clutter in intrinsic tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226602 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:46:07 +00:00
Daniel Jasper
529ff2f257 Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.

This is not a complete solution but works around the most pressing
issue.

Review: http://reviews.llvm.org/D7070

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226600 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:43:33 +00:00
Ramkumar Ramachandra
b8ae0acfaf [GC] Verify-pass void vararg functions in gc.statepoint
With the appropriate Verifier changes, extracting the result out of a
statepoint wrapping a vararg function crashes. However, a void vararg
function works fine: commit this first step.

Differential Revision: http://reviews.llvm.org/D7071

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226599 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:42:46 +00:00
Tom Stellard
5d96beaab5 R600/SI: Fix simple-loop.ll test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226596 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 19:33:02 +00:00
Tom Stellard
ad7a884efe R600/SI: Add kill flag when copying scratch offset to a register
This allows us to re-use the same register for the scratch offset
when accessing large private arrays.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226585 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 17:49:45 +00:00
Tom Stellard
a978a481bb R600/SI: Don't store scratch buffer frame index in MUBUF offset field
We don't have a good way of legalizing this if the frame index offset
is more than 12 bits, which is the size of MUBUF's offset field, so
now we store the frame index in the vaddr field.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226584 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 17:49:43 +00:00
Kai Nacke
fa4d8baf54 [mips] Add registers and ALL check prefix to octeon test case.
No functional change.

Reviewed by D. Sanders


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226574 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 16:14:02 +00:00
Kai Nacke
57e80129b9 [mips] Add octeon branch instructions bbit0/bbit032/bbit1/bbit132
This commits adds the octeon branch instructions bbit0/bbit032/bbit1/bbit132.
It also includes patterns for instruction selection and test cases.

Reviewed by D. Sanders


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226573 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-20 16:10:51 +00:00
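A C sketch (illustrative only) of the source pattern these branch-on-bit instructions target: branching on a single bit of a value, which the bbit instructions can test and branch on directly instead of materializing a separate and + compare.

  #include <assert.h>
  #include <stdint.h>

  /* Branch if bit 'n' of x is set: a candidate for a branch-on-bit instruction. */
  static int bit_is_set(uint64_t x, unsigned n) {
    if (x & (1ULL << n))
      return 1;   /* taken path  */
    return 0;     /* fallthrough */
  }

  int main(void) {
    assert(bit_is_set(0x100000000ULL, 32) == 1);
    assert(bit_is_set(0x100000000ULL, 0) == 0);
    return 0;
  }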
Simon Pilgrim
2a2f94c5f2 [X86][AVX] Missing AVX1 memory folding float instructions
Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing).

Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense, as it's trying to ensure that a scalar subtraction isn't folded because it 'would zero the top elts of the loaded vector' - the test just appears to be wrong to me.

Differential Revision: http://reviews.llvm.org/D7055



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226513 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 22:40:45 +00:00
Colin LeMahieu
dc8beeba1b [Hexagon] Updating muxir/ri/ii intrinsics. Setting predicate registers as compatible with i32 rather than doing custom type conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226500 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 20:31:18 +00:00
Colin LeMahieu
3bea6a4959 [Hexagon] Converting intrinsics combine imm/imm, simple shifts and extends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226483 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:56:19 +00:00
Colin LeMahieu
596cfabbc4 [Hexagon] Converting remaining ALU32/ALU intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226480 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:33:58 +00:00
Colin LeMahieu
254d992ab8 [Hexagon] Converting ALU32/ALU intrinsics to new patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226478 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 18:22:19 +00:00
Greg Fitzgerald
4b96323e81 [AArch64] Implement GHC calling convention
Original patch by Luke Iannini.  Minor improvements and test added by
Erik de Castro Lopo.

Differential Revision: http://reviews.llvm.org/D6877

From: Erik de Castro Lopo <erikd@mega-nerd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226473 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 17:40:05 +00:00
Colin LeMahieu
438f1e4979 [Hexagon] Converting halfword to double accumulating multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226472 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 17:36:32 +00:00
Rafael Espindola
4b678bff4e Bring r226038 back.
No change in this commit, but clang was changed to also produce trivial comdats when
needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 15:16:06 +00:00
Michael Kuperstein
5a0c8601d3 [MIScheduler] Slightly better handling of constrainLocalCopy when both source and dest are local
This fixes PR21792.

Differential Revision: http://reviews.llvm.org/D6823

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226433 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:30:47 +00:00
Hal Finkel
d1f1656447 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:20:27 +00:00
Matt Arsenault
676db0a373 R600: Remove redundant test
This is already covered in ftrunc.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226412 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 19:30:32 +00:00
Simon Pilgrim
a707cf4652 [X86][SSE] Added scalar min/max folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226406 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 18:06:23 +00:00
Simon Pilgrim
7c62013afe [X86][SSE] Added float extract and xmm extract/insert stack folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226405 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 17:04:32 +00:00
Simon Pilgrim
2793dd7a29 [X86][SSE] Added scalar conversion stack folding tests. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226404 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 16:22:15 +00:00
Simon Pilgrim
65a3a9bea3 AVX1 stack folding tests. NFC.
Begun adding more exhaustive tests - all floating point instructions should now be either tested or have placeholders. We do seem to have a number of missing instructions, I will add a patch for review once the remaining working instructions are added.

I'll then move on to SSE tests and then the integer instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226400 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:56:39 +00:00
Hal Finkel
a01b583dbc [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226399 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:08:47 +00:00
Hal Finkel
962cff0c08 [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't, we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226369 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 03:57:34 +00:00
Mehdi Amini
5eed637b34 Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 01:35:56 +00:00
Matt Arsenault
b2bb846f17 R600: Clean up floor tests
These were using different naming schemes,
not using multiple check prefixes and not using
-LABEL.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226333 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 22:11:00 +00:00
Colin LeMahieu
d08a6cd083 [Hexagon] Converting halfword to doubleword multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226326 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 21:41:57 +00:00
Colin LeMahieu
9a56f06825 [Hexagon] Converting accumulating halfword multiply intrinsics to patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226324 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 21:36:34 +00:00
Colin LeMahieu
67451fe320 [Hexagon] Beginning converting intrinsics to patterns instead of duplicated definitions. Converting halfword multiply intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226318 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 20:38:54 +00:00
Adam Nemet
ad2ac976af [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226296 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 18:50:09 +00:00
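A hedged intrinsics sketch (requires AVX-512F hardware; the names are the standard <immintrin.h> wrappers for the masked aligned load/store, not the new test itself): lanes whose mask bit is clear take the pass-through value on load and are left untouched on store.

  #include <assert.h>
  #include <immintrin.h>

  int main(void) {
    float src[16] __attribute__((aligned(64)));
    float dst[16] __attribute__((aligned(64)));
    for (int i = 0; i < 16; ++i) { src[i] = (float)i; dst[i] = -1.0f; }

    __mmask16 k = 0x00FF;                           /* only the low 8 lanes active */
    __m512 pass = _mm512_set1_ps(42.0f);
    __m512 v = _mm512_mask_load_ps(pass, k, src);   /* masked aligned load  */
    _mm512_mask_store_ps(dst, k, v);                /* masked aligned store */

    for (int i = 0; i < 8; ++i)  assert(dst[i] == (float)i);
    for (int i = 8; i < 16; ++i) assert(dst[i] == -1.0f);   /* untouched lanes */
    return 0;
  }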
Adam Nemet
1310e4f3c7 [AVX512] Remove trailing whitespaces in this test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226295 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 18:50:07 +00:00
Andrea Di Biagio
ac7b9c828f [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.
This patch disables target specific combine on X86ISD::INSERTPS dag nodes
if optlevel is CodeGenOpt::None.

The backend currently implements a target specific combine rule that converts
a vector load used by an INSERTPS dag node into a scalar load plus a
scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of
two instructions (i.e. a vector load plus INSERTPSrr).

However, the existing target combine rule on INSERTPS nodes only works under
the assumption that ISel will always be able to match an INSERTPSrm. This is
not true in general at -O0, since the backend only allows folding a load into
the memory operand of an instruction if the optimization level is not
CodeGenOpt::None.

In the example below:

//
__m128 test(__m128 a, __m128 *b) {
  __m128 c = _mm_insert_ps(a, *b, 1 << 6);
  return c;
}
//

Before this patch, at -O0, the backend would have canonicalized the load to 'b'
into a scalar load plus scalar_to_vector. Later on, ISel would have selected an
INSERTPSrr leaving the insertps mask in an inconsistent state:

  movss 4(%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3].

With this patch, the backend avoids folding the vector load into the operand of
the INSERTPS. The new codegen at -O0 is:

  movaps (%rdi), %xmm1
  insertps  $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3].


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226277 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 14:55:26 +00:00
Simon Pilgrim
3717f7c80c [X86] Refactored stack memory folding tests to explicitly force register spilling
The current 'big vectors' stack folded reload testing pattern is very bulky and makes it difficult to test all instructions as big vectors will tend to use only the ymm instruction implementations.

This patch changes the tests to use a nop call that lists explicit xmm registers as sideeffects, with this we can force a partial register spill of the relevant registers and then check that the reload is correctly folded. The asm generated only adds the forced spill, a nop instruction and a couple of extra labels (a fraction of the current approach).

More exhaustive tests will follow shortly, I've added some extra tests (the xmm versions of some of the existing folding tests) as a starting point.

Differential Revision: http://reviews.llvm.org/D6932



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226264 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 09:32:54 +00:00
Timur Iskhodzhanov
6a7c74de33 Revert r226242 - Revert Revert Don't create new comdats in CodeGen
This breaks AddressSanitizer (ninja check-asan) on Windows

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226251 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 08:38:45 +00:00
Hal Finkel
92cd0ca3b2 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226247 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 04:40:58 +00:00
Rafael Espindola
dfe88a08c7 Revert "Revert Don't create new comdats in CodeGen"
This reverts commit r226173, adding r226038 back.

No change in this commit, but clang was changed to also produce trivial comdats for
constructors, destructors and vtables when needed.

Original message:

Don't create new comdats in CodeGen.

This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226242 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 02:22:55 +00:00
Matt Arsenault
ab2315014e R600/SI: Add patterns for v_cvt_{flr|rpi}_i32_f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226230 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 23:58:35 +00:00
Matt Arsenault
c204f47feb R600/SI: Fix trailing comma with modifiers
Instructions with 1 operand can still use source modifiers,
so make sure we don't print an extra comma afterwards.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226226 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 23:17:03 +00:00
Hal Finkel
94dc061e85 [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226207 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:17:34 +00:00
Colin LeMahieu
02b677594c [Hexagon] Updating indexed load-extend patterns and changing test to new expected output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226206 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:07:52 +00:00
Hal Finkel
39b09ae788 Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers""
Reapply r226071 with fixes. Two fixes:

 1. We need to manually remove the old and create the new 'dead defs'
    associated with physical register definitions when we move the definition of
    the physical register from the copy point to the point of the original vreg def.

    This problem was picked up by the machinstr verifier, and could trigger a
    verification failure on test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll, so I've
    turned on the verifier in the tests.

 2. When moving the def point of the phys reg up, we need to make sure that it
    is neither defined nor read in between the two instructions. We don't, however,
    extend the live ranges of phys reg defs to cover uses, so just checking for
    live-range overlap between the pair interval and the phys reg aliases won't
    pick up reads. As a result, we manually iterate over the range and check for
    reads.

    A test soon to be committed to the PowerPC backend will test this change.

Original commit message:

[RegisterCoalescer] Remove copies to reserved registers

This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226200 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 20:32:09 +00:00
Matt Arsenault
ecbec418bd R600/SI: Improve fpext / fptrunc test coverage
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226197 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 19:39:42 +00:00
Marek Olsak
232d5fa02c R600/SI: Use 64-bit encoding by default for opcodes that are VOP3-only on VI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226190 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 18:43:01 +00:00
Ramkumar Ramachandra
4f158a708b statepoint tests: use statepoint-example gc
Mechanical conversion of statepoint tests to use the example-statepoint
gc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226183 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 18:10:44 +00:00
Colin LeMahieu
044438aff5 [Hexagon] Deleting old float comparison instruction and updating references to new ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226179 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 17:28:14 +00:00
Colin LeMahieu
4ce3b1e4ce [Hexagon] Replacing old fadd/fsub instructions and updating references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226176 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 16:30:07 +00:00
Timur Iskhodzhanov
d048b3be70 Revert Don't create new comdats in CodeGen
It breaks AddressSanitizer on Windows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226173 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 16:14:34 +00:00
Daniel Sanders
cb71ef1b46 [mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6.
Summary: The patterns intended for the SETLE node were actually matching the SETLT node.

Reviewers: atanasyan, sstankovic, vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226171 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 15:41:03 +00:00
Hal Finkel
f908e37144 Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"
Reverting this while I investigate some bad behavior this is causing. As a
possibly-related issue, adding -verify-machineinstrs to one of the test cases
now fails because of this change:

  llc test/CodeGen/X86/2009-02-12-DebugInfoVLA.ll -march=x86-64 -o - -verify-machineinstrs

*** Bad machine code: No instruction at def index ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
Valno #3 is defined at 624r

*** Bad machine code: Live segment doesn't end at a valid instruction ***
- function:    foo
- basic block: BB#0 return (0x10007e21f10) [0B;736B)
- liverange:   [128r,128d:9)[160r,160d:8)[176r,176d:7)[336r,336d:6)[464r,464d:5)[480r,480d:4)[624r,624d:3)[752r,752d:2)[768r,768d:1)[784r,784d:0)  0@784r 1@768r 2@752r 3@624r 4@480r 5@464r 6@336r 7@176r 8@160r 9@128r
- register:    %DS
[624r,624d:3)
LLVM ERROR: Found 2 machine code errors.

where 624r corresponds exactly to the interval combining change:

624B    %RSP<def> = COPY %vreg16; GR64:%vreg16
        Considering merging %vreg16 with %RSP
                RHS = %vreg16 [608r,624r:0)  0@608r
                updated: 608B   %RSP<def> = MOV64rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD8[%saved_stack.1]
        Success: %vreg16 -> %RSP
        Result = %RSP

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226086 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 03:08:59 +00:00
Hal Finkel
47ab8c106f [RegisterCoalescer] Remove copies to reserved registers
This allows the RegisterCoalescer to join "non-flipped" range pairs with a
physical destination register -- which allows the RegisterCoalescer to remove
copies like this:

<vreg> = something (maybe a load, for example)
... (things that don't use PHYSREG)
PHYSREG = COPY <vreg>

(with all of the restrictions normally applied by the RegisterCoalescer: having
compatible register classes, etc. )

Previously, the RegisterCoalescer handled only the opposite case (copying
*from* a physical register). I don't handle the problem fully here, but try to
get the common case where there is only one use of <vreg> (the COPY).

An upcoming commit to the PowerPC backend will make this pattern much more
common on PPC64/ELF systems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226071 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 01:25:28 +00:00
Philip Reames
8f9d11309a getMangledTypeStr: clarify how it mangles types, and add tests
"Write a set of tests that show how name mangling is done for overloaded intrinsics."  These happen to use gc.relocates to exercise the codepath in question, but is not a GC specific test.

Patch by: artagnon@gmail.com
Differential Revision: http://reviews.llvm.org/D6915



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226056 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 23:05:17 +00:00
Duncan P. N. Exon Smith
37ac8d3622 IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226048 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 22:27:36 +00:00
Rafael Espindola
33f5127540 Don't create new comdats in CodeGen.
This patch stops the implicit creation of comdats during codegen.

Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226038 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 20:55:48 +00:00
Chandler Carruth
8af8091ef5 [MBP] Add flags to disable the BadCFGConflict check in MachineBlockPlacement.
Some benchmarks have shown that this could lead to a potential performance
benefit, so this adds some flags to help measure the difference.

A possible explanation: in diamond-shaped CFGs (A followed by either
B or C, both followed by D), putting both B and C in between A and
D leads to the code being less dense than it could be. Either B or C
always has to be skipped, increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.
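
For illustration, a minimal C sketch of such a diamond (the function and
names are made up, not from the patch); with the new flags, one of B or C
may be placed after D instead of between A and D:

    int diamond(int p, int x) {
      int r;               /* A: test p */
      if (p)
        r = x + 1;         /* B */
      else
        r = x - 1;         /* C */
      return r * 2;        /* D: join block */
    }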

In the long run, we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.

Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.

Differential Revision: http://reviews.llvm.org/D6969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226034 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 20:19:29 +00:00
Bill Schmidt
11abe69e98 [PPC64] Add support for the ICBT instruction on POWER8.
Patch by Kit Barton.

Support for the ICBT instruction is currently present, but limited to
embedded processors. This change adds a new FeatureICBT that can be used
to identify whether the ICBT instruction is available on a specific processor.

Two new tests are added:
 * Positive test to ensure the icbt instruction is present when using -mcpu=pwr8
 * Negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7

Both test cases use the Prefetch opcode in LLVM. They are based on the
ppc64-prefetch.ll test case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226033 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 20:17:10 +00:00
Olivier Sallenave
735aa71398 Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225987 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 15:36:28 +00:00
Kai Nacke
92e28620d3 [mips] Refine octeon instructions seq/seqi/sne/snei
This commit refines the pattern for the octeon seq/seqi/sne/snei instructions.
The target register is set to 0 or 1 according to the result of the comparison.
In C, this is something like

rd = (unsigned long)(rs == rt)

This commit adds a zext to bring the result to i64. With this change the
instruction is selected for this type of code. (gcc produces the same code for
the above C code.)
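
As a self-contained version of the snippet above (the function name is
only for illustration):

    unsigned long is_equal(unsigned long rs, unsigned long rt) {
      /* 0/1 result of the comparison, zero-extended to 64 bits */
      return (unsigned long)(rs == rt);
    }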


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225968 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 10:19:09 +00:00
Brad Smith
f449c53c89 Use the integrated assembler by default on SPARC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225957 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 07:53:39 +00:00
JF Bastien
7f0cbb5703 Revert "Insert random noops to increase security against ROP attacks (llvm)"
This reverts commit:
http://reviews.llvm.org/D3392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225948 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 05:24:33 +00:00
NAKAMURA Takumi
2a38522280 Disable a couple of tests, CodeGen/X86/noop-insert.ll and CodeGen/X86/noop-insert-percentage.ll, in r225908, to unbreak tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225940 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 04:21:33 +00:00
Tim Northover
09bec94c16 ARM: add test for crc32 instructions in CodeGen.
Somehow we seem to have ended up without any actual tests of the
CodeGen side. Easy enough to fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225930 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:43:33 +00:00
Hal Finkel
037c21f82c [PowerPC] Fix the noop-insert test
The form of nops used is CPU-specific (some CPUs, such as the POWER7, have
special group-terminating nops). We probably want a different callback for this
kind of nop insertion (something more like MCAsmBackend::writeNopData), or for
PPC to use a different mechanism for scheduling nops, but this will stop the
test from failing for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225928 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:37:21 +00:00
Matt Arsenault
140c2ece1e R600/SI: Remove some redundant load testcases.
This reduces coverage for Evergreen, since the more
complete tests have those run lines disabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225927 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:35:26 +00:00
Matt Arsenault
781f7ee502 R600/SI: Fix bad code with unaligned byte vector loads
Don't do the v4i8 -> v4f32 combine if the load will need to
be expanded due to alignment. This stops adding instructions
to repack into a single register that the v_cvt_ubyteN_f32
instructions read.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225926 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:35:22 +00:00
Matt Arsenault
8b6a26ca85 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225925 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:35:17 +00:00
Hal Finkel
ade705c6e5 Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225909 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:07:51 +00:00
JF Bastien
21befa7761 Insert random noops to increase security against ROP attacks (llvm)
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.

Command line options:
  -noop-insertion // Enable noop insertion.
  -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
  -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.

In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
  -fdiversify

This is the llvm part of the patch.
clang part: D3393

http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225908 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:07:26 +00:00
Reid Kleckner
504fa89c8e CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
  outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.
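
For reference, a minimal MSVC-style C example of the kind of code this
targets; the __except filter expression is what ends up outlined into a
filter function (the specifics here are illustrative, not taken from the
patch):

    #include <windows.h>

    static int handled;

    int main(void) {
      __try {
        volatile int *p = 0;
        *p = 1;                                /* raise an access violation */
      } __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                      ? EXCEPTION_EXECUTE_HANDLER
                      : EXCEPTION_CONTINUE_SEARCH) {
        handled = 1;                           /* catch handler body */
      }
      return handled;
    }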

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6300

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225904 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:05:27 +00:00
Adam Nemet
656da67bc0 [AVX512] Add 16x32 unpck tests as well
Forgot this from r225838.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225850 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 23:27:55 +00:00
Adam Nemet
f38f71d8a0 Fix function names in tests from r225838.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225840 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 22:40:15 +00:00
Adam Nemet
293f71ddd2 [AVX512] Unpack support in new shuffle lowering
This now handles both 32 and 64-bit element sizes.

In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by
Chandler's update_llc_test_checks.py.

Part of <rdar://problem/17688758>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225838 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 22:20:18 +00:00
Matt Arsenault
8603a3d1c5 R600: Implement getRsqrtEstimate
Only do this for f32, since I'm unclear both on what this expects
from the refinement steps in terms of accuracy, and on what the
f64 instruction actually provides.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225827 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 20:53:18 +00:00
Matt Arsenault
9e495c518c R600: Make cttz / ctlz cheap to speculate
Speculating things is generally good. SI+ has instructions for these
for 32-bit values. This is still probably better even with the expansion
for 64-bit values, although it is odd that this callback doesn't have
the size as a parameter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225822 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 19:46:48 +00:00
Ulrich Weigand
81d2500685 Use the integrated assembler as default on SystemZ
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.

A number of test cases deliberately using an invalid instruction in
inline asm now have to use -no-integrated-as.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225820 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 19:45:16 +00:00
Ulrich Weigand
5a4c26e7bc Use the integrated assembler as default on PowerPC
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.

A number of test cases using inline asm had to be adapted, either by
updating the expected output, or by using -no-integrated-as (for such
tests that deliberately use an invalid instruction in inline asm).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225819 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 19:43:45 +00:00
Hal Finkel
ea55eceaed Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investigate buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225811 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 18:25:05 +00:00
Hal Finkel
232f393466 [PowerPC] Add StackMap/PatchPoint support
This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 17:48:12 +00:00
Jozef Kolek
abdc0284ff [mips][microMIPS] Fix issue with 16b instructions in jr instruction delay slot
16-bit instructions are not allowed in the jr delay slot. The same holds for
PseudoIndirectBranch and PseudoReturn.

Differential Revision: http://reviews.llvm.org/D6815


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225798 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 15:59:17 +00:00
Reid Kleckner
d8f69a7201 Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225752 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 01:51:34 +00:00
Reid Kleckner
221a7075cf Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225746 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 00:48:10 +00:00
Matt Arsenault
29ad7506e1 Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for now, since there isn't a way to
test for signed zeros.
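
The source-level shape being combined looks roughly like the following C
(illustrative only), compiled with no-NaNs and unsafe FP math enabled,
e.g. via -ffast-math:

    float select_min(float a, float b) {
      /* fcmp olt + select, now foldable to fminnum */
      return a < b ? a : b;
    }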

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225744 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 00:43:00 +00:00
Reid Kleckner
1ec250a32f musttail: Only set the inreg flag for fastcall and vectorcall
Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and
stdcall thunks, leaving us with no scratch registers for indirect call
targets.

Fixes PR22052.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225729 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 23:28:23 +00:00
Adrian Prantl
f89325d832 Debug info: Factor out the creation of DWARF expressions from AsmPrinter
into a new class DwarfExpression that can be shared between AsmPrinter
and DwarfUnit.

This is the first step towards unifying the two entirely redundant
implementations of dwarf expression emission in DwarfUnit and AsmPrinter.

Almost no functional change — Testcases were updated because asm comments
that used to be on two lines now appear on the same line, which is
actually preferable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225706 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 22:19:22 +00:00
Ahmed Bougacha
cd5bbd8bad [X86] Also create+widen FMIN/FMAX nodes for v2f32.
This happens in the HINT benchmark, where the SLP-vectorizer created
v2f32 fcmp/select code.  The "correct" solution would have been to
teach the vectorizer cost model that v2f32 isn't legal (because really,
it isn't), but if we can vectorize we might as well do so.

We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on.
v3f32 were already widened to v4f32 by the generic unroll-and-build-vector
legalization.

rdar://15763436
Differential Revision: http://reviews.llvm.org/D6557


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225691 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 20:31:30 +00:00
Ahmed Bougacha
5316023a4e [X86] Make SSE min/max testcases more explicit. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225687 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 20:15:47 +00:00
Tom Stellard
d275e025d2 R600/SI: Use RegisterOperands to specify which operands can accept immediates
There are some operands which can take either immediates or registers
and we were previously using different register classes to distinguish
between operands that could take immediates and those that could not.

This patch switches to using RegisterOperands which should simplify the
backend by reducing the number of register classes and also make it
easier to implement the assembler.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225662 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 19:33:18 +00:00
Hal Finkel
b6bb7db62b [PowerPC] Fix calls to non-function objects
Looking at r225438 inspired me to see how the PowerPC backend handled the
situation (calling a bitcasted TLS global), and it turns out we also produced
an error (cannot select ...). What it means to "call" something that is not a
function is implementation and platform specific, but in the name of doing
something (besides crashing), this makes sure we do what GCC does (treat all
such calls as calls through a function pointer -- meaning that the pointer is
assumed, as is the convention on PPC, to point to a function descriptor
structure holding the actual code address along with the function's TOC pointer
and environment pointer). As GCC does, we now do the same for calling regular
(non-TLS) non-function globals too.

I'm not sure whether this is the most useful way to define the behavior, but at
least we won't be alone.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225617 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 04:34:47 +00:00
David Majnemer
85a0cb9bf2 Revert most of r225597
We can't rely on a DataLayout enlightened constant folder.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225599 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 07:29:51 +00:00
David Majnemer
d2f4460ee7 X86: Properly decode shuffle masks when the constant pool type is weird
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation.  This occurs when Constants have
the same bit pattern but have different types.

Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.

This fixes PR22188.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225597 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 05:08:57 +00:00
Saleem Abdulrasool
776673ea09 X86: teach X86TargetLowering about L,M,O constraints
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm that violates immediate constant constraints,
the backend should be able to cope with such invalid inline asm a bit better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225596 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 04:39:24 +00:00
Chandler Carruth
561088eb5d [x86] Remove some windows line endings that snuck into the tests here.
Folks on Windows, remember to set up your subversion to strip these when
submitting...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225593 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-11 01:36:20 +00:00
Sanjoy Das
7f0da20b97 Fix PR22179.
We were incorrectly inferring nsw for certain SCEVs. We can be more
aggressive here (see Richard Smith's comment on
http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just
focuses on correctness.

Differential Revision: http://reviews.llvm.org/D6914



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225591 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 23:41:24 +00:00
Simon Pilgrim
47abf0e3da [X86][SSE] Improved (v)insertps shuffle matching
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.

This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs needs to be referenced.

We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.
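
A rough sketch in C with clang's vector extensions of a shuffle that takes
exactly one lane from the second input and can therefore be matched to
(v)insertps (the names are illustrative):

    typedef float v4f __attribute__((vector_size(16)));

    v4f insert_lane(v4f a, v4f b) {
      /* lanes 0-2 from a, lane 3 taken from b[0] (index 4) */
      return __builtin_shufflevector(a, b, 0, 1, 2, 4);
    }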

Differential Revision: http://reviews.llvm.org/D6879



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225589 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 19:45:33 +00:00
Hal Finkel
9ae5b7a40a [PowerPC] Mark zext of a small scalar load as free
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.

Fixes the initial complaint associated with PR22120.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225584 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 08:21:59 +00:00
Justin Hibbits
1c6936f6d7 Fully fix Bug #22115.
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.

Test Plan: Tests updated

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6906

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225573 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 01:57:21 +00:00
Simon Pilgrim
34630b6ea9 [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and OR the results when all of the elements used from one of the vectors are zeroable.

Differential Revision: http://reviews.llvm.org/D6878



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225551 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 22:03:19 +00:00
Daniel Sanders
8d7b0bdcf0 [mips] Add support for accessing $gp as a named register.
Summary:
Mips Linux uses $gp to hold a pointer to the thread info structure and accesses
it with a named register. This makes that work with LLVM.

The N32 ABI doesn't quite work yet since the frontend generates incorrect IR
for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before
converting to a pointer. Given correct IR (as in the testcase in this patch),
it works correctly.
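
Roughly, the named-register access looks like the following C (a sketch
only; the real kernel declaration differs, and $28 is the numeric name of
$gp):

    register unsigned long current_gp __asm__("$28");

    unsigned long read_gp(void) {
      /* lowered through the named-register read path */
      return current_gp;
    }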

Reviewers: sstankovic, vmedic, atanasyan

Reviewed By: atanasyan

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6893

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225529 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 17:21:30 +00:00
Matthias Braun
c41acffe22 RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225503 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 03:01:31 +00:00
Hal Finkel
4e98296890 [PowerPC] Fold [sz]ext with fp_to_int lowering where possible
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225493 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 01:34:30 +00:00
Hal Finkel
b7c01bf403 [PowerPC] Mark all instructions as non-cheap for MachineLICM
MachineLICM uses a callback named hasLowDefLatency to determine if an
instruction def operand has a 'low' latency. If all relevant operands have a
'low' latency, the instruction is considered too cheap to hoist out of loops
even in low-register-pressure situations. On PowerPC cores, both the embedded
cores and the others, there is no reason to believe that this is a good choice:
all instructions have a cost inside a loop, and hoisting them when not limited
by register pressure is a reasonable default.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225471 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 22:11:49 +00:00
Akira Hatanaka
40cd57eb5c [ARM] Fix a bug in constant island pass that was triggering an assertion.
The assert was being triggered when the distance between a constant pool entry
and its user exceeded the maximum allowed distance after thumb2 branch
shortening. Padding was inserted after a thumb2 branch instruction was shrunk,
which caused the user to be out of range. This is wrong as the padding should
have been inserted by the layout algorithm so that the distance between two
instructions doesn't grow later during thumb2 instruction optimization.

This commit fixes the code in ARMConstantIslands::createNewWater to call
computeBlockSize and set BasicBlock::Unalign when a branch instruction is
inserted to create new water after a basic block. A non-zero Unalign causes
the worst-case padding to be inserted when adjustBBOffsetsAfter is called to
recompute the basic block offsets.

rdar://problem/19130476


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 20:44:50 +00:00
Justin Hibbits
77e85a150c Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register).  This does that.

Test Plan: Tests updated.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6876

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225450 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 15:47:19 +00:00
Kristof Beyls
d1cee9b3bc Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): it covers ARM and Thumb2 targets, but not Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.

Producing an aligned stack pointer is done by zeroing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows
encoding an immediate that can zero out at most the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack alignments by
using the BFC instruction instead, when the BFC instruction is
available.  When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.
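
A small C example of a function that requests such a large alignment
(illustrative only): 1024-byte alignment needs the low 10 bits of sp
cleared, which no single BIC immediate can encode.

    int overaligned(void) {
      int buf[4] __attribute__((aligned(1024)));  /* forces stack realignment */
      buf[0] = 42;
      return buf[0];
    }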

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225446 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 15:09:14 +00:00
Tom Stellard
9a6e4f08fe R600/SI: Remove SIISelLowering::legalizeOperands()
Its functionality has been replaced by calling
SIInstrInfo::legalizeOperands() from
SIISelLowering::AdjustInstrPostInstrSelection() and running the
SIFoldOperands and SIShrinkInstructions passes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225445 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 15:08:17 +00:00
Elena Demikhovsky
6e8b53da17 Masked Load/Store - fixed a bug in type legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225441 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 12:29:19 +00:00
Michael Kuperstein
477eba5f81 Fix a think-o in the test for r225438.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225440 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 12:05:02 +00:00
Michael Kuperstein
0858c28ca8 [X86] Don't try to generate direct calls to TLS globals
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.
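
A sketch of the problematic shape in C (illustrative; actually executing
the call is nonsense, the point is only that lowering must not emit a
direct call to the TLS symbol):

    __thread int tls_target;

    int call_tls(void) {
      int (*fp)(void) = (int (*)(void))&tls_target;  /* call through a TLS global's address */
      return fp();
    }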

Differential Revision: http://reviews.llvm.org/D6862

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225438 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 11:50:58 +00:00
Craig Topper
cb964a5c58 Fix test case I missed in r225432.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225434 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 07:57:27 +00:00
Quentin Colombet
9d60e0ff0a [RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.


** Context **

Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.

This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B

Whereas we could simply have done:
def B
...
def A
use A
...
use B


** Proposed Solution **

A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evicts another variable. Indeed, if no eviction
happens it is pointless to look for recoloring opportunities, as it means the
situation was the same as the initial allocation problem where we had to break
the hint.

Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple, unlike last chance recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.

The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) the target color. If a) is greater than or
equal to b), then it is profitable and the recoloring happens.


** Example **

Consider the following example:
BB1:
  a =
  b =
BB2:
  ...
   = b
   = a
Let us assume b gets split:
BB1:
  a =
  b =
BB2:
  c = b
  ...
  d = c
  = d
  = a
Because of how the allocation works, b, c, and d may be assigned different
colors. Now, suppose a gets evicted to make room for c, with b and d
assigned to something different from a.
We end up with:
BB1:
  a =
  st a, SpillSlot
  b =
BB2:
  c = b
  ...
  d = c
  = d
  e = ld SpillSlot
  = e
It is likely that we can assign the same register to b, c, and d,
getting rid of 2 copies.


** Performances **

Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are also a few regressions that
come from the (in)accuracy of the block frequency estimate.

<rdar://problem/18312047>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225422 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 01:16:39 +00:00
Matthias Braun
a065cf13cd RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases.
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair are
subregister indices in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.

Fixed the bug, improved comments and simplified code accordingly.

Testcase by Tom Stellard!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225415 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 23:58:38 +00:00
Philip Reames
a7f8f932a6 [GC] improve testing around gc.relocate and fix a test
Patch by: Ramkumar Ramachandra <artagnon@gmail.com>

"This patch started out as an exploration of gc.relocate, and an attempt
to write a simple test in call-lowering. I then noticed that the
arguments of gc.relocate were not checked fully, so I went in and fixed
a few things. Finally, the most important outcome of this patch is that
my new error handling code caught a bug in a callsite in
stackmap-format."

Differential Revision: http://reviews.llvm.org/D6824



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225412 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 22:48:01 +00:00
Tom Stellard
a36b682c17 R600/SI: Commute instructions to enable more folding opportunities
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225410 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 22:44:19 +00:00
Tom Stellard
a3ee583339 R600/SI: Only fold immediates that have one use
Folding the same immediate into multiple instructions will increase
program size, which can hurt performance.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225405 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 22:18:27 +00:00
Olivier Sallenave
033a537a84 More FMA folding opportunities.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225380 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 20:54:17 +00:00
Tom Stellard
f7587043ef R600/SI: Add a V_MOV_B64 pseudo instruction
This is used to simplify the SIFoldOperands pass and make it easier to
fold immediates.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225373 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 20:27:25 +00:00
Tom Stellard
546520a727 R600/SI: Teach SIFoldOperands to split 64-bit constants when folding
This allows folding of sequences like:

s[0:1] = s_mov_b64 4
v_add_i32 v0, s0, v0
v_addc_u32 v1, s1, v1

into

v_add_i32 v0, 4, v0
v_add_i32 v1, 0, v1

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225369 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 19:56:17 +00:00
Philip Reames
28fa9e1e9f Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generating correct stackmaps, that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move in to the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a separate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.

Differential Revision: http://reviews.llvm.org/D6808



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225365 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 19:07:50 +00:00
David Majnemer
60812c05e7 X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed.  However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.

It is desirable to have this be configurable so that a function might
opt-out of stack probes.  Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.

Patch by Andrew H!

N.B.  The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225360 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 18:14:07 +00:00
Ahmed Bougacha
d412c608fc [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
    float foo(float x) { return copysign(1.0, x); }
We used to generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    movss  <1.000000e+00>, %xmm1
    andps  <nan>, %xmm1
    orps   %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.

We now generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    orps   <1.000000e+00,0,0,0>, %xmm0

Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225357 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 17:33:03 +00:00
Charlie Turner
7fbbc81d65 [ARM] Add missing Tag_DIV_use tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225348 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 11:37:40 +00:00
Karthik Bhat
f2b3638c3d Revert r225165 and r225169
Even though gcc produces similar instructions, as Owen pointed out the two
patterns aren't equivalent in the case where the original subtraction could
have caused an overflow. Reverting the same.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225341 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 06:34:34 +00:00
Matt Arsenault
6a72b20325 R600/SI: Add combine for isinfinite pattern
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225310 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 23:00:46 +00:00
Matt Arsenault
42d9f7cf0a R600/SI: Pattern match isinf to v_cmp_class instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225307 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 23:00:41 +00:00
Matt Arsenault
a5b2b64292 R600/SI: Add basic DAG combines for fp_class
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225306 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 23:00:39 +00:00
Matt Arsenault
b6520ab625 R600/SI: Add class intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225305 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 23:00:37 +00:00
Rafael Espindola
f907a26bc2 Change the .ll syntax for comdats and add a syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.

Just dropping the $name causes problems for

@foo = global i32 0, comdat
$bar = comdat ...

and

declare void @foo() comdat
$bar = comdat ...

So the syntax is changed to

@g1 = global i32 0, comdat($c1)
@g2 = global i32 0, comdat

and

declare void @foo() comdat($c1)
declare void @foo() comdat

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225302 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 22:55:16 +00:00
Hal Finkel
8e9ba0e588 [PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).

This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.

Also, remove a related entry from the README.txt for which we now generate
reasonable code.
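
The improved case looks roughly like this in C (the name is illustrative):
the integer operand of the conversion is itself a load, so its memory
location can feed the FP load directly.

    double int_load_to_fp(const int *p) {
      return (double)*p;   /* int load feeding an int-to-fp conversion */
    }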

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 22:31:02 +00:00
Tom Stellard
bac89f3dd2 R600/SI: Insert s_waitcnt before s_barrier instructions.
This ensures that all memory operations are complete when all threads
reach the barrier.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225290 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 19:52:07 +00:00
Tom Stellard
1f996fa36b R600/SI: Add a stub GCNTargetMachine
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.

It is recommended that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225277 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 18:00:21 +00:00
Andrea Di Biagio
e46783d5b7 [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.

The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow graph modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.

With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
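
The newly speculatable shape looks roughly like this in C (illustrative):
the cttz intrinsic and the truncation of its 64-bit result both sit in the
conditional block and can be hoisted together when cttz is cheap.

    int count_trailing(unsigned long long x) {
      int r = 64;
      if (x != 0)
        r = __builtin_ctzll(x);   /* cttz.i64 plus a trunc in the same block */
      return r;
    }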

Added new test cases to CodeGen/X86/cttz-ctlz.ll

Differential Revision: http://reviews.llvm.org/D6853


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225274 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 17:41:18 +00:00
Hal Finkel
15914b5c22 [PowerPC] Add a regression test for r225251
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225268 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 16:46:37 +00:00
Colin LeMahieu
a24e012976 [Hexagon] Adding dealloc_return encoding and absolute address stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225267 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 16:15:15 +00:00
David Majnemer
b3065539bd X86: Don't make illegal GOTTPOFF relocations
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocations target a movq or addq instruction.

Prohibit the truncation of such loads to movl or addl.

This fixes PR22083.
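
For reference, initial-exec TLS access of the following sort is what
produces R_X86_64_GOTTPOFF relocations (a hedged sketch, not from the
patch); with the fix the GOT load stays in its 64-bit movq/addq form.

    extern __thread int tls_counter __attribute__((tls_model("initial-exec")));

    int bump(void) {
      return ++tls_counter;
    }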

Differential Revision: http://reviews.llvm.org/D6839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225250 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 07:12:52 +00:00
Hal Finkel
10ae865847 [PowerPC] Improve int_to_fp(fp_to_int(x)) combining
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225248 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 06:01:57 +00:00