llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-12-16 11:30:51 +00:00

Author	SHA1	Message	Date
Eric Christopher	ae6fc14d54	Remove the bare getSubtargetImpl call from the AArch64 port. As part of this add a test that shows we can generate code for functions that specifically enable a subtarget feature. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232884 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-21 04:04:50 +00:00
Eric Christopher	bc473edd7b	Remove the bare getSubtargetImpl call from the PPC port. As part of this add a test that shows we can generate code for functions that differ by subtarget feature. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232882 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-21 03:36:02 +00:00
Eric Christopher	6f125f52d3	Cache the Function dependent subtarget on the MachineFunction. As preparation for removing the getSubtargetImpl() call from TargetMachine go ahead and flip the switch on caching the function dependent subtarget and remove the bare getSubtargetImpl call from the X86 port. As part of this add a few tests that show we can generate code and assemble on X86 based on features/cpu on the Function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232879 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-21 03:13:10 +00:00
Ahmed Bougacha	995f4f8fd1	[CodeGen][IfCvt] Don't re-ifcvt blocks with unanalyzable terminators. If we couldn't analyze its terminator (i.e., it's an indirectbr, or some other weirdness), we can't safely re-if-convert a predicated block, because we can't tell whether the predicated terminator can fallthrough (it does). Currently, we would completely ignore the fallthrough successor. In the added testcase, this means we used to generate: ... @ %entry: cmp r5, #21 ittt ne @ %cc1f: cmpne r7, #42 @ %cc2t: strne.w r5, [r8] movne pc, r10 @ %cc1t: ... Whereas the successor of %cc1f was originally %bb1. With the fix, we get the correct: ... @ %entry: cmp r5, #21 itt eq @ %cc1t: streq.w r5, [r11] moveq pc, r0 @ %cc1f: cmp r7, #42 itt ne @ %cc2t: strne.w r5, [r8] movne pc, r10 @ %bb1: ... rdar://20192768 Differential Revision: http://reviews.llvm.org/D8509 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232872 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-21 01:23:15 +00:00
Ahmed Bougacha	165bd1733b	[AArch64] Prefer UZP for concat_vector of illegal truncs. Follow-up to r232459: prefer a UZP shuffle to the intermediate truncs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232871 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-21 01:08:39 +00:00
Andrew Kaylor	e0e1c1d94d	Fixing a bug with WinEH PHI handling git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232851 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 21:42:54 +00:00
Sanjay Patel	39110ecd35	[X86] Prefer blendps over insertps codegen for one special case With this patch, for this one exact case, we'll generate: blendps %xmm0, %xmm1, $1 instead of: insertps %xmm0, %xmm1, $0 If there's a memory operand available for load folding and we're optimizing for size, we'll still generate the insertps. The detailed performance data motivation for this may be found in D7866; in summary, blendps has 2-3x throughput vs. insertps on widely used chips. Differential Revision: http://reviews.llvm.org/D8332 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232850 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 21:19:52 +00:00
Rafael Espindola	d80979b25d	Don't declare all text sections at the start of the .s The code this patch removes was there to make sure the text sections went before the dwarf sections. That is necessary because MachO uses offsets relative to the start of the file, so adding a section can change relaxations. The dwarf sections were being printed at the start just to produce symbols pointing at the start of those sections. The underlying issue was fixed in r231898. The dwarf sections are now printed when they are about to be used, which is after we printed the text sections. To make sure we don't regress, the patch makes the MachO streamer assert if CodeGen puts anything unexpected after the DWARF sections. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232842 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 20:00:01 +00:00
John Brawn	151a5da534	[ARM] Fix handling of thumb1 out-of-range frame offsets LocalStackSlotPass assumes that isFrameOffsetLegal doesn't change its answer when the base register changes. Unfortunately this isn't true in thumb1, where SP-based loads allow a larger offset than non-SP-based loads, and this causes the base register reuse code to generate instructions that are unencodable, causing an assertion failure. Solve this by adding a BaseReg parameter to isFrameOffsetLegal, which ARMBaseRegisterInfo can then make use of to give the correct answer. Differential Revision: http://reviews.llvm.org/D8419 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232825 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 17:20:07 +00:00
Daniel Jasper	70b146b25e	[MBP] Don't outline short optional branches With the option -outline-optional-branches, LLVM will place optional branches out of line (more details on r231230). With this patch, this is not done for short optional branches. A short optional branch is a branch containing a single block with an instruction count below a certain threshold (defaulting to 3). (Still everything is guarded under -outline-optional-branches). Outlining a short branch can't significantly improve code locality. It can however decrease performance because of the additional jmp and in cases where the optional branch is hot. This fixes a compile time regression I have observed in a benchmark. Review: http://reviews.llvm.org/D8108 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232802 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 10:00:37 +00:00
Tom Stellard	4aee931a46	R600/SI: Add missing CHECK-LABEL lines to a test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232797 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-20 03:12:42 +00:00
Owen Anderson	8154ef7589	Fix a nasty bug in DAGCombine of STORE nodes. This is very related to the bug fixed in r174431. The problem is that SelectionDAG does not include alignment in the uniquing of loads and stores. When an otherwise no-op DAGCombine would increase the alignment of a load or store, the original node would be returned (with the alignment increased), which would cause the node not to be processed by any further DAGCombines. I don't have a direct testcase for this that manifests on an in-tree target, but I did see some noise in the tests for other targets and have updated them for it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232780 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 22:48:57 +00:00
Reid Kleckner	c39212a2fc	WinEH: Make llvm.eh.actions emission match the EH docs This switches the sense of the i32 values and updates the test cases. We can also use CHECK-SAME to clean up some tests, and reduce the visual noise from bitcasts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232774 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 22:31:02 +00:00
Sanjay Patel	11d77223a5	[X86, AVX] use blends instead of insert128 with index 0 Another case of x86-specific shuffle strength reduction: avoid generating insert*128 instructions with index 0 because they are slower than their non-lane-changing blend equivalents. Shuffle lowering already catches most of these cases, but the zero vector case and some other paths such as in the modified test in vector-shuffle-256-v32.ll were getting through. Differential Revision: http://reviews.llvm.org/D8366 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232773 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 22:29:40 +00:00
Krzysztof Parzyszek	8962c01fbf	Unxfail test/CodeGen/Generic/vector.ll now passing on Hexagon git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232758 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 20:22:17 +00:00
Artem Belevich	97f4d01ee1	Add support for __nvvm_reflect changes in libdevice in CUDA-7.0 Summary: CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect and that triggers an assertion in nvvm_reflect optimization pass. This change allows nvvm_reflect pass to deal with both old and new ways to pass an argument to __nvvm_reflect. Test Plan: ninja check-all Reviewers: eliben, echristo Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8399 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232732 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 17:05:35 +00:00
Krzysztof Parzyszek	07121ea974	[Hexagon] Add support for vector instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232728 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 16:33:08 +00:00
Rafael Espindola	2c275b1f80	Note that we don't support COFF on PPC. Should bring back the windows bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232701 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-19 02:40:56 +00:00
Simon Pilgrim	4c38456ead	Fixed failing test due to missing target triple causing different results on different buildbots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232685 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 22:51:45 +00:00
Rafael Espindola	b354ef31cf	Teach getDefaultFormat that we only support ELF on some architectures. This should bring the windows bots back. It is a bit ugly, but it is better than what we had before: The triple would say that the object format was COFF, but llc/llvm-mc would produce an ELF. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232683 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 22:19:16 +00:00
Simon Pilgrim	ab18d0e7cb	[X86][SSE] Avoid scalarization of v2i64 vector shifts (REAPPLIED) Fixed broken tests. Differential Revision: http://reviews.llvm.org/D8416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232682 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 22:18:51 +00:00
Eric Christopher	3932b367d7	Revert "[X86][SSE] Avoid scalarization of v2i64 vector shifts" as it appears to have broken tests/bots. This reverts commit r232660. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232670 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 21:01:00 +00:00
Reid Kleckner	01a1af4fe4	Use WinEHPrepare to outline SEH finally blocks No outlining is necessary for SEH catch blocks. Use the blockaddr of the handler in place of the usual outlined function. Reviewers: majnemer, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D8370 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232664 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 20:26:53 +00:00
Simon Pilgrim	0ee70a1554	[X86][SSE] Avoid scalarization of v2i64 vector shifts Currently v2i64 vector shifts (non-equal shift amounts) are scalarized, costing 4 x extract, 2 x x86-shifts and 2 x insert instructions - and it gets even more awkward on 32-bit targets. This patch separately shifts the vector by both shift amounts and then shuffles the partial results back together, costing 2 x shuffles and 2 x sse-shifts instructions (+ 2 movs on pre-AVX hardware). Note - this patch only improves the SHL / LSHR logical shifts as only these are supported in SSE hardware. Differential Revision: http://reviews.llvm.org/D8416 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232660 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 19:35:31 +00:00
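The shift-then-shuffle idea described in r232660 can be modeled in a few lines of Python. This is only an illustrative sketch, not the X86 lowering code: the function names are hypothetical, vectors are plain two-element lists, and shift amounts are assumed to be below 64.

```python
MASK64 = (1 << 64) - 1


def shl_v2i64_uniform(vec, amount):
    """Shift both 64-bit lanes left by the same amount (models one SSE shift)."""
    return [(lane << amount) & MASK64 for lane in vec]


def shl_v2i64_variable(vec, amounts):
    """Per-lane shift built from two uniform shifts and one shuffle.

    Hypothetical model of the approach in the commit message: shift the
    whole vector by each amount, then keep lane 0 of the first result and
    lane 1 of the second.
    """
    by_first = shl_v2i64_uniform(vec, amounts[0])
    by_second = shl_v2i64_uniform(vec, amounts[1])
    return [by_first[0], by_second[1]]


def shl_v2i64_scalarized(vec, amounts):
    """Reference: extract each lane, shift it, insert it back."""
    return [(vec[i] << amounts[i]) & MASK64 for i in range(2)]


vec = [0x0123456789ABCDEF, 0xFEDCBA9876543210]
amounts = [4, 60]
shuffled = shl_v2i64_variable(vec, amounts)
reference = shl_v2i64_scalarized(vec, amounts)
```

Both paths give the same lanes; the difference the commit cares about is the instruction count, which this model does not measure.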
Matthias Braun	8b41add6ca	TableGen: Fix register class lane masks being too conservative. When calculating the lanemask of a register class we have to include the masks of subregisters supported by any of the class members, not just the ones supported by all class members. This fixes problems when coalescing towards a subclass with additional subregisters available. The attached testcase works fine as is, but does crash if you enable subregister liveness on x86 without this change applied. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232652 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 17:56:09 +00:00
Sanjay Patel	22a94d59d9	Use utils/update_llc_test_checks.py to update all CHECKs The checks here were so vague that we could nuke intrinsics from existence and still pass the test because we'd match the function name. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232647 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 16:38:44 +00:00
Krzysztof Parzyszek	f795de029a	[Hexagon] Intrinsics for circular and bit-reversed loads and stores git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232645 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 16:23:44 +00:00
Sanjay Patel	4795cb202c	fixed to test features, not CPU model The 'vmovntdq' was only passing due to a fluke in SandyBridge codegen that splits 32-byte stores in half, but that meant that the test was not correctly checking for the 32-byte store that we thought we were generating. The lax checking in this file will be addressed in another commit. There are bigger problems here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232644 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 16:07:10 +00:00
Krzysztof Parzyszek	d5cb4a90e5	[Hexagon] Handle ENDLOOP0 in InsertBranch and RemoveBranch git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232643 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 15:56:43 +00:00
Daniel Jasper	bf2e6a6be2	Change test to accept an additional critical edge split. The two hot blocks are right next to each other and I verified that there is no performance regression by compressing/uncompressing some files with a minigzip built with the different options. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232629 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 12:45:45 +00:00
John Brawn	0328ca6cd7	[ARM] Align stack objects passed to memory intrinsics Memcpy, and other memory intrinsics, typically try to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare look for calls to memory intrinsics and, if the object is on the stack, 4-byte align it if it's large enough that we expect that memcpy would want to use LDM/STM to copy it. Differential Revision: http://reviews.llvm.org/D7908 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232627 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 12:01:59 +00:00
John Brawn	bf60cd0751	Add missing newline to end of test file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232626 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 10:45:12 +00:00
Josh Magee	cbaefea0c0	Add testcases for BEXTR. These BEXTR cases are a check for the 64-bit load form and two negative cases where the bitrange is non-contiguous. From a private patch equivalent to r189742/PR17028. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232580 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 01:34:06 +00:00
Krzysztof Parzyszek	dbe964d3a6	Missed testcase for r232577 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232578 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 00:44:46 +00:00
David Majnemer	8f01b96d93	DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x) Targets which provide a rotate make it possible to replace a sequence of (XOR (SHL 1, x), -1) with (ROTL ~1, x). This saves an instruction on architectures like X86 and POWER(64). Differential Revision: http://reviews.llvm.org/D8350 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232572 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-18 00:03:36 +00:00
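The fold in r232572 rests on a bit identity that is easy to check exhaustively. The sketch below assumes a 32-bit width and shift amounts in the range 0..31; the helper names are hypothetical and it says nothing about how DAGCombiner implements the fold.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, amount):
    """Rotate a 32-bit value left by amount (taken modulo 32)."""
    amount %= 32
    value &= MASK32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def xor_shl_form(x):
    """(xor (shl 1, x), -1): all ones except bit x."""
    return ((1 << x) & MASK32) ^ MASK32


def rotl_form(x):
    """(rotl ~1, x): the single zero bit of ~1 rotated to position x."""
    return rotl32(~1 & MASK32, x)


# Compare both forms for every in-range shift amount.
all_match = all(xor_shl_form(x) == rotl_form(x) for x in range(32))
```

Both forms produce a mask with exactly one cleared bit at position x, which is why a target with a rotate instruction can do the work in one operation instead of two.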
David Majnemer	7605cdd6e4	COFF: Let globals with private linkage reside in their own section COFF COMDATs (for selection kinds other than 'select any') require at least one non-section symbol in the symbol table. Satisfy this by morally enhancing the linkage from private to internal. Differential Revision: http://reviews.llvm.org/D8394 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232570 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 23:54:51 +00:00
Pirama Arumuga Nainar	5e15d64948	Fix bug while building FP16 constant vectors for AArch64 Summary: Building FP16 constant vectors caused the FP16 data to be bitcast to i64. This patch creates a BITCAST node with the correct value, and adds a test to verify correct handling. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: mcrosier, jmolloy, ab, srhines, llvm-commits, rengolin, aemerson Differential Revision: http://reviews.llvm.org/D8369 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232562 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 23:10:29 +00:00
David Majnemer	76d3a99d10	Revert "COFF: Let globals with private linkage reside in their own section" This reverts commit r232539. This was committed accidentally. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232543 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 20:41:11 +00:00
David Majnemer	6526150f82	COFF: Let globals with private linkage reside in their own section Summary: COFF COMDATs (for selection kinds other than 'select any') require at least one non-section symbol in the symbol table. Satisfy this by morally enhancing the linkage from private to internal. Reviewers: rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8374 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232539 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 20:39:25 +00:00
Richard Barton	b59aee170f	[ARM] Fix offset calculation in ARMBaseRegisterInfo::needsFrameBaseReg The input offset to needsFrameBaseReg is a negative value below the top of the stack frame, but when converting to a positive offset from the bottom of the stack frame this value was negated, causing the final offset to be too large by twice the input offset's magnitude. Fix that by not negating the offset. Patch by John Brawn Differential Revision: http://reviews.llvm.org/D8316 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232513 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 18:20:47 +00:00
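The arithmetic behind the r232513 fix can be shown with a small worked example. This is a simplified sketch with hypothetical names and made-up numbers; the real needsFrameBaseReg computation involves further terms that are not modeled here.

```python
def offset_from_frame_bottom(frame_size, offset_below_top):
    """Convert a negative offset from the frame top to a positive offset from the bottom."""
    return frame_size + offset_below_top


def offset_from_frame_bottom_buggy(frame_size, offset_below_top):
    """Models the old behaviour: the input offset is negated before it is added."""
    return frame_size - offset_below_top


frame_size = 64          # hypothetical frame size in bytes
offset_below_top = -16   # object sits 16 bytes below the top of the frame

correct = offset_from_frame_bottom(frame_size, offset_below_top)
buggy = offset_from_frame_bottom_buggy(frame_size, offset_below_top)
error = buggy - correct
```

With these numbers the negated form overshoots by 32 bytes, which is twice the magnitude of the input offset, matching the description in the commit message.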
Samuel Antao	7684e7d987	Fix R0 use in PowerPC VSX store for FastIsel. The VSX stores are sometimes generated with an undefined index register, causing %noreg to be used and R0 to be emitted later on. The semantics of the VSX store (e.g. stdsdx) require R0 to be used as base if we want zero to be used in the computation of the effective address instead of the content of R0. This patch checks if no index register was generated and forces R0 to be used as base address. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232486 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 15:00:57 +00:00
Rafael Espindola	cebed4aaf1	Use createTempSymbol to avoid collisions instead of an ad hoc method. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232483 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 14:50:32 +00:00
Rafael Espindola	99739705ac	Call EmitFunctionHeader just before EmitFunctionBody. This avoids switching to .AMDGPU.config and back and hardcoding the section it switches back to. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232479 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 14:34:42 +00:00
Rafael Espindola	a480f88b3c	Move the EH symbol to the asm printer and use it for the SJLJ case too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232475 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 13:57:48 +00:00
Rafael Espindola	4d3df54336	Replace a use of GetTempSymbol with createTempSymbol. This is cleaner and avoids a crash in a corner case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232471 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 12:54:04 +00:00
Renato Golin	ce1f16421f	[ARM] Add support for ARMV6K subtarget (LLVM) ARMv6K is another layer between ARMV6 and ARMV6T2. This is the LLVM side of the changes. ARMV6 family LLVM implementation, top to bottom: ARMV6; then ARMV6M (thumb) alongside ARMV6K (arm,thumb); then ARMV6T2 (arm,thumb,thumb2); then ARMV7 (arm,thumb,thumb2). From ARMV6K and ARMV6M, processors have support for hint instructions (SEV/WFE/WFI/NOP/YIELD). They can be either real or default to NOP. The two processors also use different encoding for them. Patch by Vinicius Tinti. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232468 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 11:55:28 +00:00
Ahmed Bougacha	df08543f48	[AArch64] Use intermediate step for concat_vectors of illegal truncs. Optimize concat_vectors of truncated vectors, where the intermediate type is illegal, to avoid said illegality, e.g., (v4i16 (concat_vectors (v2i16 (truncate (v2i64))), (v2i16 (truncate (v2i64))))) -> (v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))), (v2i32 (truncate (v2i64))))))) This isn't really target-specific, and, as such, would best go in the DAGCombiner. However, ISD::TRUNCATE legality isn't keyed on both input and result type, so we might generate worse code when we don't know better. On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8. rdar://20022387 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232459 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-17 03:23:09 +00:00
David Majnemer	759acf348d	CodeGen: @llvm.eh.typeid.for replaced @llvm.eh.typeid.for.i32 We removed @llvm.eh.typeid.for.i32 and replaced it with @llvm.eh.typeid.for quite some time ago. Fix up some test cases which never got updated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232421 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 21:36:38 +00:00
Duncan P. N. Exon Smith	763e18696f	DebugInfo: Fix testcases that fail -verify-debug-info=true As part of PR22777, fix testcases that fail the debug info verifier. The changes fall into the following categories: - Empty `filename:` fields in `MDFile`s. Compile units and some types require non-empty filenames. A number of testcases have empty filenames, probably due to hand-reduction of testcases. - Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in the debug info schema to `!{}`. They cause problems for `!MDSubroutineType`'s `types:` array, since it requires all operands to be valid types. (Note that `!{null}` is the correct type array for functions that take no arguments and return `void`.) - Significantly bitrotted testcases. Nodes got left behind a few upgrades ago because of missing or invalid tags. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232415 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 21:10:12 +00:00
Sanjay Patel	8233e8c233	fixed to test feature, not CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232398 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 18:24:28 +00:00
Sanjay Patel	89095a7882	add CHECK-LABELs for more reliable testing git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232391 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 17:59:07 +00:00
Sanjay Patel	8f74fd0883	fixed to test feature, not CPU; removed unnecessary declaration git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232387 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 17:01:34 +00:00
Tom Stellard	6ebc34281f	R600/SI: don't try min3/max3/med3 with f64 There are no opcodes for this. This also adds a test case. v2: make test more robust Patch by: Grigori Goronzy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232386 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 15:53:55 +00:00
Petar Jovanovic	b3b90bd679	[MIPS] Fix justify error for small structures Fix justify error for small structures bigger than 32 bits in fixed arguments for MIPS64 big endian. There was a problem when small structures are passed as fixed arguments. The structures that are bigger than 32 bits but smaller than 64 bits were not left justified properly on MIPS64 big endian. This is fixed by shifting the value to make it left justified when appropriate. Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D8174 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232382 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 15:01:09 +00:00
Rafael Espindola	8d8c155a61	Use the i8 immediate cmp instructions when possible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232378 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-16 14:25:08 +00:00
Simon Pilgrim	4f3864d05f	[SSE] Added tests for float4-float3 conversions (PR11580) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232324 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-15 16:19:15 +00:00
Simon Pilgrim	54db4092c1	Simplified some stack folding tests. Replaced explicit pmovzx* intrinsic tests with general shuffles git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232286 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-14 23:16:43 +00:00
Daniel Jasper	439cc2c5de	[MachineLICM] First steps of sinking GEPs near calls. Specifically, if there are copy-like instructions in the loop header they are moved into the loop close to their uses. This reduces the live intervals of the values and can avoid register spills. This is working towards a fix for http://llvm.org/PR22230. Review: http://reviews.llvm.org/D7259 Next steps: - Find a better cost model (which non-copy instructions should be sunk?) - Make this dependent on register pressure git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232262 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-14 10:58:38 +00:00
Ahmed Bougacha	4a2d95826e	Add missing colons to a bunch of CHECK lines in tests. NFC. Some wouldn't pass; fixed most, the rest will be fixed separately. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232239 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-14 01:43:57 +00:00
Rafael Espindola	89c84b0c83	Use add32ri8 and friends on fast isel. This fixes pr22854. The core issue on the bug is that there are multiple instructions that print the same in assembly. In fact, there doesn't seem to be any syntax for specifying that a constant that fits in 8 bits should use a 32 bit immediate. The attached patch changes fast isel to consider i16immSExt8, i32immSExt8, and i64immSExt8. They were disabled because fastisel didn’t know to call the predicate back in the day. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232223 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-13 22:18:18 +00:00
David Blaikie	5a70dd1d82	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232184 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-13 18:20:45 +00:00
Andrea Di Biagio	d288259ccd	[X86][AVX] Fix wrong lowering of v4x64 shuffles into concat_vector plus extract_subvector nodes. This patch fixes a bug in the shuffle lowering logic implemented by function 'lowerV2X128VectorShuffle'. There are a few cases where function 'lowerV2X128VectorShuffle' wrongly expands a shuffle of two v4X64 vectors into a CONCAT_VECTORS of two EXTRACT_SUBVECTOR nodes. The problematic expansion only occurs when the shuffle mask M has an 'undef' element at position 2, and M is equivalent to mask <0,1,4,5>. In that case, the algorithm propagates the wrong vector to one of the two new EXTRACT_SUBVECTOR nodes. Example: ;; define <4 x double> @test(<4 x double> %A, <4 x double> %B) { entry: %0 = shufflevector <4 x double> %A, <4 x double> %B, <4 x i32><i32 undef, i32 1, i32 undef, i32 5> ret <4 x double> %0 } ;; Before this patch, llc (-mattr=+avx) generated: vinsertf128 $1, %xmm0, %ymm0, %ymm0 With this patch, llc correctly generates: vinsertf128 $1, %xmm1, %ymm0, %ymm0 Added test lower-vec-shuffle-bug.ll Differential Revision: http://reviews.llvm.org/D8259 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232179 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-13 17:29:49 +00:00
Matt Arsenault	462f98dd60	R600/SI: Add test for min / max with immediate Make sure this isn't getting confused by canonicalizations of comparisons with a constant. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232177 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-13 16:43:48 +00:00
Hao Liu	fcc897cc45	[MachineCopyPropagation] Fix a bug causing incorrect removal for the instruction sequences as follows %Q5_Q6<def> = COPY %Q2_Q3 %D5<def> = %D3<def> = %D3<def> = COPY %D6 // Incorrectly removed in MachineCopyPropagation Using of %D3 results in incorrect result ... Reviewed in http://reviews.llvm.org/D8242 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232142 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-13 05:15:23 +00:00
Sanjay Patel	cae9695fbb	[X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles This should complete the job started in r231794 and continued in r232045: We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. AVX2 introduced proper integer variants of the hacked integer insert/extract C intrinsics that were created for this same functionality with AVX1. This should complete the removal of insert/extract128 intrinsics. The Clang precursor patch for this change was checked in at r232109. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232120 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 23:16:18 +00:00
Simon Pilgrim	7385cafb7a	Removed useless palignr test - we don't actually provide a llvm.x86.ssse3.palign.r.128 intrinsic Differential Revision: http://reviews.llvm.org/D8302 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232108 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 21:42:03 +00:00
Tom Stellard	3d712a6373	R600/SI: Remove _e32 and _e64 suffixes from mnemonics Instead print them as part of the $dst operand. The AsmMatcher requires the 32-bit and 64-bit encodings have the same mnemonic in order to parse them correctly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232105 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 21:34:22 +00:00
Andrew Kaylor	d434c0d548	Adding WinEHPrepare tests (currently XFAILs) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232104 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 21:32:59 +00:00
Krzysztof Parzyszek	49fa37992d	Unxfail passing test on Hexagon test/CodeGen/Generic/2008-02-20-MatchingMem.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232098 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 20:38:10 +00:00
Quentin Colombet	be45e0e669	[X86] Fix a regression introduced by r223641. The permps and permd instructions have their operands swapped compared to the intrinsic definition. Therefore, they do not fall into the INTR_TYPE_2OP category. I did not create a new category for those two, as they are the only ones AFAICT in that case. <rdar://problem/20108262> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232085 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 19:34:12 +00:00
Krzysztof Parzyszek	7b110fe366	Remove unused complex patterns for addressing modes on Hexagon. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232057 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 16:44:50 +00:00
Andrea Di Biagio	be9322ae7c	[X86] Fix wrong target specific combine on SETCC nodes. Part of the folding logic implemented by function 'PerformISDSETCCCombine' only worked under the assumption that the condition code in input could have been either SETNE or SETEQ. Unfortunately that assumption was incorrect, and in some cases the algorithm ended up incorrectly folding SETCC nodes. The incorrect folding only affected SETCC dag nodes where: - one of the operands was a build_vector of all zeroes; - the other operand was a SIGN_EXTEND from a vector of MVT:i1 elements; - the condition code was neither SETNE nor SETEQ. Example: (setcc (v4i32 (sign_extend v4i1:%A)), (v4i32 VectorOfAllZeroes), setge) Before this patch, the entire dag node sequence from the example was incorrectly folded to node %A. With this patch, the dag node sequence is folded to a (xor %A, (v4i1 VectorOfAllOnes)). Added test setcc-combine.ll. Thanks to Greg Bedwell for spotting this issue. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232046 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 15:16:58 +00:00
Sanjay Patel	b4c1547749	[X86, AVX] replace vextractf128 intrinsics with generic shuffles Now that we've replaced the vinsertf128 intrinsics, do the same for their extract twins. This is very much like D8086 (checked in at r231794): We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. This is also the LLVM sibling to the cfe D8275 patch. Differential Revision: http://reviews.llvm.org/D8276 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232045 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 15:15:19 +00:00
Simon Pilgrim	df0acf35f3	[X86][AVX2] Added missing palignr stack folding test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232033 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 13:12:33 +00:00
Jingyue Wu	3ea0adcdd5	[NVPTXAsmPrinter] do not print .align on function headers Summary: PTX does not allow .align directives on function headers. Fixes PR21551. Test Plan: test/Codegen/NVPTX/function-align.ll Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: llvm-commits, eliben, jpienaar, jholewinski Differential Revision: http://reviews.llvm.org/D8274 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232004 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 01:50:30 +00:00
Reid Kleckner	b53bb04b2f	Remove some CHECK-NOT lines in favor of CHECK-NEXT NFC, this is just shorter. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@232000 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 01:38:48 +00:00
Reid Kleckner	7dedaabcae	Stop calling DwarfEHPrepare from WinEHPrepare Instead, run both EH preparation passes, and have them both ignore functions with unrecognized EH personalities. Pass delegation involved some hacky code for creating an AnalysisResolver that we don't need now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231995 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-12 00:36:20 +00:00
Reid Kleckner	70992ac969	Handle big index in getelementptr instruction CodeGen incorrectly ignores (assert from APInt) constant index bigger than 2^64 in getelementptr instruction. This is a test and fix for that. Patch by Paweł Bylica! Reviewed By: rnk Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits Differential Revision: http://reviews.llvm.org/D8219 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231984 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 23:36:10 +00:00
Andrew Kaylor	1134ac4a0f	Extended support for native Windows C++ EH outlining Differential Review: http://reviews.llvm.org/D7886 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231981 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 23:22:06 +00:00
Jozef Kolek	a2b4e9a30e	[mips][microMIPS] Make usage of NOT16 by code generator Differential Revision: http://reviews.llvm.org/D7748 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231963 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 20:28:31 +00:00
Sanjay Patel	02402b3cc1	add CHECK-LABELs for better reliability git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231962 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 20:12:07 +00:00
Rafael Espindola	0c78583bf6	Put jump tables in unique sections on COFF. If a function is going in a unique section (because of -ffunction-sections for example), putting a jump table in .rodata will keep .rodata alive and that will keep alive any other function that also has a jump table. Instead, put the jump table in a unique section that is associated with the function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231961 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 19:58:37 +00:00
Tim Northover	52f83a9ab3	ARM: simplify and extend byval handling The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231959 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 18:54:22 +00:00
Derek Schuff	87e6561f34	Make NaCl's use of .init_array for static constructors match Linux Summary: The generic ELF TargetObjectFile defaults to .ctors, but Linux's defaults to .init_array by calling InitializeELF with the value of UseInitArray from TargetMachine. Make NaCl's behavior match. Reviewers: jvoung Differential Revision: http://reviews.llvm.org/D8240 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231934 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-11 16:16:09 +00:00
Quentin Colombet	7775242da3	[CodeGenPrepare] Refine the cost model provided by the promotion helper. - Use TargetLowering to check for the actual cost of each extension. - Provide a factorized method to check for the cost of an extension: TargetLowering::isExtFree. - Provide a virtual method TargetLowering::isExtFreeImpl for targets to be able to tune the cost of non-free extensions. This refactoring offers a better granularity to model what really happens on different targets. No performance changes and very few code differences. Part of <rdar://problem/19267165> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231855 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 21:48:15 +00:00
Nemanja Ivanovic	dc12298109	Add support for part-word atomics for PPC http://reviews.llvm.org/D8090#inline-67337 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231843 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 20:51:07 +00:00
Ahmed Bougacha	4a3cd42601	[AArch64] Avoid going through GPRs for across-vector instructions. This adds new node types for each intrinsic. For instance, for addv, we have AArch64ISD::UADDV, such that: (v4i32 (uaddv ...)) is the same as (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...)))) that is, (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), (i32 (int_aarch64_neon_uaddv ...)), ssub) In a combine, we transform all such across-vector-lanes intrinsics to: (i32 (extract_vector_elt (uaddv ...), 0)) This has one big advantage: by making the extract_element explicit, we enable the existing patterns for lane-aware instructions to fire. This lets us avoid needlessly going through the GPRs. Consider: uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) { return vmulq_n_u32(a, vaddvq_u32(b)); } We now generate: addv.4s s1, v1 mul.4s v0, v0, v1[0] instead of the previous: addv.4s s1, v1 fmov w8, s1 dup.4s v1, w8 mul.4s v0, v1, v0 rdar://20044838 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231840 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 20:45:38 +00:00
Kit Barton	1f9ea3a230	Change the generation of the vmuluwm instruction to be based on the MUL opcode. Phabricator review: http://reviews.llvm.org/D8185 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231827 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 19:49:38 +00:00
Igor Laevsky	68beb2a9ec	Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints. Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result. (Resubmitting this change after not being able to reproduce buildbot failure) Differential Revision: http://reviews.llvm.org/D7760 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231800 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 16:26:48 +00:00
Sanjay Patel	137e1f3f28	[X86, AVX] replace vinsertf128 intrinsics with generic shuffles We want to replace as much custom x86 shuffling via intrinsics as possible because pushing the code down the generic shuffle optimization path allows for better codegen and less complexity in LLVM. This is the sibling patch for the Clang half of this change: http://reviews.llvm.org/D8088 Differential Revision: http://reviews.llvm.org/D8086 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231794 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-10 16:08:36 +00:00
Colin LeMahieu	376b961126	[Hexagon] Removing unused patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231723 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 23:08:46 +00:00
Ahmed Bougacha	55a060641f	[CodeGen] Replace the reused stores' chain for extractelt expansion. This fixes a subtle issue that was introduced in r205153. When reusing a store for the extractelement expansion (to load directly from it, instead of going through the stack), later stores to the same location might have overwritten the data we were expecting to extract from. To fix that, we need to explicitly replace the chain going out of the reused store, so that later stores also have an explicit dependency on the generated element-extracting loads, and can't clobber them. rdar://20066785 Differential Revision: http://reviews.llvm.org/D8180 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231721 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 22:51:05 +00:00
Ahmed Bougacha	fad749559c	[X86] Add nounwind to vector-idiv.ll testcases. NFC. In preparation for a patch where cfi directives get in the way. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231720 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 22:46:02 +00:00
Reid Kleckner	4c27f8d49e	Reland r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation Fix the double-deletion of AnalysisResolver when delegating through to Dwarf EH preparation by creating one from scratch. Hopefully the new pass manager simplifies this. This reverts commit r229952. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231719 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 22:45:16 +00:00
Colin LeMahieu	ffc2de43d9	[Hexagon] Reapply r231699. Remove assumption that second operand is an immediate when checking if A2_tfrsi is combinable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231710 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 21:48:13 +00:00
Colin LeMahieu	c2d30aebf3	[Hexagon] Reverting r231699 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231703 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 21:19:02 +00:00
Colin LeMahieu	8c2919a34e	[Hexagon] Updating constant set to simpler versions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231699 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 20:33:12 +00:00
Colin LeMahieu	a0ce232a65	[Hexagon] Eliminating immediate condition set. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231693 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 19:57:18 +00:00
Rafael Espindola	be886690bd	Print jump tables before exception tables. In the case where just tables are part of the function section, this produces more readable assembly by avoiding switching to the eh section and back to .text. This would also break with non unique section names, as trying to switch to a unique section actually creates a new one. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231677 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 18:29:12 +00:00
Reed Kotler	18afdb3210	Add logical ops to Mips fast-isel Summary: Code is mostly copied from AArch64 port and modified where needed for Mips. This handles the "non" legal cases of logical ops. Legal cases are handled by tablegen patterns. Test Plan: Make check test logopm.ll All of test-suite passes at O0/O2 and mips32 r1/r2 with this new change. Reviewers: dsanders Reviewed By: dsanders Subscribers: echristo, llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D6599 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231665 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 16:28:10 +00:00
Marek Olsak	c4ca7b59db	R600/SI: Limit SGPRs to 80 on Tonga and Iceland This is a candidate for stable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231659 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 15:48:09 +00:00
Andrea Di Biagio	d1232fa1c0	Fix line ending in test CodeGen/X86/pr22774.ll. NFC. Also, replaced line with 'target triple' with flag -mtriple on the RUN line. Removed the data layout string as it is not needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231654 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-09 15:02:01 +00:00
Andrea Di Biagio	692f7382b5	[X86][AVX] Fix wrong lowering of VPERM2X128 nodes There were cases where the backend computed a wrong permute mask for a VPERM2X128 node. Example: \code define <8 x float> @foo(<8 x float> %a, <8 x float> %b) { %shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 undef, i32 undef, i32 6, i32 7, i32 undef, i32 undef, i32 6, i32 7> ret <8 x float> %shuffle } \code end Before this patch, llc (with -mattr=+avx) emitted the following vperm2f128: vperm2f128 $0, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[0,1,0,1] With this patch, llc emits a vperm2f128 with a correct permute mask: vperm2f128 $17, %ymm0, %ymm0, %ymm0 # ymm0 = ymm0[2,3,2,3] Differential Revision: http://reviews.llvm.org/D8119 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231601 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-08 16:28:47 +00:00
Andrea Di Biagio	15d2c3fb00	[DAGCombiner] Fix wrong folding of AND dag nodes. This patch fixes the logic in the DAGCombiner that folds an AND node according to rule: (and (X (load V)), C) -> (X (load V)) An AND between a vector load 'X' and a constant build_vector 'C' can be folded into the load itself only if we can prove that the AND operation is redundant. The algorithm implemented by 'visitAND' firstly computes the splat value 'S' from C, and then checks if S has the lower 'B' bits set (where B is the size in bits of the vector element type). The algorithm takes into account also the 'undef' bits in the splat mask. Unfortunately, the algorithm only worked under the assumption that the size of S is a multiple of the vector element type. With this patch, we conservatively avoid folding the AND if the splat bits are not compatible with the vector element type. Added X86 test and-load-fold.ll Differential Revision: http://reviews.llvm.org/D8085 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231563 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-07 12:24:55 +00:00
Simon Pilgrim	62ba058dea	[DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE. This prevents many cases of spilling scalar data between the gpr + simd registers. At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match). Differential Revision: http://reviews.llvm.org/D8132 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231554 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-07 05:52:42 +00:00
Eric Christopher	80d55e6bcb	Remove use of misched-bench from this test and replace it with non-temporary enabling options. This is part of removing misched-bench as an option. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231546 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-07 01:39:06 +00:00
Eric Christopher	5dc2251b4e	Recommit r231324 with a fix to the ARM execution domain code to disable lane switching if we don't actually have the instruction set we want to switch to. Models the earlier check above the conditional for the pass. The testcase is one that triggered with the assert that's added as part of the fix, use it to avoid adding a new testcase as it highlights the same problem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231539 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-07 00:12:22 +00:00
Quentin Colombet	05a3f9120a	[AArch64][LoadStoreOptimizer] Generate LDP + SXTW instead of LD[U]R + LD[U]RSW. Teach the load store optimizer how to sign extend a result of a load pair when it helps create more pairs. The rationale is that loads are more expensive than sign extensions, so if we gather some in one instruction this is better! <rdar://problem/20072968> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231527 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 22:42:10 +00:00
Sanjay Patel	5bac8f9b95	fixed to test features, not CPUs git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231524 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:50:42 +00:00
Sanjay Patel	f528a40240	fixed to test features, not CPUs git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231523 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:50:27 +00:00
Sanjay Patel	5ff9d3e6ed	loosen checking for buildbots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231522 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:30:18 +00:00
Sanjay Patel	aedb16fc6f	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231521 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:24:56 +00:00
Sanjay Patel	b00a131bda	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231520 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:19:32 +00:00
Sanjay Patel	50652e746b	fixed test to use FileCheck git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231519 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:16:15 +00:00
Sanjay Patel	6deb05e63a	fixed to use CHECK-LABELs git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231517 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 21:05:02 +00:00
Sanjay Patel	d532d10275	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231516 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:58:15 +00:00
Sanjay Patel	f2b34a568b	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231515 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:57:40 +00:00
Sanjay Patel	9e03e85cac	fixed to test feature, not CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231513 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:51:25 +00:00
Sanjay Patel	17392758a1	fixed to test features, not CPUs git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231512 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:46:16 +00:00
Sanjay Patel	91b79125ec	fixed test to use SSE2 attribute git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231510 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:38:55 +00:00
Sanjay Patel	e7abe0cdbc	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231509 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 20:34:20 +00:00
Matthias Braun	47941aa098	DAGCombiner: Canonicalize select(and/or,x,y) depending on target. This is based on the following equivalences: select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y) select(C0 \| C1, X, Y) <=> select(C0, X, select(C1, X, Y)) Many targets cannot perform and/or on the CPU flags and therefore the right side should be chosen to avoid materializing the i1 flags in an integer register. If the target can perform this operation efficiently we normalize to the left form. Differential Revision: http://reviews.llvm.org/D7622 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231507 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 19:49:10 +00:00
Michael Zolotukhin	6023ad2d37	LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant. Though such shifts are usually optimized away by combiner, we still can encounter them after a vector shift is legalized. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231443 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-06 01:13:01 +00:00
Sanjay Patel	5f79fd2f02	[AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483) This patch reduces code size for all AVX targets and increases speed for some chips. SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the packed float/double flavors. AVX subsequently made the instruction useful by adding a 4-register operand form. So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars since all FP is in xmm registers anyway. This gives us an approximately 50% speed up for a blendv microbenchmark sequence on SandyBridge and Haswell: blendv : 29.73 cycles/iter logic : 43.15 cycles/iter No new test cases with this patch because: 1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel 2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX 3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants. http://llvm.org/bugs/show_bug.cgi?id=22483 Differential Revision: http://reviews.llvm.org/D8063 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231408 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 21:46:54 +00:00
Ahmed Bougacha	77f46f4f9f	[AArch64] Teach AsmPrinter about GlobalAddress operands. Fixes PR22761, rdar://20024866. Differential Revision: http://reviews.llvm.org/D8042 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231400 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 20:04:21 +00:00
Rafael Espindola	2f76abe7d7	Use the correct func begin symbol in all places in ppc. I missed an occurrence of the old symbol in my previous patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231398 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 19:47:50 +00:00
Ahmed Bougacha	67297cd956	[ARM] Enable vector extload combine for legal types. This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction. Enabling it for larger types isn't as good an idea on ARM as it is on X86, because: - we pretend that extloads are legal, but end up generating vld+vmov - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov. For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND. Related to rdar://19723053 Differential Revision: http://reviews.llvm.org/D7423 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231396 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 19:37:53 +00:00
Rafael Espindola	2e2dbc35da	Use the generic Lfunc_begin label on ppc. This removes yet another custom label to mark the start of a function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231390 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 18:55:50 +00:00
David Majnemer	42fcf79f36	X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes We know that the absolute symbol will be less than 2GB and thus will always fit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231389 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 18:50:12 +00:00
Reid Kleckner	9f7c861416	Replace llvm.frameallocate with llvm.frameescape Turns out it's pretty straightforward and simplifies the implementation. Reviewers: andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D8051 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231386 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 18:26:34 +00:00
Simon Pilgrim	a744a15e97	[DagCombiner] Allow shuffles to merge through bitcasts Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening). This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead. Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow. Differential Revision: http://reviews.llvm.org/D7939 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231380 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 17:14:04 +00:00
Kit Barton	b98636a0f8	While reviewing the changes to Clang to add builtin support for the vsld, vsrd, and vsrad instructions, it was pointed out that the builtins are generating the LLVM opcodes (shl, lshr, and ashr) not calls to the intrinsics. This patch changes the implementation of the vsld, vsrd, and vsrad instructions from intrinsics to VXForm_1 instructions and makes them legal with P8 Altivec. It also removes the definition of the int_ppc_altivec_vsld, int_ppc_altivec_vsrd, and int_ppc_altivec_vsrad intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231378 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 16:24:38 +00:00
Igor Laevsky	684d323b9b	Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231374 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 15:41:14 +00:00
Elena Demikhovsky	e670dc7848	AVX-512, SKX: Enabled masked_load/store operations for this target. Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors, it is needed to pass all masked_memop.ll tests for SKX. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231371 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 15:11:35 +00:00
Igor Laevsky	f8b3003ab8	Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints. Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231366 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 14:11:21 +00:00
Craig Topper	62eaac6087	[X86] Use vmovss to handle inserting an element into index 0 of a v8f32 vector of zeros. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231354 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 06:38:42 +00:00
Chandler Carruth	4197c13062	[MBP] Revert r231238 which attempted to fix a nasty bug where MBP is just arbitrarily interleaving unrelated control flows once they get moved "out-of-line" (both outside of natural CFG ordering and with diamonds that cannot be fully laid out by chaining fallthrough edges). This easy solution doesn't work in practice, and it isn't just a small bug. It looks like a very different strategy will be required. I'm working on that now, and it'll again go behind some flag so that everyone can experiment and make sure it is working well for them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231332 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-05 01:07:03 +00:00
Matthias Braun	29aeaf5408	Improve test robustness Improve test robustness in preparation of coming commits: - Avoid undefs which may get propagated too much. - Remove several pointless add 0, instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231307 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 22:31:18 +00:00
Nemanja Ivanovic	b69d556c37	Add LLVM support for PPC cryptography builtins Review: http://reviews.llvm.org/D7955 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231285 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 20:44:33 +00:00
Mehdi Amini	c94da20917	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never return nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231270 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 18:43:29 +00:00
Adrian Prantl	2e74ddea3a	Update the out-of-date dwarf expressions in these testcases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231261 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 17:39:59 +00:00
Marek Olsak	506d4b2cb4	R600/SI: Add an intrinsic for S_FLBIT_I32 / V_FFBH_I32 Required by OpenGL (ARB_gpu_shader5). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231259 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 17:33:45 +00:00
Jozef Kolek	2e37a6f306	[mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator Differential Revision: http://reviews.llvm.org/D7609 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231249 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 15:47:42 +00:00
Andrea Di Biagio	da5e5688e9	[X86][FastISel] Simplify the logic in method X86SelectSIToFP. The target-independent selection algorithm in FastISel already knows how to select a SINT_TO_FP if the target is SSE but not AVX. On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr (for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64 conversion). This patch simplifies the logic in method X86SelectSIToFP knowing that the code would not be reachable if the subtarget doesn't have AVX. No functional change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231243 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 14:23:25 +00:00
Chandler Carruth	67fade9110	[MBP] Fix a really horrible bug in MachineBlockPlacement, but behind a flag for now. First off, thanks to Daniel Jasper for really pointing out the issue here. It's been here forever (at least, I think it was there when I first wrote this code) without getting really noticed or fixed. The key problem is what happens when two reasonably common patterns happen at the same time: we outline multiple cold regions of code, and those regions in turn have diamonds or other CFGs for which we can't just topologically lay them out. Consider some C code that looks like: if (a1()) { if (b1()) c1(); else d1(); f1(); } if (a2()) { if (b2()) c2(); else d2(); f2(); } done(); Now consider the case where a1() and a2() are unlikely to be true. In that case, we might lay out the first part of the function like: a1, a2, done; And then we will be out of successors in which to build the chain. We go to find the best block to continue the chain with, which is perfectly reasonable here, and find "b1" let's say. Laying out successors gets us to: a1, a2, done; b1, c1; At this point, we will refuse to lay out the successor to c1 (f1) because there are still un-placed predecessors of f1 and we want to try to preserve the CFG structure. So we go get the next best block, d1. ... wait for it ... Except that the next best block isn't d1. It is b2! d1 is waaay down inside these conditionals. It is much less important than b2. Except that this is exactly what we didn't want. If we keep going we get the entire set of the rest of the CFG interleaved!!! a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2; So we clearly need a better strategy here. =] My current favorite strategy is to actually try to place the block whose predecessor is closest. This very simply ensures that we unwind these kinds of CFGs the way that is natural and fitting, and should minimize the number of cache lines instructions are spread across. It also happens to be dead simple. 
It's like the datastructure was specifically set up for this use case or something. We only push blocks onto the work list when the last predecessor for them is placed into the chain. So the back of the worklist is the nearest next block. Unfortunately, a change like this is going to cause soooo many benchmarks to swing wildly. So for now I'm adding this under a flag so that we and others can validate that this is fixing the problems described, that it seems possible to enable, and hopefully that it fixes more of our problems long term. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231238 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 12:18:08 +00:00
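The interleaving described in the commit above can be reproduced with a toy model. This is only a sketch, not the MachineBlockPlacement algorithm: the block frequencies, the `layout` helper, and the "hottest"/"nearest" strategy names are assumptions made for illustration. The hot chain `a1, a2, done` is placed first; after that, a cold block enters the worklist once all of its predecessors are placed.

```python
# Toy CFG for the C example in the commit message (successor lists).
SUCCS = {
    "a1": ["b1", "a2"], "b1": ["c1", "d1"], "c1": ["f1"], "d1": ["f1"],
    "f1": ["a2"],
    "a2": ["b2", "done"], "b2": ["c2", "d2"], "c2": ["f2"], "d2": ["f2"],
    "f2": ["done"], "done": [],
}
# Hypothetical frequencies for the cold blocks (invented for this sketch).
FREQ = {"b1": 10, "c1": 6, "d1": 4, "f1": 10,
        "b2": 9, "c2": 5, "d2": 3, "f2": 9}
PREDS = {b: [p for p, ss in SUCCS.items() if b in ss] for b in SUCCS}


def layout(strategy, hot_chain=("a1", "a2", "done")):
    """Return a block order; strategy is 'hottest' or 'nearest'."""
    placed, worklist = [], []

    def place(block):
        placed.append(block)
        # A successor becomes ready when its last predecessor is placed.
        for s in SUCCS[block]:
            if (s not in placed and s not in worklist and s not in hot_chain
                    and all(p in placed for p in PREDS[s])):
                worklist.append(s)

    for b in hot_chain:
        place(b)
    while worklist:
        if strategy == "hottest":
            block = max(worklist, key=FREQ.get)   # globally best block
        else:
            block = worklist[-1]                  # most recently readied block
        # Extend the chain through ready successors for as long as possible.
        while block is not None:
            worklist.remove(block)
            place(block)
            ready = [s for s in SUCCS[block] if s in worklist]
            block = max(ready, key=FREQ.get) if ready else None
    return placed


print(layout("hottest"))
print(layout("nearest"))
```

In this model, "hottest" yields the interleaved order from the commit message (`b1, c1, b2, c2, d1, f1, d2, f2`), while taking the back of the worklist keeps each cold region contiguous.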
Daniel Jasper	f68f28a41d	Add a flag to experiment with outlining optional branches. In a CFG with the edges A->B->C and A->C, B is an optional branch. LLVM's default behavior is to lay the blocks out naturally, i.e. A, B, C, in order to improve code locality and fallthroughs. However, if a function contains many of those optional branches only a few of which are taken, this leads to a lot of unnecessary icache misses. Moving B out of line can work around this. Review: http://reviews.llvm.org/D7719 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231230 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 11:05:34 +00:00
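As a rough illustration of the "optional branch" shape named in the commit above (edges A->B->C and A->C), the following sketch scans a successor map for that triangle. The `optional_branches` helper is hypothetical, and the requirement that B has exactly one predecessor and one successor is an assumption of this sketch, not something the commit states.

```python
def optional_branches(succs):
    """Find (A, B, C) triples with edges A->B, B->C and A->C.

    `succs` maps each block name to its list of successors.
    Assumption: B has A as its only predecessor and C as its only successor.
    """
    preds = {}
    for block, targets in succs.items():
        for t in targets:
            preds.setdefault(t, []).append(block)

    found = []
    for a, targets in succs.items():
        if len(targets) != 2:
            continue
        # Try both successors as the candidate optional block B.
        for b, c in (targets, targets[::-1]):
            if succs.get(b, []) == [c] and preds.get(b) == [a]:
                found.append((a, b, c))
    return found


triangle = {"A": ["B", "C"], "B": ["C"], "C": []}
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(optional_branches(triangle))
print(optional_branches(diamond))
```

A layout pass experimenting with outlining would then be free to move each detected B out of the A, C fallthrough path.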
Kristof Beyls	78c4ef5120	Fix PR22408 - LLVM producing AArch64 TLS relocations that GNU linkers cannot handle yet. As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers ld.bfd and ld.gold currently only support a subset of the whole range of AArch64 ELF TLS relocations. Furthermore, they assume that some of the code sequences to access thread-local variables are produced in a very specific sequence. When the sequence is not as the linker expects, it can silently mis-relax/mis-optimize the instructions. Even if that wouldn't be the case, it's good to produce the exact sequence, as that ensures that linkers can perform optimizing relaxations. This patch: * implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang would grow an -mtls-size option to allow support for both, but that's not part of this patch. * by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation) is added to enable generation of local dynamic code sequence, but is off by default. * makes sure that the exact expected code sequence for local dynamic and general dynamic accesses is produced, by making use of a new pseudo instruction. The patch also removes two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231227 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 09:12:08 +00:00
Michael Kuperstein	bbfda9c125	[DAGCombine] Fix a bug in a BUILD_VECTOR combine When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. We can not do this when we also need the zero vector to create a blend. This fixes PR22774. Differential Revision: http://reviews.llvm.org/D8040 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231219 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 07:27:39 +00:00
Filipe Cabecinhas	7eefc249b8	Fix the test for r231201. We don't crash anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231207 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 02:09:40 +00:00
Rafael Espindola	bd490c174e	Use the vanilla func_end symbol for .size. No need to create yet another temp symbol. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231198 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 01:35:23 +00:00
Eric Christopher	1df6d33c5e	Weaken the check for a specific movl on the twoaddr-coalesce-3 test - we only care that there are two moves in the loop and not which part is relative to which register anyhow. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231191 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 01:19:17 +00:00
Filipe Cabecinhas	08efe825e4	Fix the x86-upgrade-avx2-vbroadcast.ll test by commenting the CHECK lines git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231187 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 00:49:12 +00:00
Rafael Espindola	c82398a2ac	Drop the "eh_" from eh_func_begin and eh_func_end. They will be used for more than eh tables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231185 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 00:27:43 +00:00
Juergen Ributzka	e49da9aff1	Remove 'llvm.x86.avx2.vbroadcasti128' intrinsic. The intrinsic is no longer generated by the front-end. Remove the intrinsic and auto-upgrade it to a vector shuffle. Reviewed by Nadav This is related to rdar://problem/18742778. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231182 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-04 00:13:25 +00:00
Eric Christopher	f2bf51c593	Update twoaddr-coalesce-3.ll to run on darwin and linux machines: a) Default relocation model differences, b) Different numbers of # in comments git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231178 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 23:56:20 +00:00
Reid Kleckner	ec0a396ffa	WinEH: Remove vestigial EH object Ultimately, we'll need to leave something behind to indicate which alloca will hold the exception, but we can figure that out when it comes time to emit the __CxxFrameHandler3 catch handler table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231164 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 23:20:30 +00:00
Andrew Kaylor	92dabb5710	Moving WinEH outlining tests to an architecture neutral location git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231155 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 22:33:39 +00:00
Eric Christopher	63295d884c	Fix a problem where the TwoAddressInstructionPass generates redundant register moves in a loop. From: int M, total; void foo() { int i; for (i = 0; i < M; i++) { total = total + i / 2; } } This is the kernel loop: .LBB0_2: # %for.body =>This Inner Loop Header: Depth=1 movl %edx, %esi movl %ecx, %edx shrl $31, %edx addl %ecx, %edx sarl %edx addl %esi, %edx incl %ecx cmpl %eax, %ecx jl .LBB0_2 -------------------------- The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi". The IR before TwoAddressInstructionPass is: BB#2: derived from LLVM BB %for.body Predecessors according to CFG: BB#1 BB#2 %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12 %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11 %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3 %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7 %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8 %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2 %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3 CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0 %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4 %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5 JL_4 <BB#2>, %EFLAGS<imp-use,kill> Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is a copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is a reduction operation in a loop. So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy. Patch by Wei Mi. 
http://reviews.llvm.org/D7806 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231148 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 22:03:03 +00:00
Andrew Kaylor	83eaade2b4	Outline cleanup handlers for native Windows C++ exception handling Differential Revision: http://reviews.llvm.org/D7865 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231117 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 20:00:16 +00:00
Kit Barton	40057e8ee8	Add the following 64-bit vector integer arithmetic instructions added in POWER8: vaddudm vsubudm vmulesw vmulosw vmuleuw vmulouw vmuluwm vmaxsd vmaxud vminsd vminud vcmpequd vcmpequd. vcmpgtsd vcmpgtsd. vcmpgtud vcmpgtud. vrld vsld vsrd vsrad Phabricator review: http://reviews.llvm.org/D7959 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231115 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 19:55:45 +00:00
Reid Kleckner	9c28314a68	Make llvm.eh.begincatch use an outparam Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a table, and it will take responsibility for copying the exception object into that slot. Modelling the exception object as an SSA value returned by begincatch isn't going to work in general, so make it use an output parameter. Reviewers: andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D7920 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231086 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 17:41:09 +00:00
Chad Rosier	f1de1adc82	[AArch64] When combining constant mul of -3, prefer (sub x, (shl x, N)). This change only affects codegen when the constant is -3. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231085 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 17:31:01 +00:00
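The arithmetic behind the (sub x, (shl x, N)) form can be checked directly: with N = 2, x - (x << 2) = x - 4x = -3x. The sketch below models 64-bit wraparound with a mask; the function names are hypothetical and this says nothing about the actual AArch64 instruction selection.

```python
MASK64 = (1 << 64) - 1  # model 64-bit two's-complement wraparound


def mul_neg3_reference(x):
    """Plain multiply by -3, truncated to 64 bits."""
    return (x * -3) & MASK64


def mul_neg3_shift_sub(x):
    """(sub x, (shl x, 2)): x - 4*x == -3*x modulo 2**64."""
    return (x - (x << 2)) & MASK64


for value in (0, 1, 7, 12345, MASK64, 1 << 63):
    assert mul_neg3_reference(value) == mul_neg3_shift_sub(value)
```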
Duncan P. N. Exon Smith	b056aa798d	DebugInfo: Move new hierarchy into place Move the specialized metadata nodes for the new debug info hierarchy into place, finishing off PR22464. I've done bootstraps (and all that) and I'm confident this commit is NFC as far as DWARF output is concerned. Let me know if I'm wrong :). The code changes are fairly mechanical: - Bumped the "Debug Info Version". - `DIBuilder` now creates the appropriate subclass of `MDNode`. - Subclasses of DIDescriptor now expect to hold their "MD" counterparts (e.g., `DIBasicType` expects `MDBasicType`). - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp` for printing comments. - Big update to LangRef to describe the nodes in the new hierarchy. Feel free to make it better. Testcase changes are enormous. There's an accompanying clang commit on its way. If you have out-of-tree debug info testcases, I just broke your build. - `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to update all the IR testcases. - Unfortunately I failed to find a way to script the updates to CHECK lines, so I updated all of these by hand. This was fairly painful, since the old CHECKs are difficult to reason about. That's one of the benefits of the new hierarchy. This work isn't quite finished, BTW. The `DIDescriptor` subclasses are almost empty wrappers, but not quite: they still have loose casting checks (see the `RETURN_FROM_RAW()` macro). Once they're completely gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I also expect to make a few schema changes now that it's easier to reason about everything. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231082 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 17:24:31 +00:00
Daniel Jasper	bbaf4fd14c	During PHI elimination, split critical edges that move copies out of loops. This prevents the behavior observed in llvm.org/PR22369. I am not sure whether I am reading the code correctly, but the early exit based on isLiveOutPastPHIs() seems to make the wrong assumption that RegisterCoalescer won't be able to coalesce those copies later. This change hides the new behavior behind -no-phi-elim-live-out-early-exit as it currently breaks four tests: * Assertion in: CodeGen/Hexagon/hwloop-cleanup.ll * Worse code in: CodeGen/X86/coalescer-commute4.ll CodeGen/X86/phys_subreg_coalesce-2.ll CodeGen/X86/zlib-longest-match.ll The root cause here seems to be that the heuristic that determines the visitation order in RegisterCoalescer gets less lucky. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231064 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 10:23:11 +00:00
Ahmed Bougacha	14593eb417	[X86] Special-case 2x CMOV when custom-inserting. This lets us avoid a few copies that are otherwise hard to get rid of. The way this is done is, the custom-inserter looks at the following instruction for another CMOV, and replaces both at the same time. A previous version used a new CMOV2 opcode, but the custom inserter is expected to be able to return a different basic block anyway, which means it's OK - though far from ideal - to alter that block's contents. Explicitly document that, in case it ever makes a difference. Alternatives welcome! Follow-up to r231045. rdar://19767934 Closes http://reviews.llvm.org/D8019 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231046 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 01:21:16 +00:00
Ahmed Bougacha	8b5527deef	[X86] Combine (cmov (and/or (setcc) (setcc))) into (cmov (cmov)). Fold and/or of setcc's to double CMOV: (CMOV F, T, ((cc1 \| cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2) (CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2) When we can't use the CMOV instruction, it might increase branch mispredicts. When we can, or when there is no mispredict, this improves throughput and reduces register pressure. These can't be caught by generic combines, because the pattern can appear when legalizing some instructions (such as fcmp une). rdar://19767934 http://reviews.llvm.org/D7634 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231045 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 01:09:14 +00:00
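The two folds quoted in the commit above can be checked by exhaustive truth table. In this sketch, `cmov(f, t, cc)` is a hypothetical stand-in that returns `t` when the condition holds and `f` otherwise; it models only the selection semantics, not the X86 instruction or its flag handling.

```python
from itertools import product


def cmov(f, t, cc):
    """Select t when cc is true, otherwise f (models CMOV F, T, cc)."""
    return t if cc else f


F, T = "F", "T"
for cc1, cc2 in product((False, True), repeat=2):
    # (CMOV F, T, (cc1 | cc2)) -> (CMOV (CMOV F, T, cc1), T, cc2)
    assert cmov(F, T, cc1 or cc2) == cmov(cmov(F, T, cc1), T, cc2)
    # (CMOV F, T, (cc1 & cc2)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)
    assert cmov(F, T, cc1 and cc2) == cmov(cmov(T, F, not cc1), F, not cc2)
```

Both rewrites replace the materialized and/or of two setcc results with a second conditional move, which is where the reduced register pressure mentioned in the message comes from.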
Reid Kleckner	a8182029a1	Fix cppeh breakage due to racing commits git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231044 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 01:04:39 +00:00
Andrew Kaylor	f89c6af16b	Remap arguments and non-alloca values used by outlined C++ exception handlers. Differential Revision: http://reviews.llvm.org/D7844 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231042 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 00:41:03 +00:00
Reid Kleckner	8594a2aa79	WinEH: Run opt -instnamer over some cppeh tests and update CHECKs In the future, we should run the output of clang through instnamer to make it easier to manually edit test cases. No functionality change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231037 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-03 00:05:35 +00:00
Adrian Prantl	994176ad7c	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. This reapplies 230930 without the assertion in DebugLocEntry::finalize() because not all Machine registers can be lowered into DWARF register numbers and floating point constants cannot be expressed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231023 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 22:02:33 +00:00
Adrian Prantl	a2e69c9c58	Revert "Refactor DebugLocDWARFExpression so it doesn't require access to the" This reverts commit 230975 to investigate buildbot breakage. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231004 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 20:01:54 +00:00
David Blaikie	7f620e56cd	Change SystemZ large tests to use the existing long_tests property (this is already used in Clang for a couple of tests) Reviewers: uweigand Differential Revision: http://reviews.llvm.org/D7965 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230998 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 19:34:11 +00:00
Adrian Prantl	9680c9c1a8	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. This reapplies 230930 with a relaxed assertion in DebugLocEntry::finalize() that allows for empty DWARF expressions for constant FP values. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230975 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 17:21:06 +00:00
Vasileios Kalintiris	5a393cab69	[mips] Optimize conditional moves where RHS is zero. Summary: When the RHS of a conditional move node is zero, we can utilize the $zero register by inverting the conditional move instruction and by swapping the order of its True/False operands. Reviewers: dsanders Differential Revision: http://reviews.llvm.org/D7945 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230956 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 12:47:32 +00:00
Nico Weber	5e871d0b9c	Revert r230930, it caused PR22747. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230932 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 04:37:11 +00:00
Adrian Prantl	d21acaf6a1	Refactor DebugLocDWARFExpression so it doesn't require access to the TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes of the pre-calculated DWARF expression. Ought to be NFC, but it does slightly alter the output format of the textual assembly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230930 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-02 02:38:18 +00:00
Elena Demikhovsky	975e9b99aa	AVX-512: Added mask and rounding mode for scalar arithmetics Added more tests for scalar instructions to distinguish between AVX and AVX-512 forms. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230891 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-01 07:44:04 +00:00
Sanjay Patel	821bba7fda	avoid infinite looping when folding vector multiplies of constants (PR22698) We were missing a check for the following fold in DAGCombiner: // fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2)) If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever. This should fix: http://llvm.org/bugs/show_bug.cgi?id=22698 Differential Revision: http://reviews.llvm.org/D7917 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230884 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-01 00:09:35 +00:00
Sanjay Patel	7497834516	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230883 91177308-0d34-0410-b5e6-96231b3b80d8	2015-03-01 00:02:03 +00:00
Sanjay Patel	4f00dcbf22	make the tested feature (SSE2) explicit git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230881 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-28 23:55:24 +00:00
Duncan P. N. Exon Smith	aee20c05e6	DebugInfo: Fix invalid file reference in CodeGen/X86/unknown-location.ll There are two types of files in the old (current) debug info schema. !0 = !{!"some/filename", !"/path/to/dir"} !1 = !{!"0x29", !0} ; [ DW_TAG_file_type ] !1 has a wrapper class called `DIFile` which inherits from `DIScope` and is referenced in 'scope' fields. !0 is called a "file node", and debug info nodes with a 'file' field point at one of these directly -- although they're built in `DIBuilder` by sending in a `DIFile` and reaching into it. In the new hierarchy, I unified these nodes as `MDFile` (which `DIFile` is a lightweight wrapper for) in r230057. Moving the new hierarchy into place (and upgrading testcases) caused CodeGen/X86/unknown-location.ll to start failing -- apparently "0x29" was previously showing up in the linetable as a filename, causing: .loc 2 4 3 (where 2 points at filename "0x29") instead of: .loc 1 4 3 (where 1 points at the actual filename). Change the testcase to use the old schema correctly. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230880 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-28 23:52:24 +00:00
Sanjay Patel	be657e70b1	fixed to test only the feature, not the feature and a CPU git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230878 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-28 23:47:09 +00:00
Craig Topper	8df1c6ef09	[X86] Remove the blendpd/blendps/pblendw/pblendd intrinsics. They can be represented by shuffle_vector instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230860 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-28 19:33:17 +00:00
Bill Schmidt	88bbdc790e	Regenerated test case from r230801 for change in LLVM IR syntax git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230811 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 23:29:57 +00:00
David Blaikie	ee62d93f1b	Update SystemZ/Large test generators to handle new gep IR syntax git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230810 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 23:29:39 +00:00
David Blaikie	834cb56c1b	Update SystemZ/Large test generators to handle new load IR syntax git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230809 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 23:29:33 +00:00
Bill Schmidt	52a5087a4b	Revert test case until it can be fixed git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230803 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 22:31:14 +00:00
Bill Schmidt	0e1e8e2f62	[PowerPC] Fix PR22711 - Misaligned .toc section Straightforward patch to emit an alignment directive when emitting a TOC entry. The test case was generated from the test in PR22711 that demonstrated a misaligned .toc section. The object code is run through llvm-readobj to verify that the correct alignment has been applied to the .toc section. Thanks to Ulrich Weigand for running down where the fix was needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230801 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 22:14:10 +00:00
David Blaikie	7c9c6ed761	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230794 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 21:17:42 +00:00
Charles Davis	dc64962c86	Target/X86: Never use the redzone for Win64 ABI functions. Summary: Until now, we did this (among other things) based on whether or not the target was Windows. This is clearly wrong, not just for Win64 ABI functions on non-Windows, but for System V ABI functions on Windows, too. In this change, we make this decision based on the ABI the calling convention specifies instead. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7953 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230793 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 21:11:16 +00:00
Hal Finkel	e03aac601f	[PowerPC] Use vector types for memcpy and friends (sometimes) When using Altivec, we can use vector loads and stores for aligned memcpy and friends. Starting with the P7 and VSX, we have reasonable unaligned vector stores. Starting with the P8, we have fast unaligned loads too. For QPX, we use vector loads and stores, but only for aligned memory accesses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230788 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 19:58:28 +00:00
David Blaikie	198d8baafb	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. 
update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. 
Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230786 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 19:29:02 +00:00
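The textual change described in the commit above (adding an explicit pointee type to `getelementptr`) can be sketched with a much smaller regex than the author's `update.py`. This is not the script from the commit message: `add_explicit_type` and its pattern are hypothetical, and the sketch only handles scalar pointer operands that start with `%` or `@`. Vector-of-pointer operands and constant-expression geps are left unchanged.

```python
import re

# Groups: 1 = opcode (optionally with "inbounds"), 2 = pointee type,
#         3 = optional address space, 4 = start of the pointer operand.
GEP = re.compile(r"(getelementptr (?:inbounds )?)(.+?)( addrspace\(\d+\))?\*( [%@])")


def add_explicit_type(line):
    """Rewrite 'getelementptr T* %p' as 'getelementptr T, T* %p'."""
    def repl(m):
        opcode, pointee, addrspace, operand = m.groups()
        # The address space stays on the pointer, not on the explicit type.
        return f"{opcode}{pointee}, {pointee}{addrspace or ''}*{operand}"
    return GEP.sub(repl, line, count=1)


print(add_explicit_type("%p = getelementptr inbounds i32* %a, i64 1"))
print(add_explicit_type("%q = getelementptr float addrspace(1)* %x, i32 0"))
```

The second call reproduces the address-space example from the commit message: the explicit type is `float`, while `addrspace(1)` remains attached to the pointer operand.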
Eric Christopher	930da21265	Remove the Forward Control Flow Integrity pass and its dependencies. This work is currently being rethought along different lines and if this work is needed it can be resurrected out of svn. Remove it for now as no work is currently ongoing on it and it's unused. Verified with the authors before removal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230780 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 19:03:38 +00:00
Mehdi Amini	26d628d6ce	Change the fast-isel-abort option from bool to int to enable "levels" Summary: Currently fast-isel-abort will only abort for regular instructions, and just warn for function calls, terminators, function arguments. There is already fast-isel-abort-args but nothing for calls and terminators. This change turns the fast-isel-abort options into an integer option, so that multiple levels of strictness can be defined. This will help avoid being surprised when the "abort" option does not in fact abort, and makes it possible to write tests that verify that no intrinsics are forgotten by fast-isel. Reviewers: resistor, echristo Subscribers: jfb, llvm-commits Differential Revision: http://reviews.llvm.org/D7941 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230775 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 18:32:11 +00:00
Rafael Espindola	150233c378	Centralize handling of the eh_begin and eh_end labels. This removes a bit of duplicated code and more importantly, remembers the labels so that they don't need to be looked up by name. This in turn allows for any name to be used and avoids a crash if the name we wanted was already taken. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230772 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 18:18:39 +00:00
Renato Golin	636aacf211	Like NetBSD, Bitrig/ARM uses the Itanium ABI. Patch by Patrick Wildt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230762 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 16:35:27 +00:00
Zoran Jovanovic	2846ef3680	[mips][microMIPS] Change register class for GP register Differential Revision: http://reviews.llvm.org/D7934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230760 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 15:03:50 +00:00
Petar Jovanovic	8407da0dbc	Pass correct -mtriple for krait-cpu-div-attribute.ll Not passing mtriple for one of the tests caused a regression failure on MIPS buildbot. The issue was introduced by r230651. Differential Revision: http://reviews.llvm.org/D7938 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230756 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 14:46:41 +00:00
Chandler Carruth	c4179ffed3	[x86] Run most of the rest of the shuffle combining over non-128-bit vectors. This lets us fix the rest of the v16 lowering problems when pshufb is clearly better. We might still be able to improve some of the lowerings by enabling the other combine-based rewriting to fire for non-128-bit vectors, but this at least should remove any regressions from using the fancy v16i16 lowering strategy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230753 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 12:13:14 +00:00
Chandler Carruth	2d58cc5f1b	[x86] Teach a bunch of the x86-specific shuffle combining to work with 256-bit vectors as well as 128-bit vectors. Fixes some of the redundant shuffles for v16i16. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230752 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 11:45:13 +00:00
Chandler Carruth	8c71e440a2	[x86] Make the v8i16 clever single-input shuffle lowering usable for repeated 128-bit lane shuffles of wider vector types and use it to lower 256-bit v16i16 vector shuffles where applicable. This should let us perfectly lower the pattern of pshuflw and pshufhw even for AVX2 256-bit patterns. I've not added AVX-512 support, but it should be trivial for someone working on that to wire up. Note that currently this generates bad, long shuffle chains because we don't combine 256-bit target shuffles. The subsequent patches will fix that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230751 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 11:33:46 +00:00
Chandler Carruth	f5651f8ab6	[x86] Add a bunch more tests for v16i16 shuffles. All of these are taken by mirroring v8i16 test cases across both 128-bit lanes. This should highlight problems where we aren't correctly using 128-bit shuffles to implement things. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230750 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 11:25:10 +00:00
Vasileios Kalintiris	912e816cc2	[mips] Account for constant-zero operands in ADDE nodes. Summary: We identify the cases where the operand to an ADDE node is a constant zero. In such cases, we can avoid generating an extra ADDu instruction disguised as an identity move alias (ie. addu $r, $r, 0 --> move $r, $r). Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7906 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230742 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 09:01:39 +00:00
Charles Davis	d51be017f0	Target/X86: Save Win64 non-volatile registers in a Win64 ABI function. Summary: This change causes us to actually save non-volatile registers in a Win64 ABI function that calls a System V ABI function, and vice-versa. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230714 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-27 00:57:01 +00:00
Rafael Espindola	fc0ad8d28d	Put jump tables in distinct sections if -ffunction-sections is used. A small regression in r230411 was that we were basing the decision on -fdata-sections. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230707 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 23:55:11 +00:00
Chandler Carruth	b54c36fb4d	[x86] Fix PR22706 where we would incorrectly try to lower a v32i8 dynamic blend as legal. We made the same mistake in two different places. Whenever we are custom lowering a v32i8 blend we need to check whether we are custom lowering it only for constant conditions that can be shuffled, or whether we actually have AVX2 and full dynamic blending support on bytes. Both are fixed, with comments added to make it clear what is going on and a new test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230695 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 22:15:34 +00:00
Reid Kleckner	783f7f989e	Don't sibcall between SysV and Win64 convention functions The shadow stack space expectations won't match. Fixes PR22709. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230667 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 19:43:20 +00:00
Paul Robinson	b2f521b647	When the source has a series of assignments, users reasonably want to have the debugger step through each one individually. Turn off the combine for adjacent stores at -O0 so we get this behavior. Possibly, DAGCombine shouldn't run at all at -O0, but that's for another day; see PR22346. Differential Revision: http://reviews.llvm.org/D7181 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230659 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 18:47:57 +00:00
Petar Jovanovic	e53d9df042	Fix justify error for small structures in varargs for MIPS64BE There was a problem when passing structures as variable arguments. Structures smaller than 64 bits were not left-justified on MIPS64 big endian. This is now fixed by shifting the value to make it left-justified when appropriate. This fixes the bug http://llvm.org/bugs/show_bug.cgi?id=21608 Patch by Aleksandar Beserminji. Differential Revision: http://reviews.llvm.org/D7881 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230657 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 18:35:15 +00:00
Sumanth Gundapaneni	adaebc8b56	Use ".arch_extension" ARM directive to support hwdiv on krait In case of "krait" CPU, asm printer doesn't emit any ".cpu" so the feature bits are not computed. This patch lets the asm printer emit ".cpu cortex-a9" directive for krait and the hwdiv feature is enabled through ".arch_extension". In short, krait is treated as "cortex-a9" with hwdiv. We cannot emit ".krait" as CPU since it is not supported by GNU GAS yet. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230651 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 18:08:41 +00:00
Tom Stellard	89e4328381	R600/SI: Remove M0 from DS assembly strings This matches the assembly syntax for the proprietary compiler. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230645 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 17:08:43 +00:00
Bruno Cardoso Lopes	bfa9a71f23	[X86][MMX] Fix a typo in a couple of tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230638 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 15:16:09 +00:00
Bruno Cardoso Lopes	dde2e4f7b9	[X86][MMX] Remove widening experimental flag from MMX tests. Turns out that after the past MMX commits, we don't need to rely on this flag to get better codegen for MMX. Also update the tests to become triple neutral. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230637 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 15:10:38 +00:00
Vladimir Medic	d89ac8f158	Replace obsolete -mattr=n64 command line option with -target-abi=n64. No functional changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230628 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-26 12:29:48 +00:00
Hal Finkel	7840990de8	[PowerPC] Make LDtocL and friends invariant loads LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node, represent loads from the TOC, which is invariant. As a result, these loads can be hoisted out of loops, etc. In order to do this, we need to generate GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory intrinsic node type. Once this is done, the MMO transfer is automatically handled for TableGen-driven instruction selection, and for nodes generated directly in PPCISelDAGToDAG, we need to transfer the MMOs manually. Also, we were not transferring MMOs associated with pre-increment loads, so do that too. Lastly, this fixes an exposed bug where R30 was not added as a defined operand of UpdateGBR. This problem was highlighted by an example (used to generate the test case) posted to llvmdev by Francois Pichet. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230553 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 21:36:59 +00:00
David Majnemer	92d1637e2f	X86, Win64: Allow 'mov' to restore the stack pointer if we have a FP The Win64 epilogue structure is very restrictive, it permits a very small number of opcodes and none of them are 'mov'. This means that given: mov %rbp, %rsp pop %rbp The mov isn't the epilogue, only the pop is. This is problematic unless a frame pointer is present in which case we are free to do whatever we'd like in the "body" of the function. If a frame pointer is present, unwinding will undo the prologue operations in reverse order regardless of the fact that we are at an instruction which is resetting the stack pointer. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230543 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 21:13:37 +00:00
Sanjoy Das	a0a0b40aa3	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap (The change was landed in r230280 and caused the regression PR22674. This version contains a fix and a test-case for PR22674). When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. Differential Revision: http://reviews.llvm.org/D7778 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230533 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 20:02:59 +00:00
Vladimir Medic	d692ee81e8	[MIPS] Multiply and add instructions for Mips are currently available in mips32r2/mips64r2 and later but should also be available in mips4, mips5, and mips64. This patch fixes the requested features and updates the corresponding test files. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230500 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 15:24:37 +00:00
Bruno Cardoso Lopes	51fc7f5afa	[X86][MMX] Reapply: Add MMX instructions to foldable tables Reapply r230248. Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230499 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 15:14:02 +00:00
Renato Golin	b451f4e376	Improve handling of stack accesses in Thumb-1 Thumb-1 only allows SP-based LDR and STR to be word-sized, and SP-based LDR, STR, and ADD only allow offsets that are a multiple of 4. Make some changes to better make use of these instructions: * Use word loads for anyext byte and halfword loads from the stack. * Enforce 4-byte alignment on objects accessed in this way, to ensure that the offset is valid. * Do the same for objects whose frame index is used, in order to avoid having to use more than one ADD to generate the frame index. * Correct how many bits of offset we think AddrModeT1_s has. Patch by John Brawn. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230496 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 14:41:06 +00:00
Vladimir Medic	551fd3d820	Replace obsolete -mattr=n64 command line option with -target-abi=n64. No functional changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230482 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 11:43:01 +00:00
Hal Finkel	d37914a662	[PowerPC] Add triples to QPX tests Some of these tests fail on Darwin systems because of a lack of a triple; fix that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230421 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 01:26:59 +00:00
Hal Finkel	f8d179ba76	[PowerPC] Add support for the QPX vector instruction set This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bits wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1> are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirror those supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230413 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 01:06:45 +00:00
Rafael Espindola	76bdd01e0e	Support SHF_MERGE sections in COMDATs. This patch unifies the comdat and non-comdat code paths. By doing this it adds missing features to the comdat side and removes the fixed section assumptions from the non-comdat side. In ELF there is no one true section for "4 byte mergeable" constants. We are better off computing the required properties of the section and asking the context for it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230411 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 00:52:15 +00:00
Eric Christopher	7c5314a076	Make this test even more OS and register allocation neutral. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230404 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-25 00:12:11 +00:00
Eric Christopher	8269a59b1c	Make this test not dependent upon the triple. All that was needed was some flexibility in the check line for the comment basic block. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230400 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 23:43:26 +00:00
Simon Pilgrim	41cda40157	Reapplied D7816 & rL230177 & rL230278 - with an additional fix to ensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230388 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 22:08:56 +00:00
Simon Pilgrim	0115755020	Added test case for PR22678 (check CONCAT_VECTORS DAG combiner pass doesn't introduce illegal types) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230386 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 21:46:23 +00:00
Andrew Kaylor	8f475e9d77	Fixing eol-style git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230378 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 20:49:35 +00:00
Eric Christopher	7c611d59cc	Revert: Author: Simon Pilgrim <llvm-dev@redking.me.uk> Date: Mon Feb 23 23:04:28 2015 +0000 Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the BV's output scalar type instead of the input type. and Author: Simon Pilgrim <llvm-dev@redking.me.uk> Date: Sun Feb 22 18:17:28 2015 +0000 [DagCombiner] Generalized BuildVector Vector Concatenation The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node. This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well. This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and a vec128 UNDEF upper. Differential Revision: http://reviews.llvm.org/D7816 as the root cause of PR22678 which is causing an assertion inside the DAG combiner. I'll follow up to the main thread as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230358 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 19:11:00 +00:00
Matthias Braun	dd1a6e074d	AArch64: Relax assert about large shift sizes. The reason why these large shift sizes happen is because OpaqueConstants currently inhibit a lot of DAG combining, but that has to be addressed in another commit (like the proposal in D6946). Differential Revision: http://reviews.llvm.org/D6940 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230355 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 18:52:04 +00:00
Tom Stellard	ba150ed636	R600/SI: Remove isel mubuf legalization We legalize mubuf instructions post-instruction selection, so this code is no longer needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230352 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 17:59:19 +00:00
Tim Northover	5530ac99e6	ARM: treat [N x i32] and [N x i64] as AAPCS composite types The logic is almost there already, with our special homogeneous aggregate handling. Tweaking it like this allows front-ends to emit AAPCS compliant code without ever having to count registers or add discarded padding arguments. Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to apply the logic to all integer arrays for more consistency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230348 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 17:22:34 +00:00
Hans Wennborg	b499b73e30	Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap" This caused PR22674, failing this assert: Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230341 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 16:19:29 +00:00
Michael Kuperstein	2379e8a2ee	[x32] x32 should use ebx as the base pointer. This fixes the original issue in PR22655, but not the secondary one. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230334 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 15:27:13 +00:00
Reed Kotler	aecbb87ee8	Beginning of alloca implementation for Mips fast-isel Summary: Begin to add various address modes, including alloca. Test Plan: Make sure there are no regressions in test-suite at O0/O2 in mips32r1/r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: echristo, rfuhler, llvm-commits Differential Revision: http://reviews.llvm.org/D6426 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230300 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 02:36:45 +00:00
David Majnemer	fbdee9f0c0	X86: Only use 'lea' in Win64 epilogues if a frame pointer exists We can only use 'add' in epilogues, 'lea' is not permitted unless we've established a frame pointer in the prologue. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230286 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-24 00:11:32 +00:00
Sanjoy Das	8d16a81c33	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. NOTE: this change was landed with an incorrect commit message in rL230275 and was reverted for that reason in rL230279. This commit message is the correct one. Differential Revision: http://reviews.llvm.org/D7778 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230280 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 23:22:58 +00:00
Sanjoy Das	69048edf8a	Revert 230275. 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230279 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 23:13:22 +00:00
Andrea Di Biagio	770e106ed6	[X86] Teach how to custom lower double-to-half conversions under fast-math. This patch teaches the backend how to expand a double-half conversion into a double-float conversion immediately followed by a float-half conversion. We do this only under fast-math, and if float-half conversions are legal for the target. Added test CodeGen/X86/fastmath-float-half-conversion.ll Differential Revision: http://reviews.llvm.org/D7832 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230276 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 22:59:02 +00:00
Sanjoy Das	7ebbc8de2f	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230275 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 22:55:13 +00:00
David Majnemer	ad6622575c	X86: Use a smaller 'mov' instruction for stack probe calls Prologue emission, in some cases, requires calls to a stack probe helper function. The amount of stack to probe is passed as a register argument in the Win64 ABI but the instruction sequence used is pessimistic: it assumes that the number of bytes to probe is greater than 4 GB. Instead, select a more appropriate opcode depending on the number of bytes we are going to probe. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230270 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 21:50:30 +00:00
David Majnemer	d71e4c6218	X86: Use 'mov' instead of 'lea' in Win64 SEH prologues when possible 'mov' and 'lea' are equivalent when the displacement applied with 'lea' is zero. However, 'mov' should encode smaller. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230269 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 21:50:27 +00:00
Bruno Cardoso Lopes	a7db376a63	[X86][MMX] Fix test to reflect current codegen This test failed in several buildbots, a bit unclear how that happened since this was the previous behavior before r230248. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230258 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 20:57:46 +00:00
Andrew Kaylor	595050a793	Adding test for Windows EH frame variable remapping. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230250 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 20:04:51 +00:00
Andrew Kaylor	1d10231766	Remap frame variables for native Windows exception handling. Differential Revision: http://reviews.llvm.org/D7770 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230249 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 20:01:56 +00:00
Bruno Cardoso Lopes	ee7b509aa3	Revert "[X86][MMX] Add MMX instructions to foldable tables" This reverts commit r230226 since it breaks win buildbots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230248 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 19:53:37 +00:00
Daniel Sanders	b50b4e2d36	[mips] Honour -mno-odd-spreg for vector insert/extract when MSA is enabled. Summary: -mno-odd-spreg prohibits the use of odd-numbered single-precision floating point registers. However, vector insert/extract was still using them when manipulating the subregisters of an MSA register. Fixed this by ensuring that insertion/extraction is only performed on even-numbered vector registers when -mno-odd-spreg is given. Reviewers: vmedic, sstankovic Reviewed By: sstankovic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7672 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230235 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 17:22:16 +00:00
Bruno Cardoso Lopes	01312dd0b4	[X86] Add specific mtriple in order to appease buildbots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230229 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 15:33:40 +00:00
Bruno Cardoso Lopes	77d2363908	[X86][MMX] Add MMX instructions to foldable tables Teach the peephole optimizer to work with MMX instructions by adding entries into the foldable tables. This covers folding opportunities not handled during isel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230226 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 15:23:22 +00:00
Bruno Cardoso Lopes	c606f3a3cb	[X86][MMX] Support folding loads in psll, psrl and psra intrinsics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230225 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 15:23:14 +00:00
Bruno Cardoso Lopes	6916c75fba	[X86][MMX] Add tests for pslli, psrli and psrai intrinsics Add tests to cover the RR form of the pslli, psrli and psrai intrinsics. In the next commit, the loads are going to be folded and the instructions use the RM form. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230224 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 15:23:06 +00:00
Elena Demikhovsky	fdafc8fd5e	AVX-512: recommitted 229837 + bugfix + test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230223 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-23 15:12:31 +00:00
Simon Pilgrim	66c960350c	[DagCombiner] Generalized BuildVector Vector Concatenation The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node. This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well. This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and a vec128 UNDEF upper. Differential Revision: http://reviews.llvm.org/D7816 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230177 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-22 18:17:28 +00:00
Matt Arsenault	29f97a6c46	R600/SI: Use v_madmk_f32 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230149 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-21 21:29:10 +00:00
Matt Arsenault	c490f78e53	R600/SI: Try to use v_madak_f32 This is a code size optimization when the constant only has one use. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230148 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-21 21:29:07 +00:00
Simon Pilgrim	b430a06e94	[X86][SSE] Added shuffle based integer zero extension tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230145 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-21 21:25:16 +00:00
David Majnemer	e95985d3a0	Win64: Stack alignment constraints aren't applied during SET_FPREG Stack realignment occurs after the prolog, not during, for Win64. Because of this, don't factor in the maximum stack alignment when establishing a frame pointer. This fixes PR22572. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230113 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-21 01:04:47 +00:00
Rafael Espindola	c093973970	Use short names for jumptable sections. Also refactor code to remove some duplication. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230087 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 23:28:28 +00:00
Matt Arsenault	16fc5e9c0f	R600/SI: Remove v_sub_f64 pseudo The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding since the expanded pseudo would never use SGPRs or inline immediates. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230072 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 22:10:45 +00:00
Matt Arsenault	bbb748eece	R600: Use new fmad node. This enables a few useful combines that used to only use fma. Also since v_mad_f32 apparently does not support denormals, disable the existing cases that are custom handled if they are requested. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230071 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 22:10:41 +00:00
Jozef Kolek	b2e79a8e69	Reversed revision 229706. The reason is regression, which is caused by the usage of instruction ADDU16 by CodeGen. For this instruction an improper register is allocated, i.e. the register that is not from register set defined for the instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230053 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 20:26:52 +00:00
Andrea Di Biagio	3583d23018	[X86][FastIsel] Teach how to select float-half conversion intrinsics. This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'. If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion, and VCVTPH2PSrr for a half-float conversion. Differential Revision: http://reviews.llvm.org/D7673 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230043 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 19:37:14 +00:00
Kit Barton	3e00ca983c	I incorrectly marked the VORC instruction as isCommutable when I added it. This fix removes the VORC instruction definition from the isCommutable block. Phabricator review: http://reviews.llvm.org/D7772 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230020 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 15:54:58 +00:00
Hal Finkel	2c5f9584ba	[PowerPC] Loop Data Prefetching for the BG/Q The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it prefetches into its own L1P buffer, and the latency to access that buffer is significantly higher than that to the L1 cache (although smaller than the latency to the L2 cache). As a result, especially when multiple hardware threads are not actively busy, explicitly prefetching data into the L1 cache is advantageous. I've been using this pass out-of-tree for data prefetching on the BG/Q for well over a year, and it has worked quite well. It is enabled by default only for the BG/Q, but can be enabled for other cores as well via a command-line option. Eventually, we might want to add some TTI interfaces and move this into Transforms/Scalar (there is nothing particularly target dependent about it, although only machines like the BG/Q will benefit from its simplistic strategy). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229966 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 05:08:21 +00:00
Chandler Carruth	efbbaefea5	[x86] Remove the old vector shuffle lowering code and its flag. The new shuffle lowering has been the default for some time. I've enabled the new legality testing by default with no really blocking regressions. I've fuzz tested this very heavily (many millions of fuzz test cases have passed at this point). And this cleans up a ton of code. =] Thanks again to the many folks that helped with this transition. There was a lot of work by others that went into the new shuffle lowering to make it really excellent. In case you aren't using a diff algorithm that can handle this: X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229964 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 04:25:04 +00:00
Chandler Carruth	07ef8904ad	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229963 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 03:59:35 +00:00
Chandler Carruth	38749b8e07	[x86] Make the new vector shuffle legality test on by default, which reflects the fact that the x86 backend can in fact lower any shuffle you want it to with reasonably high code quality. My recent work on the new vector shuffle has made this regress very little. The diff in the test cases makes me very, very happy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229958 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 03:05:47 +00:00
Chandler Carruth	4fd100f9e9	[x86] Clean up a couple of test cases with the new update script. Split one test case that is only partially tested in 32-bits into two test cases so that the script doesn't generate massive spews of tests for the cases we don't care about. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229955 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 02:44:13 +00:00
Chandler Carruth	90b8e791ac	Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation This doesn't pass 'ninja check-llvm' for me. Lots of tests, including the ones updated, fail with crashes and other explosions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229952 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 02:15:36 +00:00
Reid Kleckner	49ab3a626a	EH: Prune unreachable resume instructions during Dwarf EH preparation Today a simple function that only catches exceptions and doesn't run destructor cleanups ends up containing a dead call to _Unwind_Resume (PR20300). We can't remove these dead resume instructions during normal optimization because inlining might introduce additional landingpads that do have cleanups to run. Instead we can do this during EH preparation, which is guaranteed to run after inlining. Fixes PR20300. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D7744 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229944 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 01:00:19 +00:00
Eric Christopher	8c4bb575e1	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229942 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 00:45:28 +00:00
Ahmed Bougacha	5898fc70ec	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229932 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 23:52:41 +00:00
Sanjay Patel	4dad5f2731	add X86 load folding tests for unary math ops X86 load folding is fragile; eg, the tests here don't work without AVX even though they should. This is because we have a mix of tablegen patterns that have been added over time, and we have a load folding table used by the peephole optimizer that has to be kept in sync with the ever-changing ISA and tablegen defs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229870 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 16:59:11 +00:00
Chandler Carruth	b7012af85f	[x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16. This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229859 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 15:21:57 +00:00
Chandler Carruth	c57e90422f	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and or registers in many various ways, lets us handle another class of v4i32 shuffles pre SSE4.1 without domain crosses, etc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229856 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 15:06:13 +00:00
Chandler Carruth	7f583a4201	[x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic. This code path was one of the more complex and bug prone when it first went in and it hasn't fared much better. Ultimately, with the simpler basis for unpack lowering and support bit-math blending, this is completely obsolete. In the worst case without this we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb is available. This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229853 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 14:08:24 +00:00
Chandler Carruth	943b2ca2de	[x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path. This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we know we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends because we do at least have a bit-math-based blend if we need to use that. This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229851 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 13:56:49 +00:00
Chandler Carruth	c3d7858505	[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing them into permutes and a blend with the generic decomposition logic. This works really well in almost every case and lets the code only manage the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack based merging that this replaces. The only place where it isn't we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands. With this we're down to only really random permutations that cause an explosion of instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 13:15:12 +00:00
Chandler Carruth	3d4542ce3d	[x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities. This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyways and they terminate in exactly a blend. This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes sooooo much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is tiny. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now. For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229846 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 12:10:37 +00:00
Jozef Kolek	bb539d3b4c	[mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator Differential Revision: http://reviews.llvm.org/D7611 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229845 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 11:51:32 +00:00
Elena Demikhovsky	675d06d1d0	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229837 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:48:04 +00:00
Chandler Carruth	ac2b1a1bb3	[x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths. I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support and this ceases to be relevant. Also, some of the test cases look odd because the domain fixer canonicalizes these to floating point domain. That's OK, it'll use the integer domain when it matters and some day I may be able to update enough of LLVM to canonicalize the other way. This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229836 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:46:52 +00:00
Chandler Carruth	a8fb39af83	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:36:19 +00:00
Peter Collingbourne	7d3b145da4	llvm-mc: Use Target::createNullStreamer to fix crashes on target-specific asm directives. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229798 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 00:45:04 +00:00
Chandler Carruth	00c954ffc4	[x86] Merge checks for a recently added test case that is the same on all SSE variants and AVX variants. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229770 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 23:20:49 +00:00
Reid Kleckner	f89d9b1c75	Add an IR-to-IR test for dwarf EH preparation using opt This tests the simple resume instruction elimination logic that we have before making some changes to it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229768 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 23:17:41 +00:00
Reid Kleckner	ae09ebc540	dos2unix the WinEH file and tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229735 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 19:52:46 +00:00
Andrew Kaylor	a4976167c4	Adding implementation to outline C++ catch handlers for native Windows 64 exception handling. Differential Revision: http://reviews.llvm.org/D7363 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229715 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 18:31:51 +00:00
Jozef Kolek	2032d755e7	[mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator Differential Revision: http://reviews.llvm.org/D7609 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229706 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 17:33:56 +00:00
Daniel Sanders	7eedd07d5e	[mips] Add backend support for Mips32r[35] and Mips64r[35]. Summary: These ISAs didn't add any instructions so they are almost identical to Mips32r2 and Mips64r2. Even the ELF e_flags are the same. However, the ISA revision in .MIPS.abiflags is 3 or 5 respectively instead of 2. Reviewers: vmedic Reviewed By: vmedic Subscribers: tomatabacu, llvm-commits, atanasyan Differential Revision: http://reviews.llvm.org/D7381 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229695 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 16:24:50 +00:00
Kit Barton	31840a62af	This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Committing on Nemanja's behalf. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229694 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 16:21:46 +00:00
Vasileios Kalintiris	0563ea452c	[mips] Avoid redundant sign extension of the result of binary bitwise instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7581 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229675 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 14:57:05 +00:00
Bradley Smith	4fe6d075d5	[ARM] Add missing M/R class CPUs Add some of the missing M and R class Cortex CPUs, namely: Cortex-M0+ (called Cortex-M0plus for GCC compatibility) Cortex-M1 SC000 SC300 Cortex-R5 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229660 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 10:33:30 +00:00
Michael Kuperstein	9571ea6620	Fixes two issues in SimplifyDemandedBits of sext_in_reg: 1) We should not try to simplify if the sext has multiple uses 2) There is no need to simplify if the source value is already sign-extended. Patch by Gil Rapaport <gil.rapaport@intel.com> Differential Revision: http://reviews.llvm.org/D6949 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229659 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 09:43:40 +00:00
Chandler Carruth	4e8a4638e9	[x86] Refactor the bit shift code the same as I just did the byte shift code. While this didn't have the miscompile (it used MatchLeft consistently) it missed some cases where it could use right shifts. I've added a test case Craig Topper came up with to exercise the right shift matching. This code is really identical between the two. I'm going to merge them next so that we don't keep two copies of all of this logic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229655 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 09:19:58 +00:00
Ulrich Weigand	bebd59c74b	[SystemZ] Support all TLS access models - CodeGen part The current SystemZ back-end only supports the local-exec TLS access model. This patch adds all required CodeGen support for the other TLS models, which means in particular: - Expand initial-exec TLS accesses by loading TLS offsets from the GOT using @indntpoff relocations. - Expand general-dynamic and local-dynamic accesses by generating the appropriate calls to __tls_get_offset. Note that this routine has a non-standard ABI and requires loading the GOT pointer into %r12, so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node. - Add a new platform-specific optimization pass to remove redundant __tls_get_offset calls in the local-dynamic model (modeled after the corresponding X86 pass). - Add test cases verifying all access models and optimizations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229654 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 09:13:27 +00:00
Daniel Jasper	66bd6852bb	Remove experimental options to control machine block placement. This reverts r226034. Benchmarking with those flags has not revealed anything interesting. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229648 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 08:18:07 +00:00
Elena Demikhovsky	87483ed180	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229645 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 07:59:20 +00:00
Craig Topper	052d754ccb	[X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229643 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 07:45:43 +00:00
Chandler Carruth	c9520b48ae	[x86] Rewrite the byte shift detection to not use boolean variables to track state. I didn't like this in the code review because the pattern tends to be error prone, but I didn't see a clear way to rewrite it. Turns out that there were bugs here, I found them when fuzz testing our shuffle lowering for correctness on x86. The core of the problem is that we need to consistently test all our preconditions for the same directionality of shift and the same input vector. Instead, formulate this as two predicates (one doesn't depend on the input in any way), pass things like the directionality and input vector as inputs, and loop over the alternatives. This fixes a pattern of very rare miscompiles coming out of this code. Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz testing. The new code is over half a million test runs with no failures yet. I've also fuzzed every other function in the lowering code with over 3.5 million test cases and not discovered any other miscompiles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229642 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 07:13:48 +00:00
Craig Topper	ed42dcef75	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229640 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 06:24:44 +00:00
Matt Arsenault	2422768a8a	R600/SI: Add missing offset operand to buffer bothen git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229605 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 02:04:38 +00:00
Matt Arsenault	fe524d5902	R600/SI: Add missing soffset operand to global atomics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229604 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 02:04:35 +00:00
Andrea Di Biagio	b3ff6a88b6	[X86][FastIsel] Teach how to select scalar integer to float/double conversions. This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229589 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 23:40:58 +00:00
Rafael Espindola	3b75cfe179	Add r228939 back with a fix. The problem in the original patch was not switching back to .text after printing an eh table. Original message: On ELF, put PIC jump tables in a non executable section. Fixes PR22558. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229586 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 23:34:51 +00:00
Rafael Espindola	54b2025420	Add a test showing the problem in r228939. If an EH table is printed in between the function and the jump table we would fail to switch back to the text section to print the jump table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229580 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 23:21:46 +00:00
Simon Pilgrim	cbc2ca5ec9	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229571 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 22:24:32 +00:00
Sanjay Patel	ef23f042ce	use a triple instead of a cpu; less buildbot sadness git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229563 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 21:59:54 +00:00
Rafael Espindola	51beb495fc	Add testcases I missed in r229541. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229542 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 20:50:39 +00:00
Sanjay Patel	166769cfb4	make basic block label matching more flexible for less sad buildbots git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229535 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 20:29:31 +00:00
Tom Stellard	ec5b9ab433	R600/SI: Fix asan errors in SIFoldOperands We were trying to fold into implicit uses, which led to out of bounds access of the MCInstrDesc::OpInfo array. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229533 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 20:11:54 +00:00
Sanjay Patel	544843cee1	prevent folding a scalar FP load into a packed logical FP instruction (PR22371) Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. That's what the hardware packed logical FP instructions define: 128-bit memory operands. There are no scalar versions of these instructions...because this is x86. Generating the wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole optimization pass and the load folding tables. We won't completely solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl() to make sure it isn't changing the size of the load. Differential Revision: http://reviews.llvm.org/D7474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229531 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 20:08:21 +00:00
Sanjay Patel	1115b2c27e	Canonicalize splats as build_vectors (PR22283) This is a follow-on patch to: http://reviews.llvm.org/D7093 That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors. This fixes the 2nd test case in PR22283: http://llvm.org/bugs/show_bug.cgi?id=22283 The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code... This improves an existing x86 AVX test and changes codegen in an ARM test. Differential Revision: http://reviews.llvm.org/D7389 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229511 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 16:54:32 +00:00
Tom Stellard	7a7153e5ee	R600/SI: Extend private extload pattern to include zext loads git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229507 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 16:36:00 +00:00
Andrea Di Biagio	16389eee3f	[X86][FastISel] Add missing flag -fast-isel-abort to run lines in test fast-isel-fptrunc-fpext.ll. Flag -fast-isel-abort is required in order to verify that X86FastISel never fails to select FPExt (float-to-double) and FPTrunc (double-to-float). No Functional change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229489 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 12:25:49 +00:00
Elena Demikhovsky	199f58a198	AVX-512: changes in intel_ocl_bi calling conventions - added mask types v8i1 and v16i1 to possible function parameters - enabled passing 512-bit vectors in standard CC - added a test for KNL intel_ocl_bi conventions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229482 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 09:20:12 +00:00
Michael Kuperstein	e275542046	[X86] Combine vector anyext + and into a vector zext Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229480 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 08:22:51 +00:00
Eric Christopher	fb031eee53	Move ABI handling and 64-bitness to the PowerPC target machine. This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229471 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 06:45:15 +00:00
Chandler Carruth	1e357351be	[x86] Teach the unpack lowering to try wider element unpacks. This allows it to match still more places where previously we would have to fall back on floating point shuffles or other more complex lowering strategies. I'm hoping to replace some of the hand-rolled unpack matching with this routine as it gets more and more clever. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229463 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 02:12:24 +00:00
Hal Finkel	4ba3a67430	Specify arch in test/CodeGen/X86/float-conv-elim.ll This test was failing on non-x86 hosts because it specified a cpu of x86_64, but not an architecture. x86_64 is obviously not a valid cpu on all architectures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229460 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 00:11:19 +00:00
Hal Finkel	ba51ae6864	[PowerPC] Support non-direct-sub/superclass VSX copies Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was almost there, but not quite: it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229457 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 23:46:30 +00:00
Cameron McInally	cdddfe0cb3	[AVX512] Make 512b vector floating point rounds legal on AVX512. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229445 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 22:15:42 +00:00
Simon Pilgrim	0638f4e115	[X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches. Differential Revision: http://reviews.llvm.org/D7600 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229439 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 21:50:56 +00:00
Mehdi Amini	2deb1d0b54	SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too Update SPARC tests to match. From: Fiona Glaser <fglaser@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229438 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 21:47:58 +00:00
Craig Topper	4031c08c87	[X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229431 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 20:52:07 +00:00
Andrew Trick	4f7d60c1ea	AArch64: Safely handle the incoming sret call argument. This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229413 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 18:10:47 +00:00
Chandler Carruth	cbe6ecfc81	[x86] Add a generic unpack-targeted lowering technique. This can be used to generically lower blends and is particularly nice because it is available from SSE2 onward. This removes a lot of the remaining domain crossing blends in SSE2 code. I'm hoping to replace some of the "interleaved" lowering hacks with something closer to this which should be more principled. First, this needs to learn how to detect and use other interleavings besides that of the natural type provided. That will be a follow-up patch though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229378 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 12:28:18 +00:00
Chandler Carruth	e62fbca6b7	[x86] Switch this test to use checks generated by my update script. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229377 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 12:23:22 +00:00
Chandler Carruth	29679ccc12	[x86] Add initial basic support for forming blends of v16i8 vectors. This blend instruction is ... really lame. The register usage is insane. As a consequence this is probably only barely better than 2 pshufbs followed by a por, and that mostly because it only has to read from a single memory location. However, this doesn't fix as much as I kind of expected, so more to go. Pretty sure that the ordering and delegation of v16i8 is just really, really bad. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229373 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 10:58:23 +00:00
Chandler Carruth	ab497238cb	[x86] Add some more test cases for i8 vector blends. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229372 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 10:51:49 +00:00
Craig Topper	74b9ad3485	[X86] Add support for lowering shuffles to 256-bit PALIGNR instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229359 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 06:29:06 +00:00
Craig Topper	abdf58f7f9	[X86] Remove some hard tab characters from tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229358 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 06:29:02 +00:00
Chandler Carruth	454c3997b4	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons: 1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here. 2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now. More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229350 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 01:52:02 +00:00
Chandler Carruth	aa35e012a3	[x86] Clean up a few test cases with the update script. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229349 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 01:39:50 +00:00
Simon Pilgrim	f4e056ac2a	Added (still inefficient) shuffle test case for PR21138 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229321 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 18:21:39 +00:00
Simon Pilgrim	afcb895fe1	Added some test cases of missed opportunities to use unpckl/unpckh shuffles git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229313 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 15:07:45 +00:00
Simon Pilgrim	28f299b62d	[X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2 This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets. It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions. Differential Revision: http://reviews.llvm.org/D7596 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229311 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 13:19:52 +00:00
Chandler Carruth	3e93916175	[x86] Add the test case from PR22412, we now get this right even with the new vector shuffle legality. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229310 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:45:05 +00:00
Chandler Carruth	fbde8bffba	[x86] Teach the decomposed shuffle/blend lowering to use an early blend when that will allow it to lower with a single permute instead of multiple permutes. It tries to detect when it will only have to do a single permute in either case to maximize folding of loads and such. This cuts a lot of the avx2 shuffle permute counts in half. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229309 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:42:15 +00:00
Chandler Carruth	72753f87f2	[SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats directly into blends of the splats. These patterns show up even very late in the vector shuffle lowering where we don't have any chance for DAG combining to kick in, and blending is a tremendously simpler operation to model. By coercing the shuffle into a blend we can much more easily match and lower shuffles of splats. Immediately with this change there are significantly more blends being matched in the x86 vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229308 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:18:12 +00:00
Chandler Carruth	52f1b6dbed	[x86] Stop shuffling zero vectors. =] I was somewhat surprised this pattern really came up, but it does. It seems better to just directly handle it than try to special case every place where we end up forming a shuffle that devolves to a shuffle of a zero vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229301 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 10:34:52 +00:00
Chandler Carruth	46d3e580ed	[x86] When splitting 256-bit vectors into 128-bit vectors, don't extract subvectors from buildvectors. That doesn't really make any sense and it breaks all of the down-stream matching of buildvectors to cleverly lower shuffles. With this, we now get the shift-based lowering of 256-bit vector shuffles with AVX1 when we split them into 128-bit vectors. We also do much better on the zero-extension patterns, although there remains quite a bit of room for improvement here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229299 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 10:12:02 +00:00
Chandler Carruth	4d2dfa703e	[x86] Update some tests with the latest version of my script and llc. This mostly adds some shuffle decode comments and cleans up indentation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229296 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 09:26:15 +00:00
Chandler Carruth	62ba2b29d8	[x86] Add a slight variation on some of the other generic shuffle lowerings -- one which decomposes into an initial blend followed by a permute. Particularly on newer chips, blends are handled independently of shuffles and so this is much less bottlenecked on the single port that floating point shuffles are executed with on Intel. I'll be adding this lowering to a bunch of other code paths in subsequent commits to handle still more places where we can effectively leverage blends when they're available in the ISA. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229292 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 08:26:30 +00:00
Chandler Carruth	8e8b9bd42f	[x86] Add a test case for PR22390 which was a dup of PR22377 and fixed by r229285. This is a nice different test case though, so I'd like to have the extra testing of these kinds of patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229286 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 07:05:50 +00:00
Chandler Carruth	5ee516549f	[x86] Fix PR22377, a regression with the new vector shuffle legality test. This was just a matter of the DAG combine for vector shuffles being too aggressive. This is a bit of a grey area, but I think generally if we can re-use intermediate shuffles, we should. Certainly, given the test cases I have available, this seems like the right call. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229285 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 07:01:10 +00:00
Chandler Carruth	9bb943b185	[x86] Switch a collection of tests explicitly to the new vector shuffle legality test (essentially, everything is legal). I'm planning to make this the default shortly, but I'd like to fix a collection of the bugs it exposes first, and this will let me easily test them. It also showcases both the improvements and a few of the regressions triggered by the change. The biggest improvements by far are the significantly reduced shuffling and domain crossing in the combining test case. The biggest regressions are missing some clever blending patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229284 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 06:37:21 +00:00
Chandler Carruth	0294b517ac	[x86] Remove the now-default-on flag for the new vector shuffle lowering strategy from a bunch of tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229283 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 06:20:51 +00:00
Chandler Carruth	da0198de41	[x86] Teach my test updating script about another quirk of the printed asm and port the mmx vector shuffle test to it. Not thrilled with how it handles the stack manipulation logic, but I'm much less bothered by that than I am by updating the test manually. =] If anyone wants to teach the test checks management script about stack adjustment patterns, that'd be cool too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229268 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 00:08:01 +00:00

13381 Commits