llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-08-10 18:26:02 +00:00

Author	SHA1	Message	Date
Quentin Colombet	65edced76b	[ARM] Do not perform a tail call when the caller returns several values. The fix is slightly different then x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values). <rdar://problem/18352998> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218076 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 21:17:50 +00:00
Robin Morisset	5052940c27	Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, commited and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm me whether initialization lists for plain C arrays are supported by every toolchain used to build llvm ? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...; Below is the description of the original patch: ``` I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. ``` Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D5386 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218066 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 18:56:04 +00:00
Matt Arsenault	e08e52528b	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218057 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 15:52:26 +00:00
Chandler Carruth	72f0d9515e	[x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate. There is no purpose in using it for single-input shuffles as pshufd is just as fast and doesn't tie the two operands. This removes a substantial amount of wrong-domain blend operations in SSSE3 mode. It also completes the usage of PALIGNR for integer shuffles and addresses one of the test cases Quentin hit with the new vector shuffle lowering. There is still the question of whether and when to use this for floating point shuffles. It is faster than shufps or shufpd but in the integer domain. I don't yet really have a good heuristic here for when to use this instruction for floating point vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 09:00:25 +00:00
Chandler Carruth	088aa097d5	[x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new vector shuffle lowering. This will be needed for up-coming palignr tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218037 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 08:33:04 +00:00
Juergen Ributzka	f789dac2dd	Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ." Reverting it until I have time to investigate a regression. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218035 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 08:07:40 +00:00
Juergen Ributzka	ef48b51126	Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies. When folding the intrinsic flag into the branch or select we also have to consider the fact if the intrinsic got simplified, because it changes the flag we have to check for. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:26:26 +00:00
Juergen Ributzka	e7fba004ce	[FastISel][AArch64] Simplify XALU multiplies. Simplify {s\|u}mul.with.overflow to {s\|u}add.with.overflow when possible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218033 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:04:54 +00:00
Juergen Ributzka	4b6f00ad18	[FastISel][AArch64] Followup commit for 218031 to handle negative offsets too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218032 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:04:49 +00:00
Juergen Ributzka	22b557d942	[FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address. Small optimization in 'simplifyAddress'. When the offset cannot be encoded in the load/store instruction, then we need to materialize the address manually. The add instruction can encode a wider range of immediates than the load/store instructions. This change tries to fold the offset into the add instruction first before materializing the offset in a register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 05:40:47 +00:00
Juergen Ributzka	ffbd4879eb	[FastISel][AArch64] Fold 'AND' instruction during the address computation. The 'AND' instruction could be used to mask out the lower 32 bits of a register. If this is done inside an address computation we might be able to fold the instruction into the memory instruction itself. and x1, x1, #0xffffffff ---> ldrb x0, [x0, w1, uxtw] ldrb x0, [x0, x1] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 05:40:41 +00:00
Chandler Carruth	49ab1a424d	[x86] Add an SSSE3 run to the v4 shuffle test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218028 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 04:38:32 +00:00
Chandler Carruth	3ff76847ba	[x86] Initial step of teaching the new vector shuffle lowering about PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where it is completely unmatched. It also introduces the logic for detecting rotation shuffle masks even in the presence of single input or blend masks and arbitrarily undef lanes. I've added fairly comprehensive tests for the matching logic in v8i16 because the tests at that size are much easier to write and manage. I've not checked the SSE2 code generated for these tests because the code is horrible. It is absolute madness. Testing it will just make the test brittle without giving any interesting improvements in the correctness confidence. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 04:11:29 +00:00
Juergen Ributzka	710fc316fb	[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218010 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 02:44:13 +00:00
Samuel Antao	6693d0de3e	Fix FastISel bug in boolean returns for PowerPC. For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 23:25:06 +00:00
Juergen Ributzka	7516444a26	[FastISel][AArch64] Custom lower sdiv by power-of-2. Emit an optimized instruction sequence for sdiv by power-of-2 depending on the exact flag. This fixes rdar://problem/18224511. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217986 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 21:55:55 +00:00
Juergen Ributzka	580875d39d	[FastISel][AArch64] Simplify mul to shift when possible. This is related to rdar://problem/18369687. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217980 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 20:35:41 +00:00
Alexey Samsonov	dc4eb3d6dc	Exclude known and bugzilled failures from UBSan bootstrap git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217979 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 20:17:52 +00:00
Juergen Ributzka	46d6fd2908	[FastISel][AArch64] Fold mul into add/sub and logical operations. Try to fold the multiply into the add/sub or logical operations (when possible). This is related to rdar://problem/18369687. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217978 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 19:51:38 +00:00
Juergen Ributzka	5461af97bc	[FastISel][AArch64] Fold mul into the address computation of memory operations. Teach 'computeAddress' to also fold multiplies into the address computation (when possible). This fixes rdar://problem/18369443. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217977 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 19:19:31 +00:00
Robin Morisset	e2ff4e489b	Revert "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" It is breaking the build on the buildbots but works fine on my machine, I revert while trying to understand what happens (it appears to depend on the compiler used to build, I probably used a C++11 feature that is not perfectly supported by some of the buildbots). This reverts commit feb3176c4d006f99af8b40373abd56215a90e7cc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217973 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 18:09:13 +00:00
Juergen Ributzka	07c9ae576c	[FastISel][AArch64] Fold compare with zero and branch into CBZ and CBNZ. This takes advanatage of the CBZ and CBNZ instruction to further optimize the common null check pattern into a single instruction. This is related to rdar://problem/18358882. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217972 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 18:05:34 +00:00
Juergen Ributzka	17e0ee5078	[FastISel][AArch64] Improve branch selection to support all FP conditions. This adds the last two missing floating-point condition codes (FCMP_UEQ and FCMP_ONE) also to the branch selection. In these two cases an additonal branch instruction is required. This also adds unit tests to checks all the different condition codes. This is related o rdar://problem/18358882. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217966 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 17:46:47 +00:00
Robin Morisset	30486fa3de	[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors Summary: I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5304 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217965 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 17:41:16 +00:00
Matt Arsenault	507636288f	R600/SI: Change formatting of printed FP immediates Only 1 decimal place should be printed for inline immediates. Other constants should be hex constants. Does not include f64 tests because folding those inline immediates currently does not work. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217964 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 17:32:13 +00:00
Pavel Chupin	780f7e2168	[x32] Fix function indirect calls Summary: Zero-extend register to 64-bit for callq/jmpq. Test Plan: 3 tests added Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5355 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217942 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 07:09:23 +00:00
Quentin Colombet	9fe79b48b8	[CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting instructions when truncate, sext, or zext were created. Fix that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217926 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 22:36:07 +00:00
Juergen Ributzka	c9bc145e31	[FastISel][AArch64] Add vector support to argument lowering. Lower the first 8 vector arguments too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217850 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:25:30 +00:00
Chandler Carruth	bad2c13aae	[x86] As a follow-up to r217819, don't check for VSELECT legality now that we don't use VSELECT and directly emit an addsub synthetic node. Also remove a stale comment referencing VSELECT. The test case is updated to use 'core2' which only has SSE3, not SSE4.1, and it still passes. Previously it would not because we lacked sufficient blend support to legalize the VSELECT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:24:42 +00:00
Chandler Carruth	cba9d1273a	[x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and ADDSUBPD nodes out of blends of adds and subs. This allows us to actually form these instructions with SSE3 rather than only forming them when we had both SSE3 for the ADDSUB instructions and SSE4.1 for the blend instructions. ;] Kind-of important. I've adjusted the CPU requirements on one of the tests to demonstrate this kicking in nicely for an SSE3 cpu configuration. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217848 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-16 00:15:20 +00:00
Juergen Ributzka	c0f00e90d2	[FastISel][AArch64] Add missing test case for previous commit. This adds the missing test case for the previous commit: Allow handling of vectors during return lowering for little endian machines. Sorry for the noise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217847 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 23:47:57 +00:00
Juergen Ributzka	df445d7af2	[FastISel][AArch64] Lower sin/cos/pow to runtime lib calls. Also lower sin/cos/pow to runtime lib calls. This fixes rdar://problem/18343468. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217839 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 22:33:06 +00:00
Rafael Espindola	d41a46e942	Add back tests for empty function in SPARC and PowerPC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217834 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 22:11:07 +00:00
Juergen Ributzka	323445f706	[FastISel][AArch64] Add lowering support for frem. This lowers frem to a runtime libcall inside fast-isel. The test case also checks the CallLoweringInfo bug that was exposed by this change. This fixes rdar://problem/18342783. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217833 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 22:07:49 +00:00
Juergen Ributzka	86bdc1efbe	[FastISel][AArch64] Improve floating-point compare support. Add support for the last two missing fcmp condition codes: UEQ and ONE. This fixes rdar://problem/18341575. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217823 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 20:47:16 +00:00
Reed Kotler	34ad085eec	Add mips32 r1 to the list of supported targets for Mips fast-isel Summary: Expand list of supported targets for Mips to include mips32 r1. Previously it only include r2. More patches are coming where there is a difference but in the current patches as pushed upstream, r1 and r2 are equivalent. Test Plan: simplestorefp1.ll add new build bots at mips to test this flavor at both -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D5306 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217821 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 20:30:25 +00:00
NAKAMURA Takumi	85deed0525	llvm/test/CodeGen/X86/peephole-fold-movsd.ll: Relax an expression for win32. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217806 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 19:00:31 +00:00
Rafael Espindola	d58cb55353	Add a triple to fix the bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217805 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 18:54:41 +00:00
Rafael Espindola	3f0ce4fa18	Fix a lot of confusion around inserting nops on empty functions. On MachO, and MachO only, we cannot have a truly empty function since that breaks the linker logic for atomizing the section. When we are emitting a frame pointer, the presence of an unreachable will create a cfi instruction pointing past the last instruction. This is perfectly fine. The FDE information encodes the pc range it applies to. If some tool cannot handle this, we should explicitly say which bug we are working around and only work around it when it is actually relevant (not for ELF for example). Given the unreachable we could omit the .cfi_def_cfa_register, but then again, we could also omit the entire function prologue if we wanted to. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217801 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 18:32:58 +00:00
Quentin Colombet	49e423ca30	[CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion introduced in r217629. We were returning the old sext instead of the new zext as the promoted instruction! Thanks Joerg Sonnenberger for the test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217800 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 18:26:58 +00:00
Akira Hatanaka	348e9e7b6d	[X86] Fix a bug in X86's peephole optimization. Peephole optimization was folding MOVSDrm, which is a zero-extending double precision floating point load, into ADDPDrr, which is a SIMD add of two packed double precision floating point values. (before) %vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21 %vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21 (after) %vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20 X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this from happening. However the check wasn't being conducted for loads from stack objects. This commit factors out the logic into a new function and uses it for checking loads from stack slots are not zero-extending loads. rdar://problem/18236850 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217799 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 18:23:52 +00:00
Matt Arsenault	f1b16047b7	R600/SI: Prefer selecting more e64 instruction forms. Add some more tests to make sure better operand choices are still made. Leave some cases that seem to have no reason to ever be e64 alone. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217789 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 17:15:02 +00:00
Matt Arsenault	6fc71a0cfc	R600/SI: Make sure double vector fmul is tested git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217787 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 17:04:54 +00:00
Matt Arsenault	e626ee51b6	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217785 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 16:48:01 +00:00
Matt Arsenault	d189a0407d	R600/SI: Add preliminary support for flat address space git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217777 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 15:41:53 +00:00
Chandler Carruth	c5371836a5	[x86] Begin emitting PBLENDW instructions for integer blend operations when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 12:40:54 +00:00
Chandler Carruth	9277ad2d36	[x86] Add an explicit SSE3 run to this test and flesh out a bunch of missing specific checks. While there is a lot of redundancy here where all-but-one mode use the same code generation, I'd rather have each variant spelled out and checked so that readers aren't misled by an omission in the test suite. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217765 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 11:40:20 +00:00
Chandler Carruth	2fdec16fbe	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217761 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 11:26:25 +00:00
Chandler Carruth	08780d4c1d	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and its a shame to not use them. =] They are fast, and can hand loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217758 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 11:15:23 +00:00
Chandler Carruth	04402a6c13	[x86] Undo a flawed transform I added to form UNPCK instructions when AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217755 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-15 10:35:41 +00:00

1 2 3 4 5 ...

11595 Commits