llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-08-24 09:29:42 +00:00

Author	SHA1	Message	Date
Ahmed Bougacha	88d2b5812a	[MachineCSE] Clear kill-flag on registers imp-def'd by the CSE'd instruction. Go through implicit defs of CSMI and MI, and clear the kill flags on their uses in all the instructions between CSMI and MI. We might have made some of the kill flags redundant, consider: subs ... %NZCV<imp-def> <- CSMI csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore subs ... %NZCV<imp-def> <- MI, to be eliminated csinc ... %NZCV<imp-use,kill> Since we eliminated MI, and reused a register imp-def'd by CSMI (here %NZCV), that register, if it was killed before MI, should have that kill flag removed, because it's lifetime was extended. Also, add an exhaustive testcase for the motivating example. Reviewed by: Juergen Ributzka <juergen@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223133 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-02 18:09:51 +00:00
Tim Northover	b588e02c07	AArch64: make register block rules apply to vector types too. The blocking code originated in ARM, which is more aggressive about casting types to a canonical representative before doing anything else, so I missed out most vector HFAs and broke the ABI. This should fix it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223126 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-02 17:15:22 +00:00
Tom Stellard	15e1919a76	R600/SI: Set the ATC bit on all resource descriptors for the HSA runtime git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223125 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-02 17:05:41 +00:00
Charlie Turner	364f2f3fcf	Emit Tag_ABI_FP_denormal correctly in fast-math mode. The default ARM floating-point mode does not support IEEE 754 mode exactly. Of relevance to this patch is that input denormals are flushed to zero. The way in which they're flushed to zero depends on the architecture, * For VFPv2, it is implementation defined as to whether the sign of zero is preserved. * For VFPv3 and above, the sign of zero is always preserved when a denormal is flushed to zero. When FP support has been disabled, the strategy taken by this patch is to assume the software support will mirror the behaviour of the hardware support for the target if it existed. That is, for architectures which can only have VFPv2, it is assumed the software will flush to positive zero. For later architectures it is assumed the software will flush to zero preserving sign. Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223110 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-02 08:22:29 +00:00
Jingyue Wu	b043278834	[NVPTX] Do not emit .weak symbols for NVPTX Summary: ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX. Test Plan: weak-linkage.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6455 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223077 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 21:16:17 +00:00
Reid Kleckner	03c735b42c	Parse 'ghccc' in .ll files as the GHC convention (cc 10) Previously we just used "cc 10" in the .ll files, but that isn't very human readable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223076 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 21:04:44 +00:00
Ahmed Bougacha	14fe2e6948	[AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR". r208210 introduced an optimization that improves the vector select codegen by doing the setcc on vectors directly. This is a problem they the setcc operands are i1s, because the optimization would create vectors of i1, which aren't legal. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6308 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223075 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 20:59:00 +00:00
Ahmed Bougacha	217a4a87ce	[AArch64] Fix v2i8->i16 bitcast legalization. r213378 improved f16 bitcasts, so that they go directly through subregs, instead of through the stack. That code now causes an assertion failure for bitcasts from other 16-bits types (most importantly v2i8). Correct that by doing the custom lowering for i16 bitcasts only when the input is an f16. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6307 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223074 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 20:52:32 +00:00
Ahmed Bougacha	eb1f2e8eb0	[MachineVerifier] Accept a MBB with a single landing pad successor. The MachineVerifier used to check that there was always exactly one unconditional branch to a non-landingpad (normal) successor. If that normal successor to an invoke BB is unreachable, it seems reasonable to only have one successor, the landing pad. On targets other than AArch64 (and on AArch64 with a different testcase), the branch folder turns the branch to the landing pad into a fallthrough. The MachineVerifier, which relies on AnalyzeBranch, is unable to check the condition, and doesn't complain. However, it does in this specific testcase, where the branch to the landing pad remained. Make the MachineVerifier accept it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223059 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 18:43:53 +00:00
Tim Northover	f7f88095a3	ARM: lower tail calls correctly when using GHC calling convention. Patch by Ben Gamari. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223055 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 17:46:39 +00:00
Hans Wennborg	c9605b6067	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223054 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 17:36:43 +00:00
Hans Wennborg	e6f4d335d7	SelectionDAG switch lowering: Replace unreachable default with most popular case. This can significantly reduce the size of the switch, allowing for more efficient lowering. I also worked with the idea of exploiting unreachable defaults by omitting the range check for jump tables, but always ended up with a non-neglible binary size increase. It might be worth looking into some more. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223049 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 17:08:32 +00:00
Jay Foad	8b9cea42db	[PowerPC] Fix unwind info with dynamic stack realignment Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a function where the stack had been dynamically realigned. This clearly doesn't work because the offset from SP to CFA is not a constant. Fix it by defining CFA as BP instead. This was causing the AddressSanitizer null_deref test to fail 50% of the time, depending on whether SP happened to be 32-byte aligned on entry to a particular function or not. Reviewers: willschm, uweigand, hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6410 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222996 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 09:42:32 +00:00
Akira Hatanaka	ca780b4578	[stack protector] Set edge weights for newly created basic blocks. This commit fixes a bug in stack protector pass where edge weights were not set when new basic blocks were added to lists of successor basic blocks. Differential Revision: http://reviews.llvm.org/D5766 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222987 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-01 04:27:03 +00:00
Hans Wennborg	5c6c5e23bd	Switch lowering: Fix broken 'Figure out which block is next' code This doesn't seem to have worked in a long time, but other optimizations would clean it up. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222961 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-29 21:17:05 +00:00
Matt Arsenault	6e5862cb8c	R600/SI: Fix assertion on sign extend of 3 vectors This was trying to create an MVT with 3x vectors which created an invalid EVT git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222942 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-28 22:51:38 +00:00
Duncan P. N. Exon Smith	54786a0936	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-28 21:29:14 +00:00
Sanjay Patel	c5992119fc	Enable FeatureFastUAMem for btver2 Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets. Replace the existing supposed small memcpy test with an actual test of a small memcpy. The previous test wasn't using FileCheck either. This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6360 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222925 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-28 18:40:18 +00:00
Tim Northover	d7a4f74f15	AArch64: treat [N x Ty] as a block during procedure calls. The AAPCS treats small structs and homogeneous floating (or vector) aggregates specially, and guarantees they either get passed as a contiguous block of registers, or prevent any future use of those registers and get passed on the stack. This concept can fit quite neatly into LLVM's own type system, mapping an HFA to [N x float] and so on, and small structs to [N x i64]. Doing so allows front-ends to emit AAPCS compliant code without having to duplicate the register counting logic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222903 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-27 21:02:42 +00:00
Charlie Turner	72ba1af89c	Stop uppercasing build attribute data. The string data for string-valued build attributes were being unconditionally uppercased. There is no mention in the ARM ABI addenda about case conventions, so it's technically implementation defined as to whether the data are capitialised in some way or not. However, there are good reasons not to captialise the data. * It's less work. * Some vendors may legitimately have case-sensitive checks for these attributes which would fail on LLVM generated object files. * There could be locale issues with uppercasing. The original reasons for uppercasing appear to have stemmed from an old codesourcery toolchain behaviour, see http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133 This patch makes the object file emitted no longer captialise string data, it encodes as seen in the assembly source. Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222882 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-27 12:13:56 +00:00
Will Newton	87a2f3751c	Update AArch64 ELF relocations to ABI 1.0 This mostly entails adding relocations, however there are a couple of changes to existing relocations: 1. R_AARCH64_NONE is defined to be zero rather than 256 R_AARCH64_NONE has been defined to be zero for a long time elsewhere e.g. binutils and glibc since the submission of the AArch64 port in 2012 so this is required for compatibility. 2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21 I don't think there is any way for relocation names to leak out of LLVM so this should not break anything. Tested with check-all with no regressions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222821 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-26 10:49:18 +00:00
Elena Demikhovsky	10c8f38047	AVX-512: Scalar ERI intrinsics including SAE mode and memory operand. Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future. The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect. I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction. http://reviews.llvm.org/D6378 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222820 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-26 10:46:49 +00:00
Simon Pilgrim	7f6cee9626	[X86][SSE] Improvements to byte shift shuffle matching Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it. Differential Revision: http://reviews.llvm.org/D6409 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222796 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-25 22:34:59 +00:00
Cameron McInally	9f4bb0420d	[AVX512] Add 512b integer shift by variable intrinsics and patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222786 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-25 20:41:51 +00:00
Zoran Jovanovic	137c475805	[mips][micromips] Use call instructions with short delay slots Differential Revision: http://reviews.llvm.org/D6338 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222752 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-25 10:50:00 +00:00
Juergen Ributzka	a4ebd338c4	[FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching. The pattern matching failed to recognize all instances of "-1", because when comparing against "-1" we didn't use an APInt of the same bitwidth. This commit fixes this and also adds inverse versions of the conditon to catch more cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222722 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-25 04:16:15 +00:00
Hal Finkel	5d6f185653	[PowerPC] Implement combineRepeatedFPDivisors This does not matter on newer cores (where we can use reciprocal estimates in fast-math mode anyway), but for older cores this allows us to generate better fast-math code where we have multiple FDIVs with a common divisor. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222710 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-24 23:45:21 +00:00
Andrea Di Biagio	a1e1f01699	[X86] Improved target specific combine on VSELECT dag nodes. This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1. On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd. Also, removed a target specific combine that performed a premature lowering of VSELECT nodes to target specific MOVSS/MOVSD nodes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222647 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-24 12:23:15 +00:00
Michael Kuperstein	d539147834	[X86] Fixes bug in build_vector v4x32 lowering r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where: 1. A single extracted element is used twice. 2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask. This caused a crash, since the source value for the insertps ends-up uninitialized. Differential Revision: http://reviews.llvm.org/D6377 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222635 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 13:09:06 +00:00
Elena Demikhovsky	ae1ae2c3a1	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 08:07:43 +00:00
Matt Arsenault	4f5aa5994e	R600: Fix extloads of i1 on R600/Evergreen git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222631 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 02:57:54 +00:00
Matt Arsenault	2be9044ffc	R600/SI: Add additional tests for i1 loads git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222629 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 02:57:50 +00:00
Matt Arsenault	5cd4913c8f	R600/SI: Fix broken check lines and modernize prefixes Use -LABEL and remove -CHECK git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222628 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 02:57:49 +00:00
Matt Arsenault	023311333a	R600/SI: Fix missing -verify-machineinstrs on a test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222627 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 02:57:47 +00:00
Chandler Carruth	06a07dadb9	[x86] Add some tests for a common unpack pattern of vector shuffle that has a remarkably unique and efficient lowering. While we get this some of the time already, we miss a few cases and there wasn't a principled reason we got it. We should at least test this. v8 already has tests for this pattern. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222607 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-22 05:44:43 +00:00
Tom Stellard	739dfb1a0e	R600/SI: Add a failing test case for offset order in ds_read2 instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222585 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 22:31:47 +00:00
Tom Stellard	573630a020	R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction This s_mov_b32 will write to a virtual register from the M0Reg class and all the ds instructions now take an extra M0Reg explicit argument. This change is necessary to prevent issues with the scheduler mixing together instructions that expect different values in the m0 registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222583 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 22:31:44 +00:00
Tom Stellard	edcd88ce1a	R600/SI: Add SIFoldOperands pass This pass attempts to fold the source operands of mov and copy instructions into their uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222581 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 22:06:37 +00:00
Jozef Kolek	d9accc1e5f	[mips][microMIPS] This patch implements functionality in MIPS delay slot filler such as if delay slot filler have to put NOP instruction into the delay slot of microMIPS BEQ or BNE instruction which uses the register $0, then instead of emitting NOP this instruction is replaced by the corresponding microMIPS compact branch instruction, i.e. BEQZC or BNEZC. Differential Revision: http://reviews.llvm.org/D3566 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222580 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 22:04:35 +00:00
Tom Stellard	9a85cc1705	R600/SI: Use hex notation for constant in test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222578 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 22:00:13 +00:00
Sanjay Patel	28660d4b2f	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222544 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 17:40:04 +00:00
Chandler Carruth	46c5a97adc	[x86] Restructure the checking patterns for v16 and v32 avx2 vector shuffle lowering to allow much better blend matching. Specifically, with the new structure the code seems clearer to me and we correctly can hit the cases where merging two 128-bit lanes is a clear win and can be shuffled cheaply afterward. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222539 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:53:03 +00:00
Chandler Carruth	0889d65fd5	[x86] Make the previous logic significantly less conservative and get a bunch more improvements. Non-lane-crossing is fine, the key is that lane merging only makes sense for single-input shuffles. Not sure why I got so turned around here. The code all works, I was just using the wrong model for it. This only updates v4 and v8 lowering. The v16 and v32 lowering requires restructuring the entire check sequence. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222537 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:33:24 +00:00
Andrea Di Biagio	607099b697	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222536 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 14:32:06 +00:00
Chandler Carruth	bd357588a1	[x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit lanes. By special casing these we can often either reduce the total number of shuffles significantly or reduce the number of (high latency on Haswell) AVX2 shuffles that potentially cross 128-bit lanes. Even when these don't actually cross lanes, they have much higher latency to support that. Doing two of them and a blend is worse than doing a single insert across the 128-bit lanes to blend and then doing a single interleaved shuffle. While this seems like a narrow case, it kept cropping up on me and the difference is huge as you can see in many of the test cases. I first hit this trying to perfectly fix the interleaving shuffle patterns used by Halide for AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222533 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 13:56:05 +00:00
Chandler Carruth	a5f4576510	[x86] Remove more windows line endings that slipped into this file... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222528 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 12:33:46 +00:00
Chandler Carruth	d8d3a957d8	[x86] Add a bunch of test cases to 256-bit shuffles that exercise merging 128-bit subvectors and also shuffling all the elements of those subvectors. Currently we generate pretty bad code for many of these, but I'm testing a patch that should dramatically improve this in addition to making the shuffle lowering robust to other changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222525 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 12:17:50 +00:00
Alexey Volkov	d0d0424368	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222521 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 11:19:34 +00:00
Hao Liu	09ad94decb	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222510 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 06:39:58 +00:00
Hal Finkel	361eafaffa	[PPC] Use SeparateConstOffsetFromGEP This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM, there is a store moved out of the inner loop) and a potential speedup on MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it makes some code look cleaner, and synchronizing the backends in this regard seems like a generally good thing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222504 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-21 04:35:51 +00:00

1 2 3 4 5 ...

12132 Commits