llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-02-11 11:34:02 +00:00

Author	SHA1	Message	Date
Bill Wendling	e31c05926b	Unify the lowering of arguments during SjLj prepare. The 'select true, %arg, undef' instruction can be used for both aggregate and non-aggregate arguments. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212967 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 18:21:11 +00:00
Saleem Abdulrasool	5335b49f96	X86: correct 64-bit atomics on 32-bit We would emit a libcall for a 64-bit atomic on x86 after SVN r212119. This was due to the misuse of hasCmpxchg16 to indicate if cmpxchg8b was supported on a 32-bit target. They were added at different times and would result in the border condition being mishandled. This fixes the border case to emit the cmpxchg8b instruction for 64-bit atomic operations on x86 at the cost of restoring a long-standing bug in the codegen. We emit a cmpxchg8b on all x86 targets even where the CPU does not support this instruction (pre-Pentium CPUs). Although this bug should be fixed, this was present prior to SVN r212119 and this change, so this is not really introducing a regression. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212956 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 16:28:13 +00:00
Tim Northover	9e251d6d41	X86: remove temporary atomicrmw used during lowering. We construct a temporary "atomicrmw xchg" instruction when lowering atomic stores for widths that aren't supported natively. This isn't on the top-level worklist though, so it won't be removed automatically and we have to do it ourselves once that itself has been lowered. Thanks Saleem for pointing this out! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212948 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 15:31:13 +00:00
Daniel Sanders	c4ce78e261	[mips] For the FP64A ABI, odd-numbered double-precision moves must not use mtc1/mfc1. Summary: This is because the FP64A the hardware will redirect 32-bit reads/writes from/to odd-numbered registers to the upper 32-bits of the corresponding even register. In effect, simulating FR=0 mode when FR=0 mode is not available. Unfortunately, we have to make the decision to avoid mfc1/mtc1 before register allocation so we currently do this for even registers too. FPXX has a similar requirement on 32-bit architectures that lack mfhc1/mthc1 so this patch also handles the affected moves from the FPU for FPXX too. Moves to the FPU were supported by an earlier commit. Differential Revision: http://reviews.llvm.org/D4484 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212938 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 13:08:14 +00:00
Daniel Sanders	543f70b040	[mips] Use MFHC1 when it is available (MIPS32r2 and later) for both FP32 and FP64 moves Summary: This is similar to r210771 which did the same thing for MTHC1. Also corrected MTHC1_D32 and MTHC1_D64 which used AFGR64 and FGR64 on the wrong definitions. Differential Revision: http://reviews.llvm.org/D4483 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212936 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 12:41:31 +00:00
Tim Northover	26012cec89	AArch64: remove unnecessary pseudo-instruction. Sufficiently twisted use of TableGen lets us write patterns directly for f16 (as an i16 promoted to i32) -> f32 conversion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212933 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 11:16:02 +00:00
Sasa Stankovic	fce699d88a	[mips] Expand BuildPairF64 to a spill and reload when the O32 FPXX ABI is enabled and mthc1 and dmtc1 are not available (e.g. on MIPS32r1) This prevents the upper 32-bits of a double precision value from being moved to the FPU with mtc1 to an odd-numbered FPU register. This is necessary to ensure that the code generated executes correctly regardless of the current FPU mode. MIPS32r2 and above continues to use mtc1/mthc1, while MIPS-IV and above continue to use dmtc1. Differential Revision: http://reviews.llvm.org/D4465 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212930 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 09:40:29 +00:00
Bill Wendling	5388e6f9b3	Support lowering of empty aggregates. This crash was pretty common while compiling Rust for iOS (armv7). Reason - SjLj preparation step was lowering aggregate arguments as ExtractValue + InsertValue. ExtractValue has assertion which checks that there is some data in value, which is not true in case of empty (no fields) structures. Rust uses them quite extensively so this patch uses a 'select true, %val, undef' instruction to lower the argument. Patch by Valerii Hiora. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212922 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-14 06:22:36 +00:00
Andrea Di Biagio	124ce3ab36	[DAGCombiner] Fix a crash caused by a missing check for legal type when trying to fold shuffles. Verify that DAGCombiner does not crash when trying to fold a pair of shuffles according to rule (added at r212539): (shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2) The DAGCombiner avoids folding shuffles if the resulting shuffle dag node is not legal for the target. That means, the resulting shuffle must have legal type and legal mask. Before, the DAGCombiner only called method 'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according to the above-mentioned rule. However, this caused a crash in the x86 backend since method 'isShuffleMaskLegal' always expects to be called on a legal vector type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212915 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-13 21:02:14 +00:00
Matt Arsenault	98c2299532	R600: Run more tests with promote alloca disabled. Re-run tests changed in r211110 to test both paths. Also fix broken check line. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212895 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-13 02:46:17 +00:00
Matt Arsenault	ff71f3dac1	R600: Run private-memory test with and without alloca promote The unpromoted path still needs to be tested since we can't always promote to using LDS. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212894 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-13 02:18:06 +00:00
Saleem Abdulrasool	01c06d7954	AArch64: add support for llvm.aarch64.hint intrinsic This adds a llvm.aarch64.hint intrinsic to mirror the llvm.arm.hint in order to support the various hint intrinsic functions in the ACLE. Add an optional pattern field that permits the subclass to specify the pattern that matches the selection. The intrinsic pattern is set as mayLoad, mayStore, so overload the value for the definition of the hint instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212883 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-12 21:20:49 +00:00
Matt Arsenault	cc6a418600	R600: Add missing tests for some intrinsics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212870 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-12 00:36:19 +00:00
Ulrich Weigand	edb27188b4	[PowerPC] Fix invalid displacement created by LocalStackAlloc This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212832 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-11 17:19:31 +00:00
Marek Olsak	438e1f2ad8	R600/SI: Use i32 vectors for resources and samplers This affects new intrinsics only. What surprises me is that v32i8 still works. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212831 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-11 17:11:52 +00:00
Marek Olsak	5a35fdcafb	R600/SI: add sample and image intrinsics exposing all instruction fields We need the intrinsics with offsets, so why not just add them all. The R128 parameter will also be useful for reducing SGPR usage. GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent", so Mesa will probably translate those to slc, glc, etc. When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212830 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-11 17:11:46 +00:00
Oliver Stannard	cb047f2a74	ARM: Allow __fp16 as a function arg or return type for AArch64 ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212812 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-11 13:33:46 +00:00
Quentin Colombet	52c50f5ec5	[X86] Fix the inversion of low and high bits for the lowering of MUL_LOHI. Also add a few comments. <rdar://problem/17581756> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212808 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-11 12:08:23 +00:00
Jan Vesely	865527a09b	R600: Implement float to long/ulong Use alg. from LegalizeDAG.cpp Move Expand setting to SIISellowering v2: Extend existing tests instead of creating new ones v3: use separate LowerFPTOSINT function v4: use TargetLowering::expandFP_TO_SINT add comment about using FP_TO_SINT for uints Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212773 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 22:40:21 +00:00
Zoran Jovanovic	fc6a659676	[mips] Emit two CFI offset directives per double precision SDC1/LDC1 instead of just one for FR=1 registers Differential Revision: http://reviews.llvm.org/D4310 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212769 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 22:23:30 +00:00
Andrea Di Biagio	4d3b6131c7	Extend the test coverage in combine-vec-shuffle-2.ll adding some negative tests. Add test cases where we don't expect to trigger the combine optimizations introduced at revision 212748. No functional change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212756 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 18:59:41 +00:00
Matt Arsenault	b730c3d28d	Revert "Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine."" Don't try to convert the select condition type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212750 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 18:21:04 +00:00
Andrea Di Biagio	3996e5290a	[DAG] Further improve the logic in DAGCombiner that folds a pair of shuffles into a single shuffle if the resulting mask is legal. This patch teaches the DAGCombiner how to fold shuffles according to the following new rules: 1. shuffle(shuffle(x, y), undef) -> x 2. shuffle(shuffle(x, y), undef) -> y 3. shuffle(shuffle(x, y), undef) -> shuffle(x, undef) 4. shuffle(shuffle(x, y), undef) -> shuffle(y, undef) The backend avoids to combine shuffles according to rules 3. and 4. if the resulting shuffle does not have a legal mask. This is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes during vector legalization. Added test case combine-vec-shuffle-2.ll to verify that we correctly triggers the new rules when combining shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212748 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 18:04:55 +00:00
Akira Hatanaka	60079c18bc	[X86] Mark pseudo instruction TEST8ri_NOEREX as hasSIdeEffects=0. Also, add a case clause in X86InstrInfo::shouldScheduleAdjacent to enable macro-fusion. <rdar://problem/15680770> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212747 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 18:00:53 +00:00
Zoran Jovanovic	0ef47114d3	[mips] Added FPXX modeless calling convention. Differential Revision: http://reviews.llvm.org/D4293 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212726 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 15:36:12 +00:00
Tim Northover	6b0ac2aa02	AArch64: correctly fast-isel i8 & i16 multiplies We were asking for a register for type i8 or i16 which caused an assert. rdar://problem/17620015 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212718 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 14:18:46 +00:00
Daniel Sanders	24a071b8c5	[mips] Add support for -modd-spreg/-mno-odd-spreg Summary: When -mno-odd-spreg is in effect, 32-bit floating point values are not permitted in odd FPU registers. The option also prohibits 32-bit and 64-bit floating point comparison results from being written to odd registers. This option has three purposes: * It allows support for certain MIPS implementations such as loongson-3a that do not allow the use of odd registers for single precision arithmetic. * When using -mfpxx, -mno-odd-spreg is the default and this allows us to statically check that code is compliant with the O32 FPXX ABI since mtc1/mfc1 instructions to/from odd registers are guaranteed not to appear for any reason. Once this has been established, the user can then re-enable -modd-spreg to regain the use of all 32 single-precision registers. * When using -mfp64 and -mno-odd-spreg together, an O32 extension named O32 FP64A is used as the ABI. This is intended to provide almost all functionality of an FR=1 processor but can also be executed on a FR=0 core with the assistance of a hardware compatibility mode which emulates FR=0 behaviour on an FR=1 processor. * Added '.module oddspreg' and '.module nooddspreg' each of which update the .MIPS.abiflags section appropriately * Moved setFpABI() call inside emitDirectiveModuleFP() so that the caller doesn't have to remember to do it. * MipsABIFlags now calculates the flags1 and flags2 member on demand rather than trying to maintain them in the same format they will be emitted in. There is one portion of the -mfp64 and -mno-odd-spreg combination that is not implemented yet. Moves to/from odd-numbered double-precision registers must not use mtc1. I will fix this in a follow-up. Differential Revision: http://reviews.llvm.org/D4383 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212717 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 13:38:23 +00:00
Chandler Carruth	cdbdfa28d1	[x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogous to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86. I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the the shuffle legalization is scalarizing. Funny thing, I've been working on that. Some initial experiments with this and SSE2 scenarios is showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212714 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 12:32:32 +00:00
NAKAMURA Takumi	e93de7fb73	llvm/test/CodeGen/X86/shift-parts.ll: FileCheck-ize. (from r212640) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212709 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 11:37:39 +00:00
NAKAMURA Takumi	5290734b8f	Revert r212640, "Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine." This caused miscompilation on, at least, x86-64. SExt(i1 cond) confused other optimizations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212708 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 11:37:28 +00:00
Chandler Carruth	a4e7f05a28	[x86] Add another combine that is particularly useful for the new vector shuffle lowering: match shuffle patterns equivalent to an unpcklwd or unpckhwd instruction. This allows us to use generic lowering code for v8i16 shuffles and match the unpack pattern late. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212705 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 11:09:29 +00:00
Daniel Sanders	b0b3161567	Make it possible for ints/floats to return different values from getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212697 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 10:18:12 +00:00
Chandler Carruth	977aab501d	[x86] Expand the target DAG combining for PSHUFD nodes to be able to combine into half-shuffles through unpack instructions that expand the half to a whole vector without messing with the dword lanes. This fixes some redundant instructions in splat-like lowerings for v16i8, which are now getting to be really nice. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212695 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 09:57:36 +00:00
Chandler Carruth	dc90a3ab8f	[x86] Tweak the v16i8 single input special case lowering for shuffles that splat i8s into i16s. Previously, we would try much too hard to arrange a sequence of i8s in one half of the input such that we could unpack them into i16s and shuffle those into place. This isn't always going to be a cheaper i8 shuffle than our other strategies. The case where it is always going to be cheaper is when we can arrange all the necessary inputs into one half using just i16 shuffles. It happens that viewing the problem this way also makes it much easier to produce an efficient set of shuffles to move the inputs into one half and then unpack them. With this, our splat code gets one step closer to being not terrible with the new experimental lowering strategy. It also exposes two combines missing which I will add next. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212692 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 09:16:40 +00:00
Chandler Carruth	95b14b00da	[x86] Initial improvements to the new shuffle lowering for v16i8 shuffles specifically for cases where a small subset of the elements in the input vector are actually used. This is specifically targetted at improving the shuffles generated for trunc operations, but also helps out splat-like operations. There is still some really low-hanging fruit here that I want to address but this is a huge step in the right direction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212680 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 04:34:06 +00:00
Hao Liu	a3c15c19b8	[AArch64]Fix an assertion failure in DAG Combiner about concating 2 build_vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212677 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 03:41:50 +00:00
Matt Arsenault	425ef825a6	R600/SI: Add support for llvm.convert.{to\|from}.fp16 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212676 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-10 03:22:20 +00:00
David Blaikie	eff0c67b6a	Recommit r212203: Don't try to construct debug LexicalScopes hierarchy for functions that do not have top level debug information. Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped provide a reduced reproduction, though the failure wasn't too hard to guess, and even easier with the example to confirm. The assertion that the subprogram metadata associated with an llvm::Function matches the scope data referenced by the DbgLocs on the instructions in that function is not valid under LTO. In LTO, a C++ inline function might exist in multiple CUs and the subprogram metadata nodes will refer to the same llvm::Function. In this case, depending on the order of the CUs, the first intance of the subprogram metadata may not be the one referenced by the instructions in that function and the assertion will fail. A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the assertion removed and a comment added to explain this situation. Original commit message: If a function isn't actually in a CU's subprogram list in the debug info metadata, ignore all the DebugLocs and don't try to build scopes, track variables, etc. While this is possibly a minor optimization, it's also a correctness fix for an incoming patch that will add assertions to LexicalScopes and the debug info verifier to ensure that all scope chains lead to debug info for the current function. Fix up a few test cases that had broken/incomplete debug info that could violate this constraint. Add a test case where this occurs by design (inlining a debug-info-having function in an attribute nodebug function - we want this to work because /if/ the nodebug function is then inlined into a debug-info-having function, it should be fine (and will work fine - we just stitch the scopes up as usual), but should the inlining not happen we need to not assert fail either). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212649 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 21:02:41 +00:00
Matt Arsenault	3e8ed89484	Add trunc (select c, a, b) -> select c (trunc a), (trunc b) combine. Do this if the truncate is free and the select is legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212640 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 19:12:07 +00:00
Jim Grosbach	a3edd6a038	AArch64: Better codegen for storing to __fp16. Storing will generally be immediately preceded by rounding from an f32 or f64, so make sure to match those patterns directly to convert into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-from-f64 path which was first converting to f32 and then to f16 from there. rdar://17594379 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212638 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 18:55:52 +00:00
Chandler Carruth	4c27c85cde	[x86] Fix a bug in my new zext-vector-inreg DAG trickery where we were not widening the input type to the node sufficiently to let the ext take place in a register. This would in turn result in a mysterious bitcast assertion failure downstream. First change here is to add back the helpful assert I had in an earlier version of the code to catch this immediately. Next change is to add support to the type legalization to detect when we have widened the operand either too little or too much (for whatever reason) and find a size-matched legal vector type to convert it to first. This can also fail so we get a new fallback path, but that seems OK. With this, we no longer crash on vec_cast2.ll when using widening. I've also added the CHECK lines for the zero-extend cases here. We still need to support sign-extend and trunc (or something) to get plausible code for the other two thirds of this test which is one of the regression tests that showed the most scalarization when widening was force-enabled. Slowly closing in on widening being a viable legalization strategy without it resorting to scalarization at every turn. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212614 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 12:36:54 +00:00
Benjamin Kramer	4afbd3e941	X86: When lowering v8i32 himuls use the correct shuffle masks for AVX2. Turns out my trick of using the same masks for SSE4.1 and AVX2 didn't work out as we have to blend two vectors. While there remove unecessary cross-lane moves from the shuffles so the backend can lower it to palignr instead of vperm. Fixes PR20118, a miscompilation of vector sdiv by constant on AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212611 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 11:12:39 +00:00
Chandler Carruth	ce184e95f9	[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212610 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:58:18 +00:00
Daniel Sanders	1285b3130b	[mips][mips64r6] Correct select patterns that have the condition or true/false values backwards Summary: This bug caused SingleSource/Regression/C/uint64_to_float and SingleSource/UnitTests/2002-05-02-CastTest3 to fail (among others). Differential Revision: http://reviews.llvm.org/D4388 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212608 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:47:26 +00:00
Daniel Sanders	388704618e	[mips][mips64r6] Correct cond names in the cmp.cond.[ds] instructions Summary: It seems we accidentally read the wrong column of the table MIPS64r6 spec and used the names for c.cond.fmt instead of cmp.cond.fmt. Differential Revision: http://reviews.llvm.org/D4387 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212607 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:40:20 +00:00
Daniel Sanders	f08bcb9b97	[mips][mips64r6] Use JALR for indirect branches instead of JR (which is not available on MIPS32r6/MIPS64r6) Summary: This completes the change to use JALR instead of JR on MIPS32r6/MIPS64r6. Reviewers: jkolek, vmedic, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4269 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212605 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:21:59 +00:00
Daniel Sanders	7c2ef822f7	[mips][mips64r6] Use JALR for returns instead of JR (which is not available on MIPS32r6/MIPS64r6) Summary: RET, and RET_MM have been replaced by a pseudo named PseudoReturn. In addition a version with a 64-bit GPR named PseudoReturn64 has been added. Instruction selection for a return matches RetRA, which is expanded post register allocation to PseudoReturn/PseudoReturn64. During MipsAsmPrinter, this PseudoReturn/PseudoReturn64 are emitted as: - (JALR64 $zero, $rs) on MIPS64r6 - (JALR $zero, $rs) on MIPS32r6 - (JR_MM $rs) on microMIPS - (JR $rs) otherwise On MIPS32r6/MIPS64r6, 'jr $rs' is an alias for 'jalr $zero, $rs'. To aid development and review (specifically, to ensure all cases of jr are updated), these aliases are temporarily named 'r6.jr' instead of 'jr'. A follow up patch will change them back to the correct mnemonic. Added (JALR $zero, $rs) to MipsNaClELFStreamer's definition of an indirect jump, and removed it from its definition of a call. Note: I haven't accounted for MIPS64 in MipsNaClELFStreamer since it's doesn't appear to account for any MIPS64-specifics. The return instruction created as part of eh_return expansion is now expanded using expandRetRA() so we use the right return instruction on MIPS32r6/MIPS64r6 ('jalr $zero, $rs'). Also, fixed a misuse of isABI_N64() to detect 64-bit wide registers in expandEhReturn(). Reviewers: jkolek, vmedic, mseaborn, zoran.jovanovic, dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4268 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212604 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:16:07 +00:00
Chandler Carruth	98eac0a244	[x86] Re-apply a variant of the x86 side of r212324 now that the rest has settled without incident, removing the x86-specific and overly strict 'isVectorSplat' routine in favor of generic and more powerful splat detection. The primary motivation and result of this is that the x86 backend can now see through splats which contain undef elements. This is essential if we are using a widening form of legalization and I've updated a test case to also run in that mode as before this change the generated code for the test case was completely scalarized. This version of the patch much more carefully handles the undef lanes. - We aren't overly conservative about them in the shift lowering (where we will never use the splat itself). - One place where the splat would have been re-used by the existing code now explicitly constructs a new constant splat that will be safe. - The broadcast lowering is much more reasonable with undefs by doing a correct check of whether the splat is the only user of a loaded value, checking that the splat actually crosses multiple lanes before using a broadcast, and handling broadcasts of non-constant splats. As a consequence of the last bullet, the weird usage of vpshufd instead of vbroadcast is gone, and we actually can lower an AVX splat with vbroadcastss where before we emitted a really strange pattern of a vector load and a manual splat across the vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212602 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-09 10:06:58 +00:00
Jim Grosbach	05bb7c5045	AArch64: Better codegen for loading from __fp16. Loading will generally extend to an f32 or an 64, so make sure to match those patterns directly to load into the FPR16 register class directly rather than going through the integer GPRs. This also eliminates an extra step in the convert-to-f64 path which was first converting to f32 and then to f64 from there. rdar://17594379 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212573 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-08 23:28:48 +00:00
Andrea Di Biagio	b8245a4599	[DAG] Teach how to combine a pair of shuffles into a single shuffle if the resulting mask is legal. This patch teaches how to fold a shuffle according to rule: shuffle (shuffle (x, undef, M0), undef, M1) -> shuffle(x, undef, M2) We do this only if the resulting mask M2 is legal; this is to avoid introducing illegal shuffles that are potentially expanded into a sub-optimal sequence of target specific dag nodes. This patch has the advantage of being target independent, since it works on ISD nodes. Therefore, all targets (not only x86) can take advantage of this rule. The idea behind this patch is that most shuffle pairs can be safely combined before we run the legalizer on vector operations. This allows us to combine/simplify dag nodes earlier in the process and not only immediately before instruction selection stage. That said. This patch is not meant to replace any existing target specific combine rules; backends might still introduce new shuffles during legalization stage. Also, this rule is very simple and avoids to aggressively optimize shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212539 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-08 15:22:29 +00:00

1 2 3 4 5 ...

11030 Commits