llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-07-22 09:29:31 +00:00

Author	SHA1	Message	Date
Stephen Lin	e54885af9b	AArch64/PowerPC/SystemZ/X86: This patch fixes the interface, usage, and all in-tree implementations of TargetLoweringBase::isFMAFasterThanMulAndAdd in order to resolve the following issues with fmuladd (i.e. optional FMA) intrinsics: 1. On X86(-64) targets, ISD::FMA nodes are formed when lowering fmuladd intrinsics even if the subtarget does not support FMA instructions, leading to laughably bad code generation in some situations. 2. On AArch64 targets, ISD::FMA nodes are formed for operations on fp128, resulting in a call to a software fp128 FMA implementation. 3. On PowerPC targets, FMAs are not generated from fmuladd intrinsics on types like v2f32, v8f32, v4f64, etc., even though they promote, split, scalarize, etc. to types that support hardware FMAs. The function has also been slightly renamed for consistency and to force a merge/build conflict for any out-of-tree target implementing it. To resolve, see comments and fixed in-tree examples. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185956 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-09 18:16:56 +00:00
Chad Rosier	5b3fca50a0	The getRegForInlineAsmConstraint function should only accept MVT value types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184642 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-22 18:37:38 +00:00
Bill Wendling	a5e5ba611f	Don't cache the instruction and register info from the TargetMachine, because the internals of TargetMachine could change. No functionality change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183571 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-07 21:00:34 +00:00
Andrew Trick	ac6d9bec67	Track IR ordering of SelectionDAG nodes 2/4. Change SelectionDAG::getXXXNode() interfaces as well as call sites of these functions to pass in SDLoc instead of DebugLoc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182703 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-25 02:42:55 +00:00
Matt Arsenault	225ed7069c	Add LLVMContext argument to getSetCCResultType git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182180 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-18 00:21:46 +00:00
Bill Wendling	13bbe1f52e	Use the target options specified on a function to reset the back-end. During LTO, the target options on functions within the same Module may change. This would necessitate resetting some of the back-end. Do this for X86, because it's a Friday afternoon. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178917 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-05 21:52:40 +00:00
Michael Liao	c26392aa5d	Add support of RDSEED defined in AVX2 extension git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178314 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-28 23:41:26 +00:00
Michael Liao	f8fd883fd3	Add XTEST codegen support git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178083 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-26 22:47:01 +00:00
Michael Liao	a6b20ced76	Fix PR10475 - ISD::SHL/SRL/SRA must have either both scalar or both vector operands but TLI.getShiftAmountTy() so far only return scalar type. As a result, backend logic assuming that breaks. - Rename the original TLI.getShiftAmountTy() to TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to return target-specificed scalar type or the same vector type as the 1st operand. - Fix most TICG logic assuming TLI.getShiftAmountTy() a simple scalar type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176364 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-01 18:40:30 +00:00
Eli Bendersky	d6f19c7163	The operand listing is very much outdated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175220 91177308-0d34-0410-b5e6-96231b3b80d8	2013-02-14 23:17:03 +00:00
Evan Cheng	8688a58c53	Teach SDISel to combine fsin / fcos into a fsincos node if the following conditions are met: 1. They share the same operand and are in the same BB. 2. Both outputs are used. 3. The target has a native instruction that maps to ISD::FSINCOS node or the target provides a sincos library call. Implemented the generic optimization in sdisel and enabled it for Mac OSX. Also added an additional optimization for x86_64 Mac OSX by using an alternative entry point __sincos_stret which returns the two results in xmm0 / xmm1. rdar://13087969 PR13204 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173755 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-29 02:32:37 +00:00
Craig Topper	4aee1bb222	Fix inconsistent usage of PALIGN and PALIGNR when referring to the same instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173667 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-28 06:48:25 +00:00
Craig Topper	b84b423634	Make helper method static. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173005 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-21 06:13:28 +00:00
Craig Topper	d713c0f7f1	Capitalize lowerTRUNCATE so that it matches the other lower functions in this file despite it not matching coding standards. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172994 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-20 21:34:37 +00:00
Craig Topper	26827f3dc5	Make LowerVSETCC a static function and use MVT instead of EVT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172969 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-20 09:02:22 +00:00
Craig Topper	f84b7500ce	Make some helper methods static. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172936 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-20 00:50:58 +00:00
Craig Topper	00a312c478	Capitalize LowerVectorIntExtend to be consistent with all the other lower functions in this file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172927 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-19 23:14:09 +00:00
Nadav Rotem	13f8cf55d4	Efficient lowering of vector sdiv when the divisor is a splatted power of two constant. PR 14848. The lowered sequence is based on the existing sequence the target-independent DAG Combiner creates for the scalar case. Patch by Zvi Rackover. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171953 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-09 05:14:33 +00:00
Chandler Carruth	aeef83c6af	Switch TargetTransformInfo from an immutable analysis pass that requires a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it. The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTranformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least on implementation. The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis pass, allowing all but tho NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API. The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator. The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes. The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations. The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, a helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract. The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them. Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis which is a more natural layer for it to live. Second, clients of this interface can depend on it always being available which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits, this one is clearly big enough. Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks. Commits to update DragonEgg and Clang will be made presently. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-07 01:37:14 +00:00
Nadav Rotem	e503319874	LoopVectorizer: 1. Add code to estimate register pressure. 2. Add code to select the unroll factor based on register pressure. 3. Add bits to TargetTransformInfo to provide the number of registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171469 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-04 17:48:25 +00:00
Hal Finkel	82860f63e1	Add a subtype parameter to VTTI::getShuffleCost In order to cost subvector insertion and extraction, we need to know the type of the subvector being extracted. No functionality change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171453 91177308-0d34-0410-b5e6-96231b3b80d8	2013-01-03 02:34:09 +00:00
Nadav Rotem	ae34b4280e	CostModel: initial checkin for code that estimates the cost of special shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171180 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-28 08:19:03 +00:00
Nadav Rotem	0509db2738	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171178 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-28 05:45:24 +00:00
Nadav Rotem	1a330af3b5	AVX/AVX2: Move the SEXT lowering code from a target specific DAGco to a lowering function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171170 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-27 22:47:16 +00:00
Benjamin Kramer	739c7a83e1	X86: Match the SSE/AVX min/max vector ops using a custom node instead of intrinsics This is very mechanical, no functionality change. Preparation for PR14667. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170898 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-21 14:04:55 +00:00
Nadav Rotem	042a9a2666	Add a missing "virtual" keyword. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170842 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-21 05:02:12 +00:00
Nadav Rotem	f5637c3997	Improve the X86 cost model for loads and stores. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170830 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-21 01:33:59 +00:00
Patrik Hagglund	e5c65911a6	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170537 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-19 12:02:25 +00:00
Patrik Hagglund	0340557fb8	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170532 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-19 11:30:36 +00:00
Craig Topper	b926afcc5b	Simplify BMI ANDN matching to use patterns instead of a DAG combine. Also add ANDN to isDefConvertible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170305 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-17 05:12:30 +00:00
Benjamin Kramer	388fc6a988	X86: Add a couple of target-specific dag combines that turn VSELECTS into psubus if possible. We match the pattern "x >= y ? x-y : 0" into "subus x, y" and two special cases if y is a constant. DAGCombiner canonicalizes those so we first have to undo the canonicalization for those cases. The pattern occurs in gzip when the loop vectorizer is enabled. Part of PR14613. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170273 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-15 16:47:44 +00:00
Evan Cheng	946a3a9f22	Sorry about the churn. One more change to getOptimalMemOpType() hook. Did I mention the inline memcpy / memset expansion code is a mess? This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset. The first indicates whether it is expanding a memset or a memcpy / memmove. The later is whether the memset is a memset of zero. It's totally possible (likely even) that targets may want to do different things for memcpy and memset of zero. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169959 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-12 02:34:41 +00:00
Evan Cheng	7d34267df6	- Rename isLegalMemOpType to isSafeMemOpType. "Legal" is a very overloade term. Also added more comments to explain why it is generally ok to return true. - Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to be true for loaded source (memcpy) or zero constants (memset). The poor name choice is probably some kind of legacy issue. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169954 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-12 01:32:07 +00:00
Evan Cheng	61f4dfe369	Avoid using lossy load / stores for memcpy / memset expansion. e.g. f64 load / store on non-SSE2 x86 targets. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169944 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-12 00:42:09 +00:00
Patrik Hagglund	34525f9ac0	Revert EVT->MVT changes, r169836-169851, due to buildbot failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169854 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-11 11:14:33 +00:00
Patrik Hagglund	47fd10f2fc	Change TargetLowering::getTypeForExtArgOrReturn to take and return MVTs, instead of EVTs. Accordingly, add bitsLT (and similar) to MVT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169850 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-11 10:20:51 +00:00
Patrik Hagglund	bade0345d1	Change TargetLowering::findRepresentativeClass to take an MVT, instead of EVT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169845 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-11 09:57:18 +00:00
Evan Cheng	376642ed62	Some enhancements for memcpy / memset inline expansion. 1. Teach it to use overlapping unaligned load / store to copy / set the trailing bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies. 2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g. x86 and ARM. 3. When memcpy from a constant string, do not replace the load with a constant if it's not possible to materialize an integer immediate with a single instruction (required a new target hook: TLI.isIntImmLegal()). 4. Use unaligned load / stores more aggressively if target hooks indicates they are "fast". 5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8. Also increase the threshold to something reasonable (8 for memset, 4 pairs for memcpy). This significantly improves Dhrystone, up to 50% on ARM iOS devices. rdar://12760078 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169791 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-10 23:21:26 +00:00
Shuxin Yang	5518a1355b	- Re-enable population count loop idiom recognization - fix a bug which cause sigfault. - add two testing cases which was causing crash git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169687 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-09 03:12:46 +00:00
Chandler Carruth	7065a2bcec	Revert the patches adding a popcount loop idiom recognition pass. There are still bugs in this pass, as well as other issues that are being worked on, but the bugs are crashers that occur pretty easily in the wild. Test cases have been sent to the original commit's review thread. This reverts the commits: r169671: Fix a logic error. r169604: Move the popcnt tests to an X86 subdirectory. r168931: Initial commit adding the pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169683 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-08 22:18:29 +00:00
Evan Cheng	2766a47310	Replace r169459 with something safer. Rather than having computeMaskedBits to understand target implementation of any_extend / extload, just generate zero_extend in place of any_extend for liveouts when the target knows the zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz). rdar://12771555 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169536 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-06 19:13:27 +00:00
Evan Cheng	8a7186dbc2	Let targets provide hooks that compute known zero and ones for any_extend and extload's. If they are implemented as zero-extend, or implicitly zero-extend, then this can enable more demanded bits optimizations. e.g. define void @foo(i16* %ptr, i32 %a) nounwind { entry: %tmp1 = icmp ult i32 %a, 100 br i1 %tmp1, label %bb1, label %bb2 bb1: %tmp2 = load i16* %ptr, align 2 br label %bb2 bb2: %tmp3 = phi i16 [ 0, %entry ], [ %tmp2, %bb1 ] %cmp = icmp ult i16 %tmp3, 24 br i1 %cmp, label %bb3, label %exit bb3: call void @bar() nounwind br label %exit exit: ret void } This compiles to the followings before: push {lr} mov r2, #0 cmp r1, #99 bhi LBB0_2 @ BB#1: @ %bb1 ldrh r2, [r0] LBB0_2: @ %bb2 uxth r0, r2 cmp r0, #23 bhi LBB0_4 @ BB#3: @ %bb3 bl _bar LBB0_4: @ %exit pop {lr} bx lr The uxth is not needed since ldrh implicitly zero-extend the high bits. With this change it's eliminated. rdar://12771555 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169459 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-06 01:28:01 +00:00
Elena Demikhovsky	226e0e6264	Simplified BLEND pattern matching for shuffles. Generate VPBLENDD for AVX2 and VPBLENDW for v16i16 type on AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169366 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-05 09:24:57 +00:00
Chandler Carruth	a1514e24cc	Sort includes for all of the .h files under the 'lib' tree. These were missed in the first pass because the script didn't yet handle include guards. Note that the script is now able to handle all of these headers without manual edits. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169224 91177308-0d34-0410-b5e6-96231b3b80d8	2012-12-04 07:12:27 +00:00
Shuxin Yang	84fca61ca5	rdar://12100355 (part 1) This revision attempts to recognize following population-count pattern: while(a) { c++; ... ; a &= a - 1; ... }, where <c> and <a>could be used multiple times in the loop body. TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent instruction sequence, which need to be improved in the following commits. Reviewed by Nadav, really appreciate! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168931 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-29 19:38:54 +00:00
Craig Topper	2da3691d6d	Move some helper methods to being static functions in the implementation file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167696 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-11 22:45:02 +00:00
Craig Topper	5ed5c37d7f	Removed unimplemented method declaration. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167670 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-10 09:00:12 +00:00
Craig Topper	8aae8ddb92	Simplify custom emitter code for pcmp(e/i)str(i/m) and make the helper functions static. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167669 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-10 08:57:41 +00:00
Craig Topper	9c7ae01f39	Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167652 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-10 01:23:36 +00:00
Michael Liao	be02a90de1	Add support of RTM from TSX extension - Add RTM code generation support throught 3 X86 intrinsics: xbegin()/xend() to start/end a transaction region, and xabort() to abort a tranaction region git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167573 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-08 07:28:54 +00:00
Nadav Rotem	b042868c01	Cost Model: add tables for some avx type-conversion hacks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167480 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-06 19:33:53 +00:00
Nadav Rotem	7ae3bcca45	CostModel: Add tables for the common x86 compares. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167421 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-05 23:48:20 +00:00
Nadav Rotem	b4b04c3fa0	X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167347 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-03 00:39:56 +00:00
Nadav Rotem	0c31e43ff3	Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167333 91177308-0d34-0410-b5e6-96231b3b80d8	2012-11-02 23:27:16 +00:00
Michael Liao	c5c970ee85	Clean up redundant SP register maintained in X86 TLI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167104 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-31 04:14:09 +00:00
Manman Ren	4c74a956b2	X86 MMX: optimize transfer from mmx to i32 We used to generate a store (movq) + a load. Now we use movd. rdar://9946746 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167056 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-30 22:15:38 +00:00
Michael Liao	a7554630e9	Add custom UINT_TO_FP from v4i8/v4i16/v8i8/v8i16 to v4f32/v8f32 - Replace v4i8/v8i8 -> v8f32 DAG combine with custom lowering to reduce DAG combine overhead. - Extend the support to v4i16/v8i16 as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166487 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-23 17:36:08 +00:00
Michael Liao	d9d09600ee	Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166486 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-23 17:34:00 +00:00
Michael Liao	facace808c	Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86 - If INSERT_VECTOR_ELT is supported (above SSE2, either by custom sequence of legal insn), transform BUILD_VECTOR into SHUFFLE + INSERT_VECTOR_ELT if most of elements could be built from SHUFFLE with few (so far 1) elements being inserted. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166288 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-19 17:15:18 +00:00
Michael Liao	bedcbd433d	Support v8f32 to v8i8/vi816 conversion through custom lowering - Add custom FP_TO_SINT on v8i16 (and v8i8 which is legalized as v8i16 due to vector element-wise widening) to reduce DAG combiner and its overhead added in X86 backend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166036 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-16 18:14:11 +00:00
Michael Liao	6c0e04c823	Add __builtin_setjmp/_longjmp supprt in X86 backend - Besides used in SjLj exception handling, __builtin_setjmp/__longjmp is also used as a light-weight replacement of setjmp/longjmp which are used to implementation continuation, user-level threading, and etc. The support added in this patch ONLY addresses this usage and is NOT intended to support SjLj exception handling as zero-cost DWARF exception handling is used by default in X86. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165989 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-15 22:39:43 +00:00
Michael Liao	44c2d61b67	Add support for FP_ROUND from v2f64 to v2f32 - Due to the current matching vector elements constraints in ISD::FP_ROUND, rounding from v2f64 to v4f32 (after legalization from v2f32) is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPROUND to work around this constraints. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165631 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-10 16:53:28 +00:00
Michael Liao	9d796db3e7	Add alternative support for FP_ROUND from v2f32 to v2f64 - Due to the current matching vector elements constraints in ISD::FP_EXTEND, rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening to convert it into a target-specific X86ISD::VFPEXT to work around this constraints. This patch also reverts a previous attempt to fix this issue by recovering the scalarized ISD::FP_EXTEND pattern and thus significantly reduces the overhead of supporting non-power-2 vector FP extend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165625 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-10 16:32:15 +00:00
Micah Villmow	3574eca1b0	Move TargetData to DataLayout. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165402 91177308-0d34-0410-b5e6-96231b3b80d8	2012-10-08 16:38:25 +00:00
Michael Liao	e5e8f7656a	Add missing i64 max/min/umax/umin on 32-bit target - Turn on atomic6432.ll and add specific test case as well git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164616 91177308-0d34-0410-b5e6-96231b3b80d8	2012-09-25 18:08:13 +00:00
Evan Cheng	b1cacc7423	Fix an illegal tailcall opt where the callee returns a double via xmm while caller returns x86_fp80 via st0. rdar://12229511 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164588 91177308-0d34-0410-b5e6-96231b3b80d8	2012-09-25 05:32:34 +00:00
Michael Liao	b118a073d7	Re-work X86 code generation of atomic ops with spin-loop - Rewrite/merge pseudo-atomic instruction emitters to address the following issue: * Reduce one unnecessary load in spin-loop previously the spin-loop looks like thisMBB: newMBB: ld t1 = [bitinstr.addr] op t2 = t1, [bitinstr.val] not t3 = t2 (if Invert) mov EAX = t1 lcs dest = [bitinstr.addr], t3 [EAX is implicit] bz newMBB fallthrough -->nextMBB the 'ld' at the beginning of newMBB should be lift out of the loop as lcs (or CMPXCHG on x86) will load the current memory value into EAX. This loop is refined as: thisMBB: EAX = LOAD [MI.addr] mainMBB: t1 = OP [MI.val], EAX LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined] JNE mainMBB sinkMBB: * Remove immopc as, so far, all pseudo-atomic instructions has all-register form only, there is no immedidate operand. * Remove unnecessary attributes/modifiers in pseudo-atomic instruction td * Fix issues in PR13458 - Add comprehensive tests on atomic ops on various data types. NOTE: Some of them are turned off due to missing functionality. - Revise tests due to the new spin-loop generated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164281 91177308-0d34-0410-b5e6-96231b3b80d8	2012-09-20 03:06:15 +00:00
Michael Liao	f966e4e5b3	Add wider vector/integer support for PR12312 - Enhance the fix to PR12312 to support wider integer, such as 256-bit integer. If more than 1 fully evaluated vectors are found, POR them first followed by the final PTEST. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163832 91177308-0d34-0410-b5e6-96231b3b80d8	2012-09-13 20:24:54 +00:00
Craig Topper	55b2405484	Make a bunch of lowering helper functions static instead of member functions. No functional change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163596 91177308-0d34-0410-b5e6-96231b3b80d8	2012-09-11 06:15:32 +00:00
Nadav Rotem	d60cb11afd	When unsafe math is used, we can use commutative FMAX and FMIN. In some cases this allows for better code generation. Added a new DAGCombine transformation to convert FMAX and FMIN to FMANC and FMINC, which are commutative. For example: movaps %xmm0, %xmm1 movsd LC(%rip), %xmm0 minsd %xmm1, %xmm0 becomes: minsd LC(%rip), %xmm0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162187 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-19 13:06:16 +00:00
Craig Topper	c087870c47	Make ReplaceATOMIC_BINARY_64 a static function. Use a nested switch to reduce to only a single call to it thus allowing it to be inlined by the compiler. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162088 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-17 06:55:11 +00:00
Michael Liao	7091b2451d	fix PR11334 - FP_EXTEND only support extending from vectors with matching elements. This results in the scalarization of extending to v2f64 from v2f32, which will be legalized to v4f32 not matching with v2f64. - add X86-specific VFPEXT supproting extending from v4f32 to v2f64. - add BUILD_VECTOR lowering helper to recover back the original extending from v4f32 to v2f64. - test case is enhanced to include different vector width. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161894 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-14 21:24:47 +00:00
Craig Topper	bccc8ce9b8	Remove the LowerMMXCONCAT_VECTORS function. It could never execute because there are no legal 64-bit vector types that could be used as inputs to a 128-bit concat_vectors. Remove a target specific SDNode and its patterns that become unused as a result. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161742 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-13 01:23:55 +00:00
Craig Topper	4feb647283	Implement proper handling for pcmpistri/pcmpestri intrinsics. Requires custom handling in DAGISelToDAG due to limitations in TableGen's implicit def handling. Fixes PR11305. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161318 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-06 06:22:36 +00:00
Bob Wilson	d49edb7ab0	Fall back to selection DAG isel for calls to builtin functions. Fast isel doesn't currently have support for translating builtin function calls to target instructions. For embedded environments where the library functions are not available, this is a matter of correctness and not just optimization. Most of this patch is just arranging to make the TargetLibraryInfo available in fast isel. <rdar://problem/12008746> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-03 04:06:28 +00:00
Elena Demikhovsky	1503aba4a0	Added FMA functionality to X86 target. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161110 91177308-0d34-0410-b5e6-96231b3b80d8	2012-08-01 12:06:00 +00:00
Bill Wendling	2127c9b93a	Remove tabs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160479 91177308-0d34-0410-b5e6-96231b3b80d8	2012-07-19 00:15:11 +00:00
Evan Cheng	70e10d3fe4	This is another case where instcombine demanded bits optimization created large immediates. Add dag combine logic to recover in case the large immediates doesn't fit in cmp immediate operand field. int foo(unsigned long l) { return (l>> 47) == 1; } we produce %shr.mask = and i64 %l, -140737488355328 %cmp = icmp eq i64 %shr.mask, 140737488355328 %conv = zext i1 %cmp to i32 ret i32 %conv which codegens to movq $0xffff800000000000,%rax andq %rdi,%rax movq $0x0000800000000000,%rcx cmpq %rcx,%rax sete %al movzbl %al,%eax ret TargetLowering::SimplifySetCC would transform (X & -256) == 256 -> (X >> 8) == 1 if the immediate fails the isLegalICmpImmediate() test. For x86, that's immediates which are not a signed 32-bit immediate. Based on a patch by Eli Friedman. PR10328 rdar://9758774 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8	2012-07-17 06:53:39 +00:00
Benjamin Kramer	b9bee04995	Add intrinsics for Ivy Bridge's rdrand instruction. The rdrand/cmov sequence is the same that is emitted by both GCC and ICC. Fixes PR13284. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160117 91177308-0d34-0410-b5e6-96231b3b80d8	2012-07-12 09:31:43 +00:00
Craig Topper	2a5dc43bd9	Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158279 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-09 17:02:24 +00:00
Hans Wennborg	f0234fcbc9	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157818 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 16:27:21 +00:00
Justin Holewinski	d2ea0e10cb	Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157479 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-25 16:35:28 +00:00
Benjamin Kramer	17c836c4b5	X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures. * Model FPSW (the FPU status word) as a register. * Add ISel patterns for the FUCOM, FNSTSW and SAHF instructions. During Legalize/Lowering, build a node sequence to transfer the comparison result from FPSW into EFLAGS. If you're wondering about the right-shift: That's an implicit sub-register extraction (%ax -> %ah) which is handled later on by the instruction selector. Fixes PR6679. Patch by Christoph Erhardt! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@155704 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-27 12:07:43 +00:00
Craig Topper	8325c11d47	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154782 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-16 00:41:45 +00:00
Elena Demikhovsky	73c504af9d	Added VPERM optimization for AVX2 shuffles git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154761 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-15 11:18:59 +00:00
Richard Smith	42fc29e717	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154705 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-13 22:47:00 +00:00
Nadav Rotem	e611378a6e	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154483 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-11 06:40:27 +00:00
Eric Christopher	a139051654	Temporarily revert this patch to see if it brings the buildbots back. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154425 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-10 19:33:16 +00:00
Nadav Rotem	50e64cfe6e	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154396 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-10 14:33:13 +00:00
Evan Cheng	bf010eb911	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154370 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-10 01:51:00 +00:00
Nadav Rotem	154819dd6f	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154310 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-09 07:45:58 +00:00
Rafael Espindola	26c8dcc692	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@154011 91177308-0d34-0410-b5e6-96231b3b80d8	2012-04-04 12:51:34 +00:00
Evan Cheng	4bfcd4acbc	Re-commit r151623 with fix. Only issue special no-return calls if it's a direct call. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151645 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-28 18:51:51 +00:00
Daniel Dunbar	20bd5296ce	Revert r151623 "Some ARM implementaions, e.g. A-series, does return stack prediction. ...", it is breaking the Clang build during the Compiler-RT part. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151630 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-28 15:36:07 +00:00
Evan Cheng	ec52aaa12f	Some ARM implementaions, e.g. A-series, does return stack prediction. That is, the processor keeps a return addresses stack (RAS) which stores the address and the instruction execution state of the instruction after a function-call type branch instruction. Calling a "noreturn" function with normal call instructions (e.g. bl) can corrupt RAS and causes 100% return misprediction so LLVM should use a unconditional branch instead. i.e. mov lr, pc b _foo The "mov lr, pc" is issued in order to get proper backtrace. rdar://8979299 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151623 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-28 06:42:03 +00:00
NAKAMURA Takumi	9a68fdc7f8	Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff. [Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems: Clang raised a warning, and X86 LowerOperation would assert out for fptoui f64 to i32 because it improperly lowered to an illegal BUILD_PAIR. Here's a patch that addresses these issues. Let me know if any other changes are necessary. Thanks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151432 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-25 03:37:25 +00:00
Michael J. Spencer	1a2d061ec0	Add WIN_FTOL_* psudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151382 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-24 19:01:22 +00:00
Craig Topper	44d23825d6	Make all pointers to TargetRegisterClass const since they are all pointers to static data that should not be modified. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@151134 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-22 05:59:10 +00:00
Craig Topper	5aaffa8470	Make a bunch of X86ISelLowering shuffle functions static now that they are no longer needed by isel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@150908 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-19 02:53:47 +00:00
Craig Topper	5b209e84f4	Add target specific node for PMULUDQ. Change patterns to use it and custom lower intrinsics to it. Use it instead of intrinsic to handle 64-bit vector multiplies. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149807 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-05 03:14:49 +00:00
Elena Demikhovsky	dcabc7bca9	Optimization for SIGN_EXTEND operation on AVX. Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32 extensions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149600 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-02 09:10:43 +00:00
Elena Demikhovsky	3ae98150e3	Optimization for "truncate" operation on AVX. Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149485 91177308-0d34-0410-b5e6-96231b3b80d8	2012-02-01 07:56:44 +00:00
Craig Topper	86c7c583a3	Move some XOP patterns into instruction definition. Replae VPCMOV intrinsic patterns with custom lowering to a target specific nodes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@149216 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-30 01:10:15 +00:00
Craig Topper	1906d32e55	Combine X86 CMPPD and CMPPS node types. Simplifies selection code and pattern matching. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148670 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-22 23:36:02 +00:00
Craig Topper	67609fd0eb	Merge PCMPEQB/PCMPEQW/PCMPEQD/PCMPEQQ and PCMPGTB/PCMPGTW/PCMPGTD/PCMPGTQ X86 ISD node types into only two node types. Simplifying opcode selection and pattern matching. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148667 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-22 22:42:16 +00:00
Craig Topper	ed2e13d667	Add target specific ISD node types for SSE/AVX vector shuffle instructions and change all the code that used to create intrinsic nodes to create the new nodes instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148664 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-22 19:15:14 +00:00
Craig Topper	6a32b6f0c0	Remove unused X86 ISD node type defines. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148644 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-22 01:15:56 +00:00
Craig Topper	1a7700a3fa	Merge 128-bit and 256-bit SHUFPS/SHUFPD handling. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@148466 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-19 08:19:12 +00:00
Victor Umansky	435d0bd09d	Reverted commit #147601 upon Evan's request. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147748 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-08 17:20:33 +00:00
Victor Umansky	19d8559019	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147601 91177308-0d34-0410-b5e6-96231b3b80d8	2012-01-05 08:46:19 +00:00
Craig Topper	b3982da7d2	Merge X86 SHUFPS and SHUFPD node types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147394 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-31 23:50:21 +00:00
Chandler Carruth	acc068e873	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147244 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-24 10:55:54 +00:00
Craig Topper	ab44d3cf49	Remove an unused X86ISD node type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146833 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-17 19:16:44 +00:00
Craig Topper	94438ba538	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146726 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-16 08:06:31 +00:00
Craig Topper	d93e4c3496	Remove some remants of the old palign pattern fragment that were still hanging around. Also remove a cast from inside getShuffleVPERM2X128Immediate and getShuffleVPERMILPImmediate since the only caller already had done the cast. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146344 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-11 19:12:35 +00:00
Craig Topper	34671b812a	Merge floating point and integer UNPCK X86ISD node types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145926 91177308-0d34-0410-b5e6-96231b3b80d8	2011-12-06 08:21:25 +00:00
Craig Topper	ec24e61ab0	Merge VPERM2F128/VPERM2I128 ISD node types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145485 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-30 07:47:51 +00:00
Craig Topper	316cd2a2c5	Merge decoding of VPERMILPD and VPERMILPS shuffle masks. Merge X86ISD node type for VPERMILPD/PS. Add instruction selection support for VINSERTI128/VEXTRACTI128. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145483 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-30 06:25:25 +00:00
Craig Topper	70b883b3a7	Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145238 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-28 10:14:51 +00:00
Craig Topper	38034c568c	Merge 128-bit and 256-bit X86ISD node types for VPERMILPS and VPERMILPD. Simplify some shuffle lowering code since V1 can never be UNDEF due to canonalizing that occurs when shuffle nodes are created. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145153 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-26 22:55:48 +00:00
Craig Topper	06cb680779	Collapse X86ISD node types for PUNPCKH, PUNPCKL, UNPCKLP, and UNPCKHP to not be type specific. Now we just have integer high and low and floating point high and low. Pattern matching will choose the correct instruction based on the vector type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145148 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-26 20:47:44 +00:00
Craig Topper	705f2431a0	Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145126 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-24 22:57:10 +00:00
Craig Topper	f475a55bd4	Remove AVX2 specific X86ISD node types for PUNPCKH/L and instead just reuse the 128-bit versions and let the vector type distinguish. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145125 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-24 22:20:08 +00:00
Craig Topper	6fa583d787	Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145028 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-21 08:26:50 +00:00
Craig Topper	6347e8662c	Add support for lowering 256-bit shuffles to VPUNPCKL/H for i16, i32, i64 if AVX2 is enabled. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145026 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-21 06:57:39 +00:00
Craig Topper	54f952afac	Synthesize SSSE3/AVX 128-bit horizontal integer add/sub instructions from add/sub of appropriate shuffle vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144989 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-19 09:02:40 +00:00
Craig Topper	3113384a34	Collapse X86 PSIGNB/PSIGNW/PSIGND node types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144988 91177308-0d34-0410-b5e6-96231b3b80d8	2011-11-19 07:33:10 +00:00
Lang Hames	15701f8969	Rename NonScalarIntSafe to something more appropriate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@143080 91177308-0d34-0410-b5e6-96231b3b80d8	2011-10-26 23:50:43 +00:00
Craig Topper	b4c945716f	Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@142642 91177308-0d34-0410-b5e6-96231b3b80d8	2011-10-21 06:55:01 +00:00
Craig Topper	54a11176f6	Add X86 ANDN instruction. Including instruction selection. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@141947 91177308-0d34-0410-b5e6-96231b3b80d8	2011-10-14 07:06:56 +00:00
Duncan Sands	17470bee5f	Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from floating point add/sub of appropriate shuffle vectors. Does not synthesize the 256 bit AVX versions because they work differently. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@140332 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-22 20:15:48 +00:00
Nadav Rotem	fbad25e120	CR fixes per Bruno's request. Undo the changes from r139285 which added custom lowering to vselect. Add tablegen lowering for vselect. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139479 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-11 15:02:23 +00:00
Nadav Rotem	8ffad56f8e	Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139400 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-09 20:29:17 +00:00
Nadav Rotem	ffe3e7da84	Add X86-SSE4 codegen support for vector-select. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139285 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-08 08:11:19 +00:00
Rafael Espindola	5c984df26b	Fix comment. Noticed by Duncan. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139161 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-06 19:29:31 +00:00
Duncan Sands	28b77e968d	Add codegen support for vector select (in the IR this means a select with a vector condition); such selects become VSELECT codegen nodes. This patch also removes VSETCC codegen nodes, unifying them with SETCC nodes (codegen was actually often using SETCC for vector SETCC already). This ensures that various DAG combiner optimizations kick in for vector comparisons. Passes dragonegg bootstrap with no testsuite regressions (nightly testsuite as well as "make check-all"). Patch mostly by Nadav Rotem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139159 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-06 19:07:46 +00:00
Duncan Sands	4a544a79bd	Split the init.trampoline intrinsic, which currently combines GCC's init.trampoline and adjust.trampoline intrinsics, into two intrinsics like in GCC. While having one combined intrinsic is tempting, it is not natural because typically the trampoline initialization needs to be done in one function, and the result of adjust trampoline is needed in a different (nested) function. To get around this llvm-gcc hacks the nested function lowering code to insert an additional parent variable holding the adjust.trampoline result that can be accessed from the child function. Dragonegg doesn't have the luxury of tweaking GCC code, so it stored the result of adjust.trampoline in the memory GCC set aside for the trampoline itself (this is always available in the child function), and set up some new memory (using an alloca) to hold the trampoline. Unfortunately this breaks Go which allocates trampoline memory on the heap and wants to use it even after the parent has exited (!). Rather than doing even more hacks to get Go working, it seemed best to just use two intrinsics like in GCC. Patch mostly by Sanjoy Das. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@139140 91177308-0d34-0410-b5e6-96231b3b80d8	2011-09-06 13:37:06 +00:00
Rafael Espindola	151ab3e2f7	Adds support for variable sized allocas. For a variable sized alloca, code is inserted to first check if the current stacklet has enough space. If so, space is allocated by simply decrementing the stack pointer. Otherwise a runtime routine (__morestack_allocate_stack_space in libgcc) is called which allocates the required memory from the heap. Patch by Sanjoy Das. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138818 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-30 19:47:04 +00:00
Rafael Espindola	d07b7ec772	Adds a SelectionDAG node X86SegAlloca which will be custom lowered from DYNAMIC_STACKALLOC. Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which will match X86SegAlloca (based on word size) are also added. They will be custom emitted to inject the actual stack handling code. Patch by Sanjoy Das. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138814 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-30 19:43:21 +00:00
Eli Friedman	43f51aeca8	Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138660 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-26 21:21:21 +00:00
Craig Topper	13894fa135	Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@138427 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes	0e6d230abd	Introduce matching patterns for vbroadcast AVX instruction. The idea is to match splats in the form (splat (scalar_to_vector (load ...))) whenever the load can be folded. All the logic and instruction emission is working but because of PR8156, there are no ways to match loads, cause they can never be folded for splats. Thus, the tests are XFAILed, but I've tested and exercised all the logic using a relaxed version for checking the foldable loads, as if the bug was already fixed. This should work out of the box once PR8156 gets fixed since MayFoldLoad will work as expected. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137810 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes	53cae1362d	The VPERM2F128 is a AVX instruction which permutes between two 256-bit vectors. It operates on 128-bit elements instead of regular scalar types. Recognize shuffles that are suitable for VPERM2F128 and teach the x86 legalizer how to handle them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137519 91177308-0d34-0410-b5e6-96231b3b80d8	2011-08-12 21:48:26 +00:00
Bruno Cardoso Lopes	9065d4b65f	Cleanup PALIGNR handling and remove the old palign pattern fragment. Also make PALIGNR masks to don't match 256-bits, which isn't supported It's also a step to solve PR10489 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136448 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-29 01:30:59 +00:00
Eli Friedman	1464846801	Code generation for 'fence' instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136283 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-27 22:21:52 +00:00
Bruno Cardoso Lopes	cea34e41fa	The vpermilps and vpermilpd have different behaviour regarding the usage of the shuffle bitmask. Both work in 128-bit lanes without crossing, but in the former the mask of the high part is the same used by the low part while in the later both lanes have independent masks. Handle this properly and and add support for vpermilpd. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136200 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-27 00:56:34 +00:00
Bruno Cardoso Lopes	4ea496846a	Recognize unpckh* masks and match 256-bit versions. The new versions are different from the previous 128-bit because they work in lanes. Update a few comments and add testcases git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136157 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-26 22:03:40 +00:00
Bruno Cardoso Lopes	9123c6fea0	More movsldup/movshdup cleanup. Rewrite the mask matching function and add support for 256-bit versions (but no instruction selection yet, coming next). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@136050 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-26 02:39:28 +00:00
Bruno Cardoso Lopes	6683efb4cd	-Inspected a AVX code block added by someone in early Feb. This was never used and was actually very wrong, fix it and make it simpler. Also remove the ConcatVectors function, which is unused now. - Fix a introduction of useless nodes in r126664 and r126264. The VUNPCKL* should never be introduced cause we don't want duplicate nodes for 128 AVX and non-AVX modes, the actual instruction difference only exists during isel, but not for target specific DAG nodes. We only introduce V* target nodes when there is no 128-bit version already there. - Fix a fragile test and make it more useful. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@135729 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes	65b74e1d00	Add support for 256-bit versions of VPERMIL instruction. This is a new instruction introduced in AVX, which can operate on 128 and 256-bit vectors. It considers a 256-bit vector as two independent 128-bit lanes. It can permute any 32 or 64 elements inside a lane, and restricts the second lane to have the same permutation of the first one. With the improved splat support introduced early today, adding codegen for this instruction enable more efficient 256-bit code: Instead of: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vextractf128 $1, %ymm0, %xmm1 shufps $1, %xmm1, %xmm1 movss %xmm1, 28(%rsp) movss %xmm1, 24(%rsp) movss %xmm1, 20(%rsp) movss %xmm1, 16(%rsp) vextractf128 $0, %ymm0, %xmm0 shufps $1, %xmm0, %xmm0 movss %xmm0, 12(%rsp) movss %xmm0, 8(%rsp) movss %xmm0, 4(%rsp) movss %xmm0, (%rsp) vmovaps (%rsp), %ymm0 We get: vextractf128 $0, %ymm0, %xmm0 punpcklbw %xmm0, %xmm0 punpckhbw %xmm0, %xmm0 vinsertf128 $0, %xmm0, %ymm0, %ymm1 vinsertf128 $1, %xmm0, %ymm1, %ymm0 vpermilps $85, %ymm0, %ymm0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@135662 91177308-0d34-0410-b5e6-96231b3b80d8	2011-07-21 01:55:47 +00:00

1 2 3 4 5 ...

660 Commits