llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-08-26 23:29:22 +00:00

Author	SHA1	Message	Date
Juergen Ributzka	1f5263e43f	[FastISel][AArch64] Don't fold instructions too aggressively into the memory operation. Currently instructions are folded very aggressively into the memory operation, which can lead to the use of killed operands: %vreg1<def> = ADDXri %vreg0<kill>, 2 %vreg2<def> = LDRBBui %vreg0, 2 ... = ... %vreg1 ... This usually happens when the result is also used by another non-memory instruction in the same basic block, or any instruction in another basic block. If the computed address is used by only memory operations in the same basic block, then it is safe to fold them. This is because all memory operations will fold the address computation and the original computation will never be emitted. This fixes rdar://problem/18142857. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216629 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 22:52:33 +00:00
Juergen Ributzka	ccf53013cd	[FastISel][AArch64] Fix simplify address when the address comes from a shift. When the address comes directly from a shift instruction then the address computation cannot be folded into the memory instruction, because the zero register is not available as a base register. Simplify addess needs to emit the shift instruction and use the result as base register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216621 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 21:38:33 +00:00
Juergen Ributzka	d445e4acdb	[FastISel][AArch64] Use the zero register for stores. Use the zero register directly when possible to avoid an unnecessary register copy and a wasted register at -O0. This also uses integer stores to store a positive floating-point zero. This saves us from materializing the positive zero in a register and then storing it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216617 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 21:04:52 +00:00
Oliver Stannard	5e487f8dc7	Teach the AArch64 backend about v4f16 and v8f16 This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216555 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 16:16:04 +00:00
Chandler Carruth	7e3dc40fab	[x86] Fix a regression introduced with r213897 for 32-bit targets where we stopped efficiently lowering sextload using the SSE41 instructions for that operation. This is a consequence of a bad predicate I used thinking of the memory access needs. The code actually handles the cases where the predicate doesn't apply, and handles them much better. =] Simple fix and a test case added. Fixes PR20767. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 11:39:47 +00:00
Chandler Carruth	963a5e6c61	[SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine. This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be exactly right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by produces bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least make the code work as intended. It also highlights that it would be nice to detect the availability of under aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the chang: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 11:22:16 +00:00
Elena Demikhovsky	fe0c6ead85	AVX-512: Added intrinsic for VMOVSS store form with mask. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216530 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 07:38:43 +00:00
Juergen Ributzka	fc03e72b4f	[FastISel][AArch64] Fix address simplification. When a shift with extension or an add with shift and extension cannot be folded into the memory operation, then the address calculation has to be materialized separately. While doing so the code forgot to consider a possible sign-/zero- extension. This fix folds now also the sign-/zero-extension into the add or shift instruction which is used to materialize the address. This fixes rdar://problem/18141718. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216511 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 00:58:30 +00:00
Juergen Ributzka	836f4bd090	[FastISel][AArch64] Fold Sign-/Zero-Extend into the shift immediate instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216510 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-27 00:58:26 +00:00
Yi Kong	2282afa6cc	ARM: Add patterns for dbg git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216451 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-26 12:47:26 +00:00
Chad Rosier	373fc00835	[AArch32] Add patterns for VCVT{A,N,P,M}. Patterns for lowering libm calls to VCVT{A,N,P,M} are also included. Phabricator Revision: http://reviews.llvm.org/D5033 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216388 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-25 16:56:33 +00:00
Hal Finkel	7ca2a7d742	[PowerPC] Add support for dcbtst and icbt (prefetch) Adds code generation support for dcbtst (data cache prefetch for write) and icbt (instruction cache prefetch for read - Book E cores only). We still end up with a 'cannot select' error for the non-supported prefetch intrinsic forms. This will be fixed in a later commit. Fixes PR20692. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216339 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-23 23:21:04 +00:00
Chad Rosier	8eb867e97d	Revert "ARM: improve RTABI 4.2 conformance on Linux" This reverts commit r215862 due to nightly failures. Will work on getting a reduced test case, but I wanted to get our bots green in the meantime. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216325 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-23 18:29:43 +00:00
Chandler Carruth	bfed08e41f	[x86] Start fixing a really subtle and terrible form of miscompile in these DAG combines. The DAG auto-CSE thing is truly terrible. Due to it, when RAUW-ing a node with its operand, you can cause its uses to CSE to itself, which then causes their uses to become your uses which causes them to be picked up by the RAUW. For nodes that are determined to be "no-ops", this is "fine". But if the RAUW is one of several steps to enact a transformation, this causes the DAG to really silently eat an discard nodes that you would never expect. It took days for me to actually pinpoint a test case triggering this and a really frustrating amount of time to even comprehend the bug because I never even thought about the ability of RAUW to iteratively consume nodes due to CSE-ing them into itself. To fix this, we have to build up a brand-new chain of operations any time we are combining across (potentially) intervening nodes. But once the logic is added to do this, another issue surfaces: CombineTo eagerly deletes the one node combined, but no others. This is... really frustrating. If deleting it makes its operands become dead, those operand nodes often won't go onto the worklist in the order you would want -- they're already on it and not near the top. That means things higher on the worklist will get combined prior to these dead nodes being GCed out of the worklist, and if the chain is long, the immediate users won't be enough to re-detect where the root of the chain is that became single-use again after deleting the dead nodes. The better way to do this is to never immediately delete nodes, and instead to just enqueue them so we can recursively delete them. The combined-from node is typically not on the worklist anyways by virtue of having been popped off.... But that in turn breaks other tests that require CombineTo to delete unused nodes. :: sigh :: Fortunately, there is a better way. This whole routine should have been returning the replacement rather than using CombineTo which is quite hacky. Switch to that, and all the pieces fall together. I suspect the same kind of miscompile is possible in the half-shuffle folding code, and potentially the recursive folding code. I'll be switching those over to a pattern more like this one for safety's sake even though I don't immediately have any test cases for them. Note that the only way I got a test case for this instance was with heavily DAG combined 256-bit shuffle sequences generated by my fuzzer. ;] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216319 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-23 10:25:15 +00:00
Nick Lewycky	f591b9c33e	Revert r215611 because it caused the infinite loop in bug 20736. There is a reduced testcase in that bug. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216307 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-23 00:45:03 +00:00
Reid Kleckner	d89c0abc07	ARM / x86_64 varargs: Don't save regparms in prologue without va_start There's no need to do this if the user doesn't call va_start. In the future, we're going to have thunks that forward these register parameters with musttail calls, and they won't need these spills for handling va_start. Most of the test suite changes are adding va_start calls to existing tests to keep things working. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216294 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-22 21:59:26 +00:00
Tom Stellard	f50f927d65	R600/SI: Use READ2/WRITE2 instructions for 64-bit mem ops with 32-bit alignment git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216279 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-22 18:49:35 +00:00
Tom Stellard	ec4cb3346d	R600/SI: Use a ComplexPattern for DS loads and stores git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216278 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-22 18:49:33 +00:00
Quentin Colombet	c3f2ad0879	[ARM] Move the implementation of the target hooks related to copy-related instruction from ARMInstrInfo to ARMBaseInstrInfo. That way, thumb mode can also benefit from the advanced copy optimization. <rdar://problem/12702965> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216274 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-22 18:05:22 +00:00
Sasa Stankovic	cc59c3f335	[mips] Don't use odd-numbered float registers for double arguments for fastcc calling convention if FP is 64-bit and +nooddspreg is used. Differential Revision: http://reviews.llvm.org/D4981.diff git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216262 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-22 09:23:22 +00:00
Juergen Ributzka	5e34dffb9c	[FastISel][AArch64] Add support for variable shift. This adds the missing variable shift support for value type i8, i16, and i32. This fixes <rdar://problem/18095685>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216242 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 23:06:07 +00:00
Juergen Ributzka	5d6365c80c	[FastISel][AArch64] Use the correct register class to make the MI verifier happy. This is mostly achieved by providing the correct register class manually, because getRegClassFor always returns the GPRAllRegClass for MVT::i32 and MVT::i64. Also cleanup the code to use the FastEmitInst_ method whenever possible. This makes sure that the operands' register class is properly constrained. For all the remaining cases this adds the missing constrainOperandRegClass calls for each operand. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216225 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 20:57:57 +00:00
Tom Stellard	fdbf61d00d	R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216220 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 20:41:00 +00:00
Tom Stellard	5f52739370	R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function This fixes a crash in an ocl conformance test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216219 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 20:40:58 +00:00
Quentin Colombet	ad3c6289b6	[AArch64] Run a peephole pass right after AdvSIMD pass. The AdvSIMD pass may produce copies that are not coalescer-friendly. The peephole optimizer knows how to fix that as demonstrated in the test case. <rdar://problem/12702965> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216200 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 18:10:07 +00:00
Moritz Roth	a6afad8b33	Thumb1 load/store optimizer: Improve code to materialize new base register. There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only the latter supports using different source and destination registers, so whenever we materialize a new base register (at a certain offset) we'd do so by moving the base register value to the new register and then adding in place. This patch changes the code to use a single tADDi3 if the offset is small enough to fit in 3 bits. Differential Revision: http://reviews.llvm.org/D5006 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216193 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 17:11:03 +00:00
Juergen Ributzka	69ec09b61f	[FastISel][AArch64] Remove redundant test. These tests and many more are already covered by fast-isel-addressing-modes.ll. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216186 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 16:40:05 +00:00
Jonathan Roelofs	4c3be1aa0f	Add a thread-model knob for lowering atomics on baremetal & single threaded systems http://reviews.llvm.org/D4984 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216182 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 14:35:47 +00:00
Benjamin Kramer	daada81e5c	DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments. PR20677 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216175 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 13:28:02 +00:00
Oliver Stannard	760a46522a	[ARM] Enable DP copy, load and store instructions for FPv4-SP The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers. This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216172 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 12:50:31 +00:00
Robert Khasanov	fec1abaeab	[x86] Added _addcarry_ and _subborrow_ intrinsics git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216164 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 09:43:43 +00:00
Robert Khasanov	10dacc4b52	[x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32\|64} intrinsics to LLVM. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216162 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 09:27:00 +00:00
Jiangning Liu	150cef218f	Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216147 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 01:59:30 +00:00
Quentin Colombet	e817bdd304	[PeepholeOptimizer] Take advantage of the isInsertSubreg property in the advanced copy optimization. This is the final step patch toward transforming: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 bx lr Indeed, thanks to this patch, this optimization is able to look through vmov.32 d16[0], r0 vmov.32 d16[1], r1 and is able to rewrite the following sequence: vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 into simple generic GPR copies that the coalescer managed to remove. <rdar://problem/12702965> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216144 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-21 00:19:16 +00:00
Jonathan Roelofs	506ed4d4a5	Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to be avoided. This patch trades simplicity for implementation time at the expense of performance... As they say: correctness first, then performance. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few ideas on how to make this better. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216138 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 23:38:50 +00:00
Sanjay Patel	89305e5345	Don't prevent a vselect of constants from becoming a single load (PR20648). Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648 This patch checks the operands of a vselect to see if all values are constants. If yes, bail out of any further attempts to create a blend or shuffle because SelectionDAGLegalize knows how to turn this kind of vselect into a single load. This already happens for machines without SSE4.1, so the added checks just send more targets down that path. Differential Revision: http://reviews.llvm.org/D4934 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 20:34:56 +00:00
Duncan P. N. Exon Smith	4641d5dbdf	X86: Add missing triples from r216119 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216120 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 19:58:59 +00:00
Duncan P. N. Exon Smith	5012f1db20	X86: Align the stack on word boundaries in LowerFormalArguments() The goal of the patch is to implement section 3.2.3 of the AMD64 ABI correctly. The controlling sentence is, "The size of each argument gets rounded up to eightbytes. Therefore the stack will always be eightbyte aligned." The equivalent sentence in the i386 ABI page 37 says, "At all times, the stack pointer should point to a word-aligned area." For both architectures, the stack pointer is not being rounded up to the nearest eightbyte or word between the last normal argument and the first variadic argument. Patch by Thomas Jablin! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 19:40:59 +00:00
Keno Fischer	4b1cddbaf0	Do not insert a tail call when returning multiple values on X86 Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530. The problem is that X86ISelLowering erroneously thought the third call was eligible for tail call elimination. It would have been if it's return value was actually the one returned by the calling function, but here that is not the case and additional values are being returned. Test Plan: Test case from the original bug report is included. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D4968 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216117 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 19:00:37 +00:00
Sanjay Patel	3deb3e32b2	critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308) In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker caused a miscompile because it broke a WAR hazard using a register that it thinks is available based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from a kill because they are really just nops. This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again. The test case is a reduced version of the code from the bug report. Differential Revision: http://reviews.llvm.org/D4977 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216114 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 18:03:00 +00:00
Quentin Colombet	dcd3cbea54	[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216088 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 17:41:48 +00:00
Juergen Ributzka	5273295751	[FastISel][AArch64] Don't fold the sign-/zero-extend from i1 into the compare. This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero- extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both operands have to be sign-/zero-extended with separate instructions. Related to <rdar://problem/17913111>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216073 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 16:34:15 +00:00
Jiangning Liu	e04455d72b	Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type legalization stage. With those two optimizations, fewer signed/zero extension instructions can be inserted, and then we can expose more opportunities to Machine CSE pass in back-end. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216066 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 12:05:15 +00:00
Pavel Chupin	aadaac228d	[x32] Fix FrameIndex check in SelectLEA64_32Addr Summary: Fixes http://llvm.org/bugs/show_bug.cgi?id=20016 reproducible on new lea-5.ll case. Also use RSP/RBP for x32 lea to save 1 byte used for 0x67 prefix in ESP/EBP case. Test Plan: lea tests modified to include x32/nacl and new test added Reviewers: nadav, dschuff, t.p.northover Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4929 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216065 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 11:59:22 +00:00
Yi Kong	40f9d11ccc	ARM: Fix codegen for rbit intrinsic LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic. According to ARM ARM, rbit only takes register as argument, not immediate. The correct instruction should be rbit <Rd>, <Rm>. The bug was originally introduced in r211057. Differential Revision: http://reviews.llvm.org/D4980 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216064 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 10:40:20 +00:00
Juergen Ributzka	3ef392c4e2	[FastISel][AArch64] Use the proper FMOV instruction to materialize a +0.0. Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register class to be used with the zero register. This makes the MachineInstruction verifier happy again. This is related to <rdar://problem/18027157>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216040 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-20 01:10:36 +00:00
Juergen Ributzka	ae9a7964ef	[FastISel][AArch64] Factor out ADDS/SUBS instruction emission and add support for extensions and shift folding. Factor out the ADDS/SUBS instruction emission code into helper functions and make the helper functions more clever to support most of the different ADDS/SUBS instructions the architecture support. This includes better immedediate support, shift folding, and sign-/zero-extend folding. This fixes <rdar://problem/17913111>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216033 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-19 22:29:55 +00:00
Juergen Ributzka	1d58f989d4	[FastISel][AArch64] Extend floating-point materialization test. This adds the missing test that I promised for r215753 to test the materialization of the floating-point value +0.0. Related to <rdar://problem/18027157>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216019 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-19 20:35:07 +00:00
Juergen Ributzka	06bb1ca1e0	Reapply [FastISel][AArch64] Add support for more addressing modes (r215597). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: FastISel didn't take much advantage of the different addressing modes available to it on AArch64. This commit allows the ComputeAddress method to recognize more addressing modes that allows shifts and sign-/zero-extensions to be folded into the memory operation itself. For Example: lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3] ldr x0, [x0, x1] sxtw x1, w1 lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3] ldr x0, [x0, x1] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216013 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-19 19:44:17 +00:00
Juergen Ributzka	96b1e70c66	Reapply [FastISel][X86] Add large code model support for materializing floating-point constants (r215595). Note: This was originally reverted to track down a buildbot error. Reapply without any modifications. Original commit message: In the large code model for X86 floating-point constants are placed in the constant pool and materialized by loading from it. Since the constant pool could be far away, a PC relative load might not work. Therefore we first materialize the address of the constant pool with a movabsq and then load from there the floating-point value. Fixes <rdar://problem/17674628>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216012 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-19 19:44:13 +00:00

1 2 3 4 5 ...

11422 Commits