llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-09-05 17:30:02 +00:00

Author	SHA1	Message	Date
Elena Demikhovsky	1596373671	Shuffle optimization for AVX/AVX2. The current patch optimizes frequently used shuffle patterns and gives these instruction sequence reduction. Before: vshufps $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3] vextractf128 $1, %ymm1, %xmm1 vextractf128 $1, %ymm0, %xmm0 vshufps $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3] vpermilps $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3] vinsertf128 $1, %xmm0, %ymm2, %ymm0 After: vshufps $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4] vshufps $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4] vunpcklps %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159188 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-26 08:04:10 +00:00
Andrew Trick	c9b1e25493	Enable the new LoopInfo algorithm by default. The primary advantage is that loop optimizations will be applied in a stable order. This helps debugging and unit test creation. It is also a better overall implementation without pathologically bad performance on deep functions. On large functions (llvm-stress --size=200000 \| opt -loops) Before: 0.1263s After: 0.0225s On deep functions (after tweaking llvm-stress, thanks Nadav): Before: 0.2281s After: 0.0227s See r158790 for more comments. The loop tree is now consistently generated in forward order, but loop passes are applied in reverse order over the program. If we have a loop optimization that prefers forward order, that can easily be achieved by adding a different type of LoopPassManager. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159183 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-26 04:11:38 +00:00
Eli Friedman	52d418df5d	Make some ugly hacks for inline asm operands which name a specific register a bit more thorough. PR13196. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159176 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-25 23:42:33 +00:00
Jakob Stoklund Olesen	5984d2b31f	Run ProcessImplicitDefs on SSA form where it can be much simpler. Implicitly defined virtual registers can simply have the <undef> bit set on all uses, and copies can be turned into implicit defs recursively. Physical registers are a bit trickier. We handle the common case where a physreg def is used by a nearby instruction in the same basic block. For more complicated cases, just leave the IMPLICIT_DEF instruction in. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159149 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-25 18:12:18 +00:00
Jakob Stoklund Olesen	82d58b147f	%RCX is not a function live-out in eh.return functions. The function live-out registers must be live at all function returns, and %RCX is only used by eh.return. When a function also has a normal return, only %RAX holds a return value. This fixes PR13188. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159116 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-24 15:53:01 +00:00
Hans Wennborg	ce718ff9f4	Extend the IL for selecting TLS models (PR9788) This allows the user/front-end to specify a model that is better than what LLVM would choose by default. For example, a variable might be declared as @x = thread_local(initialexec) global i32 42 if it will not be used in a shared library that is dlopen'ed. If the specified model isn't supported by the target, or if LLVM can make a better choice, a different model may be used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159077 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-23 11:37:03 +00:00
Chad Rosier	e5457d2116	FileCheckize tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159044 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-22 23:04:02 +00:00
Evan Cheng	c90a1fcf9f	EmitZerofill should take a 64-bit size or else it's chopping off large zero-filled global. rdar://11729134 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159023 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-22 20:14:46 +00:00
Jakob Stoklund Olesen	e208c49172	Functions calling __builtin_eh_return must have a frame pointer. The code in X86TargetLowering::LowerEH_RETURN() assumes that a frame pointer exists, but the frame pointer was forced by the presence of llvm.eh.unwind.init which isn't guaranteed. If llvm.eh.unwind.init is actually required in functions calling eh.return (is it?), we should diagnose that instead of emitting bad machine code. This should fix the dragonegg-x86_64-linux-gcc-4.6-test bot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158961 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-22 03:04:27 +00:00
Jakob Stoklund Olesen	c4118452bc	Remove the -live-regunits command line option. Register allocators depend on it being permanently enabled now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158873 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-20 23:31:34 +00:00
Jakob Stoklund Olesen	7824152557	Only update regunit live ranges that have been precomputed. Regunit live ranges are computed on demand, so when mi-sched calls handleMove, some regunits may not have live ranges yet. That makes updating them easier: Just skip the non-existing ranges. They will be computed correctly from the rescheduled machine code when they are needed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158831 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-20 18:00:57 +00:00
Craig Topper	703c38bf58	Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158792 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-20 05:39:26 +00:00
Rafael Espindola	565bdbf598	really add a triple :-( git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158696 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-19 02:17:35 +00:00
Rafael Espindola	e08c1347d9	Add a triple to the test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158695 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-19 01:42:34 +00:00
Rafael Espindola	d6b43a317e	Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out if .init_array should be used or not, so the decision is made via TargetOptions and defaults to off. Add a command line option to llc that enables it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158692 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-19 00:48:28 +00:00
Chandler Carruth	457dfbac8a	Add a regression test for the bug exposed by r158087, which has been temporarily reverted. This test is annoyingly overspecified, but I don't know of another way to thoroughly test the saving and restoring of the registers. While this will have to be adjusted even with the issue fixed in order to re-apply r158087, those adjustments should very clearly indicate that it is still correct (%esp getting restored prior to pops), whereas without it, this case can easily slip under the radar. Still, any suggestions for improvements are very welcome. All credit to Matt Beaumont-Gay for reducing this out of an insane Address Sanitizer crash to a reasonably small seg-faulting C program when built with -mstackrealign. I just reduced it to IR, which was much simpler. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158656 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-18 09:15:04 +00:00
Chandler Carruth	43369249e7	Temporarily revert r158087. This patch causes problems when both dynamic stack realignment and dynamic allocas combine in the same function. With this patch, we no longer build the epilog correctly, and silently restore registers from the wrong position in the stack. Thanks to Matt for tracking this down, and getting at least an initial test case to Chad. I'm going to try to check a variation of that test case in so we can easily track the fixes required. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158654 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-18 07:03:12 +00:00
Craig Topper	cc95b57d42	Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158396 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-13 07:18:53 +00:00
Craig Topper	c29106b36f	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158278 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-09 16:46:13 +00:00
Jakob Stoklund Olesen	6660ed5f2f	Don't run RAFast in the optimizing regalloc pipeline. The fast register allocator is not supposed to work in the optimizing pipeline. It doesn't make sense to compute live intervals, run full copy coalescing, and then run RAFast. Fast register allocation in the optimizing pipeline is better done by RABasic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158242 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-08 23:15:12 +00:00
Manman Ren	6620ccf5d8	Test case for r158160 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158218 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-08 18:42:37 +00:00
Manman Ren	9236362a64	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158175 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-07 22:39:10 +00:00
Rafael Espindola	c07f5bbd3b	Use a base register instead of an index register with the local dynamic model. Fixes pr13048. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158158 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-07 18:39:19 +00:00
Manman Ren	87253c2ebd	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158126 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-07 00:42:47 +00:00
Manman Ren	2afde7782d	Revert r157755. The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158122 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-06 23:53:03 +00:00
Chad Rosier	a97b180fc4	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. rdar://11496434 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@158087 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-06 17:37:40 +00:00
Nadav Rotem	fcb2c3cf5e	Remove the "-promote-elements" flag. This flag is now enabled by default. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157925 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-04 11:27:21 +00:00
Craig Topper	a15f9d5311	Rename FMA3 feature flag to just FMA to match gcc so it can be added to clang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157903 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-03 18:58:46 +00:00
Craig Topper	529ce07c5f	Rename fma4 intrinsics to just fma since they are now used for both FMA4 and FMA3. Autoupgrade support coming in a separate commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157898 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-03 07:26:46 +00:00
Manman Ren	c73ea9102b	Revert r157831 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157896 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-03 03:14:24 +00:00
Craig Topper	57ae246a6a	Use sse_load_f32/64 for scalar FMA3 intrinsic patterns instead of 128-bit loads to match instruction behavior. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157895 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-03 01:40:43 +00:00
Manman Ren	73c2f7f5ed	X86: peephole optimization to remove cmp instruction This patch will optimize the following: sub r1, r3 cmp r3, r1 or cmp r1, r3 bge L1 TO sub r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can eliminate the "cmp" instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157831 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 19:49:33 +00:00
Chris Lattner	2b76473929	testcase for PR13006, thanks to Duncan for filing it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157824 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 18:19:46 +00:00
Hans Wennborg	f0234fcbc9	Implement the local-dynamic TLS model for x86 (PR3985) This implements codegen support for accesses to thread-local variables using the local-dynamic model, and adds a clean-up pass so that the base address for the TLS block can be re-used between local-dynamic access on an execution path. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157818 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 16:27:21 +00:00
Craig Topper	3a8172ad8d	Remove fadd(fmul) patterns for FMA3. This needs to be implemented by paying attention to FP_CONTRACT and matching @llvm.fma which is not available yet. This will allow us to enablle intrinsic use at least though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157804 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 06:07:48 +00:00
Chris Lattner	f59e4e3452	enhance the logic for looking through tailcalls to look through transparent casts in multiple-return value scenarios, like what happens on X86-64 when returning small structs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157800 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 05:29:15 +00:00
Chris Lattner	5b0d946537	enhance getNoopInput to know about vector<->vector bitcasts of legal types, as well as int<->ptr casts. This allows us to tailcall functions with some trivial casts between the call and return (i.e. because the return types disagree). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157798 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 05:16:33 +00:00
Chris Lattner	09c14c0836	add some simple 64-bit tail call tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157797 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 05:03:31 +00:00
Chris Lattner	e8ea60b8ba	merge some tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157795 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 05:00:54 +00:00
Chris Lattner	e109648880	rename test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157794 91177308-0d34-0410-b5e6-96231b3b80d8	2012-06-01 04:58:50 +00:00
Manman Ren	91c5346d91	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157755 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-31 17:20:29 +00:00
Elena Demikhovsky	177cf1e1a3	Added FMA3 Intel instructions. I disabled FMA3 autodetection, since the result may differ from expected for some benchmarks. I added tests for GodeGen and intrinsics. I did not change llvm.fma.f32/64 - it may be done later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157737 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-31 09:20:20 +00:00
Craig Topper	0559a2f8ae	Add intrinsic for pclmulqdq instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157731 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-31 04:37:40 +00:00
Jakob Stoklund Olesen	9cda1be0aa	Prioritize smaller register classes for urgent evictions. It helps compile exotic inline asm. In the test case, normal GR32 virtual registers use up eax-edx so the final GR32_ABCD live range has no registers left. Since all the live ranges were tiny, we had no way of prioritizing the smaller register class. This patch allows tiny unspillable live ranges to be evicted by tiny unspillable live ranges from a smaller register class. <rdar://problem/11542429> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157715 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-30 21:46:58 +00:00
Chris Lattner	f186df0d3e	it's pointed out that R11 can be used for magic things, and doing things just for 64-bit registers is silly. Just optimize 3 more. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157699 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-30 18:08:02 +00:00
Chris Lattner	5aaabbfe62	Extend the (abi-irrelevant) return convention to be able to return more than two values in integer registers. This is already supported by the fastcc convention, but it doesn't hurt to support it in the standard conventions as well. In cases where we can cheat at the calling convention, this allows us to avoid returning things through memory in more cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157698 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-30 17:50:14 +00:00
Benjamin Kramer	1386e9b7b1	Add intrinsics, code gen, assembler and disassembler support for the SSE4a extrq and insertq instructions. This required light surgery on the assembler and disassembler because the instructions use an uncommon encoding. They are the only two instructions in x86 that use register operands and two immediates. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157634 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-29 19:05:25 +00:00
Chris Lattner	c32cef6aa1	These tests used intrinsics with the wrong prototype. They weren't caught because the old verifier just checked that something "was a pointer", but not that the pointee was correct. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157544 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-27 19:35:41 +00:00
Benjamin Kramer	c511b2a5a1	SelectionDAGBuilder: When emitting small compare chains for switches order them by using edge weights. SimplifyCFG tends to form a lot of 2-3 case switches when merging branches. Move the most likely condition to the front so it is checked first and the others can be skipped. This is currently not as effective as it could be because SimplifyCFG destroys profiling metadata when merging branches and switches. Merging branch weight metadata is tricky though. This code touches at most 3 cases so I didn't use a proper sorting algorithm. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157521 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-26 20:01:32 +00:00
NAKAMURA Takumi	f755e0001a	test/CodeGen/X86/bigstructret.ll: Suppress one test. It is msvc-incompatible. (compatible to mingw32 and netbsd, though) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@157474 91177308-0d34-0410-b5e6-96231b3b80d8	2012-05-25 15:40:54 +00:00

1 2 3 4 5 ...

3672 Commits