llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-02-23 20:29:30 +00:00

Author	SHA1	Message	Date
Simon Pilgrim	c00cd8e3c8	[X86] Memory folding for commutative instructions. This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too. Differential Revision: http://reviews.llvm.org/D5701 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219584 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-12 10:52:55 +00:00
Simon Pilgrim	b6e1b30957	Test commit access (email fix) Indentation tidyup. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219577 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-11 20:28:56 +00:00
Benjamin Kramer	fedd0e2a21	MC: Bit pack MCSymbolData. On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219573 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-11 15:07:21 +00:00
Simon Pilgrim	195be17c96	Test commit access Fix comment typo + spelling. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219572 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-11 14:23:36 +00:00
Chandler Carruth	f2f5070d79	Don't use an unqualified 'abs' function call with a builtin type. This is dangerous for numerous reasons. The primary risk here is with floating point or double types where if the wrong header files are included in a strange order this can implicitly convert to integers and then call the C abs function on the integers. There is a secondary risk that even impacts integers where if the namespace the code is written in ever defines an abs overload for types within that namespace the global abs will be hidden. The correct form is to call std::abs or write 'using std::abs' for builtin types (and only the latter is correct in any generic context). I've also added the requisite header to be a bit more explicit here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219484 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-10 08:27:19 +00:00
Robert Khasanov	340b5b9ad7	[AVX512] Extended avx512_binop_rm for AVX512VL subsets. Added avx512_binop_rm_vl multiclass for VL subset Added encoding tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219390 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-09 08:38:48 +00:00
Adam Nemet	8088af2547	[AVX512] Rename AVX512_masking* to AVX512_maskable* No functional change. This is the current AVX512_maskable multiclass hierarchy: maskable_custom / \ / \ maskable_common maskable_in_asm / \ / \ maskable maskable_3src git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219363 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:25:39 +00:00
Adam Nemet	fbd0e464dd	[AVX512] Intrinsics for vextract*x4 This adds the Pat<>'s for the intrinsics. These are necessary because we don't lower these intrinsics to SDNodes but match them directly. See the rational in the previous commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219362 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:25:37 +00:00
Adam Nemet	e868005a27	[AVX512] Add asm-only support for vextractx4 masking variants These derive from the new asm-only masking definitions. Unfortunately I wasn't able to find a ISel pattern that we could legally generate for the masking variants. The problem is that since the destination is v4 we would need VK4 register classes and v4i1 value types to express the masking. These are however not legal types/classes in AVX512f but only in VL, so things get complicated pretty quickly. We can revisit this question later if we have a more pressing need to express something like this. So the ISel patterns are empty for the masking instructions and the next patch will add Pat<>s instead to match the intrinsics calls with instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219361 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:25:33 +00:00
Adam Nemet	9d0ec9212b	[AVX512] Move DAG for all-zero node to X86VectorVTInfo No functional change. No change in X86.td.expanded except for the appearance of the new attributes. The new attributes will be used in the subsequent patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219360 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:25:31 +00:00
Adam Nemet	6feb834941	[AVX512] Peel off an asm-only class from AVX512_masking_common. No functional change. This enables the generation of masking instructions that don't provide a ISel pattern. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219358 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:25:23 +00:00
Robin Morisset	b79d91ca1c	[X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219357 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 23:16:23 +00:00
Robin Morisset	48dfa127d7	[X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) Summary: I had forgotten to check for NotSlowIncDec in the patterns that can generate inc/dec for the above pattern (added in D4796). This currently applies to Atom Silvermont, KNL and SKX. Test Plan: New checks on atomic_mi.ll Reviewers: jfb, nadav Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5677 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219336 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 19:38:18 +00:00
Robert Khasanov	0e3754615e	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ} Added CMP_MASK_CC intrinsic type. Added tests for intrinsics. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219316 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 15:49:26 +00:00
Robert Khasanov	e659ba92c8	[AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable. No functional change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219310 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 14:37:45 +00:00
Eric Christopher	b7cd35b171	Cache TargetLowering on SelectionDAGISel and update previous calls to getTargetLowering() with the cached variable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219284 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-08 07:32:17 +00:00
Robin Morisset	28127ffb51	[X86] Fix a bug with fetch_add(INT32_MIN) Summary: Fix pr21099 The pseudocode of what we were doing (spread through two functions) was: if (operand.doesNotFitIn32Bits()) Opc.initializeWithFoo(); if (operand < 0) operand = -operand; if (operand.doesFitIn8Bits()) Opc.initializeWithBar(); else if (operand.doesFitIn32Bits()) Opc.initializeWithBlah(); doStuff(Opc); So for operand == INT32_MIN, Opc was never initialized because the operand changes from fitting in 32 bits to not fitting, causing the various bugs/error messages noted by pr21099. This patch adds an extra test at the beginning for this case, and an llvm_unreachable to have better error message if the operand ends up not fitting in 32-bits at the end. Test Plan: new test + make check Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5655 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219257 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-07 23:53:57 +00:00
Yuri Gorshenin	86e0844d1c	[asan-asm-instrumentation] CFI directives are generated for .S files. Summary: CFI directives are generated for .S files. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5520 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219199 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-07 11:03:09 +00:00
Craig Topper	95717dbb11	[X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't. Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if its a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if its present. Fixes PR21169. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219194 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-07 07:29:50 +00:00
Craig Topper	ada4703cba	Formatting fixes. Most putting 'else' on the same line as the preceding curly brace. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219193 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-07 07:29:48 +00:00
Craig Topper	038a3b8d65	Fix filename in header and use C++ version of the C header files. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219192 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-07 07:29:46 +00:00
Benjamin Kramer	043994e266	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219112 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-06 09:56:40 +00:00
Eric Christopher	e6f32be8df	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219106 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-06 06:45:36 +00:00
Chandler Carruth	1d02acb7a0	[x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219090 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-05 22:57:31 +00:00
Benjamin Kramer	88b3a52eec	X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219084 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-05 16:14:29 +00:00
Elena Demikhovsky	a0cb2c75b0	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows to broadacst mask vector to data vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219083 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-05 14:11:08 +00:00
Chandler Carruth	30ae72f02c	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bi to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219081 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-05 12:07:34 +00:00
Chandler Carruth	a644b090de	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219079 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-05 11:41:36 +00:00
Chandler Carruth	03a77831cc	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219046 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-04 03:52:55 +00:00
Chandler Carruth	1e663cf69c	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up in currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensible write to exercise this, but it will be covered by tests once I flip the switch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219044 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-04 02:51:03 +00:00
Chandler Carruth	cd2c1d8db1	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219042 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-04 01:05:48 +00:00
Chandler Carruth	f159de96bd	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetative. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219033 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 22:43:17 +00:00
Chandler Carruth	91ea3e41ae	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219022 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 21:38:49 +00:00
Adam Nemet	726942c8bb	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it create a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219009 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 20:00:34 +00:00
Chandler Carruth	dce98e6739	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218985 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 13:11:13 +00:00
Chandler Carruth	7ae6f2abf6	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218980 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 12:01:55 +00:00
Chandler Carruth	01b3858e66	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218977 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 11:25:58 +00:00
Chandler Carruth	53bf81ae59	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218974 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-03 10:11:39 +00:00
Adam Nemet	6955c9d1ac	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218928 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-02 23:18:30 +00:00
Adam Nemet	d9e2cc7fa0	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218927 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-02 23:18:28 +00:00
Adam Nemet	a9014e5530	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218926 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-02 23:18:26 +00:00
Juergen Ributzka	b3f91b0af7	[Stackmaps] Make ithe frame-pointer required for stackmaps. Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218920 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-02 22:21:49 +00:00
Chandler Carruth	bf21d40070	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218911 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-02 21:37:14 +00:00
Chandler Carruth	4bbf21e71e	[x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we could lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218839 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 23:14:28 +00:00
Sanjay Patel	2b918388ab	Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218822 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 21:20:06 +00:00
Eric Christopher	2e07dedce3	constify TargetMachine parameter for X86TargetLowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218804 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 20:38:22 +00:00
Sanjay Patel	72447214a6	Don't repeat function/variable name in comment. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218791 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 19:39:32 +00:00
Adrian Prantl	02474a32eb	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218787 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 18:55:02 +00:00
Adrian Prantl	10c4265675	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218782 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 18:10:54 +00:00
Adrian Prantl	076fd5dfc1	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 17:55:39 +00:00

1 2 3 4 5 ...

10926 Commits