llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-07-08 18:30:04 +00:00

Author	SHA1	Message	Date
Eric Christopher	74678a1ed1	Add a license header to the AVX512 file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229941 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 00:36:53 +00:00
Benjamin Kramer	1ce666d86c	Demote vectors to arrays. No functionality change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229861 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 15:26:17 +00:00
Chandler Carruth	b7012af85f	[x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16. This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229859 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 15:21:57 +00:00
Chandler Carruth	c57e90422f	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and/or registers in various ways, lets us handle another class of v4i32 shuffles pre-SSE4.1 without domain crosses, etc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229856 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 15:06:13 +00:00
Chandler Carruth	7f583a4201	[x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic. This code path was one of the more complex and bug prone when it first went in and it hasn't fared much better. Ultimately, with the simpler basis for unpack lowering and support for bit-math blending, this is completely obsolete. In the worst case without this we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb is available. This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229853 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 14:08:24 +00:00
Chandler Carruth	943b2ca2de	[x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path. This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we know we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends because we do at least have a bit-math-based blend if we need to use that. This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229851 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 13:56:49 +00:00
Chandler Carruth	c3d7858505	[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing them into permutes and a blend with the generic decomposition logic. This works really well in almost every case and lets the code only manage the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack based merging that this replaces. The only place where it isn't we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands. With this we're down to only really random permutations that cause an explosion of instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229849 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 13:15:12 +00:00
Chandler Carruth	3d4542ce3d	[x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities. This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyways and they terminate in exactly a blend. This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes sooooo much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is tiny. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now. For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229846 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 12:10:37 +00:00
Chandler Carruth	71164d08b1	[x86] The SELECT x86 DAG combine also does legalization. It used to rely on things not being marked as either custom or legal, but we now do custom lowering of more VSELECT nodes. To cope with this, manually replicate the legality tests here. These have to stay in sync with the set of tests used in the custom lowering of VSELECT. Ideally, we wouldn't do any of this combine-based-legalization when we have an actual custom legalization step for VSELECT, but I'm not going to be able to rewrite all of that today. I don't have a test case for this currently, but it was found when compiling a number of the test-suite benchmarks. I'll try to reduce a test case and add it. This should at least fix the test-suite fallout on build bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229844 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 11:43:37 +00:00
Michael Kuperstein	2b5910a767	Reverting r229831 due to multiple ARM/PPC/MIPS build-bot failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229841 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 11:38:11 +00:00
Elena Demikhovsky	675d06d1d0	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229837 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:48:04 +00:00
Chandler Carruth	ac2b1a1bb3	[x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths. I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support and this ceases to be relevant. Also, some of the test cases look odd because the domain fixer canonicalizes these to floating point domain. That's OK, it'll use the integer domain when it matters and some day I may be able to update enough of LLVM to canonicalize the other way. This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229836 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:46:52 +00:00
Chandler Carruth	a8fb39af83	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. 
With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:36:19 +00:00
Michael Kuperstein	23dd089d8f	Use std::bitset for SubtargetFeatures Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. Differential Revision: http://reviews.llvm.org/D7065 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229831 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 09:01:04 +00:00
Benjamin Kramer	e8a0a78bad	X86: Use bitset to manage a bag of bits. NFC. Doesn't matter in terms of memory usage or perf here, but it's a neat simplification. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229672 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 14:10:44 +00:00
Chandler Carruth	a5cc501201	[x86] Tighten the assertions to document that canonicalization has actually removed all but a very small number of choices for v2i64. Also remove dead code handling cases that simply cannot arise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229670 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 11:46:29 +00:00
Chandler Carruth	406928ebba	[x86] Switch an if which is trivially true to an assert. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229669 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 11:46:27 +00:00
Chandler Carruth	72cacedbb7	[x86] Remove some more 'bit' nomenclature from the generic shift lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229668 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 11:46:23 +00:00
Chandler Carruth	3378af8802	[x86] Fold together the two shift lowering strategies. They were doing quite literally the same work, we just need to special case the >64-bit element shift code emission to emit the byte shift instructions and offsets. This also makes reasoning about each of the vector lowering strategies easier as we don't have to remember to use both forms. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229662 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 10:40:38 +00:00
Chandler Carruth	4e8a4638e9	[x86] Refactor the bit shift code the same as I just did the byte shift code. While this didn't have the miscompile (it used MatchLeft consistently) it missed some cases where it could use right shifts. I've added a test case Craig Topper came up with to exercise the right shift matching. This code is really identical between the two. I'm going to merge them next so that we don't keep two copies of all of this logic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229655 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 09:19:58 +00:00
Elena Demikhovsky	87483ed180	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229645 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 07:59:20 +00:00
Chandler Carruth	c9520b48ae	[x86] Rewrite the byte shift detection to not use boolean variables to track state. I didn't like this in the code review because the pattern tends to be error prone, but I didn't see a clear way to rewrite it. Turns out that there were bugs here, I found them when fuzz testing our shuffle lowering for correctness on x86. The core of the problem is that we need to consistently test all our preconditions for the same directionality of shift and the same input vector. Instead, formulate this as two predicates (one doesn't depend on the input in any way), pass things like the directionality and input vector as inputs, and loop over the alternatives. This fixes a pattern of very rare miscompiles coming out of this code. Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz testing. The new code is over half a million test runs with no failures yet. I've also fuzzed every other function in the lowering code with over 3.5 million test cases and not discovered any other miscompiles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229642 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 07:13:48 +00:00
Craig Topper	ed42dcef75	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229640 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-18 06:24:44 +00:00
Andrea Di Biagio	b3ff6a88b6	[X86][FastIsel] Teach how to select scalar integer to float/double conversions. This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229589 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 23:40:58 +00:00
Sanjay Patel	4bf44517c8	rename variables again because these tables also deal with stores; NFC Suggestion by Simon Pilgrim git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229574 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 22:38:06 +00:00
Simon Pilgrim	cbc2ca5ec9	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229571 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 22:24:32 +00:00
Sanjay Patel	0be02ef5b1	Add comment to explain a non-obvious setting; NFC. This is paraphrased from Simon Pilgrim's comment in: http://reviews.llvm.org/D7492 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229566 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 22:09:54 +00:00
Sanjay Patel	c3a976c935	remove function names from comments; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229558 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 21:55:20 +00:00
Sanjay Patel	a3a63972c5	replace meaningless variable names; NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229549 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 21:37:28 +00:00
Sanjay Patel	544843cee1	prevent folding a scalar FP load into a packed logical FP instruction (PR22371) Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. That's what the hardware packed logical FP instructions define: 128-bit memory operands. There are no scalar versions of these instructions...because this is x86. Generating the wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole optimization pass and the load folding tables. We won't completely solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl() to make sure it isn't changing the size of the load. Differential Revision: http://reviews.llvm.org/D7474 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229531 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 20:08:21 +00:00
Benjamin Kramer	1a50a12b43	Prefer SmallVector::append/insert over push_back loops. Same functionality, but hoists the vector growth out of the loop. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229500 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 15:29:18 +00:00
Andrea Di Biagio	f1ad156ce0	[X86] Silence -Wsign-compare warnings. GCC 4.8 reported two new warnings due to comparisons between signed and unsigned integer expressions. The new warnings were accidentally introduced by revision 229480. Added explicit casts to silence the warnings. No functional change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229488 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 11:20:11 +00:00
Elena Demikhovsky	199f58a198	AVX-512: changes in intel_ocl_bi calling conventions - added mask types v8i1 and v16i1 to possible function parameters - enabled passing 512-bit vectors in standard CC - added a test for KNL intel_ocl_bi conventions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229482 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 09:20:12 +00:00
Michael Kuperstein	e275542046	[X86] Combine vector anyext + and into a vector zext Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229480 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 08:22:51 +00:00
Chandler Carruth	1e357351be	[x86] Teach the unpack lowering to try wider element unpacks. This allows it to match still more places where previously we would have to fall back on floating point shuffles or other more complex lowering strategies. I'm hoping to replace some of the hand-rolled unpack matching with this routine as it gets more and more clever. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229463 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 02:12:24 +00:00
Cameron McInally	cdddfe0cb3	[AVX512] Make 512b vector floating point rounds legal on AVX512. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229445 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 22:15:42 +00:00
Simon Pilgrim	0638f4e115	[X86][SSE] Add SSE MOVQ instructions to SSEPackedInt domain Patch to explicitly add the SSE MOVQ (rr,mr,rm) instructions to SSEPackedInt domain - prevents a number of costly domain switches. Differential Revision: http://reviews.llvm.org/D7600 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229439 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 21:50:56 +00:00
Craig Topper	4031c08c87	[X86] Remove the multiply by 8 that goes into the shift constant for X86ISD::VSHLDQ and X86ISD::VSRLDQ. This simplifies the pattern matching in isel and allows these nodes to become the patterns embedded in the instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229431 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 20:52:07 +00:00
Craig Topper	e124dc723b	[X86] Remove x86.avx2.psll.dq.bs and x86.avx2.psrl.dq.bs intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229430 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 20:51:59 +00:00
Aaron Ballman	987d1055d3	We require MSVC 1800 as our minimum, so these checks can safely go away; NFC. (It seems this code has been copy/pasted around, unfortunately.) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229417 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 18:34:57 +00:00
Chandler Carruth	cbe6ecfc81	[x86] Add a generic unpack-targeted lowering technique. This can be used to generically lower blends and is particularly nice because it is available from SSE2 onward. This removes a lot of the remaining domain crossing blends in SSE2 code. I'm hoping to replace some of the "interleaved" lowering hacks with something closer to this which should be more principled. First, this needs to learn how to detect and use other interleavings besides that of the natural type provided. That will be a follow-up patch though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229378 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 12:28:18 +00:00
Chandler Carruth	29679ccc12	[x86] Add initial basic support for forming blends of v16i8 vectors. This blend instruction is ... really lame. The register usage is insane. As a consequence this is probably only barely better than 2 pshufbs followed by a por, and that mostly because it only has to read from a single memory location. However, this doesn't fix as much as I kind of expected, so more to go. Pretty sure that the ordering and delegation of v16i8 is just really, really bad. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229373 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 10:58:23 +00:00
Chandler Carruth	8b1a5559e9	[x86] Switch my usage of VariadicFunction to a "normal" variadic template now that we can use them. This is, of course, horribly ugly because of the required recursive formulation. Suggestions for making it less ugly welcome. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229367 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 09:59:48 +00:00
Craig Topper	74b9ad3485	[X86] Add support for lowering shuffles to 256-bit PALIGNR instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229359 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 06:29:06 +00:00
Chandler Carruth	454c3997b4	[x86] Teach the 128-bit vector shuffle lowering routines to take advantage of the existence of a reasonable blend instruction. The 256-bit vector shuffle lowering has leveraged the general technique of decomposed shuffles and blends for quite some time, but this never made it back into the 128-bit code, and there are a large number of patterns where this is substantially better. For example, this removes almost all domain crossing in vector shuffles that involve some blend and some permutation with SSE4.1 and later. See the massive reduction in 'shufps' for integer test cases in this commit. This isn't perfect yet for a few reasons: 1) The v8i16 shuffle lowering continues to plague me. We don't always form an unpack-based blend when that would be better. But the wins pretty drastically outstrip the losses here. 2) The v16i8 shuffle lowering is just a disaster here. I never went and implemented blend support here for some terrible reason. I'll do that next probably. I've not updated it for now. More variations on this technique are coming as well -- we don't shuffle-into-unpack or shuffle-into-palignr, both of which would also be profitable. Note that some test cases grow significantly in the number of instructions, but I expect them to actually be faster. We use pshufd+pshufd+blendw instead of a single shufps, but the pshufd's are very likely to pipeline well (two ports on most modern Intel chips) and the blend is a very fast instruction. The domain switch penalty will essentially always be more than a blend instruction, which is the only increase in tree height. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229350 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 01:52:02 +00:00
Aaron Ballman	66981fe208	Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229340 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 22:54:22 +00:00
Simon Pilgrim	ef06a9c53a	Coding style fixes to recent patches. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229312 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 14:19:29 +00:00
Simon Pilgrim	28f299b62d	[X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2 This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets. It also fixes a tablegen issue that prevented the lowering of vpslldq/vpsrldq vec256 instructions. Differential Revision: http://reviews.llvm.org/D7596 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229311 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 13:19:52 +00:00
Chandler Carruth	fbde8bffba	[x86] Teach the decomposed shuffle/blend lowering to use an early blend when that will allow it to lower with a single permute instead of multiple permutes. It tries to detect when it will only have to do a single permute in either case to maximize folding of loads and such. This cuts a lot of the avx2 shuffle permute counts in half. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229309 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:42:15 +00:00
Chandler Carruth	3d39845812	[x86] Teach the shuffle mask equivalence test to look through build vectors and detect equivalent inputs. This lets the code match unpck-style instructions when only one of the inputs is lined up but the other input is a splat and so which lanes we pull from doesn't matter. Today, this doesn't really happen, but just by accident. I have a patch that normalizes how we shuffle splats, and with that patch this will be necessary for a lot of the mask equivalence tests to work. I don't really know how to write a test case for this specific change until the other change lands though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229307 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:07:55 +00:00
Chandler Carruth	23b34c287f	[x86] Tweak the ordering of unpack matching vs. element insertion, and don't try to do element insertion for non-zero-index floating point vectors. We don't have any useful patterns or lowering for element insertion into high elements of a floating point vector, and the generic shuffle lowering will end up being better -- namely it will fall back to unpck. But we should try to handle other forms of element insertion before matching unpck patterns. While this doesn't matter much right now, I'm working on a patch that makes unpck matching much more powerful, and that patch will break without this re-ordering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229306 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 12:01:14 +00:00
Chandler Carruth	52f1b6dbed	[x86] Stop shuffling zero vectors. =] I was somewhat surprised this pattern really came up, but it does. It seems better to just directly handle it than try to special case every place where we end up forming a shuffle that devolves to a shuffle of a zero vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229301 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 10:34:52 +00:00
Chandler Carruth	1a9c1dbe4d	[x86] Use a more helpful parenthesizing of these comparisons. Silences a -Wparentheses complaint from GCC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229300 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 10:15:20 +00:00
Chandler Carruth	46d3e580ed	[x86] When splitting 256-bit vectors into 128-bit vectors, don't extract subvectors from buildvectors. That doesn't really make any sense and it breaks all of the down-stream matching of buildvectors to cleverly lower shuffles. With this, we now get the shift-based lowering of 256-bit vector shuffles with AVX1 when we split them into 128-bit vectors. We also do much better on the zero-extension patterns, although there remains quite a bit of room for improvement here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229299 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 10:12:02 +00:00
Chandler Carruth	27acd682e0	[x86] Make computing the zeroable elements slightly more powerful, at least in theory. I don't actually have a test case that benefits from this, but theoretically, it could come up, and I don't want to try to think about whether this is the culprit or something else is, so I'd rather just make this code powerful. =/ Makes me sad that I can't really test it though. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229298 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 09:33:36 +00:00
Chandler Carruth	62ba2b29d8	[x86] Add a slight variation on some of the other generic shuffle lowerings -- one which decomposes into an initial blend followed by a permute. Particularly on newer chips, blends are handled independently of shuffles and so this is much less bottlenecked on the single port that floating point shuffles are executed with on Intel. I'll be adding this lowering to a bunch of other code paths in subsequent commits to handle still more places where we can effectively leverage blends when they're available in the ISA. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229292 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 08:26:30 +00:00
Craig Topper	9bb36ed8d8	[X86] Add assembly parser support for mnemonic aliases for AVX-512 vpcmp instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229287 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 07:13:48 +00:00
Craig Topper	09ea4e976b	[X86] Add assembler predicates for the rest of the AVX512 feature flags. This makes the assembly matching consistent across all AVX512 instructions. Without this we were allowing some AVX512 instructions to be parsed always, but not the foundation instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229280 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 04:54:55 +00:00
Craig Topper	e2f7231e45	[X86] Add the remaining 11 possible exact ModRM formats. This makes their encodings linear which can then be used to simplify some other code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229279 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 04:16:44 +00:00
Simon Pilgrim	6d5ee8a8b5	[X86][XOP] Enable commutation for XOP instructions Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes. This patch also sets the SSE domains of all the XOP instructions. Differential Revision: http://reviews.llvm.org/D7646 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229267 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 22:40:46 +00:00
Craig Topper	32f60795f5	[X86] Improve parsing support for AVX/SSE floating point compare instruction mnemonic aliases. They'll now print with the alias the parser received instead of converting to the explicit immediate form. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229266 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 21:54:03 +00:00
Simon Pilgrim	ee03ed8187	Line ending fix. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229256 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 13:27:53 +00:00
Duncan P. N. Exon Smith	894c8c514a	X86: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229214 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 01:59:52 +00:00
Ahmed Bougacha	6a50342499	[X86] Factor out the CMOV pseudo definitions. NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229206 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 01:36:53 +00:00
Sanjay Patel	fa1b3ba1f0	[SSE/AVX] Use multiclasses to reduce the mass of scalar math patterns; NFCI This takes the preposterous number of patterns in this section that were last added to in r219033 down to just plain obnoxious. With a little more work, we might get this down to just comical. I've added more test cases to the existing file that checks these patterns, but it seems that some of these patterns simply don't exist with today's shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229158 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 21:52:42 +00:00
Sanjay Patel	b7458cc63a	fix typos; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229155 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 21:07:22 +00:00
Chandler Carruth	417c5c172c	[PM] Remove the old 'PassManager.h' header file at the top level of LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229094 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 10:01:29 +00:00
Craig Topper	f3455f13a2	[X86] Add support for parsing and printing the mnemonic aliases for the XOP VPCOM instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229078 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 07:42:25 +00:00
Craig Topper	c5222f156f	Fix a typo in a comment. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229071 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 06:07:29 +00:00
Craig Topper	db9343fb40	[X86] Remove int_x86_sse2_psll_dq_bs and int_x86_sse2_psrl_dq_bs intrinsics. The builtins aren't used by clang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229069 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-13 06:07:24 +00:00
David Majnemer	73a92d5136	X86: Don't crash if we can't decode the pshufb mask Constant pool entries are uniqued by their contents regardless of their type. This means that a pshufb can have a shuffle mask which isn't a simple array of bytes. The code path which attempts to decode the mask didn't check for failure, causing PR22559. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228979 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 23:26:26 +00:00
Simon Pilgrim	00481c20de	Relaxed over-zealous alignment requirement for VEX-encoded AES instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228953 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 20:01:03 +00:00
Benjamin Kramer	d913d9d2c3	MathExtras: Bring Count(Trailing\|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228930 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 15:35:40 +00:00
Michael Kuperstein	fb107d8bf0	[X86] Call frame optimization - allow stack-relative movs to be folded into a push Since we track esp precisely, there's no reason not to allow this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228924 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 14:17:35 +00:00
Elena Demikhovsky	f41b8e3e49	AVX-512: Fixed the "test" operation for i1 type Using KORTESTW to compare an i1 value with zero was wrong since the instruction tests 16 bits. KORTESTW may be used with KSHIFTL+KSHIFTR that clear the 15 upper bits. I removed the (X86cmp i1, 0) pattern and now zero-extend i1 to i8 and then use TESTB. There are some cases where i1 is in the mask register and the upper bits are already zeroed. Then KORTESTW is the better solution, but that is a subject for optimization. Meanwhile, I'm fixing the correctness issue. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228916 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 08:40:34 +00:00
Michael Kuperstein	fd98d3be55	[X86] A heuristic to estimate the size impact for converting stack-relative parameter movs to pushes This gives a rough estimate of whether using pushes instead of movs is profitable, in terms of size. We go over all calls in the MachineFunction and compute: a) For each callsite that can not use pushes, the penalty of not having a reserved call frame. b) For each callsite that can use pushes, the gain of actually replacing the movs with pushes (and the potential penalty of having to readjust the stack). Differential Revision: http://reviews.llvm.org/D7561 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228915 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 08:36:35 +00:00
Michael Kuperstein	0686b8affc	[X86] Split information collection from actual transformation in call frame optimization This splits collecting information from actually performing the transformation, so that we can add a heuristic in between the two. NFC. Differential Revision: http://reviews.llvm.org/D7497 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228817 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-11 08:53:55 +00:00
David Majnemer	f2138c2df8	X86: @llvm.frameaddress should defer to SelectionDAG for Win CFI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228754 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 22:00:34 +00:00
David Majnemer	420f72a301	X86: Make @llvm.frameaddress work correctly with Windows unwind codes Simply loading or storing the frame pointer is not sufficient for Windows targets. Instead, create a synthetic frame object that we will lower later. References to this synthetic object will be replaced with the correct reference to the frame address. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228748 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 21:22:05 +00:00
David Majnemer	3163865f01	X86: Emit Win64 SaveXMM opcodes at the right offset in the right order Walk the instructions marked FrameSetup and consider any stores of XMM registers to the stack as needing a SaveXMM opcode. This fixes PR22521. Differential Revision: http://reviews.llvm.org/D7527 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228724 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 19:01:47 +00:00
Simon Pilgrim	c99d58d6c1	[X86][AVX2] Missing AVX2 memory folding instructions Added most of the missing vector folding patterns for AVX2 (as well as fixing the vpermpd and vpermq patterns) Differential Revision: http://reviews.llvm.org/D7492 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228688 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 13:22:57 +00:00
Simon Pilgrim	8bcc093da5	[X86][XOP] Added XOP memory folding patterns + tests This patch adds the complete AMD Bulldozer XOP instruction set to the memory folding pattern tables for stack folding, etc. Note: Many of the XOP instructions have multiple table entries as they can fold loads from different sources. Differential Revision: http://reviews.llvm.org/D7484 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228685 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 12:57:17 +00:00
Andrea Di Biagio	bd1729e5d4	[X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX. This patch teaches X86FastISel how to select AVX instructions for scalar float/double convert operations. Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float). For example: \code define double @foo(float %f) { %conv = fpext float %f to double ret double %conv } \end code Before (with -mattr=+avx -fast-isel) X86FastIsel selected a CVTSS2SDrr which is legacy SSE: cvtss2sd %xmm0, %xmm0 With this patch, X86FastIsel selects a VCVTSS2SDrr instead: vcvtss2sd %xmm0, %xmm0, %xmm0 Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants. Differential Revision: http://reviews.llvm.org/D7438 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228682 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 12:04:41 +00:00
Craig Topper	77b557430c	[X86] Preserve mem refs on newly created 'Store' node instead of 'Load' node when handling store unfolding. Bug spotted by Steve King. I have no idea how to test this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228672 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 06:29:28 +00:00
Craig Topper	5fc4b96e62	[X86] Remove unnecessary alignment checks from the load folding tables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228671 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 05:10:50 +00:00
David Majnemer	69114ee016	X86: Emit an ABI compliant prologue and epilogue for Win64 Win64 has specific constraints on what valid prologues and epilogues look like. This constraint is born from the flexibility and descriptiveness of Win64's unwind opcodes. Prologues previously emitted by LLVM could not be represented by the unwind opcodes, preventing operations powered by stack unwinding from working successfully. Differential Revision: http://reviews.llvm.org/D7520 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228641 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 00:57:42 +00:00
Sanjay Patel	50c61d2569	rename variable to give it some meaning; remove obvious comments; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228579 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-09 16:30:58 +00:00
Sanjay Patel	eed74400b1	fix comment that didn't match the code; remove unnecessary braces; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228578 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-09 16:04:52 +00:00
Craig Topper	cc5e6d56fc	[X86] Remove 256-bit and 512-bit memop pattern fragments. They are no longer used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228563 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-09 04:04:53 +00:00
Craig Topper	bd477dfbbf	[X86] Remove 'memop' uses from AVX512. Use 'load' instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228562 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-09 04:04:50 +00:00
Craig Topper	3824fd3a25	[X86] Remove the remaining uses of memop from AVX and AVX2 instruction patterns. AVX and AVX2 can handle unaligned loads being folded so we can just use 'load' git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228551 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-08 22:38:25 +00:00
Sanjay Patel	b3d4cc50ca	fix typos; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228529 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-08 18:54:22 +00:00
Simon Pilgrim	2ba70e81a4	Moved AVX2 vbroadcast (reg) instruction foldings under the correct grouping. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228526 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-08 17:13:54 +00:00
Craig Topper	3e7edda4aa	[X86] Add register use/def for wrmsr and rdmsr. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228515 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-07 23:36:51 +00:00
Craig Topper	e15d286e83	[X86] Add GETSEC instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228514 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-07 23:36:36 +00:00
Simon Pilgrim	2134ae7f38	[X86][AVX] Added missing stack folding support + test for vptest ymm instruction git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228509 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-07 21:44:06 +00:00
Andrea Di Biagio	0e0dfd99f9	Fix typos; NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228493 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-07 13:56:20 +00:00
Sanjay Patel	af0a07822e	use local variables; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228452 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-06 22:43:52 +00:00
Reid Kleckner	6dc42dd2da	Don't dllexport declarations Fixes PR22488 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228411 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-06 17:59:49 +00:00
Benjamin Kramer	e003f1ac8c	Make helper functions/classes/globals static. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228410 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-06 17:51:54 +00:00

11381 Commits