Commit Graph

19 Commits

Author SHA1 Message Date
Sanjay Patel
4f00dcbf22 make the tested feature (SSE2) explicit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230881 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-28 23:55:24 +00:00
Sanjay Patel
be657e70b1 fixed to test only the feature, not the feature and a CPU
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230878 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-28 23:47:09 +00:00
Chandler Carruth
ac2b1a1bb3 [x86] Add support for bit-wise blending and use it in the v8 and v16
lowering paths. I'm going to be leveraging this to simplify a lot of the
overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes.

Sadly, this isn't profitable on v4i32 and v2i64. There, the float and
double blending instructions for pre-SSE4.1 are actually pretty good,
and we can't beat them with bit math. And once SSE4.1 comes around we
have direct blending support and this ceases to be relevant.

Also, some of the test cases look odd because the domain fixer
canonicalizes these to floating point domain. That's OK, it'll use the
integer domain when it matters and some day I may be able to update
enough of LLVM to canonicalize the other way.

This restores almost all of the regressions from teaching x86's vselect
lowering to always use vector shuffle lowering for blends. The remaining
problems are because the v16 lowering path is still doing crazy things.
I'll be re-arranging that strategy in more detail in subsequent commits
to finish recovering the performance here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229836 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 10:46:52 +00:00
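As a rough illustration of the bit-math blend described in the commit above: on pre-SSE4.1 targets a constant-mask blend is just (A & M) | (B & ~M). A minimal C++ sketch using SSE2 intrinsics (the v8i16 lane choice and the values are made up for illustration; this is not the backend's lowering code):

  #include <emmintrin.h>  // SSE2
  #include <cstdint>
  #include <cstdio>

  // result[i] = mask[i] ? a[i] : b[i], computed as (a & mask) | (b & ~mask)
  // with PAND / PANDN / POR -- three cheap SSE2 ops, no variable blend needed.
  static __m128i blend_v8i16(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
  }

  int main() {
    __m128i a = _mm_set1_epi16(1);
    __m128i b = _mm_set1_epi16(2);
    // All-ones lanes select 'a' (even lanes), zero lanes select 'b' (odd lanes).
    __m128i mask = _mm_set_epi16(0, -1, 0, -1, 0, -1, 0, -1);
    int16_t out[8];
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), blend_v8i16(a, b, mask));
    for (int i = 0; i < 8; ++i) printf("%d ", out[i]);  // 1 2 1 2 1 2 1 2
    printf("\n");
    return 0;
  }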
Chandler Carruth
a8fb39af83 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
custom VSELECT lowering for all of the vector types where we support
lowering blends as shuffles. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, I know, this is confusing, but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-19 10:36:19 +00:00
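As a side note on the "2 shufps with no memory loads instead of 2 andps with memory loads and an orps" improvement mentioned above, here is a minimal C++ sketch with SSE intrinsics; the particular blend pattern <a0, b1, a2, b3> is an assumption chosen for illustration, not taken from the patch:

  #include <xmmintrin.h>  // SSE
  #include <cstdio>

  // Two SHUFPS ops realize the blend <a0, b1, a2, b3> without loading a
  // constant mask: first gather [a0, a2, b1, b3], then reorder within it.
  static __m128 blend_with_shufps(__m128 a, __m128 b) {
    __m128 t = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 2, 0));  // [a0, a2, b1, b3]
    return _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 1, 2, 0));      // [a0, b1, a2, b3]
  }

  int main() {
    __m128 a = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
    __m128 b = _mm_setr_ps(10.f, 11.f, 12.f, 13.f);
    float out[4];
    _mm_storeu_ps(out, blend_with_shufps(a, b));
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 0 11 2 13
    return 0;
  }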
Andrew Trick
c4ae8cbc5d X86 ABI fix for return values > 24 bytes.
The return value's address must be returned in %rax; i.e., the callee needs
to copy the sret argument (%rdi) into the return value (%rax).

This probably won't manifest as a bug when the caller is LLVM-compiled
code. But it is an ABI guarantee and tools expect it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228321 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:09:05 +00:00
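For readers unfamiliar with the rule being fixed: under the SysV x86-64 ABI, an aggregate too large to come back in registers is returned through a hidden sret pointer, which the caller passes in %rdi and which the callee must also hand back in %rax. A minimal C++ sketch (the struct name and fields are made up for illustration):

  struct Big { long a, b, c, d; };  // 32 bytes: returned via a hidden sret pointer

  Big make_big(long x) {
    // Compiled for x86-64, the caller passes the result slot's address in %rdi
    // (so 'x' arrives in %rsi). Besides storing the fields through that pointer,
    // the callee must also leave the same address in %rax when it returns.
    Big r{x, x + 1, x + 2, x + 3};
    return r;
  }

  int main() { return static_cast<int>(make_big(1).d - 4); }  // returns 0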
Chandler Carruth
b0589710cc [x86] Give movss and movsd execution domains in the x86 backend.
This associates movss and movsd with the packed single and packed double
execution domains (resp.). While this is largely cosmetic, as we now
don't have weird ping-pong-ing between single and double precision, it
is also useful because it keeps the domain fixing algorithm from seeing
domain breaks that don't actually exist. It will also be much more
important if we have an execution domain default other than packed
single, as that would cause us to mix movss and movsd with integer
vector code on a regular basis, a very bad mixture.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228135 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:58:53 +00:00
Chandler Carruth
424a198c30 [x86] Mechanically update a bunch of tests' check lines using the latest
version of the script.

Changes include:
- Using the VEX prefix
- Skipping more detail when we have useful shuffle comments to match
- Matching more shuffle comments that have been added to the printer
  (yay!)
- Matching the destination registers of some AVX instructions
- Stripping trailing whitespace that crept in
- Fixing indentation issues

Nothing interesting going on here. I'm just trying really hard to ensure
these changes don't show up in the diffs with actual changes to the
backend.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228132 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 10:46:53 +00:00
Tim Northover
f5f8a3e6a6 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226740 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 23:17:19 +00:00
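The fold is a plain Boolean identity, (X & M) | (X & N) == X & (M | N), which holds bit-for-bit because AND distributes over OR. A trivial C++ self-check (the constants are arbitrary; the real combine of course runs on DAG nodes, not scalars):

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t X = 0x12345678u, M = 0x00ff00ffu, N = 0x0f0f0f0fu;
    // (or (and X, M), (and X, N)) == (and X, (or M, N))
    assert(((X & M) | (X & N)) == (X & (M | N)));
    return 0;
  }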
Tim Northover
c49e57ade1 Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet and was only sitting in my local copy.

This reverts commit r226663

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226665 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:48:52 +00:00
Tim Northover
47f47f5d2a DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226663 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-21 15:43:28 +00:00
Craig Topper
04a45948a0 Improve the logic that decides if it's profitable to commute when some of the virtual registers involved have use/def chains connecting them to physical registers. Fix up the tests that this change improves.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221336 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-05 06:43:02 +00:00
Chandler Carruth
03a77831cc [x86] Enable the new vector shuffle lowering by default.
Update the entire regression test suite for the new shuffles. Remove
most of the old testing which was devoted to the old shuffle lowering
path and is no longer really relevant. Also remove a few other random
tests that only really exercised shuffles, and only incidentally or without
any interesting aspects to them.

Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.

When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.

It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.

There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).

Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]

I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219046 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-04 03:52:55 +00:00
Chandler Carruth
2470f448ac [x86] Regenerate precise FileCheck lines for the last batch of test
cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218954 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 01:57:38 +00:00
Benjamin Kramer
daada81e5c DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments.
PR20677

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 13:28:02 +00:00
Filipe Cabecinhas
d99cefbad1 Convert a vselect into a concat_vector if possible
Summary:
If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added a test.

The ConvertSelectToConcatVector function assumes it doesn't get vselects with
arguments of, for example, <undef, undef, true, true>. Those are taken
care of by the checks above its call.

Reviewers: nadav, delena, grosbach, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 23:03:11 +00:00
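A scalar C++ sketch of the fold above (illustrative only, with made-up values): when both select operands are concatenations and the constant condition takes the low half entirely from one and the high half entirely from the other, the per-lane select collapses into a single concatenation.

  #include <array>
  #include <cassert>

  int main() {
    std::array<int, 2> A{1, 2}, B{3, 4}, C{5, 6}, D{7, 8};
    std::array<int, 4> X{A[0], A[1], B[0], B[1]};        // X = concat(A, B)
    std::array<int, 4> Y{C[0], C[1], D[0], D[1]};        // Y = concat(C, D)
    std::array<bool, 4> Cond{true, true, false, false};  // low half from X, high from Y
    std::array<int, 4> Sel{};
    for (int i = 0; i < 4; ++i) Sel[i] = Cond[i] ? X[i] : Y[i];
    // The vselect is exactly concat(A, D); no per-lane selection is needed.
    assert((Sel == std::array<int, 4>{A[0], A[1], D[0], D[1]}));
    return 0;
  }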
Andrea Di Biagio
825b93b2df [X86] Teach how to combine a vselect into a movss/movsd
Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.

If the vector type of the input vselect dag node is either MVT::v4i32 or
MVT::v4f32, then try to fold according to the following rules:

  1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
  2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)

If the vector type of the input vselect dag node is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to the following rules:

  3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
  4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199683 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-20 19:35:22 +00:00
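To make rule 1 above concrete (reading the build_vector left-to-right as lanes 0..3): a condition of <0, -1, -1, -1> takes lane 0 from B and lanes 1..3 from A, which is exactly what the register form of MOVSS produces. A minimal C++ sketch using the _mm_move_ss intrinsic (illustrative only; the actual combine builds a target DAG node):

  #include <xmmintrin.h>  // SSE
  #include <cstdio>

  int main() {
    __m128 A = _mm_setr_ps(0.f, 1.f, 2.f, 3.f);
    __m128 B = _mm_setr_ps(10.f, 11.f, 12.f, 13.f);
    __m128 R = _mm_move_ss(A, B);  // [B0, A1, A2, A3] -- matches rule 1 above
    float out[4];
    _mm_storeu_ps(out, R);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);  // 10 1 2 3
    return 0;
  }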
Andrea Di Biagio
638e97f135 Teach the DAGCombiner how to fold 'vselect' dag nodes according
to the following two rules:
  1) fold (vselect (build_vector AllOnes), A, B) -> A
  2) fold (vselect (build_vector AllZeros), A, B) -> B



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198777 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-08 18:33:04 +00:00
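These two folds are just the lane-wise identities select(true, A, B) == A and select(false, A, B) == B; a trivial scalar C++ restatement, for illustration only:

  #include <cassert>

  int main() {
    int A = 7, B = 9;
    bool AllOnes = true, AllZeros = false;
    assert((AllOnes ? A : B) == A);   // rule 1: all-ones condition picks A
    assert((AllZeros ? A : B) == B);  // rule 2: all-zeros condition picks B
    return 0;
  }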
NAKAMURA Takumi
d8e67feaf2 llvm/test/CodeGen/X86/vselect.ll: Unbreak Windows x64 targets by adding -mtriple=x86_64-unknown-unknown.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198114 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-28 13:04:29 +00:00
Andrea Di Biagio
b2f47c6a34 Teach DAGCombiner how to fold a SIGN_EXTEND_INREG of a BUILD_VECTOR of
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.

For example, given the following sequence of dag nodes:

  i32 C = Constant<1>
  v4i32 V = BUILD_VECTOR C, C, C, C
  v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1

The SIGN_EXTEND_INREG node can be folded into a build_vector since
the input vector is a BUILD_VECTOR of constants.

The optimized sequence is:

  i32 C = Constant<-1>
  v4i32 Result = BUILD_VECTOR C, C, C, C



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198084 91177308-0d34-0410-b5e6-96231b3b80d8
2013-12-27 20:20:28 +00:00
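A small C++ sketch of why the constant folds to -1 in the example above (illustrative; the helper name is made up): sign-extending the 1-bit value 1 within a 32-bit register replicates its sign bit, so the i1 lane value Constant<1> becomes Constant<-1>.

  #include <cassert>
  #include <cstdint>

  // Sign-extend the low 'FromBits' bits of x to a full 32-bit value.
  // Assumes arithmetic right shift for negatives, as on mainstream compilers.
  static int32_t sext_inreg(int32_t x, int FromBits) {
    int Shift = 32 - FromBits;
    return static_cast<int32_t>(static_cast<uint32_t>(x) << Shift) >> Shift;
  }

  int main() {
    assert(sext_inreg(1, 1) == -1);  // the i1 lane value 1 becomes -1, as above
    assert(sext_inreg(1, 8) == 1);   // a wider source width leaves 1 unchanged
    return 0;
  }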