2832 Commits

Author SHA1 Message Date
Chandler Carruth
8415f84e49 [x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle
detection. It was incorrectly handling undef lanes by actually treating
an undef lane in the first 128-bit lane as a *numeric* shuffle value.

Fortunately, this almost always did the right thing and merely disabled
the repeated-pattern detection. But not always. =/ This patch introduces
a much more
principled approach and fixes the miscompiles I spotted by inspection
previously.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-24 01:03:57 +00:00
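
For context, the detection in question checks whether a 256-bit shuffle mask
applies the same 4-element pattern in both 128-bit lanes, treating undef
entries as wildcards rather than as concrete indices. A minimal standalone
sketch of the idea (a hypothetical helper, not the actual LLVM code; the real
helper handles more widths and inputs):

    #include <vector>

    // Sketch: does this 8-element (two 128-bit lanes of 4 floats) shuffle mask
    // repeat the same 4-element pattern in both lanes? Undef entries (-1) must
    // act as wildcards, not as numeric shuffle values -- mishandling them was
    // the bug described above.
    static bool isRepeatedLaneMask(const std::vector<int> &Mask,
                                   std::vector<int> &RepeatedMask) {
      const int LaneSize = 4;
      RepeatedMask.assign(LaneSize, -1);
      for (int i = 0, e = Mask.size(); i != e; ++i) {
        if (Mask[i] < 0)
          continue;                           // undef: constrains nothing
        if (Mask[i] / LaneSize != i / LaneSize)
          return false;                       // crosses a 128-bit lane
        int LocalIdx = Mask[i] % LaneSize;    // index within its lane
        if (RepeatedMask[i % LaneSize] < 0)
          RepeatedMask[i % LaneSize] = LocalIdx;  // first concrete constraint
        else if (RepeatedMask[i % LaneSize] != LocalIdx)
          return false;                       // lanes disagree: not repeated
      }
      return true;
    }
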
Chandler Carruth
30ce74b5e3 [x86] Teach the new vector shuffle lowering to lower v4i64 vector
shuffles using the AVX2 instructions. This is the first step of cutting
in real AVX2 support.

Note that I have spotted at least one bug in the test cases already, but
I suspect it was already present and just is getting surfaced. Will
investigate next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:39:02 +00:00
Chandler Carruth
798f2849c3 [x86] Teach the rest of the 'target shuffle' machinery about blends and
add VPBLENDD to the InstPrinter's comment generation so we get nice
comments everywhere.

Now that we have the nice comments, I can see the bug introduced by
a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay
tests that are easy to inspect.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 22:14:14 +00:00
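
As background, the shuffle comments referenced here render an instruction's
blend immediate as a human-readable lane mask. A hedged sketch of how a
VPBLENDD-style immediate can be decoded into such a comment (illustrative
only, not the InstPrinter code):

    #include <cstdio>

    // Sketch: decode a 4-lane VPBLENDD immediate into a shuffle-mask comment.
    // Bit i of the immediate selects lane i from the second source (printed
    // as indices 4..7); a clear bit keeps lane i of the first source (0..3).
    static void printBlendComment(unsigned Imm) {
      std::printf("# xmm = xmm[");
      for (unsigned i = 0; i != 4; ++i)
        std::printf("%s%u", i ? "," : "", (Imm & (1u << i)) ? i + 4 : i);
      std::printf("]\n");
    }
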
Robin Morisset
30e7514d01 [X86] Make wide loads be managed by AtomicExpand
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.

Only one functional change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).

Test Plan: make check-all (lots of tests for this functionality already exist)

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:59:25 +00:00
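
The core trick for a wide load on a CmpXchg architecture is that a
compare-exchange whose desired value equals its expected value never changes
memory but always returns the current contents. A hedged C++ sketch of the
idea (assumes a toolchain where 128-bit std::atomic is available, e.g.
GCC/Clang with libatomic; not the AtomicExpand code itself):

    #include <atomic>

    // Sketch: emulate a 128-bit atomic load on hardware that has a wide
    // compare-exchange (e.g. cmpxchg16b) but no wide atomic load.
    unsigned __int128 wideAtomicLoad(std::atomic<unsigned __int128> &Addr) {
      unsigned __int128 Expected = 0;
      // If *Addr == 0 this stores 0 (a no-op); otherwise it fails and writes
      // the current value into Expected. Either way we learn the contents.
      // Caveat: unlike a genuine load, the success path performs a store,
      // so this cannot be used on read-only memory.
      Addr.compare_exchange_strong(Expected, Expected);
      return Expected;
    }
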
Chandler Carruth
7024c7e949 [x86] Teach the new shuffle lowering's blend functionality to use AVX2's
VPBLENDD where appropriate even on 128-bit vectors.

According to Agner's tables, this instruction is significantly higher
throughput (can execute on any port) on Haswell chips so we should
aggressively try to form it when available.

Sadly, this loses our delightful shuffle comments. I'll add those back
for VPBLENDD next.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 18:16:12 +00:00
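
For readers without Agner Fog's tables at hand: VPBLENDD is an integer-domain
dword blend that, unlike most shuffles, can execute on any of Haswell's
vector ALU ports. A usage sketch via the corresponding intrinsic (assumes
AVX2 and <immintrin.h>):

    #include <immintrin.h>

    __m128i blendDwords(__m128i A, __m128i B) {
      // VPBLENDD: bit i of the immediate picks dword i from B, else from A.
      // 0b1010 -> result = { A[0], B[1], A[2], B[3] }.
      return _mm_blend_epi32(A, B, 0b1010);
    }
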
Chandler Carruth
4850be49a3 [x86] Teach the vector comment parsing and printing to correctly handle
undef in the shuffle mask. This shows up when we're printing comments
during lowering and we still have an IR-level constant hanging around
that models undef.

A nice consequence of this is *much* prettier test cases where the undef
lanes actually show up as undef rather than as a particular set of
values. This also allows us to print shuffle comments in cases that use
undef such as the recently added variable VPERMILPS lowering. Now those
test cases have nice shuffle comments attached with their details.

The shuffle lowering for PSHUFB has been augmented to use undef, and the
shuffle combining has been augmented to comprehend it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 11:15:19 +00:00
Chandler Carruth
8f637786d8 [x86] Teach the AVX1 path of the new vector shuffle lowering one more
trick that I missed.

VPERMILPS has a non-immediate memory operand mode that allows it to do
asymmetric shuffles in the two 128-bit lanes. Use this rather than two
shuffles and a blend.

However, it turns out the variable shuffle path to VPERMILPS (and
VPERMILPD, although that one offers no functional difference from the
immediate operand other than variability) wasn't even plumbed through
codegen. Do such plumbing so that we can reasonably emit
a variable-masked VPERMILP instruction. Also plumb basic comment parsing
and printing through so that the tests are reasonable.

There are still a few tests which don't show the shuffle pattern. These
are tests with undef lanes. I'll teach the shuffle decoding and printing
to handle undef mask entries in a follow-up. I've looked at the masks
and they seem reasonable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 10:08:29 +00:00
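
The variable-mask form described here takes the shuffle selector from a
register or memory operand instead of an immediate, which is what allows the
two 128-bit lanes to use different patterns. A sketch via the matching AVX
intrinsic (assumes <immintrin.h>):

    #include <immintrin.h>

    __m256 asymmetricLaneShuffle(__m256 V) {
      // Variable VPERMILPS: the low 2 bits of each 32-bit selector pick an
      // element within that selector's own 128-bit lane, so the low and high
      // lanes can shuffle differently -- no immediate form can do that.
      __m256i Sel = _mm256_setr_epi32(3, 2, 1, 0,   // low lane: reverse
                                      0, 0, 2, 2);  // high lane: duplicate 0,2
      return _mm256_permutevar_ps(V, Sel);
    }
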
Chandler Carruth
4b365159bf [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218282 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:29:42 +00:00
Kaelyn Takata
cdc451b1ae Fix a "typo" from my previous commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:17:59 +00:00
Kaelyn Takata
1488ba63fe Silence unused variable warnings in the new stub functions that occur
when assertions are disabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218280 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 22:14:13 +00:00
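
The usual shape of this warning is a variable consumed only by an assert,
which compiles away under NDEBUG. A common fix, sketched:

    #include <cassert>

    void example(bool Cond) {
      bool Ok = Cond;  // only consumed by the assert below
      assert(Ok && "condition must hold");
      (void)Ok;        // silences -Wunused-variable when NDEBUG strips the assert
    }
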
Chandler Carruth
8571ae37ae [x86] Stub out the integer lowering of 256-bit vectors with AVX2
support. No interesting functionality yet, but this will let me
implement one vector type at a time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218277 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 21:45:57 +00:00
Sanjay Patel
6539887847 Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2).
We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, 
but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save
at least 8 bytes (up to 31 bytes) of constant pool data.

Differential Revision: http://reviews.llvm.org/D5347
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 18:54:01 +00:00
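
The size win comes from storing one scalar in the constant pool and splatting
it at run time instead of materializing the whole vector. Sketched with the
AVX broadcast intrinsic (illustrative only; the patch has the compiler
perform this transformation itself when optimizing for size):

    #include <immintrin.h>

    static const float kSplatScalar = 1.5f;  // 4 bytes of constant-pool data

    __m256 splatConstant() {
      // VBROADCASTSS loads 4 bytes and replicates them into all 8 lanes,
      // versus a full 32-byte constant-pool vector for a plain VMOVAPS load.
      return _mm256_broadcast_ss(&kSplatScalar);
    }
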
Pavel Chupin
25c57d5cfe [x32] Fix segmented stacks support
Summary:
Update segmented-stacks*.ll tests with x32 target case and make
corresponding changes to make them pass.

Test Plan: tests updated with x32 target

Reviewers: nadav, rafael, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5245

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 13:11:35 +00:00
Chandler Carruth
de95c380c7 [x86] Back out a bad choice about lowering v4i64 and pave the way for
a more sane approach to AVX2 support.

Fundamentally, there is no useful way to lower integer vectors in AVX.
None. We always end up with a VINSERTF128 in the end, so we might as
well eagerly switch to the floating point domain and do everything
there. This cleans up lots of weird and unlikely-to-be-correct
differences between integer and floating point shuffles when we only
have AVX1.

The other nice consequence is that by doing things this way we will make
it much easier to write the integer lowering routines as we won't need
to duplicate the logic to check for AVX vs. AVX2 in each one -- if we
actually try to lower a 256-bit vector as an integer vector, we have
AVX2 and can rely on it. I think this will make the code much simpler
and more comprehensible.

Currently, I've disabled *all* support for AVX2 so that we always fall
back to AVX. This keeps everything working rather than asserting. That
will go away with the subsequent series of patches that provide
a baseline AVX2 implementation.

Please note, I'm going to implement AVX2 *without access to hardware*.
That means I cannot correctness test this path. I will be relying on
those with access to AVX2 hardware to do correctness testing and fix
bugs here, but as a courtesy I'm trying to sketch out the framework for
the new-style vector shuffle lowering in the context of the AVX2 ISA.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-22 00:32:15 +00:00
Chandler Carruth
37bb4b0365 [x86] Teach the new vector shuffle lowering how to cleverly lower single
input v8f32 shuffles which are not 128-bit lane crossing but have
different shuffle patterns in the low and high lanes. This removes most
of the extract/insert traffic that was unnecessary and is particularly
good at lowering cases where only one of the two lanes is shuffled at
all.

I've also added a collection of test cases with undef lanes because this
lowering is somewhat more sensitive to undef lanes than others.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 23:46:13 +00:00
Chandler Carruth
974e872b03 [x86] With the stronger canonicalization of shuffles added in r218216,
the new vector shuffle lowering no longer needs to check both symmetric
forms of UNPCK patterns for v4f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218217 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:37:51 +00:00
Chandler Carruth
974542d7d8 [x86] Teach the new vector shuffle lowering to re-use the SHUFPS
lowering when it can use a symmetric SHUFPS across both 128-bit lanes.

This required making the SHUFPS lowering tolerant of other vector types,
and adjusting our canonicalization to canonicalize harder.

This is the last of the clever uses of symmetry I've thought of for
v8f32. The rest of the tricks I'm aware of here are to work around
asymmetry in the mask.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:35:14 +00:00
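
256-bit SHUFPS applies one 4-element selector independently in each 128-bit
lane, which is why only masks that are symmetric across the lanes can reuse
this lowering. A sketch with the intrinsic (assumes AVX):

    #include <immintrin.h>

    __m256 symmetricShufps(__m256 A, __m256 B) {
      // In each 128-bit lane: take elements 2,0 of A then 3,1 of B.
      // The same pattern must hold in both lanes -- SHUFPS cannot vary it.
      return _mm256_shuffle_ps(A, B, _MM_SHUFFLE(1, 3, 0, 2));
    }
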
Chandler Carruth
38e181630a [x86] Refactor the logic to form SHUFPS instruction patterns to lower
a generic vector shuffle mask into a helper that isn't specific to the
other things that influence which choice is made or the specific types
used with the instruction.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218215 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 13:03:00 +00:00
Chandler Carruth
1a5f7f54f4 [x86] Teach the new vector shuffle lowering the basics about insertion
of a single element into a zero vector for v4f64 and v4i64 in AVX.
Ironically, there is less to see here because xor+blend is so crazy fast
that we can't really beat that to zero the high 128-bit lane.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:49:46 +00:00
Chandler Carruth
6ef31b0079 [x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and
UNPCKHPS with AVX vectors by recognizing those patterns when they are
repeated for both 128-bit lanes.

With this, we now generate the exact same (really nice) code for
Quentin's avx_test_case.ll which was the most significant regression
reported for the new shuffle lowering. In fact, I'm out of specific test
cases for AVX lowering, the rest were AVX2 I think. However, there are
a bunch of pretty obvious remaining things to improve with AVX...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:20:44 +00:00
Chandler Carruth
7922d3e39a [x86] Begin teaching the new vector shuffle lowering among the most
important bits of cleverness: to detect and lower repeated shuffle
patterns between the two 128-bit lanes with a single instruction.

This patch just teaches it how to lower single-input shuffles that fit
this model using VPERMILPS. =] There is more that needs to happen here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 12:01:19 +00:00
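
When the repeated pattern has a single input, the immediate form of VPERMILPS
applies that pattern to both lanes in one instruction. Sketch (assumes AVX):

    #include <immintrin.h>

    __m256 repeatedLaneShuffle(__m256 V) {
      // Immediate VPERMILPS: the 4-element pattern 0,0,2,2 is applied to the
      // low and the high 128-bit lane alike -- one instruction, no blend.
      return _mm256_permute_ps(V, _MM_SHUFFLE(2, 2, 0, 0));
    }
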
Chandler Carruth
fdaf59e9b1 [x86] Explicitly lower to a blend early if it is trivial to do so for
v8f32 shuffles in the new vector shuffle lowering code.

This is very cheap to do and makes it much more clear that anything more
expensive but overlapping with this lowering should be selected
afterward (for example using AVX2's VPERMPS). However, no functionality
changed here as without this code we would fall through to create no-op
shuffles of each input and a blend. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218209 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:40:39 +00:00
Chandler Carruth
29720a4bad [x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct
VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows
down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the
reciprocal throughput on Ivy Bridge CPUs much like it does everywhere
for 128-bits. There isn't a downside, so just eagerly use this
instruction when it suffices.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 11:17:55 +00:00
Chandler Carruth
25089558f2 [x86] Switch the blend implementation to use a MVT switch rather than
awkward conditions. The readability improvement of this will be even
more important as I generalize it to handle more types.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218205 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 10:36:12 +00:00
Chandler Carruth
4127d76566 [x86] Remove some essentially lying comments from the v4f64 path of the
new vector shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218204 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 10:27:14 +00:00
Chandler Carruth
05a8a724e2 [x86] Fix a helper to reflect that what we actually care about is
128-bit lane crossings, not 'half' crossings. This came up in code
review ages ago, but I hadn't really addressed it. Also added some
documentation for the helper.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:25 +00:00
Chandler Carruth
291140b112 [x86] Teach the new vector shuffle lowering the first step toward more
actual support for complex AVX shuffling tricks. We can do independent
blends of the low and high 128-bit lanes of an avx vector, so shuffle
the inputs into place and then do the blend at 256 bits. This will in
many cases remove one blend instruction.

The next step is to permute the low and high halves in-place rather than
extracting them and re-inserting them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 09:35:22 +00:00
Chandler Carruth
ae464b2ba1 [x86] Teach the new vector shuffle lowering to use VPERMILPD for
single-input shuffles with doubles. This allows them to fold memory
operands into the shuffle, etc. This is just the analog to the v4f32
case in my prior commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 22:09:27 +00:00
Chandler Carruth
9c7ffd20df [x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS
instruction for single-vector floating point shuffles. This in turn
allows the shuffles to fold a load into the instruction which is one of
the common regressions hit with the new shuffle lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 20:52:07 +00:00
Chandler Carruth
c16105b078 [x86] Teach the v4f32 path of the new shuffle lowering to handle the
tricky case of single-element insertion into the zero lane of a zero
vector.

We can't just use the same pattern here as we do in every other vector
type because the general insertion logic can handle insertion into the
non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we
have INSERTPS that is a much better choice than the generic one for such
lowerings. But INSERTPS can do lots of other lowerings as well so
factoring its logic into the general insertion logic doesn't work very
well. We also can't just extract the core common part of the general
insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that
lower to MOVSS when they can) because VZEXT_MOVL is often *faster* than
a blend while INSERTPS is slower! So instead we do a restrictive
condition on attempting to use the generic insertion logic to narrow it
to those cases where VZEXT_MOVL won't need a shuffle afterward and thus
will do better than INSERTPS. Then we try blending. Then we go back to
INSERTPS.

This still doesn't generate perfect code for some silly reasons that can
be fixed by tweaking the td files for lowering VZEXT_MOVL to use
XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends
up in a register rather than a load from memory -- BLENDPSrr has twice
the reciprocal throughput of MOVSSrr. Don't you love this ISA?

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 04:15:22 +00:00
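
INSERTPS encodes a source element, a destination lane, and a zero mask in one
immediate, which is what makes it attractive for inserting into a zero
vector. Sketch (assumes SSE4.1):

    #include <immintrin.h>

    __m128 insertIntoZeroVector(__m128 X) {
      // INSERTPS immediate: bits 7:6 = source element, bits 5:4 = destination
      // lane, bits 3:0 = lanes to zero. 0x0E puts X[0] in lane 0 and zeroes
      // lanes 1-3, i.e. { X[0], 0, 0, 0 } in a single instruction.
      return _mm_insert_ps(X, X, 0x0E);
    }
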
Chandler Carruth
9ba9f1a7e6 [x86] Refactor the code for emitting INSERTPS to reuse the zeroable mask
analysis used elsewhere. This removes the last duplicate of this logic.
Also simplify the code here quite a bit. No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218176 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:57:01 +00:00
Chandler Carruth
cc62abbe39 [x86] Generalize the single-element insertion lowering to work with
floating point types and use it for both v2f64 and v2i64 single-element
insertion lowering.

This fixes the last non-AVX performance regression test case I've gotten
for the new vector shuffle lowering. There is obvious analogous
lowering for v4f32 that I'll add in a follow-up patch (because with
INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 03:32:25 +00:00
Chandler Carruth
8924ed3db4 [x86] Replace some duplicated logic reasoning about whether particular
vector lanes can be modeled as zero with a call to the new function that
computes a bit-vector representing that information.

No functionality changed here, but will allow doing more clever things
with the zero-test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218174 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-20 02:44:21 +00:00
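
The bit-vector in question records, for each output lane, whether that lane
is provably zero. A hedged standalone sketch under simplified assumptions
(hypothetical signature; the real analysis also inspects build_vector
constants rather than only whole-zero inputs):

    #include <cstdint>
    #include <vector>

    // Sketch: bit i is set when output lane i of the shuffle is known zero.
    // V1IsZero/V2IsZero say whether each whole input is a zero vector; mask
    // entries 0..Size-1 read input 1, Size..2*Size-1 read input 2.
    static uint32_t computeZeroableLanes(const std::vector<int> &Mask,
                                         bool V1IsZero, bool V2IsZero) {
      uint32_t Zeroable = 0;
      int Size = Mask.size();
      for (int i = 0; i != Size; ++i) {
        int M = Mask[i];
        if (M < 0)
          Zeroable |= 1u << i;  // undef lanes may be treated as zero
        else if (M < Size ? V1IsZero : V2IsZero)
          Zeroable |= 1u << i;  // lane comes from a known-zero input
      }
      return Zeroable;
    }
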
Chandler Carruth
f7ca3552ff [x86] Hoist a function up to the rest of the non-type-specific lowering
helpers, and re-flow the logic to use early exit and be a bit more
readable.

No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218155 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 21:52:10 +00:00
Chandler Carruth
401b720aa8 [x86] Hoist the actual lowering logic into a helper function to separate
it from the shuffle pattern matching logic.

Also cleaned up variable names, comments, etc. No functionality changed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218152 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 21:20:08 +00:00
Chandler Carruth
dc58d1e099 [x86] Fully generalize the zext lowering in the new vector shuffle
lowering to support both anyext and zext and to custom lower for many
different microarchitectures.

Using this allows us to get *exactly* the right code for zext and anyext
shuffles in all the vector sizes. For v16i8, the improvement is *huge*.
The new SSE2 test case is one I refused to add before this because it was
sooooo many instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218143 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 20:00:32 +00:00
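
A zext shuffle is one whose mask interleaves the source elements with
known-zero lanes; PMOVZX performs exactly that widening in one instruction
and can fold its load operand. Sketch (assumes SSE4.1):

    #include <immintrin.h>

    __m128i zextBytesToWords(__m128i V) {
      // PMOVZXBW: widens the low 8 bytes to 8 words, equivalent to the
      // byte-shuffle mask <0,z,1,z,2,z,3,z,...> (interleave with zero).
      return _mm_cvtepu8_epi16(V);
    }
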
Chandler Carruth
89436b4160 [x86] Recognize that we can use duplication to widen v16i8 shuffles due
to undef lanes as well as defined widenable lanes. This dramatically
improves the lowering we use for undef-shuffles in a zext-ish pattern
for SSE2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 09:45:21 +00:00
Chandler Carruth
ec1f7b1c87 [x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32
shuffles that are zext-ing.

Not a lot to see here; the undef lane variant is better handled with
pshufd, but this improves the actual zext pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218112 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 08:37:44 +00:00
Chandler Carruth
330aa6fd6b [x86] Add a dedicated lowering path for zext-compatible vector shuffles
to the new vector shuffle lowering code.

This allows us to emit PMOVZX variants consistently for patterns where
it is a viable lowering. This instruction is both fast and allows us to
fold loads into it. This only hooks the new lowering up for i16 and i8
element widths, mostly so I could manage the change to the tests. I'll
add the i32 one next, although it is significantly less interesting.

One thing to note is that we already had some tests for these patterns
but those tests had far less horrible instructions. The problem is that
those tests weren't checking the strict start and end of the instruction
sequence. =[ As a consequence something changed in the lowering making
us generate *TERRIBLE* code for these patterns in SSE2 through SSSE3.
I've consolidated all of the tests and spelled out the madness that we
currently emit for these shuffles. I'm going to try to figure out what
has gone wrong here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218102 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 06:07:49 +00:00
Chandler Carruth
72f0d9515e [x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate.
There is no purpose in using it for single-input shuffles as
pshufd is just as fast and doesn't tie the two operands. This removes
a substantial amount of wrong-domain blend operations in SSSE3 mode. It
also completes the usage of PALIGNR for integer shuffles and addresses
one of the test cases Quentin hit with the new vector shuffle lowering.

There is still the question of whether and when to use this for floating
point shuffles. It is faster than shufps or shufpd but in the integer
domain. I don't yet really have a good heuristic here for when to use
this instruction for floating point vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 09:00:25 +00:00
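
PALIGNR concatenates its two sources and extracts a byte-aligned 16-byte
window, so a blend whose mask is a rotation of the concatenated inputs maps
onto it directly. Sketch (assumes SSSE3):

    #include <immintrin.h>

    __m128i rotateBlend(__m128i A, __m128i B) {
      // PALIGNR: view A:B as one 32-byte value and shift right by 4 bytes.
      // Result = bytes 4..15 of B followed by bytes 0..3 of A -- a two-input
      // blend whose mask is a byte rotation, done in the integer domain.
      return _mm_alignr_epi8(A, B, 4);
    }
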
Chandler Carruth
3ff76847ba [x86] Initial step of teaching the new vector shuffle lowering about
PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where
it is completely unmatched. It also introduces the logic for detecting
rotation shuffle masks even in the presence of single input or blend
masks and arbitrarily undef lanes.

I've added fairly comprehensive tests for the matching logic in v8i16
because the tests at that size are much easier to write and manage.

I've not checked the SSE2 code generated for these tests because the
code is *horrible*. It is absolute madness. Testing it will just make
the test brittle without giving any interesting improvements in the
correctness confidence.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 04:11:29 +00:00
Pavel Chupin
780f7e2168 [x32] Fix function indirect calls
Summary: Zero-extend register to 64-bit for callq/jmpq.

Test Plan: 3 tests added

Reviewers: nadav, dschuff

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217942 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 07:09:23 +00:00
Robin Morisset
5c16c4e45a [X86] Use the generic AtomicExpandPass instead of X86AtomicExpandPass
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).

Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217928 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 00:06:58 +00:00
Chandler Carruth
07b445aff7 [x86] Remove a FIXME that doesn't make any sense. Only the lanes feeding
the blend that is matched by this are "used" in any sense, and so any
build_vector or other nodes feeding these will already drop other lanes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217855 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 02:16:42 +00:00
Chandler Carruth
2f21b7ec5c [x86] Cleanup an unused variable by actually using it in the non-asserts
place where it was needed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217854 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 02:14:51 +00:00
Chandler Carruth
2e363ece75 [x86] Remove the last vestiges of the BLENDI-based ADDSUB pattern
matching. This design just fundamentally didn't work because ADDSUB is
available prior to any legal lowerings of BLENDI nodes. Instead, we have
a dedicated ADDSUB synthetic ISD node which is pattern matched trivially
into the instructions. These nodes are then recognized by both the
existing and a trivial new lowering combine in the backend. Removing
these patterns required adding 2 missing shuffle masks to the DAG
combine, without which tests would have failed. Added the masks and
a helpful assert as well to catch if anything ever goes wrong here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217851 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:39:08 +00:00
Chandler Carruth
bad2c13aae [x86] As a follow-up to r217819, don't check for VSELECT legality now
that we don't use VSELECT and directly emit an addsub synthetic node.
Also remove a stale comment referencing VSELECT.

The test case is updated to use 'core2' which only has SSE3, not SSE4.1,
and it still passes. Previously it would not because we lacked
sufficient blend support to legalize the VSELECT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217849 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:24:42 +00:00
Chandler Carruth
cba9d1273a [x86] Add the beginnings of a proper DAG combine to match ADDSUBPS and
ADDSUBPD nodes out of blends of adds and subs.

This allows us to actually form these instructions with SSE3 rather than
only forming them when we had both SSE3 for the ADDSUB instructions and
SSE4.1 for the blend instructions. ;] Kind-of important.

I've adjusted the CPU requirements on one of the tests to demonstrate
this kicking in nicely for an SSE3 cpu configuration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217848 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-16 00:15:20 +00:00
Chandler Carruth
fa6cf7e73c [x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by
introducing a synthetic X86 ISD node representing this generic
operation.

The relevant patterns for mapping these nodes into the concrete
instructions are also added, and a gnarly bit of C++ code in the
target-specific DAG combiner is replaced with simple code emitting this
primitive.

The next step is to generically combine blends of adds and subs into
this node so that we can drop the reliance on an SSE4.1 ISD node
(BLENDI) when matching an SSE3 feature (ADDSUB).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217819 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 20:09:47 +00:00
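
The blend-of-add-and-sub pattern being matched is: subtract in the
even-indexed lanes, add in the odd-indexed ones. That is precisely ADDSUBPS,
exposed by the SSE3 intrinsic below (illustrative; the DAG combine forms it
automatically from the add/sub/blend pattern):

    #include <immintrin.h>

    __m128 addsub(__m128 A, __m128 B) {
      // ADDSUBPS: result = { A0-B0, A1+B1, A2-B2, A3+B3 } -- the same value
      // as blending a SUBPS into the even lanes and an ADDPS into the odd
      // lanes, but a single SSE3 instruction with no blend required.
      return _mm_addsub_ps(A, B);
    }
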
Chandler Carruth
c5371836a5 [x86] Begin emitting PBLENDW instructions for integer blend operations
when SSE4.1 is available.

This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.

This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.

Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-15 12:40:54 +00:00
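
PBLENDW keeps integer blends in the integer domain; each immediate bit picks
a 16-bit lane from the second source. A usage sketch (assumes SSE4.1):

    #include <immintrin.h>

    __m128i blendWords(__m128i A, __m128i B) {
      // PBLENDW: bit i of the immediate selects word i from B, else from A.
      // 0xF0 -> low four words from A, high four words from B, with the
      // whole operation staying in the integer domain (no FP bypass delay).
      return _mm_blend_epi16(A, B, 0xF0);
    }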