llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-11-02 22:23:10 +00:00

Author	SHA1	Message	Date
Elena Demikhovsky	a73ac1f463	Added a table for intrinsics on X86. It should remove dozens of lines in handling intrinsics (in a huge switch) and give an easy way to add new intrinsics. I have not finished moving all intrinsics to the table; I'll do this in the upcoming commits. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215826 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-17 09:00:20 +00:00
Chandler Carruth	48c67ed949	[x86] Fix an indentation goof in a prior commit. Should have re-run clang-format. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215824 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-17 00:40:34 +00:00
Chandler Carruth	a3805f1c73	[x86] Teach lots of the new vector shuffle lowering to use UNPCK instructions for blend operations at 128 bits. This was a serious hole in our prior blend lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215819 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-16 09:42:15 +00:00
Reid Kleckner	2726e7d60b	Fix the build with MSVC 2013 after new shuffle code MSVC gives this awesome diagnostic: ..\lib\Target\X86\X86ISelLowering.cpp(7085) : error C2971: 'llvm::VariadicFunction1' : template parameter 'Func' : 'isShuffleEquivalentImpl' : a local variable cannot be used as a non-type argument ..\include\llvm/ADT/VariadicFunction.h(153) : see declaration of 'llvm::VariadicFunction1' ..\lib\Target\X86\X86ISelLowering.cpp(7061) : see declaration of 'isShuffleEquivalentImpl' Using an anonymous namespace makes the problem go away. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215744 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 18:03:58 +00:00
Chandler Carruth	92ee945e2e	[x86] Teach the new AVX v4f64 shuffle lowering to use UNPCK instructions where applicable for blending. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215737 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 17:42:00 +00:00
Chandler Carruth	12e69a0267	[x86] Add the initial skeleton of type-based dispatch for AVX vectors in the new shuffle lowering and an implementation for v4 shuffles. This allows us to handle non-half-crossing shuffles directly for v4 shuffles, both integer and floating point. This currently misses places where we could perform the blend via UNPCK instructions, but otherwise generates equally good or better code for the test cases included to the existing vector shuffle lowering. There are a few cases that are entertainingly better. ;] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215702 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 11:01:40 +00:00
Chandler Carruth	437928be5c	[x86] Remove the duplicated code for testing whether we can widen the elements of a shuffle mask and simplify how it works. No functionality changed now that the bug that was here has been fixed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215696 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 07:41:57 +00:00
Chandler Carruth	886f0101a7	[x86] Fix the very broken formation of vpunpck instructions in the target-specific shuffle DAG combines. We were recognizing the paired shuffles backwards. This code needs to be replaced anyways as we have the same functionality elsewhere, but I'll do the refactoring in a follow-up, this is the minimal fix to the behavior. In addition to fixing miscompiles with the new vector shuffle lowering, it also causes the canonicalization to kick in much better, selecting the smaller encoding variants in lots of places in the new AVX path. This still isn't quite ideal as we don't need both the shufpd and the punpck instructions, but that'll get fixed in a follow-up patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215690 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 03:54:49 +00:00
Chandler Carruth	477f28c48d	[x86] Fix PR20540 where the x86 shuffle DAG combiner had completely broken logic for merging shuffle masks in the face of SM_SentinelZero mask operands. While these are '-1' they don't mean 'undef' the way '-1' means in the pre-legalized shuffle masks. Instead, they mean that the shuffle operation is forcibly zeroing that lane. Reflect this and explicitly handle it in a bunch of places. In one place the effect is equivalent but much more clear. In the rest it was really weirdly broken. Also, rewrite the entire merging thing to be a more direct operation with a single loop and just doing math to map the indices through the various masks. Also add a bunch of asserts to try to make it extremely clear what the different masks can possibly look like. Finally, add some comments to clarify that we're merging shuffle masks up here rather than down as we do everywhere else, and thus the logic is quite confusing. Thanks to several different people for sending test cases, and to Robert Khasanov for an initial attempt at fixing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215687 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-15 02:43:18 +00:00
Adam Nemet	90eb948fc9	[AVX512] Switch FMA intrinsics to the masking version This does the renaming and updates the lowering logic. Part of <rdar://problem/17688758> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215664 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-14 17:13:30 +00:00
Adam Nemet	b27f7ac2d7	[X86] Break out logic to map FMA Intrinsic number to Opcode No functional change. Will be used to lower AVX512 masking FMA intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215663 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-14 17:13:27 +00:00
Adam Nemet	6360552890	[AVX512] Break out the logic to lower masking intrinsics No functional change. This will be used by the FMA intrinsic lowering as well and hopefully many more. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215661 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-14 17:13:24 +00:00
Chandler Carruth	cad1711154	[x86] Begin stubbing out the AVX support in the new vector shuffle lowering scheme. Currently, this just directly bails to the fallback path of splitting the 256-bit vector into two 128-bit vectors, operating there, and then joining the results back together. While the results are far from perfect, they are shockingly good for what we're doing here. I'll be layering the rest of the functionality on top of this piece by piece and updating tests as I go. Note that 256-bit vectors in this mode are still somewhat WIP. While I think the code paths that I'm adding here are clean and good-to-go, there are still a lot of 128-bit assumptions that I'll need to stomp out as I march through the functional spread here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215637 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-14 12:13:59 +00:00
Quentin Colombet	b2b79cd485	[X86] Fix the value of the low mask for the lowering of MUL_LOHI for v4i32. Found by code inspection. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215604 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-13 23:49:24 +00:00
Aaron Ballman	8b77c00bbf	Silence a -Wparentheses warning with these asserts. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215537 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-13 10:49:07 +00:00
Elena Demikhovsky	4c97c1420b	AVX-512: Fixed a bug in shufflevector lowering. PALIGNR instruction does not exist in AVX-512F set. Added a test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215526 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-13 07:58:43 +00:00
Chandler Carruth	6bb093bbe7	[x86] Rewrite a core part of the new vector shuffle lowering to handle one pesky test case correctly. This test case caused the old code to infloop oscillating between solving the low-half and the high-half. The 'side balancing' part of single-input v8 shuffle lowering didn't handle the one pattern which can cause it to oscillate. Fortunately the fuzz testing found this case. Unfortunately it was terrible to handle. I'm really sorry for the amount and density of the code here, I'd love suggestions on how to simplify it. I feel like there must be a simpler form here, but after a lot of days I've not found it. This is the only one I've found that even works. I've added the one pesky test case along with some nice comments explaining the core problem that we have to solve here. So far this has survived approximately 32k test cases. More strenuous fuzzing commencing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215519 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-13 01:25:45 +00:00
Adam Nemet	6ea7f36872	[AVX512] Handle valign masking intrinsic via C++ lowering I think that this will scale better in most cases than adding a Pat<> for each mapping from the intrinsic DAG to the instruction (i.e. rri, rrik, rrikz). We can just lower to the SDNode and have the resulting DAG be matched by the DAG patterns. Alternatively (long term), we could keep the Pat<>s but generate them via the new AVX512_masking multiclass. The difficulty is that in order to formulate that we would have to concatenate DAGs. Currently this is only supported if the operators of the input DAGs are identical. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215473 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-12 21:13:12 +00:00
Sanjay Patel	01c6ad07d2	fixed typos git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215451 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-12 16:00:06 +00:00
Hans Wennborg	edcf61a55c	Increase the size of these SmallVectors in X86ISelLowering.cpp. In a Clang bootstrap, their sizes were always 12, 16 and 16, respectively. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215336 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-11 02:21:22 +00:00
Sanjay Patel	21327ec3e9	fixed typos git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215299 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-09 22:23:02 +00:00
Patrik Hagglund	cf403861a3	[pr19635] Revert most of r170537, and add new testcase. Patch provided by Andrey Kuharev. Sorry, r170537 was obviously wrong. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215190 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-08 08:21:19 +00:00
Alexander Kornienko	2ca1dd1381	Insert parens to avoid a warning: suggest parentheses around arithmetic in operand of '^' [-Wparentheses] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215101 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 12:09:34 +00:00
Chandler Carruth	0e89fbb120	[x86] Fix another miscompile found through fuzz testing the new vector shuffle lowering. This is closely related to the previous one. Here we failed to use the source offset when swapping in the other case -- where we end up swapping the final shuffle. The cause of this bug is a bit different: I simply wasn't thinking about the fact that this mask is actually a slice of a wide mask and thus has numbers that need SourceOffset applied. Simple fix. Would be even more simple with an algorithm-y thing to use here, but correctness first. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215095 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 10:37:35 +00:00
Chandler Carruth	0651861b7b	[x86] Fix another miscompile in the new vector shuffle lowering found via the fuzz tester. Here I missed an offset when round-tripping a value through a shuffle mask. I got it right 2 lines below. See a problem? I do. ;] I'll probably be adding a little "swap" algorithm which accepts a range and two values and swaps those values where they occur in the range. Don't really have a name for it, let me know if you do. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215094 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 10:14:27 +00:00
Chandler Carruth	b3364512fc	[x86] Fix another miscompile in the new vector shuffle lowering found through the new fuzzer. This one is great: bad operator precedence led the modulus to happen at the wrong point. All the asserts didn't fire because there were usually the right values past the end of the 4 element region we were looking at. Probably could have gotten a crash here with ASan + fuzzing, but the correctness tests pinpointed this really nicely. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215092 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 09:45:02 +00:00
Pavel Chupin	5d8c984e54	[x32] Use ebp/esp as frame and stack pointer Summary: Since pointers are 32-bit on x32 we can use ebp and esp as frame and stack pointer. Some operations like PUSH/POP and CFI_INSTRUCTION still require 64-bit register, so using 64-bit MachineFramePtr where required. X86_64 NaCl uses 64-bit frame/stack pointers, however it's been found that both isTarget64BitLP64 and isTarget64BitILP32 are true for NaCl. Addressing this issue here as well by making isTarget64BitLP64 false. Also mark hasReservedSpillSlot unreachable on X86. See inlined comments. Test Plan: Add one new simple test and upgrade 2 existing with x32 target case. Reviewers: nadav, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D4617 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215091 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 09:41:19 +00:00
Chandler Carruth	15d82b7d33	[x86] Fix a miscompile in the new shuffle lowering found through the new fuzz testing. The function which tested for adjacency did what it said on the tin, but when I called it, I wanted it to do something more thorough: I wanted to know if the pairs of shuffle elements were adjacent and started at 0 mod 2. In one place I had the decency to try to test for this, but in the other it was completely skipped, miscompiling this test case. Fix this by making the helper actually do what I wanted it to do everywhere I called it (and removing the now redundant code in one place). I really dislike the name "canWidenShuffleElements" for this predicate. If anyone can come up with a better name, please let me know. The other name I thought about was "canWidenShuffleMask" but is it really widening the mask to reduce the number of lanes shuffled? I don't know. Naming things is hard. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215089 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-07 08:11:31 +00:00
Eric Christopher	41612a9b85	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where getting a subtarget will require passing in a MachineFunction/Function as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-06 18:45:26 +00:00
Chandler Carruth	a341a8070a	[x86] Fix two independent miscompiles in the process of getting the same test case to actually generate correct code. The primary miscompile fixed here is that we weren't correctly handling in-place elements in one half of a single-input v8i16 shuffle when moving a dword of elements from that half to the other half. Some times, we would clobber the in-place elements in forming the dword to move across halves. The fix to this involves forcibly marking the in-place inputs even when there is no need to gather them into a dword, and to much more carefully re-arrange the elements when grouping them into a dword to move across halves. With these two changes we would generate correct shuffles for the test case, but found another miscompile. There are also some random perturbations of the generated shuffle pattern in SSE2. It looks like a wash; more instructions in some cases fewer in others. The second miscompile would corrupt the results into nonsense. This is a buggy pattern in one of the added DAG combines. Mapping elements through a PSHUFD when pairing redundant half-shuffles is much harder than this code makes it out to be -- it requires reasoning about all of where the input is used in the PSHUFD, not just one part of where it is used. Plus, we can't combine a half shuffle into a PSHUFD but the code didn't guard against it. I think this was just a bad idea and I've just removed that aspect of the combine. No tests regress as a consequence so seems OK. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214954 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-06 10:16:36 +00:00
Chandler Carruth	346b68772b	[x86] Switch to a formulation of a for loop that is much more obviously not corrupting the mask by mutating it more times than intended. No functionality changed (the results were non-overlapping so the old version "worked" but was non-obvious). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214953 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-06 10:16:33 +00:00
JF Bastien	5e48675853	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214934 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 23:27:34 +00:00
Chandler Carruth	fadc91beec	[x86] Fix a crasher due to shuffles which cancel each other out and add a test case. We also miscompile this test case which is showing a serious flaw in the single-input v8i16 shuffle code. I've left the specific instruction checks FIXME-ed out until I can address the bug in the single-input code, but I wanted to separate out a significant functionality change to produce correct code from a very simple and targeted crasher fix. The miscompile problem stems from keeping track of inputs by value rather than by index. As a consequence of doing this, we can't reliably update those inputs because they might swap and we can't detect this without copying the mask. The blend code now uses indices for the input lists and this seems strictly better. It also should make it easier to sort things and do other cleanups. I think the time has come to simplify The Great Lambda here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214914 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 18:45:49 +00:00
Adam Nemet	c64a05905a	[X86] Improve comments for r214888 A rebase somehow ate my comments. This restores them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214903 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 17:58:49 +00:00
Adam Nemet	af98f76fb5	[X86] Add lowering to VALIGN This was currently part of lowering to PALIGNR with some special-casing to make interlane shifting work. Since AVX512F has interlane alignr (valignd/q) and AVX512BW has vpalignr we need to support both of these at the same time, e.g. for SKX. This patch breaks out the common code and then add support to check both of these lowering options from LowerVECTOR_SHUFFLE. I also added some FIXMEs where I think the AVX512BW and AVX512VL additions should probably go. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214888 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 17:22:59 +00:00
Adam Nemet	b4d58974c3	[X86] Separate DAG node for valign and palignr They have different semantics (valign is interlane while palignr is intralane) and palignr is still needed even in the AVX512 context. According to the latest spec AVX512BW provides these. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214887 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 17:22:55 +00:00
Chandler Carruth	ff8028c8da	[x86] Reformat some code I moved around in a prior commit but left poorly formatted. Sorry about that. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214853 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 10:35:30 +00:00
Chandler Carruth	e6329cf303	[x86] Fix a crash and wrong-code bug in the new vector lowering all found by a single test reduced out of a failure on llvm-stress. The start of the problem (and the crash) came when we tried to use a find of a non-used slot in the move-to half of the move-mask as the target for two bad-half inputs. While if lucky this will be the first of a pair of slots which we can place the bad-half inputs into, it isn't actually guaranteed. This really isn't surprising, not sure what I was thinking. The correct way to find the two unused slots is to look for one of the used slots. We know it isn't that pair, and we can use some modular arithmetic to find the other pair by masking off the odd bit and adding 2 modulo 4. With this, we reliably found a viable pair of slots for the bad-half inputs. Sadly, that wasn't enough. We also had a wrong code bug that surfaced when I reduced the test case for this where we would use the same slot twice for the two bad inputs. This is because both of the bad inputs could be in odd slots originally and thus the mod-2 mapping would actually be the same. The whole point of the weird indexing into the pair of empty slots was to try to leverage when the end result needed the two bad-half inputs to be paired in a dword and pre-pair them in the correct orientation. This is less important with the powerful combining we're now doing, and also easier and more reliable to achieve by noting that we add the bad-half inputs in order. Thus, if they are in a dword pair, the low part of that will be the first input in the sequence. Always putting that in the low element will just do the right thing in addition to computing the correct result. Test case added. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214849 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 08:19:21 +00:00
Eric Christopher	6035518e3b	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-05 02:39:49 +00:00
Eric Christopher	9f85dccfc6	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-04 21:25:23 +00:00
Chandler Carruth	48593d7934	[x86] Just unilaterally prefer SSSE3-style PSHUFB lowerings over clever use of PACKUS. It's cleaner that way. I looked at implementing clever combine-based folding of PACKUS chains into PSHUFB but it is quite hard and doesn't seem likely to be worth it. The most annoying part would be detecting that the correct masking had been done to use PACKUS-style instructions as a blend operation rather than there being any saturating as is indicated by its name. We generate really nice code for what few test cases I've come up with that aren't completely contrived for this by just directly preferring PSHUFB and so let's go with that strategy for now. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214707 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-04 10:17:35 +00:00
Chandler Carruth	93f5d9f093	[x86] Implement more aggressive use of PACKUS chains for lowering common patterns of v16i8 shuffles. This implements one of the more important FIXMEs for the SSE2 support in the new shuffle lowering. We now generate the optimal shuffle sequence for truncate-derived shuffles which show up essentially everywhere. Unfortunately, this exposes a weakness in other parts of the shuffle logic -- we can no longer form PSHUFB here. I'll add the necessary support for that and other things in a subsequent commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214702 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-04 09:40:02 +00:00
Chandler Carruth	73100d8f33	[x86] Handle single input shuffles in the SSSE3 case more intelligently. I spent some time looking into a better or more principled way to handle this. For example, by detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to fix it. Avoiding this is just too simple. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214680 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-04 01:14:24 +00:00
Chandler Carruth	caf471e820	[x86] Remove the FIXME that was implemented in r214628. Managed to forget to update the comment here... =/ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214630 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-02 11:34:23 +00:00
Chandler Carruth	1029c7003f	[x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly: when without them we will unpack a vector of i8s across two different registers and do a potentially 4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the unpack and pack are two instructions so even a single shuffle between them hits our three instruction limit for forming PSHUFB). However, this doesn't generate very good code in many cases, and it leaves a bunch of common patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This allows us to combine through them, combine multiple ones together, and generally produce sufficiently high quality code. Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added decode routines to handle the different patterns that show up at this level and we dispatch through them as appropriate. The two primary test cases are updated. For the v16 test case there is still a lot of room for improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that I'm hoping to turn into ALL lines by the end of this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214628 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-02 10:39:15 +00:00
Chandler Carruth	3c92a7aac1	[x86] Fix a few typos in my comments spotted in passing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214626 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-02 10:29:34 +00:00
Chandler Carruth	fb1293fd4c	[x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. This detects cases where a single register is used for both operands making the shuffle behave in a unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG combine for shuffle instructions to actually work at all. As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically are trying to match the shuffle to the domain of the input where before we only really modeled them with the floating point variants. All of the cases which change to an integer shuffle here have something in the integer domain, so there are no more or fewer domain crosses here AFAICT. Technically, it might be better to go from a GPR directly to the floating point domain, but detecting floating point outputs despite integer inputs is a lot more code and seems unlikely to be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know and I can hack something up to fix it. Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably, splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it, especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the mask. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214625 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-02 10:27:38 +00:00
Akira Hatanaka	306030f8aa	[X86] Simplify X87 stackifier pass. Stop using ST registers for function returns and inline-asm instructions and use FP registers instead. This allows removing a large amount of code in the stackifier pass that was needed to track register liveness and handle copies between ST and FP registers and function calls returning floating point values. It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was live across another inline-asm instruction, as shown in the following sequence of machine instructions: 1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5> 2. INLINEASM <es:fldcw $0> 3. %FP0<def> = COPY %ST0 <rdar://problem/16952634> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214580 91177308-0d34-0410-b5e6-96231b3b80d8	2014-08-01 22:19:41 +00:00
Louis Gerbarg	7d54c5b0f2	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reorder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-31 21:45:05 +00:00
Matt Arsenault	2dd264c8a3	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214055 91177308-0d34-0410-b5e6-96231b3b80d8	2014-07-27 17:46:40 +00:00
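
Several of the shuffle-lowering messages in this log describe small index computations in prose: the "adjacent and started at 0 mod 2" test behind canWidenShuffleElements (r215089), the proposed "swap" algorithm over a range (r215094), and finding the other slot pair by "masking off the odd bit and adding 2 modulo 4" (r214849). The following is a minimal C++ sketch of those three ideas as the messages describe them, not the code from X86ISelLowering.cpp. The names canWidenMaskSketch, swapValuesInRange and otherSlotPairStart are hypothetical, and treating -1 as an undefined lane that matches anything is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the predicate described in r215089: every pair of
// mask entries must be adjacent and must start at an even source index.
// Assumption: -1 marks an undefined lane and is accepted in either position.
bool canWidenMaskSketch(const std::vector<int> &Mask) {
  if (Mask.size() % 2 != 0)
    return false;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo != -1 && Lo % 2 != 0)
      return false; // low element must sit at 0 mod 2
    if (Hi != -1 && Hi % 2 != 1)
      return false; // high element must sit at 1 mod 2
    if (Lo != -1 && Hi != -1 && Hi != Lo + 1)
      return false; // the two elements must be adjacent
  }
  return true;
}

// Hypothetical sketch of the "swap" algorithm floated in r215094: walk a
// range and exchange two values wherever either one occurs.
template <typename It, typename T>
void swapValuesInRange(It First, It Last, const T &A, const T &B) {
  for (; First != Last; ++First) {
    if (*First == A)
      *First = B;
    else if (*First == B)
      *First = A;
  }
}

// Hypothetical sketch of the arithmetic in r214849: given one used slot in a
// four-slot half, clear the odd bit to get the start of its pair, then add 2
// modulo 4 to get the start of the other pair.
int otherSlotPairStart(int UsedSlot) {
  assert(UsedSlot >= 0 && UsedSlot < 4 && "slot must be within the half");
  return ((UsedSlot & ~1) + 2) % 4;
}
```

Under these assumptions, a mask such as {1, 2, 3, 4} is rejected even though each pair is adjacent, because the pairs start on odd indices; that is the case r215089 says was skipped at one call site. The explicit parentheses in otherSlotPairStart are deliberate: r215092 in this same log was an operator-precedence bug in which a modulus was applied at the wrong point.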


2738 Commits