llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-07-18 10:24:45 +00:00

Author	SHA1	Message	Date
Matt Arsenault	4bacfe2095	Add generic fmad DAG node. This allows sharing of FMA forming combines to work with instructions that have the same semantics as a separate multiply and add. This is expand by default, and only formed post legalization so it shouldn't have much impact on targets that do not want it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230070 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-20 22:10:33 +00:00
Ahmed Bougacha	953c5c9458	[CodeGen] Use ArrayRef instead of std::vector&. NFC. The former lets us use SmallVectors. Do so in ARM and AArch64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229925 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 23:13:10 +00:00
Chandler Carruth	a8fb39af83	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229835 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-19 10:36:19 +00:00
Sanjay Patel	1115b2c27e	Canonicalize splats as build_vectors (PR22283) This is a follow-on patch to: http://reviews.llvm.org/D7093 That patch canonicalized constant splats as build_vectors, and this patch removes the constant check so we can canonicalize all splats as build_vectors. This fixes the 2nd test case in PR22283: http://llvm.org/bugs/show_bug.cgi?id=22283 The unfortunate code duplication between SelectionDAG and DAGCombiner is discussed in the earlier patch review. At least this patch is just removing code... This improves an existing x86 AVX test and changes codegen in an ARM test. Differential Revision: http://reviews.llvm.org/D7389 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229511 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 16:54:32 +00:00
Benjamin Kramer	1a50a12b43	Prefer SmallVector::append/insert over push_back loops. Same functionality, but hoists the vector growth out of the loop. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229500 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-17 15:29:18 +00:00
Mehdi Amini	2deb1d0b54	SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too Update SPARC tests to match. From: Fiona Glaser <fglaser@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229438 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-16 21:47:58 +00:00
Chandler Carruth	5ee516549f	[x86] Fix PR22377, a regression with the new vector shuffle legality test. This was just a matter of the DAG combine for vector shuffles being too aggressive. This is a bit of a grey area, but I think generally if we can re-use intermediate shuffles, we should. Certainly, given the test cases I have available, this seems like the right call. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229285 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-15 07:01:10 +00:00
Duncan P. N. Exon Smith	9de77c7eca	CodeGen: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) Also, add `Function::getFnStackAlignment()`, and canonicalize: getAttributes().getStackAlignment(AttributeSet::FunctionIndex) => getFnStackAlignment() git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229208 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-14 01:44:41 +00:00
Benjamin Kramer	d913d9d2c3	MathExtras: Bring Count(Trailing\|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228930 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 15:35:40 +00:00
Ahmed Bougacha	9e9bde9b54	[CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x). We used to do this DAG combine, but it's not always correct: If the first fp_round isn't a value preserving truncation, it might introduce a tie in the second fp_round, that wouldn't occur in the single-step fp_round we want to fold to. In other words, double rounding isn't the same as rounding. Differential Revision: http://reviews.llvm.org/D7571 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228911 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-12 06:15:29 +00:00
Jonas Paulsson	fc58cf7676	Fix SelectionDAG compile time issue with alias analysis. Add new token factor node and its users to worklist if alias analysis is turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause a lot of new token factors to be inserted into the DAG, and they need to be optimized to avoid significant slow-downs. Reviewed by Hal Finkel. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228841 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-11 16:10:31 +00:00
Jonas Paulsson	b8ee890901	Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228700 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 15:34:29 +00:00
Chandler Carruth	1c7c2e8650	[x86] Fix PR22524: the DAG combiner was incorrectly handling illegal nodes when folding bitcasts of constants. We can't fold things and then check after-the-fact whether it was legal. Once we have formed the DAG node, arbitrary other nodes may have been collapsed to it. There is no easy way to go back. Instead, we need to test for the specific folding cases we're interested in and ensure those are legal first. This could in theory make this less powerful for bitcasting from an integer to some vector type, but AFAICT, that can't actually happen in the SDAG so its fine. Now, we only whitelist specific int->fp and fp->int bitcasts for post-legalization folding. I've added the test case from the PR. (Also as a note, this does not appear to be in 3.6, no backport needed) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228656 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-10 02:25:56 +00:00
Ahmed Bougacha	ec35069525	[CodeGen] Add hook/combine to form vector extloads, enabled on X86. The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right. First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms). The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes. This optimization is enabled unconditionally on X86. Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME). Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit. Differential Revision: http://reviews.llvm.org/D6904 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228325 91177308-0d34-0410-b5e6-96231b3b80d8	2015-02-05 18:31:02 +00:00
Quentin Colombet	24508d33fb	Revert r227242 - Merge vector stores into wider vector stores (PR21711). This commit creates infinite loop in DAG combine for in the LLVM test-suite for aarch64 with mcpu=cylcone (just having neon may be enough to expose this). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227272 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-27 23:58:01 +00:00
Sanjay Patel	c94f9d3d2f	Merge vector stores into wider vector stores (PR21711) This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). The 'f3' test case in that report presents a situation where we have two 128-bit stores extracted from a 256-bit source vector. Instead of producing this: vmovaps %xmm0, (%rdi) vextractf128 $1, %ymm0, 16(%rdi) This patch merges the 128-bit stores into a single 256-bit store: vmovups %ymm0, (%rdi) Differential Revision: http://reviews.llvm.org/D7208 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227242 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-27 20:50:27 +00:00
Sanjay Patel	05d5e213c4	merge consecutive stores of extracted vector elements (PR21711) This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. That patch was checked in at r224611, but reverted at r225031 because it caused a failure outside of the regression tests. The cause of the crash was not recognizing consecutive stores that have mixed source values (loads and vector element extracts), so this patch adds a check to bail out if any store value is not coming from a vector element extract. This patch also refactors the shared logic of the constant source and vector extracted elements source cases into a helper function. Differential Revision: http://reviews.llvm.org/D6850 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226845 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 18:21:26 +00:00
Michael Kuperstein	a52ddfa930	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 Fixed recommit of r226811. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226816 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 13:07:28 +00:00
Michael Kuperstein	8fc1c3a619	Revert r226811, MSVC accepts code sane compilers don't. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226814 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 12:48:07 +00:00
Michael Kuperstein	0a979a09ae	[DAGCombine] Produce better code for constant splats This solves PR22276. Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead. Differential Revision: http://reviews.llvm.org/D7093 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226811 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 12:37:23 +00:00
Elena Demikhovsky	2785766bc8	Fixed a bug in type legalizer for masked load/store intrinsics. The problem occurs when after vectorization we have type <2 x i32>. This type is promoted to <2 x i64> and then requires additional efforts for expanding loads and truncating stores. I added EXPAND / TRUNCATE attributes to the masked load/store SDNodes. The code now contains additional shuffles. I've prepared changes in the cost estimation for masked memory operations, it will be submitted separately. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226808 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 12:07:59 +00:00
Elena Demikhovsky	9cb8df2c75	Fixed a comment git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226806 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 10:01:36 +00:00
Elena Demikhovsky	cdce03426d	Fixed a bug in narrowing store operation. Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type, since the size of VT (1 bit) is not equal to its actual store size(8 bits). Added a test provided by David (dag@cray.com) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226805 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-22 09:39:08 +00:00
Tim Northover	f5f8a3e6a6	DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) It can help with argument juggling on some targets, and is generally a good idea. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226740 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-21 23:17:19 +00:00
Tim Northover	c49e57ade1	Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))" It hadn't gone through review yet, but was still on my local copy. This reverts commit r226663 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226665 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-21 15:48:52 +00:00
Tim Northover	47f47f5d2a	DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N)) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226663 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-21 15:43:28 +00:00
Mehdi Amini	5eed637b34	Improve DAG combine pass on certain IR vector patterns Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector produced suboptimal code in AVX2 mode with certain IR combinations. In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32 (undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical, but then mysteriously generated rather bad code; the movq/movhpd combination didn't match. The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs would get promoted to 4f32 by the type legalizer, eventually resulting in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing these were both half the output size, concatted them and then produced a shuffle. However, the resulting concat + shuffle was more complex than it should be; in the case where the upper half of the output is undef, we probably want to generate shuffle + concat instead. This enhancement causes the vector_shuffle combine step to recognize this suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR in case the same suboptimal pattern occurs for other reasons. This results in the optimizer correctly producing the optimal movq + movhpd sequence for all three variations on this IR, even with AVX2. I've included a test case. Radar link: rdar://problem/19287012 Fix for PR 21943. From: Fiona Glaser <fglaser@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226360 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-17 01:35:56 +00:00
Chandler Carruth	1b279144ec	[cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225974 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-14 11:23:27 +00:00
Mehdi Amini	cfe92407cd	DAG Combiner: Fold SelectCC When Cond is UNDEF In case folding a node end up with a NaN as operand for the select, the folding of the condition of the selectcc node returns "UNDEF". Differential Revision: http://reviews.llvm.org/D6889 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225952 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-14 05:45:24 +00:00
Matthias Braun	12232769b3	DAGCombiner: simplify by using condition variables; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225836 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-13 22:17:46 +00:00
Matt Arsenault	7c06364dc0	R600: Implement getRecipEstimate This requires a new hook to prevent expanding sqrt in terms of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are all the same rate, so this expansion would just double the number of instructions and cycles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225828 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-13 20:53:23 +00:00
Olivier Sallenave	9dd21f4380	Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225795 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-13 15:06:36 +00:00
Matt Arsenault	29ad7506e1	Combine fcmp + select to fminnum / fmaxnum if no nans and legal Also require unsafe FP math for no since there isn't a way to test for signed zeros. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225744 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-13 00:43:00 +00:00
Hal Finkel	8e1d151abe	[DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities) As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA, we need to extend the incoming operands so that the resulting node will really be legal. This is currently enabled only for PowerPC, and it happens to work there regardless, but this should fix the functionality for everyone else should anyone else wish to use it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225492 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 01:29:29 +00:00
Hal Finkel	40ddb2ce8f	Partial fix to r225380 (More FMA folding opportunities) As pointed out by Aditya (and Owen), there are two things wrong with this code. First, it adds patterns which elide FP extends when forming FMAs, and that might not be profitable on all targets (it belongs behind the pre-existing aggressive-FMA-formation flag). This is fixed by this change. Second, the resulting nodes might have operands of different types (the extensions need to be re-added). That will be fixed in the follow-up commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225485 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-09 00:45:54 +00:00
Ahmed Bougacha	7fac1d945f	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-08 00:51:32 +00:00
Olivier Sallenave	033a537a84	More FMA folding opportunities. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225380 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-07 20:54:17 +00:00
Olivier Sallenave	bc2572cac5	Test commit git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225368 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-07 19:45:17 +00:00
Craig Topper	9bf73516cb	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225160 91177308-0d34-0410-b5e6-96231b3b80d8	2015-01-05 10:15:49 +00:00
Alexey Samsonov	c0319dd9c2	Revert "merge consecutive stores of extracted vector elements" This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225031 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-31 00:40:28 +00:00
Mehdi Amini	8548c2453f	Always assert in DAGCombine and not only when -debug is enabled Right now in DAG Combine check the validity of the returned type only when -debug is given on the command line. However usually the test cases in the validation does not use -debug. An Assert build should always check this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224779 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-23 18:59:02 +00:00
Michael Kuperstein	1f0ddef593	[DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements This partially fixes PR21943. For AVX, we go from: vmovq (%rsi), %xmm0 vmovq (%rdi), %xmm1 vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3] vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3] vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3] vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3] vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0] To the expected: vmovq (%rdi), %xmm0 vmovhpd (%rsi), %xmm0, %xmm0 retq Fixing this for AVX2 is still open. Differential Revision: http://reviews.llvm.org/D6749 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224759 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-23 08:59:45 +00:00
Sanjay Patel	3c3cd10928	merge consecutive stores of extracted vector elements Add a path to DAGCombiner::MergeConsecutiveStores() to combine multiple scalar stores when the store operands are extracted vector elements. This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). For the new test case, codegen improves from: vmovss %xmm0, (%rdi) vextractps $1, %xmm0, 4(%rdi) vextractps $2, %xmm0, 8(%rdi) vextractps $3, %xmm0, 12(%rdi) vextractf128 $1, %ymm0, %xmm0 vmovss %xmm0, 16(%rdi) vextractps $1, %xmm0, 20(%rdi) vextractps $2, %xmm0, 24(%rdi) vextractps $3, %xmm0, 28(%rdi) vzeroupper retq To: vmovups %ymm0, (%rdi) vzeroupper retq Patch reviewed by Nadav Rotem. Differential Revision: http://reviews.llvm.org/D6698 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224611 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-19 20:23:41 +00:00
Michael Kuperstein	fd350586f5	[DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle. This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors. This fixes PR15872 and provides a partial fix for PR21711. Differential Revision: http://reviews.llvm.org/D6678 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224429 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-17 12:32:17 +00:00
Matt Arsenault	6e6318f148	Add target hook for whether it is profitable to reduce load widths Add an option to disable optimization to shrink truncated larger type loads to smaller type loads. On SI this prevents using scalar load instructions in some cases, since there are no scalar extloads. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@224084 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-12 00:00:24 +00:00
Owen Anderson	59bf8e81f3	Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223760 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-09 06:50:39 +00:00
Simon Pilgrim	94590ca4cf	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223349 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-04 09:44:01 +00:00
Elena Demikhovsky	73ae1df82c	Masked Load / Store Intrinsics - the CodeGen part. I'm recommiting the codegen part of the patch. The vectorizer part will be send to review again. Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8	2014-12-04 09:40:44 +00:00
Duncan P. N. Exon Smith	54786a0936	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-28 21:29:14 +00:00
Elena Demikhovsky	ae1ae2c3a1	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8	2014-11-23 08:07:43 +00:00

1 2 3 4 5 ...

1389 Commits