llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2025-02-23 05:29:23 +00:00

Author	SHA1	Message	Date
Adrian Prantl	10c4265675	Revert r218778 while investigating buildbot breakage. "Move the complex address expression out of DIVariable and into an extra" git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218782 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 18:10:54 +00:00
Adrian Prantl	076fd5dfc1	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 17:55:39 +00:00
Tom Stellard	56077f5796	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218776 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 17:15:17 +00:00
Tom Stellard	6a0fcf7f53	C API: Add LLVMCloneModule() git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218775 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 17:14:57 +00:00
Jingyue Wu	ccd995ab0c	Revert r216862 due to a performance regression Reported by Alexey Volkov in PR21115 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218771 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 15:22:13 +00:00
Toma Tabacu	a2878ec715	[mips] Rename emit and parse functions for the .cpload assembler directive. NFC. Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218768 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 14:53:19 +00:00
Tom Stellard	f7082f9bd7	R600/SI: Add a generic pseudo EXP instruction git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218767 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 14:44:45 +00:00
Tom Stellard	cbb63311cd	R600/SI: Add generic pseudo MTBUF instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218766 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 14:44:43 +00:00
Tom Stellard	f69ae4815a	R600/SI: Add generic pseudo SMRD instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218765 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 14:44:42 +00:00
Oliver Stannard	9d7038c437	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218763 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 13:13:18 +00:00
Chandler Carruth	7d64681274	[x86] Fix a few more tiny patterns with the new vector shuffle lowering that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns just right. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're going to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218756 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 11:14:02 +00:00
Chandler Carruth	a1b88ab2c1	[x86] Delete some extraneous logic from the new vector shuffle lowering. Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218755 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 11:13:57 +00:00
Tom Coxon	01649dea92	[AArch64] Allow access to all system registers with MRS/MSR instructions. The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is: S<op0>_<op1>_<CRn>_<CRm>_<op2> The encoding space permitted for implementation-defined system registers is: op0 op1 CRn CRm op2 11 xxx 1x11 xxxx xxx The full encoding space can now be accessed: op0 op1 CRn CRm op2 xx xxx xxxx xxxx xxx This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218753 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 10:13:59 +00:00
Evgeniy Stepanov	82e145f9ef	Revert r218721, r218735. Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218752 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 10:07:28 +00:00
Asiri Rathnayake	e9bbacd0a8	Add missing natural vector cast. Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218751 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 09:59:45 +00:00
Oliver Stannard	ff18b9ff38	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM) The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218747 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 09:02:17 +00:00
Daniel Sanders	9a11fba79f	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref Fixes PR21015, and PR20993. Patch by Jun Koi git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218745 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 08:26:55 +00:00
Sasa Stankovic	05a13f0bd0	[mips] For indirect calls we don't need $gp to point to .got. Mips linker doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218744 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 08:22:21 +00:00
Lang Hames	e2ef4419a8	[MCJIT] Turn the getSymbolAddress free function created in r218626 into a static member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess). The functionality this provides is very specific to RTDyldMemoryManager, so it makes sense to keep it in that class to avoid accidental re-use. No functional change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218741 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 04:11:13 +00:00
Nick Lewycky	b69f873ee1	Fix typo in comment from r218733 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218739 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 03:37:34 +00:00
Gerolf Hoflehner	3adf585efe	[InstCombine] Fix for assert build failures caused by r218721 The icmp-select-icmp optimization made the implicit assumption that the select-icmp instructions are in the same block and asserted on it. The fix explicitly checks for that condition and conservatively suppresses the optimization when it is violated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218735 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 03:24:39 +00:00
Chandler Carruth	9e2fe46484	[x86] Teach the new vector shuffle lowering to be even more aggressive in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218734 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 03:19:43 +00:00
Chandler Carruth	429670f0e8	[x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it is the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218733 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 02:25:54 +00:00
David Blaikie	06c1373053	Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_bound This allows proper disambiguation of unbounded arrays and arrays of zero bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC instead produces an upper bound of -1 in the latter situation, but count seems tidier. This way lower_bound is provided if it's not the language default and count is provided if the count is known, otherwise it's omitted. Simple. If someone wants to look at rdar://problem/12566646 and see if this change is acceptable to that bug/fix, that might be helpful (see the empty-and-one-elem-array.ll test case which cites that radar). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218726 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 00:56:55 +00:00
Adam Nemet	d0d5b08fbd	[AVX512] Remove space before \t in AsmStrings. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218725 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 00:41:32 +00:00
Chandler Carruth	afe75172b1	[x86] Teach the new vector shuffle lowering about VBROADCAST and VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218724 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 00:41:21 +00:00
Gerolf Hoflehner	2318c2f28d	[InstCombine] Optimize icmp-select-icmp In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218721 91177308-0d34-0410-b5e6-96231b3b80d8	2014-10-01 00:13:22 +00:00
David Blaikie	8f70c4827a	Omit DW_AT_inline under -gmlt to save a little more space. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218719 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 23:29:16 +00:00
Hal Finkel	a0715579f0	[BasicAA] Make better use of zext and sign information Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218714 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 22:43:40 +00:00
David Blaikie	2c453a0c03	DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more common spot. No functional change. Pre-emptive refactoring before I start pushing some of this subprogram creation down into DWARFCompileUnit so I can build different subprograms in the skeleton unit from the dwo unit for adding -gmlt-like data to the skeleton. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218713 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 22:32:49 +00:00
Jingyue Wu	9cd9e4bb2c	[SimplifyCFG] threshold for folding branches with common destination Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int *input, int *output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. 
Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218711 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 22:23:38 +00:00
David Blaikie	76ff19ffa7	Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil. r218129 omits DW_TAG_subprograms which have no inlined subroutines when emitting -gmlt data. This makes -gmlt very low cost for -O0 builds. Darwin's dsymutil reasonably considers a CU empty if it has no subprograms (which occurs with the above optimization in -O0 programs without any force_inline function calls) and drops the line table, CU, and everything in this situation, making backtraces impossible. Until dsymutil is modified to account for this, disable this optimization on Darwin to preserve the desired functionality. (see r218545, which should be reverted after this patch, for other discussion/details) Footnote: In the long term, it doesn't look like this scheme (of simplified debug info to describe inlining to enable backtracing) is tenable, it is far too size inefficient for optimized code (the DW_TAG_inlined_subprograms, even once compressed, are nearly twice as large as the line table itself (also compressed)) and we'll be considering things like Cary's two level line table proposal to encode all this information directly in the line table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218702 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 21:28:32 +00:00
Sanjay Patel	73a335f7f6	Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218700 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 20:44:23 +00:00
Sanjay Patel	cafc85bf1e	Split the estimate() interface into separate functions for each type. NFC. It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use its 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 20:28:48 +00:00
Juergen Ributzka	9952c922c2	Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218693 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 19:59:35 +00:00
Matt Arsenault	28233d3a63	R600/SI: Fix printing of clamp and omod No tests for omod since nothing uses it yet, but this should get rid of the remaining annoying trailing zeros after some instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218692 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 19:49:48 +00:00
Matt Arsenault	532a5c7dc6	R600/SI: Update VOP3b to not include obsolete operands abs / neg are now part of the srcN_modifiers operands git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218691 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 19:49:43 +00:00
Bradley Smith	95b3e168c5	Extend C disassembler API to allow specifying target features git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218682 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 16:31:40 +00:00
Reed Kotler	8a6f79e58d	Add numeric extend, truncate to mips fast-isel Summary: Add numeric extend, truncate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218681 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 16:30:13 +00:00
Tom Coxon	8a23890385	[AArch64] Remove unnecessary whitespace. (Test commit) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218680 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 16:23:16 +00:00
Andrea Di Biagio	9e6df85d39	[DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle. Currently, the DAG Combiner only tries to convert type-legal build_vector nodes into shuffles. This patch simply moves the logic that checks if a build_vector has a legal value type up before we even start analyzing the operands. This allows to early exit immediately from method 'visitBUILD_VECTOR' if the node type is known to be illegal. No functional change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218677 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 15:30:22 +00:00
Alex Lorenz	38c59de6b1	llvm-cov: Use the number of executed functions for the function coverage metric. This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions. Differential Revision: http://reviews.llvm.org/D5196 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218672 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:45:13 +00:00
Lorenzo Martignoni	f49592dddc	Introduce support for custom wrappers for vararg functions. Differential Revision: http://reviews.llvm.org/D5412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218671 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:33:16 +00:00
Robert Khasanov	8acdc5232d	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218670 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:15:52 +00:00
Robert Khasanov	175ff01f0f	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:41:54 +00:00
Robert Khasanov	cfa5724d50	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. Added new operand type for intrinsics (IIT_V64) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218668 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:32:22 +00:00
Robert Khasanov	58da66b2bf	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. Added CMP_MASK intrinsic type git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218667 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:19:50 +00:00
Job Noorman	deb16c9eac	Make sure aggregates are properly aligned on MSP430. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218665 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:15:44 +00:00
Chad Rosier	ecea7ba518	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218659 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 03:17:42 +00:00
Chandler Carruth	4abb04a65c	[x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 
2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 02:52:28 +00:00
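
The SimplifyCFG commit listed above (r218711) argues that two short-circuit branches with a common destination can be folded into one branch whose condition is the bitwise AND of both comparisons. The following is a minimal host-side C++ sketch of why that fold preserves results, not the pass itself: the function names `sum_branchy`, `sum_folded`, and `folds_agree` are hypothetical, and the CUDA kernel from the commit message is recast as ordinary functions over a `std::vector`.

```cpp
#include <cassert>
#include <vector>

// Short-circuit form from the commit message: the second comparison is only
// evaluated when the first is true, which compiles to two branches.
int sum_branchy(int a, int b, int c, int d, int e, const std::vector<int>& input) {
    int sum = 0;
    for (int i = 0; i < static_cast<int>(input.size()); ++i)
        sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
    return sum;
}

// Folded form: both comparisons are always evaluated and combined with '&',
// so a single branch (or select) remains. The extra xor/or work corresponds
// to the "bonus instructions" the commit's threshold counts.
int sum_folded(int a, int b, int c, int d, int e, const std::vector<int>& input) {
    int sum = 0;
    for (int i = 0; i < static_cast<int>(input.size()); ++i)
        sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i];
    return sum;
}

// Hypothetical helper: both comparisons are side-effect free, so the two
// forms must produce the same sum for any inputs.
bool folds_agree(int a, int b, int c, int d, int e, const std::vector<int>& input) {
    return sum_branchy(a, b, c, d, e, input) == sum_folded(a, b, c, d, e, input);
}
```

The fold is only valid because neither comparison has side effects or can trap. Whether it is profitable is a separate question, which is what the commit's tunable threshold is meant to explore.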


73237 Commits