llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-10-04 16:01:46 +00:00

Author	SHA1	Message	Date
Reed Kotler	8a6f79e58d	Add numeric extend, truncate to mips fast-isel Summary: Add numeric extend, truncate to mips fast-isel Reactivates D4827 Test Plan: fpext.ll loadstoreconv.ll Reviewers: dsanders Subscribers: mcrosier Differential Revision: http://reviews.llvm.org/D5251 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218681 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 16:30:13 +00:00
Alex Lorenz	a66e3bbf7a	Revert r218673 'llvm-cov: add test for report's function & file association.' Test causes buildbot failures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218676 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 14:48:12 +00:00
Alex Lorenz	c174931a77	llvm-cov: add test for report's function & file association. This commit adds a test which checks that the functions defined in header files will get associated with the header files rather than the source files in the reports. Differential Revision: http://reviews.llvm.org/D5489 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218673 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:52:31 +00:00
Alex Lorenz	38c59de6b1	llvm-cov: Use the number of executed functions for the function coverage metric. This commit fixes llvm-cov's function coverage metric by using the number of executed functions instead of the number of fully covered functions. Differential Revision: http://reviews.llvm.org/D5196 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218672 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:45:13 +00:00
Lorenzo Martignoni	f49592dddc	Introduce support for custom wrappers for vararg functions. Differential Revision: http://reviews.llvm.org/D5412 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218671 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:33:16 +00:00
Robert Khasanov	8acdc5232d	[AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VCMPGT{BWDQ}. Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218670 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 12:15:52 +00:00
Robert Khasanov	175ff01f0f	[AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ} Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1. Now cmp intrinsics lower in the following way: (i8 (int_x86_avx512_mask_pcmpeq_q_128 (v2i64 %a), (v2i64 %b), (i8 %mask))) -> (i8 (bitcast (v8i1 (insert_subvector undef, (v2i1 (and (PCMPEQM %a, %b), (extract_subvector (v8i1 (bitcast %mask)), 0))), 0)))) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:41:54 +00:00
Robert Khasanov	cfa5724d50	[AVX512] Added intrinsics for VPCMPEQB and VPCMPEQW. Added new operand type for intrinsics (IIT_V64) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218668 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:32:22 +00:00
Robert Khasanov	58da66b2bf	[AVX512] Enabled intrinsics for VPCMPEQD and VPCMPEQQ. Added CMP_MASK intrinsic type git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218667 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 11:19:50 +00:00
Chad Rosier	ecea7ba518	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218659 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 03:17:42 +00:00
Chandler Carruth	4abb04a65c	[x86] Revert r218588, r218589, and r218600. These patches were pursuing a flawed direction and causing miscompiles. Read on for details. Fundamentally, the premise of this patch series was to map VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because we are going to have to lower to VSELECT nodes for some blends to trigger the instruction selection patterns of variable blend instructions. This doesn't actually work out so well. In order to match performance with the existing VECTOR_SHUFFLE lowering code, we would need to re-slice the blend in order to fit it into either the integer or floating point blends available on the ISA. When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources) this works well because the X86 backend ensures that these types of operands to VSELECT get sign extended into '-1' and '0' for true and false, allowing us to re-slice the bits in whatever granularity without changing semantics. However, if the VSELECT condition comes from some other source, for example code lowering vector comparisons, it will likely only have the required bit set -- the high bit. We can't blindly slice up this style of VSELECT. Reid found some code using Halide that triggers this and I'm hopeful to eventually get a test case, but I don't need it to understand why this is A Bad Idea. There is another aspect that makes this approach flawed. When in VECTOR_SHUFFLE form, we have very distilled information that represents the constant blend mask. Converting back to a VSELECT form actually can lose this information, and so I think now that it is better to treat this as VECTOR_SHUFFLE until the very last moment and only use VSELECT nodes for instruction selection purposes. My plan is to: 1) Clean up and formalize the target pre-legalization DAG combine that converts a VSELECT with a constant condition operand into a VECTOR_SHUFFLE. 2) Remove any fancy lowering from VSELECT during legalization relying entirely on the DAG combine to catch cases where we can match to an immediate-controlled blend instruction. One additional step that I'm not planning on but would be interested in others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV which encodes a fully legalized VSELECT node. Then it would be easy to write isel patterns only in terms of this to ensure VECTOR_SHUFFLE legalization only ever forms the fully legalized construct and we can't cycle between it and VSELECT combining. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 02:52:28 +00:00
Chandler Carruth	52b072d73f	[x86] Add some vector-register broadcast operations to the 256-bit v4 tests which were missing them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218657 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 02:32:36 +00:00
Matt Arsenault	cbb188bffc	R600: Fix broken check lines, missing scalar case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218655 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 01:05:29 +00:00
Juergen Ributzka	a0af4b0271	[FastISel][AArch64] Fold sign-/zero-extends into the load instruction. The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218653 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-30 00:49:58 +00:00
Hans Wennborg	4edcbaec90	WinCOFFObjectWriter: optimize the string table for common suffixes. This is a follow-up from r207670 which did the same for ELF. Differential Revision: http://reviews.llvm.org/D5530 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218636 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 22:43:20 +00:00
Eric Christopher	6a2169eb6f	Add soft-float to the key for the subtarget lookup in the TargetMachine map, this makes sure that we can compile the same code for two different ABIs (hard and soft float) in the same module. Update one testcase accordingly (and fix some confusing naming) and add a new testcase as well with the ordering swapped which would highlight the problem. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218632 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 21:57:54 +00:00
Matt Arsenault	49cbc1891b	R600/SI: Also fix fsub + fadd a, a to mad combines git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218609 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 14:59:38 +00:00
Matt Arsenault	a5f45d5444	R600/SI: Fix using mad with multiplies by 2 These turn into fadds, so combine them into the target mad node. fadd (fadd (a, a), b) -> mad 2.0, a, b git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218608 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 14:59:34 +00:00
Chad Rosier	ea64dce261	[AArch64] Improve cost model to handle sdiv by a pow-of-two. This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218607 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 13:59:31 +00:00
Kevin Qin	dbaeb6e7cb	Use a loop to simplify the runtime unrolling prologue. Runtime unrolling will create a prologue to execute the extra iterations which can't be divided by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit is to use a loop to execute the extra iterations in prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when unroll factor is 4, the loop body will be copied only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, a new loop won't be created, just as in the original solution. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218604 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 11:15:00 +00:00
Oliver Stannard	017c6111a8	[Thumb2] ldrexd and strexd are not defined on v7M The Thumb2 ldrexd and strexd instructions are not defined for M-class architectures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218603 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 10:57:29 +00:00
Chandler Carruth	8ac2f142a8	[x86] Make the new vector shuffle lowering lower blends as VSELECT nodes, and rely exclusively on its logic. This removes a ton of duplication from the blend lowering and centralizes it in one place. One downside is that it requires a bunch of hacks to make this work with the current legalization framework. We have to manually speculate one aspect of legalizing VSELECT nodes to get everything to work nicely because the existing legalization framework isn't actually bottom-up. The other grossness is that we somewhat duplicate the analysis of constant blends. I'm on the fence here. If reviewers think this would look better with VSELECT when it has constant operands dumping over to VECTOR_SHUFFLE, we could go that way. But it would be a substantial change because currently all of the actual blend instructions are matched via patterns in the TD files based around VSELECT nodes (despite them not being perfect fits for that). Suggestions welcome, but at least this removes the rampant duplication in the backend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218600 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 09:57:07 +00:00
Chandler Carruth	d23f1883d3	[x86] Delete a bunch of really bad and totally unnecessary code in the X86 target-specific DAG combining that tried to convert VSELECT nodes into VECTOR_SHUFFLE nodes that it "knew" would lower into immediate-controlled blend nodes. Turns out, we have perfectly good lowering of all these VSELECT nodes, and indeed that lowering already knows how to handle lowering through BLENDI to immediate-controlled blend nodes. The code just wasn't getting used much because this thing forced the world to go through the vector shuffle lowering. Yuck. This also exposes that I was too aggressive in avoiding domain crossing in v218588 with that lowering -- when the other option is to expand into two 128-bit vectors, it is worth domain crossing. Restore that behavior now that we have nice tests covering it. The test updates here fall into two camps. One is where previously we ended up with an unsigned encoding of the blend operand and now we get a signed encoding. In most of those places there were elaborate comments explaining exactly what these operands really mean. Rather than that, just switch these tests to use the nicely decoded comments that make it obvious that the final shuffle matches. The other updates are just removing pointless domain crossing by blending integers with PBLENDW rather than BLENDPS. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218589 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 02:01:20 +00:00
Chandler Carruth	8e93ce1780	[x86] Add the dispatch skeleton to the new vector shuffle lowering for AVX-512. There is no interesting logic yet. Everything ends up eventually delegating to the generic code to split the vector and shuffle the halves. Interestingly, that logic does a significantly better job of lowering all of these types than the generic vector expansion code does. Mostly, it lets most of the cases fall back to nice AVX2 code rather than all the way back to SSE code paths. Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next up will be to incrementally add direct support for the basic instruction set to each type (adding tests first). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218585 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-29 00:37:27 +00:00
Chandler Carruth	b61dfec824	[x86] Teach the new vector shuffle lowering to fall back on AVX-512 vectors. Someone will need to build the AVX512 lowering, which should follow AVX1 and AVX2 very closely for AVX512F and AVX512BW resp. I've added a dummy test which is a port of the v8f32 and v8i32 tests from AVX and AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this is enough information for someone to implement proper lowering here. If not, I'll be happy to help, but right now the AVX-512 support isn't a priority for me. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218583 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-28 23:53:10 +00:00
Chandler Carruth	4f4280469c	[x86] Fix the new vector shuffle lowering's use of VSELECT for AVX2 lowerings. This was hopelessly broken. First, the x86 backend wants '-1' to be the element value representing true in a boolean vector, and second the operand order for VSELECT is backwards from the actual x86 instructions. To make matters worse, the backend is just using '-1' as the true value to get the high bit to be set. It doesn't actually symbolically map the '-1' to anything. But on x86 this isn't quite how it works: there only the high bit is relevant. As a consequence weird non-'-1' values like 0x80 actually "work" once you flip the operands to be backwards. Anyways, thanks to Hal for helping me sort out what these should be. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218582 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-28 23:23:55 +00:00
Chandler Carruth	3f40848670	[x86] Fix a really silly bug that I introduced fixing another bug in the new vector shuffle target DAG combines -- it helps to actually test for the value you want rather than just using an integer in a boolean context. Have I mentioned that I loathe implicit conversions recently? :: sigh :: git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218576 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-28 06:11:04 +00:00
Chandler Carruth	21b69296fb	[x86] Fix yet another bug in the new vector shuffle lowering's handling of widening masks. We can't widen a zeroing mask unless both elements that would be merged are either zeroed or undef. This is the only way to widen a mask if it has a zeroed element. Also clean up the code here by ordering the checks in a more logical way and by using the symbolic values for undef and zero. I'm actually torn on using the symbolic values because the existing code is littered with the assumption that -1 is undef, and moreover that entries '< 0' are the special entries. While that works with the values given to these constants, using the symbolic constants actually makes it a bit more opaque why this is the case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218575 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-28 03:30:25 +00:00
James Molloy	aada52189e	[AArch64] Redundant store instructions should be removed as dead code If there is a store followed by a store with the same value to the same location, then the store is dead/noop. It can be removed. This problem is found in spec2006-197.parser. For example, stur w10, [x11, #-4] stur w10, [x11, #-4] Then one of the two stur instructions can be removed. Patch by David Xu! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218569 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-27 17:02:54 +00:00
Craig Topper	00666192c1	Update test case to match minor formatting change introduced in r218563. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218564 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-27 05:36:53 +00:00
Chandler Carruth	72c3b07dfd	[x86] Fix terrible bugs everywhere in the new vector shuffle lowering and in the target shuffle combining when trying to widen vector elements. Previously only one of these was correct, and we didn't correctly propagate zeroing target shuffle masks (which have a different sentinel value from undef in non- target shuffle masks now). This isn't just a missed optimization, this caused us to drop zeroing shuffles on the floor and miscompile code. The added test case is one example of that. There are other fixes to the test suite as a consequence of this as well as restoring the undef elements in some of the masks that were lost when I brought sanity to the actual value of the undef and zero sentinels. I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW combining code, but that code really needs to go. It was a nice initial attempt, but it isn't very principled and the recursive shuffle combiner is much more powerful. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-27 04:42:44 +00:00
Chandler Carruth	8470b5b812	[x86] Flip the sentinel values used in the target shuffle mask decoding to significantly more sane sentinels. Notably, everywhere else in the backend's representation of shuffles uses '-1' to represent undef. The target shuffle masks really shouldn't diverge from that, especially as in a few places they are manipulated by shared code. This causes us to lose some undef lanes in various test masks. I want to get these back, but technically it isn't invalid and there are a lot of bugs here so I want to try to establish a saner baseline for fixing some of the bugs by aligning the specific sentinel values used. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-27 04:42:39 +00:00
Craig Topper	00bc445d75	Fix TableGen -gen-disassembler output for bit fields with an offset. This fixes bit assignments like this Inst{7-0} = Foo{9-2} Patch by Steve King. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218560 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-27 04:38:02 +00:00
Sanjay Patel	676af35b38	Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 23:01:47 +00:00
David Majnemer	01ea611601	Object: BSS/virtual sections don't have contents Users of getSectionContents shouldn't try to pass in BSS or virtual sections. In all instances, this is a bug in the code calling this routine. N.B. Some COFF implementations (like CL) will mark their BSS sections as taking space on disk. This would confuse COFFObjectFile into thinking the section is larger than the file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218549 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 22:32:16 +00:00
Kevin Enderby	7118c73867	Update llvm-objdump’s Mach-O symbolizer code to print the name of symbol stubs. So in fully linked images when a call is made through a stub it now gets a comment like the following in the disassembly: callq 0x100000f6c ## symbol stub for: _printf indicating the call is to a symbol stub and which symbol it is for. This is done for branch reference types by seeing if the branch target is in a stub section and, if so, using the indirect symbol table entry for that stub and that symbol table entry's symbol name. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218546 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 22:20:44 +00:00
Chandler Carruth	0a31a52b91	[x86] Fix a moderately terrifying bug in the new 128-bit shuffle logic that managed to elude all of my fuzz testing historically. =/ Something changed to allow this code path to actually be exercised and it was doing bad things. It is especially heavily exercised by the patterns that emerge when doing AVX shuffles that end up lowered through the 128-bit code path. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 20:41:45 +00:00
Chad Rosier	4150a8de76	[IndVar] Don't widen loop compare unless IV user is sign extended. PR21030 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218539 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 20:05:35 +00:00
Matt Arsenault	5435c66a33	R600/SI: Add strict check lines to div_scale tests. This has weird operand requirements so it's worthwhile to have very strict checks for its operands. Add different combinations of SGPR operands. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218535 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:55:11 +00:00
Matt Arsenault	d991d2217b	R600/SI Allow same SGPR to be used for multiple operands Instead of moving the first SGPR that is different than the first, legalize the operand that requires the fewest moves if one SGPR is used for multiple operands. This saves extra moves and is also required for some instructions which require that the same operand be used for multiple operands. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:55:03 +00:00
Matt Arsenault	aed12d4bad	R600/SI: Partially move operand legalization to post-isel hook. Disable the SGPR usage restriction parts of the DAG legalizeOperands. It now should only be doing immediate folding until it can be replaced later. The real legalization work is now done by the other SIInstrInfo::legalizeOperands git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:54:59 +00:00
Matt Arsenault	8a70e28114	R600/SI: Don't move operands that are required to be SGPRs e.g. v_cndmask_b32 requires the condition operand be an SGPR. If one of the source operands were an SGPR, that would be considered the one SGPR use and the condition operand would be illegally moved. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:54:52 +00:00
Matt Arsenault	26b2a7834e	R600/SI: Fix using wrong operand indices when commuting No test since the current SIISelLowering::legalizeOperands effectively hides this, and the general uses seem to only fire on SALU instructions which don't have modifiers between the operands. When trying to use legalizeOperands immediately after instruction selection, it now sees a lot more patterns it did not see before which break on this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:54:43 +00:00
David Peixotto	ea468dddfe	Ignore annotation function calls in cost computation The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold. Differential Revision: http://reviews.llvm.org/D5335 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218525 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:48:40 +00:00
Chandler Carruth	7929a210d5	[x86] In the new vector shuffle lowering, when trying to do another layer of tie-breaking sorting, it really helps to check that you're in a tie first. =] Otherwise the whole thing cycles infinitely. Test case added, another one found through fuzz testing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:24:26 +00:00
Chandler Carruth	7164a4ae0a	[x86] Fix a large collection of bugs that crept in as I fleshed out the AVX support. New test cases included. Note that none of the existing test cases covered these buggy code paths. =/ Also, it is clear from this that SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[ These were all detected by fuzz-testing. (I <3 fuzz testing.) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 17:11:02 +00:00
Renato Golin	6215f78195	Elide repeated register operand in Thumb1 instructions This patch makes the ARM backend transform 3 operand instructions such as 'adds/subs' to the 2 operand version of the same instruction if the first two register operands are the same. Example: 'adds r0, r0, #1' is transformed to 'adds r0, #1'. Currently for some instructions such as 'adds' if you try to assemble 'adds r0, r0, #8' for thumb v6m the assembler would throw an error message because the immediate cannot be encoded using 3 bits. The backend should be smart enough to transform the instruction to 'adds r0, #8', which allows for larger immediate constants. Patch by Ranjeet Singh. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218521 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 16:14:29 +00:00
Robert Khasanov	26ba182fdf	[AVX512] Added load/store from BW/VL subsets to Register2Memory opcode tables. Added lowering tests for these instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218508 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 09:48:50 +00:00
David Majnemer	178cb752c7	llvm-vtabledump: strip trailing NUL bytes git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218502 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 05:50:45 +00:00
David Majnemer	035e22bc06	llvm-vtabledump: Dump RTTI structures for the MS ABI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218498 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 04:21:51 +00:00
David Xu	abf5bf221f	Revert patch of r218493, delete the test case git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218495 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 02:40:54 +00:00
David Xu	c41ae2a5c4	Redundant store instructions should be removed as dead code git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218493 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 02:02:09 +00:00
Eric Christopher	55a90ab4ef	Add the first backend support for on demand subtarget creation based on the Function. This is currently used to implement mips16 support in the mips backend via the existing module pass resetting the subtarget. Things to note: a) This involved running resetTargetOptions before creating a new subtarget so that code generation options like soft-float could be recognized when creating the new subtarget. This is to deal with initialization code in isel lowering that only paid attention to the initial value. b) Many of the existing testcases weren't using the soft-float feature correctly. I've corrected these based on the check values assuming that was the desired behavior. c) The mips port now pays attention to the target-cpu and target-features strings when generating code for a particular function. I've removed these from one function where the requested cpu and features didn't match the check lines in the testcase. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 01:44:08 +00:00
Matt Arsenault	deaa9d8c72	R600: Avoid repeated check lines git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218487 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 01:12:36 +00:00
Matt Arsenault	584886c0bb	R600/SI: Fix emitting trailing whitespace after s_waitcnt git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218486 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-26 01:09:46 +00:00
Adam Nemet	2f3ccfc257	[AVX512] Make vextractx4/vinsertx4 tests check for the index as well Extend test so that it provides coverage for the next commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218479 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 23:48:47 +00:00
Matt Arsenault	3011a602be	R600: Fix some missing conversion testcases git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218474 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 23:16:18 +00:00
Matt Arsenault	556ae0484a	Remove duplicated RUN lines in middle of test git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218473 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 23:16:14 +00:00
Bruno Cardoso Lopes	f4230250a1	[MachineSink+PGO] Teach MachineSink to use BlockFrequencyInfo Machine Sink uses loop depth information to select between successors BBs to sink machine instructions into, where BBs within smaller loop depths are preferable. This patch adds support for choosing between successors by using profile information from BlockFrequencyInfo instead, whenever the information is available. Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5% execution speedup in average on x86-64 darwin. <rdar://problem/18021659> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218472 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 23:14:26 +00:00
Tom Stellard	29d48e6a49	R600/SI: Add support for global atomic add git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218457 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 18:30:26 +00:00
Robin Morisset	79826e015e	Lower idempotent RMWs to fence+load Summary: I originally tried doing this specifically for X86 in the backend in D5091, but it was rather brittle and generally running too late to be general. Furthermore, other targets may want to implement similar optimizations. So I reimplemented it at the IR-level, fitting it into AtomicExpandPass as it interacts with that pass (which could not be cleanly done before at the backend level). This optimization relies on a new target hook, which is only used by X86 for now, as the correctness of the optimization on other targets remains an open question. If it is found correct on other targets, it should be trivial to enable for them. Details of the optimization are discussed in D5091. Test Plan: make check-all + a new test Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5422 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 17:27:43 +00:00
Sid Manning	733681d3bd	Add missing attributes !cmp.[eq,gt,gtu] instructions. These instructions do not indicate they are extendable or the number of bits in the extendable operand. Rename to match architected names. Add a testcase for the intrinsics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 13:09:54 +00:00
Daniel Sanders	03fe69e90d	[mips] Add CCValAssign::[ASZ]ExtUpper and CCPromoteToUpperBitsInType and handle structs correctly on big-endian N32/N64 return values. Summary: The N32/N64 ABIs require that structs passed in registers are laid out such that spilling the register with 'sd' places the struct at the lowest address. For little endian this is trivial but for big-endian it requires that structs are shifted into the upper bits of the register. We also require that structs passed in registers have the 'inreg' attribute for big-endian N32/N64 to work correctly. This is because the tablegen-erated calling convention implementation only has access to the lowered form of struct arguments (one or more integers of up to 64-bits each) and is unable to determine the original type. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5286 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218451 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 12:15:05 +00:00
Renato Golin	6765c34b0c	Add aliases for VAND imm to VBIC ~imm On ARM NEON, VAND with immediate (16/32 bits) is an alias to VBIC ~imm with the same type size. Adding that logic to the parser, and generating VBIC instructions from VAND asm files. This patch also fixes the validation routines for NEON splat immediates which were wrong. Fixes PR20702. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218450 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 11:31:24 +00:00
Chandler Carruth	4b667ee436	[x86] Teach the new vector shuffle lowering to use AVX2 instructions for v4f64 and v8f32 shuffles when they are lane-crossing. We have fully general lane-crossing permutation functions in AVX2 that make this easy. Part of this also changes exactly when and how these vectors are split up when we don't have AVX2. This isn't always a win but it usually is a win, so on the balance I think it's better. The primary regressions are all things that just need to be fixed anyways such as modeling when a blend can be completely accomplished via VINSERTF128, etc. Also, this highlights one of the few remaining big features: we do a really poor job of inserting elements into AVX registers efficiently. This completes almost all of the big tricks I have in mind for AVX2. The only things left that I plan to add: 1) element insertion smarts 2) palignr and other fairly specialized lowerings when they happen to apply git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218449 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 11:03:55 +00:00
Chandler Carruth	05901d80ba	[x86] Teach the new vector shuffle lowering a fancier way to lower 256-bit vectors with lane-crossing. Rather than immediately decomposing to 128-bit vectors, try flipping the 256-bit vector lanes, shuffling them and blending them together. This reduces our worst case shuffle by a pretty significant margin across the board. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218446 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 10:21:15 +00:00
Oliver Stannard	f220c5387b	[Thumb2] BXJ should be undefined for v7M, v8A The Thumb2 BXJ instruction (Branch and Exchange Jazelle) is not defined for v7M or v8A. It is defined for all other Thumb2-supporting architectures (v6T2, v7A and v7R). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218445 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 10:02:05 +00:00
Chandler Carruth	2e8d2c727c	[x86] Fix an oversight in the v8i32 path of the new vector shuffle lowering where it only used the mask of the low 128-bit lane rather than the entire mask. This allows the new lowering to correctly match the unpack patterns for v8i32 vectors. For reference, the reason that we check for the entire mask rather than checking the repeated mask is because the repeated masks don't abide by all of the invariants of normal masks. As a consequence, it is safer to use the full mask with functions like the generic equivalence test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218442 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 04:10:27 +00:00
Chandler Carruth	e3bb4bb2d5	[x86] Implement AVX2 support for v32i8 in the new vector shuffle lowering. This completes the basic AVX2 feature support, but there are still some improvements I'd like to do to really get the last mile of performance here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218440 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 02:52:12 +00:00
Chandler Carruth	1d63231455	[x86] More tweaks to the v32i8 test cases. I made a mistake in the previous commit and produced the wrong pattern. Fix that. Also make one more shuffle pattern byte-based rather than word-based, and add two more blend patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218439 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 02:44:39 +00:00
Chandler Carruth	a87d04a759	[x86] Re-work a bunch of the v32i8 test cases to actually involve byte shuffles rather than word shuffles. As you might guess, these were built starting from the word shuffle test cases and I failed to properly port a bunch of them and left them as widened word shuffle test cases. We still have a couple of tests that check our ability to widen shuffles, but now we will test the actual byte shuffle quite a bit better. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218438 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 02:20:02 +00:00
Reid Kleckner	dd8ce126d7	MC: Use @IMGREL instead of @IMGREL32, which we can't parse Nico Rieck added support for this 32-bit COFF relocation some time ago for Win64 stuff. It appears that as an oversight, the assembly output used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse. Sadly, there were actually tests that took in IMGREL and put out IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can assemble its own output with slightly more fidelity. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218437 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 02:09:18 +00:00
Chandler Carruth	ef673b3c73	[x86] Fix the v16i16 blend logic I added in the prior commit and add the missing test cases for it. Unsurprisingly, without test cases, there were bugs here. Surprisingly, this bug wasn't caught at compile time. Yep, there is an X86ISD::BLENDV. It isn't wired to anything. Oops. I'll fix that next. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218434 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 01:13:38 +00:00
Justin Bogner	aacc919bfd	llvm-cov: Combine segments that cover the same location If we have multiple coverage counts for the same segment, we need to add them up rather than arbitrarily choosing one. This fixes that and adds a test with template instantiations to exercise it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218432 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 00:34:18 +00:00
Akira Hatanaka	0253523c92	[X86,AVX] Add an isel pattern for X86VBroadcast. This fixes PR21050 and rdar://problem/18434607. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218431 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 00:26:15 +00:00
Chandler Carruth	bdecfeb723	[x86] Implement v16i16 support with AVX2 in the new vector shuffle lowering. This also implements the fancy blend lowering for v16i16 using AVX2 and teaches the X86 backend to print shuffle masks for 256-bit PSHUFB and PBLENDW instructions. It also makes the mask decoding correct for PBLENDW instructions. The yaks, they are legion. Tests are updated accordingly. There are some missing tests for the VBLENDVB lowering, but I'll add those in a follow-up as this commit has accumulated enough cruft already. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218430 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-25 00:24:19 +00:00
Kevin Enderby	e793862979	Flush out enough of llvm-objdump’s SymbolizerSymbolLookUp() for Mach-O files to get the literal string “Hello world” printed as a comment on the instruction that loads the pointer to it. For now this is just for x86_64. So for object files with relocation entries it produces things like: leaq L_.str(%rip), %rax ## literal pool for: "Hello world\n" and similar for fully linked images like executables: leaq 0x4f(%rip), %rax ## literal pool for: "Hello world\n" Also to allow testing against darwin’s otool(1), I hooked up the existing -no-show-raw-insn option to the Mach-O parser code, added the new Mach-O only -full-leading-addr option to match otool(1)'s printing of addresses and also added the new -print-imm-hex option. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218423 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 23:08:22 +00:00
Kostya Serebryany	0e9d114865	[asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together -coverage git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218421 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 22:41:55 +00:00
Renato Golin	d4c244f2eb	Removing empty ARM tests from failed revert git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218419 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:58:04 +00:00
Renato Golin	ae3027caf5	Removing empty tests from failed revert git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218417 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:45:26 +00:00
Renato Golin	bb994f55a4	Revert 218406 - Refactor the RelocVisitor::visit method git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218416 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:30:43 +00:00
Renato Golin	450403a79e	Revert 218407 - Add support for ARM and AArch64 BE object files git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218415 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:30:14 +00:00
Renato Golin	6c34b467be	Revert 218408 - Report endianness in output of {dwarf, obj}dump git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218414 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:29:45 +00:00
Renato Golin	7fff39a194	Revert 218411 - XFAIL reloc test on x86/hexagon git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218413 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:28:53 +00:00
Renato Golin	ebf023f5bf	XFAIL reloc test on x86/hexagon git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218411 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 21:00:30 +00:00
Renato Golin	329e09f08c	Report endianness in output of {dwarf, obj}dump For biendian targets like ARM and AArch64, it is useful to have the output of the llvm-dwarfdump and llvm-objdump report the endianness used when the object files were generated. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218408 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 20:07:41 +00:00
Renato Golin	bc7101b134	Add support for ARM and AArch64 BE object files This change fixes the ARM and AArch64 relocation visitors in RelocVisitor. They were unconditionally assuming the object data are little-endian. Tests have been added to ensure that the llvm-dwarfdump utility does not crash when processing big-endian object files. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218407 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 20:07:30 +00:00
Renato Golin	c0104e4001	Refactor the RelocVisitor::visit method This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name to figure out whether the file is a 32 or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218406 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 20:07:22 +00:00
Scott Douglass	f14380a2e8	pass environment when invoking llvm-config from lit.cfg Use the same environment when invoking llvm-config from lit.cfg as will be used when running tests, so that ASAN_OPTIONS, INCLUDE, etc. are present. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218403 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 18:37:48 +00:00
Kaelyn Takata	9917d2e7ad	Revert "Add support for ARM and AArch64 BE object files" This reverts commit r218389 as it depends on r218388. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218398 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 18:00:20 +00:00
Kaelyn Takata	48ac014ac1	Revert "Report endianness in output of {dwarf, obj}dump" This reverts commit r218391 as it depends on r218388 and r218389 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218397 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 18:00:17 +00:00
Kaelyn Takata	a0d6422afe	Revert "Refactor the RelocVisitor::visit method" This reverts commit `faac033f73`. The test depends on all targets to be enabled in llc in order to pass, and needs to be rewritten/refactored to not have that dependency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218393 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 17:49:07 +00:00
Renato Golin	c3a1f3e136	Report endianness in output of {dwarf, obj}dump For biendian targets like ARM and AArch64, it is useful to have the output of the llvm-dwarfdump and llvm-objdump report the endianness used when the object files were generated. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218391 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 17:01:33 +00:00
Renato Golin	1513681a0b	Add support for ARM and AArch64 BE object files This change fixes the ARM and AArch64 relocation visitors in RelocVisitor. They were unconditionally assuming the object data are little-endian. Tests have been added to ensure that the llvm-dwarfdump utility does not crash when processing big-endian object files. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218389 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 17:01:06 +00:00
Renato Golin	faac033f73	Refactor the RelocVisitor::visit method This change replaces the brittle if/else chain of string comparisons with a switch statement on the detected target triple, removing the need for testing arbitrary architecture names returned from getFileFormatName, whose primary purpose seems to be for display (user-interface) purposes. The visitor now takes a reference to the object file, rather than its arbitrary file format name to figure out whether the file is a 32 or 64-bit object file and what the detected target triple is. A set of tests have been added to help show that the refactoring processes relocations for the same targets as the original code. Patch by Charlie Turner. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218388 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 17:00:42 +00:00
David Peixotto	cfc42962c8	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218387 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 16:48:31 +00:00
Moritz Roth	8c4e64af8a	[Thumb] Make load/store optimizer less conservative. If it's safe to clobber the condition flags, we can do a few extra things: it's then possible to reset the base register writeback using a SUBS, so we can try to merge even if the base register isn't dead after the merged instruction. This is effectively a (heavily bug-fixed) rewrite of r208992. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218386 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 16:35:50 +00:00
Oliver Stannard	43c6b6be8f	[Thumb] 32-bit encodings of 'cps' are not valid for v7M v7M only allows the 16-bit encoding of the 'cps' (Change Processor State) instruction, and does not have the 32-bit encoding which is valid from v6T2 onwards. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218382 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 14:20:01 +00:00
Chandler Carruth	10cd8098a7	[x86] Teach the instruction lowering to add comments describing constant pool data being loaded into a vector register. The comments take the form of: # ymm0 = [a,b,c,d,...] # xmm1 = <x,y,z...> The []s are used for generic sequential data and the <>s are used for specifically ConstantVector loads. Undef elements are printed as the letter 'u', integers in decimal, and floating point values as floating point values. Suggestions on improving the formatting or other aspects of the display are very welcome. My primary use case for this is to be able to FileCheck test masks passed to vector shuffle instructions in-register. It isn't fantastic for that (no decoding special zeroing semantics or other tricks), but it at least puts the mask onto an instruction line that could reasonably be checked. I've updated many of the new vector shuffle lowering tests to leverage this in their test cases so that we're actually checking the shuffle masks remain as expected. Before implementing this, I tried a bunch of different approaches. I looked into teaching the MCInstLower code to scan up the basic block and find a definition of a register used in a shuffle instruction and then decode that, but this seems incredibly brittle and complex. I talked to Hal a lot about the "right" way to do this: attach the raw shuffle mask to the instruction itself in some form of unencoded operands, and then use that to emit the comments. I still think that's the optimal solution here, but it proved to be beyond what I'm up for here. In particular, it seems likely best done by completing the plumbing of metadata through these layers and attaching the shuffle mask in metadata which could have fully automatic dropping when encoding an actual instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218377 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 09:39:41 +00:00
Matt Arsenault	0bb38df86c	R600/SI: Fix weird CHECK-DAG usage This prevents these from failing in a future commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218356 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 02:14:26 +00:00
Tom Stellard	81c6c9690a	R600/SI: Enable selecting SALU inside branches We can do this now that the FixSGPRLiveRanges pass is working. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218353 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 01:33:28 +00:00
Chandler Carruth	6717f9d907	[x86] Teach the new vector shuffle lowering to lower v8i32 shuffles with the native AVX2 instructions. Note that the test case is really frustrating here because VPERMD requires the mask to be in the register input and we don't produce a comment looking through that to the constant pool. I'm going to attempt to improve this in a subsequent commit, but not sure if I will succeed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218347 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 01:24:44 +00:00
Chandler Carruth	8415f84e49	[x86] Fix a really terrible bug in the repeated 128-bit-lane shuffle detection. It was incorrectly handling undef lanes by actually treating an undef lane in the first 128-bit lane as a numeric shuffle value. Fortunately, this almost always DTRT and disabled detecting repeated patterns. But not always. =/ This patch introduces a much more principled approach and fixes the miscompiles I spotted by inspection previously. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218346 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-24 01:03:57 +00:00
Robin Morisset	73ce2886b1	Fix swift-atomics testcase This testcase was not testing what it meant: because there were only two checks for dmb {{ish}} in the second function, it could have missed a bug where one of the three required dmb {{ish}} became dmb {{ishst}}. As I was fixing it, I also added CHECK-LABELs to make it a bit less brittle. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218341 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 23:18:01 +00:00
Chandler Carruth	30ce74b5e3	[x86] Teach the new vector shuffle lowering to lower v4i64 vector shuffles using the AVX2 instructions. This is the first step of cutting in real AVX2 support. Note that I have spotted at least one bug in the test cases already, but I suspect it was already present and just is getting surfaced. Will investigate next. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218338 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 22:39:02 +00:00
Reid Kleckner	8577eaf8e6	GlobalOpt: Preserve comdats of unoptimized initializers Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218337 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 22:33:01 +00:00
Jim Grosbach	bd847644b3	AArch64: allow constant expressions for shifted reg literals e.g., add w1, w2, w3, lsl #(2 - 1) This sort of thing comes up in pre-processed assembly playing macro games. Still validate that it's an assembly time constant. The early exit error check was just a bit overzealous and disallowed a left paren. rdar://18430542 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218336 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 22:16:02 +00:00
Chandler Carruth	798f2849c3	[x86] Teach the rest of the 'target shuffle' machinery about blends and add VPBLENDD to the InstPrinter's comment generation so we get nice comments everywhere. Now that we have the nice comments, I can see the bug introduced by a silly typo in the commit that enabled VPBLENDD, and have fixed it. Yay tests that are easy to inspect. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218335 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 22:14:14 +00:00
Robin Morisset	30e7514d01	[X86] Make wide loads be managed by AtomicExpand Summary: AtomicExpand already had logic for expanding wide loads and stores on LL/SC architectures, and for expanding wide stores on CmpXchg architectures, but not for wide loads on CmpXchg architectures. This patch fills this hole, and makes use of this new feature in the X86 backend. Only one functional change: we now lose the SynchScope attribute. It is regrettable, but I have another patch that I will submit soon that will solve this for all of AtomicExpand (it seemed better to split it apart as it is a different concern). Test Plan: make check-all (lots of tests for this functionality already exist) Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5404 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218332 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 20:59:25 +00:00
Robin Morisset	58bca6e8ec	[Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores. Test Plan: new test + make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5180 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 20:46:49 +00:00
Chandler Carruth	7024c7e949	[x86] Teach the new shuffle lowering's blend functionality to use AVX2's VPBLENDD where appropriate even on 128-bit vectors. According to Agner's tables, this instruction is significantly higher throughput (can execute on any port) on Haswell chips so we should aggressively try to form it when available. Sadly, this loses our delightful shuffle comments. I'll add those back for VPBLENDD next. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218322 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 18:16:12 +00:00
Oliver Stannard	abe1cb7985	Fix segfault in AArch64 backend with -g and -mbig-endian Fix a null pointer dereference when trying to swap the endianness of fixups in the .eh_frame section in the AArch64 backend. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218311 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 15:38:11 +00:00
Timur Iskhodzhanov	cda6e24915	Fix a small typo in the test comment git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218306 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 14:07:12 +00:00
Timur Iskhodzhanov	afb2f169a3	Rebuild the inputs for the codeview-linetables.test with VS2013 Also provide reproducible instructions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218303 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 13:49:51 +00:00
Chandler Carruth	4850be49a3	[x86] Teach the vector comment parsing and printing to correctly handle undef in the shuffle mask. This shows up when we're printing comments during lowering and we still have an IR-level constant hanging around that models undef. A nice consequence of this is much prettier test cases where the undef lanes actually show up as undef rather than as a particular set of values. This also allows us to print shuffle comments in cases that use undef such as the recently added variable VPERMILPS lowering. Now those test cases have nice shuffle comments attached with their details. The shuffle lowering for PSHUFB has been augmented to use undef, and the shuffle combining has been augmented to comprehend it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218301 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 11:15:19 +00:00
Chandler Carruth	8f637786d8	[x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles in the two 128-bit lanes. Use this rather than two shuffles and a blend. However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that one offers no functional difference from the immediate operand other than variability) wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a variable-masked VPERMILP instruction. Also plumb basic comment parsing and printing through so that the tests are reasonable. There are still a few tests which don't show the shuffle pattern. These are tests with undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in a follow-up. I've looked at the masks and they seem reasonable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218300 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 10:08:29 +00:00
Michael Kuperstein	5f843038fb	Ensure bitcode encoding stays stable. This includes constants, attributes, and some additional instructions not covered by previous tests. Work was done by lama.saba@intel.com. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218297 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-23 08:48:01 +00:00
Sanjay Patel	c4ef4e47c2	tighten up checks We manage to generate all of the matching instructions (and a lot more) via the reciprocal optimization function - even if we completely remove the square root optimization. With CHECK_NEXT, we assure that we're executing the expected square root optimization paths and not generating extra insts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218284 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 22:46:44 +00:00
Sanjay Patel	90969b9ee0	remove unnecessary labels; NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218278 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 21:52:53 +00:00
Juergen Ributzka	af989653e0	[FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left for booleans (i1). Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. This should fix a bug found by Chad. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218275 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 21:08:53 +00:00
David Majnemer	d80fc698f3	MC: ReadOnlyWithRel section kinds should map to rdata in COFF Don't consider ReadOnlyWithRel as a writable section in COFF, they really belong in .rdata. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218268 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 20:39:23 +00:00
Chandler Carruth	56c7cfe41f	[x86] Introduce tests covering the gamut of 256-bit vector shuffling. These are just test cases, no actual code yet. This establishes the baseline fallback strategy we're starting from on AVX2 and the expected lowering we use on AVX1. Also, these test cases are very much generated. I've manually crafted the specific pattern set that I'm hoping will be useful at exercising the lowering code, but I've not (and could not) manually verify all of these. I've spot checked and they seem legit to me. As with the rest of vector shuffling, at a certain point the only really useful way to check the correctness of this stuff is through fuzz testing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218267 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 20:25:08 +00:00
Sanjay Patel	6539887847	Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2). We generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors. This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for size on any CPU with AVX or AVX2. The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data. Differential Revision: http://reviews.llvm.org/D5347 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218263 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 18:54:01 +00:00
Akira Hatanaka	73c604b290	Fix test case committed in r218242 to appease buildbot. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218261 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 18:07:20 +00:00
Tom Stellard	e1bc40b1e6	Revert "R600/SI: Add support for global atomic add" This reverts commit r218254. The global_atomics.ll test fails with asserts disabled. For some reason, the compiler fails to produce the atomic no return variants. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218257 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 16:44:04 +00:00
Frederic Riss	22d448775e	Fix a test introduced in r218246 to work also on Windows. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218255 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 16:17:32 +00:00
Tom Stellard	6d625ad495	R600/SI: Add support for global atomic add git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218254 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 15:35:35 +00:00
Pavel Chupin	25c57d5cfe	[x32] Fix segmented stacks support Summary: Update segmented-stacks*.ll tests with x32 target case and make corresponding changes to make them pass. Test Plan: tests updated with x32 target Reviewers: nadav, rafael, dschuff Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5245 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218247 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 13:11:35 +00:00
Frederic Riss	21e5bf8461	[dwarfdump] Dump full filenames as DW_AT_(decl|call)_file attribute values Reviewers: dblaikie samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5192 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218246 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 12:36:04 +00:00
Frederic Riss	cc55b73867	Allow DWARFDebugInfoEntryMinimal::getSubroutineName to resolve cross-unit references. Summary: getSubroutineName is currently only used by llvm-symbolizer, thus add a binary test containing a cross-cu inlining example. Reviewers: samsonov, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5394 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218245 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 12:35:53 +00:00
Robert Lougher	2ee97f03a4	Fix assert when decoding PSHUFB mask The PSHUFB mask decode routine used to assert if the mask index was out of range (<0 or greater than the size of the vector). The problem is, we can legitimately have a PSHUFB with a large index using intrinsics. The instruction only uses the least significant 4 bits. This change removes the assert and masks the index to match the instruction behaviour. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218242 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 11:54:38 +00:00
Oliver Stannard	98ef3474ef	Downgrade DWARF2 section limit error to a warning We currently emit an error when trying to assemble a file with more than one section using DWARF2 debug info. This should be a warning instead, as the resulting file will still be usable, but with a degraded debug illusion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218241 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 10:45:16 +00:00
Chandler Carruth	ec35919c9a	[x86] Move the AVX v4i64 test cases down to group them together. Increasingly I don't want to mix the integer and floating point tests, especially with AVX where they are handled quite differently. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218233 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 03:05:23 +00:00
Chandler Carruth	de95c380c7	[x86] Back out a bad choice about lowering v4i64 and pave the way for a more sane approach to AVX2 support. Fundamentally, there is no useful way to lower integer vectors in AVX. None. We always end up with a VINSERTF128 in the end, so we might as well eagerly switch to the floating point domain and do everything there. This cleans up lots of weird and unlikely to be correct differences between integer and floating point shuffles when we only have AVX1. The other nice consequence is that by doing things this way we will make it much easier to write the integer lowering routines as we won't need to duplicate the logic to check for AVX vs. AVX2 in each one -- if we actually try to lower a 256-bit vector as an integer vector, we have AVX2 and can rely on it. I think this will make the code much simpler and more comprehensible. Currently, I've disabled all support for AVX2 so that we always fall back to AVX. This keeps everything working rather than asserting. That will go away with the subsequent series of patches that provide a baseline AVX2 implementation. Please note, I'm going to implement AVX2 without access to hardware. That means I cannot correctness test this path. I will be relying on those with access to AVX2 hardware to do correctness testing and fix bugs here, but as a courtesy I'm trying to sketch out the framework for the new-style vector shuffle lowering in the context of the AVX2 ISA. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218228 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-22 00:32:15 +00:00
Chandler Carruth	37bb4b0365	[x86] Teach the new vector shuffle lowering how to cleverly lower single input v8f32 shuffles which are not 128-bit lane crossing but have different shuffle patterns in the low and high lanes. This removes most of the extract/insert traffic that was unnecessary and is particularly good at lowering cases where only one of the two lanes is shuffled at all. I've also added a collection of test cases with undef lanes because this lowering is somewhat more sensitive to undef lanes than others. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218226 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 23:46:13 +00:00
Chandler Carruth	7da57cf5b4	[x86] Add a bunch of test cases where we have different shuffle patterns in the high and low 128-bit lanes of a v8f32 vector. No functionality change yet, but wanted to set up the baseline for my next patch which will make these quite a bit better. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218224 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 23:32:42 +00:00
Chandler Carruth	974542d7d8	[x86] Teach the new vector shuffle lowering to re-use the SHUFPS lowering when it can use a symmetric SHUFPS across both 128-bit lanes. This required making the SHUFPS lowering tolerant of other vector types, and adjusting our canonicalization to canonicalize harder. This is the last of the clever uses of symmetry I've thought of for v8f32. The rest of the tricks I'm aware of here are to work around asymmetry in the mask. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218216 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 13:35:14 +00:00
Chandler Carruth	1a5f7f54f4	[x86] Teach the new vector shuffle lowering the basics about insertion of a single element into a zero vector for v4f64 and v4i64 in AVX. Ironically, there is less to see here because xor+blend is so crazy fast that we can't really beat that to zero the high 128-bit lane. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218214 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 12:49:46 +00:00
Chandler Carruth	6ef31b0079	[x86] Teach the new vector shuffle lowering how to lower to UNPCKLPS and UNPCKHPS with AVX vectors by recognizing those patterns when they are repeated for both 128-bit lanes. With this, we now generate the exact same (really nice) code for Quentin's avx_test_case.ll which was the most significant regression reported for the new shuffle lowering. In fact, I'm out of specific test cases for AVX lowering, the rest were AVX2 I think. However, there are a bunch of pretty obvious remaining things to improve with AVX... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218213 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 12:20:44 +00:00
Chandler Carruth	7a94357b04	[x86] Add test cases for UNPCK instructions with v8f32 AVX vectors in preparation for enhancing their support in the new vector shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218212 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 12:13:11 +00:00
Chandler Carruth	7922d3e39a	[x86] Begin teaching the new vector shuffle lowering among the most important bits of cleverness: to detect and lower repeated shuffle patterns between the two 128-bit lanes with a single instruction. This patch just teaches it how to lower single-input shuffles that fit this model using VPERMILPS. =] There is more that needs to happen here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218211 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 12:01:19 +00:00
Chandler Carruth	e4cb9d5f25	[x86] Regenerate this test case now that I've improved my script for generating the test cases to format things more consistently and actually catch all the operand sequences that should be elided in favor of the asm comments. No actual changes here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218210 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 11:51:33 +00:00
Chandler Carruth	29720a4bad	[x86] Teach the new vector shuffle lowering of v4f64 to prefer a direct VBLENDPD over using VSHUFPD. While the 256-bit variant of VBLENDPD slows down to the same speed as VSHUFPD on Sandy Bridge CPUs, it has twice the reciprocal throughput on Ivy Bridge CPUs much like it does everywhere for 128-bits. There isn't a downside, so just eagerly use this instruction when it suffices. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218208 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 11:17:55 +00:00
Chandler Carruth	0dd52092d0	[x86] Add some more comprehensive tests for v4f64 blending. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218207 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 11:12:19 +00:00
Chandler Carruth	57191b0b48	[x86] Re-generate a bunch of the v4f64 test cases with my new script. This expands the integer cases to cover the fact that AVX2 moves their lane-crossing shuffles into the integer domain. It also adds proper support for AVX2 run lines and the "ALL" group when it doesn't matter. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218206 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 11:07:41 +00:00
Chandler Carruth	291140b112	[x86] Teach the new vector shuffle lowering the first step toward more actual support for complex AVX shuffling tricks. We can do independent blends of the low and high 128-bit lanes of an avx vector, so shuffle the inputs into place and then do the blend at 256 bits. This will in many cases remove one blend instruction. The next step is to permute the low and high halves in-place rather than extracting them and re-inserting them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218202 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 09:35:22 +00:00
David Majnemer	31b080d57f	MC: Support aligned COMMON symbols for COFF link.exe: Fuzz testing has shown that COMMON symbols with size > 32 will always have an alignment of at least 32 and all symbols with size < 32 will have an alignment of at least the largest power of 2 less than the size of the symbol. binutils: The BFD linker essentially works like the link.exe behavior but with alignment 4 instead of 32. The BFD linker also supports an extension to COFF which adds an -aligncomm argument to the .drectve section which permits specifying a precise alignment for a variable but MC currently doesn't support editing .drectve in this way. With all of this in mind, we decide to play a little trick: we can ensure that the alignment will be respected by bumping the size of the global to its alignment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218201 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 09:18:07 +00:00
Chandler Carruth	1ca1e33c3a	[x86] Add some more test cases covering specific blend patterns. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218200 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 09:01:26 +00:00
Chandler Carruth	b7ef7f97a8	[x86] Add the beginnings of some tests for our v8f32 shuffle lowering under AVX. This really just documents the current state of the world. I'm going to try to flesh it out to cover any test cases I plan to improve prior to improving them so that the delta made by changes is actually visible to code reviewers. This is made easier by the fact that I now have a script to automate the process of producing test cases including the check lines. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218199 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-21 08:49:27 +00:00
Chandler Carruth	ae464b2ba1	[x86] Teach the new vector shuffle lowering to use VPERMILPD for single-input shuffles with doubles. This allows them to fold memory operands into the shuffle, etc. This is just the analog to the v4f32 case in my prior commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218193 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 22:09:27 +00:00
Chandler Carruth	479d0ba62b	[x86] Add an AVX run to the 128-bit v2 tests, teach them to have a generic SSE and AVX mode in addition to a specific AVX1 test path, and flesh out the AVX tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218192 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 21:26:41 +00:00
David Majnemer	182c8ff6c0	Update tests which broke from r218189 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218191 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 21:18:43 +00:00
Chandler Carruth	9c7ffd20df	[x86] Teach the new vector shuffle lowering to use the AVX VPERMILPS instruction for single-vector floating point shuffles. This in turn allows the shuffles to fold a load into the instruction which is one of the common regressions hit with the new shuffle lowering. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218190 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 20:52:07 +00:00
David Majnemer	1c1bde666c	MC: Fix MCSectionCOFF::PrintSwitchToSection We had a few bugs: - We were considering the GVKind instead of just looking at the section characteristics - We would never print out 'y' when a section was meant to be unreadable - We would never print out 's' when a section was meant to be shared - We translated IMAGE_SCN_MEM_DISCARDABLE to 'n' when it should've meant IMAGE_SCN_LNK_REMOVE git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218189 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 20:40:50 +00:00
Chandler Carruth	cc727ec92a	[x86] Start moving to a fancier check syntax to reduce the need for duplication of check lines. The idea is to have broad sets of compilation modes that will frequently diverge without having to always and immediately explode to the precise ISA feature set. While this already helps due to VEX encoded differences, it will help much more as I teach the new shuffle lowering about more of the new VEX encoded instructions which can still be used to implement 128-bit shuffles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218188 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 18:36:39 +00:00
David Majnemer	3f34ae97b9	MC: Treat ReadOnlyWithRel and ReadOnlyWithRelLocal as ReadOnly for COFF A problem with our old behavior becomes observable under x86-64 COFF when we need a read-only GV which has an initializer which is referenced using a relocation: we would mark the section as writable. Marking the section as writable interferes with section merging. This fixes PR21009. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218179 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 07:31:46 +00:00
Chandler Carruth	c16105b078	[x86] Teach the v4f32 path of the new shuffle lowering to handle the tricky case of single-element insertion into the zero lane of a zero vector. We can't just use the same pattern here as we do in every other vector type because the general insertion logic can handle insertion into the non-zero lane of the vector. However, in SSE4.1 with v4f32 vectors we have INSERTPS that is a much better choice than the generic one for such lowerings. But INSERTPS can do lots of other lowerings as well so factoring its logic into the general insertion logic doesn't work very well. We also can't just extract the core common part of the general insertion logic that is faster (forming VZEXT_MOVL synthetic nodes that lower to MOVSS when they can) because VZEXT_MOVL is often faster than a blend while INSERTPS is slower! So instead we do a restrictive condition on attempting to use the generic insertion logic to narrow it to those cases where VZEXT_MOVL won't need a shuffle afterward and thus will do better than INSERTPS. Then we try blending. Then we go back to INSERTPS. This still doesn't generate perfect code for some silly reasons that can be fixed by tweaking the td files for lowering VZEXT_MOVL to use XORPS+BLENDPS when available rather than XORPS+MOVSS when the input ends up in a register rather than a load from memory -- BLENDPSrr has twice the reciprocal throughput of MOVSSrr. Don't you love this ISA? git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218177 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 04:15:22 +00:00
Chandler Carruth	cc62abbe39	[x86] Generalize the single-element insertion lowering to work with floating point types and use it for both v2f64 and v2i64 single-element insertion lowering. This fixes the last non-AVX performance regression test case I've gotten for the new vector shuffle lowering. There is obvious analogous lowering for v4f32 that I'll add in a follow-up patch (because with INSERTPS, v4f32 requires special treatment). After that, it's AVX stuff. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218175 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 03:32:25 +00:00
David Majnemer	c7210b3f0b	llvm-readobj: pretty-print special COFF section names Print IMAGE_SYM_DEBUG and the like instead of (-2). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218172 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 00:25:06 +00:00
Peter Collingbourne	87f7e75e58	Fix crash with an insertvalue that produces an empty object. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218171 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-20 00:10:47 +00:00
Matt Arsenault	1a505ebae4	R600: Un-xfail a test which passes with pass disabled git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218165 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 23:02:20 +00:00
Matt Arsenault	ea3a0242f4	R600/SI: Un-xfail tests which work now git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218164 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 23:02:18 +00:00
Matt Arsenault	c58ab80f78	R600/SI: Un-xfail a test that works now git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218162 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 22:42:40 +00:00
Juergen Ributzka	faf93a6e0c	[FastIsel][AArch64] Fix a think-o in address computation. When looking through sign/zero-extensions the code would always assume there is such an extension instruction and use the wrong operand for the address. There was also a minor issue in the handling of 'AND' instructions. I accidentally used a 'cast' instead of a 'dyn_cast'. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218161 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 22:23:46 +00:00
Chandler Carruth	dc58d1e099	[x86] Fully generalize the zext lowering in the new vector shuffle lowering to support both anyext and zext and to custom lower for many different microarchitectures. Using this allows us to get exactly the right code for zext and anyext shuffles in all the vector sizes. For v16i8, the improvement is huge. The new SSE2 test case added I refused to add before this because it was sooooo many instructions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218143 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 20:00:32 +00:00
Justin Bogner	dcd8562eb7	llvm-cov: Prevent a test from matching its own check lines Since llvm-cov shows the source file in its output, be careful about potentially matching the check lines themselves. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218138 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 19:04:08 +00:00
David Blaikie	b2c0a6db6e	Fix test case to be portable to different architectures. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218134 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 18:31:25 +00:00
Matt Arsenault	c14f7630e0	R600/SI: Fix test to prepare for scheduler git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218131 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 18:11:16 +00:00
David Blaikie	9e4e3d057f	Omit DW_TAG_subprograms for subprograms without inlined subroutines when producing -gmlt data To reduce the size of -gmlt data, skip the subprograms without any inlined subroutines. Since we've now got the ability to make these determinations in the backend (funnily enough - we added the flag so we wouldn't produce ranges under -gmlt, but with this change we use the flag, but go back to producing ranges under -gmlt). Instead, just produce CU ranges to inform the consumer which parts of the code are described by this CU's line table. Tools could inspect the line table directly to compute the range, but the CU ranges only seem to be about 0.5% of object/executable size, so I'm not too worried about teaching llvm-symbolizer that trick just yet - it's certainly a possible piece of future work. Update an llvm-symbolizer test just to demonstrate that this schema is acceptable there (if it wasn't, the compiler-rt tests would catch this, but good to have an in-llvm-tree test for llvm-symbolizer's behavior here) Building the clang binary with -gmlt with this patch reduces the total size of object files by 5.1% (5.56% without ranges) without compression and the executable by 4.37% (4.75% without ranges). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218129 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 17:03:16 +00:00
Hal Finkel	c404e8208c	Optionally enable more-aggressive FMA formation in DAGCombine The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218120 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 11:42:56 +00:00
Chandler Carruth	89436b4160	[x86] Recognize that we can use duplication to widen v16i8 shuffles due to undef lanes as well as defined widenable lanes. This dramatically improves the lowering we use for undef-shuffles in a zext-ish pattern for SSE2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218115 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 09:45:21 +00:00
Chandler Carruth	3e990c1e5b	[x86] Actually test the SSE2 lowering for most of the zext-ish shuffles. Not sure why I only did SSSE3 here. Also, I've left out some of the SSE2 ones because the shuffles are so absurd it's not worth transcribing them. Will try to fix them to be sane and then check them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218114 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 08:51:06 +00:00
Chandler Carruth	ec1f7b1c87	[x86] Teach the new vector shuffle lowering to also use pmovzx for v4i32 shuffles that are zext-ing. Not a lot to see here; the undef lane variant is better handled with pshufd, but this improves the actual zext pattern. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218112 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 08:37:44 +00:00
Justin Bogner	42b96889d1	llvm-cov: Fix dropped lines when filters were applied Uncovered lines in the middle of a covered region weren't being shown when filtering to a particular function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218109 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 08:13:16 +00:00
Chandler Carruth	330aa6fd6b	[x86] Add a dedicated lowering path for zext-compatible vector shuffles to the new vector shuffle lowering code. This allows us to emit PMOVZX variants consistently for patterns where it is a viable lowering. This instruction is both fast and allows us to fold loads into it. This only hooks the new lowering up for i16 and i8 element widths, mostly so I could manage the change to the tests. I'll add the i32 one next, although it is significantly less interesting. One thing to note is that we already had some tests for these patterns but those tests had far less horrible instructions. The problem is that those tests weren't checking the strict start and end of the instruction sequence. =[ As a consequence something changed in the lowering making us generate TERRIBLE code for these patterns in SSE2 through SSSE3. I've consolidated all of the tests and spelled out the madness that we currently emit for these shuffles. I'm going to try to figure out what has gone wrong here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218102 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 06:07:49 +00:00
Jiangning Liu	61519cd699	Optimize sext/zext insertion algorithm in back-end. With this optimization, we will not always insert zext for values crossing basic blocks, but insert sext if the users of a value crossing basic blocks have a preference for the sign predicate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218101 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 05:30:35 +00:00
David Blaikie	a562871c67	Omit DW_AT_frame_base under -gmlt for size git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218100 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 04:55:05 +00:00
David Blaikie	a5a4f87474	Omit all the extra static attributes on subprograms in -gmlt This omission will be done in a fancier manner once we're dealing with "put gmlt in the skeleton CUs under fission" - it'll have to be conditional on the kind of CU we're emitting into (skeleton or gmlt). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218098 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 04:30:36 +00:00
Hans Wennborg	2ee31bcdee	Fix an it's vs. its typo. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218093 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 01:14:56 +00:00
Matt Arsenault	bd2b96a12d	R600: Better fix for bug 20982 Just do the left shift as unsigned to avoid the UB. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218092 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 00:42:06 +00:00
Chandler Carruth	9b676fd6f2	[x86] Extend this test to cover SSE4.1. Nothing interesting here, but paves the way for subsequent changes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218091 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-19 00:30:24 +00:00
Peter Collingbourne	fb2832f689	Try to fix i686-cygming bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218086 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 22:56:00 +00:00
Peter Collingbourne	394be6c159	LTO: introduce object file-based on-disk module format. This format is simply a regular object file with the bitcode stored in a section named ".llvmbc", plus any number of other (non-allocated) sections. One immediate use case for this is to accommodate compilation processes which expect the object file to contain metadata in non-allocated sections, such as the ".go_export" section used by some Go compilers [1], although I imagine that in the future we could consider compiling parts of the module (such as large non-inlinable functions) directly into the object file to improve LTO efficiency. [1] http://golang.org/doc/install/gccgo#Imports Differential Revision: http://reviews.llvm.org/D4371 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218078 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 21:28:49 +00:00
Quentin Colombet	65edced76b	[ARM] Do not perform a tail call when the caller returns several values. The fix is slightly different than x86 (see r216117) because the number of values attached to a return can vary even for a single returned value (e.g., f64 yields two returned values). <rdar://problem/18352998> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218076 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 21:17:50 +00:00
Robin Morisset	5052940c27	Restore "[ARM, Fix] Fix emitLeading/TrailingFence on old ARM processors" Summary: This patch was originally in D5304 (I could not find a way to reopen that revision). It was accepted, committed and broke the build bots because the overloading of the constructor of ArrayRef for braced initializer lists is not supported by all toolchains. I then reverted it, and propose this fixed version that uses a plain C array instead in makeDMB (that array is then converted implicitly to an ArrayRef, but that is not behind an ifdef). Could someone confirm whether initialization lists for plain C arrays are supported by every toolchain used to build llvm? Otherwise I can just initialize the array in the old way: args[0] = ...; .. ; args[5] = ...; Below is the description of the original patch: ``` I had only tested this code for ARMv7 and ARMv8. This patch adds several fallback paths if the processor does not support dmb ish: - dmb sy if a cortex-M with support for dmb - mcr p15, #0, r0, c7, c10, #5 for ARMv6 (special instruction equivalent to a DMB) These fallback paths were chosen based on the code for fence seq_cst. Thanks to luqmana for having noticed this bug. ``` Test Plan: Added more cases to atomic-load-store.ll + make check-all Reviewers: jfb, t.p.northover, luqmana Subscribers: llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D5386 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218066 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 18:56:04 +00:00
Matt Arsenault	e08e52528b	R600: Bug 20982 - Avoid undefined left shift of negative value I'm not sure what the hardware actually does, so don't bother trying to fold it for now. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218057 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 15:52:26 +00:00
Chandler Carruth	72f0d9515e	[x86] Use PALIGNR for v4i32 and v2i64 blends when appropriate. There is no purpose in using it for single-input shuffles as pshufd is just as fast and doesn't tie the two operands. This removes a substantial amount of wrong-domain blend operations in SSSE3 mode. It also completes the usage of PALIGNR for integer shuffles and addresses one of the test cases Quentin hit with the new vector shuffle lowering. There is still the question of whether and when to use this for floating point shuffles. It is faster than shufps or shufpd but in the integer domain. I don't yet really have a good heuristic here for when to use this instruction for floating point vectors. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218038 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 09:00:25 +00:00
Chandler Carruth	088aa097d5	[x86] Add an SSSE3 run and check mode to the 128-bit v2 tests of the new vector shuffle lowering. This will be needed for up-coming palignr tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218037 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 08:33:04 +00:00
Juergen Ributzka	f789dac2dd	Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ." Reverting it until I have time to investigate a regression. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218035 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 08:07:40 +00:00
Juergen Ributzka	ef48b51126	Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies. When folding the intrinsic flag into the branch or select we also have to consider whether the intrinsic got simplified, because it changes the flag we have to check for. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218034 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:26:26 +00:00
Juergen Ributzka	e7fba004ce	[FastISel][AArch64] Simplify XALU multiplies. Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218033 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:04:54 +00:00
Juergen Ributzka	4b6f00ad18	[FastISel][AArch64] Followup commit for 218031 to handle negative offsets too. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218032 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 07:04:49 +00:00
Juergen Ributzka	22b557d942	[FastISel][AArch64] Try to fold the offset into the add instruction when simplifying a memory address. Small optimization in 'simplifyAddress'. When the offset cannot be encoded in the load/store instruction, then we need to materialize the address manually. The add instruction can encode a wider range of immediates than the load/store instructions. This change tries to fold the offset into the add instruction first before materializing the offset in a register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218031 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 05:40:47 +00:00
Juergen Ributzka	ffbd4879eb	[FastISel][AArch64] Fold 'AND' instruction during the address computation. The 'AND' instruction could be used to mask out the lower 32 bits of a register. If this is done inside an address computation we might be able to fold the instruction into the memory instruction itself. and x1, x1, #0xffffffff; ldrb x0, [x0, x1] ---> ldrb x0, [x0, w1, uxtw] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218030 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 05:40:41 +00:00
Chandler Carruth	49ab1a424d	[x86] Add an SSSE3 run to the v4 shuffle test. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218028 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 04:38:32 +00:00
Saleem Abdulrasool	9c00ddb8d5	ARM: prevent crash on ELF directives on COFF Certain directives are unsupported on Windows (some of which could/should be supported). We would not diagnose the use but rather crash during the emission as we try to access the Target Streamer. Add an assertion to prevent creating a NULL reference (which is not permitted under C++) as well as a test to ensure that we can diagnose the disabled directives. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218014 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 04:28:29 +00:00
Chandler Carruth	3ff76847ba	[x86] Initial step of teaching the new vector shuffle lowering about PALIGNR. This just adds it to the v8i16 and v16i8 lowering steps where it is completely unmatched. It also introduces the logic for detecting rotation shuffle masks even in the presence of single input or blend masks and arbitrarily undef lanes. I've added fairly comprehensive tests for the matching logic in v8i16 because the tests at that size are much easier to write and manage. I've not checked the SSE2 code generated for these tests because the code is horrible. It is absolute madness. Testing it will just make the test brittle without giving any interesting improvements in the correctness confidence. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218013 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 04:11:29 +00:00
Saleem Abdulrasool	5bf65590d0	ARM: use a more precise check for MachO Rather than relying on support for a specific directive to determine if we are targeting MachO, explicitly check the output format. As an additional bonus, cleanup the caret diagnostic for the non-MachO case and avoid the spurious error caused by not discarding the statement. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218012 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 03:49:55 +00:00
Juergen Ributzka	710fc316fb	[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ. Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218010 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-18 02:44:13 +00:00
Samuel Antao	6693d0de3e	Fix FastISel bug in boolean returns for PowerPC. For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8	2014-09-17 23:25:06 +00:00

26456 Commits