Commit Graph

3307 Commits

Author SHA1 Message Date
Elena Demikhovsky
f68b214e2d Fixed vsqrt.ss intrinsic usage - order of input operands was wrong.
Added a test.
Thanks Bruno for reviewing the patch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145403 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 15:00:45 +00:00
Craig Topper
f267972d28 Fix shuffle decoding for memory forms for (V)SHUFPS/D.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145392 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 07:58:09 +00:00
Craig Topper
36e36ace77 Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145390 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 07:49:05 +00:00
Craig Topper
fe2a6c584a Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145376 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 05:37:58 +00:00
Craig Topper
108126cfc6 Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145370 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 03:57:34 +00:00
Evan Cheng
ed1c0c7f58 Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead.
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145300 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 22:37:34 +00:00
Evan Cheng
1c487869f5 DAG combine should not increase alignment of loads / stores with alignment less
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145273 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 20:42:56 +00:00
Craig Topper
70b883b3a7 Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145238 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 10:14:51 +00:00
Chandler Carruth
fac1305da1 Take two on rotating the block ordering of loops. My previous attempt
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145183 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 13:34:33 +00:00
Chandler Carruth
2eb5a744b1 Rework a bit of the implementation of loop block rotation to not rely so
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145179 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 09:22:53 +00:00
Chris Lattner
3211c6e31b remove autoupgrade support for old forms of llvm.prefetch and the old
trampoline forms.  Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145174 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 07:42:04 +00:00
Chris Lattner
d2bf432b2b Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145171 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 06:54:59 +00:00
Chris Lattner
663aebf8d6 remove some old autoupgrade logic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145167 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 06:10:54 +00:00
Chandler Carruth
2e38cf961d Introduce a loop block rotation optimization to the new block placement
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145158 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 00:38:03 +00:00
Eli Friedman
4455142a95 Fix APFloat::convert so that it handles narrowing conversions correctly; it
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145141 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes
1b9b377975 This patch contains support for encoding FMA4 instructions and
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145133 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-25 19:33:42 +00:00
Craig Topper
705f2431a0 Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145126 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 22:57:10 +00:00
Chandler Carruth
4aae4f9007 Fix a silly use-after-free issue. A much earlier version of this code
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145120 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 11:23:15 +00:00
Chandler Carruth
a2deea1dcf When adding blocks to the list of those which no longer have any CFG
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145119 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 08:46:04 +00:00
Benjamin Kramer
f238f50aaf X86: Use btq for bit tests if the immediate can't be encoded in 32 bits.
Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145103 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 13:54:17 +00:00
NAKAMURA Takumi
e4513b1fc5 test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145101 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 12:18:22 +00:00
Chandler Carruth
598894ff25 Relax an invariant that block placement was trying to assert a bit
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145100 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 10:35:36 +00:00
Elena Demikhovsky
52a35a89e6 I added several lines in X86 code generator that allow to choose
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.

The patch was reviewed by Bruno.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145099 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 10:23:16 +00:00
Chandler Carruth
521fc5bcd7 Handle the case of a no-return invoke correctly. It actually still has
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145098 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 08:23:54 +00:00
Bob Wilson
23d66a58b7 Enable stack protectors for all arrays, not just char arrays. rdar://5875909
Patch by Bill Wendling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145097 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen
7f5e43f61d Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145096 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 04:03:08 +00:00
Chandler Carruth
47fb954f74 Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145094 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 03:03:21 +00:00
Chandler Carruth
3b7b209bf8 Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145062 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 13:13:16 +00:00
Rafael Espindola
fdb00a9bdb Add triple to the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145057 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 06:36:25 +00:00
Rafael Espindola
254a13282c If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145056 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 06:27:18 +00:00
Craig Topper
6fa583d787 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145028 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 08:26:50 +00:00
Craig Topper
3b73312020 Test case for r145026
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145027 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 06:58:09 +00:00
Craig Topper
a124f94952 Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145022 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 01:12:36 +00:00
NAKAMURA Takumi
742e5cf612 test/CodeGen/X86/block-placement.ll: Relax expressions for Win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145011 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 12:49:45 +00:00
Chandler Carruth
b0dadb9dd5 The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145010 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 11:22:06 +00:00
Chandler Carruth
2901243fda Add some comments to the latest test case I added here to document what
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145006 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 09:30:40 +00:00
Craig Topper
0d86d462f8 Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145005 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 00:12:05 +00:00
Craig Topper
745a86bac9 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145004 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 22:34:59 +00:00
Chandler Carruth
03300ecaee Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144994 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 10:26:02 +00:00
Craig Topper
6bf57b0272 Test cases for SSSE3/AVX integer horizontal add/sub.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144990 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 09:03:33 +00:00
Craig Topper
1666cb6d63 Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144987 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 07:07:26 +00:00
Nadav Rotem
cbbe33fde4 Add AVX2 vpbroadcast support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144967 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-18 02:49:55 +00:00
Devang Patel
ce35d8b5a1 DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144937 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-17 23:43:15 +00:00
Eli Friedman
4db4addcd4 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144863 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 23:50:22 +00:00
Evan Cheng
2b89498979 Another missing X86ISD::MOVLPD pattern. rdar://10450317
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144839 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 22:24:44 +00:00
Evan Cheng
c3aa7c5c5a Disable expensive two-address optimizations at -O0. rdar://10453055
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144806 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 18:44:48 +00:00
Eli Friedman
ee94dc212e Fix testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144769 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 03:03:52 +00:00
Eli Friedman
d577df8e5a CONCAT_VECTORS can have more than two operands. PR11389.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144768 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 02:52:39 +00:00
Nadav Rotem
f8c10e5cb1 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144720 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 22:50:37 +00:00
NAKAMURA Takumi
ec0af2f4e1 test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144714 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 22:30:37 +00:00