Commit Graph

3500 Commits

Author SHA1 Message Date
Rafael Espindola
25cd4ff97e Generate the segmented stack prologue for fastcc too.
Patch by Brian Anderson.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147958 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 18:41:19 +00:00
Chandler Carruth
11f0e7b158 Revert r147945 which disabled an addressing mode transformation. I had
hoped this would revive one of the llvm-gcc selfhost build bots, but it
didn't so it doesn't appear that my transform is the culprit.

If anyone else is seeing failures, please let me know!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147957 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 18:36:12 +00:00
Rafael Espindola
313c703831 Use unsigned comparison in segmented stack prologue.
This is a comparison of two addresses, and GCC does the comparison unsigned.

Patch by Brian Anderson.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147954 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 18:23:35 +00:00
Rafael Espindola
014f7a3b37 Explicitly set the scale to 1 on some segstack prologue instrs.
Patch by Brian Anderson.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147952 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 18:14:03 +00:00
Jan Sjödin
46df3adb4e Add XOP Intrinsics and tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147949 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 15:20:20 +00:00
Nadav Rotem
394a1f53b9 Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147948 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 14:07:51 +00:00
Chandler Carruth
e4bc80a14b Disable the transformation I added in r147936 to see if it fixes some
strange build bot failures that look like a miscompile into an infloop.
I'll investigate this tomorrow, but I'd both like to know whether my
patch is the culprit, and get the bots back to green.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147945 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 12:17:47 +00:00
Jakob Stoklund Olesen
dec1f99615 Fix undefined code and reenable test case.
I don't think the compact encoding code is right, but at least is has
defined behavior now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147938 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 09:08:04 +00:00
Chandler Carruth
f103b3d1b9 Teach the X86 instruction selection to do some heroic transforms to
detect a pattern which can be implemented with a small 'shl' embedded in
the addressing mode scale. This happens in real code as follows:

  unsigned x = my_accelerator_table[input >> 11];

Here we have some lookup table that we look into using the high bits of
'input'. Each entity in the table is 4-bytes, which means this
implicitly gets turned into (once lowered out of a GEP):

  *(unsigned*)((char*)my_accelerator_table + ((input >> 11) << 2));

The shift right followed by a shift left is canonicalized to a smaller
shift right and masking off the low bits. That hides the shift right
which x86 has an addressing mode designed to support. We now detect
masks of this form, and produce the longer shift right followed by the
proper addressing mode. In addition to saving a (rather large)
instruction, this also reduces stalls in Intel chips on benchmarks I've
measured.

In order for all of this to work, one part of the DAG needs to be
canonicalized *still further* than it currently is. This involves
removing pointless 'trunc' nodes between a zextload and a zext. Without
that, we end up generating spurious masks and hiding the pattern.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147936 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 08:41:08 +00:00
NAKAMURA Takumi
69b5df8bf6 llvm/test/CodeGen/X86/zext-fold.ll: Relax an expression in stack offset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147928 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 07:34:22 +00:00
NAKAMURA Takumi
29cc410e6d llvm/test/CodeGen/X86/sub-with-overflow.ll: Add explicit -mtriple=i686-linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147927 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 07:34:14 +00:00
Jakob Stoklund Olesen
bc9beda433 Disable test that seems to expose an unrelated Linux issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147921 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 03:42:27 +00:00
Jakob Stoklund Olesen
2aad2f6e60 Detect when a value is undefined on an edge to a landing pad.
Consider this code:

int h() {
  int x;
  try {
    x = f();
    g();
  } catch (...) {
    return x+1;
  }
  return x;
}

The variable x is undefined on the first edge to the landing pad, but it
has the f() return value on the second edge to the landing pad.

SplitAnalysis::getLastSplitPoint() would assume that the return value
from f() was live into the landing pad when f() throws, which is of
course impossible.

Detect these cases, and treat them as if the landing pad wasn't there.
This allows spill code to be inserted after the function call to f().

<rdar://problem/10664933>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147912 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-11 02:07:05 +00:00
Chad Rosier
eab30279f7 Add test case for r147881.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147891 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-10 23:09:53 +00:00
Joerg Sonnenberger
216f63702f Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes.
Add a test that checks the stack alignment of a simple function for
Darwin, Linux and NetBSD for 32bit and 64bit mode.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147888 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-10 22:43:53 +00:00
Nadav Rotem
6c0366cb25 Fix a bug in the legalization of shuffle vectors. When we emulate shuffles using BUILD_VECTORS we may be using a BV of different type. Make sure to cast it back.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147851 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-10 14:28:46 +00:00
Craig Topper
a937633893 Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147844 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-10 08:23:59 +00:00
Evan Cheng
97b5beb7fe Allow machine-cse to look across MBB boundary when cse'ing instructions that
define physical registers. It's currently very restrictive, only catching
cases where the CE is in an immediate (and only) predecessor. But it catches
a surprising large number of cases.

rdar://10660865


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147827 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-10 02:02:58 +00:00
Chandler Carruth
a7cb699251 Cleanup and FileCheck-ize a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147772 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-09 09:44:26 +00:00
Craig Topper
5feb5dae93 Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147767 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-09 06:52:46 +00:00
Rafael Espindola
1fe9737eb4 Don't print an unused label before .cfi_endproc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147763 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-09 00:17:29 +00:00
Craig Topper
39f227e4dd Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147762 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-09 00:11:29 +00:00
Victor Umansky
435d0bd09d Reverted commit #147601 upon Evan's request.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147748 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-08 17:20:33 +00:00
Rafael Espindola
547be2699c Don't print a label before .cfi_startproc when we don't need to. This makes
the produce assembly when using CFI just a bit more readable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147743 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-07 22:42:19 +00:00
Evan Cheng
977679d603 Added a late machine instruction copy propagation pass. This catches
opportunities that only present themselves after late optimizations
such as tail duplication .e.g.
## BB#1:
        movl    %eax, %ecx
        movl    %ecx, %eax
        ret

The register allocator also leaves some of them around (due to false
dep between copies from phi-elimination, etc.)

This required some changes in codegen passes. Post-ra scheduler and the
pseudo-instruction expansion passes have been moved after branch folding
and tail merging. They were before branch folding before because it did
not always update block livein's. That's fixed now. The pass change makes
independently since we want to properly schedule instructions after
branch folding / tail duplication.

rdar://10428165
rdar://10640363



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147716 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-07 03:02:36 +00:00
Eric Christopher
5548755201 Make the 'x' constraint work for AVX registers as well.
Fixes rdar://10614894

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147704 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-07 01:02:09 +00:00
Chandler Carruth
62dfc51152 Prevent a DAGCombine from firing where there are two uses of
a combined-away node and the result of the combine isn't substantially
smaller than the input, it's just canonicalized. This is the first part
of a significant (7%) performance gain for Snappy's hot decompression
loop.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147604 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-05 11:05:55 +00:00
Chandler Carruth
1e141a854e Cleanup and FileCheck-ize a test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147603 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-05 11:05:47 +00:00
Victor Umansky
19d8559019 Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX.
Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX)

Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147601 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-05 08:46:19 +00:00
Benjamin Kramer
44aac553f6 FileCheck hygiene.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147580 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-05 00:43:34 +00:00
NAKAMURA Takumi
7d34b92993 test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147521 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-04 03:52:23 +00:00
Evan Cheng
56f582d664 For x86, canonicalize max
(x > y) ? x : y
=>
(x >= y) ? x : y

So for something like
(x - y) > 0 : (x - y) ? 0
It will be
(x - y) >= 0 : (x - y) ? 0

This makes is possible to test sign-bit and eliminate a comparison against
zero. e.g.
subl   %esi, %edi
testl  %edi, %edi
movl   $0, %eax
cmovgl %edi, %eax
=>
xorl   %eax, %eax
subl   %esi, $edi
cmovsl %eax, %edi

rdar://10633221


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147512 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-04 01:41:39 +00:00
Nadav Rotem
c2d064f028 Revert 147426 because it caused pr11696.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147485 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-03 22:19:42 +00:00
Nadav Rotem
316477dd54 Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147484 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-03 22:12:28 +00:00
Chad Rosier
3d1161e9ae Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather
then a vxorps + vinsertf128 pair if the original vector came from a load.
rdar://10594409

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147481 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-03 21:05:52 +00:00
Elena Demikhovsky
ce58a03587 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147445 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-03 11:59:04 +00:00
Nadav Rotem
a46f35d3d6 Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147426 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-02 08:05:46 +00:00
Craig Topper
a86bcfb565 Allow CRC32 instructions to be selected when AVX is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147411 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-01 19:51:58 +00:00
Craig Topper
de9e4c728e Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147409 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-01 19:40:22 +00:00
Rafael Espindola
acae2a63b9 Revert 147399. It broke CodeGen/ARM/vext.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147400 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-01 17:36:23 +00:00
Elena Demikhovsky
ac12855066 Fixed a bug in SelectionDAG.cpp.
The failure seen on win32, when i64 type is illegal.
It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR.

The failure message is:
llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT || (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed.

I added a special test that checks vector shuffle on win32.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147399 91177308-0d34-0410-b5e6-96231b3b80d8
2012-01-01 16:22:47 +00:00
Craig Topper
3ee6d22c78 Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147393 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-31 23:24:49 +00:00
Craig Topper
e00805d52f Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147392 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-31 23:15:11 +00:00
Craig Topper
2e9ed29449 Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147361 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-30 02:18:36 +00:00
Craig Topper
57d4b3315f Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147360 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-30 01:49:53 +00:00
Eli Friedman
da813f4209 Fix type-checking for load transformation which is not legal on floating-point types. PR11674.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147323 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-28 21:24:44 +00:00
Nadav Rotem
6059b83695 PR11662.
Promotion of the mask operand needs to be done using PromoteTargetBoolean, and not padded with garbage.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147309 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-28 13:08:20 +00:00
Elena Demikhovsky
021c0a2ee7 Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR.
Matching MOVLP mask for AVX (265-bit vectors) was wrong.
The failure was detected by conformance tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147308 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-28 08:14:01 +00:00
Eli Friedman
d6e2560e7a Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147283 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-26 22:49:32 +00:00
Chandler Carruth
7782102c70 Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the
LZCNT instructions are available. Force promotion to i32 to get
a smaller encoding since the fix-ups necessary are just as complex for
either promoted type

We can't do standard promotion for CTLZ when lowering through BSR
because it results in poor code surrounding the 'xor' at the end of this
instruction. Essentially, if we promote the entire CTLZ node to i32, we
end up doing the xor on a 32-bit CTLZ implementation, and then
subtracting appropriately to get back to an i8 value. Instead, our
custom logic just uses the knowledge of the incoming size to compute
a perfect xor. I'd love to know of a way to fix this, but so far I'm
drawing a blank. I suspect the legalizer could be more clever and/or it
could collude with the DAG combiner, but how... ;]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147251 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 12:12:34 +00:00
Chandler Carruth
3d636ea8ed Add systematic testing for cttz as well, and fix the bug I spotted by
inspection earlier.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147250 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 11:46:10 +00:00
Chandler Carruth
9d2051f7fa Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147249 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 11:26:59 +00:00
Chandler Carruth
e0c643d503 Tidy up this rather crufty test. Put the declarations at the top to make
my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like
nounwind. Remove stray uses comments. Name things semantically rather
than tN so that adding a new test in the middle doesn't cause pain, and
so that new tests can be grouped semantically.

This exposes how little systematic testing is going on here. I noticed
this by finding several bugs via inspection and wondering why this test
wasn't catching any of them. =[

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147248 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 11:26:57 +00:00
Chandler Carruth
d873a4b89b Expand more when we have a nice 'tzcnt' instruction, to avoid generating
'bsf' instructions here.

This one is actually debatable to my eyes. It's not clear that any chip
implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless
EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding.
Still, this restores the old behavior with 'tzcnt' enabled for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147246 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 11:11:38 +00:00
Chandler Carruth
131f7d3544 Tidy up some of these tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147245 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 11:11:36 +00:00
Chandler Carruth
acc068e873 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147244 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 10:55:54 +00:00
Chandler Carruth
c08e57c7c9 Cleanup this test a bit, sorting things and grouping them more clearly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@147243 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-24 10:55:42 +00:00
Elena Demikhovsky
ba4f83b4e9 This is the second fix related to VZEXT_MOVL node.
The failure that I see in the current version is:

LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14]
  0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13]
    0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12]
      0x18b9870: v4i64 = undef [ID=4]
      0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10]
        0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
          0x18b9770: i32 = TargetConstant<0> [ID=6]
      0x18b9970: i32 = Constant<0> [ID=3]
    0x18b9170: v2i64 = undef [ORD=1] [ID=1]
    0x18b9570: i32 = Constant<2> [ID=5]



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146975 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-20 13:34:28 +00:00
Chandler Carruth
f2d7693fbb Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146974 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-20 11:19:37 +00:00
Lang Hames
8b99c1e42c Make sure that the lower bits on the VSELECT condition are properly set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146800 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-17 01:08:46 +00:00
Craig Topper
94438ba538 Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146726 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-16 08:06:31 +00:00
Chad Rosier
c8dd20170e Add missing zmovl AVX patterns which were causing crashes.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146689 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 22:11:31 +00:00
Chad Rosier
0660cfe3c8 Fix assert in LowerBUILD_VECTOR for v16i16 type on AVX.
Patch by Elena Demikhovsky <elena.demikhovsky@intel.com>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146684 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 21:34:44 +00:00
Lang Hames
81fdd7bd6a Set specific target cpu for testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146678 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 20:22:34 +00:00
Lang Hames
74c86e513b Added test case for r146671.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146675 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 19:56:07 +00:00
Eli Friedman
ca072a3977 Don't try to form FGETSIGN after legalization; it is possible in some cases, but the existing code can't do it correctly. PR11570.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146630 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 02:07:20 +00:00
Chad Rosier
a860b189e4 Add support for lowering fneg when AVX is enabled.
rdar://10566486


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146625 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-15 01:02:25 +00:00
Chandler Carruth
ddbc274169 Manually upgrade the test suite to specify the flag to cttz and ctlz.
I followed three heuristics for deciding whether to set 'true' or
'false':

- Everything target independent got 'true' as that is the expected
  common output of the GCC builtins.
- If the target arch only has one way of implementing this operation,
  set the flag in the way that exercises the most of codegen. For most
  architectures this is also the likely path from a GCC builtin, with
  'true' being set. It will (eventually) require lowering away that
  difference, and then lowering to the architecture's operation.
- Otherwise, set the flag differently dependending on which target
  operation should be tested.

Let me know if anyone has any issue with this pattern or would like
specific tests of another form. This should allow the x86 codegen to
just iteratively improve as I teach the backend how to differentiate
between the two forms, and everything else should remain exactly the
same.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146370 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-12 11:59:10 +00:00
Evan Cheng
b3e6c70c84 Update test to something more sensible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146282 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-09 21:54:10 +00:00
Benjamin Kramer
b653397dcd X86: Add patterns for the various rounding ops for SSE4.1 and AVX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146257 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-09 15:44:03 +00:00
Evan Cheng
9c181a92d8 Forgot setting -march.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146244 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-09 06:15:00 +00:00
Evan Cheng
e955726a0e Add 256-bit variant vmovss and vmovsd patterns. rdar://10538417
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146196 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-08 22:30:45 +00:00
Evan Cheng
13d2ba34f2 Add various missing AVX patterns which was causing crashes. Sadly, the generated
code looks pretty bad compared to SSE.

rdar://10538793


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146191 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-08 22:05:28 +00:00
Evan Cheng
e9c1e07c5f Add test for r146163.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146167 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-08 19:21:39 +00:00
NAKAMURA Takumi
e4472726b5 test/CodeGen/X86/vec_compare-2.ll: Add explicit -mtriple=i686-linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146152 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-08 15:24:09 +00:00
Nadav Rotem
44bac7cd65 Fix a bug in the integer-promotion of bitcast operations on vector types.
We must not issue a bitcast operation for integer-promotion of vector types, because the
location of the values in the vector may be different.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146150 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-08 13:10:01 +00:00
Eli Friedman
f91abd22be Support vector bitcasts in the AsmPrinter. PR11495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@146001 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-07 00:50:54 +00:00
Eli Friedman
26323442d5 Fix an optimization involving EXTRACT_SUBVECTOR in DAGCombine so it behaves correctly. PR11494.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145996 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-07 00:11:56 +00:00
Craig Topper
cb6bd11bd6 Fix a bunch of SSE/AVX patterns to use v2i64/v4i64 loads since all other integer vector loads are promoted to those.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145927 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-06 09:04:59 +00:00
Craig Topper
1ff73d7a67 Merge isSHUFPMask and isCommutedSHUFPMask into single function that can do both. Do the same for the 256-bit version. Use loops to reduce size of isVSHUFPYMask. Fix test cases that were incorrectly passing due to isCommutedSHUFPMask not checking for the vector being 128-bit. This caused some 256-bit shuffles to be incorrectly commuted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145921 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-06 04:59:07 +00:00
NAKAMURA Takumi
27de2a54f3 test/CodeGen/X86/pointer-vector.ll: Add explicit -mtriple=i686-linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145805 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-05 07:54:57 +00:00
Nadav Rotem
1608769abe Add support for vectors of pointers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145801 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-05 06:29:09 +00:00
Sanjoy Das
199ce33b3b Check for stack space more intelligently.
libgcc sets the stack limit field in TCB to 256 bytes above the actual
allocated stack limit.  This means if the function's stack frame needs
less than 256 bytes, we can just compare the stack pointer with the
stack limit.  This should result in lesser calls to __morestack.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145766 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-03 09:32:07 +00:00
Sanjoy Das
40f8222e1e Fix a bug in the x86-32 code generated for segmented stacks.
Currently LLVM pads the call to __morestack with a add and sub of 8
bytes to esp.  This isn't correct since __morestack expects the call
to be followed directly by a ret.

This commit also adjusts the relevant test-case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145765 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-03 09:21:07 +00:00
Craig Topper
138a5c66b9 Add instruction selection support for horizontal add/sub of 256-bit floating point vectors. Also add the test case for 256-bit integer vectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145680 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-02 07:16:01 +00:00
Eric Christopher
7d5a61e975 For 64-bit the rest of the general regs are ok for the q constraint. Make
sure we can emit both the high and low versions of those registers.

Fixes rdar://10392864

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145579 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-01 08:12:41 +00:00
Eli Friedman
522fb8cc01 Pass AVX vectors which are arguments to varargs functions on the stack. <rdar://problem/10463281>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145573 91177308-0d34-0410-b5e6-96231b3b80d8
2011-12-01 04:49:21 +00:00
Jan Sjödin
dd649e35e5 Support for encoding all FMA4 instructions and tablegen patterns for all
remaining FMA4 instructions and intrinsics with tests.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145525 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-30 22:09:42 +00:00
Nadav Rotem
78647434ea Add test arch to make it pass on non x86 targets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145498 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-30 17:34:28 +00:00
Nadav Rotem
f3993125b1 Add a tripple to the test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145489 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-30 11:20:56 +00:00
Nadav Rotem
18197d7425 X86: PerformOrCombine introduced a vselect node with a wrong order of operands. This bug was introduced when a dedicated blend sdnode was replaced with the vselect node (in 139479).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145488 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-30 10:13:37 +00:00
Evan Cheng
a3438cf48b Add another missing pattern. llvm-gcc likes f64 but clang likes i64 so it was generating poor code for some SSE builtins.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145448 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 22:48:34 +00:00
Jakob Stoklund Olesen
0edd83bfff Make X86::FsFLD0SS / FsFLD0SD real pseudo-instructions.
Like V_SET0, these instructions are expanded by ExpandPostRA to xorps /
vxorps so they can participate in execution domain swizzling.

This also makes the AVX variants redundant.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145440 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 22:27:25 +00:00
Elena Demikhovsky
f68b214e2d Fixed vsqrt.ss intrinsic usage - order of input operands was wrong.
Added a test.
Thanks Bruno for reviewing the patch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145403 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 15:00:45 +00:00
Craig Topper
f267972d28 Fix shuffle decoding for memory forms for (V)SHUFPS/D.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145392 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 07:58:09 +00:00
Craig Topper
36e36ace77 Fix issues in shuffle decoding around VPERM* instructions. Fix shuffle decoding for VSHUFPS/D for 256-bit types. Add pattern matching for memory forms of VPERMILPS/VPERMILPD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145390 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 07:49:05 +00:00
Craig Topper
fe2a6c584a Fix VINSERTF128/VEXTRACTF128 to be marked as FP instructions. Allow execution dependency fix pass to convert them to their integer equivalents when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145376 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 05:37:58 +00:00
Craig Topper
108126cfc6 Correctly mark VPERM2F128 as being an FP instruction and add execution domain fixing support to convert it to VPERM2I128 for AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145370 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-29 03:57:34 +00:00
Evan Cheng
ed1c0c7f58 Revert r145273 and fix in SelectionDAG::InferPtrAlignment() instead.
Conservatively returns zero when the GV does not specify an alignment nor is it
initialized. Previously it returns ABI alignment for type of the GV. However, if
the type is a "packed" type, then the under-specified alignments is attached to
the load / store instructions. In that case, the alignment of the type cannot be
trusted.
rdar://10464621


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145300 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 22:37:34 +00:00
Evan Cheng
1c487869f5 DAG combine should not increase alignment of loads / stores with alignment less
than ABI alignment. These are loads / stores from / to "packed" data structures.
Their alignments are intentionally under-specified.

rdar://10301431


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145273 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 20:42:56 +00:00
Craig Topper
70b883b3a7 Add X86 instruction selection for VPERM2I128 when AVX2 is enabled. Merge VPERMILPS/VPERMILPD detection since they are pretty similar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145238 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-28 10:14:51 +00:00
Chandler Carruth
fac1305da1 Take two on rotating the block ordering of loops. My previous attempt
was centered around the premise of laying out a loop in a chain, and
then rotating that chain. This is good for preserving contiguous layout,
but bad for actually making sane rotations. In order to keep it safe,
I had to essentially make it impossible to rotate deeply nested loops.
The information needed to correctly reason about a deeply nested loop is
actually available -- *before* we layout the loop. We know the inner
loops are already fused into chains, etc. We lose information the moment
we actually lay out the loop.

The solution was the other alternative for this algorithm I discussed
with Benjamin and some others: rather than rotating the loop
after-the-fact, try to pick a profitable starting block for the loop's
layout, and then use our existing layout logic. I was worried about the
complexity of this "pick" step, but it turns out such complexity is
needed to handle all the important cases I keep teasing out of benchmarks.

This is, I'm afraid, a bit of a work-in-progress. It is still
misbehaving on some likely important cases I'm investigating in Olden.
It also isn't really tested. I'm going to try to craft some interesting
nested-loop test cases, but it's likely to be extremely time consuming
and I don't want to go there until I'm sure I'm testing the correct
behavior. Sadly I can't come up with a way of getting simple, fine
grained test cases for this logic. We need complex loop structures to
even trigger much of it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145183 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 13:34:33 +00:00
Chandler Carruth
2eb5a744b1 Rework a bit of the implementation of loop block rotation to not rely so
heavily on AnalyzeBranch. That routine doesn't behave as we want given
that rotation occurs mid-way through re-ordering the function. Instead
merely check that there are not unanalyzable branching constructs
present, and then reason about the CFG via successor lists. This
actually simplifies my mental model for all of this as well.

The concrete result is that we now will rotate more loop chains. I've
added a test case from Olden highlighting the effect. There is still
a bit more to do here though in order to regain all of the performance
in Olden.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145179 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 09:22:53 +00:00
Chris Lattner
3211c6e31b remove autoupgrade support for old forms of llvm.prefetch and the old
trampoline forms.  Both of these were correct in LLVM 3.0, and we don't
need to support LLVM 2.9 and earlier in mainline.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145174 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 07:42:04 +00:00
Chris Lattner
d2bf432b2b Upgrade syntax of tests using volatile instructions to use 'load volatile' instead of 'volatile load', which is archaic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145171 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 06:54:59 +00:00
Chris Lattner
663aebf8d6 remove some old autoupgrade logic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145167 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 06:10:54 +00:00
Chandler Carruth
2e38cf961d Introduce a loop block rotation optimization to the new block placement
pass. This is designed to achieve one of the important optimizations
that the old code placement pass did, but more simply.

This is a somewhat rough and *very* conservative version of the
transform. We could get a lot fancier here if there are profitable cases
to do so. In particular, this only looks for a single pattern, it
insists that the loop backedge being rotated away is the last backedge
in the chain, and it doesn't provide any means of doing better in-loop
placement due to the rotation. However, it appears that it will handle
the important loops I am finding in the LLVM test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145158 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-27 00:38:03 +00:00
Eli Friedman
4455142a95 Fix APFloat::convert so that it handles narrowing conversions correctly; it
was returning incorrect values in rare cases, and incorrectly marking
exact conversions as inexact in some more common cases. Fixes PR11406, and a
missed optimization in test/CodeGen/X86/fp-stack-O0.ll.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145141 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-26 03:38:02 +00:00
Bruno Cardoso Lopes
1b9b377975 This patch contains support for encoding FMA4 instructions and
tablegen patterns for scalar FMA4 operations and intrinsic. Also
add tests for vfmaddsd.

Patch by Jan Sjodin

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145133 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-25 19:33:42 +00:00
Craig Topper
705f2431a0 Remove 256-bit specific node types for UNPCKHPS/D and instead use the 128-bit versions and let the operand type disinquish. Also fix the load form of the v8i32 patterns for these to realize that the load would be promoted to v4i64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145126 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 22:57:10 +00:00
Chandler Carruth
4aae4f9007 Fix a silly use-after-free issue. A much earlier version of this code
need lots of fanciness around retaining a reference to a Chain's slot in
the BlockToChain map, but that's all gone now. We can just go directly
to allocating the new chain (which will update the mapping for us) and
using it.

Somewhat gross mechanically generated test case replicates the issue
Duncan spotted when actually testing this out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145120 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 11:23:15 +00:00
Chandler Carruth
a2deea1dcf When adding blocks to the list of those which no longer have any CFG
conflicts, we should only be adding the first block of the chain to the
list, lest we try to merge into the middle of that chain. Most of the
places we were doing this we already happened to be looking at the first
block, but there is no reason to assume that, and in some cases it was
clearly wrong.

I've added a couple of tests here. One already worked, but I like having
an explicit test for it. The other is reduced from a test case Duncan
reduced for me and used to crash. Now it is handled correctly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145119 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-24 08:46:04 +00:00
Benjamin Kramer
f238f50aaf X86: Use btq for bit tests if the immediate can't be encoded in 32 bits.
Before:
	movabsq	$4294967296, %rax       ## encoding: [0x48,0xb8,0x00,0x00,0x00,0x00,0x01,0x00,0x00,0x00]
	testq	%rax, %rdi              ## encoding: [0x48,0x85,0xf8]
	jne	LBB0_2                  ## encoding: [0x75,A]

After:
	btq	$32, %rdi               ## encoding: [0x48,0x0f,0xba,0xe7,0x20]
	jb	LBB0_2                  ## encoding: [0x72,A]

btq is usually slower than testq because it doesn't fuse with the jump, but here we're better off
saving one register and a giant movabsq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145103 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 13:54:17 +00:00
NAKAMURA Takumi
e4513b1fc5 test/CodeGen/X86/block-placement.ll: Add explicit -mtriple=i686-linux. X86 Win32 CodeGen does not support EH yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145101 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 12:18:22 +00:00
Chandler Carruth
598894ff25 Relax an invariant that block placement was trying to assert a bit
further. This invariant just wasn't going to work in the face of
unanalyzable branches; we need to be resillient to the phenomenon of
chains poking into a loop and poking out of a loop. In fact, we already
were, we just needed to not assert on it.

This was found during a bootstrap with block placement turned on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145100 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 10:35:36 +00:00
Elena Demikhovsky
52a35a89e6 I added several lines in X86 code generator that allow to choose
VSHUFPS/VSHUFPD instructions while lowering VECTOR_SHUFFLE node. I check a commuted VSHUFP mask.

The patch was reviewed by Bruno.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145099 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 10:23:16 +00:00
Chandler Carruth
521fc5bcd7 Handle the case of a no-return invoke correctly. It actually still has
successors, they just are all landing pad successors. We handle this the
same way as no successors. Comments attached for the next person to wade
through here and another lovely test case courtesy of Benjamin Kramer's
bugpoint reduction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145098 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 08:23:54 +00:00
Bob Wilson
23d66a58b7 Enable stack protectors for all arrays, not just char arrays. rdar://5875909
Patch by Bill Wendling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145097 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 07:13:56 +00:00
Jakob Stoklund Olesen
7f5e43f61d Fix PR11422.
This was a bug in keeping track of the available domains when merging
domain values.

The wrong domain mask caused ExecutionDepsFix to try to move VANDPSYrr
to the integer domain which is only available in AVX2.

Also add an assertion to catch future attempts at emitting AVX2
instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145096 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 04:03:08 +00:00
Chandler Carruth
47fb954f74 Fix a crash in block placement due to an inner loop that happened to be
reversed in the function's original ordering, and we happened to
encounter it while handling an outer unnatural CFG structure.

Thanks to the test case reduced from GCC's source by Benjamin Kramer.
This may also fix a crasher in gzip that Duncan reduced for me, but
I haven't yet gotten to testing that one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145094 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-23 03:03:21 +00:00
Chandler Carruth
3b7b209bf8 Fix a devilish miscompile exposed by block placement. The
updateTerminator code didn't correctly handle EH terminators in one very
specific case. AnalyzeBranch would find no terminator instruction, and
so the fallback in updateTerminator is to assume fallthrough. This is
correct, but the destination of the fallthrough was assumed to be the
first successor.

This is *almost always* true, but in certain cases the loop
transformations will cause the landing pad to be the first successor!
Instead of this brittle logic, actually look through the successors for
a non-landing-pad accessor, and to assert if more than one is found.

This will hopefully fix some (if not all) of the self host miscompiles
with block placement. Thanks to Benjamin Kramer for reporting, Nick
Lewycky for an initial stab at a reduction, and Duncan for endless
advice on EH (which I know nothing about) as well as reviewing the
actual fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145062 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 13:13:16 +00:00
Rafael Espindola
fdb00a9bdb Add triple to the test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145057 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 06:36:25 +00:00
Rafael Espindola
254a13282c If a register is both an early clobber and part of a tied use, handle the use
before the clobber so that we copy the value if needed.

Fixes pr11415.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145056 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-22 06:27:18 +00:00
Craig Topper
6fa583d787 Lowering for v32i8 to VPUNPCKLBW/VPUNPCKHBW when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145028 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 08:26:50 +00:00
Craig Topper
3b73312020 Test case for r145026
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145027 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 06:58:09 +00:00
Craig Topper
a124f94952 Make LowerSIGN_EXTEND_INREG split 256-bit vectors when AVX1 is enabled and use AVX2 shifts when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145022 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-21 01:12:36 +00:00
NAKAMURA Takumi
742e5cf612 test/CodeGen/X86/block-placement.ll: Relax expressions for Win32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145011 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 12:49:45 +00:00
Chandler Carruth
b0dadb9dd5 The logic for breaking the CFG in the presence of hot successors didn't
properly account for the *global* probability of the edge being taken.
This manifested as a very large number of unconditional branches to
blocks being merged against the CFG even though they weren't
particularly hot within the CFG.

The fix is to check whether the edge being merged is both locally hot
relative to other successors for the source block, and globally hot
compared to other (unmerged) predecessors of the destination block.

This introduces a new crasher on GCC single-source, but it's currently
behind a flag, and Ben has offered to work on the reduction. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145010 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 11:22:06 +00:00
Chandler Carruth
2901243fda Add some comments to the latest test case I added here to document what
is actually being tested. Also add some FileCheck goodness to much more
carefully ensure that the result is the desired result. Before this test
would only have failed through an assert failure if the underlying fix
were reverted.

Also, add some weight metadata and a comment explaining exactly what is
going on to a trick section of the test case. Originally, we were
getting very unlucky and trying to form a block chain that isn't
actually profitable. I'm working on a fix to avoid forming these
unprofitable chains, and that would also have masked any failure from
this test case. The easy solution is to add some metadata that makes it
*really* profitable to form the bad chain here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145006 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 09:30:40 +00:00
Craig Topper
0d86d462f8 Add code for lowering v32i8 shifts by a splat to AVX2 immediate shift instructions. Remove 256-bit splat handling from LowerShift as it was already handled by PerformShiftCombine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145005 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-20 00:12:05 +00:00
Craig Topper
745a86bac9 Use 256-bit vcmpeqd for creating an all ones vector when AVX2 is enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@145004 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 22:34:59 +00:00
Chandler Carruth
03300ecaee Move the handling of unanalyzable branches out of the loop-driven chain
formation phase and into the initial walk of the basic blocks. We
essentially pre-merge all blocks where unanalyzable fallthrough exists,
as we won't be able to update the terminators effectively after any
reorderings. This is quite a bit more principled as there may be CFGs
where the second half of the unanalyzable pair has some analyzable
predecessor that gets placed first. Then it may get placed next,
implicitly breaking the unanalyzable branch even though we never even
looked at the part that isn't analyzable. I've included a test case that
triggers this (thanks Benjamin yet again!), and I'm hoping to synthesize
some more general ones as I dig into related issues.

Also, to make this new scheme work we have to be able to handle branches
into the middle of a chain, so add this check. We always fallback on the
incoming ordering.

Finally, this starts to really underscore a known limitation of the
current implementation -- we don't consider broken predecessors when
merging successors. This can caused major missed opportunities, and is
something I'm planning on looking at next (modulo more bug reports).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144994 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 10:26:02 +00:00
Craig Topper
6bf57b0272 Test cases for SSSE3/AVX integer horizontal add/sub.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144990 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 09:03:33 +00:00
Craig Topper
1666cb6d63 Extend VPBLENDVB and VPSIGN lowering to work for AVX2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144987 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-19 07:07:26 +00:00
Nadav Rotem
cbbe33fde4 Add AVX2 vpbroadcast support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144967 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-18 02:49:55 +00:00
Devang Patel
ce35d8b5a1 DISubrange supports unsigned lower/upper array bounds, so let's not fake it in the end while emitting DWARF. If a FE needs to encode signed lower/upper array bounds then we need to extend DISubrange or ad DISignedSubrange.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144937 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-17 23:43:15 +00:00
Eli Friedman
4db4addcd4 Make sure to replace the chain properly when DAGCombining a LOAD+EXTRACT_VECTOR_ELT into a single LOAD. Fixes PR10747/PR11393.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144863 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 23:50:22 +00:00
Evan Cheng
2b89498979 Another missing X86ISD::MOVLPD pattern. rdar://10450317
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144839 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 22:24:44 +00:00
Evan Cheng
c3aa7c5c5a Disable expensive two-address optimizations at -O0. rdar://10453055
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144806 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 18:44:48 +00:00
Eli Friedman
ee94dc212e Fix testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144769 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 03:03:52 +00:00
Eli Friedman
d577df8e5a CONCAT_VECTORS can have more than two operands. PR11389.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144768 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-16 02:52:39 +00:00
Nadav Rotem
f8c10e5cb1 AVX: Add support for vbroadcast from BUILD_VECTOR and refactor some of the vbroadcast code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144720 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 22:50:37 +00:00
NAKAMURA Takumi
ec0af2f4e1 test/CodeGen/X86/dec-eflags-lower.ll: Relax expression for win32 x64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144714 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 22:30:37 +00:00
Pete Cooper
2d49689793 Added custom lowering for load->dec->store sequence in x86 when the EFLAGS registers is used
by later instructions.

Only done for DEC64m right now.

Fixes <rdar://problem/6172640>


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144705 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 21:57:53 +00:00
Rafael Espindola
6c5b2dcd83 We currently use a callback to handle an IL pass deleting a BB that still
has a reference to it. Unfortunately, that doesn't work for codegen passes
since we don't get notified of MBB's being deleted (the original BB stays).

Use that fact to our advantage and after printing a function, check if
any of the IL BBs corresponds to a symbol that was not printed. This fixes
pr11202.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144674 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 19:08:46 +00:00
Jakob Stoklund Olesen
f805a7c25c Revert r144611 and r144613.
These tests are actually correct, clang was miscompiling ExeDepsFix::processUses.

Evan fixed the miscompilation in r144628.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144630 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 07:13:03 +00:00
Chandler Carruth
3273c8937b Rather than trying to use the loop block sequence *or* the function
block sequence when recovering from unanalyzable control flow
constructs, *always* use the function sequence. I'm not sure why I ever
went down the path of trying to use the loop sequence, it is
fundamentally not the correct sequence to use. We're trying to preserve
the incoming layout in the cases of unreasonable control flow, and that
is only encoded at the function level. We already have a filter to
select *exactly* the sub-set of blocks within the function that we're
trying to form into a chain.

The resulting code layout is also significantly better because of this.
In several places we were ending up with completely unreasonable control
flow constructs due to the ordering chosen by the loop structure for its
internal storage. This change removes a completely wasteful vector of
basic blocks, saving memory allocation in the common case even though it
costs us CPU in the fairly rare case of unnatural loops. Finally, it
fixes the latest crasher reduced out of GCC's single source. Thanks
again to Benjamin Kramer for the reduction, my bugpoint skills failed at
it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144627 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 06:26:43 +00:00
Craig Topper
4c077a1f04 Properly qualify AVX2 specific parts of execution dependency table. Also enable converting between 256-bit PS/PD operations when AVX1 is enabled. Fixes PR11370.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144622 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 05:55:35 +00:00
Jakob Stoklund Olesen
ff70467aa2 Really fix test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144613 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 03:17:01 +00:00
Jakob Stoklund Olesen
3c84ec070a Allow for depencendy-breaking instructions before cvt*.
This should unbreak clang-x86_64-darwin10-RA, but I can't actually
reproduce the failure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@144611 91177308-0d34-0410-b5e6-96231b3b80d8
2011-11-15 02:29:48 +00:00