Commit Graph

1365 Commits

Author SHA1 Message Date
Bruno Cardoso Lopes
3722f007b6 Replace unpckl_undef and unpckh_undef matching with target specific opcodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112806 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-02 05:23:12 +00:00
Bruno Cardoso Lopes
dd69db858c Move condition out to prepare for more matching
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112805 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-02 04:20:26 +00:00
Bruno Cardoso Lopes
ad10fb2b56 Remove checking for isUNPCKL_v_undef_Mask, the specific node is already emitted for it
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112804 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-02 03:57:58 +00:00
Bruno Cardoso Lopes
d00bfe1f8d become more strict about when it's safe to use X86ISD::MOVLPS
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112799 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-02 02:35:51 +00:00
Bruno Cardoso Lopes
4783a3ee13 Revert r112689, avoid those kind of checks cause they mess up with mmx
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112760 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-01 22:59:03 +00:00
Bruno Cardoso Lopes
56098f5d26 Use movlps, movlpd, movss and movsd specific nodes instead of pattern matching with movlp pattern fragment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112694 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-01 05:08:25 +00:00
Bruno Cardoso Lopes
9cfad89a68 minor change, simplify some logic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112689 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-01 00:57:08 +00:00
Bruno Cardoso Lopes
e654b56eb1 Move some functions around so they can be used for some other to come function
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112687 91177308-0d34-0410-b5e6-96231b3b80d8
2010-09-01 00:51:36 +00:00
Bruno Cardoso Lopes
013bb3dee9 Use x86 specific MOVSLDUP node, add more patterns to match it and remove useless load nodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112661 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-31 22:35:05 +00:00
Bruno Cardoso Lopes
5023ef281c Use x86 specific MOVSHDUP node and add more patterns to match it
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112657 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-31 22:22:11 +00:00
Bruno Cardoso Lopes
7ff30bb1a5 Use MOVHLPS node instead of matching using movhlps and movhlps_undef pattern fragments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112644 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-31 21:38:49 +00:00
Bruno Cardoso Lopes
f2db5b48d0 Use MOVLHPS and MOVHLPS x86 nodes whenever possible. Also remove some useless nodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112642 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-31 21:15:21 +00:00
Bruno Cardoso Lopes
20a07f422d Use X86ISD::MOVSS and MOVSD to represent the movl mask pattern, also fix the handling of those nodes when seeking for scalars inside vector shuffles
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112570 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-31 02:26:40 +00:00
Chris Lattner
24faf611a3 fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112379 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-28 17:59:08 +00:00
Chris Lattner
3ddcc43040 fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112378 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-28 17:28:30 +00:00
Chris Lattner
6e80e44926 improve comments in the unpcklps generating logic, introduce
a new EltStride variable instead of reusing NumElems variable
for a non-obvious purpose.  No functionality change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112377 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-28 17:15:43 +00:00
Bruno Cardoso Lopes
27f1279411 Clean up the logic of vector shuffles -> vector shifts.
Also teach this logic how to handle target specific shuffles if
needed, this is necessary while searching recursively for zeroed
scalar elements in vector shuffle operands.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112348 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-28 02:46:39 +00:00
Anton Korobeynikov
c52bedba54 Properly handle passing of FP stuff to varargs function on Win64:
value should be copied to the corresponding shadow reg as well.
Patch by Cameron Esfahani!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112262 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-27 14:43:06 +00:00
Bruno Cardoso Lopes
af57738f00 zap the now unused MVT::getIntVectorWithNumElements
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112218 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-26 20:53:12 +00:00
Chris Lattner
8306968c14 implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112171 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-26 05:51:22 +00:00
Chris Lattner
97a2a56f43 fix sse1 only codegen in x86-64 mode, which is something we
apparently try to support.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112168 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-26 05:24:29 +00:00
Bruno Cardoso Lopes
3e60a232c1 Revert this for now, PUNPCKLDQ dont operate on v4f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112090 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-25 21:26:37 +00:00
Anton Korobeynikov
9f7f83b861 Fix nasty mingw32 bug, which e.g. prevented llvm-gcc bootstrap there.
Mark _alloca call as clobberring EFLAGS, otherwise some DCE might remove
other flags-clobberring stuff (e.g. cmp instructions) occuring after
_alloca call.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112034 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-25 07:50:11 +00:00
Bruno Cardoso Lopes
f76c55aa40 PUNPCKLDQ should also be used for v4f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112020 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-25 02:55:40 +00:00
Bruno Cardoso Lopes
7338bbd32a teach lowering to get target specific nodes for pshufd, emulating the same isel behavior for now, so we can pass all vector shuffle tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@112017 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-25 02:35:37 +00:00
Dan Gohman
92b651fb19 Fix X86's isLegalAddressingMode to recognize that static addresses
need not be RIP-relative in small mode.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111917 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-24 15:55:12 +00:00
Bruno Cardoso Lopes
8878e21fe6 Use pshufhw and pshuflw in more cases and fix getTargetShuffleNode number of arguments
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111890 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-24 01:16:15 +00:00
Bruno Cardoso Lopes
3efc0778c9 Start using target speficic nodes for shuffles: pshufhw and pshuflw
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111837 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-23 20:41:02 +00:00
Anton Korobeynikov
4654a07e25 Revert invalid r111792. Jump tables are not broken on x86-64 / coff,
it's COFF emitter which does not support differences of two symbols
(and needs to be fixed). GAS is pretty fine with code produced.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111801 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-23 07:38:51 +00:00
Michael J. Spencer
3464cec4d8 Workaround broken jump tables on x86-64 COFF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111792 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-23 04:45:37 +00:00
Bruno Cardoso Lopes
bf8154a439 Prepare LowerVECTOR_SHUFFLEv8i16 to use x86 target specific nodes directly
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111704 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-21 01:32:18 +00:00
Bruno Cardoso Lopes
3157ef1c13 This is the first step towards refactoring the x86 vector shuffle code. The
general idea here is to have a group of x86 target specific nodes which are
going to be selected during lowering and then directly matched in isel.

The commit includes the addition of those specific nodes and a *bunch* of
patterns, and incrementally we're going to switch between them and what we
have right now. Both the patterns and target specific nodes can change as
we move forward with this work.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111691 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-20 22:55:05 +00:00
Anton Korobeynikov
3a1e54a6b9 More fixes for win64:
- Do not clobber al during variadic calls, this is AMD64 ABI-only feature
  - Emit wincall64, where necessary
Patch by Cameron Esfahani!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111289 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-17 21:06:07 +00:00
Eric Christopher
c0b2a2018a Rework how the non-sse2 memory barrier is lowered so that the
encoding is correct for the built-in assembler.

Based on a patch from Chris.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111083 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-14 21:51:50 +00:00
Chris Lattner
132929aa9e improve indentation
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111073 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-14 17:26:09 +00:00
Bruno Cardoso Lopes
bb0a9489e0 Fix comment to reflect code, and remove an unused argument
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@111022 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-13 17:50:47 +00:00
Bruno Cardoso Lopes
8c05a850f4 Begin to support some vector operations for AVX 256-bit intructions. The long
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110897 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-12 02:06:36 +00:00
Dan Gohman
d881627d33 Use ISD::ADD instead of ISD::SUB with a negated constant. This
avoids trouble if the return type of TD->getPointerSize() is
changed to something which doesn't promote to a signed type,
and is simpler anyway.

Also, use getCopyFromReg instead of getRegister to read a
physical register's value.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110835 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-11 18:14:00 +00:00
Bruno Cardoso Lopes
045573ce21 Add AVX matching patterns to Packed Bit Test intrinsics.
Apply the same approach of SSE4.1 ptest intrinsics but
create a new x86 node "testp" since AVX introduces
vtest{ps}{pd} instructions which set ZF and CF depending
on sign bit AND and ANDN of packed floating-point sources.

This is slightly different from what the "ptest" does.
Tests comming with the other 256 intrinsics tests.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110744 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-10 23:25:42 +00:00
Bruno Cardoso Lopes
405f11b300 Support AVX 256-bit load and store intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110645 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-10 01:43:16 +00:00
Bruno Cardoso Lopes
ac09835a22 Support very basic (doesn't include ABI support in the front-end, varags, ...) 256-bit argument passing and return for AVX
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110394 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-05 23:35:51 +00:00
Eric Christopher
b6729dc0ef Make x86-64 membarriers work without sse and clean up some of the
uses.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@110274 91177308-0d34-0410-b5e6-96231b3b80d8
2010-08-04 23:03:04 +00:00
Bruno Cardoso Lopes
98f985607b Support all 128-bit AVX vector intrinsics. Most part of them I already
declared during the addition of the assembler support, the additional
changes are:
- Add missing intrinsics
- Move all SSE conversion instructions in X86InstInfo64.td to the SSE.td file.
- Duplicate some patterns to AVX mode.
- Step into PCMPEST/PCMPIST custom inserter and add AVX versions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109878 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-30 19:54:33 +00:00
Jakob Stoklund Olesen
b2eeed7464 Revert r109652, and remove the offending assert in loadRegFromStackSlot instead.
We do sometimes load from a too small stack slot when dealing with x86 arguments
(varargs and smaller-than-32-bit args). It looks like we know what we are doing
in those cases, so I am going to remove the assert instead of artifically
enlarging stack slot sizes.

The assert in storeRegToStackSlot stays in. We don't want to write beyond the
bounds of a stack slot.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109764 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-29 17:42:27 +00:00
Jakob Stoklund Olesen
4c010ec851 Create a fixed stack object for varargs that is as large as any register.
The size of this object isn't used for anything - technically it is of variable
size.

This avoids a false positive from the assert in
X86InstrInfo::loadRegFromStackSlot, and fixes PR7735.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109652 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-28 20:55:38 +00:00
Nate Begeman
51409214d7 Implement a vectorized algorithm for <16 x i8> << <16 x i8>
This is about 4x faster and smaller than the existing scalarization.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109566 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-28 00:21:48 +00:00
Nate Begeman
bdcb5afb77 ~40% faster vector shl <4 x i32> on SSE 4.1 Larger improvements for smaller types coming in future patches.
For:

define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}

We get:

_shl:                                   ## @shl
	pslld	$23, %xmm1
	paddd	LCPI0_0, %xmm1
	cvttps2dq	%xmm1, %xmm1
	pmulld	%xmm1, %xmm0
	ret

Instead of:

_shl:                                   ## @shl
	pshufd	$3, %xmm0, %xmm2
	movd	%xmm2, %eax
	pshufd	$3, %xmm1, %xmm2
	movd	%xmm2, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	pshufd	$1, %xmm0, %xmm3
	movd	%xmm3, %eax
	pshufd	$1, %xmm1, %xmm3
	movd	%xmm3, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm3
	punpckldq	%xmm2, %xmm3
	movd	%xmm0, %eax
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	movhlps	%xmm0, %xmm0
	movd	%xmm0, %eax
	movhlps	%xmm1, %xmm1
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm0
	punpckldq	%xmm0, %xmm2
	movdqa	%xmm2, %xmm0
	punpckldq	%xmm3, %xmm0
	ret


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109549 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-27 22:37:06 +00:00
Evan Cheng
dee81010eb On x86, f32 / f64 nodes share the same registers as 128-bit vector values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109450 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-26 21:50:05 +00:00
Evan Cheng
70017e44cd Add an ILP scheduler. This is a register pressure aware scheduler that's
appropriate for targets without detailed instruction iterineries.
The scheduler schedules for increased instruction level parallelism in
low register pressure situation; it schedules to reduce register pressure
when the register pressure becomes high.

On x86_64, this is a win for all tests in CFP2000. It also sped up 256.bzip2
by 16%.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109300 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-24 00:39:05 +00:00
Dale Johannesen
c76d23f2e2 The only supported calling convention for X86-64 uses
SSE, so we can't return floating point values if this
is disabled.  Detect this error for clang.

With SSE1 only, f64 is a problem; it can be done, but
neither llvm-gcc nor clang has ever generated correct
code for it.  Since nobody noticed this I think it's
OK to treat it as an error for now.

This also handles SSE-sized vectors of floating point.
8207686, 8204109.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@109201 91177308-0d34-0410-b5e6-96231b3b80d8
2010-07-23 00:30:35 +00:00