Commit Graph

328 Commits

Author SHA1 Message Date
Eric Christopher
fbd6687cf1 Update insertps handling based on feedback. Move to a v4f32 style
to support vector arguments and scalar arguments correctly. Update
lowering and fix comment to refer to pinsr* instead of insertps.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76921 91177308-0d34-0410-b5e6-96231b3b80d8
2009-07-24 00:33:09 +00:00
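
For illustration (not code from this commit; assumes an SSE4.1 target and smmintrin.h), insertps is what the _mm_insert_ps intrinsic lowers to; the immediate packs a source element, a destination slot, and a zero mask:

	#include <smmintrin.h>

	// insertps: copy one element of b into a chosen slot of a.
	// imm bits [7:6] = source element of b, [5:4] = destination slot in a,
	// [3:0] = zero mask. 0x20 puts b[0] into a[2] with no zeroing.
	__m128 put_b0_into_a2(__m128 a, __m128 b) {
	    return _mm_insert_ps(a, b, 0x20);
	}
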
Eric Christopher
1e5cdea9d7 Support insertps via the intrinsic and add a couple of simple
testcases to make sure it's being generated.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@76843 91177308-0d34-0410-b5e6-96231b3b80d8
2009-07-23 02:22:41 +00:00
Eli Friedman
7e2242be71 Fix for PR2484: add an SSE1 pattern for a shuffle we normally prefer to
handle with an SSE2 instruction.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@73760 91177308-0d34-0410-b5e6-96231b3b80d8
2009-06-19 07:00:55 +00:00
Eli Friedman
9d47b8d8ea Fix an obvious typo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72987 91177308-0d34-0410-b5e6-96231b3b80d8
2009-06-06 05:55:37 +00:00
Bill Wendling
2265ba0717 The MONITOR and MWAIT instructions have insufficient information for
decoding. Essentially, they both map to the same column in the "opcode
extensions for one- and two-byte opcodes" table in the x86 manual. Using the
RawFrm form complicates decoding them.

Instead, use opcode 0x01, prefix 0x01, and form MRM1r. Then have the code
emitter special case these, a la [SML]FENCE.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72556 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-28 23:40:46 +00:00
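
For context (an illustrative aside, not generated by the commit): with the 0F 01 /1 encoding, MONITOR and MWAIT differ only in the ModRM byte, which is why a plain RawFrm table entry cannot distinguish them for the disassembler. The raw byte sequences are:

	// Reference encodings per the Intel manual.
	static const unsigned char MONITOR_BYTES[] = {0x0F, 0x01, 0xC8}; // ModRM reg=1, rm=0
	static const unsigned char MWAIT_BYTES[]   = {0x0F, 0x01, 0xC9}; // ModRM reg=1, rm=1
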
Evan Cheng
8a0b2daac2 Fix MOVMSKPDrr encoding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72535 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-28 18:55:28 +00:00
Evan Cheng
ed7f56b555 Fix PSIGND encoding bug. Patch by Sean Callanan.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72534 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-28 18:48:53 +00:00
Bill Wendling
3b1259bb9f "The instructions MMX_PSADBWrm and MMX_PSADBWrr have opcode 0b11100000 (e0), but
the Intel manual (screenshot) says it should be 0b11110110 (f6).  The existing
encoding causes a disassembly conflict with MMX_PAVGBrm, which really should be
0f e0."

Patch by Sean Callanan!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72508 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-28 02:04:00 +00:00
Evan Cheng
bc9be219d6 Fix sfence jit encoding. Patch by Sean Callanan.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@72488 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-27 18:38:01 +00:00
Evan Cheng
0af934eb64 80 col violations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@71582 91177308-0d34-0410-b5e6-96231b3b80d8
2009-05-12 20:17:52 +00:00
Nate Begeman
ec8eee2d3a Fix infinite recursion in the C++ code which handles movddup by making it unnecessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@70425 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-29 22:47:44 +00:00
Nate Begeman
9008ca6b6b 2nd attempt, fixing SSE4.1 issues and implementing feedback from Duncan.
PR2957

ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@70225 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-27 18:41:29 +00:00
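
As a rough sketch of the representation (standalone C++, not LLVM's actual node or API): the mask is simply an array of per-element source indices, with -1 marking an undefined element.

	// Shuffle two 4-element vectors according to a mask of source indices.
	// mask[i] in [0,3] picks a[mask[i]], in [4,7] picks b[mask[i]-4], and -1
	// means the result element is undefined (left as 0.0f here).
	void shuffle4(const float a[4], const float b[4], const int mask[4],
	              float out[4]) {
	    for (int i = 0; i < 4; ++i) {
	        if (mask[i] < 0)      out[i] = 0.0f;          // UNDEF
	        else if (mask[i] < 4) out[i] = a[mask[i]];
	        else                  out[i] = b[mask[i] - 4];
	    }
	}
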
Rafael Espindola
15684b2955 Revert r69952. Causes testsuite failures on Linux x86-64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@69967 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-24 12:40:33 +00:00
Nate Begeman
b706d29f9c PR2957
ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle
mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes
as the shuffle mask.  A value of -1 represents UNDEF.

In addition to eliminating the creation of illegal BUILD_VECTORS just to 
represent shuffle masks, we are better about canonicalizing the shuffle mask,
resulting in substantially better code for some classes of shuffles.

A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@69952 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-24 03:42:54 +00:00
Rafael Espindola
094fad37b9 Re-apply 68552.
Tested by bootstrapping llvm-gcc and using that to build llvm.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@68645 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-08 21:14:34 +00:00
Bill Wendling
044b5344c4 Temporarily revert r68552. This was causing a failure in the self-hosting LLVM
builds.

--- Reverse-merging (from foreign repository) r68552 into '.':
U    test/CodeGen/X86/tls8.ll
U    test/CodeGen/X86/tls10.ll
U    test/CodeGen/X86/tls2.ll
U    test/CodeGen/X86/tls6.ll
U    lib/Target/X86/X86Instr64bit.td
U    lib/Target/X86/X86InstrSSE.td
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86RegisterInfo.cpp
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86CodeEmitter.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86InstrInfo.h
U    lib/Target/X86/X86ISelDAGToDAG.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp
U    lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h
U    lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h
U    lib/Target/X86/X86ISelLowering.h
U    lib/Target/X86/X86InstrInfo.cpp
U    lib/Target/X86/X86InstrBuilder.h
U    lib/Target/X86/X86RegisterInfo.td



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@68560 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-07 22:35:25 +00:00
Rafael Espindola
2a6411bbbd Reduce code duplication in the TLS implementation.
This introduces a small regression in generated code
quality in the case where we are just computing
addresses, not loading values.

Will work on it and on X86-64 support.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@68552 91177308-0d34-0410-b5e6-96231b3b80d8
2009-04-07 21:37:46 +00:00
Evan Cheng
236aa8a503 ADDS{D|S}rr_Int and MULS{D|S}rr_Int are not commutable. The users of these intrinsics expect the high bits will not be modified.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65499 91177308-0d34-0410-b5e6-96231b3b80d8
2009-02-26 03:12:02 +00:00
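
A small illustration of why swapping the operands is not safe (written against the public _mm_add_ss intrinsic, not the internal *_Int instructions): the scalar add only computes element 0, and the remaining elements are passed through from the first operand.

	#include <xmmintrin.h>

	// addss: r[0] = a[0] + b[0]; r[1..3] = a[1..3].
	__m128 add_scalar(__m128 a, __m128 b) { return _mm_add_ss(a, b); }
	// With the operands swapped, r[1..3] = b[1..3] instead, so the result
	// differs whenever the upper elements of a and b differ.
	__m128 add_scalar_swapped(__m128 a, __m128 b) { return _mm_add_ss(b, a); }
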
Nate Begeman
b9a47b824f Generate better code for v8i16 shuffles on SSE2.
Generate better code for v16i8 shuffles on SSE2 (avoids the stack).
Generate pshufb for v8i16 and v16i8 shuffles on SSSE3 where it takes fewer uops.
Document the shuffle matching logic and add some FIXMEs for further
  cleanups later.
New tests that test the above.

Examples:

New:
_shuf2:
	pextrw	$7, %xmm0, %eax
	punpcklqdq	%xmm1, %xmm0
	pshuflw	$128, %xmm0, %xmm0
	pinsrw	$2, %eax, %xmm0

Old:
_shuf2:
	pextrw	$2, %xmm0, %eax
	pextrw	$7, %xmm0, %ecx
	pinsrw	$2, %ecx, %xmm0
	pinsrw	$3, %eax, %xmm0
	movd	%xmm1, %eax
	pinsrw	$4, %eax, %xmm0
	ret

=========

New:
_shuf4:
	punpcklqdq	%xmm1, %xmm0
	pshufb	LCPI1_0, %xmm0

Old:
_shuf4:
	pextrw	$3, %xmm0, %eax
	movsd	%xmm1, %xmm0
	pextrw	$3, %xmm1, %ecx
	pinsrw	$4, %ecx, %xmm0
	pinsrw	$5, %eax, %xmm0

========

New:
_shuf1:
	pushl	%ebx
	pushl	%edi
	pushl	%esi
	pextrw	$1, %xmm0, %eax
	rolw	$8, %ax
	movd	%xmm0, %ecx
	rolw	$8, %cx
	pextrw	$5, %xmm0, %edx
	pextrw	$4, %xmm0, %esi
	pextrw	$3, %xmm0, %edi
	pextrw	$2, %xmm0, %ebx
	movaps	%xmm0, %xmm1
	pinsrw	$0, %ecx, %xmm1
	pinsrw	$1, %eax, %xmm1
	rolw	$8, %bx
	pinsrw	$2, %ebx, %xmm1
	rolw	$8, %di
	pinsrw	$3, %edi, %xmm1
	rolw	$8, %si
	pinsrw	$4, %esi, %xmm1
	rolw	$8, %dx
	pinsrw	$5, %edx, %xmm1
	pextrw	$7, %xmm0, %eax
	rolw	$8, %ax
	movaps	%xmm1, %xmm0
	pinsrw	$7, %eax, %xmm0
	popl	%esi
	popl	%edi
	popl	%ebx
	ret

Old:
_shuf1:
	subl	$252, %esp
	movaps	%xmm0, (%esp)
	movaps	%xmm0, 16(%esp)
	movaps	%xmm0, 32(%esp)
	movaps	%xmm0, 48(%esp)
	movaps	%xmm0, 64(%esp)
	movaps	%xmm0, 80(%esp)
	movaps	%xmm0, 96(%esp)
	movaps	%xmm0, 224(%esp)
	movaps	%xmm0, 208(%esp)
	movaps	%xmm0, 192(%esp)
	movaps	%xmm0, 176(%esp)
	movaps	%xmm0, 160(%esp)
	movaps	%xmm0, 144(%esp)
	movaps	%xmm0, 128(%esp)
	movaps	%xmm0, 112(%esp)
	movzbl	14(%esp), %eax
	movd	%eax, %xmm1
	movzbl	22(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	42(%esp), %eax
	movd	%eax, %xmm1
	movzbl	50(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm1, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	77(%esp), %eax
	movd	%eax, %xmm1
	movzbl	84(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm1, %xmm2
	movzbl	104(%esp), %eax
	movd	%eax, %xmm1
	punpcklbw	%xmm1, %xmm0
	punpcklbw	%xmm2, %xmm0
	movaps	%xmm0, %xmm1
	punpcklbw	%xmm3, %xmm1
	movzbl	127(%esp), %eax
	movd	%eax, %xmm0
	movzbl	135(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	155(%esp), %eax
	movd	%eax, %xmm0
	movzbl	163(%esp), %eax
	movd	%eax, %xmm3
	punpcklbw	%xmm0, %xmm3
	punpcklbw	%xmm2, %xmm3
	movzbl	188(%esp), %eax
	movd	%eax, %xmm0
	movzbl	197(%esp), %eax
	movd	%eax, %xmm2
	punpcklbw	%xmm0, %xmm2
	movzbl	217(%esp), %eax
	movd	%eax, %xmm4
	movzbl	225(%esp), %eax
	movd	%eax, %xmm0
	punpcklbw	%xmm4, %xmm0
	punpcklbw	%xmm2, %xmm0
	punpcklbw	%xmm3, %xmm0
	punpcklbw	%xmm1, %xmm0
	addl	$252, %esp
	ret



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@65311 91177308-0d34-0410-b5e6-96231b3b80d8
2009-02-23 08:49:38 +00:00
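
For readers unfamiliar with pshufb: it is a byte-granularity in-register table lookup, which is why one instruction plus a constant mask can replace the long extract/insert sequences above. A minimal intrinsics-level example (illustrative; assumes SSSE3 and tmmintrin.h):

	#include <tmmintrin.h>

	// pshufb: result byte i = src[mask[i] & 0x0F], or 0 if the mask byte's
	// high bit is set.
	__m128i reverse_bytes(__m128i src) {
	    const __m128i mask = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
	                                      8, 9, 10, 11, 12, 13, 14, 15);
	    return _mm_shuffle_epi8(src, mask);
	}
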
Evan Cheng
1d76864df3 Handle llvm.x86.sse2.maskmov.dqu in 64-bit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@64240 91177308-0d34-0410-b5e6-96231b3b80d8
2009-02-10 22:06:28 +00:00
Evan Cheng
b3379fbc60 A few more isAsCheapAsAMove.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63852 91177308-0d34-0410-b5e6-96231b3b80d8
2009-02-05 08:42:55 +00:00
Evan Cheng
1632782fe9 The memory alignment requirement on some of the mov{h|l}p{d|s} patterns is 16 bytes. That is overly strict. These instructions read/write f64 memory locations without an alignment requirement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@63195 91177308-0d34-0410-b5e6-96231b3b80d8
2009-01-28 08:35:02 +00:00
Dan Gohman
b134709a58 Whitespace and other minor adjustments to make SSE instructions have
the same formatting as their corresponding SSE2 instructions, for
consistency.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61971 91177308-0d34-0410-b5e6-96231b3b80d8
2009-01-09 02:27:34 +00:00
Mon P Wang
af9b952627 Fixed x86 code generation of v2i64 multiplies. It was incorrect for SSE4.1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@61211 91177308-0d34-0410-b5e6-96231b3b80d8
2008-12-18 21:42:19 +00:00
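
For context, SSE2/SSE4.1 have no single instruction for a per-lane 64-bit multiply, so codegen must build it from 32-bit multiplies. A standalone sketch of the usual decomposition (illustrative, not the code this commit generates):

	#include <emmintrin.h>

	// Per-lane v2i64 multiply from 32-bit pieces:
	//   a*b mod 2^64 = lo(a)*lo(b) + ((hi(a)*lo(b) + lo(a)*hi(b)) << 32)
	__m128i mul_v2i64(__m128i a, __m128i b) {
	    __m128i a_hi  = _mm_srli_epi64(a, 32);
	    __m128i b_hi  = _mm_srli_epi64(b, 32);
	    __m128i lo    = _mm_mul_epu32(a, b);                    // lo(a)*lo(b)
	    __m128i cross = _mm_add_epi64(_mm_mul_epu32(a_hi, b),   // hi(a)*lo(b)
	                                  _mm_mul_epu32(a, b_hi));  // lo(a)*hi(b)
	    return _mm_add_epi64(lo, _mm_slli_epi64(cross, 32));
	}
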
Dan Gohman
15511cf166 Rename isSimpleLoad to canFoldAsLoad, to better reflect its meaning.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60487 91177308-0d34-0410-b5e6-96231b3b80d8
2008-12-03 18:15:48 +00:00
Dan Gohman
62c939d7d5 Mark x86's V_SET0 and V_SETALLONES with isSimpleLoad, and teach X86's
foldMemoryOperand how to "fold" them, by converting them into constant-pool
loads. When they aren't folded, they use xorps/cmpeqd, but for example when
register pressure is high, they may now be folded as memory operands, which
reduces register pressure.

Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will
remat it instead of copying zeros around (V_SETALLONES was already marked).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@60461 91177308-0d34-0410-b5e6-96231b3b80d8
2008-12-03 05:21:24 +00:00
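
For context, the register-only idioms mentioned above look like this at the intrinsics level (illustrative; a compiler typically emits xorps/pxor and pcmpeqd for them), which is why folding to a constant-pool load only pays off when registers are scarce:

	#include <emmintrin.h>

	// All zeros: usually a register xor'd with itself -- no load.
	__m128i zero_vec(void) { return _mm_setzero_si128(); }

	// All ones: usually pcmpeqd of a register with itself -- no load.
	__m128i ones_vec(void) {
	    __m128i z = _mm_setzero_si128();
	    return _mm_cmpeq_epi32(z, z);
	}
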
Evan Cheng
4b299d4ebd Fix lfence and mfence encoding. These look like MRM5r and MRM6r instructions except they do not have any operands. The RegModRM byte is encoded with register number 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57692 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-17 17:14:20 +00:00
Dan Gohman
a7250ddc28 Fix the predicate for memop64 to be a regular load, not just
an unindexed load.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57612 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-16 00:03:00 +00:00
Dan Gohman
3358629380 Now that predicates can be composed, simplify several of
the predicates by extending simple predicates to create
more complex predicates instead of duplicating the logic
for the simple predicates.

This doesn't reduce much redundancy in DAGISelEmitter.cpp's
generated source yet; that will require improvements to
DAGISelEmitter.cpp's instruction sorting, so that it more
effectively groups nodes with similar predicates together.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57565 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-15 06:50:19 +00:00
Dale Johannesen
e397acce9d Fix SSE4.1 roundss, roundsd. While the instructions have
the same pattern as roundpd/roundps, the Intel compiler 
builtins do not:  rounds* has an extra operand.  Fixes
gcc.target/i386/sse4_1-rounds[sd]-[1234].c



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57370 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-10 23:51:03 +00:00
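
The operand-count difference at the intrinsics level (illustrative; assumes SSE4.1 and smmintrin.h): the packed form takes one source, while the scalar form takes two because the upper result elements are passed through from the first operand.

	#include <smmintrin.h>

	__m128 round_packed(__m128 a) {
	    // roundps: one source, rounds all four elements.
	    return _mm_round_ps(a, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
	}
	__m128 round_scalar(__m128 a, __m128 b) {
	    // roundss: extra operand -- element 0 is b[0] rounded, elements 1..3
	    // come from a.
	    return _mm_round_ss(a, b, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
	}
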
Anders Carlsson
ae436cecca Certain patterns involving the "movss" instruction were marked as requiring SSE2, when in reality movss is an SSE1 instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@57246 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-07 16:14:11 +00:00
Bill Wendling
5e249b4a14 "The original bug was a complaint that _mm_srli_si128 mis-compiled when passed
a constant vector ("{0x123, 0x456}" syntax).  The fix is to simplify the
_mm_srli_si128 macro, and  move the "* 8" from the macro into the compiler
back-end.  I can't change the existing __builtins because so many people are
using them :-(."
Patch by Stuart Hastings!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56944 91177308-0d34-0410-b5e6-96231b3b80d8
2008-10-02 05:56:52 +00:00
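
At the user level the macro shifts by a byte count; an illustrative use (not the patched header itself):

	#include <emmintrin.h>

	// _mm_srli_si128 shifts the whole 128-bit value right by whole bytes
	// (psrldq); the count must be a compile-time constant.
	__m128i drop_low_four_bytes(__m128i v) {
	    return _mm_srli_si128(v, 4);   // shift right by 4 bytes (32 bits)
	}
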
Evan Cheng
b7a75a5a54 Implement "punpckldq %xmm0, %xmm0" as "pshufd $0x50, %xmm0, %xmm0" unless optimizing for code size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56711 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-26 23:41:32 +00:00
Evan Cheng
c739489c19 unpckhps requires sse1, punpckhdq requires sse2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56697 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-26 21:26:30 +00:00
Evan Cheng
0b457f0c3a With sse3, and when the source is a load or has multiple uses, favor movddup over shufp*, pshufd, etc. Without sse3, or when the source is from a register, make use of movlhps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56620 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-25 20:50:48 +00:00
Evan Cheng
89d4a2848d pmovsxbq etc. requires sse4.1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56600 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-25 00:49:51 +00:00
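
For reference, pmovsxbq is one of the SSE4.1 packed sign-extension forms; a minimal use via the intrinsic (illustrative; requires an SSE4.1 target):

	#include <smmintrin.h>

	// pmovsxbq: sign-extend the two lowest bytes of v into two 64-bit lanes.
	__m128i sext_two_bytes_to_i64(__m128i v) {
	    return _mm_cvtepi8_epi64(v);
	}
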
Evan Cheng
ca57f78332 Fix patterns for SSE4.1 move and sign extend instructions. Also add instructions which fold VZEXT_MOVL and VZEXT_LOAD.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56594 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-24 23:27:55 +00:00
Dan Gohman
f5aeb1a8e4 Rename ConstantSDNode::getValue to getZExtValue, for consistency
with ConstantInt. This led to fixing a bug in TargetLowering.cpp
using getValue instead of getAPIntValue.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@56159 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-12 16:56:44 +00:00
Eli Friedman
d0c0fae63b Fix for PR2687: Add patterns to match sint_to_fp and fp_to_sint for <2 x
i32>.  This is a little messy, but it works.

We should really get rid of the intrinsics, though, since they map
perfectly well to standard LLVM instructions.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55864 91177308-0d34-0410-b5e6-96231b3b80d8
2008-09-05 23:07:03 +00:00
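
What such a <2 x i32> conversion looks like at the intrinsics level (an illustrative sketch using the SSE2 cvtdq2pd/cvtpd2dq forms, not the new patterns themselves):

	#include <emmintrin.h>

	// cvtdq2pd: the two low 32-bit integers become two doubles.
	__m128d two_ints_to_doubles(__m128i v) { return _mm_cvtepi32_pd(v); }

	// cvtpd2dq: two doubles back to 32-bit integers (in the low half).
	__m128i two_doubles_to_ints(__m128d v) { return _mm_cvtpd_epi32(v); }
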
Evan Cheng
66e13153bd FsFLD0S{S|D} and V_SETALLONES are as cheap as moves.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55466 91177308-0d34-0410-b5e6-96231b3b80d8
2008-08-28 07:52:25 +00:00
Dan Gohman
67ca6be16a Tablegen generated code already tests the opcode value, so it's not
necessary to use dyn_cast in these predicates.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@55055 91177308-0d34-0410-b5e6-96231b3b80d8
2008-08-20 15:24:22 +00:00
Dan Gohman
d9ced09299 Add an EXTRACTPSmr pattern to match the pattern that
X86ISelLowering creates.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54544 91177308-0d34-0410-b5e6-96231b3b80d8
2008-08-08 18:30:21 +00:00
Evan Cheng
e9d5035838 Fix PR2620: Fix X86cmppd selection code so it expects operands to be v2f64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@54376 91177308-0d34-0410-b5e6-96231b3b80d8
2008-08-05 22:19:15 +00:00
Nate Begeman
e99b255b5c Fix a typo in last commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53720 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-17 17:04:58 +00:00
Nate Begeman
30a0de94e7 SSE codegen for vsetcc nodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53719 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-17 16:51:19 +00:00
Evan Cheng
331e2bd942 Fix for PR2472. Use movss to set the lower 32 bits of a zero XMM vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53386 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-10 01:08:23 +00:00
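
The idiom being selected, expressed with intrinsics (illustrative): movss with a zeroed destination copies the low 32-bit element and leaves the upper elements zero.

	#include <xmmintrin.h>

	// movss: r[0] = x[0]; r[1..3] come from the zero vector.
	__m128 low_element_zero_extended(__m128 x) {
	    return _mm_move_ss(_mm_setzero_ps(), x);
	}
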
Evan Cheng
4e444436f2 Horizontal-add instructions are not commutative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52363 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-16 21:16:24 +00:00
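
Easy to see with the SSE3 intrinsic (illustrative): the operand order decides which pairwise sums land in the low half of the result.

	#include <pmmintrin.h>

	// haddps: r = { a0+a1, a2+a3, b0+b1, b2+b3 }, so hadd(a,b) != hadd(b,a)
	// whenever a and b differ.
	__m128 hadd_ab(__m128 a, __m128 b) { return _mm_hadd_ps(a, b); }
	__m128 hadd_ba(__m128 a, __m128 b) { return _mm_hadd_ps(b, a); }
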
Evan Cheng
35b9a7790e mpsadbw is commutable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52352 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-16 20:25:59 +00:00
Duncan Sands
d4b9c17fb7 Disable some DAG combiner optimizations that may be
wrong for volatile loads and stores.  In fact this
is almost all of them!  There are three types of
problems:

(1) it is wrong to change the width of
a volatile memory access.  These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used.  Consider
loading an i32 but only using the lower 8 bits.  It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes.  It
is also unwise to make a load/store wider.  For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects.

(2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware.

(3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several.  For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores).  In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic.

My policy here is
to say that the number of processor operations for
an illegal operation is undefined.  So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok.  It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.

However this exposed an interesting thing - on x86-32
a store of i64 is considered legal!  That is because
operations are marked legal by default, regardless of
whether the type is legal or not.  In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation.  However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal.  So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before.  This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52254 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-13 19:07:40 +00:00
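
A concrete C-level instance of problem (1) above (a sketch; the device address is hypothetical): if the register below is read-sensitive, narrowing the volatile 32-bit load to an 8-bit load changes what the device observes.

	#include <stdint.h>

	// Hypothetical memory-mapped device register.
	#define DEVICE_STATUS ((volatile uint32_t *)0x40001000)

	uint8_t low_byte_of_status(void) {
	    // Must remain a single 32-bit read: the device may update or clear
	    // state on every full-register access.
	    uint32_t status = *DEVICE_STATUS;
	    return (uint8_t)(status & 0xFF);
	}
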
Evan Cheng
f26ffe987c Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51667 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-29 08:22:04 +00:00