Commit Graph

285 Commits

Author SHA1 Message Date
Nate Begeman
e99b255b5c Fix a typo in last commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53720 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-17 17:04:58 +00:00
Nate Begeman
30a0de94e7 SSE codegen for vsetcc nodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53719 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-17 16:51:19 +00:00
Evan Cheng
331e2bd942 Fix for PR2472. Use movss to set lower 32-bits of a zero XMM vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@53386 91177308-0d34-0410-b5e6-96231b3b80d8
2008-07-10 01:08:23 +00:00
Evan Cheng
4e444436f2 Horizontal-add instructions are not commutative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52363 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-16 21:16:24 +00:00
Evan Cheng
35b9a7790e mpsadbw is commutable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52352 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-16 20:25:59 +00:00
Duncan Sands
d4b9c17fb7 Disable some DAG combiner optimizations that may be
wrong for volatile loads and stores.  In fact this
is almost all of them!  There are three types of
problems: (1) it is wrong to change the width of
a volatile memory access.  These may be used to
do memory mapped i/o, in which case a load can have
an effect even if the result is not used.  Consider
loading an i32 but only using the lower 8 bits.  It
is wrong to change this into a load of an i8, because
you are no longer tickling the other three bytes.  It
is also unwise to make a load/store wider.  For
example, changing an i16 load into an i32 load is
wrong no matter how aligned things are, since the
fact of loading an additional 2 bytes can have
i/o side-effects.  (2) it is wrong to change the
number of volatile load/stores: they may be counted
by the hardware.  (3) it is wrong to change a volatile
load/store that requires one memory access into one
that requires several.  For example on x86-32, you
can store a double in one processor operation, but to
store an i64 requires two (two i32 stores).  In a
multi-threaded program you may want to bitcast an i64
to a double and store as a double because that will
occur atomically, and be indivisible to other threads.
So it would be wrong to convert the store-of-double
into a store of an i64, because this will become two
i32 stores - no longer atomic.  My policy here is
to say that the number of processor operations for
an illegal operation is undefined.  So it is alright
to change a store of an i64 (requires at least two
stores; but could be validly lowered to memcpy for
example) into a store of double (one processor op).
In short, if the new store is legal and has the same
size then I say that the transform is ok.  It would
also be possible to say that transforms are always
ok if before they were illegal, whether after they
are illegal or not, but that's more awkward to do
and I doubt it buys us anything much.
However this exposed an interesting thing - on x86-32
a store of i64 is considered legal!  That is because
operations are marked legal by default, regardless of
whether the type is legal or not.  In some ways this
is clever: before type legalization this means that
operations on illegal types are considered legal;
after type legalization there are no illegal types
so now operations are only legal if they really are.
But I consider this to be too cunning for mere mortals.
Better to do things explicitly by testing AfterLegalize.
So I have changed things so that operations with illegal
types are considered illegal - indeed they can never
map to a machine operation.  However this means that
the DAG combiner is more conservative because before
it was "accidentally" performing transforms where the
type was illegal because the operation was nonetheless
marked legal.  So in a few such places I added a check
on AfterLegalize, which I suppose was actually just
forgotten before.  This causes the DAG combiner to do
slightly more than it used to, which resulted in the X86
backend blowing up because it got a slightly surprising
node it wasn't expecting, so I tweaked it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52254 91177308-0d34-0410-b5e6-96231b3b80d8
2008-06-13 19:07:40 +00:00
Evan Cheng
f26ffe987c Implement vector shift up / down and insert zero with ps{rl}lq / ps{rl}ldq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51667 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-29 08:22:04 +00:00
Dan Gohman
c2ecdc5a26 Fix the encoding for two more "rm" instructions that were using MRMSrcReg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51630 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-28 01:50:19 +00:00
Mon P Wang
bfbbd4d221 Fixed X86 encoding error CVTPS2PD and CVTPD2PS when the source operand
is a memory location


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51626 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-28 00:42:27 +00:00
Evan Cheng
a31593901d Eliminate x86.sse2.punpckh.qdq and x86.sse2.punpckl.qdq.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51533 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-24 02:56:30 +00:00
Evan Cheng
e716bb1c59 Eliminate x86.sse2.movs.d, x86.sse2.shuf.pd, x86.sse2.unpckh.pd, and x86.sse2.unpckl.pd intrinsics. These will be lowered into shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51531 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-24 02:14:05 +00:00
Evan Cheng
999dbe6bbc Remove x86.sse2.loadh.pd and x86.sse2.loadl.pd. These will be lowered into load and shuffle instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51522 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-24 00:07:29 +00:00
Evan Cheng
cd0baf21a1 Use movlps / movhps to modify low / high half of 16-byet memory location.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51501 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-23 21:23:16 +00:00
Evan Cheng
50f778deed Fix a duplicated pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51490 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-23 18:00:18 +00:00
Dan Gohman
0b924dcef8 Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add
load-folding table entries for PMULDQ and PMULLD.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51489 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-23 17:49:40 +00:00
Evan Cheng
b1938263c7 Bug: rcpps can only folds a load if the address is 16-byte aligned. Fixed many 'ps' load folding patterns in X86InstrSSE.td which are missing the proper alignment checks.
Also fixed some 80 col. violations.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51462 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-23 00:37:07 +00:00
Evan Cheng
c36c0ab44b Add missing patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51435 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-22 18:56:56 +00:00
Evan Cheng
8e8de684c7 movsd and movq do not require 16-byte alignment. This fixes vec_set-5.ll on Linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51327 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-20 18:24:47 +00:00
Nate Begeman
32097bdbf6 Fix one more encoding bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51057 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-13 17:52:09 +00:00
Nate Begeman
c9bdb00683 Fix and encoding error in the psrad xmm, imm8 instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51020 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-13 01:47:52 +00:00
Nate Begeman
0d1704b955 Teach Legalize how to scalarize VSETCC
Teach X86 a few more vsetcc patterns.  Custom lowering for unsupported ones is next.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51009 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-12 23:09:43 +00:00
Nate Begeman
c2616e43fd Initial X86 codegen support for VSETCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51000 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-12 20:34:32 +00:00
Evan Cheng
b70ea0bd03 Some clean up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50929 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-10 00:59:18 +00:00
Evan Cheng
23573e5be6 Add a pattern to do move the low element of a v4f32 and zero extend the rest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50922 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-09 23:37:55 +00:00
Evan Cheng
d880b97257 Handle a few more cases of folding load i64 into xmm and zero top bits.
Note, some of the code will be moved into target independent part of DAG combiner in a subsequent patch.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50918 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-09 21:53:03 +00:00
Evan Cheng
fd17f42bab Use movq to move low half of XMM register and zero-extend the rest.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50874 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-08 22:35:02 +00:00
Evan Cheng
7e2ff77ef0 Handle vector move / load which zero the destination register top bits (i.e. movd, movq, movss (addr), movsd (addr)) with X86 specific dag combine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50838 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-08 00:57:18 +00:00
Evan Cheng
22b942aa4d Add separate intrinsics for MMX / SSE shifts with i32 integer operands. This allow us to simplify the horribly complicated matching code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50601 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-03 00:52:09 +00:00
Evan Cheng
b609339a5c 80 column violation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50575 91177308-0d34-0410-b5e6-96231b3b80d8
2008-05-02 07:53:32 +00:00
Chris Lattner
bd381a777b A better fix for my previous patch, MOVZQI2PQIrr just requires SSE2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49986 91177308-0d34-0410-b5e6-96231b3b80d8
2008-04-20 05:52:46 +00:00
Dan Gohman
171c11ec93 Add support for the form of the SSE41 extractps instruction that
puts its result in a 32-bit GPR.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49762 91177308-0d34-0410-b5e6-96231b3b80d8
2008-04-16 02:32:24 +00:00
Chris Lattner
db66750753 Fix the x86-64 side of PR2108 by adding a v2f64 version of
MOVZQI2PQIrr.  This would be better handled as a dag combine 
(with the goal of eliminating the bitconvert) but I don't know
how to do that safely.  Thoughts welcome.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49463 91177308-0d34-0410-b5e6-96231b3b80d8
2008-04-10 05:13:43 +00:00
Evan Cheng
0c0f83ff5d Favors pshufd over shufps when shuffling elements from one vector. pshufd is faster than shufps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49244 91177308-0d34-0410-b5e6-96231b3b80d8
2008-04-05 00:30:36 +00:00
Evan Cheng
7aae876db1 Fix some SSE4.1 instruction encoding bugs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48815 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-26 08:11:49 +00:00
Evan Cheng
62a3f1538c - SSE4.1 extractfps extracts a f32 into a gr32 register. Very useful! Not. Fix the instruction specification and teaches lowering code to use it only when the only use is a store instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48746 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-24 21:52:23 +00:00
Nate Begeman
bc4efb8ac7 Add a couple missing SSE4 instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48430 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-16 21:14:46 +00:00
Evan Cheng
da47e6e0d0 Replace all target specific implicit def instructions with a target independent one: TargetInstrInfo::IMPLICIT_DEF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48380 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-15 00:03:38 +00:00
Evan Cheng
029d9dafa0 Fix some 80 col violations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48361 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-14 07:46:48 +00:00
Evan Cheng
172b794cd5 Fix a number of encoding bugs. SSE 4.1 instructions MPSADBWrri, PINSRDrr, etc. have 8-bits immediate field (ImmT == Imm8).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48360 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-14 07:39:27 +00:00
Evan Cheng
c8e3b147ee Clean up my own mess.
X86 lowering normalize vector 0 to v4i32. However DAGCombine can fold (sub x, x) -> 0 after legalization. It can create a zero vector of a type that's not expected (e.g. v8i16). We don't want to disable the optimization since leaving a (sub x, x) is really bad. Add isel patterns for other types of vector 0 to ensure correctness. It's highly unlikely to happen other than in bugpoint reduced test cases.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48279 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-12 07:02:50 +00:00
Evan Cheng
27b7db549e Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0|1|2} and prefetchnta instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48042 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-08 00:58:38 +00:00
Evan Cheng
e9083d669a isTwoAddress = 1 -> Constraints.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47941 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-05 08:19:16 +00:00
Evan Cheng
e7b8a8b713 PSLLWri etc. are two-address instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47940 91177308-0d34-0410-b5e6-96231b3b80d8
2008-03-05 08:11:27 +00:00
Evan Cheng
efec751a1b - When DAG combiner is folding a bit convert into a BUILD_VECTOR, it should check if it's essentially a SCALAR_TO_VECTOR. Avoid turning (v8i16) <10, u, u, u> to <10, 0, u, u, u, u, u, u>. Instead, simply convert it to a SCALAR_TO_VECTOR of the proper type.
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47290 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-18 23:04:32 +00:00
Andrew Lenharth
22c5c1b2df llvm.memory.barrier, and impl for x86 and alpha
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47204 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-16 01:24:58 +00:00
Nate Begeman
cdd1eeca2c SSE4.1 64b integer insert/extract pattern support
Move formats into the formats file


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47035 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-12 22:51:28 +00:00
Nate Begeman
14d12caf1d Enable SSE4 codegen and pattern matching.
Add some notes to the README.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46949 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-11 04:19:36 +00:00
Nate Begeman
ab5d56c6b9 xmm0 variable blends
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46931 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-10 18:47:57 +00:00
Nate Begeman
fea2be50b9 memopv16i8 had wrong alignment requirement, would have broken pabsb
pabs{b,w,d} are not two address
fix extract-to-mem sse4 ops
add sse4 vector sign extend nodes


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46915 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-09 23:46:37 +00:00
Nate Begeman
1426d52cab Skeleton of insert and extract matching, more to come
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46902 91177308-0d34-0410-b5e6-96231b3b80d8
2008-02-09 01:38:08 +00:00