- X86 now normalizes SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47290 91177308-0d34-0410-b5e6-96231b3b80d8
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that prevented memory operands from being folded.
This fixes PR1874, addressing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.
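To illustrate the distinction (a sketch in C, not from the patch): a plain
multiply needs only the low half of the product, while multiply-high is
what actually requires a MUL_LOHI-style operation:

#include <stdint.h>

/* Needs only the low half of the product, so it maps to a one-result
   instruction (e.g. x86's two-operand imul), leaving memory operands
   foldable. */
uint32_t mul_lo(uint32_t a, uint32_t b) {
    return a * b;
}

/* Genuinely needs the wide product; optimizing this is the uncommon
   case left to DAGCombiner. */
uint32_t mul_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}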
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47277 91177308-0d34-0410-b5e6-96231b3b80d8
CTTZ and CTPOP. The expansion code differs from
that in LegalizeDAG in that it chooses to take the
CTLZ/CTTZ count from the Hi/Lo part depending on
whether the Hi/Lo value is zero, not on whether
CTLZ/CTTZ of Hi/Lo returned 32 (or whatever the
width of the type is) for it. I made this change
because the optimizers may well know that Hi/Lo
is zero and exploit it. The promotion code for
CTTZ also differs from that in LegalizeDAG: it
uses an "or" to get the right result when the
original value is zero, rather than using a compare
and select. This also means the value doesn't
need to be zero extended.
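A minimal C sketch of both points (ctlz32/cttz32 are hypothetical 32-bit
primitives, assumed to return 32 for a zero input):

#include <stdint.h>

uint32_t ctlz32(uint32_t x);
uint32_t cttz32(uint32_t x);

/* Expansion: key on whether the relevant half is zero, not on whether
   its count came back as the type width. */
uint32_t ctlz64(uint32_t hi, uint32_t lo) {
    return hi != 0 ? ctlz32(hi) : 32 + ctlz32(lo);
}
uint32_t cttz64(uint32_t hi, uint32_t lo) {
    return lo != 0 ? cttz32(lo) : 32 + cttz32(hi);
}

/* Promotion: "or" in a bit just above the original width, so a zero
   input yields the width (here 16) with no compare and select; and
   since garbage above that bit cannot affect the count, no zero
   extension is needed either. */
uint32_t cttz16(uint16_t v) {
    return cttz32((uint32_t)v | 0x10000u);
}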
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47075 91177308-0d34-0410-b5e6-96231b3b80d8
node as soon as we create it in SDISel. Previously we would lower it in
legalize. The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTS after legalize, so that only dag combine 2
can hack on them. This causes us to miss some optimizations because
datatype expansion also happens in legalize.
Exposing the loads early allows us to do optimizations on them. For example
we now compile arg-cast.ll to:
_foo:
movl $2147483647, %eax
andl 8(%esp), %eax
ret
where we previously produced:
_foo:
subl $12, %esp
movsd 16(%esp), %xmm0
movsd %xmm0, (%esp)
movl $2147483647, %eax
andl 4(%esp), %eax
addl $12, %esp
ret
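For reference, a C analog of what arg-cast.ll plausibly computes (the test
body is an assumption, reconstructed from the asm above):

#include <stdint.h>
#include <string.h>

/* Bitcast the double to i64, take the high 32 bits, and clear the
   sign bit; with the argument load exposed early, this folds to a
   single load plus an "and". */
uint32_t foo(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 32) & 0x7FFFFFFFu;
}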
It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47054 91177308-0d34-0410-b5e6-96231b3b80d8
any bugs in the future, since reproducing the crash also requires
hacked-in fake libcall support (which creates odd but legal trees),
but since adding it doesn't hurt...
Thanks to Chris for this ultimately reduced version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46706 91177308-0d34-0410-b5e6-96231b3b80d8
only two addressing mode nodes, SPUaform and SPUindirect (vice the
three previous ones, SPUaform, SPUdform and SPUxform). This improves
code somewhat because reg+reg addressing is now avoided whenever a
simpler form suffices. It also simplifies the address selection logic,
which was the main point for doing this.
Also, for various global variables that would be loaded using SPU's
A-form addressing, prefer D-form offs[reg] addressing, keeping the
base in a register if the variable is used more than once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46483 91177308-0d34-0410-b5e6-96231b3b80d8
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
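What the two tests plausibly compute, reconstructed from the asm (the actual
.ll bodies are an assumption): test1 flips the sign bit, test2 builds the
bits of copysign(1.0f, x):

#include <stdint.h>
#include <string.h>

static uint32_t fbits(float f) {
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* test1: negate by flipping the sign bit. */
uint32_t test1(float x) {
    return fbits(x) ^ 0x80000000u;
}

/* test2: 0x3F800000 is 1.0f; the or-then-and in the asm is equivalent
   to keeping 1.0f's bits and x's sign. */
uint32_t test2(float x) {
    return 0x3F800000u | (fbits(x) & 0x80000000u);
}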
Bitconverts can happen due to various calling conventions that require
fp values to be passed in integer regs in some cases, e.g. when returning
a complex.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46414 91177308-0d34-0410-b5e6-96231b3b80d8
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as the old code
did things that were unsafe: the top-level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46384 91177308-0d34-0410-b5e6-96231b3b80d8
This case returns the value in ST(0) and then has to convert it to an SSE
register. This causes significant codegen ugliness in some cases. For
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
_bar:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.
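(For context, a plausible C analog of fp-stack-direct-ret.ll, assumed from
the asm: bar returns foo()'s result unchanged, so no x87-to-SSE round trip
is needed at all.)

double foo(void);

double bar(void) {
    return foo(); /* arrives in ST(0), should be returned in ST(0) */
}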
Instead of hacking ever-more special cases into the call result lowering code,
we take a much simpler approach: on x86-32, an fp call result is modeled as
always returning into an f80 register, which is then truncated to f32 or f64
as needed. Similarly, returning an fp result is modeled as an extension to
f80 followed by the return.
This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case. This gives
us this code for the example above:
_bar:
subl $12, %esp
call L_foo$stub
addl $12, %esp
ret
The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert). This is gross, but
less gross than the code it is replacing :)
This also allows us to generate better code in several other cases. For
example on fp-stack-ret-conv.ll, we now generate:
_test:
subl $12, %esp
call L_foo$stub
fstps 8(%esp)
movl 16(%esp), %eax
cvtss2sd 8(%esp), %xmm0
movsd %xmm0, (%eax)
addl $12, %esp
ret
where before we produced (incidentally, the old bad code is identical to what
gcc produces):
_test:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
cvtsd2ss (%esp), %xmm0
cvtss2sd %xmm0, %xmm0
movl 16(%esp), %eax
movsd %xmm0, (%eax)
addl $12, %esp
ret
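(Again, a plausible C analog, assumed from the asm: a float return value
stored through a double pointer, i.e. a single f32-to-f64 extension.)

float foo(void);

void test(double *p) {
    *p = foo(); /* one extension; no round trip through a double store */
}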
Note that we generate slightly worse code on pr1505b.ll due to a scheduling
deficiency that is unrelated to this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46307 91177308-0d34-0410-b5e6-96231b3b80d8