vr1 = extract_subreg vr2, 3
...
vr3 = extract_subreg vr1, 2
The end result is vr3 is equal to vr2 with subidx 2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47592 91177308-0d34-0410-b5e6-96231b3b80d8
after legalize. Just because a constant is legal (e.g. 0.0 in SSE)
doesn't mean that its negated value is legal (-0.0). We could make
this stronger by checking to see if the negated constant is actually
legal post negation, but it doesn't seem like a big deal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47591 91177308-0d34-0410-b5e6-96231b3b80d8
for CellSPU modifications:
- SPUInstrInfo.td refactoring: "multiclass" really is _your_ friend.
- Other improvements based on refactoring effort in SPUISelLowering.cpp,
esp. in SPUISelLowering::PerformDAGCombine(), where zero amount shifts and
rotates are now eliminiated, other scalar-to-vector-to-scalar silliness
is also eliminated.
- 64-bit operations are being implemented, _muldi3.c gcc runtime now
compiles and generates the right code. More work still needs to be done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47532 91177308-0d34-0410-b5e6-96231b3b80d8
instead of with mmx registers. This horribleness is apparently
done by gcc to avoid having to insert emms in places that really
should have it. This is the second half of rdar://5741668.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47474 91177308-0d34-0410-b5e6-96231b3b80d8
GCC apparently does this, and code depends on not having to do
emms when this happens. This is x86-64 only so far, second half
should handle x86-32.
rdar://5741668
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47470 91177308-0d34-0410-b5e6-96231b3b80d8
any, we force sdisel to do all regalloc for an asm. This
leads to gross but correct codegen.
This fixes the rest of PR2078.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47454 91177308-0d34-0410-b5e6-96231b3b80d8
inline asms.
Fix PR2078 by marking aliases of registers used when a register is
marked used. This prevents EAX from being allocated when AX is listed
in the clobber set for the asm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47426 91177308-0d34-0410-b5e6-96231b3b80d8
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47290 91177308-0d34-0410-b5e6-96231b3b80d8
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that provented memory operands from being folded.
This fixes PR1874, addressesing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47277 91177308-0d34-0410-b5e6-96231b3b80d8
CTTZ and CTPOP. The expansion code differs from
that in LegalizeDAG in that it chooses to take the
CTLZ/CTTZ count from the Hi/Lo part depending on
whether the Hi/Lo value is zero, not on whether
CTLZ/CTTZ of Hi/Lo returned 32 (or whatever the
width of the type is) for it. I made this change
because the optimizers may well know that Hi/Lo
is zero and exploit it. The promotion code for
CTTZ also differs from that in LegalizeDAG: it
uses an "or" to get the right result when the
original value is zero, rather than using a compare
and select. This also means the value doesn't
need to be zero extended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47075 91177308-0d34-0410-b5e6-96231b3b80d8
node as soon as we create it in SDISel. Previously we would lower it in
legalize. The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTs after legalize, so that only dag combine 2
can hack on them. This causes us to miss some optimizations because
datatype expansion also happens here.
Exposing the loads early allows us to do optimizations on them. For example
we now compile arg-cast.ll to:
_foo:
movl $2147483647, %eax
andl 8(%esp), %eax
ret
where we previously produced:
_foo:
subl $12, %esp
movsd 16(%esp), %xmm0
movsd %xmm0, (%esp)
movl $2147483647, %eax
andl 4(%esp), %eax
addl $12, %esp
ret
It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47054 91177308-0d34-0410-b5e6-96231b3b80d8
any bugs in the future since to get the crash you also
need hacked in fake libcall support (which creates odd
but legal trees), but since adding it doesn't hurt...
Thanks to Chris for this ultimately reduced version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46706 91177308-0d34-0410-b5e6-96231b3b80d8
only two addressing mode nodes, SPUaform and SPUindirect (vice the
three previous ones, SPUaform, SPUdform and SPUxform). This improves
code somewhat because we now avoid using reg+reg addressing when
it can be avoided. It also simplifies the address selection logic,
which was the main point for doing this.
Also, for various global variables that would be loaded using SPU's
A-form addressing, prefer D-form offs[reg] addressing, keeping the
base in a register if the variable is used more than once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46483 91177308-0d34-0410-b5e6-96231b3b80d8
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46414 91177308-0d34-0410-b5e6-96231b3b80d8
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46384 91177308-0d34-0410-b5e6-96231b3b80d8
This case returns the value in ST(0) and then has to convert it to an SSE
register. This causes significant codegen ugliness in some cases. For
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
_bar:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.
Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.
This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case. This gives
us this code for the example above:
_bar:
subl $12, %esp
call L_foo$stub
addl $12, %esp
ret
The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert). This is gross, but
less gross than the code it is replacing :)
This also allows us to generate better code in several other cases. For
example on fp-stack-ret-conv.ll, we now generate:
_test:
subl $12, %esp
call L_foo$stub
fstps 8(%esp)
movl 16(%esp), %eax
cvtss2sd 8(%esp), %xmm0
movsd %xmm0, (%eax)
addl $12, %esp
ret
where before we produced (incidentally, the old bad code is identical to what
gcc produces):
_test:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
cvtsd2ss (%esp), %xmm0
cvtss2sd %xmm0, %xmm0
movl 16(%esp), %eax
movsd %xmm0, (%eax)
addl $12, %esp
ret
Note that we generate slightly worse code on pr1505b.ll due to a scheduling
deficiency that is unrelated to this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46307 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed CellSPU's A-form (local store) address mode, so that all globals,
externals, constant pool and jump table symbols are now wrapped within
a SPUISD::AFormAddr pseudo-instruction. This now identifies all local
store memory addresses, although it requires a bit of legerdemain during
instruction selection to properly select loads to and stores from local
store, properly generating "LQA" instructions.
Also added mul_ops.ll test harness for exercising integer multiplication.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46142 91177308-0d34-0410-b5e6-96231b3b80d8
1. Legalize now always promotes truncstore of i1 to i8.
2. Remove patterns and gunk related to truncstore i1 from targets.
3. Rename the StoreXAction stuff to TruncStoreAction in TLI.
4. Make the TLI TruncStoreAction table a 2d table to handle from/to conversions.
5. Mark a wide variety of invalid truncstores as such in various targets, e.g.
X86 currently doesn't support truncstore of any of its integer types.
6. Add legalize support for truncstores with invalid value input types.
7. Add a dag combine transform to turn store(truncate) into truncstore when
safe.
The later allows us to compile CodeGen/X86/storetrunc-fp.ll to:
_foo:
fldt 20(%esp)
fldt 4(%esp)
faddp %st(1)
movl 36(%esp), %eax
fstps (%eax)
ret
instead of:
_foo:
subl $4, %esp
fldt 24(%esp)
fldt 8(%esp)
faddp %st(1)
fstps (%esp)
movl 40(%esp), %eax
movss (%esp), %xmm0
movss %xmm0, (%eax)
addl $4, %esp
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46140 91177308-0d34-0410-b5e6-96231b3b80d8
and the spill is its kill. However, if the local allocator has determined the
register has not been modified (possible when its value was reloaded), it would
not issue a restore. In that case, mark the last use of the virtual register as
kill.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46111 91177308-0d34-0410-b5e6-96231b3b80d8
It's not safe to use the two value CombineTo variant to combine away a dead load.
e.g.
v1, chain2 = load chain1, loc
v2, chain3 = load chain2, loc
v3 = add v2, c
Now we replace use of v1 with undef, use of chain2 with chain1.
ReplaceAllUsesWith() will iterate through uses of the first load and update operands:
v1, chain2 = load chain1, loc
v2, chain3 = load chain1, loc
v3 = add v2, c
Now the second load is the same as the first load, SelectionDAG cse will ensure
the use of second load is replaced with the first load.
v1, chain2 = load chain1, loc
v3 = add v1, c
Then v1 is replaced with undef and bad things happen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46099 91177308-0d34-0410-b5e6-96231b3b80d8
it should work, but I have no machine to test
it on. Committed because it will at least
cause no harm, and maybe someone can test it
for me!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46098 91177308-0d34-0410-b5e6-96231b3b80d8
make the 'fp return in ST(0)' optimization smart enough to
look through token factor nodes. THis allows us to compile
testcases like CodeGen/X86/fp-stack-retcopy.ll into:
_carg:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
fldl (%esp)
addl $12, %esp
ret
instead of:
_carg:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
Still not optimal, but much better and this is a trivial patch. Fixing
the rest requires invasive surgery that is is not llvm 2.2 material.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46054 91177308-0d34-0410-b5e6-96231b3b80d8
- struct_2.ll: Completely unaligned load/store testing
- call_indirect.ll, struct_1.ll: Add test lines to exercise
X-form [$reg($reg)] addressing
At this point, loads and stores should be under control (he says
in an optimistic tone of voice.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45882 91177308-0d34-0410-b5e6-96231b3b80d8
- Cleaned up custom load/store logic, common code is now shared [see note
below], cleaned up address modes
- More test cases: various intrinsics, structure element access (load/store
test), updated target data strings, indirect function calls.
Note: This patch contains a refactoring of the LoadSDNode and StoreSDNode
structures: they now share a common base class, LSBaseSDNode, that
provides an interface to their common functionality. There is some hackery
to access the proper operand depending on the derived class; otherwise,
to do a proper job would require finding and rearranging the SDOperands
sent to StoreSDNode's constructor. The current refactor errs on the
side of being conservatively and backwardly compatible while providing
functionality that reduces redundant code for targets where loads and
stores are custom-lowered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45851 91177308-0d34-0410-b5e6-96231b3b80d8
Likewise fix up a bunch of other libcalls. While
there I remove NEG_F32 and NEG_F64 since they are
not used anywhere. This fixes 9 Ada ACATS failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45833 91177308-0d34-0410-b5e6-96231b3b80d8
the code generated is not wonderful. This turns a miscompilation into
a code quality bug (noted in the ppc readme). This fixes PR642, which
is over 2 years old (!). Nate, please review this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45742 91177308-0d34-0410-b5e6-96231b3b80d8
providing a misleading facility. It's used once in the MIPS backend
and hardcoded as "\t.globl\t" everywhere else.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45676 91177308-0d34-0410-b5e6-96231b3b80d8
values, which means doing extra legalization work.
It would be easier to get this kind of thing right if
there was some documentation...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45472 91177308-0d34-0410-b5e6-96231b3b80d8
eliminating the llvm.x86.sse2.loadl.pd intrinsic?), one shuffle optzn
may be done (if shufps is better than pinsw, Evan, please review), and
we already know about LICM of simple instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45407 91177308-0d34-0410-b5e6-96231b3b80d8
if we are just going to store it back anyway. This improves things
like:
double foo();
void bar(double *P) { *P = foo(); }
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45399 91177308-0d34-0410-b5e6-96231b3b80d8
define void @f() {
...
call i32 @g()
...
}
define void @g() {
...
}
The hazards are:
- @f and @g have GC, but they differ GC. Inlining is invalid. This
may never occur.
- @f has no GC, but @g does. g's GC must be propagated to @f.
The other scenarios are safe:
- @f and @g have the same GC.
- @f and @g have no GC.
- @g has no GC.
This patch adds inliner checks for the former two scenarios.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45351 91177308-0d34-0410-b5e6-96231b3b80d8
function with GC.
This will catch the error when the inliner inlines a function with
GC into a caller with no GC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45350 91177308-0d34-0410-b5e6-96231b3b80d8
how to lower them (with no attempt made to be
efficient, since they should only occur for
unoptimized code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45108 91177308-0d34-0410-b5e6-96231b3b80d8
SelectionDAG::getConstant, in the same way as vector floating-point
constants. This allows the legalize expansion code for @llvm.ctpop and
friends to be usable with vector types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44954 91177308-0d34-0410-b5e6-96231b3b80d8
possible before resorting to pextrw and pinsrw.
- Better codegen for v4i32 shuffles masquerading as v8i16 or v16i8 shuffles.
- Improves (i16 extract_vector_element 0) codegen by recognizing
(i32 extract_vector_element 0) does not require a pextrw.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44836 91177308-0d34-0410-b5e6-96231b3b80d8
methods are new to Function:
bool hasCollector() const;
const std::string &getCollector() const;
void setCollector(const std::string &);
void clearCollector();
The assembly representation is as such:
define void @f() gc "shadow-stack" { ...
The implementation uses an on-the-side table to map Functions to
collector names, such that there is no overhead. A StringPool is
further used to unique collector names, which are extremely
likely to be unique per process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44769 91177308-0d34-0410-b5e6-96231b3b80d8