an RFP register class.
Teach ScheduleDAG how to handle CopyToReg with different src/dst
reg classes.
This allows us to compile trivial inline asms that expect stuff
on the top of x87-fp stack.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48107 91177308-0d34-0410-b5e6-96231b3b80d8
in different register classes, e.g. copy of ST(0) to RFP*. This gets
some really trivial inline asm working that plops things on the top of
stack (PR879)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48105 91177308-0d34-0410-b5e6-96231b3b80d8
Secondly, we have to check whether the branch is actually pointing to the block
with the unwind in it. We could have gotten here because of the unwind_to alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48099 91177308-0d34-0410-b5e6-96231b3b80d8
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle. This allows us to compile
vec_set-B.ll into:
_test3:
movl $1234567, %eax
andl 4(%esp), %eax
movd %eax, %xmm0
ret
instead of:
_test3:
subl $28, %esp
movl $1234567, %eax
andl 32(%esp), %eax
movl %eax, (%esp)
movl $0, 4(%esp)
movq (%esp), %xmm0
addl $28, %esp
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48090 91177308-0d34-0410-b5e6-96231b3b80d8
_test3:
movd %rdi, %xmm1
#IMPLICIT_DEF %xmm0
punpcklqdq %xmm1, %xmm0
ret
instead of:
_test3:
#IMPLICIT_DEF %rax
movd %rax, %xmm0
movd %rdi, %xmm1
punpcklqdq %xmm1, %xmm0
ret
This is still not ideal. There is no reason to two xmm regs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48058 91177308-0d34-0410-b5e6-96231b3b80d8
- select_bits.ll now fully functional now that PR1993 is closed. It was
previously broken by refactoring in SPUInstrInfo.td and using multiclasses.
- Same for eqv.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47972 91177308-0d34-0410-b5e6-96231b3b80d8
except ppc long double. This allows us to shrink constant pool
entries for x86 long double constants, which in turn allows us to
use flds/fldl instead of fldt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47938 91177308-0d34-0410-b5e6-96231b3b80d8
For x86, if sse2 is available, it's not a good idea since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink fp constant since fldt is very expensive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47931 91177308-0d34-0410-b5e6-96231b3b80d8
PPC-64 doesn't work.) This also lowers the spilling of the CR registers so that
it uses a register other than the default R0 register (the scavenger scrounges
for one). A significant part of this patch fixes how kill information is
handled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47863 91177308-0d34-0410-b5e6-96231b3b80d8
a union containing a vector and an array whose elements were smaller than
the vector elements. this means we need to compile the load of the
array elements into an extract element plus a truncate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47752 91177308-0d34-0410-b5e6-96231b3b80d8
stack slot and store if the SINT_TO_FP is actually legal. This allows
us to compile:
double a(double b) {return (unsigned)b;}
to:
_a:
cvttsd2siq %xmm0, %rax
movl %eax, %eax
cvtsi2sdq %rax, %xmm0
ret
instead of:
_a:
subq $8, %rsp
cvttsd2siq %xmm0, %rax
movl %eax, %eax
cvtsi2sdq %rax, %xmm0
addq $8, %rsp
ret
crazy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47660 91177308-0d34-0410-b5e6-96231b3b80d8
_test:
movl %edi, %eax
ret
instead of:
_test:
movl $4294967295, %ecx
movq %rdi, %rax
andq %rcx, %rax
ret
It would be great to write this as a Pat pattern that used subregs
instead of a 'pseudo' instruction, but I don't know how to do that
in td files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47658 91177308-0d34-0410-b5e6-96231b3b80d8
(on solaris10, which are:
CodeGen/PowerPC/frounds.ll
Transforms/InstCombine/2008-02-23-MulSub.ll)
I needed a tool to figure out which one is the guilty.
To this end I have added a verbosity
option to the test/Makefile.
It can be invoked thus:
gmake check TESTSUITE=CodeGen/PowerPC VERBOSE="-v -v"
(The number of "-v"s specifies the verbosity level.
Instead of "-v" other aliases can be specified,
please consult the dejagnu docs for info.)
At level >= 2 following line is logged for each
test, before running it:
ABOUT TO RUN: <test>.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47602 91177308-0d34-0410-b5e6-96231b3b80d8
vr1 = extract_subreg vr2, 3
...
vr3 = extract_subreg vr1, 2
The end result is vr3 is equal to vr2 with subidx 2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47592 91177308-0d34-0410-b5e6-96231b3b80d8
after legalize. Just because a constant is legal (e.g. 0.0 in SSE)
doesn't mean that its negated value is legal (-0.0). We could make
this stronger by checking to see if the negated constant is actually
legal post negation, but it doesn't seem like a big deal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47591 91177308-0d34-0410-b5e6-96231b3b80d8
not safe. This is fixed by more aggressively checking that the return slot is
not used elsewhere in the function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47544 91177308-0d34-0410-b5e6-96231b3b80d8
for CellSPU modifications:
- SPUInstrInfo.td refactoring: "multiclass" really is _your_ friend.
- Other improvements based on refactoring effort in SPUISelLowering.cpp,
esp. in SPUISelLowering::PerformDAGCombine(), where zero amount shifts and
rotates are now eliminiated, other scalar-to-vector-to-scalar silliness
is also eliminated.
- 64-bit operations are being implemented, _muldi3.c gcc runtime now
compiles and generates the right code. More work still needs to be done.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47532 91177308-0d34-0410-b5e6-96231b3b80d8
instead of with mmx registers. This horribleness is apparently
done by gcc to avoid having to insert emms in places that really
should have it. This is the second half of rdar://5741668.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47474 91177308-0d34-0410-b5e6-96231b3b80d8
GCC apparently does this, and code depends on not having to do
emms when this happens. This is x86-64 only so far, second half
should handle x86-32.
rdar://5741668
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47470 91177308-0d34-0410-b5e6-96231b3b80d8
any, we force sdisel to do all regalloc for an asm. This
leads to gross but correct codegen.
This fixes the rest of PR2078.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47454 91177308-0d34-0410-b5e6-96231b3b80d8
inline asms.
Fix PR2078 by marking aliases of registers used when a register is
marked used. This prevents EAX from being allocated when AX is listed
in the clobber set for the asm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47426 91177308-0d34-0410-b5e6-96231b3b80d8
Parse reversed smax and umax as smin and umin and express them with negative
or binary-not SCEVs (which are really just subtract under the hood).
Parse 'xor %x, -1' as (-1 - %x).
Remove dead code (ConstantInt::get always returns a ConstantInt).
Don't use getIntegerSCEV(-1, Ty). The first value is an int, then it gets
passed into a uint64_t. Instead, create the -1 directly from
ConstantInt::getAllOnesValue().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47360 91177308-0d34-0410-b5e6-96231b3b80d8
- X86 now normalize SCALAR_TO_VECTOR to (BIT_CONVERT (v4i32 SCALAR_TO_VECTOR)). Get rid of X86ISD::S2VEC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47290 91177308-0d34-0410-b5e6-96231b3b80d8
has plain one-result scalar integer multiplication instructions.
This avoids expanding such instructions into MUL_LOHI sequences that
must be special-cased at isel time, and avoids the problem with that
code that provented memory operands from being folded.
This fixes PR1874, addressesing the most common case. The uncommon
cases of optimizing multiply-high operations will require work
in DAGCombiner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47277 91177308-0d34-0410-b5e6-96231b3b80d8
another sret function, it should pass its own sret parameter to the tail callee, allowing it to fill in the correct
return value. llvm-gcc does not emit this by default. Instead, it allocates space in the caller for the sret of
the tail call and then uses memcpy to copy the result into the caller's sret parameter. This optimization detects
and optimizes that case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47265 91177308-0d34-0410-b5e6-96231b3b80d8
CTTZ and CTPOP. The expansion code differs from
that in LegalizeDAG in that it chooses to take the
CTLZ/CTTZ count from the Hi/Lo part depending on
whether the Hi/Lo value is zero, not on whether
CTLZ/CTTZ of Hi/Lo returned 32 (or whatever the
width of the type is) for it. I made this change
because the optimizers may well know that Hi/Lo
is zero and exploit it. The promotion code for
CTTZ also differs from that in LegalizeDAG: it
uses an "or" to get the right result when the
original value is zero, rather than using a compare
and select. This also means the value doesn't
need to be zero extended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47075 91177308-0d34-0410-b5e6-96231b3b80d8
node as soon as we create it in SDISel. Previously we would lower it in
legalize. The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTs after legalize, so that only dag combine 2
can hack on them. This causes us to miss some optimizations because
datatype expansion also happens here.
Exposing the loads early allows us to do optimizations on them. For example
we now compile arg-cast.ll to:
_foo:
movl $2147483647, %eax
andl 8(%esp), %eax
ret
where we previously produced:
_foo:
subl $12, %esp
movsd 16(%esp), %xmm0
movsd %xmm0, (%esp)
movl $2147483647, %eax
andl 4(%esp), %eax
addl $12, %esp
ret
It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47054 91177308-0d34-0410-b5e6-96231b3b80d8
variable (with step 1) and m is its final value. Then, the correct trip
count is SMAX(m,n)-n. Previously, we used SMAX(0,m-n), but m-n may
overflow and can't in general be interpreted as signed.
Patch by Nick Lewycky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47007 91177308-0d34-0410-b5e6-96231b3b80d8
to the RHS. This simple change allows to compute loop iteration count
for loops with condition similar to the one in the testcase (which seems
to be quite common).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46959 91177308-0d34-0410-b5e6-96231b3b80d8
arbitrary iteration.
The patch:
1) changes SCEVSDivExpr into SCEVUDivExpr,
2) replaces PartialFact() function with BinomialCoefficient(); the
computations (essentially, the division) in BinomialCoefficient() are
performed with the apprioprate bitwidth necessary to avoid overflow;
unsigned division is used instead of the signed one.
Computations in BinomialCoefficient() require support from the code
generator for APInts. Currently, we use a hack rounding up the
neccessary bitwidth to the nearest power of 2. The hack is easy to turn
off in future.
One remaining issue: we assume the divisor of the binomial coefficient
formula can be computed accurately using 16 bits. It means we can handle
AddRecs of length up to 9. In future, we should use APInts to evaluate
the divisor.
Thanks to Nicholas for cooperation!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46955 91177308-0d34-0410-b5e6-96231b3b80d8
was incorrectly simplifying "x == (gep x, 1, i)" into false, even
though i could be negative. As it turns out, all the code to
handle this already existed, we just need to disable the incorrect
optimization case and let the general case handle it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46739 91177308-0d34-0410-b5e6-96231b3b80d8
any bugs in the future since to get the crash you also
need hacked in fake libcall support (which creates odd
but legal trees), but since adding it doesn't hurt...
Thanks to Chris for this ultimately reduced version.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46706 91177308-0d34-0410-b5e6-96231b3b80d8
In practice this can only happen on code with already undefined behavior,
but this is still a good thing to handle correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46539 91177308-0d34-0410-b5e6-96231b3b80d8
only two addressing mode nodes, SPUaform and SPUindirect (vice the
three previous ones, SPUaform, SPUdform and SPUxform). This improves
code somewhat because we now avoid using reg+reg addressing when
it can be avoided. It also simplifies the address selection logic,
which was the main point for doing this.
Also, for various global variables that would be loaded using SPU's
A-form addressing, prefer D-form offs[reg] addressing, keeping the
base in a register if the variable is used more than once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46483 91177308-0d34-0410-b5e6-96231b3b80d8
way or the other. Rewriting the code itself prevents subsequent analysis
passes from making contradictory conclusions about the code that could
cause an infeasible path to be made feasible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46427 91177308-0d34-0410-b5e6-96231b3b80d8
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46414 91177308-0d34-0410-b5e6-96231b3b80d8
void bork() {
int *address = 0;
*address = 0;
}
It's compiled into LLVM code that looks like this:
define void @bork() noreturn nounwind {
entry:
unreachable
}
This is bad on some platforms (like PPC) because it will generate the label for
the function but no body. The label could end up being associated with some
non-code related stuff, like a section. This places a "trap" instruction if the
SimplifyCFG pass removed all code from the function leaving only one
"unreachable" instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46387 91177308-0d34-0410-b5e6-96231b3b80d8
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46384 91177308-0d34-0410-b5e6-96231b3b80d8
can't be aliased to other known objects. This allows us to know that byval
pointer args don't alias globals, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46315 91177308-0d34-0410-b5e6-96231b3b80d8
This case returns the value in ST(0) and then has to convert it to an SSE
register. This causes significant codegen ugliness in some cases. For
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
_bar:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.
Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.
This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case. This gives
us this code for the example above:
_bar:
subl $12, %esp
call L_foo$stub
addl $12, %esp
ret
The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert). This is gross, but
less gross than the code it is replacing :)
This also allows us to generate better code in several other cases. For
example on fp-stack-ret-conv.ll, we now generate:
_test:
subl $12, %esp
call L_foo$stub
fstps 8(%esp)
movl 16(%esp), %eax
cvtss2sd 8(%esp), %xmm0
movsd %xmm0, (%eax)
addl $12, %esp
ret
where before we produced (incidentally, the old bad code is identical to what
gcc produces):
_test:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
cvtsd2ss (%esp), %xmm0
cvtss2sd %xmm0, %xmm0
movl 16(%esp), %eax
movsd %xmm0, (%eax)
addl $12, %esp
ret
Note that we generate slightly worse code on pr1505b.ll due to a scheduling
deficiency that is unrelated to this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46307 91177308-0d34-0410-b5e6-96231b3b80d8