another sret function, it should pass its own sret parameter to the tail callee, allowing it to fill in the correct
return value. llvm-gcc does not emit this by default. Instead, it allocates space in the caller for the sret of
the tail call and then uses memcpy to copy the result into the caller's sret parameter. This optimization detects
and optimizes that case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47265 91177308-0d34-0410-b5e6-96231b3b80d8
Also, noalias arguments are be considered "like" stack allocated ones for this purpose, because
the only way they can be modref'ed is if they escape somewhere in the current function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47247 91177308-0d34-0410-b5e6-96231b3b80d8
it actually does. Simplify CountOperands a little by reusing
ComputeMemOperandsEnd. And reword some comments for both.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47198 91177308-0d34-0410-b5e6-96231b3b80d8
tblgen will complain if a sign-extended constant does not fit into a
data type smaller than i32, e.g., i16. This causes a problem when certain
hex constants are used, such as 0xff for byte masks or immediate xor
values.
tblgen will try the sign-extended value first and, if the sign extended
value would overflow, it tries to see if the unsigned value will fit.
Consequently, a software developer can now safely incant:
(XORHIr16 R16C:$rA, 0xffff)
which is somewhat clearer and more informative than incanting:
(XORHIr16 R16C:$rA, (i16 -1))
even if the two are bitwise equivalent.
Tblgen also outputs the 64-bit unsigned constant in the generated ISel code
when getTargetConstant() is invoked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47188 91177308-0d34-0410-b5e6-96231b3b80d8
we had reached the "fake bucket" after the last bucket, allowing the iterator
in some cases to run off the end of the hashtable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47178 91177308-0d34-0410-b5e6-96231b3b80d8
in a ret node. These are created as i32 constants
but on some platforms i32 is not legal. This
fixes 26 "make check" failures, for example
Alpha/2005-07-12-TwoMallocCalls.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47172 91177308-0d34-0410-b5e6-96231b3b80d8
register defs and uses after each successful coalescing.
- Also removed a number of hacks and fixed some subtle kill information bugs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47167 91177308-0d34-0410-b5e6-96231b3b80d8
it follows the order of the enum, not alphabetical.
The motivation is to make -mattr=+ssse3,+sse41
select SSE41 as it ought to. Added "ignored"
enum values of 0 to PPC and SPU to avoid compiler
warnings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47143 91177308-0d34-0410-b5e6-96231b3b80d8
the return value is zero-extended if it isn't
sign-extended. It may also be any-extended.
Also, if a floating point value was returned
in a larger floating point type, pass 1 as the
second operand to FP_ROUND, which tells it
that all the precision is in the original type.
I think this is right but I could be wrong.
Finally, when doing libcalls, set isZExt on
a parameter if it is "unsigned". Currently
isSExt is set when signed, and nothing is
set otherwise. This should be right for all
calls to standard library routines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47122 91177308-0d34-0410-b5e6-96231b3b80d8
1) ConstantFP is now expand by default
2) ConstantFP is not turned into TargetConstantFP during Legalize
if it is legal.
This allows ConstantFP to be handled like Constant, allowing for
targets that can encode FP immediates as MachineOperands.
As a bonus, fix up Itanium FP constants, which now correctly match,
and match more constants! Hooray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47121 91177308-0d34-0410-b5e6-96231b3b80d8
CTTZ and CTPOP. The expansion code differs from
that in LegalizeDAG in that it chooses to take the
CTLZ/CTTZ count from the Hi/Lo part depending on
whether the Hi/Lo value is zero, not on whether
CTLZ/CTTZ of Hi/Lo returned 32 (or whatever the
width of the type is) for it. I made this change
because the optimizers may well know that Hi/Lo
is zero and exploit it. The promotion code for
CTTZ also differs from that in LegalizeDAG: it
uses an "or" to get the right result when the
original value is zero, rather than using a compare
and select. This also means the value doesn't
need to be zero extended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47075 91177308-0d34-0410-b5e6-96231b3b80d8
node as soon as we create it in SDISel. Previously we would lower it in
legalize. The problem with this is that it only exposes the argument
loads implied by FORMAL_ARGUMENTs after legalize, so that only dag combine 2
can hack on them. This causes us to miss some optimizations because
datatype expansion also happens here.
Exposing the loads early allows us to do optimizations on them. For example
we now compile arg-cast.ll to:
_foo:
movl $2147483647, %eax
andl 8(%esp), %eax
ret
where we previously produced:
_foo:
subl $12, %esp
movsd 16(%esp), %xmm0
movsd %xmm0, (%esp)
movl $2147483647, %eax
andl 4(%esp), %eax
addl $12, %esp
ret
It might also make sense to do this for ISD::CALL nodes, which have implicit
stores on many targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47054 91177308-0d34-0410-b5e6-96231b3b80d8
PR1877.
A3 = op A2 B0<kill>
...
B1 = A3 <- this copy
...
= op A3 <- more uses
==>
B2 = op B0 A2<kill>
...
B1 = B2 <- now an identify copy
...
= op B2 <- more uses
This speeds up FreeBench/neural by 29%, Olden/bh by 12%, oopack_v1p8 by 53%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47046 91177308-0d34-0410-b5e6-96231b3b80d8
Add an overload that supports the uint64_t interface for use by clients
that haven't been updated yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47039 91177308-0d34-0410-b5e6-96231b3b80d8
handle arbitrary precision integers and any number
of parts. For example, on a 32 bit machine an i50
corresponds to two i32 parts. getCopyToParts will
extend the i50 to an i64 then write half of the i64
to each part; getCopyFromParts will combine the two
i32 parts into an i64 then truncate the result to
i50.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47024 91177308-0d34-0410-b5e6-96231b3b80d8
variable (with step 1) and m is its final value. Then, the correct trip
count is SMAX(m,n)-n. Previously, we used SMAX(0,m-n), but m-n may
overflow and can't in general be interpreted as signed.
Patch by Nick Lewycky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@47007 91177308-0d34-0410-b5e6-96231b3b80d8
to the RHS. This simple change allows to compute loop iteration count
for loops with condition similar to the one in the testcase (which seems
to be quite common).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46959 91177308-0d34-0410-b5e6-96231b3b80d8
Added member template "Add" to FoldingSetNodeID that allows "adding" arbitrary
objects to a profile via dispatch to FoldingSetTrait<T>::Profile().
Removed FoldingSetNodeID::AddAPFloat and FoldingSetNodeID::APInt, as their
functionality is now replaced using the above mentioned member template.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46957 91177308-0d34-0410-b5e6-96231b3b80d8
arbitrary iteration.
The patch:
1) changes SCEVSDivExpr into SCEVUDivExpr,
2) replaces PartialFact() function with BinomialCoefficient(); the
computations (essentially, the division) in BinomialCoefficient() are
performed with the apprioprate bitwidth necessary to avoid overflow;
unsigned division is used instead of the signed one.
Computations in BinomialCoefficient() require support from the code
generator for APInts. Currently, we use a hack rounding up the
neccessary bitwidth to the nearest power of 2. The hack is easy to turn
off in future.
One remaining issue: we assume the divisor of the binomial coefficient
formula can be computed accurately using 16 bits. It means we can handle
AddRecs of length up to 9. In future, we should use APInts to evaluate
the divisor.
Thanks to Nicholas for cooperation!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46955 91177308-0d34-0410-b5e6-96231b3b80d8
common problem of putting two terminators in the same block. I can't write
a testcase for this because the .ll parser rejects this before the verifier
can, but this can occur when generating IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46900 91177308-0d34-0410-b5e6-96231b3b80d8
initializer problem, a minor tweak to the way the
DAGISelEmitter finds load/store nodes, and a renaming of the
new PseudoSourceValue objects.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46827 91177308-0d34-0410-b5e6-96231b3b80d8
check more intelligent. This speeds up mem2reg from 5.29s to
0.79s on a synthetic testcase with tons of predecessors and
phi nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46767 91177308-0d34-0410-b5e6-96231b3b80d8
was incorrectly simplifying "x == (gep x, 1, i)" into false, even
though i could be negative. As it turns out, all the code to
handle this already existed, we just need to disable the incorrect
optimization case and let the general case handle it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46739 91177308-0d34-0410-b5e6-96231b3b80d8
ReadyToProcess node - add an assertion to check
this. Add an assertion to NodeDeleted that checks
that processed/ready nodes are indeed not deleted.
It is because they are never deleted that none of
the maps can have a deleted node as the source of
a mapping. It does however seem to be possible in
theory to have a deleted value as the target of a
mapping, however this has not yet been spotted in
the wild. Still mulling on what to do about this.
[The theoretical situation is this: a node A is
expanded/promoted/whatever to a newly created node
B. Thus A->B is added to a map. When the subtree
rooted at B is legalized it is conceivable that B
is deleted due to RAUW on a node somewhere above
it].
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46705 91177308-0d34-0410-b5e6-96231b3b80d8
keep the LegalizeTypes node flags up to date when doing a RAUW.
This fixes a nasty bug that Duncan ran into and makes the
previous (nonbuggy case) more efficent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46679 91177308-0d34-0410-b5e6-96231b3b80d8
DAGUpdateListener object pointer instead of just returning a vector
of deleted nodes. This makes the interfaces more efficient (no more
allocating a vector [at least a malloc], filling it in, then walking
it) and more clean. This also allows the client to be notified of
nodes that are *changed* but not deleted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46677 91177308-0d34-0410-b5e6-96231b3b80d8
SelectionDAG::ReplaceAllUsesWith to handle replacement of
an SDOperand with *any* sdoperand, not just one for a node with
a single result. Note that this has a horrible FIXME'd hack in it
to work around PR1975. This should be removed when PR1975 is fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46674 91177308-0d34-0410-b5e6-96231b3b80d8
Added ISD::DECLARE node type to represent llvm.dbg.declare intrinsic. Now the intrinsic calls are lowered into a SDNode and lives on through out the codegen passes.
For now, since all the debugging information recording is done at isel time, when a ISD::DECLARE node is selected, it has the side effect of also recording the variable. This is a short term solution that should be fixed in time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46659 91177308-0d34-0410-b5e6-96231b3b80d8
exposed a bug in APFloat's long double->double conversion of
NaNs. Broke several things in the ieee part of gcc testsuite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46617 91177308-0d34-0410-b5e6-96231b3b80d8
the pattern when generating matchin code.
The first (and currently, only) attribute causes the immediate parent node of the ComplexPattern operand to be passed into the matching code rather than the node at the root of the entire DAG containing the pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46606 91177308-0d34-0410-b5e6-96231b3b80d8
Replace getLocation() with getFrameIndexOffset() which returns the delta from frame pointer to stack slot. Dwarf writer can then use the information for whatever it wants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46597 91177308-0d34-0410-b5e6-96231b3b80d8
in the backend. Introduce a new SDNode type, MemOperandSDNode, for
holding a MemOperand in the SelectionDAG IR, and add a MemOperand
list to MachineInstr, and code to manage them. Remove the offset
field from SrcValueSDNode; uses of SrcValueSDNode that were using
it are all all using MemOperandSDNode now.
Also, begin updating some getLoad and getStore calls to use the
PseudoSourceValue objects.
Most of this was written by Florian Brander, some
reorganization and updating to TOT by me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46585 91177308-0d34-0410-b5e6-96231b3b80d8
Note this solution might be somewhat fragile since ISD::LABEL may be used for other
purposes. If that ends up to be an issue, we may need to introduce a different node
for debug labels.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46571 91177308-0d34-0410-b5e6-96231b3b80d8
memory reference information in the backend. Most of this was written by
Florian Brander, cleanup and updating to TOT by me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46556 91177308-0d34-0410-b5e6-96231b3b80d8
- Expand tabs... (poss 80-col violations, will get them later...)
- Consolidate logic for SelectDFormAddr and SelectDForm2Addr into a single
function, simplifying maintenance. Also reduced custom instruction
generation for SPUvecinsert/INSERT_MASK.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46544 91177308-0d34-0410-b5e6-96231b3b80d8
In practice this can only happen on code with already undefined behavior,
but this is still a good thing to handle correctly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46539 91177308-0d34-0410-b5e6-96231b3b80d8
and StoreSDNode into their common base class LSBaseSDNode. Member
functions getLoadedVT and getStoredVT are replaced with the common
getMemoryVT to simplify code that will handle both loads and stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46538 91177308-0d34-0410-b5e6-96231b3b80d8
type that matters but the operand type. This fixes
2008-01-08-IllegalCMP.ll which crashed with the new
legalize infrastructure because SETCC with result
type i8 and operand type i64 was being custom expanded
by the X86 backend. With this fix, the gcc build gets
as far as the first libcall.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46525 91177308-0d34-0410-b5e6-96231b3b80d8
only two addressing mode nodes, SPUaform and SPUindirect (vice the
three previous ones, SPUaform, SPUdform and SPUxform). This improves
code somewhat because we now avoid using reg+reg addressing when
it can be avoided. It also simplifies the address selection logic,
which was the main point for doing this.
Also, for various global variables that would be loaded using SPU's
A-form addressing, prefer D-form offs[reg] addressing, keeping the
base in a register if the variable is used more than once.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46483 91177308-0d34-0410-b5e6-96231b3b80d8
way or the other. Rewriting the code itself prevents subsequent analysis
passes from making contradictory conclusions about the code that could
cause an infeasible path to be made feasible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46427 91177308-0d34-0410-b5e6-96231b3b80d8
registers if used by a bitconvert or using a bitconvert. This allows us to
avoid constant pool loads and use cheaper integer instructions when the
values come from or end up in integer regs anyway. For example, we now
compile CodeGen/X86/fp-in-intregs.ll to:
_test1:
movl $2147483648, %eax
xorl 4(%esp), %eax
ret
_test2:
movl $1065353216, %eax
orl 4(%esp), %eax
andl $3212836864, %eax
ret
Instead of:
_test1:
movss 4(%esp), %xmm0
xorps LCPI2_0, %xmm0
movd %xmm0, %eax
ret
_test2:
movss 4(%esp), %xmm0
andps LCPI3_0, %xmm0
movss LCPI3_1, %xmm1
andps LCPI3_2, %xmm1
orps %xmm0, %xmm1
movd %xmm1, %eax
ret
bitconverts can happen due to various calling conventions that require
fp values to passed in integer regs in some cases, e.g. when returning
a complex.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46414 91177308-0d34-0410-b5e6-96231b3b80d8
a "nop" instruction so that we don't have the function's label associated
with something that it's not supposed to be associated with.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46394 91177308-0d34-0410-b5e6-96231b3b80d8
void bork() {
int *address = 0;
*address = 0;
}
It's compiled into LLVM code that looks like this:
define void @bork() noreturn nounwind {
entry:
unreachable
}
This is bad on some platforms (like PPC) because it will generate the label for
the function but no body. The label could end up being associated with some
non-code related stuff, like a section. This places a "trap" instruction if the
SimplifyCFG pass removed all code from the function leaving only one
"unreachable" instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46387 91177308-0d34-0410-b5e6-96231b3b80d8
delete a node even if it was not dead in some cases. Instead, just add it to
the worklist. Also, make sure to use the CombineTo methods, as it was doing
things that were unsafe: the top level combine loop could touch dangling memory.
This fixes CodeGen/Generic/2008-01-25-dag-combine-mul.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46384 91177308-0d34-0410-b5e6-96231b3b80d8
was actually passing a completely incorrect size to sys_icache_invalidate.
Instead of having the JITEmitter do this (which doesn't have the correct
size), just make the target sync its own stubs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46354 91177308-0d34-0410-b5e6-96231b3b80d8
we can infer it. This will eventually help stuff, though it doesn't
do much right now because all fixed FI's have an alignment of 1.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46349 91177308-0d34-0410-b5e6-96231b3b80d8
can't be aliased to other known objects. This allows us to know that byval
pointer args don't alias globals, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46315 91177308-0d34-0410-b5e6-96231b3b80d8
This case returns the value in ST(0) and then has to convert it to an SSE
register. This causes significant codegen ugliness in some cases. For
example in the trivial fp-stack-direct-ret.ll testcase we used to generate:
_bar:
subl $28, %esp
call L_foo$stub
fstpl 16(%esp)
movsd 16(%esp), %xmm0
movsd %xmm0, 8(%esp)
fldl 8(%esp)
addl $28, %esp
ret
because we move the result of foo() into an XMM register, then have to
move it back for the return of bar.
Instead of hacking ever-more special cases into the call result lowering code
we take a much simpler approach: on x86-32, fp return is modeled as always
returning into an f80 register which is then truncated to f32 or f64 as needed.
Similarly for a result, we model it as an extension to f80 + return.
This exposes the truncate and extensions to the dag combiner, allowing target
independent code to hack on them, eliminating them in this case. This gives
us this code for the example above:
_bar:
subl $12, %esp
call L_foo$stub
addl $12, %esp
ret
The nasty aspect of this is that these conversions are not legal, but we want
the second pass of dag combiner (post-legalize) to be able to hack on them.
To handle this, we lie to legalize and say they are legal, then custom expand
them on entry to the isel pass (PreprocessForFPConvert). This is gross, but
less gross than the code it is replacing :)
This also allows us to generate better code in several other cases. For
example on fp-stack-ret-conv.ll, we now generate:
_test:
subl $12, %esp
call L_foo$stub
fstps 8(%esp)
movl 16(%esp), %eax
cvtss2sd 8(%esp), %xmm0
movsd %xmm0, (%eax)
addl $12, %esp
ret
where before we produced (incidentally, the old bad code is identical to what
gcc produces):
_test:
subl $12, %esp
call L_foo$stub
fstpl (%esp)
cvtsd2ss (%esp), %xmm0
cvtss2sd %xmm0, %xmm0
movl 16(%esp), %eax
movsd %xmm0, (%eax)
addl $12, %esp
ret
Note that we generate slightly worse code on pr1505b.ll due to a scheduling
deficiency that is unrelated to this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46307 91177308-0d34-0410-b5e6-96231b3b80d8
1. we already know the value is dead, so don't bother replacing
it with undef.
2. The very case the comment describes actually makes the load
live which asserts in deletenode. If we do the replacement
and the node becomes live, just treat it as new. This fixes
a failure on X86/2008-01-16-InvalidDAGCombineXform.ll with
some local changes in my tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46306 91177308-0d34-0410-b5e6-96231b3b80d8
dead stuff around. This gets fed into the isel pass and causes certain foldings from
happening because nodes have extraneous uses floating around. For example, if we turned
foo(bar(x)) -> baz(x), we sometimes left bar(x) around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46305 91177308-0d34-0410-b5e6-96231b3b80d8
precision integers. This won't actually work
(and most of the code is dead) unless the new
legalization machinery is turned on. While
there, I rationalized the handling of i1, and
removed some bogus (and unused) sextload patterns.
For i1, this could result in microscopically
better code for some architectures (not X86).
It might also result in worse code if annotating
with AssertZExt nodes turns out to be more harmful
than helpful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46280 91177308-0d34-0410-b5e6-96231b3b80d8