in the presence of out-of-loop users of in-loop values and the trip
count is not a known multiple of the unroll count, and to be a bit
simpler overall. This fixes PR2253.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@52645 91177308-0d34-0410-b5e6-96231b3b80d8
the section or the visibility from one global
value to another: copyAttributesFrom. This is
particularly useful for duplicating functions:
previously this was done by explicitly copying
each attribute in turn at each place where a
new function was created out of an old one, with
the result that obscure attributes were regularly
forgotten (like the collector or the section).
Hopefully now everything is uniform and nothing
is forgotten.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51567 91177308-0d34-0410-b5e6-96231b3b80d8
The SimplifyCFG pass looks at basic blocks that contain only phi nodes,
followed by an unconditional branch. In a lot of cases, such a block (BB) can
be merged into their successor (Succ).
This merging is performed by TryToSimplifyUncondBranchFromEmptyBlock. It does
this by taking all phi nodes in the succesor block Succ and expanding them to
include the predecessors of BB. Furthermore, any phi nodes in BB are moved to
Succ and expanded to include the predecessors of Succ as well.
Before attempting this merge, CanPropagatePredecessorsForPHIs checks to see if
all phi nodes can be properly merged. All functional changes are made to
this function, only comments were updated in
TryToSimplifyUncondBranchFromEmptyBlock.
In the original code, CanPropagatePredecessorsForPHIs looks quite convoluted
and more like stack of checks added to handle different kinds of situations
than a comprehensive check. In particular the first check in the function did
some value checking for the case that BB and Succ have a common predecessor,
while the last check in the function simply rejected all cases where BB and
Succ have a common predecessor. The first check was still useful in the case
that BB did not contain any phi nodes at all, though, so it was not completely
useless.
Now, CanPropagatePredecessorsForPHIs is restructured to to look a lot more
similar to the code that actually performs the merge. Both functions now look
at the same phi nodes in about the same order. Any conflicts (phi nodes with
different values for the same source) that could arise from merging or moving
phi nodes are detected. If no conflicts are found, the merge can happen.
Apart from only restructuring the checks, two main changes in functionality
happened.
Firstly, the old code rejected blocks with common predecessors in most cases.
The new code performs some extra checks so common predecessors can be handled
in a lot of cases. Wherever common predecessors still pose problems, the
blocks are left untouched.
Secondly, the old code rejected the merge when values (phi nodes) from BB were
used in any other place than Succ. However, it does not seem that there is any
situation that would require this check. Even more, this can be proven.
Consider that BB is a block containing of a single phi node "%a" and a branch
to Succ. Now, since the definition of %a will dominate all of its uses, BB
will dominate all blocks that use %a. Furthermore, since the branch from BB to
Succ is unconditional, Succ will also dominate all uses of %a.
Now, assume that one predecessor of Succ is not dominated by BB (and thus not
dominated by Succ). Since at least one use of %a (but in reality all of them)
is reachable from Succ, you could end up at a use of %a without passing
through it's definition in BB (by coming from X through Succ). This is a
contradiction, meaning that our original assumption is wrong. Thus, all
predecessors of Succ must also be dominated by BB (and thus also by Succ).
This means that moving the phi node %a from BB to Succ does not pose any
problems when the two blocks are merged, and any use checks are not needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51478 91177308-0d34-0410-b5e6-96231b3b80d8
address of the PassInfo directly instead of calling getPassInfo.
This eliminates a bunch of dynamic initializations of static data.
Also, fold RegisterPassBase into PassInfo, make a bunch of its
data members const, and rearrange some code to initialize data
members in constructors instead of using setter member functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51022 91177308-0d34-0410-b5e6-96231b3b80d8
several things that were neither in an anonymous namespace nor static
but not intended to be global.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@51017 91177308-0d34-0410-b5e6-96231b3b80d8
Fix said code to handle merging return instructions together correctly
when handling multiple return values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50199 91177308-0d34-0410-b5e6-96231b3b80d8
as a global helper function. At the same type, switch it from taking
a vector of predecessors to an arbitrary sequential input. This allows
us to switch LoopSimplify to use a SmallVector for various temporary
vectors that it passed into SplitBlockPredecessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@50020 91177308-0d34-0410-b5e6-96231b3b80d8
needs to be fixed here - a previous commit made sure
that intrinsics always get the right attributes.
So remove no-longer needed code, and while there use
Intrinsic::getDeclaration rather than getOrInsertFunction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49337 91177308-0d34-0410-b5e6-96231b3b80d8
nounwind. When such calls are inlined into something
else that is invoked, they were getting changed to invokes,
which is badness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49299 91177308-0d34-0410-b5e6-96231b3b80d8
Specifically, introduction of XXX::Create methods
for Users that have a potentially variable number of
Uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49277 91177308-0d34-0410-b5e6-96231b3b80d8
2. Do not use # of basic blocks as part of the cost computation since it doesn't really figure into function size.
3. More aggressively inline function with vector code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49061 91177308-0d34-0410-b5e6-96231b3b80d8
not marked nounwind, or for all functions when -enable-eh
is set, provided the target supports Dwarf EH.
llvm-gcc generates nounwind in the right places; other FEs
will need to do so also. Given such a FE, -enable-eh should
no longer be needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@49006 91177308-0d34-0410-b5e6-96231b3b80d8
Furthermore, double the limit when more than 10% of the callee instructions are vector instructions. Multimedia kernels tend to love inlining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48725 91177308-0d34-0410-b5e6-96231b3b80d8
before trying to merge the block into its predecessors.
This allows two-entry-phi-return.ll to be simplified
into a single basic block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48252 91177308-0d34-0410-b5e6-96231b3b80d8
Secondly, we have to check whether the branch is actually pointing to the block
with the unwind in it. We could have gotten here because of the unwind_to alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48099 91177308-0d34-0410-b5e6-96231b3b80d8
Add the ability to remove just one instance of a BB from a phi node. This fixes
the compile error in the tree now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@48085 91177308-0d34-0410-b5e6-96231b3b80d8
check more intelligent. This speeds up mem2reg from 5.29s to
0.79s on a synthetic testcase with tons of predecessors and
phi nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@46767 91177308-0d34-0410-b5e6-96231b3b80d8
inlining a function if we know that the function does not write
to *any* memory. This implements test/Transforms/Inline/byval2.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45912 91177308-0d34-0410-b5e6-96231b3b80d8
could theoretically introduce a trap, but is also a performance issue.
This speeds up ptrdist/ks by 8%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45533 91177308-0d34-0410-b5e6-96231b3b80d8
define void @f() {
...
call i32 @g()
...
}
define void @g() {
...
}
The hazards are:
- @f and @g have GC, but they differ GC. Inlining is invalid. This
may never occur.
- @f has no GC, but @g does. g's GC must be propagated to @f.
The other scenarios are safe:
- @f and @g have the same GC.
- @f and @g have no GC.
- @g has no GC.
This patch adds inliner checks for the former two scenarios.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45351 91177308-0d34-0410-b5e6-96231b3b80d8
calls 'nounwind'. It is important for correct C++
exception handling that nounwind markings do not get
lost, so this transformation is actually needed for
correctness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45218 91177308-0d34-0410-b5e6-96231b3b80d8
how to lower them (with no attempt made to be
efficient, since they should only occur for
unoptimized code).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45108 91177308-0d34-0410-b5e6-96231b3b80d8
calls. Remove special casing of inline asm from the
inliner. There is a potential problem: the verifier
rejects invokes of inline asm (not sure why). If an
asm call is not marked "nounwind" in some .ll, and
instcombine is not run, but the inliner is run, then
an illegal module will be created. This is bad but
I'm not sure what the best approach is. I'm tempted
to remove the check in the verifier...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@45073 91177308-0d34-0410-b5e6-96231b3b80d8
Reimplement the xform in Analysis/ConstantFolding.cpp where we can use
targetdata to validate that it is safe. While I'm in there, fix some const
correctness issues and generalize the interface to the "operand folder".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44817 91177308-0d34-0410-b5e6-96231b3b80d8
methods are new to Function:
bool hasCollector() const;
const std::string &getCollector() const;
void setCollector(const std::string &);
void clearCollector();
The assembly representation is as such:
define void @f() gc "shadow-stack" { ...
The implementation uses an on-the-side table to map Functions to
collector names, such that there is no overhead. A StringPool is
further used to unique collector names, which are extremely
likely to be unique per process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44769 91177308-0d34-0410-b5e6-96231b3b80d8
throw exceptions", just mark intrinsics with the nounwind
attribute. Likewise, mark intrinsics as readnone/readonly
and get rid of special aliasing logic (which didn't use
anything more than this anyway).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44544 91177308-0d34-0410-b5e6-96231b3b80d8
the function type, instead they belong to functions
and function calls. This is an updated and slightly
corrected version of Reid Spencer's original patch.
The only known problem is that auto-upgrading of
bitcode files doesn't seem to work properly (see
test/Bitcode/AutoUpgradeIntrinsics.ll). Hopefully
a bitcode guru (who might that be? :) ) will fix it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44359 91177308-0d34-0410-b5e6-96231b3b80d8
The meaning of getTypeSize was not clear - clarifying it is important
now that we have x86 long double and arbitrary precision integers.
The issue with long double is that it requires 80 bits, and this is
not a multiple of its alignment. This gives a primitive type for
which getTypeSize differed from getABITypeSize. For arbitrary precision
integers it is even worse: there is the minimum number of bits needed to
hold the type (eg: 36 for an i36), the maximum number of bits that will
be overwriten when storing the type (40 bits for i36) and the ABI size
(i.e. the storage size rounded up to a multiple of the alignment; 64 bits
for i36).
This patch removes getTypeSize (not really - it is still there but
deprecated to allow for a gradual transition). Instead there is:
(1) getTypeSizeInBits - a number of bits that suffices to hold all
values of the type. For a primitive type, this is the minimum number
of bits. For an i36 this is 36 bits. For x86 long double it is 80.
This corresponds to gcc's TYPE_PRECISION.
(2) getTypeStoreSizeInBits - the maximum number of bits that is
written when storing the type (or read when reading it). For an
i36 this is 40 bits, for an x86 long double it is 80 bits. This
is the size alias analysis is interested in (getTypeStoreSize
returns the number of bytes). There doesn't seem to be anything
corresponding to this in gcc.
(3) getABITypeSizeInBits - this is getTypeStoreSizeInBits rounded
up to a multiple of the alignment. For an i36 this is 64, for an
x86 long double this is 96 or 128 depending on the OS. This is the
spacing between consecutive elements when you form an array out of
this type (getABITypeSize returns the number of bytes). This is
TYPE_SIZE in gcc.
Since successive elements in a SequentialType (arrays, pointers
and vectors) need to be aligned, the spacing between them will be
given by getABITypeSize. This means that the size of an array
is the length times the getABITypeSize. It also means that GEP
computations need to use getABITypeSize when computing offsets.
Furthermore, if an alloca allocates several elements at once then
these too need to be aligned, so the size of the alloca has to be
the number of elements multiplied by getABITypeSize. Logically
speaking this doesn't have to be the case when allocating just
one element, but it is simpler to also use getABITypeSize in this
case. So alloca's and mallocs should use getABITypeSize. Finally,
since gcc's only notion of size is that given by getABITypeSize, if
you want to output assembler etc the same as gcc then getABITypeSize
is the size you want.
Since a store will overwrite no more than getTypeStoreSize bytes,
and a read will read no more than that many bytes, this is the
notion of size appropriate for alias analysis calculations.
In this patch I have corrected all type size uses except some of
those in ScalarReplAggregates, lib/Codegen, lib/Target (the hard
cases). I will get around to auditing these too at some point,
but I could do with some help.
Finally, I made one change which I think wise but others might
consider pointless and suboptimal: in an unpacked struct the
amount of space allocated for a field is now given by the ABI
size rather than getTypeStoreSize. I did this because every
other place that reserves memory for a type (eg: alloca) now
uses getABITypeSize, and I didn't want to make an exception
for unpacked structs, i.e. I did it to make things more uniform.
This only effects structs containing long doubles and arbitrary
precision integers. If someone wants to pack these types more
tightly they can always use a packed struct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43620 91177308-0d34-0410-b5e6-96231b3b80d8
in CodeExtractor and LoopSimplify unnecessary.
Hartmut, could you confirm that this fixes the issues you were seeing?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@43115 91177308-0d34-0410-b5e6-96231b3b80d8
Add a new DenseMapInfo::isEqual method to allow clients to redefine
the equality predicate used when probing the hash table.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@42042 91177308-0d34-0410-b5e6-96231b3b80d8
In the old way, we computed and inserted phi nodes for the whole IDF of
the definitions of the alloca, then computed which ones were dead and
removed them.
In the new method, we first compute the region where the value is live,
and use that information to only insert phi nodes that are live. This
eliminates the need to compute liveness later, and stops the algorithm
from inserting a bunch of phis which it then later removes.
This speeds up the testcase in PR1432 from 2.00s to 0.15s (14x) in a
release build and 6.84s->0.50s (14x) in a debug build.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40825 91177308-0d34-0410-b5e6-96231b3b80d8
to the worklist, and handling the last one with a 'tail call'. This speeds
up PR1432 from 2.0578s to 2.0012s (2.8%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40822 91177308-0d34-0410-b5e6-96231b3b80d8
faster than with the 'local to a block' fastpath. This speeds
up PR1432 from 2.1232 to 2.0686s (2.6%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40818 91177308-0d34-0410-b5e6-96231b3b80d8
to increment NumLocalPromoted, and didn't actually delete the
dead alloca, leading to an extra iteration of mem2reg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40817 91177308-0d34-0410-b5e6-96231b3b80d8
stored value was a non-instruction value. Doh.
This increase the # single store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17 to 2.13s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40813 91177308-0d34-0410-b5e6-96231b3b80d8
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
block from the UsingBlocks vector. Not doing so forces us to go through the slow
path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432. This speeds it up from 2.90s to 2.20s (31%)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40811 91177308-0d34-0410-b5e6-96231b3b80d8
a using block from the list if we handle it. Not doing this caused us
to not be able to promote (with the fast path) allocas which have uses (whoops).
This increases the # allocas hitting this fastpath from 4042 to 8935 on the
testcase in PR1432, speeding up mem2reg by 2.6x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40809 91177308-0d34-0410-b5e6-96231b3b80d8
to Instruction::mayWriteToMemory, fixing a FIXME, and helping
various places that call mayWriteToMemory directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40533 91177308-0d34-0410-b5e6-96231b3b80d8
This interface allows clients to inline bunch of functions with module
level call graph information.:wq
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40486 91177308-0d34-0410-b5e6-96231b3b80d8
second part dominates all the blocks dominated
by original basic block. And first part dominates
second part.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@40035 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-gcc build to succeed. Without this change it fails in libstdc++
compilation. This causes no regressions in dejagnu tests. However,
someone who knows this code better might want to review it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@39924 91177308-0d34-0410-b5e6-96231b3b80d8
Due to darwin gcc bug, one version of darwin linker coalesces
static const int, which defauts PassID based pass identification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@36652 91177308-0d34-0410-b5e6-96231b3b80d8
constructing ImmediateDominator is now folded into DomTree construction.
This is part of the ongoing work for PR217.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@36063 91177308-0d34-0410-b5e6-96231b3b80d8
ETForest updating mechanisms don't work as I thought they did. These changes
will be reapplied once the issue is worked out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@35741 91177308-0d34-0410-b5e6-96231b3b80d8
BBNumbers. Instead of using a bi-directional mapping, just use a single
densemap. This speeds up mem2reg on 176.gcc by 8%, from 1.3489 to
1.2485s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33940 91177308-0d34-0410-b5e6-96231b3b80d8
the Transforms library. This reduces debug library size by 132 KB, debug
binary size by 376 KB, and reduces link time for llvm tools slightly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33939 91177308-0d34-0410-b5e6-96231b3b80d8
This patch replaces the SymbolTable class with ValueSymbolTable which does
not support types planes. This means that all symbol names in LLVM must now
be unique. The patch addresses the necessary changes to deal with this and
removes code no longer needed as a result. This completes the bulk of the
changes for this PR. Some cleanup patches will follow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33918 91177308-0d34-0410-b5e6-96231b3b80d8
Make the Module's dependent library use a std::vector instead of SetVector
adjust #includes in .cpp files because SetVector.h is no longer included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33855 91177308-0d34-0410-b5e6-96231b3b80d8
updating. These were exposed by Devang's recent passmgr changes (with
non-default passorderings) because now the inliner can be interleved with
the LCSSA pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33760 91177308-0d34-0410-b5e6-96231b3b80d8
ConstantFoldInstOperands/ConstantFoldCall to take a pointer to an array
of operands + size, instead of an std::vector.
In some cases, switch to using a SmallVector instead of a vector.
This allows us to get rid of some special case gross code that was there
to avoid the cost of constructing a vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33670 91177308-0d34-0410-b5e6-96231b3b80d8
The Module::setEndianness and Module::setPointerSize methods have been
removed. Instead you can get/set the DataLayout. Adjust thise accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33530 91177308-0d34-0410-b5e6-96231b3b80d8
This is the final patch for this PR. It implements some minor cleanup
in the use of IntegerType, to wit:
1. Type::getIntegerTypeMask -> IntegerType::getBitMask
2. Type::Int*Ty changed to IntegerType* from Type*
3. ConstantInt::getType() returns IntegerType* now, not Type*
This also fixes PR1120.
Patch by Sheng Zhou.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33370 91177308-0d34-0410-b5e6-96231b3b80d8
rename Type::getIntegralTypeMask to Type::getIntegerTypeMask.
This makes naming much more consistent. For example, there are now no longer any
instances of IntegerType that are not considered isInteger! :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33225 91177308-0d34-0410-b5e6-96231b3b80d8
recommended that getBoolValue be replaced with getZExtValue and that
get(bool) be replaced by get(const Type*, uint64_t). This implements
those changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33110 91177308-0d34-0410-b5e6-96231b3b80d8
Merge ConstantIntegral and ConstantBool into ConstantInt.
Remove ConstantIntegral and ConstantBool from LLVM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@33073 91177308-0d34-0410-b5e6-96231b3b80d8
Take an incremental step towards type plane elimination. This change
separates types from values in the symbol tables by finally making use
of the TypeSymbolTable class. This yields more natural interfaces for
dealing with types and unclutters the SymbolTable class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@32956 91177308-0d34-0410-b5e6-96231b3b80d8
This patch replaces signed integer types with signless ones:
1. [US]Byte -> Int8
2. [U]Short -> Int16
3. [U]Int -> Int32
4. [U]Long -> Int64.
5. Removal of isSigned, isUnsigned, getSignedVersion, getUnsignedVersion
and other methods related to signedness. In a few places this warranted
identifying the signedness information from other sources.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@32785 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the SetCC instructions and replaces them with the ICmp
and FCmp instructions. The SetCondInst instruction has been removed and
been replaced with ICmpInst and FCmpInst.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@32751 91177308-0d34-0410-b5e6-96231b3b80d8
rework the hacks that had us passing OStream in. We pass in std::ostream*
instead, check for null, and then dispatch to the correct print() method.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@32636 91177308-0d34-0410-b5e6-96231b3b80d8
The long awaited CAST patch. This introduces 12 new instructions into LLVM
to replace the cast instruction. Corresponding changes throughout LLVM are
provided. This passes llvm-test, llvm/test, and SPEC CPUINT2000 with the
exception of 175.vpr which fails only on a slight floating point output
difference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31931 91177308-0d34-0410-b5e6-96231b3b80d8
only do these transformations if there are a small number of phi's.
This speeds up Ptrdist/ks from 2.35s to 2.19s on my mac pro.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31853 91177308-0d34-0410-b5e6-96231b3b80d8
This patch converts the old SHR instruction into two instructions,
AShr (Arithmetic) and LShr (Logical). The Shr instructions now are not
dependent on the sign of their operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31542 91177308-0d34-0410-b5e6-96231b3b80d8
Turn on -Wunused and -Wno-unused-parameter. Clean up most of the resulting
fall out by removing unused variables. Remaining warnings have to do with
unused functions (I didn't want to delete code without review) and unused
variables in generated code. Maintainers should clean up the remaining
issues when they see them. All changes pass DejaGnu tests and Olden.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31380 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements the first increment for the Signless Types feature.
All changes pertain to removing the ConstantSInt and ConstantUInt classes
in favor of just using ConstantInt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@31063 91177308-0d34-0410-b5e6-96231b3b80d8
The critical edge block dominates the dest block if the destblock dominates
all edges other than the one incoming from the critical edge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30696 91177308-0d34-0410-b5e6-96231b3b80d8
or when splitting loops with a common header into multiple loops. In particular
the old code would always insert the preheader before the old loop header. This
is disasterous in cases where the loop hasn't been rotated. For example, it can
produce code like:
.. outside the loop...
jmp LBB1_2 #bb13.outer
LBB1_1: #bb1
movsd 8(%esp,%esi,8), %xmm1
mulsd (%edi), %xmm1
addsd %xmm0, %xmm1
addl $24, %edi
incl %esi
jmp LBB1_3 #bb13
LBB1_2: #bb13.outer
leal (%edx,%eax,8), %edi
pxor %xmm1, %xmm1
xorl %esi, %esi
LBB1_3: #bb13
movapd %xmm1, %xmm0
cmpl $4, %esi
jl LBB1_1 #bb1
Note that the loop body is actually LBB1_1 + LBB1_3, which means that the
loop now contains an uncond branch WITHIN it to jump around the inserted
loop header (LBB1_2). Doh.
This patch changes the preheader insertion code to insert it in the right
spot, producing this code:
... outside the loop, fall into the header ...
LBB1_1: #bb13.outer
leal (%edx,%eax,8), %esi
pxor %xmm0, %xmm0
xorl %edi, %edi
jmp LBB1_3 #bb13
LBB1_2: #bb1
movsd 8(%esp,%edi,8), %xmm0
mulsd (%esi), %xmm0
addsd %xmm1, %xmm0
addl $24, %esi
incl %edi
LBB1_3: #bb13
movapd %xmm0, %xmm1
cmpl $4, %edi
jl LBB1_2 #bb1
Totally crazy, no branch in the loop! :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30587 91177308-0d34-0410-b5e6-96231b3b80d8
reachable, making it general purpose enough for use by InsertPreheaderForLoop.
Eliminate custom dominfo updating code in InsertPreheaderForLoop, using
UpdateDomInfoForRevectoredPreds instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30586 91177308-0d34-0410-b5e6-96231b3b80d8
Call these from your backend to enjoy setjmp/longjmp goodness, see
lib/Target/IA64/IA64ISelLowering.cpp for an example
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@30095 91177308-0d34-0410-b5e6-96231b3b80d8
Not only will this take huge amounts of compile time, the resultant loop nests
won't be useful for optimization. This reduces loopsimplify time on
Transforms/LoopSimplify/2006-08-11-LoopSimplifyLongTime.ll from ~32s to ~0.4s
with a debug build of llvm on a 2.7Ghz G5.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29647 91177308-0d34-0410-b5e6-96231b3b80d8
blocks that target loop blocks.
Before, the code was run once per loop, and depended on the number of
predecessors each block in the loop had. Unfortunately, scanning preds can
be really slow when huge numbers of phis exist or when phis with huge numbers
of inputs exist.
Now, the code is run once per function and scans successors instead of preds,
which is far faster. In addition, the new code is simpler and is goto free,
woo.
This change speeds up a nasty testcase Duraid provided me from taking hours to
taking ~72s with a debug build. The functionality this implements is already
tested in the testsuite as Transforms/CodeExtractor/2004-03-13-LoopExtractorCrash.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29644 91177308-0d34-0410-b5e6-96231b3b80d8
down approach, inspired by discussions with Tanya.
This approach is significantly faster, because it does not need dominator
frontiers and it does not insert extraneous unused PHI nodes. For example, on
252.eon, in a release-asserts build, this speeds up LCSSA (which is the slowest
pass in gccas) from 9.14s to 0.74s on my G5. This code is also slightly smaller
and significantly simpler than the old code.
Amusingly, in a normal Release build (which includes the
"assert(L->isLCSSAForm());" assertion), asserting that the result of LCSSA
is in LCSSA form is actually slower than the LCSSA transformation pass
itself on 252.eon. I will see if Loop::isLCSSAForm can be sped up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29463 91177308-0d34-0410-b5e6-96231b3b80d8
Handle this case, which doesn't require a new callgraph edge. This fixes
a crash compiling MallocBench/gs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29121 91177308-0d34-0410-b5e6-96231b3b80d8
target CG node. This allows the inliner to properly update the callgraph
when using the pruning inliner. The pruning inliner may not copy over all
call sites from a callee to a caller, so the edges corresponding to those
call sites should not be copied over either.
This fixes PR827 and Transforms/Inline/2006-07-12-InlinePruneCGUpdate.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29120 91177308-0d34-0410-b5e6-96231b3b80d8
cases. Ideally, this issue will go away in the future as LCSSA gets smarter
about which Phi nodes it inserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29076 91177308-0d34-0410-b5e6-96231b3b80d8
LCSSA is still the slowest pass when gccas'ing 252.eon, but now it only takes
39s instead of 289s. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28776 91177308-0d34-0410-b5e6-96231b3b80d8
not handling PHI nodes correctly when determining if a value was live-out.
This patch reduces the number of detected live-out variables in the testcase
from 6565 to 485.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28771 91177308-0d34-0410-b5e6-96231b3b80d8
If a single exit block has multiple predecessors within the loop, it will
appear in the exit blocks list more than once. LCSSA needs to take that into
account so that it doesn't double process that exit block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28750 91177308-0d34-0410-b5e6-96231b3b80d8
to link in the implementation. Thanks to Anton Korobeynikov for figuring out
what was going on here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28660 91177308-0d34-0410-b5e6-96231b3b80d8
code (while cloning) it often gets the branch/switch instructions. Since it
knows that edges of the CFG are dead, it need not clone (or even look) at
the obviously dead blocks. This should speed up the inliner substantially on
code where there are lots of inlinable calls to functions with constant
arguments. On C++ code in particular, this kicks in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28641 91177308-0d34-0410-b5e6-96231b3b80d8
reimplement getValueDominatingFunction to walk the DominanceTree rather than
just searching blindly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28618 91177308-0d34-0410-b5e6-96231b3b80d8
is now theoretically feature-complete. It has not, however, been thoroughly
test, and is still considered experimental.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28529 91177308-0d34-0410-b5e6-96231b3b80d8
the iterated Dominance Frontier of the loop-closure Phi's. This is the
second phase of the LCSSA pass. The third phase (coming soon) will be to
update all uses of loop variables to use the loop-closure Phi's instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28524 91177308-0d34-0410-b5e6-96231b3b80d8
makes it so that it constant folds instructions on the fly. This is good
for several reasons:
0. Many instructions are constant foldable after inlining, particularly if
inlining a call with constant arguments.
1. Without this, the inliner has to allocate memory for all of the instructions
that can be constant folded, then a subsequent pass has to delete them. This
gets the job done without this extra work.
2. This makes the inliner *pass* a bit more aggressive: in particular, it
partially solves a phase order issue where the inliner would inline lots
of code that folds away to nothing, but think that the resultant function
is big because of this code that will be gone. Now the code never exists.
This is the first part of a 2-step process. The second part will be smart
enough to see when this implicit constant folding propagates a constant into
a branch or switch instruction, making CFG edges dead.
This implements Transforms/Inline/inline_constprop.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28521 91177308-0d34-0410-b5e6-96231b3b80d8
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27984 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is an incremental step towards supporting a flat symbol table.
It de-overloads the intrinsic functions by providing type-specific intrinsics
and arranging for automatically upgrading from the old overloaded name to
the new non-overloaded name. Specifically:
llvm.isunordered -> llvm.isunordered.f32, llvm.isunordered.f64
llvm.sqrt -> llvm.sqrt.f32, llvm.sqrt.f64
llvm.ctpop -> llvm.ctpop.i8, llvm.ctpop.i16, llvm.ctpop.i32, llvm.ctpop.i64
llvm.ctlz -> llvm.ctlz.i8, llvm.ctlz.i16, llvm.ctlz.i32, llvm.ctlz.i64
llvm.cttz -> llvm.cttz.i8, llvm.cttz.i16, llvm.cttz.i32, llvm.cttz.i64
New code should not use the overloaded intrinsic names. Warnings will be
emitted if they are used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@25366 91177308-0d34-0410-b5e6-96231b3b80d8
it doesn't contain any calls. This is a fairly common case for C++ code,
so it will probably speed up the inliner marginally in these cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@25284 91177308-0d34-0410-b5e6-96231b3b80d8
function was not an alloca, we wouldn't check the entry block for any allocas,
leading to increased stack space in some cases. In practice, allocas are almost
always at the top of the block, so this was never noticed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@25280 91177308-0d34-0410-b5e6-96231b3b80d8
has a single def. In this case, look for uses that are dominated by the def
and attempt to rewrite them to directly use the stored value.
This speeds up mem2reg on these values and reduces the number of phi nodes
inserted. This should address PR665.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24411 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for specifying alignment and size of setjmp jmpbufs.
No targets currently do anything with this information, nor is it presrved
in the bytecode representation. That's coming up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24196 91177308-0d34-0410-b5e6-96231b3b80d8
into the LLVMAnalysis library.
This allows LLVMTranform and LLVMTransformUtils to be archives and linked
with LLVMAnalysis.a, which provides any missing definitions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24036 91177308-0d34-0410-b5e6-96231b3b80d8
SparcV9 JIT.
2. Make LLVMTransformUtils a relinked object file and always link it before
LLVMAnalysis.a. These two libraries have circular dependencies on each
other which creates problem when building the SparcV9 JIT. This change
fixes the dependency on all platforms problems with a minimum of fuss.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@24023 91177308-0d34-0410-b5e6-96231b3b80d8
pointer marking the end of the list, the zero *must* be cast to the pointer
type. An un-cast zero is a 32-bit int, and at least on x86_64, gcc will
not extend the zero to 64 bits, thus allowing the upper 32 bits to be
random junk.
The new END_WITH_NULL macro may be used to annotate a such a function
so that GCC (version 4 or newer) will detect the use of un-casted zero
at compile time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23888 91177308-0d34-0410-b5e6-96231b3b80d8
is performed so it is only at most once per function that contains an invoke
instead of once per invoke in the function. This patch has the following perks:
1. It fixes PR631, which complains about slowness.
2. If fixes PR240, which complains about non-volatile vars being live across
setjmp/longjmps.
3. It improves (but does not fix) the jmpbuf alignment issue on itanium by not
forcing the jmpbufs to always be 8-bytes off the alignment of the structure.
4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!), making us
now about 4% faster than GCC.
Further improvements are also possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23477 91177308-0d34-0410-b5e6-96231b3b80d8
not define a value that is used outside of it's block. This catches many
more simplifications, e.g. 854 in 176.gcc, 137 in vpr, etc.
This implements branch-phi-thread.ll:test3.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23397 91177308-0d34-0410-b5e6-96231b3b80d8
control across branches with determined outcomes. More generality to follow.
This triggers a couple thousand times in specint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23391 91177308-0d34-0410-b5e6-96231b3b80d8
a problem in LoopStrengthReduction, where it would split critical edges
then confused itself with outdated loop information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22776 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, just update the BB in-place. This is both faster, and it prevents
split-critical-edges from shuffling the PHI argument list unneccesarily.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22765 91177308-0d34-0410-b5e6-96231b3b80d8
into just Y. This often occurs when it seperates loops that have collapsed loop
headers. This implements LoopSimplify/phi-node-simplify.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22746 91177308-0d34-0410-b5e6-96231b3b80d8
BasicBlock's removePredecessor routine. This requires shuffling around
the definition and implementation of hasContantValue from Utils.h,cpp into
Instructions.h,cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22664 91177308-0d34-0410-b5e6-96231b3b80d8
consideration the case where a reference in an unreachable block could
occur. This fixes Transforms/SimplifyCFG/2005-08-01-PHIUpdateFail.ll,
something I ran into while bugpoint'ing another pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22584 91177308-0d34-0410-b5e6-96231b3b80d8
The optimization for locally used allocas was not safe for allocas that
were read before they were written. This change disables that optimization
in that case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22318 91177308-0d34-0410-b5e6-96231b3b80d8
a call. This fixes Prolangs-C++/deriv2, kimwitu++, and Misc-C++/bigfib
on X86 with -enable-x86-fastcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22023 91177308-0d34-0410-b5e6-96231b3b80d8
sinh, cosh, etc.
* Make the name comparisons for the fp libcalls a little more efficient by
switching on the first character of the name before doing comparisons.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21611 91177308-0d34-0410-b5e6-96231b3b80d8
using Function::arg_{iterator|begin|end}. Likewise Module::g* -> Module::global_*.
This patch is contributed by Gabor Greif, thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20597 91177308-0d34-0410-b5e6-96231b3b80d8
This does a simple form of "jump threading", which eliminates CFG edges that
are provably dead. This triggers 90 times in the external tests, and
eliminating CFG edges is always always a good thing! :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20300 91177308-0d34-0410-b5e6-96231b3b80d8
SimplifyCFG is one of those passes that we use for final cleanup: it should
not rely on other passes to clean up its garbage. This fixes the "why are
trivially dead setcc's in the output of gccas" problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19212 91177308-0d34-0410-b5e6-96231b3b80d8
do not insert a prototype for malloc of: void* malloc(uint): on 64-bit u
targets this is not correct. Instead of prototype it as void *malloc(...),
and pass the correct intptr_t through the "...".
Finally, fix Regression/CodeGen/SparcV9/2004-12-13-MallocCrash.ll, by not
forming constantexpr casts from pointer to uint.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@18908 91177308-0d34-0410-b5e6-96231b3b80d8