llvm-6502/lib/Transforms/Scalar
Chandler Carruth ea2e90df15 Reimplement SROA yet again. Same fundamental principle, but a totally
different core implementation strategy.

Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.

The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
   walk it once.
3) No more complex algorithms for associating a particular use with
   a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
   alloca, removing the need for some side datastructures.

Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.

I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.

Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.

Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.

I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186316 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-15 10:30:19 +00:00
..
ADCE.cpp Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. 2013-07-04 01:31:24 +00:00
BasicBlockPlacement.cpp
CMakeLists.txt Remove the simplify-libcalls pass (finally) 2013-06-20 19:48:07 +00:00
CodeGenPrepare.cpp Remove a useless declarations (found by scan-build) 2013-07-05 15:58:12 +00:00
ConstantProp.cpp
CorrelatedValuePropagation.cpp
DCE.cpp
DeadStoreElimination.cpp
EarlyCSE.cpp
GlobalMerge.cpp Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change. 2013-06-19 21:07:11 +00:00
GVN.cpp Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. 2013-07-04 01:31:24 +00:00
IndVarSimplify.cpp LFTR improvement to avoid truncation. 2013-07-12 22:08:48 +00:00
JumpThreading.cpp
LICM.cpp
LLVMBuild.txt
LoopDeletion.cpp Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. 2013-07-14 04:42:23 +00:00
LoopIdiomRecognize.cpp
LoopInstSimplify.cpp
LoopRotation.cpp
LoopStrengthReduce.cpp
LoopUnrollPass.cpp
LoopUnswitch.cpp Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. 2013-07-14 04:42:23 +00:00
LowerAtomic.cpp
Makefile
MemCpyOptimizer.cpp Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. 2013-07-03 15:07:05 +00:00
Reassociate.cpp Reassociate: Remove unnecessary default operator=. 2013-07-06 15:10:13 +00:00
Reg2Mem.cpp
Scalar.cpp Remove the simplify-libcalls pass (finally) 2013-06-20 19:48:07 +00:00
ScalarReplAggregates.cpp Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. 2013-07-14 04:42:23 +00:00
SCCP.cpp Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. 2013-07-14 04:42:23 +00:00
SimplifyCFGPass.cpp
Sink.cpp
SROA.cpp Reimplement SROA yet again. Same fundamental principle, but a totally 2013-07-15 10:30:19 +00:00
StructurizeCFG.cpp
TailRecursionElimination.cpp Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. 2013-07-14 04:42:23 +00:00