flags. They are still not enable in this revision.
Added TargetInstrInfo::isZeroCost() to fix a fundamental problem with
the scheduler's model of operand latency in the selection DAG.
Generalized unit tests to work with sched-cycles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123969 91177308-0d34-0410-b5e6-96231b3b80d8
TargetInstrInfo:
Change produceSameValue() to take MachineRegisterInfo as an optional argument.
When in SSA form, targets can use it to make more aggressive equality analysis.
Machine LICM:
1. Eliminate isLoadFromConstantMemory, use MI.isInvariantLoad instead.
2. Fix a bug which prevent CSE of instructions which are not re-materializable.
3. Use improved form of produceSameValue.
ARM:
1. Teach ARM produceSameValue to look pass some PIC labels.
2. Look for operands from different loads of different constant pool entries
which have same values.
3. Re-implement PIC GA materialization using movw + movt. Combine the pair with
a "add pc" or "ldr [pc]" to form pseudo instructions. This makes it possible
to re-materialize the instruction, allow machine LICM to hoist the set of
instructions out of the loop and make it possible to CSE them. It's a bit
hacky, but it significantly improve code quality.
4. Some minor bug fixes as well.
With the fixes, using movw + movt to materialize GAs significantly outperform the
load from constantpool method. 186.crafty and 255.vortex improved > 20%, 254.gap
and 176.gcc ~10%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123905 91177308-0d34-0410-b5e6-96231b3b80d8
checks enabled:
1) Use '<' to compare integers in a comparison function rather than '<='.
2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize
the priority queue.
The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at
just under 16% rather than 17%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123662 91177308-0d34-0410-b5e6-96231b3b80d8
eliminating a potentially quadratic data structure, this also gives a 17%
speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial
experiment gave a greater speedup around 25%, but I moved the dominator tree
level computation from dominator tree construction to PromoteMemToReg.
Since this approach to computing IDFs has a much lower overhead than the old
code using precomputed DFs, it is worth looking at using this new code for the
second scalarrepl pass as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123609 91177308-0d34-0410-b5e6-96231b3b80d8
half a million non-local queries, each of which would otherwise have triggered a
linear scan over a basic block.
Also fix a fixme for memory intrinsics which dereference pointers. With this,
we prove that a pointer is non-null because it was dereferenced by an intrinsic
112 times in llvm-test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123533 91177308-0d34-0410-b5e6-96231b3b80d8
these would try hard to match constants by inverting the bits
and recursively matching. There are two problems with this:
1) some patterns would match when we didn't want them to (theoretical)
2) this is insanely expensive to do, and most often pointless.
This was apparently useful in just 2 instcombine cases, which I
added code to handle explicitly. This change speeds up 'opt'
time on 176.gcc by 1% and produces bitwise identical code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123518 91177308-0d34-0410-b5e6-96231b3b80d8
early in the cleanup code and one late interlaced with the inliner. The second one is
important because inlining and other scalar optzns can unpin allocas, allowing them to
be split up and promoted. While important for performance, this is also relatively
rare, and we would previously force a (non-lazy) computation of DomFrontiers, which
happened even if nothing became unpinned.
With this patch, the first pass of scalarrepl still promotes the vast bulk of allocas
in programs, but hte second pass has changed to use SSAUpdater, which is more "sparse"
and lazy. This speeds up opt -O3 time on kimwitu++ (a c++ app) by about 1%. The
numbers are interesting: the first pass promotes ~17500 allocas. The second pass
promotes about 1600. For non-C++ codes, the compile time win should be greater,
because the second pass of scalarrepl does less.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123437 91177308-0d34-0410-b5e6-96231b3b80d8
- Fixed :upper16: fix up routine. It should be shifting down the top 16 bits first.
- Added support for Thumb2 :lower16: and :upper16: fix up.
- Added :upper16: and :lower16: relocation support to mach-o object writer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123424 91177308-0d34-0410-b5e6-96231b3b80d8
most important simplifications, as well as resolving phase ordering issues where instcombine
would inhibit important CSE'ing opportunities, for instance on BitBench/drop3.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123418 91177308-0d34-0410-b5e6-96231b3b80d8
While there, I noticed that the transform "undef >>a X -> undef" was wrong.
For example if X is 2 then the top two bits must be equal, so the result can
not be anything. I fixed this in the constant folder as well. Also, I made
the transform for "X << undef" stronger: it now folds to undef always, even
though X might be zero. This is in accordance with the LangRef, but I must
admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef"
following the LangRef and the constant folder, likewise fairly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123417 91177308-0d34-0410-b5e6-96231b3b80d8
Add methods for accessing the (single) entry / exit edge of a region. If no such
edge exists, null is returned. Both accessors return the start block of the
corresponding edge. The edge can finally be formed by utilizing
Region::getEntry() or Region::getExit();
Contributed by: Andreas Simbuerger <simbuerg@fim.uni-passau.de>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123410 91177308-0d34-0410-b5e6-96231b3b80d8
in the right direction. It eliminated some hacks and will unblock codegen
work. But it's far from being done. It doesn't reject illegal expressions,
e.g. (FOO - :lower16:BAR). It also doesn't work in Thumb2 mode at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123369 91177308-0d34-0410-b5e6-96231b3b80d8
"this" pointer for any subclass of User, you could static_cast it to
User* and then reinterpret_cast that to Use* to get the end of the
operand list. This isn't a safe assumption in general, because the
static_cast might adjust the "this" pointer. Fixed by having these
OperandTraits classes take an extra template parameter, which is the
subclass of User. This is groundwork for PR889.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123235 91177308-0d34-0410-b5e6-96231b3b80d8
phi nodes. It is called from MergeBlockIntoPredecessor which is
called from GVN, which claims to preserve these.
I'm skeptical that this is the actual problem behind PR8954, but
this is a stab in the right direction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123222 91177308-0d34-0410-b5e6-96231b3b80d8
These functions not longer assert when passed 0, but simply return false instead.
No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123155 91177308-0d34-0410-b5e6-96231b3b80d8
Fix the TargetRegisterInfo::NoRegister places where someone preferred
typing 'TargetRegisterInfo::NoRegister' instead of typing '0'.
Note that TableGen is already emitting xx::NoRegister in xxGenRegisterNames.inc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123140 91177308-0d34-0410-b5e6-96231b3b80d8
The numbering plan is now:
0 NoRegister.
[1;2^30) Physical registers.
[2^30;2^31) Stack slots.
[2^31;2^32) Virtual registers. (With -1u and -2u used by DenseMapInfo.)
Each segment is filled from the left, so any mistaken interpretation should
quickly cause crashes.
FirstVirtualRegister has been removed. TargetRegisterInfo provides predicates
conversion functions that should be used instead of interpreting register
numbers manually.
It is now legal to pass NoRegister to isPhysicalRegister() and
isVirtualRegister(). The result is false in both cases.
It is quite rare to represent stack slots in this way, so isPhysicalRegister()
and isVirtualRegister() require that isStackSlot() be checked first if it can
possibly return true. This allows a very fast implementation of the common
predicates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123137 91177308-0d34-0410-b5e6-96231b3b80d8
void f(int* begin, int* end) { std::fill(begin, end, 0); }
which turns into a != exit expression where one pointer is
strided and (thanks to step #1) known to not overflow, and
the other is loop invariant.
The observation here is that, though the IV is strided by
4 in this case, that the IV *has* to become equal to the
end value. It cannot "miss" the end value by stepping over
it, because if it did, the strided IV expression would
eventually wrap around.
Handle this by turning A != B into "A-B != 0" where the A-B
part is known to be NUW.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123131 91177308-0d34-0410-b5e6-96231b3b80d8
when no virtual registers have been allocated.
It was only used to resize IndexedMaps, so provide an IndexedMap::resize()
method such that
Map.grow(MRI.getLastVirtReg());
can be replaced with the simpler
Map.resize(MRI.getNumVirtRegs());
This works correctly when no virtuals are allocated, and it bypasses the to/from
index conversions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123130 91177308-0d34-0410-b5e6-96231b3b80d8
physical register numbers.
This makes the hack used in LiveInterval official, and lets LiveInterval be
oblivious of stack slots.
The isPhysicalRegister() and isVirtualRegister() predicates don't know about
this, so when a variable may contain a stack slot, isStackSlot() should always
be tested first.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123128 91177308-0d34-0410-b5e6-96231b3b80d8
config.h was generated, so it had no effect on it.
Thanks to arrowdodger for pointing out this and a tentative patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123119 91177308-0d34-0410-b5e6-96231b3b80d8
of using a Location class with the same information.
When making a copy of a MachineOperand that was already stored in a
MachineInstr, it is necessary to clear the parent pointer on the copy. Otherwise
the register use-def lists become inconsistent.
Add MachineOperand::clearParent() to do that. An alternative would be a custom
MachineOperand copy constructor that cleared ParentMI. I didn't want to do that
because of the performance impact.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123109 91177308-0d34-0410-b5e6-96231b3b80d8
Print virtual registers numbered from 0 instead of the arbitrary
FirstVirtualRegister. The first virtual register is printed as %vreg0.
TRI::NoRegister is printed as %noreg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123107 91177308-0d34-0410-b5e6-96231b3b80d8
depending on TRI::FirstVirtualRegister.
Also use TRI::printReg instead of printing virtual registers directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123101 91177308-0d34-0410-b5e6-96231b3b80d8
Provide MRI::getNumVirtRegs() and TRI::index2VirtReg() functions to allow
iteration over virtual registers without depending on the representation of
virtual register numbers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123098 91177308-0d34-0410-b5e6-96231b3b80d8
Add a unnamed_addr bit to global variables and functions. This will be used
to indicate that the address is not significant and therefore the constant
or function can be merged with others.
If an optimization pass can show that an address is not used, it can set this.
Examples of things that can have this set by the FE are globals created to
hold string literals and C++ constructors.
Adding unnamed_addr to a non-const global should have no effect unless
an optimization can transform that global into a constant.
Aliases are not allowed to have unnamed_addr since I couldn't figure
out any use for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123063 91177308-0d34-0410-b5e6-96231b3b80d8
1. Take a flags argument instead of a bool. This makes
it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
more efficient. For lookup failures, don't drop null values
into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
and LoopUnswitch, kill it.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123058 91177308-0d34-0410-b5e6-96231b3b80d8
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.
This allows memory instructions to be moved around INLINEASM instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123044 91177308-0d34-0410-b5e6-96231b3b80d8
This means avoid using uint32_t. This patch reverts r112200 and fixes original problem by fixing argument type in lto.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@123038 91177308-0d34-0410-b5e6-96231b3b80d8
Also fix an off-by-one in SelectionDAGBuilder that was preventing shuffle
vectors from being translated to EXTRACT_SUBVECTOR.
Patch by Tim Northover.
The test changes are needed to keep those spill-q tests from testing aligned
spills and restores. If the only aligned stack objects are spill slots, we
no longer realign the stack frame. Prior to this patch, an EXTRACT_SUBVECTOR
was legalized by loading from the stack, which created an aligned frame index.
Now, however, there is nothing except the spill slot in the stack frame, so
I added an aligned alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122995 91177308-0d34-0410-b5e6-96231b3b80d8
We were never generating any of these nodes with variable indices, and there
was one legalizer function asserting on a non-constant index. If we ever have
a need to support variable indices, we can add this back again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122993 91177308-0d34-0410-b5e6-96231b3b80d8
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122952 91177308-0d34-0410-b5e6-96231b3b80d8
This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.
Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.
The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.
This is a work in progress and still not hooked up to RegAllocGreedy.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122938 91177308-0d34-0410-b5e6-96231b3b80d8
My i386 llvm-gcc nightly tester found a regression for
SingleSource/Benchmarks/McGill/chomp that a bisect blamed on 122743.
That seems strange but apparently the combination of earlycse and instcombine
did something bad. Chris says he intended to remove the instcombine pass, so
let's go ahead and try that. We'll see if there are any performance losses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122907 91177308-0d34-0410-b5e6-96231b3b80d8
It forms memset and memcpy's, and will someday form popcount and
other stuff. All of this is bad when compiling the implementation
of memset, memcpy, popcount, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122854 91177308-0d34-0410-b5e6-96231b3b80d8
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122832 91177308-0d34-0410-b5e6-96231b3b80d8
a pointer value has potentially become escaping. Implementations can choose to either fall back to
conservative responses for that value, or may recompute their analysis to accomodate the change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122777 91177308-0d34-0410-b5e6-96231b3b80d8
improvement in the generated code, and speeds up 'opt -std-compile-opts'
compile time on 176.gcc from 24.84s to 23.2s (about 7%).
This also resolves a specific code quality issue in rdar://7352081 which
was generating poor code for:
int t(int a, int b) {
if (a & b & 1)
return a & b;
return 3;
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122740 91177308-0d34-0410-b5e6-96231b3b80d8
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).
No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122728 91177308-0d34-0410-b5e6-96231b3b80d8
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122719 91177308-0d34-0410-b5e6-96231b3b80d8
compile, and everyone's tests have shown it to be slower in practice, even for
quite large graphs.
I also hope to do an optimization that is only correct with the simpler data
structure, which would break this even further.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122684 91177308-0d34-0410-b5e6-96231b3b80d8
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
for each vertex. However, this is unnecessary, because each vertex is only
placed into a single bucket (that of its semidominator), and each vertex's
bucket is processed before it is added to any bucket itself.
Instead of using a bucket per vertex, we use a single array Buckets that has two
purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
the index of the first element in V's bucket. After V's bucket is processed,
Buckets[i] stores the index of the next element in the bucket to which V now
belongs, if any.
Reading from the buckets can also be optimized. Instead of processing the bucket
of V's parent at the end of processing V, we process the bucket of V itself at
the beginning of processing V. This means that the case of the root vertex can
be simplified somewhat. It also means that we don't need to look up the DFS
number of the semidominator of every node in the bucket we are processing,
since we know it is the current index being processed.
This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
larger speedups of around 12% on the larger benchmarks like GCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122680 91177308-0d34-0410-b5e6-96231b3b80d8
limitations, this kicks in dozens of times in the 4 specfp2000 benchmarks,
and hundreds of times in the int part. It also kicks in hundreds of times
in multisource.
This kicks in right before loop deletion, which has the pleasant effect of
deleting loops that *just* do a memset.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122664 91177308-0d34-0410-b5e6-96231b3b80d8
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122541 91177308-0d34-0410-b5e6-96231b3b80d8
section.
This helps because in practice sections form a dag with debug sections pointing
to text sections. Finishing up the text sections first makes the debug section
relaxation trivial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122314 91177308-0d34-0410-b5e6-96231b3b80d8
This implementation already exists as ConnectedVNInfoEqClasses in
LiveInterval.cpp, and it seems to be generally useful to have a light-weight way
of forming equivalence classes of small integers.
IntEqClasses doesn't allow enumeration of the elements in a class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122293 91177308-0d34-0410-b5e6-96231b3b80d8
it could only be tested indirectly, via instcombine, gvn or some other
pass that makes use of InstructionSimplify, which means that testcases
had to be carefully contrived to dance around any other transformations
that that pass did.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122264 91177308-0d34-0410-b5e6-96231b3b80d8
createMachineVerifierPass and MachineFunction::verify.
The banner is printed before the machine code dump, just like the printer pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122113 91177308-0d34-0410-b5e6-96231b3b80d8
The heuristics split around the largest loop where the current register may be
allocated without interference.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122106 91177308-0d34-0410-b5e6-96231b3b80d8
may be called. If the entry block is empty, the insertion point iterator will be
the "end()" value. Calling ->getParent() on it (among others) causes problems.
Modify materializeFrameBaseRegister to take the machine basic block and insert
the frame base register at the beginning of that block. (It's very similar to
what the code does all ready. The only difference is that it will always insert
at the beginning of the entry block instead of after a previous materialization
of the frame base register. I doubt that that matters here.)
<rdar://problem/8782198>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122104 91177308-0d34-0410-b5e6-96231b3b80d8
moves the iterator to end(), and it is valid to call it on end().
That means it is valid to call advanceTo() with any monotonic key sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122092 91177308-0d34-0410-b5e6-96231b3b80d8
IsSymbolRefDifferenceFullyResolved, it turns out this does change behavior on
enough cases for x86-32 that I would rather wait a bit on it.
- In practice, we will want to change this eventually because it only means we
generate less relocations (it also eliminates the need for the horrible
'.set' hack that Darwin requires in some places).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122042 91177308-0d34-0410-b5e6-96231b3b80d8
This is a three-way interval list intersection between a virtual register, a
live interval union, and a loop. It will be used to identify interference-free
loops for live range splitting.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@122034 91177308-0d34-0410-b5e6-96231b3b80d8
the MCCodeEmitter, which seems like a better organization.
- Also, cleaned up some magic constants while in the area.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121953 91177308-0d34-0410-b5e6-96231b3b80d8
IntervalMaps.
The IntervalMaps can have different template parameters, but the KeyT and Traits
types must be the same.
Tests are forthcoming.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121935 91177308-0d34-0410-b5e6-96231b3b80d8
A MachineLoopRange contains the intervals of slot indexes covered by the blocks
in a loop. This representation of the loop blocks is more efficient to compare
against interfering registers during register coalescing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121917 91177308-0d34-0410-b5e6-96231b3b80d8
Clang is now providing intrinsics for these and so we need to support them
in the backend. Radar 8068427.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121902 91177308-0d34-0410-b5e6-96231b3b80d8
attributes "interrupt_handle" and "save_volatiles". Support for lowering these
correctly will be in an upcoming commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121888 91177308-0d34-0410-b5e6-96231b3b80d8
With this we don't need the EffectiveSize field anymore. Without that field
LayoutFragment only updates offsets and we don't need to invalidate the
current fragment when it is relaxed (only the ones following it).
This is also a very small improvement in the accuracy of the layout info as
we now use the after relaxation size immediately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121857 91177308-0d34-0410-b5e6-96231b3b80d8
registers that alias Reg, including itself. This is almost the same as the
existing getAliasSet() method, except for the inclusion of Reg.
The name matches the reflexive TRI::regsOverlap(x, y) relation.
It is very common to do stuff to a register and all its aliases:
stuff(Reg)
for (const unsigned *Alias = TRI->getAliasSet(Reg); *Alias; ++Alias)
stuff(*Alias);
That can now be written as the simpler:
for (const unsigned *Alias = TRI->getOverlaps(Reg); *Alias; ++Alias)
stuff(*Alias);
This change requires a bit more constant space for the alias lists because Reg
is included and because the empty alias list cannot be shared any longer.
If the getAliasSet method is eventually removed, this space can be reclaimed by
sharing overlap lists. For instance, %rax and %eax have identical overlap sets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121800 91177308-0d34-0410-b5e6-96231b3b80d8
AliasAnalysis consumers, PartialAlias will be treated as MayAlias.
For AliasAnalysis chaining, MayAlias says "procede to the next analysis".
PartialAlias will be used to indicate that the query should terminate,
even though it didn't reach MustAlias or NoAlias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121507 91177308-0d34-0410-b5e6-96231b3b80d8
the offset. Add a new fixup flag to represent this, and use it for the one fixups that I have a testcase for needing
this. It's quite likely that the other Thumb fixups will need this too, and to have their fixup encoding logic
adjusted accordingly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121408 91177308-0d34-0410-b5e6-96231b3b80d8
both forward and backward scheduling. Rename it to
ScoreboardHazardRecognizer (Scoreboard is one word). Remove integer
division from the scoreboard's critical path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121274 91177308-0d34-0410-b5e6-96231b3b80d8
This new register allocator is initially identical to RegAllocBasic, but it will
receive all of the tricks that RegAllocBasic won't get.
RegAllocGreedy will eventually replace linear scan.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121234 91177308-0d34-0410-b5e6-96231b3b80d8
before:
4 assembler - Number of assembler layout and relaxation steps
78563 assembler - Number of emitted assembler fragments
8693904 assembler - Number of emitted object file bytes
271223 assembler - Number of evaluated fixups
330771677 assembler - Number of fragment layouts
5958 assembler - Number of relaxed instructions
2508361 mcexpr - Number of MCExpr evaluations
real 0m26.123s
user 0m25.694s
sys 0m0.388s
after:
4 assembler - Number of assembler layout and relaxation steps
78563 assembler - Number of emitted assembler fragments
8693904 assembler - Number of emitted object file bytes
271223 assembler - Number of evaluated fixups
231507 assembler - Number of fragment layouts
5958 assembler - Number of relaxed instructions
2508361 mcexpr - Number of MCExpr evaluations
real 0m2.500s
user 0m2.113s
sys 0m0.273s
And yes, the outputs are identical :-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121207 91177308-0d34-0410-b5e6-96231b3b80d8
zextOrTrunc(), and APSInt methods extend(), extOrTrunc() and new method
trunc(), to be const and to return a new value instead of modifying the
object in place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121120 91177308-0d34-0410-b5e6-96231b3b80d8
namespace. None of them return anything except for success anyway. These will be
converted to returning their result soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121109 91177308-0d34-0410-b5e6-96231b3b80d8
actuall addresses in a .o file, so it is better to let the MachO writer compute
it.
This is good for two reasons. First, areas that shouldn't care about
addresses now don't have access to it. Second, the layout of each section
is independent. I should use this in a subsequent commit to speed it up.
Most of the patch is just removing the section address computation. The two
interesting parts are the change on how we handle padding in the end
of sections and how MachO can get the address of a-b when a and b are in
different sections.
Since now the expression evaluation normally doesn't know the section address,
it will think that a-b needs relocation and let the MachO writer know. Once
it has computed the section addresses, it calls back the expression evaluation
with the section addresses to resolve these expressions.
The remaining problem is the handling of padding. Currently it will create
a special alignment fragment at the end. Since that fragment doesn't update
the alignment of the section, it needs the real address to be computed.
Since now the layout will not compute a-b with a and b in different sections,
the only effect that the special alignment fragment has is update the
address size of the section. This can also be done by the MachO writer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121076 91177308-0d34-0410-b5e6-96231b3b80d8
as llc + llvm-mc. This time ELF is not changed and I tested that llvm-gcc
bootstrap on darwin10 using darwin9's assembler and linker.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@121006 91177308-0d34-0410-b5e6-96231b3b80d8
memcpy's like:
memcpy(A, B)
memcpy(A, C)
we cannot delete the first memcpy as dead if A and C might be aliases.
If so, we actually get:
memcpy(A, B)
memcpy(A, A)
which is not correct to transform into:
memcpy(A, A)
This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks
Jakub!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120974 91177308-0d34-0410-b5e6-96231b3b80d8
foo = a - b
.long foo
instead of just
.long a - b
First, on darwin9 64 bits the assembler produces the wrong result. Second,
if "a" is the end of the section all darwin assemblers (9, 10 and mc) will not
consider a - b to be a constant but will if the dummy foo is created.
Split how we handle these cases. The first one is something MC should take care
of. The second one has to be handled by the caller.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120889 91177308-0d34-0410-b5e6-96231b3b80d8
doing that if the target is darwin10 or newer.
This fixes
*) Direct object emission was producing objects without the workaround on
darwin9.
*) Assembly printing was producing objects with the workaround on linux.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120866 91177308-0d34-0410-b5e6-96231b3b80d8
editing of the current interval.
These methods may cause coalescing, there are corresponding set*Unchecked
methods for editing without coalescing. The non-coalescing methods are useful
for applying monotonic transforms to all keys or values in a map without
accidentally coalescing transformed and untransformed intervals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120829 91177308-0d34-0410-b5e6-96231b3b80d8
contain only data. Handle them specially instead of using AddSectionToTheEnd.
This moves a hack from the generic assembler to the elf writer. It is also
a bit faster and should make other improvements easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120683 91177308-0d34-0410-b5e6-96231b3b80d8
Scan the MachineFunction for DBG_VALUE instructions, and replace them with a
data structure similar to LiveIntervals. The live range of a DBG_VALUE is
determined by propagating it down the dominator tree until a new DBG_VALUE is
found. When a DBG_VALUE lives in a register, its live range is confined to the
live range of the register's value.
LiveDebugVariables runs before coalescing, so DBG_VALUEs are not artificially
extended when registers are joined.
The missing half will recreate DBG_VALUE instructions from the intervals when
register allocation is complete.
The pass is disabled by default. It can be enabled with the temporary command
line option -live-debug-variables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120636 91177308-0d34-0410-b5e6-96231b3b80d8
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120501 91177308-0d34-0410-b5e6-96231b3b80d8
may-aliasing stores that partially overlap with different base
pointers. This implements PR6043 and the non-variable part of
PR8657
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120485 91177308-0d34-0410-b5e6-96231b3b80d8
- Use a DenseSet instead of a FoldingSet to cache
canonicalized nodes. This reduces the overhead
of double-hashing.
- Use reference counts in ImutAVLTree to much
more aggressively recover tree nodes that are
no longer usable. We can generate many
transient nodes while using add() and remove()
on ImmutableSet/ImmutableMaps to generate a final
set/map.
For the clang static analyzer (the main client
of these data structures), this results in
a slight speedup (0.5%) when analyzing sqlite3,
but much more importantly results in a 30-60%
reduction in peak memory usage when the analyzer
is analyzing a given function in a file. On
average that's about a ** 44% reduction ** in the
memory footprint of the static analyzer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120459 91177308-0d34-0410-b5e6-96231b3b80d8
about pairs of AA::Location's instead of looking for MemDep's
"Def" predicate. This is more powerful and general, handling
memset/memcpy/store all uniformly, and implementing PR8701 and
probably obsoleting parts of memcpyoptimizer.
This also fixes an obscure bug with init.trampoline and i8
stores, but I'm not surprised it hasn't been hit yet. Enhancing
init.trampoline to carry the size that it stores would allow
DSE to be much more aggressive about optimizing them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120406 91177308-0d34-0410-b5e6-96231b3b80d8
unbreaks test/Transforms/InstCombine/invariant.ll which was broken by r120382.
This is a fix-forward to do what I think Chris intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120388 91177308-0d34-0410-b5e6-96231b3b80d8
This analysis is going to run immediately after LiveIntervals. It will stay
alive during register allocation and keep track of user variables mentioned in
DBG_VALUE instructions.
When the register allocator is moving values between registers and the stack, it
is very hard to keep track of DBG_VALUE instructions. We usually get it wrong.
This analysis maintains a data structure that makes it easy to update DBG_VALUE
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120385 91177308-0d34-0410-b5e6-96231b3b80d8
is trivially dead, since these have side effects. This makes the
(misnamed) MemoryUseIntrinsic class dead, so remove it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120382 91177308-0d34-0410-b5e6-96231b3b80d8
Sometimes std::copy can become a memmove call, and that is not a good idea when
copying relatively few bytes as we are doing. We also get a small win by
changing two loops into one.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120265 91177308-0d34-0410-b5e6-96231b3b80d8
We always disallowed overlapping inserts with different values, and this makes
the insertion code smaller and faster.
If an overwriting insert is needed, it can be added as a separate method that
trims any existing intervals before inserting. The immediate use cases for
IntervalMap don't need this - they only use disjoint insertions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@120264 91177308-0d34-0410-b5e6-96231b3b80d8