Summary:
Parts of the compiler still believed MSA load/stores have a 16-bit offset when
it is actually 10-bit. Corrected this, and fixed a closely related issue this
uncovered where load/stores with 10-bit and 12-bit offsets (MSA and microMIPS
respectively) could not load/store using offsets from the stack/frame pointer.
They accepted frameindex+offset, but not frameindex by itself.
Reviewers: jacksprat, matheusalmeida
Reviewed By: jacksprat
Differential Revision: http://llvm-reviews.chandlerc.com/D2888
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202717 91177308-0d34-0410-b5e6-96231b3b80d8
operand_values. The first provides a range view over operand Use
objects, and the second provides a range view over the Value*s being
used by those operands.
The naming is "STL-style" rather than "LLVM-style" because we have
historically named iterator methods STL-style, and range methods seem to
have far more in common with their iterator counterparts than with
"normal" APIs. Feel free to bikeshed on this one if you want, I'm happy
to change these around if people feel strongly.
I've switched code in SROA and LCG to exercise these mostly to ensure
they work correctly -- we don't really have an easy way to unittest this
and they're trivial.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202687 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.
The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202657 91177308-0d34-0410-b5e6-96231b3b80d8
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:
int foo(int a) {
return a == 5 ? 7 : 8;
}
On PPC64, because of the extension implied by the ABI, this would generate:
cmplwi 0, 3, 5
li 12, 8
li 4, 7
isel 3, 4, 12, 2
rldicl 3, 3, 0, 32
blr
where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202600 91177308-0d34-0410-b5e6-96231b3b80d8
lib/Support/RWMutex.cpp contains an implementation of RWMutex that
uses pthread_rwlock, but when pthread_rwlock is not available (such as
under NaCl, when using newlib), it silently falls back to using the
no-op definition in lib/Support/Unix/RWMutex.inc, which is not
thread-safe.
Fix this case to be thread-safe by using a normal mutex.
Differential Revision: http://llvm-reviews.chandlerc.com/D2892
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202570 91177308-0d34-0410-b5e6-96231b3b80d8
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.
For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202557 91177308-0d34-0410-b5e6-96231b3b80d8
The previous PBQP solver was very robust but consumed a lot of memory,
performed a lot of redundant computation, and contained some unnecessarily tight
coupling that prevented experimentation with novel solution techniques. This new
solver is an attempt to address these shortcomings.
Important/interesting changes:
1) The domain-independent PBQP solver class, HeuristicSolverImpl, is gone.
It is replaced by a register allocation specific solver, PBQP::RegAlloc::Solver
(see RegAllocSolver.h).
The optimal reduction rules and the backpropagation algorithm have been extracted
into stand-alone functions (see ReductionRules.h), which can be used to build
domain specific PBQP solvers. This provides many more opportunities for
domain-specific knowledge to inform the PBQP solvers' decisions. In theory this
should allow us to generate better solutions. In practice, we can at least test
out ideas now.
As a side benefit, I believe the new solver is more readable than the old one.
2) The solver type is now a template parameter of the PBQP graph.
This allows the graph to notify the solver of any modifications made (e.g. by
domain independent rules) without the overhead of a virtual call. It also allows
the solver to supply policy information to the graph (see below).
3) Significantly reduced memory overhead.
Memory management policy is now an explicit property of the PBQP graph (via
the CostAllocator typedef on the graph's solver template argument). Because PBQP
graphs for register allocation tend to contain many redundant instances of
single values (E.g. the value representing an interference constraint between
GPRs), the new RASolver class uses a uniquing scheme. This massively reduces
memory consumption for large register allocation problems. For example, looking
at the largest interference graph in each of the SPEC2006 benchmarks (the
largest graph will always set the memory consumption high-water mark for PBQP),
the average memory reduction for the PBQP costs was 400x. That's times, not
percent. The highest was 1400x. Yikes. So - this is fixed.
"PBQP: No longer feasting upon every last byte of your RAM".
Minor details:
- Fully C++11'd. Never copy-construct another vector/matrix!
- Cute tricks with cost metadata: Metadata that is derived solely from cost
matrices/vectors is attached directly to the cost instances themselves. That way
if you unique the costs you never have to recompute the metadata. 400x less
memory means 400x less cost metadata (re)computation.
Special thanks to Arnaud de Grandmaison, who has been the source of much
encouragement, and of many very useful test cases.
This new solver forms the basis for future work, of which there's plenty to do.
I will be adding TODO notes shortly.
- Lang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202551 91177308-0d34-0410-b5e6-96231b3b80d8