A value that is live in to a basic block should be returned by valueIn()
in LiveRangeQuery(getMBBStartIdx(MBB)), unless it is a PHI-def which
should be returned by valueDefined() instead.
Current code isn't using this functionality. Future code will.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163990 91177308-0d34-0410-b5e6-96231b3b80d8
If a PHI value happens to be live out from the layout predecessor of its
def block, the def slot index will be in the middle of the segment:
%vreg11 = [192r,240B:0)[352r,416B:2)[416B,496r:1) 0@192r 1@480B-phi %2@352r
A LiveRangeQuery for 480 should return NULL from valueIn() since the
PHI value is defined at the block entry, not live in to the block.
No test case, future code depends on this functionality.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163971 91177308-0d34-0410-b5e6-96231b3b80d8
new one, and add support for running the new pass in that mode and in
that slot of the pass manager. With this the new pass can completely
replace the old one within the pipeline.
The strategy for enabling or disabling the SSAUpdater logic is to do it
by making the requirement of the domtree analysis optional. By default,
it is required and we get the standard mem2reg approach. This is usually
the desired strategy when run in stand-alone situations. Within the
CGSCC pass manager, we disable requiring of the domtree analysis and
consequentially trigger fallback to the SSAUpdater promotion.
In theory this would allow the pass to re-use a domtree if one happened
to be available even when run in a mode that doesn't require it. In
practice, it lets us have a single pass rather than two which was
simpler for me to wrap my head around.
There is a hidden flag to force the use of the SSAUpdater code path for
the purpose of testing. The primary testing strategy is just to run the
existing tests through that path. One notable difference is that it has
custom code to handle lifetime markers, and one of the tests has been
enhanced to exercise that code.
This has survived a bootstrap and the test suite without serious
correctness issues, however my run of the test suite produced *very*
alarming performance numbers. I don't entirely understand or trust them
though, so more investigation is on-going.
To aid my understanding of the performance impact of the new SROA now
that it runs throughout the optimization pipeline, I'm enabling it by
default in this commit, and will disable it again once the LNT bots have
picked up one iteration with it. I want to get those bots (which are
much more stable) to evaluate the impact of the change before I jump to
any conclusions.
NOTE: Several Clang tests will fail because they run -O3 and check the
result's order of output. They'll go back to passing once I disable it
again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163965 91177308-0d34-0410-b5e6-96231b3b80d8
- The current_pos function is supposed to return all the written bytes, not the
current position of the underlying stream.
- This caused tell() to be broken whenever the underlying stream had buffered
content.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163948 91177308-0d34-0410-b5e6-96231b3b80d8
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph);
* use \param instead of \arg to document parameters in order to be consistent
with the rest of the codebase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163902 91177308-0d34-0410-b5e6-96231b3b80d8
This is essentially a ground up re-think of the SROA pass in LLVM. It
was initially inspired by a few problems with the existing pass:
- It is subject to the bane of my existence in optimizations: arbitrary
thresholds.
- It is overly conservative about which constructs can be split and
promoted.
- The vector value replacement aspect is separated from the splitting
logic, missing many opportunities where splitting and vector value
formation can work together.
- The splitting is entirely based around the underlying type of the
alloca, despite this type often having little to do with the reality
of how that memory is used. This is especially prevelant with unions
and base classes where we tail-pack derived members.
- When splitting fails (often due to the thresholds), the vector value
replacement (again because it is separate) can kick in for
preposterous cases where we simply should have split the value. This
results in forming i1024 and i2048 integer "bit vectors" that
tremendously slow down subsequnet IR optimizations (due to large
APInts) and impede the backend's lowering.
The new design takes an approach that fundamentally is not susceptible
to many of these problems. It is the result of a discusison between
myself and Duncan Sands over IRC about how to premptively avoid these
types of problems and how to do SROA in a more principled way. Since
then, it has evolved and grown, but this remains an important aspect: it
fixes real world problems with the SROA process today.
First, the transform of SROA actually has little to do with replacement.
It has more to do with splitting. The goal is to take an aggregate
alloca and form a composition of scalar allocas which can replace it and
will be most suitable to the eventual replacement by scalar SSA values.
The actual replacement is performed by mem2reg (and in the future
SSAUpdater).
The splitting is divided into four phases. The first phase is an
analysis of the uses of the alloca. This phase recursively walks uses,
building up a dense datastructure representing the ranges of the
alloca's memory actually used and checking for uses which inhibit any
aspects of the transform such as the escape of a pointer.
Once we have a mapping of the ranges of the alloca used by individual
operations, we compute a partitioning of the used ranges. Some uses are
inherently splittable (such as memcpy and memset), while scalar uses are
not splittable. The goal is to build a partitioning that has the minimum
number of splits while placing each unsplittable use in its own
partition. Overlapping unsplittable uses belong to the same partition.
This is the target split of the aggregate alloca, and it maximizes the
number of scalar accesses which become accesses to their own alloca and
candidates for promotion.
Third, we re-walk the uses of the alloca and assign each specific memory
access to all the partitions touched so that we have dense use-lists for
each partition.
Finally, we build a new, smaller alloca for each partition and rewrite
each use of that partition to use the new alloca. During this phase the
pass will also work very hard to transform uses of an alloca into a form
suitable for promotion, including forming vector operations, speculating
loads throguh PHI nodes and selects, etc.
After splitting is complete, each newly refined alloca that is
a candidate for promotion to a scalar SSA value is run through mem2reg.
There are lots of reasonably detailed comments in the source code about
the design and algorithms, and I'm going to be trying to improve them in
subsequent commits to ensure this is well documented, as the new pass is
in many ways more complex than the old one.
Some of this is still a WIP, but the current state is reasonbly stable.
It has passed bootstrap, the nightly test suite, and Duncan has run it
successfully through the ACATS and DragonEgg test suites. That said, it
remains behind a default-off flag until the last few pieces are in
place, and full testing can be done.
Specific areas I'm looking at next:
- Improved comments and some code cleanup from reviews.
- SSAUpdater and enabling this pass inside the CGSCC pass manager.
- Some datastructure tuning and compile-time measurements.
- More aggressive FCA splitting and vector formation.
Many thanks to Duncan Sands for the thorough final review, as well as
Benjamin Kramer for lots of review during the process of writing this
pass, and Daniel Berlin for reviewing the data structures and algorithms
and general theory of the pass. Also, several other people on IRC, over
lunch tables, etc for lots of feedback and advice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163883 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly documentation for the new machine model. It is designed
to be flexible, easy to incrementally refine for a subtarget, and
provide all the information that MachineScheduler will need.
If all goes well, I will follow up with an example of the new model in
use for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163877 91177308-0d34-0410-b5e6-96231b3b80d8
.set a, b - c + CONSTANT
d = b - c + CONSTANT
Both 'a' and 'd' should be marked as absolute symbols (N_ABS).
rdar://12219394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163853 91177308-0d34-0410-b5e6-96231b3b80d8
* wrap code blocks in \code ... \endcode;
* refer to parameter names in paragraphs correctly (\arg is not what most
people want -- it starts a new paragraph).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163790 91177308-0d34-0410-b5e6-96231b3b80d8
Add some support for dealing with an object pointer on arguments.
Part of rdar://9797999
which now supports adding the object pointer attribute to the
subprogram as it should.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163754 91177308-0d34-0410-b5e6-96231b3b80d8
- BlockAddress has no support of BA + offset form and there is no way to
propagate that offset into machine operand;
- Add BA + offset support and a new interface 'getTargetBlockAddress' to
simplify target block address forming;
- All targets are modified to use new interface and X86 backend is enhanced to
support BA + offset addressing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163743 91177308-0d34-0410-b5e6-96231b3b80d8
The search for liveness is clipped to a specific number of instructions around the target MachineInstr, in order to avoid degenerating into an O(N^2) algorithm. It tries to use various clues about how instructions around (both before and after) a given MachineInstr use that register, to determine its state at the MachineInstr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163695 91177308-0d34-0410-b5e6-96231b3b80d8
Sub-register lane masks are bitmasks that can be used to determine if
two sub-registers of a virtual register will overlap. For example, ARM's
ssub0 and ssub1 sub-register indices don't overlap each other, but both
overlap dsub0 and qsub0.
The lane masks will be accurate on most targets, but on targets that use
sub-register indexes in an irregular way, the masks may conservatively
report that two sub-register indices overlap when the eventually
allocated physregs don't.
Irregular register banks also mean that the bits in a lane mask can't be
mapped onto register units, but the concept is similar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163630 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently, NumSubRegIndices was completely unused before. Adjust it by
one to include the null subreg index, just like getNumRegs() includes
the null register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163628 91177308-0d34-0410-b5e6-96231b3b80d8
The Hexagon target decided to use a lot of functionality from the
target-independent scheduler. That's fine, and other targets should be
able to do the same. This reorg and API update makes that easy.
For the record, ScheduleDAGMI was not meant to be subclassed. Instead,
new scheduling algorithms should be able to implement
MachineSchedStrategy and be done. But if need be, it's nice to be
able to extend ScheduleDAGMI, so I also made that easier. The target
scheduler is somewhat more apt to break that way though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163580 91177308-0d34-0410-b5e6-96231b3b80d8
For some reason .lcomm uses byte alignment and .comm log2 alignment so we can't
use the same setting for both. Fix this by reintroducing the LCOMM enum.
I verified this against mingw's gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163420 91177308-0d34-0410-b5e6-96231b3b80d8
- Darwin lied about not supporting .lcomm and turned it into zerofill in the
asm parser. Push the zerofill-conversion down into macho-specific code.
- This makes the tri-state LCOMMType enum superfluous, there are no targets
without .lcomm.
- Do proper error reporting when trying to use .lcomm with alignment on a target
that doesn't support it.
- .comm and .lcomm alignment was parsed in bytes on COFF, should be power of 2.
- Fixes PR13755 (.lcomm crashes on ELF).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163395 91177308-0d34-0410-b5e6-96231b3b80d8
- This patch is inspired by the failure of the following code snippet
which is used to convert enumerable values into encoding bits to
improve the readability of td files.
class S<int s> {
bits<2> V = !if(!eq(s, 8), {0, 0},
!if(!eq(s, 16), {0, 1},
!if(!eq(s, 32), {1, 0},
!if(!eq(s, 64), {1, 1}, {?, ?}))));
}
Later, PR8330 is found to report not exactly the same bug relevant
issue to bit/bits values.
- Instead of resolving bit/bits values separately through
resolveBitReference(), this patch adds getBit() for all Inits and
resolves bit value by resolving plus getting the specified bit. This
unifies the resolving of bit with other values and removes redundant
logic for resolving bit only. In addition,
BitsInit::resolveReferences() is optimized to take advantage of this
origanization by resolving VarBitInit's variable reference first and
then getting bits from it.
- The type interference in '!if' operator is revised to support possible
combinations of int and bits/bit in MHS and RHS.
- As there may be illegal assignments from integer value to bit, says
assign 2 to a bit, but we only check this during instantiation in some
cases, e.g.
bit V = !if(!eq(x, 17), 0, 2);
Verbose diagnostic message is generated when invalid value is
resolveed to help locating the error.
- PR8330 is fixed as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163360 91177308-0d34-0410-b5e6-96231b3b80d8
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.
When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:
%CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
%vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>
We can assign %vreg11 to %ECX, overlapping the live range of %CL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163336 91177308-0d34-0410-b5e6-96231b3b80d8
We will soon allow virtual register live ranges to overlap regunit live
ranges when the physreg is defined as a copy of the virtreg:
%EAX = COPY %vreg5
FOO %vreg5
BAR %EAX<kill>
There is no real interference since %vreg5 and %EAX have the same value
where they overlap.
This patch prevents addKillFlags from adding virtreg kill flags to FOO
where the assigned physreg is overlapping the virtual register live
range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163335 91177308-0d34-0410-b5e6-96231b3b80d8
This Operand type takes a default argument, and is initialized to
this value if it does not appear in a patter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163315 91177308-0d34-0410-b5e6-96231b3b80d8
Make sure to return a pointer into the target memory, not the local memory.
Often they are the same, but we can't assume that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163217 91177308-0d34-0410-b5e6-96231b3b80d8
pointers-to-strong-pointers may be in play. These can lead to retains and
releases happening in unstructured ways, foiling the optimizer. This fixes
rdar://12150909.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163180 91177308-0d34-0410-b5e6-96231b3b80d8
implementation does not co-exist well with how the sideeffect and alignstack
attributes are handled. The reverts r161641.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163174 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineOperand::TiedTo field was maintained, but not used.
This patch enables it in isRegTiedToDefOperand() and
isRegTiedToUseOperand() which are the actual functions use by the
register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163153 91177308-0d34-0410-b5e6-96231b3b80d8
After much agonizing, use a full 4 bits of precious MachineOperand space
to encode this. This uses existing padding, and doesn't grow
MachineOperand beyond its current 32 bytes.
This allows tied defs among the first 15 operands on a normal
instruction, just like the current MCInstrDesc constraint encoding.
Inline assembly needs to be able to tie more than the first 15 operands,
and gets special treatment.
Tied uses can appear beyond 15 operands, as long as they are tied to a
def that's in range.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163151 91177308-0d34-0410-b5e6-96231b3b80d8
- CodeGenPrepare pass for identifying div/rem ops
- Backend specifies the type mapping using addBypassSlowDivType
- Enabled only for Intel Atom with O2 32-bit -> 8-bit
- Replace IDIV with instructions which test its value and use DIVB if the value
is positive and less than 256.
- In the case when the quotient and remainder of a divide are used a DIV
and a REM instruction will be present in the IR. In the non-Atom case
they are both lowered to IDIVs and CSE removes the redundant IDIV instruction,
using the quotient and remainder from the first IDIV. However,
due to this optimization CSE is not able to eliminate redundant
IDIV instructions because they are located in different basic blocks.
This is overcome by calculating both the quotient (DIV) and remainder (REM)
in each basic block that is inserted by the optimization and reusing the result
values when a subsequent DIV or REM instruction uses the same operands.
- Test cases check for the presents of the optimization when calculating
either the quotient, remainder, or both.
Patch by Tyler Nowicki!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163150 91177308-0d34-0410-b5e6-96231b3b80d8
Rationale: For each preprocessor macro, either the definedness is what's
meaningful, or the value is what's meaningful, or both. If definedness is
meaningful, we should use #ifdef. If the value is meaningful, we should use
and #ifdef interchangeably for the same macro, seems ugly to me, even if
undefined macros are zero if used.
This also has the benefit that including an LLVM header doesn't prevent
you from compiling with -Wundef -Werror.
Patch by John Garvin!
<rdar://problem/12189979>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163148 91177308-0d34-0410-b5e6-96231b3b80d8
by instruction address from DWARF.
Add --inlining flag to llvm-dwarfdump to demonstrate and test this functionality,
so that "llvm-dwarfdump --inlining --address=0x..." now works much like
"addr2line -i 0x...", provided that the binary has debug info
(Clang's -gline-tables-only *is* enough).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163128 91177308-0d34-0410-b5e6-96231b3b80d8
the NumMCOperands argument to the GetMCInstOperandNum() function that is set
to the number of MCOperands this asm operand mapped to.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163124 91177308-0d34-0410-b5e6-96231b3b80d8
MatchInstructionImpl() function.
These values are used by the ConvertToMCInst() function to index into the
ConversionTable. The values are also needed to call the GetMCInstOperandNum()
function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163101 91177308-0d34-0410-b5e6-96231b3b80d8
For example, the ARM target does not have efficient ISel handling for vector
selects with scalar conditions. This patch adds a TLI hook which allows the
different targets to report which selects are supported well and which selects
should be converted to CF duting codegen prepare.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163093 91177308-0d34-0410-b5e6-96231b3b80d8
Most of the code guarded with ANDROIDEABI are not
ARM-specific, and having no relation with arm-eabi.
Thus, it will be more natural to call this
environment "Android" instead of "ANDROIDEABI".
Note: We are not using ANDROID because several projects
are using "-DANDROID" as the conditional compilation
flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163087 91177308-0d34-0410-b5e6-96231b3b80d8
Manage tied operands entirely internally to MachineInstr. This makes it
possible to change the representation of tied operands, as I will do
shortly.
The constraint that tied uses and defs must be in the same order was too
restrictive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163021 91177308-0d34-0410-b5e6-96231b3b80d8
- Overloading operator<< for raw_ostream and pointers is dangerous, it alters
the behavior of code that includes the header.
- Remove unused ID.
- Use LLVM's byte swapping helpers instead of a hand-coded.
- Make ReadProfilingData work directly on a pointer.
No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162992 91177308-0d34-0410-b5e6-96231b3b80d8
Changes the hash result for strings containing characters
with values >= 128, such as UTF8 strings (not normal ASCII).
Changed mostly so we match other implementations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162882 91177308-0d34-0410-b5e6-96231b3b80d8
Ordered memory operations are more constrained than volatile loads and
stores because they must be ordered with respect to all other memory
operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162861 91177308-0d34-0410-b5e6-96231b3b80d8
This means the same as LoadInst/StoreInst::isUnordered(), and implies
!isVolatile().
Atomic loads and stored are also ordered, and this is the right method
to check if it is safe to reorder memory operations. Ordered atomics
can't be reordered wrt normal loads and stores, which is a stronger
constraint than volatile.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162859 91177308-0d34-0410-b5e6-96231b3b80d8
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162841 91177308-0d34-0410-b5e6-96231b3b80d8
The isTied bit is set automatically when a tied use is added and
MCInstrDesc indicates a tied operand. The tie is broken when one of the
tied operands is removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162814 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements ProfileDataLoader which loads profile data generated by
-insert-edge-profiling and updates branch weight metadata accordingly.
Patch by Alastair Murray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162799 91177308-0d34-0410-b5e6-96231b3b80d8
While in SSA form, a MachineInstr can have pairs of tied defs and uses.
The tied operands are used to represent read-modify-write operands that
must be assigned the same physical register.
Previously, tied operand pairs were computed from fixed MCInstrDesc
fields, or by using black magic on inline assembly instructions.
The isTied flag makes it possible to add tied operands to any
instruction while getting rid of (some of) the inlineasm magic.
Tied operands on normal instructions are needed to represent predicated
individual instructions in SSA form. An extra <tied,imp-use> operand is
required to represent the output value when the instruction predicate is
false.
Adding a predicate to:
%vreg0<def> = ADD %vreg1, %vreg2
Will look like:
%vreg0<tied,def> = ADD %vreg1, %vreg2, pred:3, %vreg7<tied,imp-use>
The virtual register %vreg7 is the value given to %vreg0 when the
predicate is false. It will be assigned the same physreg as %vreg0.
This commit adds the isTied flag and sets it based on MCInstrDesc when
building an instruction. The flag is not used for anything yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162774 91177308-0d34-0410-b5e6-96231b3b80d8
Register operands are manipulated by a lot of target-independent code,
and it is not always possible to preserve target flags. That means it is
not safe to use target flags on register operands.
None of the targets in the tree are using register operand target flags.
External targets should be using immediate operands to annotate
instructions with operand modifiers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162770 91177308-0d34-0410-b5e6-96231b3b80d8
These extra flags are not required to properly order the atomic
load/store instructions. SelectionDAGBuilder chains atomics as if they
were volatile, and SelectionDAG::getAtomic() sets the isVolatile bit on
the memory operands of all atomic operations.
The volatile bit is enough to order atomic loads and stores during and
after SelectionDAG.
This means we set mayLoad on atomic_load, mayStore on atomic_store, and
mayLoad+mayStore on the remaining atomic read-modify-write operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162733 91177308-0d34-0410-b5e6-96231b3b80d8
Adds the vendor 'fsl' (used by Freescale SDK) to Triple. This will allow
clang support for Freescale cross-compile configurations.
Patch by Tobias von Koch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162726 91177308-0d34-0410-b5e6-96231b3b80d8
This section (introduced in DWARF-3) is used to define instruction address
ranges for functions that are not contiguous and can't be described
by low_pc/high_pc attributes (this is the usual case for inlined subroutines).
The patch is the first step to support fetching complete inlining info from DWARF.
Reviewed by Benjamin Kramer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162657 91177308-0d34-0410-b5e6-96231b3b80d8
ProfileDataTypes.h header.
With this patch the old and new profiling code can exist side-by-side. The new
profiling code will be submitted soon and it only supports insert-edge-profiling
for now and will not depend on ProfileInfo.
Patch by Alastair Murray.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162576 91177308-0d34-0410-b5e6-96231b3b80d8
the case of multiple edges from one block to another.
A simple example is a switch statement with multiple values to the same
destination. The definition of an edge is modified from a pair of blocks to
a pair of PredBlock and an index into the successors.
Also set the weight correctly when building SelectionDAG from LLVM IR,
especially when converting a Switch.
IntegersSubsetMapping is updated to calculate the weight for each cluster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162572 91177308-0d34-0410-b5e6-96231b3b80d8
MSVC doesn't support passing by-value parameters with alignment of
16-bytes or higher apparantly. What is deeply confusing is that it seems
to *sometimes* (but not always) apply this to any type whose alignment
is set using __declspec(align(...)). This caused lots of errors when we switch
SmallVector over to use the automatically aligned character array
utilities as they used __declspec(align(...)) heavily.
As a pretty horrible but effective work-around, we instead cherry pick
the smallest alignment sizes with specific types that happen to have the
correct alignment, and then fall back to the attribute solution past
them. This should resolve the MSVC build errors folks have been hitting.
Sorry for that. In good news, it will do this without introducing other
UB I hope. =]
Thanks to Timur Iskhodzhanov for helping me test this!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162549 91177308-0d34-0410-b5e6-96231b3b80d8
Keep track of the set/unset state of these bits along with their
true/false values, but treat '?' as '0' for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162461 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, TableGen just guesses instruction properties when it can't
infer them form patterns.
This adds a guessInstructionProperties flag to the instruction set
definition that will be used to disable guessing. The flag is intended
as a migration aid. It will be removed again when no more targets need
their properties guessed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162460 91177308-0d34-0410-b5e6-96231b3b80d8
The logic for recomputing latency based on a ScheduleDAG edge was
shady. This bypasses the problem by requiring the client to provide
operand indices. This ensures consistent use of the machine model's
API.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162420 91177308-0d34-0410-b5e6-96231b3b80d8
When reporting an error for a defm, we would previously only report the
location of the outer defm, which is not always where the error is.
Now we also print the location of the expanded multiclass defs:
lib/Target/X86/X86InstrSSE.td:2902:12: error: foo
defm ADD : basic_sse12_fp_binop_s<0x58, "add", fadd, SSE_ALU_ITINS_S>,
^
lib/Target/X86/X86InstrSSE.td:2801:11: note: instantiated from multiclass
defm PD : sse12_fp_packed<opc, !strconcat(OpcodeStr, "pd"), OpNode, VR128,
^
lib/Target/X86/X86InstrSSE.td:194:5: note: instantiated from multiclass
def rm : PI<opc, MRMSrcMem, (outs RC:$dst), (ins RC:$src1, x86memop:$src2),
^
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162409 91177308-0d34-0410-b5e6-96231b3b80d8
The MCInst is immediately passed to the copy-constructor for local
storage, so there's no need for the parameter itself to be by-value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162404 91177308-0d34-0410-b5e6-96231b3b80d8
within the codegen EK_GPRel64BlockAddress. This was not
supported for direct object output and resulted in an assertion.
This change adds support for EK_GPRel64BlockAddress for
direct object.
One fallout from this is to turn on rela relocations
for mips64 to match gas.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162334 91177308-0d34-0410-b5e6-96231b3b80d8
int64_t, Symbol64TableEntry is actually only stored with 4-byte alignment
within the file.
The usage of #pragma pack here is copied from the corresponding code in
Support/Endian.h, so shouldn't introduce any new portability problems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162312 91177308-0d34-0410-b5e6-96231b3b80d8
this is the index of the operand that failed to match.
Note: This may cause a buildbot failure due to an API mismatch in clang. Should
recover with my next commit to clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162295 91177308-0d34-0410-b5e6-96231b3b80d8
number of bits was bigger than 32. I checked every use of this function
that I could find and it looks like the maximum number of bits is 32, so I've
added an assertion checking this property, and a type cast to (hopefully) stop
PVS-Studio from warning about this in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162277 91177308-0d34-0410-b5e6-96231b3b80d8
The getSumForBlock function was quadratic in the number of successors
because getSuccWeight would perform a linear search for an already known
iterator.
This patch was originally committed as r161460, but reverted again
because of assertion failures. Now that duplicate Machine CFG edges have
been eliminated, this works properly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162233 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM IR has labeled duplicate CFG edges, but since Machine CFG edges
don't have labels, it doesn't make sense to allow duplicates. There is
no way of telling what the edges mean.
Duplicate CFG edges cause confusion when dealing with edge weights. It
seems that code producing duplicate CFG edges usually does the wrong
thing with edge weights.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162227 91177308-0d34-0410-b5e6-96231b3b80d8
No new tests are added.
All tests in ExecutionEngine/MCJIT that have been failing pass after this patch
is applied (when "make check" is done on a mips board).
Patch by Petar Jovanovic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162135 91177308-0d34-0410-b5e6-96231b3b80d8
The previous fix only checked for simple cycles, use a set to catch longer
cycles too.
Drop the broken check from the ObjectSizeOffsetEvaluator. The BoundsChecking
pass doesn't have to deal with invalid IR like InstCombine does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162120 91177308-0d34-0410-b5e6-96231b3b80d8
make it more consistent with its intended semantics.
The `linker_private_weak_def_auto' linkage type was meant to automatically hide
globals which never had their addresses taken. It has nothing to do with the
`linker_private' linkage type, which outputs the symbols with a `l' (ell) prefix
among other things.
The intended semantic is more like the `linkonce_odr' linkage type.
Change the name of the linkage type to `linkonce_odr_auto_hide'. And therefore
changing the semantics so that it produces the correct output for the linker.
Note: The old linkage name `linker_private_weak_def_auto' will still parse but
is not a synonym for `linkonce_odr_auto_hide'. This should be removed in 4.0.
<rdar://problem/11754934>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162114 91177308-0d34-0410-b5e6-96231b3b80d8
include/llvm/IntrinsicsHexagon.td: Hexagon_Intrinsic is the base class
for all Hexagon intrinsics and not altivec intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162087 91177308-0d34-0410-b5e6-96231b3b80d8
Select instructions pick one of two virtual registers based on a
condition, like x86 cmov. On targets like ARM that support predication,
selects can sometimes be eliminated by predicating the instruction
defining one of the operands.
Teach PeepholeOptimizer to recognize select instructions, and ask the
target to optimize them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162059 91177308-0d34-0410-b5e6-96231b3b80d8
where some fact lake a=b dominates a use in a phi, but doesn't dominate the
basic block itself.
This feature could also be implemented by splitting critical edges, but at least
with the current algorithm reasoning about the dominance directly is faster.
The time for running "opt -O2" in the testcase in pr10584 is 1.003 times slower
and on gcc as a single file it is 1.0007 times faster.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162023 91177308-0d34-0410-b5e6-96231b3b80d8
This can be used to tell TableGen to use a specific SubRegIndex instead
of synthesizing one when discovering all sub-registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161982 91177308-0d34-0410-b5e6-96231b3b80d8
instruction to something absurdly high, while setting the probability of
branching to the 'unwind' destination to the bare minimum. This should set cause
the normal destination's invoke blocks to be moved closer to the invoke.
PR13612
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161944 91177308-0d34-0410-b5e6-96231b3b80d8
Refactor the TableGen'erated fixed length disassemblmer to use a
table-driven state machine rather than a massive set of nested
switch() statements.
As a result, the ARM Disassembler (ARMDisassembler.cpp) builds much more
quickly and generates a smaller end result. For a Release+Asserts build on
a 16GB 3.4GHz i7 iMac w/ SSD:
Time to compile at -O2 (averaged w/ hot caches):
Previous: 35.5s
New: 8.9s
TEXT size:
Previous: 447,251
New: 297,661
Builds in 25% of the time previously required and generates code 66% of
the size.
Execution time of the disassembler is only slightly slower (7% disassembling
10 million ARM instructions, 19.6s vs 21.0s). The new implementation has
not yet been tuned, however, so the performance should almost certainly
be recoverable should it become a concern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161888 91177308-0d34-0410-b5e6-96231b3b80d8
may invalidate its AliasSet because SSAUpdater does not update the AliasSet properly.
This patch teaches SSAUpdater to notify AliasSet that it made changes.
The testcase in PR12901 is too big to be useful and I could not reduce it to a normal size.
rdar://11872059 PR12901
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161803 91177308-0d34-0410-b5e6-96231b3b80d8
It never does anything when running 'make check', and it get's in the
way of updating live intervals in 2-addr.
The hook was originally added to help form IT blocks in Thumb2 code
before register allocation, but the pass ordering has changed since
then, and we run if-conversion after register allocation now.
When the MI scheduler is enabled, there will be no less than two
schedulers between 2-addr and Thumb2ITBlockPass, so this hook is
unlikely to help anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161794 91177308-0d34-0410-b5e6-96231b3b80d8
This new attribute is intended to be used by the backend to determine how
the inline asm string should be parsed/printed. This patch adds the
ia_nsdialect attribute and also adds a test case to ensure the IR is
correctly parsed, but there is no functional change at this time.
The standard dialect is assumed to be AT&T. Therefore, this attribute
should only be added to MS-style inline assembly statements, which use
the Intel dialect. If we ever support more dialects we'll need to
add additional state to the attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161641 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.
In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.
Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161634 91177308-0d34-0410-b5e6-96231b3b80d8
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.
The test changes are required because the register allocation hint is
using the use-list order to break ties.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161633 91177308-0d34-0410-b5e6-96231b3b80d8
Register MachineOperands are kept in linked lists accessible via MRI's
reg_iterator interfaces. The linked list management was handled partly
by MachineOperand methods, partly by MRI methods.
Move all of the list management into MRI, delete
MO::AddRegOperandToRegInfo() and MO::RemoveRegOperandFromRegInfo().
Be more explicit about handling the cases where an MRI pointer isn't
available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161632 91177308-0d34-0410-b5e6-96231b3b80d8
This new API will be used by clang to parse ms-style inline asms.
One goal of this project is to use this style of inline asm for targets other
then x86. Therefore, this API needs to be implemented for non-x86 targets at
some point in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161624 91177308-0d34-0410-b5e6-96231b3b80d8
MRI provides iterators for traversing the use-def chains. They should
not be accessible from anywhere else.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161543 91177308-0d34-0410-b5e6-96231b3b80d8
- The defines are in stddint.h, which is #include'd already.
- The block wasn't used anyway, since it was _OpenBSD_, and not __OpenBSD__
Patch by David Hill!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161515 91177308-0d34-0410-b5e6-96231b3b80d8
This replaces an existing subtarget hook on ARM and allows standard
CodeGen passes to potentially use the property.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161471 91177308-0d34-0410-b5e6-96231b3b80d8
The getSumForBlock function was quadratic in the number of successors
because getSuccWeight would perform a linear search for an already known
iterator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161460 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for TargetIndex operands during isel. The meaning of
these (index, offset, flags) operands is entirely defined by the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161453 91177308-0d34-0410-b5e6-96231b3b80d8
A target index operand looks a lot like a constant pool reference, but
it is completely target-defined. It contains the 8-bit TargetFlags, a
32-bit index, and a 64-bit offset. It is preserved by all code generator
passes.
TargetIndex operands can be used to carry target-specific information in
cases where immediate operands won't suffice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161441 91177308-0d34-0410-b5e6-96231b3b80d8
a use or a BB, but it is inline in the handling of the invoke instruction.
This patch refactors it so that it can be used in other cases. For example, in
define i32 @f(i32 %x) {
bb0:
%cmp = icmp eq i32 %x, 0
br i1 %cmp, label %bb2, label %bb1
bb1:
br label %bb2
bb2:
%cond = phi i32 [ %x, %bb0 ], [ 0, %bb1 ]
%foo = add i32 %cond, %x
ret i32 %foo
}
GVN should be able to replace %x with 0 in any use that is dominated by the
true edge out of bb0. In the above example the only such use is the one in
the phi.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161429 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161302 91177308-0d34-0410-b5e6-96231b3b80d8
This option runs LiveIntervals before TwoAddressInstructionPass which
will eventually learn to exploit and update the analysis.
Eventually, LiveIntervals will run before PHIElimination, and we can get
rid of LiveVariables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161270 91177308-0d34-0410-b5e6-96231b3b80d8
The previous change caused fast isel to not attempt handling any calls to
builtin functions. That included things like "printf" and caused some
noticable regressions in compile time. I wanted to avoid having fast isel
keep a separate list of functions that had to be kept in sync with what the
code in SelectionDAGBuilder.cpp was handling. I've resolved that here by
moving the list into TargetLibraryInfo. This is somewhat redundant in
SelectionDAGBuilder but it will ensure that we keep things consistent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161263 91177308-0d34-0410-b5e6-96231b3b80d8
The 'unused' state of a value number can be represented as an invalid
def SlotIndex. This also exposed code that shouldn't have been looking
at unused value VNInfos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161258 91177308-0d34-0410-b5e6-96231b3b80d8
The only real user of the flag was removeCopyByCommutingDef(), and it
has been switched to LiveIntervals::hasPHIKill().
All the code changed by this patch was only concerned with computing and
propagating the flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161255 91177308-0d34-0410-b5e6-96231b3b80d8
The VNInfo::HAS_PHI_KILL is only half supported. We precompute it in
LiveIntervalAnalysis, but it isn't properly updated by live range
splitting and functions like shrinkToUses().
It is only used in one place: RegisterCoalescer::removeCopyByCommutingDef().
This patch changes that function to use a new LiveIntervals::hasPHIKill()
function that computes the flag for a given value number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161254 91177308-0d34-0410-b5e6-96231b3b80d8
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8
This just provides a way to look up a LibFunc::Func enum value for a
function name. Alphabetize the enums and function names so we can use a
binary search.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161231 91177308-0d34-0410-b5e6-96231b3b80d8
The "findUsedStructTypes" method is very expensive to run. It needs to be
optimized so that LTO can run faster. Splitting this method out of the Module
class will help this occur. For instance, it can keep a list of seen objects so
that it doesn't process them over and over again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161228 91177308-0d34-0410-b5e6-96231b3b80d8
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161207 91177308-0d34-0410-b5e6-96231b3b80d8
yaml2obj takes a textual description of an object file in YAML format
and outputs the binary equivalent. This greatly simplifies writing
tests that take binary object files as input.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161205 91177308-0d34-0410-b5e6-96231b3b80d8
This trivial helper function tests if a register contains a register
unit. It is similar to regsOverlap(), but with asymmetric arguments.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161180 91177308-0d34-0410-b5e6-96231b3b80d8
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
This patch is a rework of r160919 and was tested on clang self-host on my local
machine.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161152 91177308-0d34-0410-b5e6-96231b3b80d8
TinyPtrVector. With these, it is sufficiently functional for my more
normal / pedestrian uses.
I've not included some r-value reference stuff here because the value
type for a TinyPtrVector is, necessarily, just a pointer.
I've added tests that cover the basic behavior of these routines, but
they aren't as comprehensive as I'd like. In particular, they don't
really test the iterator semantics as thoroughly as they should. Maybe
some brave soul will feel enterprising and flesh them out. ;]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161104 91177308-0d34-0410-b5e6-96231b3b80d8
Since the llvm::sys::fs::map_file_pages() support function it relies on
is not yet implemented on Windows, the unit tests for FileOutputBuffer
are currently conditionalized to run only on unix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161099 91177308-0d34-0410-b5e6-96231b3b80d8
for this class. These tests exercise most of the basic properties, but
the API for TinyPtrVector is very strange currently. My plan is to start
fleshing out the API to match that of SmallVector, but I wanted a test
for what is there first.
Sadly, it doesn't look reasonable to just re-use the SmallVector tests,
as this container can only ever store pointers, and much of the
SmallVector testing is to get construction and destruction right.
Just to get this basic test working, I had to add value_type to the
interface.
While here I found a subtle bug in the combination of 'erase', 'begin',
and 'end'. Both 'begin' and 'end' wanted to use a null pointer to
indicate the "end" iterator of an empty vector, regardless of whether
there is actually a vector allocated or the pointer union is null.
Everything else was fine with this except for erase. If you erase the
last element of a vector after it has held more than one element, we
return the end iterator of the underlying SmallVector which need not be
a null pointer. Instead, simply use the pointer, and poniter + size()
begin/end definitions in the tiny case, and delegate to the inner vector
whenever it is present.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161024 91177308-0d34-0410-b5e6-96231b3b80d8
CallInst for intrinsics. This allows users of the InstVisitor that would
like to special case certain very common intrinsics to do so naturally
in keeping with the type hierarchy's utility classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161006 91177308-0d34-0410-b5e6-96231b3b80d8
test more than a single instantiation of SmallVector.
Add testing for 0, 1, 2, and 4 element sized "small" buffers. These
appear to be essentially untested in the unit tests until now.
Fix several tests to be robust in the face of a '0' small buffer. As
a consequence of this size buffer, the growth patterns are actually
observable in the test -- yes this means that many tests never caused
a grow to occur before. For some tests I've merely added a reserve call
to normalize behavior. For others, the growth is actually interesting,
and so I captured the fact that growth would occur and adjusted the
assertions to not assume how rapidly growth occured.
Also update the specialization for a '0' small buffer length to have all
the same interface points as the normal small vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161001 91177308-0d34-0410-b5e6-96231b3b80d8
This is a cleaned up version of the isFree() function in
MachineTraceMetrics.cpp.
Transient instructions are very unlikely to produce any code in the
final output. Either because they get eliminated by RegisterCoalescing,
or because they are pseudo-instructions like labels and debug values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160977 91177308-0d34-0410-b5e6-96231b3b80d8
A->isPredecessor(B) is the same as B->isSuccessor(A), but it can
tolerate a B that is null or dangling. This shouldn't happen normally,
but it it useful for verification code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160968 91177308-0d34-0410-b5e6-96231b3b80d8