output (we're emitting a specification already and the information
isn't changing) and we're not in old gdb compat mode.
Saves 1% on the debug information for a build of llvm.
Fixes rdar://11043421
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162493 91177308-0d34-0410-b5e6-96231b3b80d8
The logic for recomputing latency based on a ScheduleDAG edge was
shady. This bypasses the problem by requiring the client to provide
operand indices. This ensures consistent use of the machine model's
API.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162420 91177308-0d34-0410-b5e6-96231b3b80d8
Based on CR feedback from r162301 and Craig Topper's refactoring in r162347
here are a few other places that could use the same API (& in one instance drop
a Function.h dependency).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162367 91177308-0d34-0410-b5e6-96231b3b80d8
SelectionDAG's 'init' has not been called when the SelectionDAGBuilder is
constructed (in SelectionDAGISel's constructor), so this was previously always
initialized with 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162333 91177308-0d34-0410-b5e6-96231b3b80d8
Even looking at the revision history I couldn't quite piece together why this
cast was ever written in the first place, but I assume it was because of some
change in the inheritance, perhaps this function was reimplemented in a
derived type & this caller was meant to get the base version (& it wasn't
virtual)?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162301 91177308-0d34-0410-b5e6-96231b3b80d8
The getSumForBlock function was quadratic in the number of successors
because getSuccWeight would perform a linear search for an already known
iterator.
This patch was originally committed as r161460, but reverted again
because of assertion failures. Now that duplicate Machine CFG edges have
been eliminated, this works properly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162233 91177308-0d34-0410-b5e6-96231b3b80d8
IR that hasn't been through SimplifyCFG can look like this:
br i1 %b, label %r, label %r
Make sure we don't create duplicate Machine CFG edges in this case.
Fix the machine code verifier to accept conditional branches with a
single CFG edge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162230 91177308-0d34-0410-b5e6-96231b3b80d8
The DAGCombiner tries to optimise a BUILD_VECTOR by checking if it
consists purely of get_vector_elts from one or two source vectors. If
so, it either makes a concat_vectors node or a shufflevector node.
However, it doesn't check the element type width of the underlying
vector, so if you have this sequence:
Node0: v4i16 = ...
Node1: i32 = extract_vector_elt Node0
Node2: i32 = extract_vector_elt Node0
Node3: v16i8 = BUILD_VECTOR Node1, Node2, ...
It will attempt to:
Node0: v4i16 = ...
NewNode1: v16i8 = concat_vectors Node0, ...
Where this is actually invalid because the element width is completely
different. This causes an assertion failure on DAG legalization stage.
Fix:
If output item type of BUILD_VECTOR differs from input item type.
Make concat_vectors based on input element type and then bitcast it to the output vector type. So the case described above will transformed to:
Node0: v4i16 = ...
NewNode1: v8i16 = concat_vectors Node0, ...
NewNode2: v16i8 = bitcast NewNode1
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162195 91177308-0d34-0410-b5e6-96231b3b80d8
make it more consistent with its intended semantics.
The `linker_private_weak_def_auto' linkage type was meant to automatically hide
globals which never had their addresses taken. It has nothing to do with the
`linker_private' linkage type, which outputs the symbols with a `l' (ell) prefix
among other things.
The intended semantic is more like the `linkonce_odr' linkage type.
Change the name of the linkage type to `linkonce_odr_auto_hide'. And therefore
changing the semantics so that it produces the correct output for the linker.
Note: The old linkage name `linker_private_weak_def_auto' will still parse but
is not a synonym for `linkonce_odr_auto_hide'. This should be removed in 4.0.
<rdar://problem/11754934>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162114 91177308-0d34-0410-b5e6-96231b3b80d8
Increment the MBB iterator at the top of the loop to properly handle the
current (and previous) instructions getting erased.
This fixes PR13625.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162099 91177308-0d34-0410-b5e6-96231b3b80d8
Select instructions pick one of two virtual registers based on a
condition, like x86 cmov. On targets like ARM that support predication,
selects can sometimes be eliminated by predicating the instruction
defining one of the operands.
Teach PeepholeOptimizer to recognize select instructions, and ask the
target to optimize them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162059 91177308-0d34-0410-b5e6-96231b3b80d8
It never does anything when running 'make check', and it get's in the
way of updating live intervals in 2-addr.
The hook was originally added to help form IT blocks in Thumb2 code
before register allocation, but the pass ordering has changed since
then, and we run if-conversion after register allocation now.
When the MI scheduler is enabled, there will be no less than two
schedulers between 2-addr and Thumb2ITBlockPass, so this hook is
unlikely to help anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161794 91177308-0d34-0410-b5e6-96231b3b80d8
It is still possible to if-convert if the tail block has extra
predecessors, but the tail phis must be rewritten instead of being
removed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161781 91177308-0d34-0410-b5e6-96231b3b80d8
Detect when there is not enough available ILP, so if-conversion can't
speculate instructions for free.
Compute the lengthening of the critical path when inserting a select
instruction that depends on the condition as well as both sides of the
if.
Reject conversions that would stretch the critical path by more than
half a mispredict penalty.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161713 91177308-0d34-0410-b5e6-96231b3b80d8
Trace::getResourceLength() computes the number of cycles required to
execute the trace when ignoring data dependencies. The number can be
compared to the critical path to estimate the trace ILP.
Trace::getPHIDepth() computes the data dependency depth of a PHI in a
trace successor that isn't necessarily part of the trace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161711 91177308-0d34-0410-b5e6-96231b3b80d8
When a trace ends with a back-edge, include PHIs in the loop header in
the height computations. This makes the critical path through a loop
more accurate by including the latencies of the last instructions in the
loop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161688 91177308-0d34-0410-b5e6-96231b3b80d8
When replacing Old with New, it can happen that New is already a
successor. Add the old and new edge weights instead of creating a
duplicate edge.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161653 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to speed up def_iterator by stopping at the first
use. This makes def_empty() and getUniqueVRegDef() much faster when
there are many uses.
In a +Asserts build, LiveVariables is 100x faster in one case because
getVRegDef() has an assertion that would scan to the end of a
def_iterator chain.
Spill weight calculation is significantly faster (300x in one case)
because isTriviallyReMaterializable() calls MRI->isConstantPhysReg(%RIP)
which calls def_empty(%RIP).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161634 91177308-0d34-0410-b5e6-96231b3b80d8
Use a more conventional doubly linked list where the Prev pointers form
a cycle. This means it is no longer necessary to adjust the Prev
pointers when reallocating the VRegInfo array.
The test changes are required because the register allocation hint is
using the use-list order to break ties.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161633 91177308-0d34-0410-b5e6-96231b3b80d8
Register MachineOperands are kept in linked lists accessible via MRI's
reg_iterator interfaces. The linked list management was handled partly
by MachineOperand methods, partly by MRI methods.
Move all of the list management into MRI, delete
MO::AddRegOperandToRegInfo() and MO::RemoveRegOperandFromRegInfo().
Be more explicit about handling the cases where an MRI pointer isn't
available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161632 91177308-0d34-0410-b5e6-96231b3b80d8
We filter out MachineLoop back-edges during the trace-building PO
traversals, but it is possible to have CFG cycles that aren't natural
loops, and MachineLoopInfo doesn't include such cycles.
Use a standard visited set to detect such CFG cycles, and completely
ignore them when picking traces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161532 91177308-0d34-0410-b5e6-96231b3b80d8
We perform the following:
1> Use SUB instead of CMP for i8,i16,i32 and i64 in ISel lowering.
2> Modify MachineCSE to correctly handle implicit defs.
3> Convert SUB back to CMP if possible at peephole.
Removed pattern matching of (a>b) ? (a-b):0 and like, since they are handled
by peephole now.
rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161462 91177308-0d34-0410-b5e6-96231b3b80d8
The getSumForBlock function was quadratic in the number of successors
because getSuccWeight would perform a linear search for an already known
iterator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161460 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for TargetIndex operands during isel. The meaning of
these (index, offset, flags) operands is entirely defined by the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161453 91177308-0d34-0410-b5e6-96231b3b80d8
A target index operand looks a lot like a constant pool reference, but
it is completely target-defined. It contains the 8-bit TargetFlags, a
32-bit index, and a 64-bit offset. It is preserved by all code generator
passes.
TargetIndex operands can be used to carry target-specific information in
cases where immediate operands won't suffice.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161441 91177308-0d34-0410-b5e6-96231b3b80d8
Compare the critical paths of the two traces through an if-conversion
candidate. If the difference is larger than the branch brediction
penalty, reject the if-conversion. If would never pay.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161433 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, MBP essentially aligned every branch target it could. This
bloats code quite a bit, especially non-looping code which has no real
reason to prefer aligned branch targets so heavily.
As Andy said in review, it's still a bit odd to do this without a real
cost model, but this at least has much more plausible heuristics.
Fixes PR13265.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161409 91177308-0d34-0410-b5e6-96231b3b80d8
If the result of a common subexpression is used at all uses of the candidate
expression, CSE should not increase the live range of the common subexpression.
rdar://11393714 and rdar://11819721
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161396 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161282 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change intended, except replacing a DenseMap with a
SmallDenseMap which should behave identically.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161281 91177308-0d34-0410-b5e6-96231b3b80d8
This is far from complete, and only changes behavior when the
-early-live-intervals flag is passed to llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161273 91177308-0d34-0410-b5e6-96231b3b80d8
This option runs LiveIntervals before TwoAddressInstructionPass which
will eventually learn to exploit and update the analysis.
Eventually, LiveIntervals will run before PHIElimination, and we can get
rid of LiveVariables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161270 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the identity copy would survive through register allocation
before it was removed by the rewriter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161269 91177308-0d34-0410-b5e6-96231b3b80d8
The previous change caused fast isel to not attempt handling any calls to
builtin functions. That included things like "printf" and caused some
noticable regressions in compile time. I wanted to avoid having fast isel
keep a separate list of functions that had to be kept in sync with what the
code in SelectionDAGBuilder.cpp was handling. I've resolved that here by
moving the list into TargetLibraryInfo. This is somewhat redundant in
SelectionDAGBuilder but it will ensure that we keep things consistent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161263 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed that SelectionDAGBuilder::visitCall was missing a check for memcmp
in TargetLibraryInfo, so that it would use custom code for memcmp calls even
with -fno-builtin. I also had to add a new -disable-simplify-libcalls option
to llc so that I could write a test for this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161262 91177308-0d34-0410-b5e6-96231b3b80d8
The 'unused' state of a value number can be represented as an invalid
def SlotIndex. This also exposed code that shouldn't have been looking
at unused value VNInfos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161258 91177308-0d34-0410-b5e6-96231b3b80d8
The only real user of the flag was removeCopyByCommutingDef(), and it
has been switched to LiveIntervals::hasPHIKill().
All the code changed by this patch was only concerned with computing and
propagating the flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161255 91177308-0d34-0410-b5e6-96231b3b80d8
The VNInfo::HAS_PHI_KILL is only half supported. We precompute it in
LiveIntervalAnalysis, but it isn't properly updated by live range
splitting and functions like shrinkToUses().
It is only used in one place: RegisterCoalescer::removeCopyByCommutingDef().
This patch changes that function to use a new LiveIntervals::hasPHIKill()
function that computes the flag for a given value number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161254 91177308-0d34-0410-b5e6-96231b3b80d8
This functionality was added before we started running
DeadMachineInstructionElim on all targets. It serves no purpose now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161241 91177308-0d34-0410-b5e6-96231b3b80d8
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161232 91177308-0d34-0410-b5e6-96231b3b80d8
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161207 91177308-0d34-0410-b5e6-96231b3b80d8
Whenever both instruction depths and instruction heights are known in a
block, it is possible to compute the length of the critical path as
max(depth+height) over the instructions in the block.
The stored live-in lists make it possible to accurately compute the
length of a critical path that bypasses the current (small) block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161197 91177308-0d34-0410-b5e6-96231b3b80d8
Don't cause regunit intervals to be computed just to verify them. Only
check the already cached intervals.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161183 91177308-0d34-0410-b5e6-96231b3b80d8
LiveRangeEdit::eliminateDeadDefs() can delete a dead instruction that
reads unreserved physregs. This would leave the corresponding regunit
live interval dangling because we don't have shrinkToUses() for physical
registers.
Fix this problem by turning the instruction into a KILL instead of
deleting it. This happens in a landing pad in
test/CodeGen/X86/2012-05-19-CoalescerCrash.ll:
%vreg27<def,dead> = COPY %EDX<kill>; GR32:%vreg27
becomes:
KILL %EDX<kill>
An upcoming fix to the machine verifier will catch problems like this by
verifying regunit live intervals.
This fixes PR13498. I am not including the test case from the PR since
we already have one exposing the problem once the verifier is fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161182 91177308-0d34-0410-b5e6-96231b3b80d8
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
This patch is a rework of r160919 and was tested on clang self-host on my local
machine.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161152 91177308-0d34-0410-b5e6-96231b3b80d8
The height on an instruction is the minimum number of cycles from the
instruction is issued to the end of the trace. Heights are computed for
all instructions in and below the trace center block.
The method for computing heights is different from the depth
computation. As we visit instructions in the trace bottom-up, heights of
used instructions are pushed upwards. This way, we avoid scanning long
use lists, looking for uses in the current trace.
At each basic block boundary, a list of live-in registers and their
minimum heights is saved in the trace block info. These live-in lists
are used when restarting depth computations on a trace that
converges with an already computed trace. They will also be used to
accurately compute the critical path length.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161138 91177308-0d34-0410-b5e6-96231b3b80d8
Assuming infinite issue width, compute the earliest each instruction in
the trace can issue, when considering the latency of data dependencies.
The issue cycle is record as a 'depth' from the beginning of the trace.
This is half the computation required to find the length of the critical
path through the trace. Heights are next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161074 91177308-0d34-0410-b5e6-96231b3b80d8
One motivating example is to sink an instruction from a basic block which has
two successors: one outside the loop, the other inside the loop. We should try
to sink the instruction outside the loop.
rdar://11980766
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161062 91177308-0d34-0410-b5e6-96231b3b80d8
We are extending live ranges, so kill flags are not accurate. They
aren't needed until they are recomputed after RA anyway.
<rdar://problem/11950722>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161023 91177308-0d34-0410-b5e6-96231b3b80d8
We branch to the successor with higher edge weight first.
Convert from
je LBB4_8 --> to outer loop
jmp LBB4_14 --> to inner loop
to
jne LBB4_14
jmp LBB4_8
PR12750
rdar: 11393714
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161018 91177308-0d34-0410-b5e6-96231b3b80d8
This lets traces include the final iteration of a nested loop above the
center block, and the first iteration of a nested loop below the center
block.
We still don't allow traces to contain backedges, and traces are
truncated where they would leave a loop, as seen from the center block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161003 91177308-0d34-0410-b5e6-96231b3b80d8
When computing a trace, all the candidates for pred/succ must have been
visited. Filter out back-edges first, though. The PO traversal ignores
them.
Thanks to Andy for spotting this in review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160995 91177308-0d34-0410-b5e6-96231b3b80d8
This is a cleaned up version of the isFree() function in
MachineTraceMetrics.cpp.
Transient instructions are very unlikely to produce any code in the
final output. Either because they get eliminated by RegisterCoalescing,
or because they are pseudo-instructions like labels and debug values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160977 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineTraceMetrics analysis must be invalidated before modifying
the CFG. This will catch some of the violations of that rule.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160969 91177308-0d34-0410-b5e6-96231b3b80d8
A->isPredecessor(B) is the same as B->isSuccessor(A), but it can
tolerate a B that is null or dangling. This shouldn't happen normally,
but it it useful for verification code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160968 91177308-0d34-0410-b5e6-96231b3b80d8
Machine CSE and other optimizations can remove instructions so folding
is possible at peephole while not possible at ISel.
rdar://10554090 and rdar://11873276
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160919 91177308-0d34-0410-b5e6-96231b3b80d8
A value number is a PHI def if and only if it begins at a block
boundary. This can be derived from the def slot, a separate flag is not
necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160893 91177308-0d34-0410-b5e6-96231b3b80d8
This option replaces the existing live interval computation with one
based on LiveRangeCalc.cpp. The new algorithm does not depend on
LiveVariables, and it can be run at any time, before or after leaving
SSA form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160892 91177308-0d34-0410-b5e6-96231b3b80d8
This is still a work in progress.
Out-of-order CPUs usually execute instructions from multiple basic
blocks simultaneously, so it is necessary to look at longer traces when
estimating the performance effects of code transformations.
The MachineTraceMetrics analysis will pick a typical trace through a
given basic block and provide performance metrics for the trace. Metrics
will include:
- Instruction count through the trace.
- Issue count per functional unit.
- Critical path length, and per-instruction 'slack'.
These metrics can be used to determine the performance limiting factor
when executing the trace, and how it will be affected by a code
transformation.
Initially, this will be used by the early if-conversion pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160796 91177308-0d34-0410-b5e6-96231b3b80d8
It is redundant; RegisterCoalescer will do the remat if it can't eliminate
the copy. Collected instruction counts before and after this. A few extra
instructions are generated due to spilling but it is normal to see these kinds
of changes with almost any small codegen change, according to Jakob.
This also fixed rdar://11830760 where xor is expected instead of movi0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160749 91177308-0d34-0410-b5e6-96231b3b80d8
When a live range splits into multiple connected components, we would
arbitrarily assign <undef> uses to component 0. This is wrong when the
use is tied to a def that gets assigned to a different component:
%vreg69<def> = ADD8ri %vreg68<undef>, 1
The use and def must get the same virtual register.
Fix this by assigning <undef> uses to the same component as the value
defined by the instruction, if any:
%vreg69<def> = ADD8ri %vreg69<undef>, 1
This fixes PR13402. The PR has a test case which I am not including
because it is unlikely to keep exposing this behavior in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160739 91177308-0d34-0410-b5e6-96231b3b80d8
that do not support it (X86 does not lower select_cc).
PR: 13428
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160619 91177308-0d34-0410-b5e6-96231b3b80d8
LiveRangeEdit::foldAsLoad() can eliminate a register by folding a load
into its only use. Only do that when the load is safe to move, and it
won't extend any live ranges.
This fixes PR13414.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160575 91177308-0d34-0410-b5e6-96231b3b80d8
PHIElimination splits critical edges when it predicts it can resolve
interference and eliminate copies. It doesn't split the edge if the
interference wouldn't be resolved anyway because the phi-use register is
live in the critical edge anyway.
Teach PHIElimination to split loop exiting edges with interference, even
if it wouldn't resolve the interference. This removes the necessary
copies from the loop, which is still an improvement from injecting the
copies into the loop.
The test case demonstrates the improvement. Before:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
movl %esi, %eax
je LBB0_1
After:
LBB0_1:
cmpb $0, (%rdx)
leaq 1(%rdx), %rdx
je LBB0_1
movl %esi, %eax
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160571 91177308-0d34-0410-b5e6-96231b3b80d8
LiveIntervals due to the two-addr pass generating bogus MI code.
The crux of the issue was a loop nesting problem. The intent of the code
which attempts to transform instructions before converting them to
two-addr form is to defer and reprocess any transformed instructions as
the second processing is likely to have more opportunities to coalesce
copies, etc. Unfortunately, there was one section of processing that was
not deferred -- the INSERT_SUBREG rewriting. Due to quirks of how this
rewriting proceeded, not only did it occur early, it removed the bits of
information needed for the deferred processing to correctly generate the
necessary two address form (specifically inserting a copy), but didn't
trigger any immediate assertions and produced what appeared to be
already valid two-address from code. Thus, the assertion only fired much
later in the pipeline.
The fix is to hoist the transformation logic up layer to where it can
more firmly defer all further processing, and to teach the normal
processing to handle an edge case previously handled as part of the
transformation logic. This edge case (already matched tied register
operands) needs to *not* defer any steps.
As has been brought up repeatedly in the process: wow does this code
need refactoring. I *may* squeeze in some time to at least bring sanity
to this loop... but wow... =]
Thanks to Jakob for helpful hints on the way here, and the review.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160443 91177308-0d34-0410-b5e6-96231b3b80d8
When truncating a result of a vector that is split we need
to use the result of the split vector, and not re-split the dead node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160357 91177308-0d34-0410-b5e6-96231b3b80d8
large immediates. Add dag combine logic to recover in case the large
immediates doesn't fit in cmp immediate operand field.
int foo(unsigned long l) {
return (l>> 47) == 1;
}
we produce
%shr.mask = and i64 %l, -140737488355328
%cmp = icmp eq i64 %shr.mask, 140737488355328
%conv = zext i1 %cmp to i32
ret i32 %conv
which codegens to
movq $0xffff800000000000,%rax
andq %rdi,%rax
movq $0x0000800000000000,%rcx
cmpq %rcx,%rax
sete %al
movzbl %al,%eax
ret
TargetLowering::SimplifySetCC would transform
(X & -256) == 256 -> (X >> 8) == 1
if the immediate fails the isLegalICmpImmediate() test. For x86,
that's immediates which are not a signed 32-bit immediate.
Based on a patch by Eli Friedman.
PR10328
rdar://9758774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160346 91177308-0d34-0410-b5e6-96231b3b80d8
In the added testcase the constant 55 was behind an AssertZext of type i1, and ComputeDemandedBits
reported that some of the bits were both known to be one and known to be zero.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160305 91177308-0d34-0410-b5e6-96231b3b80d8
Add a micro-optimization to getNode of CONCAT_VECTORS when both operands are undefs.
Can't find a testcase for this because VECTOR_SHUFFLE already handles undef operands, but Duncan suggested that we add this.
Together with Michael Kuperstein <michael.m.kuperstein@intel.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160229 91177308-0d34-0410-b5e6-96231b3b80d8
The notable fix is to look at any dependencies attached to the kill
instruction (or other instructions between MI nad the kill) where the
dependencies are specific to the register in question.
The old code implicitly handled this by rejecting the transform if *any*
other uses were found within the block, but after the start point. The
new code directly finds the kill, and has to re-use the existing
dependency scan to check for non-kill uses.
This was caught by self-host, but I found the bug via inspection and use
of absurd assert scaffolding to compute the kills in two ways and
compare them. So I have no useful testcase for this other than
"bootstrap". I'd work harder to reduce a test case if this particular
code were likely to live for a long time.
Thanks to Benjamin Kramer for reviewing the fix itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160228 91177308-0d34-0410-b5e6-96231b3b80d8
Catch uses of undefined physregs that haven't been added to basic block
live-in lists. Run the verifier to pinpoint the problem.
Also run the verifier when a virtual register use is not jointly
dominated by defs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160207 91177308-0d34-0410-b5e6-96231b3b80d8
removes the largest scaling problem in the test cases from PR13225 when
ASan is switched to insert basic blocks in the natural CFG order.
It may also solve some scaling problems for more normal code with large
numbers of basic blocks and variables.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160194 91177308-0d34-0410-b5e6-96231b3b80d8
When dumping the DAG for a fatal 'Cannot select' back-end error, also
provide the name of the function the construct is in. Useful when dealing
with large testcases, as the next step is to llvm-extract the function
in question to get a small(er) testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160152 91177308-0d34-0410-b5e6-96231b3b80d8
the input vector, it can be bigger (this is helpful for powerpc where <2 x i16>
is a legal vector type but i16 isn't a legal type, IIRC). However this wasn't
being taken into account by ExpandRes_EXTRACT_VECTOR_ELT, causing PR13220.
Lightly tweaked version of a patch by Michael Liao.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160116 91177308-0d34-0410-b5e6-96231b3b80d8
r1025 = s/zext r1024, 4
r1026 = extract_subreg r1025, 4
to a copy:
r1026 = copy r1024
This is correct. However it uses TII->isCoalescableExtInstr() which can return
true for instructions which essentially does a sext_in_reg so this can end up
with an illegal copy where the source and destination register classes do not
match. Add a check to avoid it. Sorry, no test case possible at this time.
rdar://11849816
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160059 91177308-0d34-0410-b5e6-96231b3b80d8
generalizing its implementation sufficiently to support this value
number scenario as well.
This cuts out another significant performance hit in large functions
(over 10k basic blocks, etc), especially those with "natural" CFG
structures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160026 91177308-0d34-0410-b5e6-96231b3b80d8
This ordering allows nested if-conversion without using a work list, and
it makes it possible to update the dominator tree on the fly as well.
Any erased basic blocks will always be dominated by the current
post-order position, so the domtree can be pruned without invalidating
the iterator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@160025 91177308-0d34-0410-b5e6-96231b3b80d8
back of it.
I don't have anything even remotely close to a test case for this. It
only broke two build bots, both of them doing bootstrap builds, one of
them a dragonegg bootstrap. It doesn't break for me when I bootstrap
either. It doesn't reproduce every time or on many machines during the
bootstrap. Many thanks to Duncan Sands who got the exact command (and
stage of the bootstrap) which failed on the dragonegg bootstrap and
managed to get it to trigger under valgrind with debug symbols. The fix
was then found by inspection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159993 91177308-0d34-0410-b5e6-96231b3b80d8
multiple scalars and insert them into a vector. Next, we shuffle the elements
into the correct places, as before.
Also fix a small dagcombine bug in SimplifyBinOpWithSameOpcodeHands, when the
migration of bitcasts happened too late in the SelectionDAG process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159991 91177308-0d34-0410-b5e6-96231b3b80d8
quadratic behavior when performing pathological merges. Fixes the core
element of PR12652.
There is only one user of addRangeFrom left: join. I'm hoping to
refactor further in a future patch and have join use this merge
operation as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159982 91177308-0d34-0410-b5e6-96231b3b80d8
of the trick merge routines. This adds a layer of testing that was
necessary when implementing more efficient (and complex) merge logic for
this datastructure.
No functionality changed here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159981 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, this would become an integer extension operation, followed by a real integer->float conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159957 91177308-0d34-0410-b5e6-96231b3b80d8
subtarget CPU descriptions and support new features of
MachineScheduler.
MachineModel has three categories of data:
1) Basic properties for coarse grained instruction cost model.
2) Scheduler Read/Write resources for simple per-opcode and operand cost model (TBD).
3) Instruction itineraties for detailed per-cycle reservation tables.
These will all live side-by-side. Any subtarget can use any
combination of them. Instruction itineraries will not change in the
near term. In the long run, I expect them to only be relevant for
in-order VLIW machines that have complex contraints and require a
precise scheduling/bundling model. Once itineraries are only actively
used by VLIW-ish targets, they could be replaced by something more
appropriate for those targets.
This tablegen backend rewrite sets things up for introducing
MachineModel type #2: per opcode/operand cost model.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159891 91177308-0d34-0410-b5e6-96231b3b80d8
DwarfDebug class could generate the same (inlined) DIVariable twice:
1) when trying to find abstract debug variable for a concrete inlined instance.
2) when explicitly collecting info for variables that were optimized out.
This change makes sure that this duplication won't happen and makes
Clang pass "gdb.opt/inline-locals" test from gdb testsuite.
Reviewed by Eric Christopher.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159811 91177308-0d34-0410-b5e6-96231b3b80d8
hash_value overload for MachineOperands. This addresses a FIXME
sufficient for me to remove it, and cleans up the code nicely too.
The important changes to the hashing logic:
- TargetFlags are now included in all of the hashes. These were complete
missed.
- Register operands have their subregisters and whether they are a def
included in the hash.
- We now actually hash all of the operand types. Previously, many
operand types were simply *dropped on the floor*. For example:
- Floating point immediates
- Large integer immediates (>64-bit)
- External globals!
- Register masks
- Metadata operands
- It removes the offset from the block-address hash; I'm a bit
suspicious of this, but isIdenticalTo doesn't consider the offset for
black addresses.
Any patterns involving these entities could have triggered extreme
slowdowns in MachineCSE or PHIElimination. Let me know if there are PRs
you think might be closed now... I'm looking myself, but I may miss
them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159743 91177308-0d34-0410-b5e6-96231b3b80d8
broken. This patch fixes the superficial problems which lead to the
intractably slow compile times reported in PR13225.
The specific issue is that we were failing to include the *offset* of
a global variable in the hash code. Oops. This would in turn cause all
MIs which were only distinguishable due to operating on different
offsets of a global variable to produce identical hash functions. In
some of the test cases attached to the PR I saw hash table activity
where there were O(1000) probes-per-lookup *on average*. A very few
entries were responsible for most of these probes.
There is still quite a bit more to do here. The ad-hoc layering of data
in MachineOperands makes them *extremely* brittle to hash correctly.
We're missing quite a few other cases, the only ones I've fixed here are
the specific MO types which were allowed through the assert() in
getOffset().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159741 91177308-0d34-0410-b5e6-96231b3b80d8
change.
Move the "Not profitable, avoid CSE!" debug message next to where we fail the
check for profitability and use a different message for avoiding CSE due to
being in different register classes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159729 91177308-0d34-0410-b5e6-96231b3b80d8
Also allow trailing register mask operands on non-variadic both
MachineSDNodes and MachineInstrs.
The extra physreg RegisterSDNode operands are added to the MI as
<imp-use> operands. This makes it possible to have non-variadic call
instructions.
Call and return instructions really are non-variadic, the argument
registers should only be used implicitly - they are not part of the
encoding.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159727 91177308-0d34-0410-b5e6-96231b3b80d8
IntegersSubsetMapping
- Replaced type of Items field from std::list with std::map. In neares future I'll test it with DenseMap and do the correspond replacement
if possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159703 91177308-0d34-0410-b5e6-96231b3b80d8
This pass performs if-conversion on SSA form machine code by
speculatively executing both sides of the branch and using a cmov
instruction to select the result. This can help lower the number of
branch mispredictions on architectures like x86 that don't have
predicable instructions.
The current implementation is very aggressive, and causes regressions on
mosts tests. It needs good heuristics that have yet to be implemented.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159694 91177308-0d34-0410-b5e6-96231b3b80d8
IntegersSubsetMapping
- Replaced type of Items field from std::list with std::map. In neares future I'll test it with DenseMap and do the correspond replacement
if possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159659 91177308-0d34-0410-b5e6-96231b3b80d8
It appears to have caught a use-after-free introduced as by r159567
and/or friends which call 'addPass' from many more places. The bug in
'addPass' doesn't appear to be new, and was spotted by inspection when
ASan shown a bright light of a stacktrace at these functions.
Hopefully this will fix the ASan failure -- I have no test case other
than running an ASan-built clang over the test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159614 91177308-0d34-0410-b5e6-96231b3b80d8
This is still a work in progress but I believe it is currently good enough
to fix PR13122 "Need unit test driver for codegen IR passes". For example,
you can run llc with -stop-after=loop-reduce to have it dump out the IR after
running LSR. Serializing machine-level IR is not yet supported but we have
some patches in progress for that.
The plan is to serialize the IR to a YAML file, containing separate sections
for the LLVM IR, machine-level IR, and whatever other info is needed. Chad
suggested that we stash the stop-after pass in the YAML file and use that
instead of the start-after option to figure out where to restart the
compilation. I think that's a great idea, but since it's not implemented yet
I put the -start-after option into this patch for testing purposes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159570 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it possible to just use a zero value to represent "no pass", so
the phony NoPassID global variable is no longer needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159568 91177308-0d34-0410-b5e6-96231b3b80d8
This is a preliminary step toward having TargetPassConfig be able to
start and stop the compilation at specified passes for unit testing
and debugging. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159567 91177308-0d34-0410-b5e6-96231b3b80d8
register does not have multiple definitions. Modified TwoAddressInstructionPass
to use getUniqueVRegDef instead of getVRegDef.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159545 91177308-0d34-0410-b5e6-96231b3b80d8
implicit_def, the other instruction can be anything, including instructions
that define multiple values. Be careful about that and don't assume what operand
0 is.
Fixes pr13249.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159509 91177308-0d34-0410-b5e6-96231b3b80d8
This was always part of the VMCore library out of necessity -- it deals
entirely in the IR. The .cpp file in fact was already part of the VMCore
library. This is just a mechanical move.
I've tried to go through and re-apply the coding standard's preferred
header sort, but at 40-ish files, I may have gotten some wrong. Please
let me know if so.
I'll be committing the corresponding updates to Clang and Polly, and
Duncan has DragonEgg.
Thanks to Bill and Eric for giving the green light for this bit of cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159421 91177308-0d34-0410-b5e6-96231b3b80d8
The TargetInstrInfo::getNumMicroOps API does not change, but soon it
will be used by MachineScheduler. Now each subtarget can specify the
number of micro-ops per itinerary class. For ARM, this is currently
always dynamic (-1), because it is used for load/store multiple which
depends on the number of register operands.
Zero is now a valid number of micro-ops. This can be used for
nop pseudo-instructions or instructions that the hardware can squash
during dispatch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159406 91177308-0d34-0410-b5e6-96231b3b80d8
Teach vector legalization how to honor Promote for int to float
conversions. The code checking whether to promote the operation knew
to look at the operand, but the actual promotion code didn't. This
fixes that. The operand is promoted up via [zs]ext.
rdar://11762659
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159378 91177308-0d34-0410-b5e6-96231b3b80d8
include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h.
The reasoning is because the DebugInfo module is simply an interface to the
debug info MDNodes and has nothing to do with analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159312 91177308-0d34-0410-b5e6-96231b3b80d8
Such passes can be used to tweak the register assignments in a
target-dependent way, for example to avoid write-after-write
dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159209 91177308-0d34-0410-b5e6-96231b3b80d8
very first (and worst) placement algorithm. These should now more
accurately reflect the reality of the pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159185 91177308-0d34-0410-b5e6-96231b3b80d8
The primary advantage is that loop optimizations will be applied in a
stable order. This helps debugging and unit test creation. It is also
a better overall implementation without pathologically bad performance
on deep functions.
On large functions (llvm-stress --size=200000 | opt -loops)
Before: 0.1263s
After: 0.0225s
On deep functions (after tweaking llvm-stress, thanks Nadav):
Before: 0.2281s
After: 0.0227s
See r158790 for more comments.
The loop tree is now consistently generated in forward order, but loop
passes are applied in reverse order over the program. If we have a
loop optimization that prefers forward order, that can easily be
achieved by adding a different type of LoopPassManager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159183 91177308-0d34-0410-b5e6-96231b3b80d8
Verify that all paths from the entry block to a virtual register read
pass through a def. Enable this check even when MRI->isSSA() is false.
Verify that the live range of a virtual register is live out of all
predecessor blocks, even for PHI-values.
This requires that PHIElimination sometimes inserts IMPLICIT_DEF
instruction in predecessor blocks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159150 91177308-0d34-0410-b5e6-96231b3b80d8
Implicitly defined virtual registers can simply have the <undef> bit set
on all uses, and copies can be turned into implicit defs recursively.
Physical registers are a bit trickier. We handle the common case where a
physreg def is used by a nearby instruction in the same basic block. For
more complicated cases, just leave the IMPLICIT_DEF instruction in.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159149 91177308-0d34-0410-b5e6-96231b3b80d8
When a PHI use is <undef>, don't emit a copy in the predecessor block,
but insert an IMPLICIT_DEF instruction instead. This ensures that
virtual register uses are always jointly dominated by defs, even if some
of them are IMPLICIT_DEF.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159121 91177308-0d34-0410-b5e6-96231b3b80d8
When the source register to a 2-addr instruction is undefined, there is
no need to attempt any transformations - simply replace the source
register with the destination register.
This also comes up when lowering IMPLICIT_DEF instructions - make sure
the <undef> flag is moved to the new partial register def operand:
%vreg8<def> = INSERT_SUBREG %vreg9<undef>, %vreg0<kill>, sub_16bit
rewrite undef:
%vreg8<def> = INSERT_SUBREG %vreg8<undef>, %vreg0<kill>, sub_16bit
convert to:
%vreg8:sub_16bit<def,read-undef> = COPY %vreg0<kill>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159120 91177308-0d34-0410-b5e6-96231b3b80d8
It's simple: Don't treat <undef> operands as uses, and don't assume a
virtual register has a defining instruction unless a real use has been
seen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159061 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
Allow up to 64 functional units per processor itinerary.
This patch changes the type used to hold the FU bitset from unsigned to uint64_t.
This will be needed for some upcoming PowerPC itineraries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159027 91177308-0d34-0410-b5e6-96231b3b80d8