Stack is formed improperly for long structures passed as byval arguments for
EABI mode.
If we took AAPCS reference, we can found the next statements:
A: "If the argument requires double-word alignment (8-byte), the NCRN (Next
Core Register Number) is rounded up to the next even register number." (5.5
Parameter Passing, Stage C, C.3).
B: "The alignment of an aggregate shall be the alignment of its most-aligned
component." (4.3 Composite Types, 4.3.1 Aggregates).
So if we have structure with doubles (9 double fields) and 3 Core unused
registers (r1, r2, r3): caller should use r2 and r3 registers only.
Currently r1,r2,r3 set is used, but it is invalid.
Callee VA routine should also use r2 and r3 regs only. All is ok here. This
behaviour is guessed by rounding up SP address with ADD+BFC operations.
Fix:
Main fix is in ARMTargetLowering::HandleByVal. If we detected AAPCS mode and
8 byte alignment, we waste odd registers then.
P.S.:
I also improved LDRB_POST_IMM regression test. Since ldrb instruction will
not generated by current regression test after this patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166018 91177308-0d34-0410-b5e6-96231b3b80d8
This is a medium term workaround until we have a more robust solution
in the form of a register liveness utility for postRA passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166001 91177308-0d34-0410-b5e6-96231b3b80d8
Using the cached bit vector in MRI avoids comstantly allocating and
recomputing the reserved register bit vector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165983 91177308-0d34-0410-b5e6-96231b3b80d8
Also provide an MRI::getReservedRegs() function to access the frozen
register set, and isReserved() and isAllocatable() methods to test
individual registers.
The various implementations of TRI::getReservedRegs() are quite
complicated, and many passes need to look at the reserved register set.
This patch makes it possible for these passes to use the cached copy in
MRI, avoiding a lot of malloc traffic and repeated calculations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165982 91177308-0d34-0410-b5e6-96231b3b80d8
The new coalescer can merge a dead def into an unused lane of an
otherwise live vector register.
Clear the <dead> flag when that happens since the flag refers to the
full virtual register which is still live after the partial dead def.
This fixes PR14079.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165877 91177308-0d34-0410-b5e6-96231b3b80d8
It is possible that the live range of the value being pruned loops back
into the kill MBB where the search started. When that happens, make sure
that the beginning of KillMBB is also pruned.
Instead of starting a DFS at KillMBB and skipping the root of the
search, start a DFS at each KillMBB successor, and allow the search to
loop back to KillMBB.
This fixes PR14078.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165872 91177308-0d34-0410-b5e6-96231b3b80d8
Completely update one interval at a time instead of collecting live
range fragments to be updated. This avoids building data structures,
except for a single SmallPtrSet of updated intervals.
Also share code between handleMove() and handleMoveIntoBundle().
Add support for moving dead defs across other live values in the
interval. The MI scheduler can do that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165824 91177308-0d34-0410-b5e6-96231b3b80d8
PHIElimination inserts IMPLICIT_DEF instructions to guarantee that all
PHI predecessors have a live-out value. These IMPLICIT_DEF values are
not considered to be real interference when coalescing virtual
registers:
%vreg1 = IMPLICIT_DEF
%vreg2 = MOV32r0
When joining %vreg1 and %vreg2, the IMPLICIT_DEF instruction and its
value number should simply be erased since the %vreg2 value number now
provides a live-out value for the PHI predecesor block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165813 91177308-0d34-0410-b5e6-96231b3b80d8
On PowerPC, a bitcast of <16 x i8> to i128 may run through a code
path in ExpandRes_BITCAST that attempts to do an intermediate
bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts
of the resulting i128 by pairing up two of those i32 vector elements
each. The code already recognizes that on a big-endian system, the
first two vector elements form the Hi part, and the final two vector
elements form the Lo part (vice-versa from the little-endian situation).
However, we also need to take endianness into account when forming each
of those separate pairs: on a big-endian system, vector element 0 is
the *high* part of the pair making up the Hi part of the result, and
vector element 1 is the low part of the pair. The code currently always
uses vector element 0 as the low part and vector element 1 as the high
part, as is appropriate for little-endian platforms only.
This patch fixes this by swapping the vector elements as they are
paired up as appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165802 91177308-0d34-0410-b5e6-96231b3b80d8
not legal. However, it should use a div instruction + mul + sub if divide is
legal. The rem legalization code was missing a check and incorrectly uses a
divrem libcall even when div is legal.
rdar://12481395
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165778 91177308-0d34-0410-b5e6-96231b3b80d8
isa<> et al. automatically infer when the cast is an upcast (including a
self-cast), so these are no longer necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165767 91177308-0d34-0410-b5e6-96231b3b80d8
Not all instructions define a virtual register in their first operand.
Specifically, INLINEASM has a different format.
<rdar://problem/12472811>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165721 91177308-0d34-0410-b5e6-96231b3b80d8
The minimum set of required instructions is ISD::AND, ISD::OR, ISD::SETO(or ISD::SETOEQ) and ISD::SETUO(or ISD::SETUNE). Everything is expanded into one of two patterns:
Pattern 1: (LHS CC1 RHS) Opc (LHS CC2 RHS)
Pattern 2: (LHS CC1 LHS) Opc (RHS CC2 RHS)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165655 91177308-0d34-0410-b5e6-96231b3b80d8
- Due to the current matching vector elements constraints in ISD::FP_EXTEND,
rounding from v2f32 to v2f64 is scalarized. Add a customized v2f32 widening
to convert it into a target-specific X86ISD::VFPEXT to work around this
constraints. This patch also reverts a previous attempt to fix this issue by
recovering the scalarized ISD::FP_EXTEND pattern and thus significantly
reduces the overhead of supporting non-power-2 vector FP extend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165625 91177308-0d34-0410-b5e6-96231b3b80d8
SchedulerDAGInstrs::buildSchedGraph ignores dependencies between FixedStack
objects and byval parameters. So loading byval parameters from stack may be
inserted *before* it will be stored, since these operations are treated as
independent.
Fix:
Currently ARMTargetLowering::LowerFormalArguments saves byval registers with
FixedStack MachinePointerInfo. To fix the problem we need to store byval
registers with MachinePointerInfo referenced to first the "byval" parameter.
Also commit adds two new fields to the InputArg structure: Function's argument
index and InputArg's part offset in bytes relative to the start position of
Function's argument. E.g.: If function's argument is 128 bit width and it was
splitted onto 32 bit regs, then we got 4 InputArg structs with same arg index,
but different offset values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165616 91177308-0d34-0410-b5e6-96231b3b80d8
checkRegMaskInterference only initializes the bitmask on the first interference.
This fixes PR14027 and (re)fixes PR13945.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165608 91177308-0d34-0410-b5e6-96231b3b80d8
Allows the new machine model to be used for NumMicroOps and OutputLatency.
Allows the HazardRecognizer to be disabled along with itineraries.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165603 91177308-0d34-0410-b5e6-96231b3b80d8
This wasn't contributing anything significant to postRA heuristics except compile time (by my measurements) and will be replaced by a more general heuristic for cross-region dependencies within the scheduler itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165563 91177308-0d34-0410-b5e6-96231b3b80d8
The next step is to update the optimizers to allow them to optimize the different address spaces with this information.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165505 91177308-0d34-0410-b5e6-96231b3b80d8
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165488 91177308-0d34-0410-b5e6-96231b3b80d8
This class is used by LSR and a number of places in the codegen.
This is the first step in de-coupling LSR from TLI, and creating
a new interface in between them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165455 91177308-0d34-0410-b5e6-96231b3b80d8
When the CFG contains a loop with multiple entry blocks, the traces
computed by MachineTraceMetrics don't always have the same nice
properties. Loop back-edges are normally excluded from traces, but
MachineLoopInfo doesn't recognize loops with multiple entry blocks, so
those back-edges may be included.
Avoid asserting when that happens by adding an isEarlierInSameTrace()
function that accurately determines if a dominating block is part of the
same trace AND is above the currrent block in the trace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165434 91177308-0d34-0410-b5e6-96231b3b80d8
a) frame setup instructions define the prologue
b) we shouldn't change our location mid-stream
Add a test to make sure that the stack adjustment stays within
the prologue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165250 91177308-0d34-0410-b5e6-96231b3b80d8
multiple stores with a single load. We create the wide loads and stores (and their chains)
before we remove the scalar loads and stores and fix the DAG chain. We attempted to merge
loads with a different chain. When that happened, the assumption that it is safe to RAUW
broke and a cycle was introduced.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165148 91177308-0d34-0410-b5e6-96231b3b80d8
is not profitable in many cases because modern processors perform multiple stores
in parallel and merging stores prior to merging requires extra work. We handle two main cases:
1. Store of multiple consecutive constants:
q->a = 3;
q->4 = 5;
In this case we store a single legal wide integer.
2. Store of multiple consecutive loads:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
In this case we load/store either ilegal vector registers or legal wide integer registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165125 91177308-0d34-0410-b5e6-96231b3b80d8
Enable the pass by default for targets that request it, and change the
-enable-early-ifcvt to the opposite -disable-early-ifcvt.
There are still some x86 regressions when enabling early if-conversion
because of the missing machine models. Disable the pass for x86 until
machine models are added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165075 91177308-0d34-0410-b5e6-96231b3b80d8
Reserved register live ranges look like a set of dead defs - any uses of
reserved registers are ignored.
Instead of skipping the updating of reserved register operands entirely,
just ignore the use operands and treat the def operands normally.
No test case, handleMove() is not commonly used yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165060 91177308-0d34-0410-b5e6-96231b3b80d8
JoinVals::pruneValues() calls LIS->pruneValue() to avoid conflicts when
overlapping two different values. This produces a set of live range end
points that are used to reconstruct the live range (with SSA update)
after joining the two registers.
When a value is pruned twice, the set of end points was insufficient:
v1 = DEF
v1 = REPLACE1
v1 = REPLACE2
KILL v1
The end point at KILL would only reconstruct the live range from
REPLACE2 to KILL, leaving the range REPLACE1-REPLACE2 dead.
Add REPLACE2 as an end point in this case so the full live range is
reconstructed.
This fixes PR13999.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165056 91177308-0d34-0410-b5e6-96231b3b80d8
the add/sub case since in the case of multiplication you also have to check that
the operation in the larger type did not overflow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165017 91177308-0d34-0410-b5e6-96231b3b80d8
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164910 91177308-0d34-0410-b5e6-96231b3b80d8
buildbots. Original commit message:
A DAGCombine optimization for merging consecutive stores. This optimization is not profitable in many cases
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164890 91177308-0d34-0410-b5e6-96231b3b80d8
because moden processos can store multiple values in parallel, and preparing the consecutive store requires
some work. We only handle these cases:
1. Consecutive stores where the values and consecutive loads. For example:
int a = p->a;
int b = p->b;
q->a = a;
q->b = b;
2. Consecutive stores where the values are constants. Foe example:
q->a = 4;
q->b = 5;
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164885 91177308-0d34-0410-b5e6-96231b3b80d8
The new coalescer can turn a full virtual register definition into a
partial redef by merging another value into an unused vector lane.
Make sure to clear the <read-undef> flag on such defs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164807 91177308-0d34-0410-b5e6-96231b3b80d8
The fix is obvious and the only test case I have is horrible, so I am
not including it. The problem shows up when self-hosting clang on i386
with -new-coalescer enabled.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164793 91177308-0d34-0410-b5e6-96231b3b80d8
The hasFnAttr method has been replaced by querying the Attributes explicitly. No
intended functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164725 91177308-0d34-0410-b5e6-96231b3b80d8
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164662 91177308-0d34-0410-b5e6-96231b3b80d8
scalar-to-vector conversion that we cannot handle. For instance, when an invalid
constraint is used in an inline asm statement.
<rdar://problem/12284092>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164657 91177308-0d34-0410-b5e6-96231b3b80d8
Provide interface in TargetLowering to set or get the minimum number of basic
blocks whereby jump tables are generated for switch statements rather than an
if sequence.
getMinimumJumpTableEntries() defaults to 4.
setMinimumJumpTableEntries() allows target configuration.
This patch changes the default for the Hexagon architecture to 5
as it improves performance on some benchmarks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164628 91177308-0d34-0410-b5e6-96231b3b80d8
Even out-of-line jump tables can be in the code section, so mark them
as data-regions for those targets which support the directives.
rdar://12362871&12362974
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164571 91177308-0d34-0410-b5e6-96231b3b80d8
care about it being an argument variable so that we can decide
that captured block and lambda vars that don't happen to
be arguments could be an argument pointer.
Add the object pointer for one case onto the subprogram die.
rdar://12001329
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164419 91177308-0d34-0410-b5e6-96231b3b80d8
because LiveStackAnalysis was not preserved by VirtRegWriter. This caused
big stack usage regression in some cases.
rdar://12340383
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164408 91177308-0d34-0410-b5e6-96231b3b80d8
A PHI can't create interference on its own. If two live ranges interfere
at a PHI, they must also interfere when leaving one of the PHI
predecessors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164330 91177308-0d34-0410-b5e6-96231b3b80d8
The old-fashioned many-to-one value mapping doesn't always work when
merging vector lanes. A value can map to multiple different values, and
it can even be necessary to insert new PHIs.
When a value number is defined by a copy from a value number that
required SSa update, include the live range of the copied value number
in the SSA update as well. It is not necessarily a copy of the original
value number any longer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164329 91177308-0d34-0410-b5e6-96231b3b80d8