On large goto table based interpreters, where phi nodes can have (very) large
fan-ins, isLiveOut exhibited poor performances: about 40% of the full
codegen time was spent in PHIElim, sorting MachineBasicBlock addresses.
This patch improve the performances for such cases, and does not show
compile time regressions on the LNT, at bootstrap (llvm+clang+lldb) or
any other benchmarks we have in-house.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239510 91177308-0d34-0410-b5e6-96231b3b80d8
This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements:
1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node.
2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper).
3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial.
The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however.
Tested on SSE2, SSE41 and AVX machines.
Differential Revision: http://reviews.llvm.org/D9474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239509 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r239437.
This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol
directly instead of using it to do an indirect function call.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239502 91177308-0d34-0410-b5e6-96231b3b80d8
It hasn't been used since r130964.
This also removes MachineModuleInfo::isUsedFunction and
MachineModuleInfo::AnalyzeModule, both of which were only
there to support UsedFunctions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239501 91177308-0d34-0410-b5e6-96231b3b80d8
This always just set the User::OperandList which is now set
in that method instead of being returned.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239493 91177308-0d34-0410-b5e6-96231b3b80d8
PhiNode, SwitchInst, LandingPad and IndirectBr all had virtually identical
logic for growing the hung off uses.
Move it to User so that they can all call a single shared implementation.
Their destructors were all empty after this change and were deleted. They all
have virtual clone_impl methods which can be used as vtable anchors.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239492 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the subclasses which care about hung off uses let ~User clean it up,
there's no need for a separate method. Just inline it to ~User and delete it.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239491 91177308-0d34-0410-b5e6-96231b3b80d8
Currently all of the logic for deleting hung off uses, which PHI/switch/etc use,
is in their classes.
This adds a bit to Value which tracks whether that user had hung off uses,
then User can be responsible for clearing them instead of the sub classes.
Note, the bit used here was taken from NumOperands which was 30-bits.
Given the reduction to 29 bits, and the average User being just over 100 bytes,
a single User with 29-bits of num operands would need 50GB of RAM for itself
so its reasonable to assume that 29-bits is enough for now.
This is a step towards hiding all the hung off uses logic in the User.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239490 91177308-0d34-0410-b5e6-96231b3b80d8
PhiNode's need to allocate space for an array of Use[N] and then BasicBlock*[N].
They had their own allocHungOffUses to handle all of this. This moves the logic
in to User::allocHungOffUses and PhiNode passes in a bool to say to allocate
the BB* space too.
Reviewed by Duncan Exon Smith.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239489 91177308-0d34-0410-b5e6-96231b3b80d8
If the first argument to a function is a 'this' argument and the second
has the sret attribute, the ArgumentPromotion pass may promote the 'this'
argument to more than one argument, violating the IR constraint that 'sret'
may only be applied to the first or second argument.
Although this IR constraint is arguably unnecessary, it highlighted the fact
that ArgPromotion does not need to preserve this attribute. Dropping the
attribute reduces register pressure in the backend by avoiding the register
copy required by sret. Because sret implies noalias, we also replace the
former with the latter.
Differential Revision: http://reviews.llvm.org/D10353
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239488 91177308-0d34-0410-b5e6-96231b3b80d8
This is a reimplementation of D9780 at the machine instruction level rather than the DAG.
Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a
starting point; see the TODO comments) to increase ILP when it's safe to do so.
The code is closely based on the existing MachineCombiner optimization that is implemented
for AArch64.
This patch should not cause the kind of spilling tragedy that led to the reversion of r236031.
Differential Revision: http://reviews.llvm.org/D10321
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239486 91177308-0d34-0410-b5e6-96231b3b80d8
O2 compiles just before GlobalDCE, unless we are preparing for LTO.
This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239480 91177308-0d34-0410-b5e6-96231b3b80d8
Determining proper debug locations for instructions created in
PHITransAddr is tricky. We use a simple approach here and simply copy
debug locations from instructions computing load address to
"corresponding" instructions re-creating the address computation
in predecessor basic blocks.
This may not always be correct, given all the rearrangement and
simplification going on, and debug locations may jump around a lot,
as the basic blocks we copy locations between may be very far from
each other.
Still, this would work good in most simple cases (e.g. when chain
of address computing instruction is short, or our mapping turns out
to be 1-to-1), and we desire to have *some* reasonable debug locations
associated with newly inserted instructions.
See http://reviews.llvm.org/D10351 review thread for more details.
Test Plan: regression test suite
Reviewers: spatel, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10351
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239479 91177308-0d34-0410-b5e6-96231b3b80d8
During statepoint lowering we can sometimes avoid spilling of the value if we know that it was already spilled for previous statepoint.
We were doing this by checking if incoming statepoint value was lowered into load from stack slot. This was working only in boundaries of one basic block.
But instead of looking at the lowered node we can look directly at the llvm-ir value and if it was gc.relocate (or some simple modification of it) look up stack slot for it's derived pointer and reuse stack slot from it. This allows us to look across basic block boundaries.
Differential Revision: http://reviews.llvm.org/D10251
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239472 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski
Differential Revision: http://reviews.llvm.org/D10311
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239467 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10307
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239465 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: echristo, rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10243
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239464 91177308-0d34-0410-b5e6-96231b3b80d8
fix segfault by checking for UnknownArch, since
getArchTypePrefix() will return nullptr for UnknownArch.
This fixes regression caused by r238424.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239456 91177308-0d34-0410-b5e6-96231b3b80d8
We have to do this manually, the runtime only sets up ebp. Fixes a crash
when returning after catching an exception.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239451 91177308-0d34-0410-b5e6-96231b3b80d8
Use a "safeseh" string attribute to do this. You would think we chould
just accumulate the set of personalities like we do on dwarf, but this
fails to account for the LSDA-loading thunks we use for
__CxxFrameHandler3. Each of those needs to make it into .sxdata as well.
The string attribute seemed like the most straightforward approach.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239448 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 2e449ec5bcdf67b52b315b16c2128aaf25d5b73c.
This was svn r239440. Its currently failing an ARM test so reverting while I work out
what to do next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239441 91177308-0d34-0410-b5e6-96231b3b80d8
It wasn't possible to have a variable Symbol with offset or 'isCommon' so
this just enables better packing of the MCSymbol class.
Reviewed by Rafael Espindola.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239440 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The RegisterScavenger explicitly ignores <kill> flags on operands of
predicated instructions and therefore assumes that such registers remain
live. When it then scavenges such a register, it inserts a spill of this
(killed) register. This is invalid code and gets flagged up by the
verifier.
Nowadays kill flags are set correctly on predicated instructions. This
patch makes the Scavenger respect them.
The bug has so far only been triggered by an internal pass, so I don't
have a test case unfortunately.
Fixes PR23119.
Reviewers: hfinkel, tobiasvk_caf
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9039
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239439 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We used to assume V->RAUW only modifies the operand list of V's user.
However, if V and V's user are Constants, RAUW may replace and invalidate V's
user entirely.
This patch fixes the above issue by letting the caller replace the
operand instead of calling RAUW on Constants.
Test Plan: @nested_const_expr and @rauw in access-non-generic.ll
Reviewers: broune, jholewinski
Reviewed By: broune, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10345
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239435 91177308-0d34-0410-b5e6-96231b3b80d8
This gets all the handler info through to the asm printer and we can
look at the .xdata tables now. I've convinced one small catch-all test
case to work, but other than that, it would be a stretch to say this is
functional.
The state numbering algorithm avoids doing any scope reconstruction as
we do for C++ to simplify the implementation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239433 91177308-0d34-0410-b5e6-96231b3b80d8
Store instructions do not modify register values and therefore it's safe
to form a store pair even if the source register has been read in between
the two store instructions.
Previously, the read of w1 (see below) prevented the formation of a stp.
str w0, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
str w1, [x2, #4]
ret
We now generate the following code.
stp w0, w1, [x2]
ldr w8, [x2, #8]
add w0, w8, w1
ret
All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass.
Performance results for SPEC2K were within noise.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239432 91177308-0d34-0410-b5e6-96231b3b80d8