Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9592
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@237009 91177308-0d34-0410-b5e6-96231b3b80d8
The bug showed up as a compile-time assertion failure:
Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed
when building msan tests on x86-64.
Prior to r236850, this bug was masked due to a bogus alignment check,
which also accidentally rejected non-byte-sized accesses. Afterwards,
an invalid ElementSizeBytes == 0 got further into the function, and
triggered the assertion failure.
It would probably be a good idea to allow it to handle merging stores
of unusual widths as well, but for now, to un-break it, I'm just
making the minimal fix.
Differential Revision: http://reviews.llvm.org/D9626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236927 91177308-0d34-0410-b5e6-96231b3b80d8
When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to
mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later
use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill.
However, rematerialised constant expressions aren't generated bottom up. The local value save area
grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the
first will kill it, then the second, which is *lower* in the BB will read a killed register.
This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset.
Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with
the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions.
However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236922 91177308-0d34-0410-b5e6-96231b3b80d8
The code that builds the dependence graph assumes that two PseudoSourceValues
don't alias. In a tail calling function two FixedStackObjects might refer to the
same location. Worse 'immutable' fixed stack objects like function arguments are
not immutable and will be clobbered.
Change this so that a load from a FixedStackObject is not invariant in a tail
calling function and don't return a PseudoSourceValue for an instruction in tail
calling functions when building the dependence graph so that we handle function
arguments conservatively.
Fix for PR23459.
rdar://20740035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236916 91177308-0d34-0410-b5e6-96231b3b80d8
When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,.
However, its possible that the source register we are rewriting *to* to also have uses. If those uses are after a kill of the value we are rewriting *from* then we have uses after a kill and the verifier fails.
This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236897 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the shape of the statepoint intrinsic from:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)
to:
@llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)
This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.
In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.
Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.
Differential Revision: http://reviews.llvm.org/D9501
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236888 91177308-0d34-0410-b5e6-96231b3b80d8
The test here was sinking the AND here to a lower BB:
%vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8
TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8
which meant that vreg8 was read after it was killed.
This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236886 91177308-0d34-0410-b5e6-96231b3b80d8
1) check whether the alignment of the memory is sufficient for the
*merged* store or load to be efficient.
Not doing so can result in some ridiculously poor code generation, if
merging creates a vector operation which must be aligned but isn't.
2) DON'T check that the alignment of each load/store is equal. If
you're merging 2 4-byte stores, the first *might* have 8-byte
alignment, but the second certainly will have 4-byte alignment. We do
want to allow those to be merged.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236850 91177308-0d34-0410-b5e6-96231b3b80d8
If we duplicate an instruction then we must also clear kill flags on any uses we rewrite.
Otherwise we might be killing a register which was used in other BBs.
For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated.
%vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10
%vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10
The copy here is inserted after the add and so needs vreg10 to be live.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236782 91177308-0d34-0410-b5e6-96231b3b80d8
After r236617, branch probabilities are no longer guaranteed to be >= 1. This
patch makes the swich lowering code handle that correctly, without bumping the
branch weights by 1 which might cause overflow and skews the probabilities.
Covered by @zero_weight_tree in test/CodeGen/X86/switch.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236739 91177308-0d34-0410-b5e6-96231b3b80d8
We had code such as this:
r2 = ...
t2Bcc
label1:
ldr ... r2
label2;
return r2<dead, def>
The if converter was transforming this to
r2<def> = ...
return [pred] r2<dead,def>
ldr <r2, kill>
return
which fails the machine verifier because the ldr now reads from a dead def.
The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list. The caller then clears the dead flag from the def is the value is live.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236660 91177308-0d34-0410-b5e6-96231b3b80d8
demanded by the machine verifier.
After shrinking a live-range to its uses, it is possible to create several
smaller live-ranges. When this happens, shrinkToUses returns true and we need to
split the different components into their own live-ranges.
The problem does not reproduce on any in-tree target but Jonas Paulsson
<jonas.paulsson@ericsson.com>, who reported the problem, checked that this patch
fixes the issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236658 91177308-0d34-0410-b5e6-96231b3b80d8
If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use.
Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236650 91177308-0d34-0410-b5e6-96231b3b80d8
Don't create names for temporary symbols when using an object streamer.
The names never make it to the output anyway. From the starting point
of r236629, my heap profile says this drops peak memory usage from 1100
MB to 1058 MB for CodeGen of `verify-uselistorder`, a savings of almost
4% on peak memory, and removes `StringMap<bool, BumpPtrAllocator...>`
from the profile entirely.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236642 91177308-0d34-0410-b5e6-96231b3b80d8
It's quite possible to encounter an insertvalue instruction that's more deeply
nested than the value we're looking for, but when that happens we really
mustn't compare beyond the end of the index array.
Since I couldn't see any guarantees about what comparisons std::equal makes, we
probably need to directly check the size beforehand. In practice, I suspect
most std::equal implementations would probably bail early, which would be OK.
But just in case...
rdar://20834485
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236635 91177308-0d34-0410-b5e6-96231b3b80d8
Emit the number of bytes in a `.debug_loc` entry directly. The old code
created temp labels (expensive), emitted the difference between them,
and then emitted one on each side of the relevant bytes.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`
(the optimized version of ld64's `-save-temps` when linking the
`verify-uselistorder` executable in an LTO bootstrap). I've hacked
`MCContext::Allocate()` to just call `malloc()` instead of using the
`BumpPtrAllocator` so that the heap profile is easier to read. As far
as peak memory is concerned, `MCContext::Allocate()` is equivalent to a
leak, since it only gets freed at process teardown.
In my heap profile, this patch drops memory usage of
`DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB
(2.7%) at peak memory. Some of that must be noise from `SmallVector`
(or other) allocations -- peak memory only dropped from 1160 MB down to
1100 MB -- but this nevertheless shaves 5% off the top.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236629 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When computing branch weights in BPI, we used to disallow branches with
weight 0. This is a minor nuisance, because a branch with weight 0 is
different to "don't have information". In the context of
instrumentation, it may mean "never executed", in the context of
sampling, it means "never or seldom executed".
In allowing 0 weight branches, I ran into issues with the switch
expansion code in selection DAG. It is currently hardwired to not handle
branches with weight 0. To maintain the current behaviour, I changed it
to use 1 when it finds 0, but perhaps the algorithm needs changes to
tolerate branches with weight zero.
Reviewers: hansw
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9533
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236617 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size.
Test Plan:
CodeGen for X86 test included.
Also one incorrect regression test fixed.
Reviewers: qcolombet, chandlerc, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D9250
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236584 91177308-0d34-0410-b5e6-96231b3b80d8
For accessors in the `Statepoint` class, use symbolic constants for
offsets into the argument vector instead of literals. This makes the
code intent clearer and simpler to change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236566 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We default the value argument to nullptr. The only use of the value is
in diagnosePossiblyInvalidConstraint and that seems to be resilient to
it being nullptr.
Reviewers: atrick, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9479
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236555 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The exported class will be used in later change, in
StatepointLowering.cpp. It is still internal to SelectionDAG (not
exported via include/).
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9478
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236554 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently this does not change anything, but change will be used in a
later change to StatepointLowering.cpp
Reviewers: reames, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236553 91177308-0d34-0410-b5e6-96231b3b80d8
Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors.
Original commit message follows:
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236550 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
...which split the existing nsw / nuw / exact flags and FMF
into their own struct.
There are 2 structural changes here:
1. The main diff is that we're preparing to extend the optimization
flags to affect more than just binary SDNodes. Eg, IR intrinsics
( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes
that don't even exist in IR such as FMA, FNEG, etc.
2. The other change is that we're actually copying the FP fast-math-flags
from the IR instructions to SDNodes.
Differential Revision: http://reviews.llvm.org/D8900
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236546 91177308-0d34-0410-b5e6-96231b3b80d8
Note, this is a reapplication of r236515 with a fix to not assert on non-register operands, but instead only handle them until the subsequent commit. Original commit message follows.
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236538 91177308-0d34-0410-b5e6-96231b3b80d8
This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert
the element number from getVectorIdxTy() to PtrTy before doing pointer
arithmetic on it. This is needed on z, where element numbers are i32
but pointers are i64.
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236530 91177308-0d34-0410-b5e6-96231b3b80d8
For little-endian, the function would convert (extract_vector_elt (load X), Y)
to X + Y*sizeof(elt). For big-endian it would instead use
X + sizeof(vec) - Y*sizeof(elt). The big-endian case wasn't right since
vector index order always follows memory/array order, even for big-endian.
(Note that the current handling has to be wrong for Y==0 since it would
access beyond the end of the vector.)
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236529 91177308-0d34-0410-b5e6-96231b3b80d8
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal.
E.g. it would load a v4i8 as an i32 if i32 was legal.
This patch extends that behavior to promoted integers as well as legal ones.
If the integer type for the full vector width is TypePromoteInteger,
the element type is going to be TypePromoteInteger too, and it's still
better to use a single promoting load or truncating store rather than N
individual promoting loads or truncating stores. E.g. if you have a v2i8
on a target where i16 is promoted to i32, it's better to load the v2i8 as
an i16 rather than load both i8s individually.
Original patch by Richard Sandiford.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236528 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit 963cdbccf6e5578822836fd9b2ebece0ba9a60b7 (ie r236514)
This is to get the bots green while i investigate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236518 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515).
This is to get the bots green while i investigate the failures.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236517 91177308-0d34-0410-b5e6-96231b3b80d8
A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks.
These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register.
Reviewed by Matthias Braun and Quentin Colombet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236515 91177308-0d34-0410-b5e6-96231b3b80d8
The code was basically the same here already. Just added an out parameter for a vector of seen defs so that UpdatePredRedefs can call StepForward first, then do its own post processing on the seen defs.
Will be used in the next commit to also handle regmasks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236514 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r236360.
This change exposed a bug in WinEHPrepare by opting win32 code into EH
preparation. We already knew that WinEHPrepare has bugs, and is the
status quo for x64, so I don't think that's a reason to hold off on this
change. I disabled exceptions in the sanitizer tests in r236505 and an
earlier revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236508 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces a new pass that computes the safe point to insert the
prologue and epilogue of the function.
The interest is to find safe points that are cheaper than the entry and exits
blocks.
As an example and to avoid regressions to be introduce, this patch also
implements the required bits to enable the shrink-wrapping pass for AArch64.
** Context **
Currently we insert the prologue and epilogue of the method/function in the
entry and exits blocks. Although this is correct, we can do a better job when
those are not immediately required and insert them at less frequently executed
places.
The job of the shrink-wrapping pass is to identify such places.
** Motivating example **
Let us consider the following function that perform a call only in one branch of
a if:
define i32 @f(i32 %a, i32 %b) {
%tmp = alloca i32, align 4
%tmp2 = icmp slt i32 %a, %b
br i1 %tmp2, label %true, label %false
true:
store i32 %a, i32* %tmp, align 4
%tmp4 = call i32 @doSomething(i32 0, i32* %tmp)
br label %false
false:
%tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ]
ret i32 %tmp.0
}
On AArch64 this code generates (removing the cfi directives to ease
readabilities):
_f: ; @f
; BB#0:
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
LBB0_2: ; %false
mov sp, x29
ldp x29, x30, [sp], #16
ret
With shrink-wrapping we could generate:
_f: ; @f
; BB#0:
cmp w0, w1
b.ge LBB0_2
; BB#1: ; %true
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #16 ; =16
stur w0, [x29, #-4]
sub x1, x29, #4 ; =4
mov w0, wzr
bl _doSomething
add sp, x29, #16 ; =16
ldp x29, x30, [sp], #16
LBB0_2: ; %false
ret
Therefore, we would pay the overhead of setting up/destroying the frame only if
we actually do the call.
** Proposed Solution **
This patch introduces a new machine pass that perform the shrink-wrapping
analysis (See the comments at the beginning of ShrinkWrap.cpp for more details).
It then stores the safe save and restore point into the MachineFrameInfo
attached to the MachineFunction.
This information is then used by the PrologEpilogInserter (PEI) to place the
related code at the right place. This pass runs right before the PEI.
Unlike the original paper of Chow from PLDI’88, this implementation of
shrink-wrapping does not use expensive data-flow analysis and does not need hack
to properly avoid frequently executed point. Instead, it relies on dominance and
loop properties.
The pass is off by default and each target can opt-in by setting the
EnableShrinkWrap boolean to true in their derived class of TargetPassConfig.
This setting can also be overwritten on the command line by using
-enable-shrink-wrap.
Before you try out the pass for your target, make sure you properly fix your
emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not
necessarily the entry block.
** Design Decisions **
1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but
for debugging and clarity I thought it was best to have its own file.
2. Right now, we only support one save point and one restore point. At some
point we can expand this to several save point and restore point, the impacted
component would then be:
- The pass itself: New algorithm needed.
- MachineFrameInfo: Hold a list or set of Save/Restore point instead of one
pointer.
- PEI: Should loop over the save point and restore point.
Anyhow, at least for this first iteration, I do not believe this is interesting
to support the complex cases. We should revisit that when we motivating
examples.
Differential Revision: http://reviews.llvm.org/D9210
<rdar://problem/3201744>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236507 91177308-0d34-0410-b5e6-96231b3b80d8