Rewrite the pattern match code to work also with Values instead with
Instructions only. Also remove the no longer need matcher (m_Instruction).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223797 91177308-0d34-0410-b5e6-96231b3b80d8
This optimization transforms code like:
bb1:
%0 = icmp ne i32 %a, 0
%1 = icmp ne i32 %b, 0
%or.cond = or i1 %0, %1
br i1 %or.cond, label %TrueBB, label %FalseBB
into a multiple branch instructions like:
bb1:
%0 = icmp ne i32 %a, 0
br i1 %0, label %TrueBB, label %bb2
bb2:
%1 = icmp ne i32 %b, 0
br i1 %1, label %TrueBB, label %FalseBB
This optimization is already performed by SelectionDAG, but not by FastISel.
FastISel cannot perform this optimization, because it cannot generate new
MachineBasicBlocks.
Performing this optimization at CodeGenPrepare time makes it available to both -
SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be
removed. There are currenty a few differences in codegen for X86 and PPC, so
this commmit only enables it for FastISel.
Reviewed by Jim Grosbach
This fixes rdar://problem/19034919.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223786 91177308-0d34-0410-b5e6-96231b3b80d8
The aggressive anti-dep breaker, used by the PowerPC backend during post-RA
scheduling (but is available to all targets), did not handle early-clobber MI
operands (at all). When constructing the list of available registers for the
replacement of some def operand, check the using instructions, and remove
registers assigned to early-clobbered defs from the set.
Fixes PR21452.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223727 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes an issue with ScheduleDAGInstrs::buildSchedGraph
where stores without an underlying object would not be added
as a predecessor to the current BarrierChain.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223717 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.
The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.
Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.
Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223672 91177308-0d34-0410-b5e6-96231b3b80d8
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.
SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.
Differential Revision: http://reviews.llvm.org/D6510
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223566 91177308-0d34-0410-b5e6-96231b3b80d8
Reverting this because, while it fixes the problem in the reduced test case, it
does not fix the problem in the full test case from the bug report.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223442 91177308-0d34-0410-b5e6-96231b3b80d8
The scheduling dependency graph is built bottom-up within each scheduling
region, and ScheduleDAGInstrs::addPhysRegDeps is called to add output/anti
dependencies, based on physical registers, to the SUs for instructions
based on those that come before them.
In the test case, we start before post-RA scheduling with a block that looks
like this:
...
INLINEASM <...
andc $0,$0,$2
stdcx. $0,0,$3
bne- 1b
> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef-ec:G8RC], %X6<earlyclobber,def,dead>, $1:[mem], %X3<kill>, $2:[reguse:G8RC], %X5<kill>, $3:[reguse:G8RC], %X3, $4:[mem], %X3, $5:[clobber], %CC<earlyclobber,imp-def,dead>, <<badref>>
...
%X4<def,dead> = ANDIo8 %X4<kill>, 1, %CR0<imp-def,dead>, %CR0GT<imp-def>
...
%R29<def> = ISEL %R3<undef>, %R4<kill>, %CR0GT<kill>
where it is relevant that %CC is an alias to %CR0, and that %CR0GT is a
subregister of %CR0. However, for post-RA scheduling, no dependency was added
to prevent the INLINEASM from being scheduled in between the ANDIo8 and the
ISEL (which communicate via the %CR0GT register).
In ScheduleDAGInstrs::addPhysRegDeps, when called for the %CC operand, we'd
iterate over all of its aliases (which include %CC itself and also %CR0), and
look for previously-encountered defs of those registers. We'd find the ANDIo8,
but decide not to add a dependency between the INLINEASM and the ANDIo8 because
both the INLINEASM's def of %CC is dead, and also the ANDIo8 def of %CR0 is
dead. This ignores, however, that ANDIo8 has a non-dead def of %CR0GT, a
subregister of %CR0, and thus a dependency still must exist.
To fix this problem, when calling registerDefIsDead on the SU with the def, we
also check all subregisters for possible non-dead defs, and add the dependency
if any are found.
Fixes PR21742.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223440 91177308-0d34-0410-b5e6-96231b3b80d8
no DWARF register number mapping, or if the register was a virtual
register that was never materialized. Previously, we would just emit a
bogus location, after this patch we don't emit a location at all by
doing an early exit.
After my bugfix in r223401 today, this doesn't actually happen on any
target that I tested this with, but it's still preferable to make the
possibility of a failure explicit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223428 91177308-0d34-0410-b5e6-96231b3b80d8
eliminating all uses of a vreg, update any DBG_VALUE describing that vreg
to point to the rematerialized register instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223401 91177308-0d34-0410-b5e6-96231b3b80d8
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:
x = computation
if () {} else {}
use x
The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.
Modified test added in r204522, due to one spill less present.
Minor fixes in comments.
Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223350 91177308-0d34-0410-b5e6-96231b3b80d8
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
Differential Revision: http://reviews.llvm.org/D6407
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223349 91177308-0d34-0410-b5e6-96231b3b80d8
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223348 91177308-0d34-0410-b5e6-96231b3b80d8
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223323 91177308-0d34-0410-b5e6-96231b3b80d8
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.
This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.
Fixes PR21700.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223317 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Ben Gamari!
This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute. There are a two primary usecases
that these attributes aim to serve,
1. Function prologue sigils
2. Function hot-patching: Enable the user to insert `nop` operations
at the beginning of the function which can later be safely replaced
with a call to some instrumentation facility
3. Runtime metadata: Allow a compiler to insert data for use by the
runtime during execution. GHC is one example of a compiler that
needs this functionality for its tables-next-to-code functionality.
Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.
Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.
The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.
The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.
References
----------
This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html
Test Plan: testsuite
Differential Revision: http://reviews.llvm.org/D6454
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223189 91177308-0d34-0410-b5e6-96231b3b80d8
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).
This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).
Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223161 91177308-0d34-0410-b5e6-96231b3b80d8
This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.
With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.
I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.
During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.
In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principal, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.
Reviewed by: atrick, ributzka
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223137 91177308-0d34-0410-b5e6-96231b3b80d8
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
subs ... %NZCV<imp-def> <- CSMI
csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore
subs ... %NZCV<imp-def> <- MI, to be eliminated
csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.
Also, add an exhaustive testcase for the motivating example.
Reviewed by: Juergen Ributzka <juergen@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223133 91177308-0d34-0410-b5e6-96231b3b80d8
This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.)
The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point.
Reviewed by: atrick, ributzka
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223085 91177308-0d34-0410-b5e6-96231b3b80d8
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223059 91177308-0d34-0410-b5e6-96231b3b80d8
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223049 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222936 91177308-0d34-0410-b5e6-96231b3b80d8
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222632 91177308-0d34-0410-b5e6-96231b3b80d8
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222536 91177308-0d34-0410-b5e6-96231b3b80d8
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222522 91177308-0d34-0410-b5e6-96231b3b80d8
- Show "Considering..." message after flipping so you actually see the final
destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
a list of which register got merged with which.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222382 91177308-0d34-0410-b5e6-96231b3b80d8
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
Differential Revision: http://reviews.llvm.org/D5699
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222340 91177308-0d34-0410-b5e6-96231b3b80d8
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222334 91177308-0d34-0410-b5e6-96231b3b80d8