While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.
This is in preparation for porting this analysis to the new pass
manager.
No functionality changed, and updates inbound for Clang and Polly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226078 91177308-0d34-0410-b5e6-96231b3b80d8
Fill out our support for the floating-point status and control register
instructions (mcrfs and friends). As it turns out, these are necessary for
compiling src/test/harness_fp.h in TBB for PowerPC.
Thanks to Raf Schietekat for reporting the issue!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226070 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Kit Barton.
Support for the ICBT instruction is currently present, but limited to
embedded processors. This change adds a new FeatureICBT that can be used
to identify whether the ICBT instruction is available on a specific processor.
Two new tests are added:
* Positive test to ensure the icbt instruction is present when using
-mcpu=pwr8
* Negative test to ensure the icbt instruction is not generated when
using -mcpu=pwr7
Both test cases use the Prefetch opcode in LLVM. They are based on the
ppc64-prefetch.ll test case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226033 91177308-0d34-0410-b5e6-96231b3b80d8
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225974 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.
Original commit message:
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225909 91177308-0d34-0410-b5e6-96231b3b80d8
This was already done in clang, this commit now uses the integrated
assembler as default when using LLVM tools directly.
A number of test cases using inline asm had to be adapted, either by
updating the expected output, or by using -no-integrated-as (for such
tests that deliberately use an invalid instruction in inline asm).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225819 91177308-0d34-0410-b5e6-96231b3b80d8
Reverting this while I investiage buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225811 91177308-0d34-0410-b5e6-96231b3b80d8
This commit does two things:
1. Refactors PPCFastISel to use more of the common infrastructure for call
lowering (this lets us take advantage of this common code for lowering some
common intrinsics, stackmap/patchpoint among them).
2. Adds support for stackmap/patchpoint lowering. For the most part, this is
very similar to the support in the AArch64 target, with the obvious differences
(different registers, NOP instructions, etc.). The test cases are adapted
from the AArch64 test cases.
One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).
StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225808 91177308-0d34-0410-b5e6-96231b3b80d8
We really need a separate 64-bit version of this instruction so that it can be
marked as clobbering LR8 (instead of just LR). No change in functionality
(although the verifier might be slightly happier), however, it is required for
stackmap/patchpoint support. Thus, this will be covered by stackmap test cases
once those are added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225804 91177308-0d34-0410-b5e6-96231b3b80d8
For registers that have DWARF numbers (like CA, which is really part of XER),
add them. Also, RM is not an SPR, and the declaration hack (where it is
declared as an SPR with an arbitrary number) is not needed, so just declare it
as a register.
NFC; although CA's register number will be needed when stackmap/patchpoint
support is added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225800 91177308-0d34-0410-b5e6-96231b3b80d8
One is that AArch64 has additional restrictions on when local relocations can
be used. We have to take those into consideration when deciding to put a L
symbol in the symbol table or not.
The other is that ld64 requires the relocations to cstring to use linker
visible symbols on AArch64.
Thanks to Michael Zolotukhin for testing this!
Remove doesSectionRequireSymbols.
In an assembly expression like
bar:
.long L0 + 1
the intended semantics is that bar will contain a pointer one byte past L0.
In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.
The solution used in ELF to use relocation with symbols if there is a non-zero
addend.
In MachO before this patch we would just keep all symbols in some sections.
This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.
This patch implements the non-zero addend logic for MachO too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225644 91177308-0d34-0410-b5e6-96231b3b80d8
Looking at r225438 inspired me to see how the PowerPC backend handled the
situation (calling a bitcasted TLS global), and it turns out we also produced
an error (cannot select ...). What it means to "call" something that is not a
function is implementation and platform specific, but in the name of doing
something (besides crashing), this makes sure we do what GCC does (treat all
such calls as calls through a function pointer -- meaning that the pointer is
assumed, as is the convention on PPC, to point to a function descriptor
structure holding the actual code address along with the function's TOC pointer
and environment pointer). As GCC does, we now do the same for calling regular
(non-TLS) non-function globals too.
I'm not sure whether this is the most useful way to define the behavior, but at
least we won't be alone.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225617 91177308-0d34-0410-b5e6-96231b3b80d8
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.
Fixes the initial complaint associated with PR22120.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225584 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.
Test Plan: Tests updated
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6906
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225573 91177308-0d34-0410-b5e6-96231b3b80d8
Now that the way that the partial unrolling threshold for small loops is used
to compute the unrolling factor as been corrected, a slightly smaller threshold
is preferable. This is expected; other targets may need to re-tune as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225566 91177308-0d34-0410-b5e6-96231b3b80d8
The P7 benefits from not have really-small loops so that we either have
multiple dispatch groups in the loop and/or the ability to form more-full
dispatch groups during scheduling. Setting the partial unrolling threshold to
44 seems good, empirically, for the P7. Compared to using no late partial
unrolling, this yields the following test-suite speedups:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding
-66.3253% +/- 24.1975%
SingleSource/Benchmarks/Misc-C++/oopack_v1p8
-44.0169% +/- 29.4881%
SingleSource/Benchmarks/Misc/pi
-27.8351% +/- 12.2712%
SingleSource/Benchmarks/Stanford/Bubblesort
-30.9898% +/- 22.4647%
I've speculatively added a similar setting for the P8. Also, I've noticed that
the unroller does not quite calculate the unrolling factor correctly for really
tiny loops because it neglects to account for the fact that not every loop body
replicant contains an ending branch and counter increment. I'll fix that later.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225522 91177308-0d34-0410-b5e6-96231b3b80d8
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225493 91177308-0d34-0410-b5e6-96231b3b80d8
MachineLICM uses a callback named hasLowDefLatency to determine if an
instruction def operand has a 'low' latency. If all relevant operands have a
'low' latency, the instruction is considered too cheap to hoist out of loops
even in low-register-pressure situations. On PowerPC cores, both the embedded
cores and the others, there is no reason to believe that this is a good choice:
all instructions have a cost inside a loop, and hoisting them when not limited
by register pressure is a reasonable default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225471 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that.
Test Plan: Tests updated.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6876
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225450 91177308-0d34-0410-b5e6-96231b3b80d8
type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225392 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the README.txt entry regarding register allocation of CR logical ops,
and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was
not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM
for clarifying what was intended and highlighting the relevant text in the ISA
specification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225325 91177308-0d34-0410-b5e6-96231b3b80d8
This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin.
Reverting to get the bots green while I track down the source of the changed
behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225311 91177308-0d34-0410-b5e6-96231b3b80d8
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).
This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.
Also, remove a related entry from the README.txt for which we now generate
reasonable code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225301 91177308-0d34-0410-b5e6-96231b3b80d8
Because of how Clang represents structs as arrays (at least on non-Darwin
platforms), and what SROA does, etc. this is no longer a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225251 91177308-0d34-0410-b5e6-96231b3b80d8
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225248 91177308-0d34-0410-b5e6-96231b3b80d8
We no longer generate horrible code for the stated function:
void f(signed char *a, _Bool b, _Bool c) {
signed char t = 0;
if (b) t = *a;
if (c) *a = t;
}
for which we now generate:
.L.f:
andi. 5, 5, 1
cmpldi 1, 4, 0
li 5, 0
beq 1, .LBB0_2
lbz 5, 0(3)
.LBB0_2: # %if.end
bclr 4, 1, 0
stb 5, 0(3)
blr
so we don't need the README.txt entry.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225217 91177308-0d34-0410-b5e6-96231b3b80d8
We now produce the desired code as noted in the README.txt file (no spurious
or). Remove the README entry and improve the regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225214 91177308-0d34-0410-b5e6-96231b3b80d8
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225209 91177308-0d34-0410-b5e6-96231b3b80d8
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225205 91177308-0d34-0410-b5e6-96231b3b80d8
Consider this function from our README.txt file:
int foo(int a, int b) { return (a < b) << 4; }
We now explicitly track CR bits by default, so the comment in the README.txt
about not really having a SETCC is no longer accurate, but we did generate this
somewhat silly code:
cmpw 0, 3, 4
li 3, 0
li 12, 1
isel 3, 12, 3, 0
sldi 3, 3, 4
blr
which generates the zext as a select between 0 and 1, and then shifts the
result by a constant amount. Here we preprocess the DAG in order to fold the
results of operations on an extension of an i1 value into the SELECT_I[48]
pseudo instruction when the resulting constant can be materialized using one
instruction (just like the 0 and 1). This was not implemented as a DAGCombine
because the resulting code would have been anti-canonical and depends on
replacing chained user nodes, which does not fit well into the lowering
paradigm. Now we generate:
cmpw 0, 3, 4
li 3, 0
li 12, 16
isel 3, 12, 3, 0
blr
which is less silly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225203 91177308-0d34-0410-b5e6-96231b3b80d8
The 64-bit semantics of cntlzw are not special, the 32-bit population count is
stored as a 64-bit value in the range [0,32]. As a result, it is always zero
extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a
frontier instruction for the removal of unnecessary zero extensions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225192 91177308-0d34-0410-b5e6-96231b3b80d8
lhbrx and lwbrx not only load their data with byte swapping, but also clear the
upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG
peephole optimization as frontier instructions for the removal of unnecessary
zero extensions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225189 91177308-0d34-0410-b5e6-96231b3b80d8
PPC has an instruction for ctlz with defined zero behavior, and our lowering of
cttz (provided by DAGCombine) is also efficient and branchless, so speculating
these makes sense.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225150 91177308-0d34-0410-b5e6-96231b3b80d8
r225135 added the ability to materialize i64 constants using rotations in order
to reduce the instruction count. Sometimes we can use a rotation only with some
extra masking, so that we take advantage of the fact that generating a bunch of
extra higher-order 1 bits is easy using li/lis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225147 91177308-0d34-0410-b5e6-96231b3b80d8
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing a rotated constant, and then applying the inverse rotation, requires
fewer instructions than the direct method. If so, do that instead.
In r225132, I added support for forming constants using bit inversion. In
effect, this reverts that commit and replaces it with rotation support. The bit
inversion is useful for turning constants that are mostly ones into ones that
are mostly zeros (thus enabling a more-efficient shift-based materialization),
but the same effect can be obtained by using negative constants and a rotate,
and that is at least as efficient, if not more.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225135 91177308-0d34-0410-b5e6-96231b3b80d8
Materializing full 64-bit constants on PPC64 can be expensive, requiring up to
5 instructions depending on the locations of the non-zero bits. Sometimes
materializing the bit-reversed constant, and then flipping the bits, requires
fewer instructions than the direct method. If so, do that instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225132 91177308-0d34-0410-b5e6-96231b3b80d8
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.
To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225117 91177308-0d34-0410-b5e6-96231b3b80d8
Most modern PowerPC cores prefer that functions and loops start on
16-byte-aligned boundaries (*), so instruct block placement, etc. to make this
happen. The branch selector has also been adjusted so account for the extra
nops that might now be inserted before loop headers.
(*) Some cores actually prefer other alignments for small loops, but that will
be addressed in a follow-up commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225115 91177308-0d34-0410-b5e6-96231b3b80d8