Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.
The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220959 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases. This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled. The changes are analogous to those in
the previous patch. I've added a new variant to vsx.ll to test the
code generation.
(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220395 91177308-0d34-0410-b5e6-96231b3b80d8
The tests test/CodeGen/Generic/select-cc.ll and
test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The
problem is that the lowering logic for the SELECT and SELECT_CC
operations doesn't currently support the VSX registers. This patch
fixes that.
In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
for other register classes. Similar pseudos are added in
PPCInstrVSX.td (they must be there, because the "vsrc" register class
definition appears there) for the VSRC register class. The
SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.
The rest of the patch just adds logic for SELECT_VSRC wherever similar
logic appears for SELECT_VRRC.
There are no new test cases because the existing tests above test
this, along with a variant in test/CodeGen/PowerPC/vsx.ll.
After discussion with Hal, a future patch will add similar _VSFRC
variants to override f64 type handling (currently using F8RC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220385 91177308-0d34-0410-b5e6-96231b3b80d8
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This
patch adds that support.
As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.
In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled. Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.
A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests. I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature. For now, that simply tests the unaligned load/store
behavior.
This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220047 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.
For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
store atomic i8 42, i8* %mem unordered, align 1
ret void
}
```
from
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
rlwinm r2, r3, 3, 27, 28
li r4, 42
xori r5, r2, 24
rlwinm r2, r3, 0, 0, 29
li r3, 255
slw r4, r4, r5
slw r3, r3, r5
and r4, r4, r3
LBB4_1: ; =>This Inner Loop Header: Depth=1
lwarx r5, 0, r2
andc r5, r5, r3
or r5, r4, r5
stwcx. r5, 0, r2
bne cr0, LBB4_1
; BB#2:
blr
```
into
```
_store_i8_unordered: ; @store_i8_unordered
; BB#0:
li r2, 42
stb r2, 0(r3)
blr
```
which looks like a pretty clear win to me.
Test Plan:
fixed the tests + new test for indexed accesses + make check-all
Reviewers: jfb, wschmidt, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218922 91177308-0d34-0410-b5e6-96231b3b80d8
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.
I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.
Test Plan: new test + make check-all
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely a plumbing patch. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.
Next steps:
The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.
Differential Revision: http://reviews.llvm.org/D5425
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218219 91177308-0d34-0410-b5e6-96231b3b80d8
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
Patch by Olivier Sallenave, thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218120 91177308-0d34-0410-b5e6-96231b3b80d8
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reinstates commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216982 91177308-0d34-0410-b5e6-96231b3b80d8
isPow2DivCheap
That name doesn't specify signed or unsigned.
Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
srl/add/sra
is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
sra/srl/add/sra
That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D5010
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216237 91177308-0d34-0410-b5e6-96231b3b80d8
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).
Fixes PR20280.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215795 91177308-0d34-0410-b5e6-96231b3b80d8
On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
frame, but are not immutable -- the pointer value is directly available to the
higher-level code as the address of the argument, and the value of the byval
argument can be modified at the IR level.
This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in
a follow-up commit, its test case will cover this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215793 91177308-0d34-0410-b5e6-96231b3b80d8
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).
This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215512 91177308-0d34-0410-b5e6-96231b3b80d8
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215154 91177308-0d34-0410-b5e6-96231b3b80d8
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215111 91177308-0d34-0410-b5e6-96231b3b80d8
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments. The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.
Reviewed by Ulrich Weigand.
This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214923 91177308-0d34-0410-b5e6-96231b3b80d8
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR. Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.
This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end. Several test cases are also modified to reflect the
now-correct LLVM IR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214800 91177308-0d34-0410-b5e6-96231b3b80d8
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments. As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.
This fixes another regression in llvmpipe when vector support is
enabled.
Reviewed by Bill Schmidt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214718 91177308-0d34-0410-b5e6-96231b3b80d8
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU). But these are not
implemented by the back-end, so we failed to recognize the insn.
Fixed by marking MULHU/MULHS as Expand for vector types.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214716 91177308-0d34-0410-b5e6-96231b3b80d8
This patch refactors code generation of vector comparisons.
This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.
Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison. Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).
Reviewed by Bill Schmidt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214714 91177308-0d34-0410-b5e6-96231b3b80d8
Found by inspection while looking at PR20280: code would mark slots
in the parameter save area where a byval parameter is passed as
"immutable". This is not correct since code is allowed to modify
byval parameters in place in the parameter save area.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214517 91177308-0d34-0410-b5e6-96231b3b80d8
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.
When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:
The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.
A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.
To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.
Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214481 91177308-0d34-0410-b5e6-96231b3b80d8
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214469 91177308-0d34-0410-b5e6-96231b3b80d8
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8
Rename to allowsMisalignedMemoryAccess.
On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214055 91177308-0d34-0410-b5e6-96231b3b80d8
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle. This was
previously missed in the vector LE implementation.
There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option: "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.
I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.
This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4. A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem. I've verified this fix takes care of that issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213915 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds infrastructure support for passing array types
directly. These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type). The details of the
array type being used inform the back-end about ABI-relevant
properties. Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
GPRs/stack slots (for float / vector / integer element types,
respectively)
- what the alignment requirements of the parameter are when passed in
GPRs/stack slots (8 for float / 16 for vector / the element type
size for integer element types) -- this corresponds to the
"byval align" field
Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
or vector aggregates in FPRs and VRs (this is similar to the ARM
homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
that fit fully in registers without using the "byval" mechanism
The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs. This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs. Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.
Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs. The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.
This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST). That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.
The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213493 91177308-0d34-0410-b5e6-96231b3b80d8
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
present in certain cases (it remains mandatory in functions with
variable arguments, and functions that have any parameter that is
passed on the stack)
The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional
This latter part requires some extra explanation: Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily. This can be addressed by a follow-on
optimization patch.
On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.
To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213490 91177308-0d34-0410-b5e6-96231b3b80d8
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention. In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions). Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.
In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it. Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.
This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
anywhere (this is done EmitFunctionBodyStart; note that this cannot
be done in EmitFunctionBodyStart because the global entry point
prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
to tell the linker to create the appropriate version of PLT stubs.
Reviewed by Hal Finkel.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213489 91177308-0d34-0410-b5e6-96231b3b80d8
When handling an incoming byval argument, we need to possibly write
incoming registers to the stack in order to create an on-stack image
of the parameter, so we can return its address to common code.
This currently uses CreateFixedObject to access the parts of the
parameter save area where the argument is (or needs to be) stored.
However, sometimes we need to access multiple parts of that area,
e.g. to write multiple registers. The code currently uses a new
CreateFixedObject call for each of these accesses, resulting in
a patchwork of overlapping (fixed) stack objects.
This doesn't really matter in the case of fixed objects, since
any access to those turns into a fixed stackpointer + offset
address anyway. However, with the upcoming ELFv2 patches, we
may actually need to place an incoming argument into our *own*
stack frame instead of the caller's. This means we need to use
CreateStackObject instead, and we cannot have multiple overlapping
instances of those.
To make the rest of the argument handling code work equally in
both situations, this patch refactors it to always use just a
single call to CreateFixedObject, and access parts of that object
as required using address arithmetic. This way, we can in a future
patch substitute CreateStackObject without further changes.
No change to generated code intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213483 91177308-0d34-0410-b5e6-96231b3b80d8
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node. This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate. This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.
The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213482 91177308-0d34-0410-b5e6-96231b3b80d8
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.
Patch by Justin Hibbits!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213427 91177308-0d34-0410-b5e6-96231b3b80d8
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).
This is in line with the common-code and ARM back-end change
implemented in r212433.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212547 91177308-0d34-0410-b5e6-96231b3b80d8
Arguments passed as "byval align" should get the specified alignment
in the parameter save area. There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.
This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
of argument processing, using a new helper routine
CalculateStackSlotAlignment (this avoids having to update
PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
MinReservedArea must equal ArgOffset after argument processing,
so there's no use in computing it twice.
[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212476 91177308-0d34-0410-b5e6-96231b3b80d8
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212135 91177308-0d34-0410-b5e6-96231b3b80d8
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall). This is actually a
good thing, since the details of the parameter save area are ABI specific.
With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.
In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers. Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.
No change in generate code expected.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211497 91177308-0d34-0410-b5e6-96231b3b80d8