Commit Graph

826 Commits

Author SHA1 Message Date
Hal Finkel
9ad0f4907b Cleanup PPC(64) i32 -> float/double conversion
The existing SINT_TO_FP code for i32 -> float/double conversion was disabled
because it relied on broken EXTSW_32/STD_32 instruction definitions. The
original intent had been to enable these 64-bit instructions to be used on CPUs
that support them even in 32-bit mode.  Unfortunately, this form of lying to
the infrastructure was buggy (as explained in the FIXME comment) and had
therefore been disabled.

This re-enables this functionality, using regular DAG nodes, but only when
compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead)
are removed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178438 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-31 01:58:02 +00:00
Hal Finkel
0882fd6c4f Implement FRINT lowering on PPC using frin
Like nearbyint, rint can be implemented on PPC using the frin instruction. The
complication comes from the fact that rint needs to set the FE_INEXACT flag
when the result does not equal the input value (and frin does not do that). As
a result, we use a custom inserter which, after the rounding, compares the
rounded value with the original, and if they differ, explicitly sets the XX bit
in the FPSCR register (which corresponds to FE_INEXACT).

Once LLVM has better modeling of the floating-point environment we should be
able to (often) eliminate this extra complexity.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178362 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-29 19:41:55 +00:00
Hal Finkel
f5d5c43460 Add PPC FP rounding instructions fri[mnpz]
These instructions are available on the P5x (and later) and on the A2. They
implement the standard floating-point rounding operations (floor, trunc, etc.).
One caveat: frin (round to nearest) does not implement "ties to even", and so
is only enabled in fast-math mode.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178337 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-29 08:57:48 +00:00
Hal Finkel
af0d148b20 Specify CPUs on the PPC bswap-load-store test
Otherwise, the CHECK-NOT's might trigger depending on the host's CPU.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178287 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-28 20:35:18 +00:00
Hal Finkel
2544f221c5 Only enable 64-bit bswap DAG combines for PPC64
Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were
added for bswap with load/store. This is because these combines are really only
valid in 64-bit mode, regardless of the CPU (and this was not being checked).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178286 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-28 20:23:46 +00:00
Hal Finkel
efdd4673d6 Add the PPC64 ldbrx/stdbrx instructions
These are 64-bit load/store with byte-swap, and available on the P7 and the A2.
Like the similar instructions for 16- and 32-bit words, these are matched in the
target DAG-combine phase against load/store-bswap pairs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178276 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-28 19:25:55 +00:00
Hal Finkel
c53ab4d77f Add the PPC64 popcntd instruction
PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and
tell TTI about it so that popcount-loop recognition will know about it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178233 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-28 13:29:47 +00:00
Hal Finkel
d957f957ee Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions
There were a few places where kill flags were not being set correctly, and
where 32-bit instruction variants were being used with 64-bit registers. After
r178180, this code was being triggered causing llc to assert.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178220 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-28 03:38:16 +00:00
Hal Finkel
32e12df253 Print PPC ZERO as 0 (not r0) even on Darwin
It seems that the Darwin PPC assembler requires r0 to be written as 0 when it
means 0 (at least in lwarx/stwcx.). Fixes PR15605.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178142 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-27 13:20:52 +00:00
Hal Finkel
240b7f3324 Allocate r0 on PPC
The R0 register can now be allocated because instructions
that cannot use R0 as a GPR have been appropriately marked.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178123 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-27 06:52:27 +00:00
Bill Schmidt
37ef805818 Remove the link register from the GPR classes on PowerPC.
Some implementation detail in the forgotten past required the link
register to be placed in the GPRC and G8RC register classes.  This is
just wrong on the face of it, and causes several extra intersection
register classes to be generated.  I found this was having evil
effects on instruction scheduling, by causing the wrong register class
to be consulted for register pressure decisions.

No code generation changes are expected, other than some minor changes
in instruction order.  Seven tests in the test bucket required minor
tweaks to adjust to the new normal.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178114 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-27 02:40:14 +00:00
Hal Finkel
b7e11e400d Don't spill PPC VRSAVE on non-Darwin (even in SjLj)
As Bill Schmidt pointed out to me, only on Darwin do we need to spill/restore
VRSAVE in the SjLj code. For non-Darwin, don't spill/restore VRSAVE (and I've
added some asserts to make sure that we're not).

As it turns out, we're not currently handling the Darwin case correctly (I've
added a FIXME in the test case). I've tried adding various implied register
definitions/uses to force the spill without success, so I'll need to address
this later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178096 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-27 00:02:20 +00:00
Hal Finkel
01f99d29c3 Use multiple virtual registers in PPC CR spilling
Now that the register scavenger can support multiple spill slots, and PEI can
use virtual-register-based scavenging for multiple simultaneous registers, we
can use a virtual register for the transfer register in the CR spilling code.

This should eliminate the last place (outside of the prologue/epilogue) where
we depend on the unconditional availability of the r0 register. We will soon be
able to allocate it (in a somewhat restricted sense) as a GPR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178060 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-26 18:57:22 +00:00
Hal Finkel
3ea1b064a0 Fix a register-class comparison bug in PPCCTRLoops
Thanks to Jakob for isolating the underlying problem from the
test case in r177423. The original commit had introduced
asymmetric copy operations, but these turned out to be a work-around
to the real problem (the use of == instead of hasSubClassEq in PPCCTRLoops).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177679 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-21 23:23:34 +00:00
Hal Finkel
7ee74a663a Implement builtin_{setjmp/longjmp} on PPC
This implements SJLJ lowering on PPC, making the Clang functions
__builtin_{setjmp/longjmp} functional on PPC platforms. The implementation
strategy is similar to that on X86, with the exception that a branch-and-link
variant is used to get the right jump address. Credit goes to Bill Schmidt for
suggesting the use of the unconditional bcl form (instead of the regular bl
instruction) to limit return-address-cache pollution.

Benchmarking the speed at -O3 of:

static jmp_buf env_sigill;

void foo() {
                __builtin_longjmp(env_sigill,1);
}

main() {
	...

        for (int i = 0; i < c; ++i) {
                if (__builtin_setjmp(env_sigill)) {
                        goto done;
                } else {
                        foo();
                }

done:;
        }

	...
}

vs. the same code using the libc setjmp/longjmp functions on a P7 shows that
this builtin implementation is ~4x faster with Altivec enabled and ~7.25x
faster with Altivec disabled. This comparison is somewhat unfair because the
libc version must also save/restore the VSX registers which we don't yet
support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177666 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-21 21:37:52 +00:00
David Blaikie
ebb5183a2f Remove unused field in DISubprogram
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177661 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-21 20:28:52 +00:00
Hal Finkel
10f7f2a222 Add support for spilling VRSAVE on PPC
Although there is only one Altivec VRSAVE register, it is a member of
a register class, and we need the ability to spill it. Because this
register is normally callee-preserved and handled by special code this
has never before been necessary. However, this capability will be required by
a forthcoming commit adding SjLj support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177654 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-21 19:03:21 +00:00
Hal Finkel
e9cc0a09ae Correct PPC FRAMEADDR lowering using a pseudo-register
The old code used to lower FRAMEADDR tried to replicate the logic in the real
frame-lowering code that determines whether or not the frame pointer (r31) will
be used. When it seemed as through the frame pointer would not be used, the
stack pointer (r1) was used instead. Unfortunately, because the stack size is
not yet known, this does not work. Instead, this change introduces new
always-reserved pseudo-registers (FP and FP8) that are replaced during prologue
insertion with the real frame-pointer register (either r1 or r31).

It is important that this intrinsic always return a valid frame address because
it is used by Clang to store the frame address as part of code generation for
__builtin_setjmp.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177653 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-21 19:03:19 +00:00
David Blaikie
404ecce890 Remove unused field in DICompileUnit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177590 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-20 22:34:33 +00:00
Hal Finkel
7ab1e60133 Add a comment to the CodeGen/PowerPC/asym-regclass-copy.ll test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177434 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 20:22:32 +00:00
Ulrich Weigand
5882e3d828 Rewrite pre-increment store patterns to use standard memory operands.
Currently, pre-increment store patterns are written to use two separate
operands to represent address base and displacement:

  stwu $rS, $ptroff($ptrreg)

This causes problems when implementing the assembler parser, so this
commit changes the patterns to use standard (complex) memory operands
like in all other memory access instruction patterns:

  stwu $rS, $dst

To still match those instructions against the appropriate pre_store
SelectionDAG nodes, the patch uses the new feature that allows a Pat
to match multiple DAG operands against a single (complex) instruction
operand.

Approved by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177429 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 19:52:04 +00:00
Hal Finkel
a548afc98f Prepare to make r0 an allocatable register on PPC
Currently the PPC r0 register is unconditionally reserved. There are two reasons
for this:

 1. r0 is treated specially (as the constant 0) by certain instructions, and so
    cannot be used with those instructions as a regular register.

 2. r0 is used as a temporary register in the CR-register spilling process
    (where, under some circumstances, we require two GPRs).

This change addresses the first reason by introducing a restricted register
class (without r0) for use by those instructions that treat r0 specially. These
register classes have a new pseudo-register, ZERO, which represents the r0-as-0
use. This has the side benefit of making the existing target code simpler (and
easier to understand), and will make it clear to the register allocator that
uses of r0 as 0 don't conflict will real uses of the r0 register.

Once the CR spilling code is improved, we'll be able to allocate r0.

Adding these extra register classes, for some reason unclear to me, causes
requests to the target to copy 32-bit registers to 64-bit registers. The
resulting code seems correct (and causes no test-suite failures), and the new
test case covers this new kind of asymmetric copy.

As r0 is still reserved, no functionality change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177423 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 18:51:05 +00:00
Hal Finkel
ec2e968b7a Cleanup PPC64 unaligned i64 load/store
Remove an accidentally-added instruction definition and add a comment in the
test case. This is in response to a post-commit review by Bill Schmidt.

No functionality change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177404 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 15:23:39 +00:00
Hal Finkel
54e57f8cb7 Don't reserve R31 on PPC64 unless the frame pointer is needed
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177379 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 08:09:38 +00:00
Hal Finkel
9f2518cdc6 Fix a sign-extension bug in PPCCTRLoops
Don't sign extend the immediate value from the OR instruction in
an LIS/OR pair.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177361 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 23:58:28 +00:00
Hal Finkel
08a215c286 Fix PPC unaligned 64-bit loads and stores
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.

In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177338 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 23:00:58 +00:00
Bill Schmidt
09a01e92d0 Change test cases to handle unaligned references.
Hal Finkel recently added code to allow unaligned memory references
for PowerPC.  Two tests were temporarily modified with
-disable-ppc-unaligned to keep them from failing.  This patch adjusts
the expected code generation for the unaligned references.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177328 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 22:12:04 +00:00
David Blaikie
4388d58ff4 Remove unnecessary leading comment characters in lit-only file
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177327 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 22:08:16 +00:00
David Blaikie
e68f0b650e Include '.test' suffix in target specific lit configs that need it
Apparently my final cleanup to use a relevant suffix for these tests before
committing r176831 caused them to stop running since lit wasn't configured to
run tests with that suffix in those directories (why don't we just have a
global suffix list?). So, add the suffix to the relevant directories & fix the
test that has bitrotted over the last week due to my debug info schema changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177315 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 20:31:44 +00:00
Hal Finkel
9887ec31e6 Fix large count and negative constant count handling in PPCCTRLoops
This commit fixes an assert that would occur on loops with large constant counts
(like looping for ((uint32_t) -1) iterations on PPC64). The existing code did
not handle counts that it computed to be negative (asserting instead), but
these can be created with valid inputs.

This bug was discovered by bugpoint while I was attempting to isolate a
completely different problem.

Also, in writing test cases for the negative-count problem, I discovered that
the ori/lsi handling was broken (there was a typo which caused the logic that
was supposed to detect these pairs and extract the iteration count to always
fail). This has now also been corrected (and is covered by one of the new test
cases).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177295 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 17:40:44 +00:00
Hal Finkel
1448d06156 Cleanup initial-value constants in PPCCTRLoops
Because the initial-value constants had not been added to the list
of instructions considered for DCE the resulting code had redundant
constant-materialization instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177294 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 17:40:27 +00:00
Hal Finkel
3249729043 Improve PPC VR (Altivec) register spilling
This change cleans up two issues with Altivec register spilling:

  1. The spilling code was inefficient (using two instructions, and add and a
     load, when just one would do)

  2. The code assumed that r0 would always be available (true for now, but this
     will change)

The new code handles VR spilling just like GPR spills but forced into r+r mode.
As a result, when any VR spills are present, we must now always allocate the
register-scavenger spill slot.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177231 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-17 04:43:44 +00:00
Hal Finkel
ea9b914d2f Remove FIXMEs in PPC test cases related to unaligned loads/stores
As pointed out by Bill in response to r177160, these two FIXMEs
can also be removed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177229 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-16 23:02:31 +00:00
Hal Finkel
2d37f7b979 Enable unaligned memory access on PPC for scalar types
Unaligned access is supported on PPC for non-vector types, and is generally
more efficient than manually expanding the loads and stores.

A few of the existing test cases were using expanded unaligned loads and stores
to test other features (like load/store with update), and for these test cases,
unaligned access remains disabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177160 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15 15:27:13 +00:00
Hal Finkel
044f841267 Protect PPC Altivec patterns with a predicate
In preparation for the addition of other SIMD ISA extensions (such as QPX) we
need to make sure that all Altivec patterns are properly predicated on having
Altivec support.

No functionality change intended (one test case needed to be updated b/c it
assumed that Altivec intrinsics would be supported without enabling Altivec
support).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177152 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15 13:21:21 +00:00
Hal Finkel
0cfb42adb5 Allocate the RS spill slot for any PPC function with spills and a large stack frame
For spills into a large stack frame, the FI-elimination code uses the register
scavenger to obtain a free GPR for use with an r+r-addressed load or store.
When there are no available GPRs, the scavenger gets one by using its spill
slot. Previously, we were not always allocating that spill slot and the RS
would assert when the spill slot was needed.

I don't currently have a small test that triggered the assert, but I've
created a small regression test that verifies that the spill slot is now
added when the stack frame is sufficiently large.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177140 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15 05:06:04 +00:00
Hal Finkel
100a94bc93 Not all PPC functions with a frame pointer need a RS spill slot
We used to add a spill slot for the register scavenger whenever the function
has a frame pointer. This is unnecessarily conservative: We may need the spill
slot for dynamic stack allocations, and functions with dynamic stack
allocations always have a FP, but we might also have a FP for other reasons
(such as the user explicitly disabling frame-pointer elimination), and we don't
necessarily need a spill slot for those functions.

The structsinregs test needed adjustment because it disables FP elimination.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177106 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14 19:34:32 +00:00
David Blaikie
de3077ae6b Refactor filename/directory in DICompileUnit into a DIFile
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176913 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-13 00:01:35 +00:00
David Blaikie
46561ce249 Remove unused "isMain" field from DICompileUnit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176910 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 22:43:04 +00:00
David Blaikie
14891447ff Update debug info test cases with empty SplitDebugFilename field.
This could be 'null' or the empty string, DIDescriptor::getStringField
coalesces the two cases anyway so it's just a matter of legible/efficient
representation.

The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176907 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 22:25:36 +00:00
Jan Wen Voung
4323665bd8 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176873 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 16:27:52 +00:00
Hal Finkel
4d53e7798c Don't reserve R2 on Darwin/PPC
Now that only the register-scavenger version of the CR spilling code remains,
we no longer need the Darwin R2 hack. Darwin can use R0 as a spare register in
any case where the System V ABI uses it (R0 is special architecturally, and so
is reserved under all common ABIs).

A few test cases needed to be updated to reflect the register-allocation changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176868 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 15:18:14 +00:00
David Blaikie
7cf04f3e12 Remove duplicate test contents.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176831 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-11 22:10:14 +00:00
Benjamin Kramer
1cb47b9afe Test case hygiene.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176772 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-09 18:25:40 +00:00
Jan Wen Voung
fa785cb22d Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176733 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-08 22:56:31 +00:00
Bill Schmidt
6539682330 Fix PR15332 (patch by Florian Zeitz).
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176124 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-26 21:28:57 +00:00
Bill Schmidt
3a42989d3d Fix PR15359.
The PowerPC TLS relocation types were not previously added to the
necessary list in MCELFStreamer::fixSymbolsInTLSFixups().  Now they are!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176094 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-26 16:41:03 +00:00
Bill Schmidt
fc7695a653 Fix missing relocation for TLS addressing peephole optimization.
Report and fix due to Kai Nacke.  Testcase update by me.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176029 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-25 16:44:35 +00:00
Bill Schmidt
53b0b0e754 Large code model support for PowerPC.
Large code model is identical to medium code model except that the
addis/addi sequence for "local" accesses is never used.  All accesses
use the addis/ld sequence.

The coding changes are straightforward; most of the patch is taken up
with creating variants of the medium model tests for large model.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175767 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-21 17:12:27 +00:00
Bill Schmidt
421021157e PPCDAGToDAGISel::PostprocessISelDAG()
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.

One optimization is implemented here:  folds to clean up complex
addressing expressions for thread-local storage and medium code
model.  It will also be useful for large code model sequences when
those are added later.  I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point.  DCE is
trivial on the DAG representation.

A typical example of a candidate code sequence in assembly:

   addis 3, 2, globalvar@toc@ha
   addi  3, 3, globalvar@toc@l
   lwz   5, 0(3)

When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:

   addis 3, 2, globalvar@toc@ha
   lwz   5, globalvar@toc@l(3)

Since the addi can in general have multiple uses, we need to only
delete the instruction when the last use is removed.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175697 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-21 00:38:25 +00:00
Bill Schmidt
08addcab19 Stabilize vec_constants.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175683 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 22:43:03 +00:00
Bill Schmidt
abc402886e Additional fixes for bug 15155.
This handles the cases where the 6-bit splat element is odd, converting
to a three-instruction sequence to add or subtract two splats.  With this
fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175663 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 20:41:42 +00:00
Bill Schmidt
49deebb5eb Fix bug 14779 for passing anonymous aggregates [patch by Kai Nacke].
The PPC backend doesn't handle these correctly.  This patch uses logic
similar to that in the X86 and ARM backends to track these arguments
properly.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175635 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 17:31:41 +00:00
Bill Schmidt
b34c79e4bb Fix PR15155: lost vadd/vsplat optimization.
During lowering of a BUILD_VECTOR, we look for opportunities to use a
vector splat.  When the splatted value fits in 5 signed bits, a single
splat does the job.  When it doesn't fit in 5 bits but does fit in 6,
and is an even value, we can splat on half the value and add the result
to itself.

This last optimization hasn't been working recently because of improved
constant folding.  To circumvent this, create a pseudo VADD_SPLAT that
can be expanded during instruction selection.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175632 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 15:50:31 +00:00
Hal Finkel
089a5f8a8c DAGCombiner: Constant folding around pre-increment loads/stores
Previously, even when a pre-increment load or store was generated,
we often needed to keep a copy of the original base register for use
with other offsets. If all of these offsets are constants (including
the offset which was combined into the addressing mode), then this is
clearly unnecessary. This change adjusts these other offsets to use the
new incremented address.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174746 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-08 21:35:47 +00:00
Benjamin Kramer
0d3731478e Disable a couple more vector splat optimizations on PPC.
I didn't see those because the test case used "not grep". FileCheck the test and
XFAIL it, preserving the old optimization, so this can be fixed eventually.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174330 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-04 15:52:32 +00:00
Benjamin Kramer
4969310052 SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors.
This required disabling a PowerPC optimization that did the following:
input:
x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16>
lowered to:
tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8>
x = ADD tmp, tmp

The add now gets folded immediately and we're back at the BUILD_VECTOR we
started from. I don't see a way to fix this currently so I left it disabled
for now.

Fix some trivially foldable X86 tests too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174325 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-04 15:19:18 +00:00
David Blaikie
a8eefc7cc7 Remove the (apparently) unnecessary debug info metadata indirection.
The main lists of debug info metadata attached to the compile_unit had an extra
layer of metadata nodes they went through for no apparent reason. This patch
removes that (& still passes just as much of the GDB 7.5 test suite). If anyone
can show evidence as to why these extra metadata nodes are there I'm open to
reverting this patch & documenting why they're there.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174266 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-02 05:56:24 +00:00
Bill Schmidt
cdc3b74cfb LLVM enablement for some older PowerPC CPUs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174230 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-01 22:59:51 +00:00
Hal Finkel
9a79b320cb PPC QPX requires a 32-byte aligned stack
On systems which support the QPX vector instructions, the stack must be
32-byte aligned.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173993 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-30 23:43:27 +00:00
Hal Finkel
5bb16fdbb3 Add definitions for the PPC a2q core marked as having QPX available
This is the first commit of a large series which will add support for the
QPX vector instruction set to the PowerPC backend. This instruction set is
used on the IBM Blue Gene/Q supercomputers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173973 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-30 21:17:42 +00:00
Bill Schmidt
5ff776bfde This patch addresses bug 15031.
The common code in the post-RA scheduler to break anti-dependencies on the
critical path contained a flaw.  In the reported case, an anti-dependency
between the overlapping registers %X4 and %R4 exists:

	%X29<def> = OR8 %X4, %X4
	%R4<def>, %X3<def,dead,tied3> = LBZU 1, %X3<kill,tied1>

The unpatched code breaks the dependency by replacing %R4 and its uses
with %R3, the first register on the available list.  However, %R3 and
%X3 overlap, so this creates two overlapping definitions on the same
instruction.

The fix is straightforward, preventing selection of a register that
overlaps any other defined register on the same instruction.

The test case is reduced from the bug report, and verifies that we no
longer produce "lbzu 3, 1(3)" when breaking this anti-dependency.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@173706 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-28 18:36:58 +00:00
Bill Schmidt
d69a43a88d Restore reverted test case, this time with REQUIRES: asserts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172747 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-17 19:46:51 +00:00
Bill Schmidt
c087dcfa63 Remove bad test case
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172746 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-17 19:39:36 +00:00
Bill Schmidt
8f4ee4b2a2 This patch fixes PR13626 by providing i128 support in the return
calling convention.  128-bit integers are now properly returned
in GPR3 and GPR4 on PowerPC.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172745 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-17 19:34:57 +00:00
Bill Schmidt
792b123338 This patch fixes the PPC calling convention to handle returns of
_Complex float and _Complex long double, by simply increasing the
number of floating point registers available for return values.

The test case verifies that the correct registers are loaded.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172733 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-17 17:45:19 +00:00
Bill Schmidt
89e88e30bf This patch addresses an incorrect transformation in the DAG combiner.
The included test case is derived from one of the GCC compatibility tests.
The problem arises after the selection DAG has been converted to type-legalized
form.  The combiner first sees a 64-bit load that can be converted into a
pre-increment form.  The original load feeds into a SRL that isolates the
upper 32 bits of the loaded doubleword.  This looks like an opportunity for
DAGCombiner::ReduceLoadWidth() to replace the 64-bit load with a 32-bit load.

However, this transformation is not valid, as the replacement load is not
a pre-increment load.  The pre-increment load produces an extra result,
which feeds a subsequent add instruction.  The replacement load only has
one result value, and this value is propagated to all uses of the pre-
increment load, including the add.  Because the add is looking for the
second result value as its operand, it ends up attempting to add a constant
to a token chain, resulting in a crash.

So the patch simply disables this transformation for any load with more than
two result values.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172480 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-14 22:04:38 +00:00
Benjamin Kramer
4dc478308f When lowering an inreg sext first shift left, then right arithmetically.
Shifting right two times will only yield zero. Should fix
SingleSource/UnitTests/SignlessTypes/factor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172322 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-12 19:06:44 +00:00
Nadav Rotem
66de2af815 PPC: Implement efficient lowering of sign_extend_inreg.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172269 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-11 22:57:48 +00:00
Tim Northover
5f2801bd65 Simplify writing floating types to assembly.
This removes previous special cases for each floating-point type in favour of a
shared codepath.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@172189 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-11 10:36:13 +00:00
Andrew Trick
47579cf390 MIsched: add an ILP window property to machine model.
This was an experimental option, but needs to be defined
per-target. e.g. PPC A2 needs to aggressively hide latency.

I converted some in-order scheduling tests to A2. Hal is working on
more test cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171946 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-09 03:36:49 +00:00
Tim Northover
a67352d401 Specify complete triple for fp128 tests.
This avoids FileCheck failing over different comment characters in
assembly (notably powerpc64 on Linux vs Darwin) and should fix David's
build-bot.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171886 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-08 19:36:33 +00:00
Tim Northover
90f011e0ba Allow the asm printer to print fp128 values properly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171866 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-08 16:56:23 +00:00
Bill Schmidt
5b7f9216c3 This patch addresses bug 14678 by fixing two problems in medium code model
code generation.  Variables addressed through a GlobalAlias were not being
handled, and variables with available_externally linkage were treated
incorrectly.  The patch contains two new tests to verify the correct code
generation for these cases.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171778 91177308-0d34-0410-b5e6-96231b3b80d8
2013-01-07 19:29:18 +00:00
Hal Finkel
6eb7a4270b Support ppcf128 in SelectionDAG::getConstantFP
Fixes pr14751.

Patch by Kai; Thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171261 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-30 19:03:32 +00:00
Hal Finkel
abdf75511b Loosen scheduling restrictions on the PPC dcbt intrinsic
As with the prefetch intrinsic to which it maps, simply have dcbt
marked as reading from and writing to its arguments instead of having
unmodeled side effects. While this might cause unwanted code motion
(because aliasing checks don't really capture cache-line sharing),
it is more important that prefetches in unrolled loops don't block
the scheduler from rearranging the unrolled loop body.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171073 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-25 18:51:18 +00:00
Hal Finkel
cd9ea51986 Expand PPC64 atomic load and store
Use of store or load with the atomic specifier on 64-bit types would
cause instruction-selection failures. As with the 32-bit case, these
can use the default expansion in terms of cmp-and-swap.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171072 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-25 17:22:53 +00:00
Rafael Espindola
ffc7d3b0ad Simplify the testcase a bit.
I checked that it would still crash llc before the corresponding fix.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170709 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-20 17:47:27 +00:00
Benjamin Kramer
91223a41ef PowerPC: Expand VSELECT nodes.
There's probably a better expansion for those nodes than the default for
altivec, but this is better than crashing. VSELECTs occur in loop vectorizer
output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170551 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-19 15:49:14 +00:00
Hal Finkel
ca2dd36c39 Check multiple register classes for inline asm tied registers
A register can be associated with several distinct register classes.
For example, on PPC, the floating point registers are each associated with
both F4RC (which holds f32) and F8RC (which holds f64). As a result, this code
would fail when provided with a floating point register and an f64 operand
because it would happen to find the register in the F4RC class first and
return that. From the F4RC class, SDAG would extract f32 as the register
type and then assert because of the invalid implied conversion between
the f64 value and the f32 register.

Instead, search all register classes. If a register class containing the
the requested register has the requested type, then return that register
class. Otherwise, as before, return the first register class found that
contains the requested register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170436 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-18 17:50:58 +00:00
Bill Schmidt
d3eb4f46f0 This patch removes some nondeterminism from direct object file output
for TLS dynamic models on 64-bit PowerPC ELF.  The default sort routine
for relocations only sorts on the r_offset field; but with TLS, there
can be two relocations with the same r_offset.  For PowerPC, this patch
sorts secondarily on descending r_type, which matches the behavior
expected by the linker.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170237 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-14 20:28:38 +00:00
Bill Schmidt
b453e16855 This patch improves the 64-bit PowerPC InitialExec TLS support by providing
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI.  The ABI will be updated with the new code sequence.

Former sequence:

  ld 9,x@got@tprel(2)
  add 9,9,x@tls

New sequence:

  addis 9,2,x@got@tprel@ha
  ld 9,x@got@tprel@l(9)
  add 9,9,x@tls

Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170209 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-14 17:02:38 +00:00
Bill Schmidt
71fe60ef10 The ordering of two relocations on the same instruction is apparently not
predictable when compiled on at least one non-PowerPC host.  Source of
nondeterminism not apparent.  Restrict the test to build on PowerPC hosts
for now while looking into the issue further.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170016 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-12 20:29:20 +00:00
Bill Schmidt
349c2787cf This patch implements local-dynamic TLS model support for the 64-bit
PowerPC target.  This is the last of the four models, so we now have 
full TLS support.

This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.

As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly.  The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.

There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.

Bill


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@170003 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-12 19:29:35 +00:00
Bill Schmidt
57ac1f458a This patch implements the general dynamic TLS model for 64-bit PowerPC.
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:

     Instruction                            Relocation            Symbol
  addis ra,r2,x@got@tlsgd@ha           R_PPC64_GOT_TLSGD16_HA       x
  addi  r3,ra,x@got@tlsgd@l            R_PPC64_GOT_TLSGD16_L        x
  bl __tls_get_addr(x@tlsgd)           R_PPC64_TLSGD                x
                                       R_PPC64_REL24           __tls_get_addr
  nop
  <use address in r3>

The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation.  This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr.  Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation.  So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.

Most of the code is pretty straightforward.  I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call.  Something in the 
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations.  This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().

Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.

Comments welcome!

Thanks,
Bill


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169910 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-11 20:30:11 +00:00
Hal Finkel
f21831073c Use GetUnderlyingObjects in misched
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.

This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.

Andy reviewed, tested and simplified this patch; Thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169744 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-10 18:49:16 +00:00
Bill Schmidt
d7802bf0dd This patch introduces initial-exec model support for thread-local storage
on 64-bit PowerPC ELF.

The patch includes code to handle external assembly and MC output with the
integrated assembler.  It intentionally does not support the "old" JIT.

For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:

 Code sequence            Relocation                  Symbol
  ld 9,x@got@tprel(2)      R_PPC64_GOT_TPREL16_DS      x
  add 9,9,x@tls            R_PPC64_TLS                 x

The register 9 is arbitrary here.  The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x.  It will replace x@tls with the thread-pointer
register (13).

The two test cases verify correct assembly output and relocation output
as just described.

PowerPC-specific selection node variants are added for the two
instructions above:  LD_GOT_TPREL and ADD_TLS.  These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS.  LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.

The rest of the processing is straightforward.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169281 91177308-0d34-0410-b5e6-96231b3b80d8
2012-12-04 16:18:08 +00:00
Chad Rosier
89d86118c5 test/CodeGen/PowerPC/vec_mul.ll: Add a triple. Thanks, Hal.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169026 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-30 19:15:10 +00:00
Chad Rosier
75cbb00727 test/CodeGen/PowerPC/vec_mul.ll: Fix register operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@169020 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-30 18:29:01 +00:00
NAKAMURA Takumi
09af5b8c3e test/CodeGen/PowerPC: Add explicit -march=ppc32.
FIXME: Please add another RUN line if you would like to check also on ppc64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168999 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-30 13:28:31 +00:00
Adhemerval Zanella
375cbe4143 This patch fixes the Altivec addend construction for the fused multiply-add
instruction (vmaddfp) to conform with IEEE to ensure the sign of a zero
result when resulting product is -0.0.

The -0.0 vector addend to vmaddfp is generated by a creating a vector
with full bits sets and then shifting each elements by 31-bits to the
left, resulting in a vector of 0x80000000 (or -0.0 as float).

The 'buildvec_canonicalize.ll' was adjusted to reflect this change and
the 'vec_mul.ll' was complemented with the float vector multiplication
test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168998 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-30 13:05:44 +00:00
Bill Schmidt
daa65f5e08 This patch makes medium code model the default for 64-bit PowerPC ELF.
When the CodeGenInfo is to be created for the PPC64 target machine,
a default code-model selection is converted to CodeModel::Medium
provided we are not targeting the Darwin OS.  Defaults for Darwin
are unaffected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168747 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-27 23:36:26 +00:00
Bill Schmidt
34a9d4b3b9 This patch implements medium code model support for 64-bit PowerPC.
The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.  Additionally,
only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be addressed
via the TOC pointer using a 32-bit offset.  Cooperation with the linker
allows 16-bit offsets to be used when these are sufficient, reducing the
number of extra instructions that need to be executed.  Medium code model
also does not generate explicit TOC entries in ".section toc" for variables
that are wholly internal to the compilation unit.

Consider a load of an external 4-byte integer.  With small code model, the
compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer.  Similarly,
.LC1@toc@l is a relocation requesting the lower 16 bits.  Note that if
the linker determines that ei's TOC entry is within a 16-bit offset of
the TOC base pointer, it will replace the "addis" with a "nop", and
replace the "ld" with the identical "ld" instruction from the small
code model example.

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
        lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain the
small code model.  We plan to eventually change the default to medium code
model, which matches current upstream GCC behavior.  Note that the different
code models are ABI-compatible, so code compiled with different models will
be linked and execute correctly.

I've tested the regression suite and the application/benchmark test suite in
two ways:  Once with the patch as submitted here, and once with additional
logic to force medium code model as the default.  The tests all compile
cleanly, with one exception.  The mandel-2 application test fails due to an
unrelated ABI compatibility with passing complex numbers.  It just so happens
that small code model was incredibly lucky, in that temporary values in 
floating-point registers held the expected values needed by the external
library routine that was called incorrectly.  My current thought is to correct
the ABI problems with _Complex before making medium code model the default,
to avoid introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three pseudo-instructions:
LDtoc for most uses, LDtocJTI for jump table addresses, and LDtocCPT for
constant pool addresses.  These are expanded by SelectCodeCommon().  The
pseudo-instruction approach doesn't work for medium code model, because
we need to generate two instructions when we match the same pattern.
Instead, new logic in PPCDAGToDAGISel::Select() intercepts the TOC_ENTRY
node for medium code model, and generates an ADDIStocHA followed by either
a LDtocL or an ADDItocL.  These new node types correspond naturally to
the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted to
a "real" PowerPC instruction.  When a TOC entry needs to be created, this
is done here in the same manner as for the existing LDtoc, LDtocJTI, and
LDtocCPT pseudo-instructions (I factored out a new routine to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous ADDIStocHA.
However, at higher optimization levels, the ADDIStocHA may appear in a 
different block, which may be assembled textually following the block
containing the LDtocL or ADDItocL.  So it is necessary to include the
possibility of creating a new TOC entry for those two instructions.

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction (i.e., the offset is replaced by a SymbolLo relocation).
When the peephole optimization described above is added, we will need
to do similar things for all immediate-form load and store operations.

The seven "mcm-n.ll" test cases are kept separate because otherwise the
intermingling of various TOC entries and so forth makes the tests fragile
and hard to understand.

The above assumes use of an external assembler.  For use of the
integrated assembler, new relocations are added and used by
PPCELFObjectWriter.  Testing is done with "mcm-obj.ll", which tests for
proper generation of the various relocations for the same sequences
tested with the external assembler.






git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168708 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-27 17:35:46 +00:00
Eli Bendersky
a5cf16fc51 Rewrite test to not use a FileCheck variable and redefine it on the same line.
In preparation for the FileCheck functionality change which will allow using
a variable later on the same line.

No functionality change.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168588 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-26 14:09:46 +00:00
Benjamin Kramer
915558e775 PPC: MCize most of the darwin PIC emission.
The last remaining bit is "bcl 20, 31, AnonSymbol", which I couldn't find the
instruction definition for. Only whitespace changes in assembly output.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168541 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-24 13:18:25 +00:00
Andrew Trick
410fe6fe19 Use a full triple for a PPC test case for asm syntax.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168283 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-18 06:21:03 +00:00
Andrew Trick
953663a9aa Silence the buildbots for this test while I figure out the triple
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168249 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-17 03:39:26 +00:00
Andrew Trick
e1f663933a Broaden isSchedulingBoundary to check aliases of SP.
On PPC the stack pointer is X1, but ADJCALLSTACK writes R1.

Fixes PR14315: Register regmask dependency problem with misched.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168248 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-17 03:35:11 +00:00
Adhemerval Zanella
e95ed2b7af PowerPC: Lowering floor intrinsic for Altivec
This patch lowers the llvm.floor, llvm.ceil, llvm.trunc, and
llvm.nearbyint to Altivec instruction when using 4 single-precision
float vectors.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@168086 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-15 20:56:03 +00:00
Bill Schmidt
527388dea5 This patch is in preparation for adding medium code model support to the
PPC64 target.  The five tests modified herein test code generation that is
sensitive to the code model selected.  So I've added -code-model=small to
the RUN commands for each.

Since small code model is the default, this has no effect for now; but this
prepares us for eventually changing the default to medium code model for PPC64.

Test changes verified with small and medium code model as default on
powerpc64-unknown-linux-gnu.  All tests continue to pass.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167999 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-14 23:23:27 +00:00
Ulrich Weigand
b64e2115de Do not consider a machine instruction that uses and defines the same
physical register as candidate for common subexpression elimination
in MachineCSE.

This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc
caused by MachineCSE invalidly merging two separate DYNALLOC insns.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167855 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-13 18:40:58 +00:00
Jakob Stoklund Olesen
722c9a7925 Fix assertions in updateRegMaskSlots().
The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B'
slots. This broke the checks in the assertions.

This fixes PR14302.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167625 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-09 19:18:49 +00:00
Ulrich Weigand
86aef0a4f0 On PowerPC64, integer return values (as well as arguments) are supposed
to be extended to a full register.   This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.

However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end.  This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.

The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.

The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test).  Since there's now an implicit
promotion to i64, that data flow no longer works as expected.  To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167396 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 19:39:45 +00:00
Hal Finkel
827b7a070d Add support for the PowerPC-specific inline asm Z constraint and y modifier.
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167388 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 18:18:42 +00:00
Adhemerval Zanella
cfe09ed28d [PATCH] PowerPC: Expand load extend vector operations
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167386 91177308-0d34-0410-b5e6-96231b3b80d8
2012-11-05 17:15:56 +00:00
Bill Schmidt
42d43351b2 This patch addresses an ABI compatibility issue with empty aggregate
parameters.  Examples of these are:

  struct { } a;
  union { } b[256];
  int a[0];

An empty aggregate has an address, although dereferencing that address is
pointless.  When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area.  Passing an empty aggregate by reference passes an address just as
for any other aggregate.  Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.

The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value.  The handling of return values and by-reference
parameters was already correct.

Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167090 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-31 01:15:05 +00:00
Adhemerval Zanella
c83b5dc625 PowerPC: Expand FSRQT for vector types
This patch expands FSQRT for floating point vector types when altivec is
used.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167034 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-30 18:29:42 +00:00
Adhemerval Zanella
5f41fd685b PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@167015 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-30 13:50:19 +00:00
Bill Schmidt
e6c56433de This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.

A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter.  If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.

Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.

The patch adds a new test case to verify the correct code generation.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166968 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-29 21:18:16 +00:00
Ulrich Weigand
e669c930a6 In various places throughout the code generator, there were special
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.

Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166958 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-29 18:35:49 +00:00
Ulrich Weigand
78dab643e0 Allow i32/i64 for 'f' constraint on PowerPC.
This fixes PR12757.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166943 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-29 17:49:34 +00:00
Bill Schmidt
01d013ec04 This patch adds alignment information for long double to the 64-bit PowerPC
ELF subtarget.

The existing logic is used as a fallback to avoid any changes to the Darwin
ABI.  PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.

I've added a test for PPC64 Linux to verify the 16-byte alignment.  If
somebody wants to add a separate test for FreeBSD, that would be great.

Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166928 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-29 14:59:36 +00:00
Bill Schmidt
37900c5dcb This patch addresses a PPC64 ELF issue with passing parameters consisting of
structs having size 3, 5, 6, or 7.  Such a struct must be passed and received
as right-justified within its register or memory slot.  The problem is only
present for structs that are passed in registers.

Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register.  This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot.  Essentially I had accidentally
accounted for the right-adjustment twice.

In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.

The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this.  I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.

I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions.  I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166680 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-25 13:38:09 +00:00
Ulrich Weigand
6c28a7eec8 This patch fixes failures in the SingleSource/Regression/C/uint64_to_float
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating point. The reason for this are
double-rounding effects, since on PowerPC we have to convert to an
intermediate double-precision value first, which gets rounded to the
final single-precision result.

The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result.  The generated code sequence is equivalent to what GCC would generate.

When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166178 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-18 13:16:11 +00:00
Bill Schmidt
7a6cb15a92 This patch addresses PR13949.
For the PowerPC 64-bit ELF Linux ABI, aggregates of size less than 8
bytes are to be passed in the low-order bits ("right-adjusted") of the
doubleword register or memory slot assigned to them.  A previous patch
addressed this for aggregates passed in registers.  However, small
aggregates passed in the overflow portion of the parameter save area are
still being passed left-adjusted.

The fix is made in PPCTargetLowering::LowerCall_Darwin_Or_64SVR4 on the
caller side, and in PPCTargetLowering::LowerFormalArguments_64SVR4 on
the callee side.  The main fix on the callee side simply extends
existing logic for 1- and 2-byte objects to 1- through 7-byte objects,
and correcting a constant left over from 32-bit code.  There is also a
fix to a bogus calculation of the offset to the following argument in
the parameter save area.

On the caller side, again a constant left over from 32-bit code is
fixed.  Additionally, some code for 1, 2, and 4-byte objects is
duplicated to handle the 3, 5, 6, and 7-byte objects for SVR4 only.  The
LowerCall_Darwin_Or_64SVR4 logic is getting fairly convoluted trying to
handle both ABIs, and I propose to separate this into two functions in a
future patch, at which time the duplication can be removed.

The patch adds a new test (structsinmem.ll) to demonstrate correct
passing of structures of all seven sizes.  Eight dummy parameters are
used to force these structures to be in the overflow portion of the
parameter save area.

As a side effect, this corrects the case when aggregates passed in
registers are saved into the first eight doublewords of the parameter
save area:  Previously they were stored left-justified, and now are
properly stored right-justified.  This requires changing the expected
output of existing test case structsinregs.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@166022 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-16 13:30:53 +00:00
NAKAMURA Takumi
20ce6e6562 llvm/test/CodeGen/PowerPC/2012-10-12-bitcast.ll: Try to fix failure on non-ppc hosts, to add -mattr=+altivec.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165803 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-12 16:01:08 +00:00
Ulrich Weigand
7bbb9c7b4a Fix big-endian codegen bug in DAGTypeLegalizer::ExpandRes_BITCAST
On PowerPC, a bitcast of <16 x i8> to i128 may run through a code
path in ExpandRes_BITCAST that attempts to do an intermediate
bitcast to a <4 x i32> vector, and then construct the Hi and Lo parts
of the resulting i128 by pairing up two of those i32 vector elements
each.  The code already recognizes that on a big-endian system, the
first two vector elements form the Hi part, and the final two vector
elements form the Lo part (vice-versa from the little-endian situation).

However, we also need to take endianness into account when forming each
of those separate pairs:  on a big-endian system, vector element 0 is
the *high* part of the pair making up the Hi part of the result, and
vector element 1 is the low part of the pair.  The code currently always
uses vector element 0 as the low part and vector element 1 as the high
part, as is appropriate for little-endian platforms only.

This patch fixes this by swapping the vector elements as they are
paired up as appropriate.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165802 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-12 15:42:58 +00:00
Bill Schmidt
a867f37897 This patch addresses PR13947.
For function calls on the 64-bit PowerPC SVR4 target, each parameter
is mapped to as many doublewords in the parameter save area as
necessary to hold the parameter.  The first 13 non-varargs
floating-point values are passed in registers; any additional
floating-point parameters are passed in the parameter save area.  A
single-precision floating-point parameter (32 bits) must be mapped to
the second (rightmost, low-order) word of its assigned doubleword
slot.

Currently LLVM violates this ABI requirement by mapping such a
parameter to the first (leftmost, high-order) word of its assigned
doubleword slot.  This is internally self-consistent but will not
interoperate correctly with libraries compiled with an ABI-compliant
compiler.

This patch corrects the problem by adjusting the parameter addressing
on both sides of the calling convention.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165714 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-11 15:38:20 +00:00
Bill Schmidt
e0bc37579b Add -mattr=+altivec and remove XFAIL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165666 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-10 22:25:11 +00:00
Bill Schmidt
3a55b64e5e XFAIL for all targets pending investigation
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165664 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-10 21:52:10 +00:00
Bill Schmidt
26160f4e64 When generating spill and reload code for vector registers on PowerPC,
the compiler makes use of GPR0.  However, there are two flavors of
GPR0 defined by the target:  the 32-bit GPR0 (R0) and the 64-bit GPR0
(X0).  The spill/reload code makes use of R0 regardless of whether we
are generating 32- or 64-bit code.

This patch corrects the problem in the obvious manner, using X0 and
ADDI8 for 64-bit and R0 and ADDI for 32-bit.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165658 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-10 21:25:01 +00:00
Bill Schmidt
a5d0ab5553 The PowerPC VRSAVE register has been somewhat of an odd beast since
the Altivec extensions were introduced.  Its use is optional, and
allows the compiler to communicate to the operating system which
vector registers should be saved and restored during a context switch.
In practice, this information is ignored by the various operating
systems using the SVR4 ABI; the kernel saves and restores the entire
register state.  Setting the VRSAVE register is no longer performed by
the AIX XL compilers, the IBM i compilers, or by GCC on Power Linux
systems.  It seems best to avoid this logic within LLVM as well.

This patch avoids generating code to update and restore VRSAVE for the
PowerPC SVR4 ABIs (32- and 64-bit).  The code remains in place for the
Darwin ABI.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165656 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-10 20:54:15 +00:00
Adhemerval Zanella
1c7d69bbe2 PR12716: PPC crashes on vector compare
Vector compare using altivec 'vcmpxxx' instructions have as third argument
a vector register instead of CR one, different from integer and float-point
compares. This leads to a failure in code generation, where 'SelectSETCC'
expects a DAG with a CR register and gets vector register instead.

This patch changes the behavior by just returning a DAG with the 
vector compare instruction based on the type. The patch also adds a testcase
for all vector types llvm defines.

It also included a fix on signed 5-bits predicates printing, where
signed values were not handled correctly as signed (char are unsigned by
default for PowerPC). This generates 'vspltisw' (vector splat)
instruction with SIM out of range.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165419 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-08 18:59:53 +00:00
Adhemerval Zanella
51aaadb7bd Add floating-point to and from integer conversion
This patch add altivec support for v4i32 to v4f32 and for v4f32 to
v4i32 vector rounding conversion.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165409 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-08 17:27:24 +00:00
Rafael Espindola
7c0278e14e Convert to unix line endings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@165308 91177308-0d34-0410-b5e6-96231b3b80d8
2012-10-05 13:32:38 +00:00
Roman Divacky
5236ab3fdd Specify MachinePointerInfo as refering to the argument value and offset of the
store when handling byval arguments. Thus preventing reordering of the store
with load with post-RA scheduler.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164553 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-24 20:47:19 +00:00
Roman Divacky
c1611d8d10 Specify cpu to get the correct instruction ordering. Remove XFAIL.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164306 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-20 14:59:42 +00:00
Jordan Rose
3856b07276 Really XFAIL test/CodeGen/PowerPC/structsinregs.ll.
XFAIL needs a trailing colon. Hopefully this will get the buildbots
happy again while Bill works on getting it passing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164237 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-19 17:03:11 +00:00
Bill Schmidt
1bc12b579d XFAIL test/CodeGen/PowerPC/structsinregs.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164233 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-19 16:18:23 +00:00
Bill Schmidt
419f376564 Small structs for PPC64 SVR4 must be passed right-justified in registers.
lib/Target/PowerPC/PPCISelLowering.{h,cpp}
 Rename LowerFormalArguments_Darwin to LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerFormalArguments_SVR4 to LowerFormalArguments_32SVR4.
 Receive small structs right-justified in LowerFormalArguments_Darwin_Or_64SVR4.
 Rename LowerCall_Darwin to LowerCall_Darwin_Or_64SVR4.
 Rename LowerCall_SVR4 to LowerCall_32SVR4.
 Pass small structs right-justified in LowerCall_Darwin_Or_64SVR4.

test/CodeGen/PowerPC/structsinregs.ll
 New test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164228 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-19 15:42:13 +00:00
Roman Divacky
51ca601977 Add test for r164155 and remove two tests superseded by ppc64-calls.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164162 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 19:51:44 +00:00
Roman Divacky
f145c135f3 Avoid symbol name clash when filling TOC.
Patch by Adhemerval Zanella.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164141 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 17:10:37 +00:00
Roman Divacky
4cd56014ae On PPC64 emit the environment pointer. Patch by Adhemerval Zanella.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164139 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 16:55:29 +00:00
Roman Divacky
eb8b7dc536 Optimize local func calls to not emit nop for TOC restoration.
Patch by Adhemerval Zanella.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@164138 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-18 16:47:58 +00:00
Roman Divacky
9d760ae5c6 This patch corrects logic in PPCFrameLowering for save and restore of
nonvolatile condition register fields across calls under the SVR4 ABIs.                                            
                                                                                                                   
 * With the 64-bit ABI, the save location is at a fixed offset of 8 from                                           
the stack pointer.  The frame pointer cannot be used to access this                                                
portion of the stack frame since the distance from the frame pointer may                                           
change with alloca calls.                                                                                          
                                                                                                                   
 * With the 32-bit ABI, the save location is just below the general
register save area, and is accessed via the frame pointer like the rest
of the save areas.  This is an optional slot, so it must only be created                                           
if any of CR2, CR3, and CR4 were modified.                                                                      
                                                                                                                   
 * For both ABIs, save/restore logic is generated only if one of the     
nonvolatile CR fields were modified.                                   

I also took this opportunity to clean up an extra FIXME in
PPCFrameLowering.h.  Save area offsets for 32-bit GPRs are meaningless
for the 64-bit ABI, so I removed them for correctness and efficiency.


Fixes PR13708 and partially also PR13623. It lets us enable exception handling
on PPC64.

Patch by William J. Schmidt!


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163713 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-12 14:47:47 +00:00
Jakob Stoklund Olesen
45c5c57179 Allow overlaps between virtreg and physreg live ranges.
The RegisterCoalescer understands overlapping live ranges where one
register is defined as a copy of the other. With this change, register
allocators using LiveRegMatrix can do the same, at least for copies
between physical and virtual registers.

When a physreg is defined by a copy from a virtreg, allow those live
ranges to overlap:

  %CL<def> = COPY %vreg11:sub_8bit; GR32_ABCD:%vreg11
  %vreg13<def,tied1> = SAR32rCL %vreg13<tied0>, %CL<imp-use,kill>

We can assign %vreg11 to %ECX, overlapping the live range of %CL.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163336 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-06 18:15:23 +00:00
Jakob Stoklund Olesen
daddf07497 Move tie checks into MachineVerifier::visitMachineOperand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@163152 91177308-0d34-0410-b5e6-96231b3b80d8
2012-09-04 18:38:28 +00:00
Hal Finkel
bbd169b1d9 Reserve space for the mandatory traceback fields on PPC64.
We need to reserve space for the mandatory traceback fields,
though leaving them as zero is appropriate for now.

Although the ABI calls for these fields to be filled in fully, no
compiler on Linux currently does this, and GDB does not read these
fields.  GDB uses the first word of zeroes during exception handling to
find the end of the function and the size field, allowing it to compute
the beginning of the function.  DWARF information is used for everything
else.  We need the extra 8 bytes of pad so the size field is found in
the right place.

As a comparison, GCC fills in a few of the fields -- language, number
of saved registers -- but ignores the rest.  IBM's proprietary OSes do
make use of the full traceback table facility.

Patch by Bill Schmidt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162854 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-29 20:22:24 +00:00
Roman Divacky
c6c2ced384 Emit word of zeroes after the last instruction as a start of the mandatory
traceback table on PowerPC64. This helps gdb handle exceptions. The other
mandatory fields are ignored by gdb and harder to implement so just add
there a FIXME.

Patch by Bill Schmidt. PR13641.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162778 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-28 19:06:55 +00:00
Hal Finkel
621b77ade2 Add PPC Freescale e500mc and e5500 subtargets.
Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to
the PowerPC backend.

Patch by Tobias von Koch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162764 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-28 16:12:39 +00:00
Hal Finkel
f3c3828e57 Allow remat of LI on PPC.
Allow load-immediates to be rematerialised in the register coalescer for
PPC. This makes test/CodeGen/PowerPC/big-endian-formal-args.ll fail,
because it relies on a register move getting emitted. The immediate load is
equivalent, so change this test case.

Patch by Tobias von Koch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162727 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-28 02:10:33 +00:00
Hal Finkel
82b3821208 Eliminate redundant CR moves on PPC32.
The 32-bit ABI requires CR bit 6 to be set if the call has fp arguments and
unset if it doesn't. The solution up to now was to insert a MachineNode to
set/unset the CR bit, which produces a CR vreg. This vreg was then copied
into CR bit 6. When the register allocator saw a bunch of these in the same
function, it allocated the set/unset CR bit in some random CR register (1
extra instruction) and then emitted CR moves before every vararg function
call, rather than just setting and unsetting CR bit 6 directly before every
vararg function call. This patch instead inserts a PPCcrset/PPCcrunset
instruction which are then matched by a dedicated instruction pattern.

Patch by Tobias von Koch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162725 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-28 02:10:27 +00:00
Hal Finkel
97d047dec7 Optimize zext on PPC64.
The zeroextend IR instruction is lowered to an 'and' node with an immediate
mask operand, which in turn gets legalised to a sequence of ori's & ands.
This can be done more efficiently using the rldicl instruction.

Patch by Tobias von Koch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162724 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-28 02:10:15 +00:00
Roman Divacky
9fb8b49380 Lower constant pools and jump tables via TOC on PPC64/SVR4.
In collaboration with Adhemerval Zanella.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@162562 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-24 16:26:02 +00:00
Nadav Rotem
3e883734fa During the CodeGenPrepare we often lower intrinsics (such as objsize)
and allow some optimizations to turn conditional branches into unconditional.
This commit adds a simple control-flow optimization which merges two consecutive
basic blocks which are connected by a single edge. This allows the codegen to
operate on larger basic blocks.

rdar://11973998



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161852 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-14 05:19:07 +00:00
Hal Finkel
f45717e985 MFTB on PPC64 should really be encoded using MFSPR.
The MFTB instruction itself is being phased out, and its functionality
is provided by MFSPR. According to the ISA docs, using MFSPR works on all known
chips except for the 601 (which did not have a timebase register anyway)
and the POWER3.

Thanks to Adhemerval Zanella for pointing this out!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161346 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-06 21:21:44 +00:00
Hal Finkel
8cc3474f72 Add readcyclecounter lowering on PPC64.
On PPC64, this can be done with a simple TableGen pattern.
To enable this, I've added the (otherwise missing) readcyclecounter
SDNode definition to TargetSelectionDAG.td.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161302 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-04 14:10:46 +00:00
Bob Wilson
53624a2df5 Refactor and check "onlyReadsMemory" before optimizing builtins.
This patch is mostly just refactoring a bunch of copy-and-pasted code, but
it also adds a check that the call instructions are readnone or readonly.
That check was already present for sin, cos, sqrt, log2, and exp2 calls, but
it was missing for the rest of the builtins being handled in this code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161282 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-03 23:29:17 +00:00
Chandler Carruth
1de43ede89 Fix the remaining TCL-style quotes found in the testsuite. This is
another mechanical change accomplished though the power of terrible Perl
scripts.

I have manually switched some "s to 's to make escaping simpler.

While I started this to fix tests that aren't run in all configurations,
the massive number of tests is due to a really frustrating fragility of
our testing infrastructure: things like 'grep -v', 'not grep', and
'expected failures' can mask broken tests all too easily.

Essentially, I'm deeply disturbed that I can change the testsuite so
radically without causing any change in results for most platforms. =/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159547 91177308-0d34-0410-b5e6-96231b3b80d8
2012-07-02 19:09:46 +00:00
Chandler Carruth
49589f0d0e Convert the uses of '|&' to use '2>&1 |' instead, which works on old
versions of Bash. In addition, I can back out the change to the lit
built-in shell test runner to support this.

This should fix the majority of fallout on Darwin, but I suspect there
will be a few straggling issues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159544 91177308-0d34-0410-b5e6-96231b3b80d8
2012-07-02 18:37:59 +00:00