Commit Graph

4269 Commits

Author SHA1 Message Date
Bill Schmidt
9a2a305ed4 [PowerPC 3/4] Little-endian adjustments for VSX vector shuffle
When performing instruction selection for ISD::VECTOR_SHUFFLE, there
is special code for handling v2f64 and v2i64 using VSX instructions.
This code must be adjusted for little-endian.  Because the two inputs
are treated as a double-wide register, we must swap their order for
little endian.  To get the appropriate mask elements to use with the
big-endian biased XXPERMDI instruction, we must reverse their order
and invert the bits.

A new test is added to test the 16 possible values of the shuffle
mask.  It is initially disabled for reasons specified in the test.  It
is re-enabled by patch 4/4.




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 16:52:29 +00:00
Bill Schmidt
03b9f9f2b6 [PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations
For little endian, we need to make some straightforward adjustments in
the code expansions for scalar_to_vector and vector_extract of v2f64.
First, scalar_to_vector must place the scalar into vector element
zero.  However, our implementation of SUBREG_TO_REG will place it into
big-element vector element zero (high-order bits), and for little
endian we need it in the low-order bits.  The LE implementation splats
the high-order doubleword into the low-order doubleword.

Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1)
must be reversed for similar reasons.

A new test is added that tests code generation for insertelement and
extractelement for both element 0 and element 1.  It is disabled in
this patch but enabled in patch 4/4, for reasons stated in the test.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223788 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 16:43:32 +00:00
Bill Schmidt
b900895384 [PowerPC 1/4] Little-endian adjustments for VSX loads/stores
This patch addresses the inherent big-endian bias in the lxvd2x,
lxvw4x, stxvd2x, and stxvw4x instructions.  These instructions load
vector elements into registers left-to-right (with the first element
loaded into the high-order bits of the register), regardless of the
endian setting of the processor.  However, these are the only
vector memory instructions that permit unaligned storage accesses, so
we want to use them for little-endian.

To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x
followed by an xxswapd, which swaps the doublewords.  This works for
lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the
vector elements are in LE order (right-to-left) within each
doubleword.  (Thus after lxvw2x of a <4 x float> the elements will
appear as 1, 0, 3, 2.  Following the swap, they will appear as 3, 2,
0, 1, as desired.)   For stores, an stxvd2x or stxvw4x is replaced
with an stxvd2x preceded by an xxswapd.

Introduction of extra swap instructions provides correctness, but
obviously is not ideal from a performance perspective.  Future patches
will address this with optimizations to remove most of the introduced
swaps, which have proven effective in other implementations.

The introduction of the swaps is performed during lowering of LOAD,
STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations.  The latter
are used to translate intrinsics that specify the VSX loads and stores
directly into equivalent sequences for little endian.  Thus code that
uses vec_vsx_ld and vec_vsx_st does not have to be modified to be
ported from BE to LE.

We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for
use during this lowering step.  In PPCInstrVSX.td, we add new SDType
and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd).
These are recognized during instruction selection and mapped to the
correct instructions.

Several tests that were written to use -mcpu=pwr7 or pwr8 are modified
to disable VSX on LE variants because code generation changes with
this and subsequent patches in this set.  I chose to include all of
these in the first patch than try to rigorously sort out which tests
were broken by one or another of the patches.  Sorry about that.

The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll,
are disabled until LE support is enabled because of breakages that
occur as noted in those tests.  They are re-enabled in patch 4/4.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223783 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 16:35:51 +00:00
Bill Schmidt
13dd854d8c Restore r223709 as it was meant to be, and enable FeatureP8Vector for P8
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223751 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 03:02:48 +00:00
NAKAMURA Takumi
3d6e1eeeb2 Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots.
CodeGen/PowerPC/vsx-p8.ll was failing.

  '+power8-vector' is not a recognized feature for this target (ignoring feature)
  llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input
  ; CHECK-REG: lxvw4x 34, 0, 3
               ^
  <stdin>:50:2: note: scanning from here
   .align 3
   ^
  <stdin>:61:2: note: possible intended match here
   lvx 3, 0, 3
   ^

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223729 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-09 01:03:27 +00:00
Bill Seurer
ac6e0c82ed [PowerPC]Activate FeatureVSX for the Power target
This change activates FeatureVSX for Power 7 and Power 8 in PPC.td.

http://reviews.llvm.org/D6570


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223709 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 23:07:12 +00:00
Hal Finkel
b849e04d2b [PowerPC] Don't use a non-allocatable register to implement the 'cc' alias
GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when
processing inline asm constraints. This had previously been implemented using a
non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but
the infrastructure does not seem to support this properly (neither the register
allocator nor the scheduler properly accounts for the alias). Instead, we can
just process this as a naming alias inside of the inline asm
constraint-processing code, so we'll do that instead.

There are two regression tests, one where the post-RA scheduler did the wrong
thing with the non-allocatable alias, and one where the register allocator did
the wrong thing. Fixes PR21742.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223708 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-08 22:54:22 +00:00
Bill Seurer
dfa6293b55 [PowerPC]Add VSX loads/stores to fastisel for PPC target
This patch adds VSX floating point loads and stores to fastisel.

Along with the change to tablegen (D6220), VSX instructions are now fully supported in fastisel.

http://reviews.llvm.org/D6274


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223507 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-05 20:15:56 +00:00
Hal Finkel
ec086bf087 [PowerPC] 'cc' should be an alias only to 'cr0'
We had mistakenly believed that GCC's 'cc' referred to the entire
condition-code register (cr0 through cr7) -- and implemented this in r205630 to
fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM
to clobber too much with legacy code with inline asm using the 'cc' clobber.

Fixes PR21451.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223328 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-04 00:46:20 +00:00
Hal Finkel
c48b3bf318 [PowerPC] Fix inline asm memory operands not to use r0
On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is
a register containing the address. As a result, this register cannot be r0, and
we need to enforce this register subclass constraint to prevent miscompiling
the code (we'd get this constraint for free with the usual instruction
definitions, but that scheme has no knowledge of how we end up printing inline
asm memory operands, and so here we need to do it 'by hand'). We can accomplish
this within the current address-mode selection framework by introducing an
explicit COPY_TO_REGCLASS node.

Fixes PR21443.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223318 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 23:40:13 +00:00
Will Schmidt
ad304153f4 Add TableGen info for Power8.
This is based on the Power7 version, with units added and renamed to match P8.

Differential Revision: http://reviews.llvm.org/D6358




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223257 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 18:46:30 +00:00
Hal Finkel
cf988bca67 [PowerPC] Print all inline-asm consts as signed numbers
Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed
numbers, and it is important that we print them as such. To make sure that
happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it
does all intermediate checks on a signed-extended int64_t value, and then
creates the resulting target constant using MVT::i64. This will ensure that all
negative values are printed as negative values (mirroring what is done in other
backends to achieve the same sign-extension effect).

This came up in the context of inline assembly like this:
  "add%I2   %0,%0,%2", ..., "Ir"(-1ll)
where we used to print:
  addi   3,3,4294967295
and gcc would print:
  addi   3,3,-1
and gas accepts both forms, but our builtin assembler (correctly) does not. Now
we print -1 like gcc does.

While here, I replaced a bunch of custom integer checks with isInt<16> and
friends from MathExtras.h.

Thanks to Paul Hargrove for the bug report.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223220 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 09:37:50 +00:00
Hal Finkel
1dce7b19a0 [PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets
We need to use the custom expansion of readcyclecounter on all 32-bit targets
(even those with 64-bit registers). This should fix the ppc64 buildbot.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223182 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-03 00:19:17 +00:00
Hal Finkel
1855b261db [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@223161 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-02 22:01:00 +00:00
Jay Foad
8b9cea42db [PowerPC] Fix unwind info with dynamic stack realignment
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.

This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.

Reviewers: willschm, uweigand, hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6410

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222996 91177308-0d34-0410-b5e6-96231b3b80d8
2014-12-01 09:42:32 +00:00
Hal Finkel
186abcb853 [PowerPC] Add asm support for cache-inhibited ld/st instructions
Add assembler support for the fixed-point cache-inhibited load/store
instructions. These are hypervisor-level only, so don't get too excited ;)

Fixes PR21650.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222976 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-30 10:15:56 +00:00
Simon Pilgrim
1c86b63b9b Target triple OS detection tidyup. NFC
Use Triple::isOS*() helpers where possible.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222960 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-29 19:18:21 +00:00
Craig Topper
c0dae440e6 Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-26 00:46:26 +00:00
Hal Finkel
b932ed3c3d [PowerPC] Add the 'attn' instruction
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222712 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-25 00:30:11 +00:00
Hal Finkel
5d6f185653 [PowerPC] Implement combineRepeatedFPDivisors
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222710 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-24 23:45:21 +00:00
Ulrich Weigand
edc6a13992 [PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.

Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222672 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-24 18:09:47 +00:00
NAKAMURA Takumi
3bc8b4b38b Add LLVMScalarOpts to LLVMPowerPCCodeGen.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222516 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 09:14:45 +00:00
Craig Topper
e0ed7df6b0 Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222509 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 05:58:21 +00:00
Hal Finkel
361eafaffa [PPC] Use SeparateConstOffsetFromGEP
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222504 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-21 04:35:51 +00:00
Reid Kleckner
d12434058d Add out of line virtual destructors to all LLVMTargetMachine subclasses
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878.  When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.

This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222480 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-20 23:37:18 +00:00
David Blaikie
5401ba7099 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@222334 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-19 07:49:26 +00:00
Bill Schmidt
40b0f5d6ce [PowerPC] Add VSX builtins for vec_div
This patch adds builtin support for xvdivdp and xvdivsp, along with a
test case.  Straightforward stuff.

There's a companion patch for Clang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221983 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-14 12:10:40 +00:00
Aditya Nandakumar
365df40768 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221926 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 21:29:21 +00:00
Aditya Nandakumar
847729d19a This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221878 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-13 09:26:31 +00:00
Justin Hibbits
fcd08c294a Add support for small-model PIC for PowerPC.
Summary:
Large-model was added first.  With the addition of support for multiple PIC
models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI.  This
generates more optimal code, for shared libraries with less than about 16380
data objects.

Test Plan: Test cases added or updated

Reviewers: joerg, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, mcrosier, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D5399

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221791 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 15:16:30 +00:00
Bill Schmidt
fc22bfd921 [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221767 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 04:19:40 +00:00
Rafael Espindola
6a222ec893 Pass an ArrayRef to MCDisassembler::getInstruction.
With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t>
instead of a MemoryObject.

Even on X86 there is a maximum size an instruction can have. Given
that, it seems way simpler and more efficient to just pass an ArrayRef
to the disassembler instead of a MemoryObject and have it do a virtual
call every time it wants some extra bytes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221751 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-12 02:04:27 +00:00
Bill Schmidt
10161a0cce [PowerPC] Replace foul hackery with real calls to __tls_get_addr
My original support for the general dynamic and local dynamic TLS
models contained some fairly obtuse hacks to generate calls to
__tls_get_addr when lowering a TargetGlobalAddress.  Rather than
generating real calls, special GET_TLS_ADDR nodes were used to wrap
the calls and only reveal them at assembly time.  I attempted to
provide correct parameter and return values by chaining CopyToReg and
CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not
fully correct.  Problems were seen with two back-to-back stores to TLS
variables, where the call sequences ended up overlapping with unhappy
results.  Additionally, since these weren't real calls, the proper
register side effects of a call were not recorded, so clobbered values
were kept live across the calls.

The proper thing to do is to lower these into calls in the first
place.  This is relatively straightforward; see the changes to
PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp.
The changes here are standard call lowering, except that we need to
track the fact that these calls will require a relocation.  This is
done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the
TargetGlobalAddress operand that appears earlier in the sequence.

The calls to LowerCallTo() eventually find their way to
LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(),
which calls PrepareCall().  In PrepareCall(), we detect the calls to
__tls_get_addr and immediately snag the TargetGlobalTLSAddress with
the annotated relocation information.  This becomes an extra operand
on the call following the callee, which is expected for nodes of type
tlscall.  We change the call opcode to CALL_TLS for this case.  Back
in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only,
since we require a TOC-restore nop following the call for the 64-bit
ABIs.

During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td
convert the CALL_TLS nodes into BL_TLS nodes, and convert the
CALL_NOP_TLS nodes into BL8_NOP_TLS nodes.  This replaces the code
removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS
nodes can now be emitted normally using their patterns and the
associated printTLSCall print method.

Finally, as a result of these changes, all references to get-tls-addr
in its various guises are no longer used, so they have been removed.

There are existing TLS tests to verify the changes haven't messed
anything up).  I've added one new test that verifies that the problem
with the original code has been fixed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221703 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 20:44:09 +00:00
Rafael Espindola
9272305648 MCAsmParserExtension has a copy of the MCAsmParser. Use it.
Base classes were storing a second copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221667 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-11 05:18:41 +00:00
Rafael Espindola
d342c4c748 Misc style fixes. NFC.
This fixes a few cases of:

* Wrong variable name style.
* Lines longer than 80 columns.
* Repeated names in comments.
* clang-format of the above.

This make the next patch a lot easier to read.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221615 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-10 18:11:10 +00:00
Rafael Espindola
5793838fc8 Remove redundant calls to isMaterializable.
This removes calls to isMaterializable in the following cases:

* It was redundant with a call to isDeclaration now that isDeclaration returns
  the correct answer for materializable functions.
* It was followed by a call to Materialize. Just call Materialize and check EC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@221050 91177308-0d34-0410-b5e6-96231b3b80d8
2014-11-01 16:46:18 +00:00
Bill Schmidt
2d32816a45 [PowerPC] Initial VSX intrinsic support, with min/max for vector double
Now that we have initial support for VSX, we can begin adding
intrinsics for programmer access to VSX instructions.  This patch adds
basic support for VSX intrinsics in general, and tests it by
implementing intrinsics for minimum and maximum for the vector double
data type.

The LLVM portion of this is quite straightforward.  There is a
companion patch for Clang.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220988 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-31 19:19:07 +00:00
Ulrich Weigand
8a9c531e9a [PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code
Since block address values can be larger than 2GB in 64-bit code, they
cannot be loaded simply using an @l / @ha pair, but instead must be
loaded from the TOC, just like GlobalAddress, ConstantPool, and
JumpTable values are.

The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where
temporary labels could not be used as TOC values, since code would
attempt (and fail) to use GetOrCreateSymbol to create a symbol of the
same name as the temporary label.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220959 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-31 10:33:14 +00:00
Sanjay Patel
a46f06efe2 Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-24 17:02:16 +00:00
Bill Schmidt
55321cd8a7 [PATCH] Support select-cc for VSFRC when VSX is enabled
A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to
handle <2 x double> cases.  This patch adds SELECT_VSFRC and
SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the
f64 type when VSX is enabled.  The changes are analogous to those in
the previous patch.  I've added a new variant to vsx.ll to test the
code generation.

(I also cleaned up a little formatting in PPCInstrVSX.td from the
previous patch.)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220395 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-22 16:58:20 +00:00
Bill Schmidt
013cbf2f1a [PowerPC] Support select-cc for VSX
The tests test/CodeGen/Generic/select-cc.ll and
test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled.  The
problem is that the lowering logic for the SELECT and SELECT_CC
operations doesn't currently support the VSX registers.  This patch
fixes that.

In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this
for other register classes.  Similar pseudos are added in
PPCInstrVSX.td (they must be there, because the "vsrc" register class
definition appears there) for the VSRC register class.  The
SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC.

The rest of the patch just adds logic for SELECT_VSRC wherever similar
logic appears for SELECT_VRRC.

There are no new test cases because the existing tests above test
this, along with a variant in test/CodeGen/PowerPC/vsx.ll.

After discussion with Hal, a future patch will add similar _VSFRC
variants to override f64 type handling (currently using F8RC).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220385 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-22 13:13:40 +00:00
Bill Schmidt
41454cc88b [PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg
With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in
the FMA mutation pass.  If we have a situation where a killed product
register is the same register as the FMA target, such as:

   %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5,
                       %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 

then the substitution makes no sense.  We end up getting a crash when
we try to extend the interval associated with the killed product
register, as there is already a live range for %vreg5 there.  This
patch just disables the mutation under those circumstances.

Since recipest.ll generates different code with VMX enabled, I've
modified that test to use -mattr=-vsx.  I've borrowed the code from
that test that exposed the bug and placed it in fma-mutate.ll, where
it tests several mutation opportunities including the "bad" one.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220290 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-21 13:02:37 +00:00
Bill Schmidt
b56bf6b112 [PowerPC] Change assert to better form
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220092 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-17 21:19:59 +00:00
Bill Schmidt
d2dcbd00f7 [PowerPC] Change liveness testing in VSX FMA mutation pass
With VSX enabled, LLVM crashes when compiling
test/CodeGen/PowerPC/fma.ll.  I traced this to the liveness test
that's revised in this patch. The interval test is designed to only
work for virtual registers, but in this case the AddendSrcReg is
physical. Since there is already a walk of the MIs between the
AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg
in that loop.  At Hal Finkel's request, I converted the liveness test
to an assert restricted to virtual registers.

I've changed the fma.ll test to have VSX and non-VSX variants so we
can test both kinds of multiply-adds.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220090 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-17 21:02:44 +00:00
Bill Schmidt
b76f5ba103 [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation
Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64
types, but does not yet use lxvw4x and stxvw4x for 4x32 types.  This
patch adds that support.

As with lxvd2x/stxvd2x, this involves straightforward overriding of
the patterns normally recognized for lvx/stvx, with preference given
to the VSX patterns when VSX is enabled.

In addition, the logic for permitting misaligned memory accesses is
modified so that v4r32 and v4i32 are treated the same as v2f64 and
v2i64 when VSX is enabled.  Finally, the DAG generation for unaligned
loads is changed to just use a normal LOAD (which will become lxvw4x)
on P8 and later hardware, where unaligned loads are preferred over
lvsl/lvx/lvx/vperm.

A number of tests now generate the VSX loads/stores instead of
lvx/stvx, so this patch adds VSX variants to those tests.  I've also
added <4 x float> tests to the vsx.ll test case, and created a
vsx-p8.ll test case to be used for testing code generation for the
P8Vector feature.  For now, that simply tests the unaligned load/store
behavior.

This has been tested along with a temporary patch to enable the VSX
and P8Vector features, with no new regressions encountered with or
without the temporary patch applied.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220047 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-17 15:13:38 +00:00
Rafael Espindola
90ce9f70e2 Simplify handling of --noexecstack by using getNonexecutableStackSection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219799 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-15 16:12:52 +00:00
Eric Christopher
ff9182749e Use the triple to figure out if this is a darwin target, not
the subtarget.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219673 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-14 08:25:26 +00:00
Benjamin Kramer
fedd0e2a21 MC: Bit pack MCSymbolData.
On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member
variables private and clean up uses to go through the existing accessors.

NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219573 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-11 15:07:21 +00:00
Bill Schmidt
38a202955c [PowerPC] Reduce names from Power8Vector to P8Vector
Per Hal Finkel's review, improving typability of some variable names.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219514 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-10 17:21:15 +00:00
Bill Schmidt
836ca75dd7 [PowerPC] Add feature for Power8 vector extensions
The current VSX feature for PowerPC specifies availability of the VSX
instructions added with the 2.06 architecture version.  With 2.07, the
architecture adds new instructions to both the Category:Vector and
Category:VSX instruction sets.  Additionally, unaligned vector storage
operations have improved performance.

This patch adds a feature to provide access to the new instructions
and performance capabilities of Power8.  For compatibility with GCC,
the feature is controlled via a new -mpower8-vector switch, and the
feature causes the __POWER8_VECTOR__ builtin define to be generated by
the preprocessor.

There is a companion patch for cfe being committed at the same time.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219501 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-10 15:09:28 +00:00
Samuel Antao
f75bfbea17 Fix bug in GPR to FPR moves in PPC64LE.
The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219441 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 20:42:56 +00:00
Bill Schmidt
c307b3034a [PPC64] VSX indexed-form loads use wrong instruction format
The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x
incorrectly use the XForm_1 instruction format, rather than the
XX1Form instruction format.  This is likely a pasto when creating
these instructions, which were based on lvx and so forth.  This patch
uses the correct format.

The existing reformatting test (test/MC/PowerPC/vsx.s) missed this
because the two formats differ only in that XX1Form has an extension
to the target register field in bit 31.  The tests for these
instructions used a target register of 7, so the default of 0 in bit
31 for XForm_1 didn't expose a problem.  For register numbers 32-63
this would be noticeable.  I've changed the test to use higher
register numbers to verify my change is effective.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219416 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 17:51:35 +00:00
Eric Christopher
e6f32be8df Add subtarget caches to aarch64, arm, ppc, and x86.
These will make it easier to test further changes to the
code generation and optimization pipelines as those are
moved to subtargets initialized with target feature and
target cpu.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219106 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-06 06:45:36 +00:00
Robin Morisset
8e2b2ae80e [Power] Use lwsync for non-seq_cst fences
Summary:
hwsync is only required for seq_cst fences, acquire and release one can use
the cheaper lwsync.

Test Plan: Added some cases to atomics.ll + make check-all

Reviewers: jfb, wschmidt

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5317

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218995 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-03 18:04:36 +00:00
Hal Finkel
626236d9bc [PowerPC] Modern Book-E cores support sync
Older Book-E cores, such as the PPC 440, support only msync (which has the same
encoding as sync 0), but not any of the other sync forms. Newer Book-E cores,
however, do support sync, and for performance reasons we should allow the use
of the more-general form.

This refactors msync use into its own feature group so that it applies by
default only to older Book-E cores (of the relevant cores, we only have
definitions for the PPC440/450 currently).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 22:34:22 +00:00
Robin Morisset
2b1874cbd4 [Power] Improve the expansion of atomic loads/stores
Summary:
Atomic loads and store of up to the native size (32 bits, or 64 for PPC64)
can be lowered to a simple load or store instruction (as the synchronization
is already handled by AtomicExpand, and the atomicity is guaranteed thanks to
the alignment requirements of atomic accesses). This is exactly what this patch
does. Previously, these were implemented by complex
load-linked/store-conditional loops.. an obvious performance problem.

For example, this patch turns
```
define void @store_i8_unordered(i8* %mem) {
  store atomic i8 42, i8* %mem unordered, align 1
  ret void
}
```
from
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
LBB4_1:                                 ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
; BB#2:
    blr
```
into
```
_store_i8_unordered:                    ; @store_i8_unordered
; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr

```
which looks like a pretty clear win to me.

Test Plan:
fixed the tests + new test for indexed accesses + make check-all

Reviewers: jfb, wschmidt, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5587

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218922 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-02 22:27:07 +00:00
Eric Christopher
300743f74a constify the TargetMachine argument used in the subtarget and
lowering constructors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:36:28 +00:00
Eric Christopher
406dccea99 Now that the optimization level is adjusting the feature string
before we hit the subtarget, remove the constructor parameter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218817 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 21:05:35 +00:00
Eric Christopher
c9038d9c1b Rework the PPC TargetMachine so that the non-function specific
overrides happen at TargetMachine creation and not on every
subtarget creation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218805 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-01 20:38:26 +00:00
Sanjay Patel
cafc85bf1e Split the estimate() interface into separate functions for each type. NFC.
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-30 20:28:48 +00:00
Sanjay Patel
676af35b38 Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2).
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

And:

z = y / x

into:

z = y * rcpe(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.

Differential Revision: http://reviews.llvm.org/D5484



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-26 23:01:47 +00:00
Robin Morisset
58bca6e8ec [Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate
Summary:
This patch makes use of AtomicExpandPass in Power for inserting fences around
atomic as part of an effort to remove fence insertion from SelectionDAGBuilder.
As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic
lwsync) instead of sync 0 (heavyweight sync) in many cases.

I also added a test, as there was no test for the barriers emitted by the Power
backend for atomic loads and stores.

Test Plan: new test + make check-all

Reviewers: jfb

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5180

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218331 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 20:46:49 +00:00
Lang Hames
4fcebee6d8 [MCJIT] Remove PPCRelocations.h - it's no longer used.
This was overlooked in r218320, which removed the relocation headers for other
targets. Thanks to Ulrich Weigand for catching it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218327 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-23 19:17:48 +00:00
Sanjay Patel
3e05b40fd0 Refactor reciprocal square root estimate into target-independent function; NFC.
This is purely a plumbing patch. No functional changes intended.

The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:

z = y / sqrt(x)

into:

z = y * rsqrte(x)

using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .

The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.

Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.

Differential Revision: http://reviews.llvm.org/D5425



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-21 15:19:15 +00:00
Hal Finkel
c404e8208c Optionally enable more-aggressive FMA formation in DAGCombine
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.

Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.

Patch by Olivier Sallenave, thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-19 11:42:56 +00:00
Aaron Ballman
c21e4e197d Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218062 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 17:34:23 +00:00
Aaron Ballman
cf5bea8e4a Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218050 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 13:27:14 +00:00
Eric Christopher
757c90dd00 Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.

No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218004 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-18 00:34:14 +00:00
Samuel Antao
6693d0de3e Fix FastISel bug in boolean returns for PowerPC.
For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. 



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217993 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 23:25:06 +00:00
Samuel Antao
0c3b56bdab Remove unnecessary blank space (test commit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217991 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-17 22:47:28 +00:00
Bill Schmidt
183704cb08 Address comments on r217622
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217680 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 14:26:36 +00:00
Bill Schmidt
24ebd0edd6 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-11 20:10:03 +00:00
Sanjay Patel
87c977a52b Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:58:16 +00:00
Craig Topper
c4e394a333 Use cast to MVT instead of EVT on a couple calls to getSizeInBits.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217473 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 04:51:36 +00:00
Juergen Ributzka
ecadea992a [FastISel][tblgen] Rename tblgen generated FastISel functions. NFC.
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.

Reviewed by Eric

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217075 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 20:56:59 +00:00
Juergen Ributzka
6042034603 [FastISel] Rename public visible FastISel functions. NFC.
This commit renames the following public FastISel functions:
LowerArguments -> lowerArguments
SelectInstruction -> selectInstruction
TargetSelectInstruction -> fastSelectInstruction
FastLowerArguments -> fastLowerArguments
FastLowerCall -> fastLowerCall
FastLowerIntrinsicCall -> fastLowerIntrinsicCall
FastEmitZExtFromI1 -> fastEmitZExtFromI1
FastEmitBranch -> fastEmitBranch
UpdateValueMap -> updateValueMap
TargetMaterializeConstant -> fastMaterializeConstant
TargetMaterializeAlloca -> fastMaterializeAlloca
TargetMaterializeFloatZero -> fastMaterializeFloatZero
LowerCallTo -> lowerCallTo

Reviewed by Eric

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217074 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 20:56:52 +00:00
Eric Christopher
5b7ae59f6d Remove resetSubtargetFeatures as it is unused.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217071 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 20:36:31 +00:00
Benjamin Kramer
a80ff26688 Add override to overriden virtual methods, remove virtual keywords.
No functionality change. Changes made by clang-tidy + some manual cleanup.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217028 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-03 11:41:21 +00:00
Eric Christopher
d5dd8ce2a5 Reinstate "Nuke the old JIT."
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reinstates commits r215111, 215115, 215116, 215117, 215136.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216982 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 22:28:02 +00:00
Alexey Samsonov
93accb4815 Fix signed integer overflow in PPCInstPrinter.
This bug was reported by UBSan.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216917 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 17:38:34 +00:00
Hal Finkel
dea837d3c5 [PowerPC] Guard against illegal selection of add for TargetConstant operands
r208640 was reverted because it caused a self-hosting failure on ppc64. The
underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant
operands. Because we have no patterns for 'add' taking 'timm' nodes, these are
selected as r+r add instructions (which is a miscompile). Guard against this
kind of behavior in the future by making the backend crash should this occur
(instead of silently generating invalid output).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 06:23:54 +00:00
Craig Topper
3af13568fb Remove 'virtual' keyword from methods markedwith 'override' keyword.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216823 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-30 16:48:34 +00:00
Justin Hibbits
cf3736085b Test commit. Fix whitespace from a previous patch of mine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216650 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-28 04:40:55 +00:00
Karthik Bhat
e637d65af3 Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 04:56:54 +00:00
Hal Finkel
7ca2a7d742 [PowerPC] Add support for dcbtst and icbt (prefetch)
Adds code generation support for dcbtst (data cache prefetch for write) and
icbt (instruction cache prefetch for read - Book E cores only).

We still end up with a 'cannot select' error for the non-supported prefetch
intrinsic forms. This will be fixed in a later commit.

Fixes PR20692.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-23 23:21:04 +00:00
Sanjay Patel
d1a09c47d2 name change: isPow2DivCheap -> isPow2SDivCheap
isPow2DivCheap

That name doesn't specify signed or unsigned.

Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:

   srl/add/sra

is not the general sequence for signed integer division by power-of-2. We need one more 'sra':

   sra/srl/add/sra

That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.

This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.

No functional change intended.

Differential Revision: http://reviews.llvm.org/D5010


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216237 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 22:31:48 +00:00
Tim Northover
049ffbbdf2 TableGen: allow use of uint64_t for available features mask.
ARM in particular is getting dangerously close to exceeding 32 bits worth of
possible subtarget features. When this happens, various parts of MC start to
fail inexplicably as masks get truncated to "unsigned".

Mostly just refactoring at present, and there's probably no way to test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215887 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-18 11:49:42 +00:00
Hal Finkel
5dc48ac04a [PowerPC] Mark fixed-offset byvals as pointed-to by IR values
A byval object, even if allocated at a fixed offset (prescribed by the ABI) is
pointed to by IR values. Most fixed-offset stack objects are not pointed-to by
IR values, so the default is to assume this is not possible. However, we need
to override the default in this case (instruction scheduling can cause
miscompiles otherwise).

Fixes PR20280.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215795 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 00:17:05 +00:00
Hal Finkel
bdd8b6bfb9 [PowerPC] Darwin byval arguments are not immutable
On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's
frame, but are not immutable -- the pointer value is directly available to the
higher-level code as the address of the argument, and the value of the byval
argument can be modified at the IR level.

This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in
a follow-up commit, its test case will cover this change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215793 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-16 00:16:29 +00:00
Rafael Espindola
a348fc7fda Remove HasLEB128.
We already require CFI, so it should be safe to require .leb128 and .uleb128.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215712 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 14:01:07 +00:00
Benjamin Kramer
5a649ba0ee PPC: Clean up pointer casting, no functionality change.
Silences GCC's -Wcast-qual.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215703 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 11:05:45 +00:00
Bill Schmidt
feb45e3f0f [PPC64] Add missing dependency on X2 to LDinto_toc.
The LDinto_toc pattern has been part of 64-bit PowerPC for a long
time, and represents loading from a memory location into the TOC
register (X2).  However, this pattern doesn't explicitly record that
it modifies that register.  This patch adds the missing dependency.

It was very surprising to me that this has never shown up as a problem
in the past, and that we only saw this problem recently in a single
scenario when building a self-hosted clang.  It turns out that in most
cases we have another dependency present that keeps the LDinto_toc
instruction tied in place.  LDinto_toc is used for TOC restore
following a call site, so this is a typical sequence:

   BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
   LDinto_toc 24, %X1
   ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>

Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there
is a natural anti-dependency between the two that keeps it in place.

Therefore we don't usually see a problem.  However, in one particular
case, one call is followed immediately by another call, and the second
call requires a parameter that is a TOC-relative address.  This is the
code sequence:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Note that the back-to-back stack adjustments are the same size!  The
back end is smart enough to recognize this and optimize them away:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

Now there is nothing to prevent the ADDIStocHA instruction from moving
ahead of the LDinto_toc instruction, and because of the longest-path
heuristic, this is what happens.

With the accompanying patch, %X2 is represented as an implicit def:

  BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ...
  LDinto_toc 24, %X1, %X2<imp-def,dead>
  ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use>
  ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use>
  %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39
  %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39

So now when the two stack adjustments are removed, ADDIStocHA is
prevented from being moved above LDinto_toc.

I have not yet created a test case for this, because the original
failure occurs on a relatively large function that needs reduction.
However, this is a fairly serious bug, despite its infrequency, and I
wanted to get this patch onto the list as soon as possible so that it
can be considered for a 3.5 backport.  I'll work on whittling down a
test case.

Have we missed the boat for 3.5 at this point?

Thanks,
Bill


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215685 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-15 01:25:26 +00:00
Benjamin Kramer
00e08fcaa0 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215558 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 16:26:38 +00:00
Hal Finkel
e693d3c558 [PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).

This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215512 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-13 01:15:40 +00:00
Joerg Sonnenberger
4417c25b03 @l and friends adjust their value depending the context used in.
For ori, they are unsigned, for addi, signed. Create a new target
expression type to handle this and evaluate Fixups accordingly.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215315 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-10 12:41:50 +00:00
Joerg Sonnenberger
b2b363408b If available, pass down the Fixup object to EvaluateAsRelocatable.
At least on PowerPC, the interpretation of certain modifiers depends on
the context they appear in.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215310 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-10 11:35:12 +00:00
Joerg Sonnenberger
06e8dbef25 Allow the third argument for the subi family to be an expression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215286 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 17:10:26 +00:00
Joerg Sonnenberger
26adfd1ca4 Use the full form of dccci and iccci from the early PPC 405 documents,
since the operands are actually used on those cores. Provide aliases for
the only documented case in the newer Power ISA speec.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215282 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 13:58:31 +00:00
Eric Christopher
e9da71fc86 Initialize PPC DataLayout based on the Triple only.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215281 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 04:53:17 +00:00
Eric Christopher
f3157be108 Remove extraneous 64-bit argument to the PPC TargetMachine constructor
and update initialization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215280 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-09 04:38:56 +00:00
Joerg Sonnenberger
996a304351 Allow large immediates for branch instructions in 32bit mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215240 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 20:57:58 +00:00
Joerg Sonnenberger
f0b70e2fbc Provide an implementation of getNoopForMachoTarget for PPC, otherwise
empty functions will assert in the MC object writer.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215238 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 19:13:23 +00:00
Joerg Sonnenberger
6f3e49e03c Add low-level option for avoiding float stores from va_start until
soft-float is properly supported.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215221 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 16:46:10 +00:00
Joerg Sonnenberger
c9def6b938 Add support for SPE load/store from memory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215220 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-08 16:43:49 +00:00
Eric Christopher
aa5b9c0f6f Temporarily Revert "Nuke the old JIT." as it's not quite ready to
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.

Approved by Jim Grosbach, Lang Hames, Rafael Espindola.

This reverts commits r215111, 215115, 215116, 215117, 215136.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215154 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 22:02:54 +00:00
Joerg Sonnenberger
71c5eed711 Add the majority of the remaining SPE instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215131 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 18:52:39 +00:00
Rafael Espindola
875710a2fd Nuke the old JIT.
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.

Thanks to Lang Hames for making MCJIT a good replacement!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215111 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 14:21:18 +00:00
Joerg Sonnenberger
7ad7c75048 Add mfasr and mtasr
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215110 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 13:35:34 +00:00
Joerg Sonnenberger
d94b6e8895 Add mfrtcu and mfrtcl instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215109 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 13:16:58 +00:00
Joerg Sonnenberger
5b1fba4a83 Support mttbl and mttbu mnemonic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215108 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 13:06:23 +00:00
Joerg Sonnenberger
80de56ebde Add RFID instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215105 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 12:39:59 +00:00
Joerg Sonnenberger
619bacb039 Fix Itineray class of rfi
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215104 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 12:35:16 +00:00
Joerg Sonnenberger
46a750b1a0 Spell e500 feature in lower case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215103 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 12:31:28 +00:00
Joerg Sonnenberger
445a9f9729 Add first bunch of SPE instructions. As they overlap with Altivec, mark
them as parser-only until the disassembler is extended to handle
predicates properly.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215102 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-07 12:18:21 +00:00
Eric Christopher
41612a9b85 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214988 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-06 18:45:26 +00:00
Rafael Espindola
9920f561c3 Remove a virtual function from TargetMachine. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214929 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 22:10:21 +00:00
Bill Schmidt
bb639a1f96 [PowerPC] Swap arguments and adjust shift count for vsldoi on little endian
Commits r213915 and r214718 fix recognition of shuffle masks for vmrg*
and vpku*um instructions for a little-endian target, by swapping the
input arguments.  The vsldoi instruction requires similar treatment,
and also needs its shift count adjusted for little endian.

Reviewed by Ulrich Weigand.

This is a bug fix candidate for release 3.5 (and hopefully the last of
those for PowerPC).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214923 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 20:47:25 +00:00
Joerg Sonnenberger
2888b08b44 Add accessors for the PPC 403 bank registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214875 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 15:45:15 +00:00
Joerg Sonnenberger
bb97134ffe Accessors for SSR2 and SSR3 on PPC 403.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214867 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 14:53:05 +00:00
Joerg Sonnenberger
deaa09e169 Add dci/ici instructions for PPC 476 and friends.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214864 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 14:40:32 +00:00
Joerg Sonnenberger
2bdd960ae3 Add mftblo and mftbhi for PPC 4xx.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214863 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 14:18:16 +00:00
Joerg Sonnenberger
0f365741f3 Add lswi / stswi for assembler use with a warning to not add patterns
for them.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214862 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 13:34:01 +00:00
Eric Christopher
6035518e3b Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214838 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 02:39:49 +00:00
Joerg Sonnenberger
e2f9c8d663 Add TCR register access
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214826 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 23:53:42 +00:00
Joerg Sonnenberger
7c5b978254 Add PPC 603's tlbld and tlbli instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214825 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 23:49:45 +00:00
Bill Schmidt
84fef1f55d [PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi
My original LE implementation of the vsldoi instruction, with its
altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect
shufflevector operations in the LLVM IR.  Correct code is generated
because the back end handles the incorrect shufflevector in a
consistent manner.

This patch and a companion patch for Clang correct this problem by
removing the fixup from altivec.h and the corresponding fixup from the
PowerPC back end.  Several test cases are also modified to reflect the
now-correct LLVM IR.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 23:21:01 +00:00
Joerg Sonnenberger
db3ce56a58 Add simplified aliases for access to DCCR, ICCR, DEAR and ESR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214797 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 22:56:42 +00:00
Joerg Sonnenberger
25c8b4774b tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214784 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:28:22 +00:00
Eric Christopher
9f85dccfc6 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:25:23 +00:00
Joerg Sonnenberger
355845b437 Recognize mftbl as alias for mftb, for symmetry with mttb.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214769 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 20:28:34 +00:00
Joerg Sonnenberger
d6c23bed52 Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the
systems it represents.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214755 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 18:46:13 +00:00
Joerg Sonnenberger
e8f23a8a5d Allow .lcomm with alignment on ELF targets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214754 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 18:45:10 +00:00
Joerg Sonnenberger
93bbddf1f2 Refactor SPRG instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214733 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 17:26:15 +00:00
Joerg Sonnenberger
df64464ad2 Add support for m[ft][di]bat[ul] instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214731 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 17:07:41 +00:00
Joerg Sonnenberger
977b978f93 Add features for PPC 4xx and e500/e500mc instructions.
Move the test cases for them into separate files.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214724 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 15:47:38 +00:00
Ulrich Weigand
c568629589 [PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian
In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl*
by swapping the input arguments.  As it turns out, the exact same fix
is also required for the vpkuhum/vpkuwum patterns.

This fixes another regression in llvmpipe when vector support is
enabled.

Reviewed by Bill Schmidt.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214718 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 13:53:40 +00:00
Ulrich Weigand
d5e9497c88 [PowerPC] MULHU/MULHS are not legal for vector types
I ran into some test failures where common code changed vector division
by constant into a multiply-high operation (MULHU).  But these are not
implemented by the back-end, so we failed to recognize the insn.

Fixed by marking MULHU/MULHS as Expand for vector types.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214716 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 13:27:12 +00:00
Ulrich Weigand
3b7a193521 [PowerPC] Fix and improve vector comparisons
This patch refactors code generation of vector comparisons.

This fixes a wrong code-gen bug for ISD::SETGE for floating-point types,
and improves generated code for vector comparisons in general.

Specifically, the patch moves all logic deciding how to implement vector
comparisons into getVCmpInst, which gets two extra boolean outputs
indicating to its caller whether its needs to swap the input operands
and/or negate the result of the comparison.  Apart from implementing
these two modifications as directed by getVCmpInst, there is no need
to ever implement vector comparisons in any other manner; in particular,
there is never a need to perform two separate comparisons (e.g. one for
equal and one for greater-than, as code used to do before this patch).

Reviewed by Bill Schmidt.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214714 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 13:13:57 +00:00
Joerg Sonnenberger
a0e1d10f75 tlbia support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214640 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 20:16:29 +00:00
Joerg Sonnenberger
55f925abc5 mfdcr / mtdcr support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214639 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 20:00:26 +00:00
Joerg Sonnenberger
419f3804f0 Don't use additional arguments for dss and friends to satisfy DSS_Form,
when let can do the same thing. Keep the 64bit variants as codegen-only.
While they have a different register class, the encoding is the same for
32bit and 64bit mode. Having both present would otherwise confuse the
disassembler.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214636 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-02 15:09:41 +00:00
Eric Christopher
f9799dbe51 Add a non-const subtarget returning function to the target machine
so that we can use it to get the old-style JIT out of the subtarget.

This code should be removed when the old-style JIT is removed
(imminently).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214560 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 21:18:01 +00:00
Ulrich Weigand
a45d218f15 [PowerPC] PR20280 - Slots for byval parameters are not immutable
Found by inspection while looking at PR20280: code would mark slots
in the parameter save area where a byval parameter is passed as
"immutable".  This is not correct since code is allowed to modify
byval parameters in place in the parameter save area.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214517 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 14:35:58 +00:00
Hal Finkel
d2a90101cd [PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.

When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:

The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some unaligned load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.

A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.

To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.

Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding it
again). Nevertheless, this should fix the underlying problem.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214481 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 05:20:41 +00:00
Hal Finkel
bcaf5e176a [PowerPC] Recognize consecutive memory accesses from intrinsics
When generating unaligned vector loads, we need to search for other loads or
stores nearby offset by one vector width. If we find one, then we know that we
can safely generate another aligned load at that address. Otherwise, we must
generate the next load using an offset of the vector width minus one byte (so
we don't read off the end of the allocation if the base unaligned address
happened to be aligned at runtime). We had previously done this using only
other vector loads and stores, but did not consider the PowerPC-specific vector
load/store intrinsics. Now we'll also consider vector intrinsics. By itself,
this change is a feature enhancement, but is a necessary step toward fixing the
underlying problem behind PR19991.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214469 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-01 01:02:01 +00:00
Louis Gerbarg
7d54c5b0f2 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214449 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-31 21:45:05 +00:00
Joerg Sonnenberger
cadb2a8a32 Add mtpid/mfpid for BookE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214363 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 23:59:11 +00:00
Joerg Sonnenberger
367c5c25e8 Refactor TLBIVAX and add tlbsx.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214354 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 22:51:15 +00:00
Joerg Sonnenberger
ee6a05091a Add rfdi and rfmci from the e500/e500mc ISA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214339 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 21:09:03 +00:00
Joerg Sonnenberger
9e3d58aae1 Add BookE's tlbre, tlbwe and tlbivax instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214332 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 20:44:04 +00:00
Joerg Sonnenberger
a4d6ef15b8 Add BookE's wrtee and wrteei instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214297 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 10:32:51 +00:00
Joerg Sonnenberger
a2d6cb1e55 SPRG 0 to 3 are valid outside BookE, so move them to the normal test
file. Add support for accessing SPRG 4 to 7 on BookE.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214295 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-30 09:24:37 +00:00
Joerg Sonnenberger
f34e598090 Add rfci instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214256 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 23:45:20 +00:00
Joerg Sonnenberger
1bb9c8155a mbar without argument is equivalent to mbar 0.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214250 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 23:31:27 +00:00
Joerg Sonnenberger
6e48dd6d5b Recognize BookE's mbar instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214244 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 23:16:31 +00:00
Joerg Sonnenberger
2f8f622d18 Fix typo in alias: DSIR -> DSISR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214238 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 22:42:44 +00:00
Joerg Sonnenberger
b9253653c7 Support move to/from segment register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214234 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 22:21:57 +00:00
Joerg Sonnenberger
e31c7fd974 Add a number of aliases for SPR access.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214196 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 18:55:43 +00:00
Joerg Sonnenberger
f6689601ee Add rfi instruction. Based on feedback by Ulrich Weigand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214181 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 15:49:09 +00:00
Matt Arsenault
3bd14877eb Fix typos / grammar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214147 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-29 00:02:40 +00:00
Ulrich Weigand
fe7131e2d0 [PowerPC] Support ELFv1/ELFv2 ABI selection via features
While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently
hard-coded via the target triple: powerpc64-linux is always ELFv1, while
powerpc64le-linux is always ELFv2.

These are of course the most common scenarios, but in principle it is
possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on
little-endian systems (and GCC does support that), and there are some
special use cases for that (e.g. certain Linux kernel versions could
only be built using ELFv1 on LE).

This patch implements the LLVM side of supporting this.  As precedent
on other platforms suggests, ABI options are passed to the back-end as
features.  Thus, this patch implements two features "elfv1" and "elfv2"
that select the desired ABI if present.  (If not, the LLVM uses the
same default rules as now.)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214072 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-28 13:09:28 +00:00
Matt Arsenault
2dd264c8a3 Add alignment value to allowsUnalignedMemoryAccess
Rename to allowsMisalignedMemoryAccess.

On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214055 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-27 17:46:40 +00:00
Hal Finkel
e9b6201f4d [PowerPC] Support TLS on PPC32/ELF
Patch by Justin Hibbits!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213960 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 17:47:22 +00:00
Bill Schmidt
2286ae542c [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*.
Because the PowerPC vmrgh* and vmrgl* instructions have a built-in
big-endian bias, it is necessary to swap their inputs in little-endian
mode when using them to implement a vector shuffle.  This was
previously missed in the vector LE implementation.

There was already logic to distinguish between unary and "normal"
vmrg* vector shuffles, so this patch extends that logic to use a third
option:  "swapped" vmrg* vector shuffles that are used for little
endian in place of the "normal" ones.

I've updated the vec-shuffle-le.ll test to check for the expected
register ordering on the generated instructions.

This bug was discovered when testing the LE and ELFv2 patches for
safety if they were backported to 3.4.  A different vectorization
decision was made in 3.4 than on mainline trunk, and that exposed the
problem.  I've verified this fix takes care of that issue.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213915 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 01:55:55 +00:00
Joerg Sonnenberger
86854be8b1 Don't use 128bit functions on PPC32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213899 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 22:20:10 +00:00
NAKAMURA Takumi
9d8661d666 Prune dependency to MC from each target disassembler.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213856 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 11:45:11 +00:00
NAKAMURA Takumi
9fda421ee4 Update library dependencies.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 02:10:42 +00:00
Ulrich Weigand
d4542a8cdc [PowerPC] ELFv2 aggregate passing support
This patch adds infrastructure support for passing array types
directly.  These can be used by the front-end to pass aggregate
types (coerced to an appropriate array type).  The details of the
array type being used inform the back-end about ABI-relevant
properties.  Specifically, the array element type encodes:
- whether the parameter should be passed in FPRs, VRs, or just
  GPRs/stack slots  (for float / vector / integer element types,
  respectively)
- what the alignment requirements of the parameter are when passed in
  GPRs/stack slots  (8 for float / 16 for vector / the element type
  size for integer element types) -- this corresponds to the
  "byval align" field

Using the infrastructure provided by this patch, a companion patch
to clang will enable two features:
- In the ELFv2 ABI, pass (and return) "homogeneous" floating-point
  or vector aggregates in FPRs and VRs (this is similar to the ARM
  homogeneous aggregate ABI)
- As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates
  that fit fully in registers without using the "byval" mechanism

The patch uses the functionArgumentNeedsConsecutiveRegisters callback
to encode that special treatment is required for all directly-passed
array types.  The isInConsecutiveRegs / isInConsecutiveRegsLast bits set
as a results are then used to implement the required size and alignment
rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.

As a related change, the ABI routines have to be modified to support
passing floating-point types in GPRs.  This is necessary because with
homogeneous aggregates of 4-byte float type we can now run out of FPRs
*before* we run out of the 64-byte argument save area that is shadowed
by GPRs.  Any extra floating-point arguments that no longer fit in FPRs
must now be passed in GPRs until we run out of those too.

Note that there was already code to pass floating-point arguments in
GPRs used with vararg parameters, which was done by writing the argument
out to the argument save area first and then reloading into GPRs.  The
patch re-implements this, however, in favor of code packing float arguments
directly via extension/truncation, BITCAST, and BUILD_PAIR operations.

This is required to support the ELFv2 ABI, since we cannot unconditionally
write to the argument save area (which the caller might not have allocated).
The change does, however, affect ELFv1 varags routines too; but even here
the overall effect should be advantageous: Instead of loading the argument
into the FPR, then storing the argument to the stack slot, and finally
reloading the argument from the stack slot into a GPR, the new code now
just loads the argument into the FPR, and subsequently loads the argument
into the GPR (via BITCAST).  That BITCAST might imply a save/reload from
a stack temporary (in which case we're no worse than before); but it
might be implemented more efficiently in some cases.

The final part of the patch enables up to 8 FPRs and VRs for argument
return in PPCCallingConv.td; this is required to support returning
ELFv2 homogeneous aggregates.  (Note that this doesn't affect other ABIs
since LLVM wil only look for which register to use if the parameter is
marked as "direct" return anyway.)

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 00:13:26 +00:00
Ulrich Weigand
970c019d02 [PowerPC] ELFv2 explicit CFI for CR fields
This is a minor improvement in the ELFv2 ABI.   In ELFv1, DWARF CFI
would represent a saved CR word (holding CR fields CR2, CR3, and CR4)
using just a single CFI record refering to CR2.   In ELFv2 instead,
each of the CR fields is represented by its own CFI record.  The
advantage is that the compiler can now chose to save just a single
(or two) CR fields instead of all of them, if those are the only ones
that actually need saving.  That can lead to more efficient code using
mf(o)crf instead of the (slow) mfcr instruction.

Note that this patch does not (yet) implement this more efficient
code generation, but it does implement the part that is required to
be ABI compliant: creating multiple CFI records if multiple CR fields
are saved.

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-21 00:03:18 +00:00
Ulrich Weigand
7fc5011e8d [PowerPC] ELFv2 stack space reduction
The ELFv2 ABI reduces the amount of stack required to implement an
ABI-compliant function call in two ways:
* the "linkage area" is reduced from 48 bytes to 32 bytes by
  eliminating two unused doublewords
* the 64-byte "parameter save area" is now optional and need not be
  present in certain cases (it remains mandatory in functions with
  variable arguments, and functions that have any parameter that is
  passed on the stack)

The following patch implements this required changes:
- reducing the linkage area, and associated relocation of the TOC save
  slot, in getLinkageSize / getTOCSaveOffset (this requires updating all
  callers of these routines to pass in the isELFv2ABI flag).
- (partially) handling the case where the parameter save are is optional

This latter part requires some extra explanation:  Currently, we still
always allocate the parameter save area when *calling* a function.
That is certainly always compliant with the ABI, but may cause code to
allocate stack unnecessarily.  This can be addressed by a follow-on
optimization patch.

On the *callee* side, in LowerFormalArguments, we *must* track
correctly whether the ABI guarantees that the caller has allocated
the parameter save area for our use, and the patch does so. However,
there is one complication: the code that handles incoming "byval"
arguments will currently *always* write to the parameter save area,
because it has to force incoming register arguments to the stack since
it must return an *address* to implement the byval semantics.

To fix this, the patch changes the LowerFormalArguments code to write
arguments to a freshly allocated stack slot on the function's own stack
frame instead of the argument save area in those cases where that area
is not present.

Reviewed by Hal Finkel.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213490 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:43:15 +00:00
Ulrich Weigand
edfd4f18bc [PowerPC] ELFv2 function call changes
This patch builds upon the two preceding MC changes to implement the
basic ELFv2 function call convention.  In the ELFv1 ABI, a "function
descriptor" was associated with every function, pointing to both the
entry address and the related TOC base (and a static chain pointer
for nested functions).  Function pointers would actually refer to that
descriptor, and the indirect call sequence needed to load up both entry
address and TOC base.

In the ELFv2 ABI, there are no more function descriptors, and function
pointers simply refer to the (global) entry point of the function code.
Indirect function calls simply branch to that address, after loading it
up into r12 (as required by the ABI rules for a global entry point).
Direct function calls continue to just do a "bl" to the target symbol;
this will be resolved by the linker to the local entry point of the
target function if it is local, and to a PLT stub if it is global.
That PLT stub would then load the (global) entry point address of the
final target into r12 and branch to it.  Note that when performing a
local function call, r2 must be set up to point to the current TOC
base: if the target ends up local, the ABI requires that its local
entry point is called with r2 set up; if the target ends up global,
the PLT stub requires that r2 is set up.

This patch implements all LLVM changes to implement that scheme:
- No longer create a function descriptor when emitting a function
  definition (in EmitFunctionEntryLabel)
- Emit two entry points *if* the function needs the TOC base (r2)
  anywhere (this is done EmitFunctionBodyStart; note that this cannot
  be done in EmitFunctionBodyStart because the global entry point
  prologue code must be *part* of the function as covered by debug info).
- In order to make use tracking of r2 (as needed above) work correctly,
  mark direct function calls as implicitly using r2.
- Implement the ELFv2 indirect function call sequence (no function
  descriptors; load target address into r12).
- When creating an ELFv2 object file, emit the .abiversion 2 directive
  to tell the linker to create the appropriate version of PLT stubs.  

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213489 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:31:44 +00:00
Ulrich Weigand
76fcace66e [MC] Pass MCSymbolData to needsRelocateWithSymbol
As discussed in a previous checking to support the .localentry
directive on PowerPC, we need to inspect the actual target symbol
in needsRelocateWithSymbol to make the appropriate decision based
on that symbol's st_other bits.

Currently, needsRelocateWithSymbol does not get the target symbol.
However, it is directly available to its sole caller.  This patch
therefore simply extends the needsRelocateWithSymbol by a new
parameter "const MCSymbolData &SD", passes in the target symbol,
and updates all derived implementations.

In particular, in the PowerPC implementation, this patch removes
the FIXME added by the previous checkin.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:15:06 +00:00
Ulrich Weigand
5ee5fc4c47 [PowerPC] ELFv2 MC support for .localentry directive
A second binutils feature needed to support ELFv2 is the .localentry
directive.  In the ELFv2 ABI, functions may have two entry points:
one for calling the routine locally via "bl", and one for calling the
function via function pointer (either at the source level, or implicitly
via a PLT stub for global calls).  The two entry points share a single
ELF symbol, where the ELF symbol address identifies the global entry
point address, while the local entry point is found by adding a delta
offset to the symbol address.  That offset is encoded into three
platform-specific bits of the ELF symbol st_other field.

The .localentry directive instructs the assembler to set those fields
to encode a particular offset.  This is typically used by a function
prologue sequence like this:

func:
        addis r2, r12, (.TOC.-func)@ha
        addi r2, r2, (.TOC.-func)@l
        .localentry func, .-func

Note that according to the ABI, when calling the global entry point,
r12 must be set to point the global entry point address itself; while
when calling the local entry point, r2 must be set to point to the TOC
base.  The two instructions between the global and local entry point in
the above example translate the first requirement into the second.

This patch implements support in the PowerPC MC streamers to emit the
.localentry directive (both into assembler and ELF object output), as
well as support in the assembler parser to parse that directive.

In addition, there is another change required in MC fixup/relocation
handling to properly deal with relocations targeting function symbols
with two entry points: When the target function is known local, the MC
layer would immediately handle the fixup by inserting the target
address -- this is wrong, since the call may need to go to the local
entry point instead.  The GNU assembler handles this case by *not*
directly resolving fixups targeting functions with two entry points,
but always emits the relocation and relies on the linker to handle
this case correctly.  This patch changes LLVM MC to do the same (this
is done via the processFixupValue routine).

Similarly, there are cases where the assembler would normally emit a
relocation, but "simplify" it to a relocation targeting a *section*
instead of the actual symbol.  For the same reason as above, this
may be wrong when the target symbol has two entry points.  The GNU
assembler again handles this case by not performing this simplification
in that case, but leaving the relocation targeting the full symbol,
which is then resolved by the linker.  This patch changes LLVM MC
to do the same (via the needsRelocateWithSymbol routine).
NOTE: The method used in this patch is overly pessimistic, since the
needsRelocateWithSymbol routine currently does not have access to the
actual target symbol, and thus must always assume that it might have
two entry points.  This will be improved upon by a follow-on patch
that modifies common code to pass the target symbol when calling
needsRelocateWithSymbol.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213485 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 23:06:03 +00:00
Ulrich Weigand
0d9bcaacd1 [PowerPC] ELFv2 MC support for .abiversion directive
ELFv2 binaries are marked by a bit in the ELF header e_flags field.
A new assembler directive .abiversion can be used to set that flag.
This patch implements support in the PowerPC MC streamers to emit the
.abiversion directive (both into assembler and ELF binary output),
as well as support in the assembler parser to parse the .abiversion
directive.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213484 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 22:56:57 +00:00
Ulrich Weigand
675c967d55 [PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4
When handling an incoming byval argument, we need to possibly write
incoming registers to the stack in order to create an on-stack image
of the parameter, so we can return its address to common code.

This currently uses CreateFixedObject to access the parts of the
parameter save area where the argument is (or needs to be) stored.
However, sometimes we need to access multiple parts of that area,
e.g. to write multiple registers.  The code currently uses a new
CreateFixedObject call for each of these accesses, resulting in
a patchwork of overlapping (fixed) stack objects.

This doesn't really matter in the case of fixed objects, since
any access to those turns into a fixed stackpointer + offset
address anyway.  However, with the upcoming ELFv2 patches, we
may actually need to place an incoming argument into our *own*
stack frame instead of the caller's.  This means we need to use
CreateStackObject instead, and we cannot have multiple overlapping
instances of those.

To make the rest of the argument handling code work equally in
both situations, this patch refactors it to always use just a
single call to CreateFixedObject, and access parts of that object
as required using address arithmetic.  This way, we can in a future
patch substitute CreateStackObject without further changes.

No change to generated code intended.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213483 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 22:36:52 +00:00
Ulrich Weigand
e4b2165648 [PowerPC] Fix FrameIndex handling in SelectAddressRegImm
The PPCTargetLowering::SelectAddressRegImm routine needs to handle
FrameIndex nodes in a special manner, by tranlating them into a
TargetFrameIndex node.  This was done in most cases, but seems to
have been neglected in one path: when the input tree has an OR of
the FrameIndex with an immediate.  This can happen if the FrameIndex
can be proven to be sufficiently aligned that an OR of that immediate
is equivalent to an ADD.

The missing handling of FrameIndex in that case caused the SelectionDAG
instruction selection to miss opportunities to merge the OR back into
the FrameIndex node, leading to superfluous addi/ori instructions in
the final assembler output.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213482 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-20 22:26:40 +00:00
Hal Finkel
d644d17dd4 [PowerPC] 32-bit ELF PIC support
This adds initial support for PPC32 ELF PIC (Position Independent Code; the
-fPIC variety), thus rectifying a long-standing deficiency in the PowerPC
backend.

Patch by Justin Hibbits!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213427 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-18 23:29:49 +00:00
Sanjay Patel
f7e042324a Move Post RA Scheduling flag bit into SchedMachineModel
Refactoring; no functional changes intended

    Removed PostRAScheduler bits from subtargets (X86, ARM).
    Added PostRAScheduler bit to MCSchedModel class.
    This bit is set by a CPU's scheduling model (if it exists).
    Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
    Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
    Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
    Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
    Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: 
       a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. 
       b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. 
       c. PPC overrides the CPU's postRA settings by enabling postRA for everything. 
       d. X86 is the only target that actually has postRA specified via sched model info.

Differential Revision: http://reviews.llvm.org/D4217


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213101 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 22:39:58 +00:00
NAKAMURA Takumi
3ee5fc8618 Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt.
I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213064 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 11:37:03 +00:00
Ulrich Weigand
edb27188b4 [PowerPC] Fix invalid displacement created by LocalStackAlloc
This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that
could result in the LocalStackAlloc pass creating an MI instruction
out-of-range displacement:
        %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32)
        %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31
(In final assembler output the top bits are stripped off, resulting
in a negative offset loading from below the stack pointer.)

Common code expects the isFrameOffsetLegal routine to verify whether
adding a given offset to the offset already present in the instruction
results in a valid displacement.  However, on PowerPC the routine
did not take the already present instruction offset into account.

This commit fixes isFrameOffsetLegal to add the instruction offset,
and updates a local caller (needsFrameBaseReg) to no longer add the
instruction offset itself before calling isFrameOffsetLegal.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212832 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-11 17:19:31 +00:00
Ulrich Weigand
b7fdc7ff16 [PowerPC] Implement atomic NAND operations as actual NAND
This changes the implementation of atomic NAND operations
from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)"
(compatible with GCC >= 4.4).

This is in line with the common-code and ARM back-end change
implemented in r212433.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212547 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 16:16:02 +00:00
Ulrich Weigand
b053ddc909 [PowerPC] Fix no-assert build
r212476 caused a compile failure (unused variable) in a non-assertion
build ...



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212477 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:39:44 +00:00
Ulrich Weigand
bf7bfe3549 [PowerPC] Fix "byval align" arguments
Arguments passed as "byval align" should get the specified alignment
in the parameter save area.  There was some code in PPCISelLowering.cpp
that attempted to implement this, but this didn't work correctly:
while code did update the ArgOffset value, it neglected to update
the PtrOff value (which was already computed from the old ArgOffset),
and it also neglected to update GPR_idx -- fields skipped due to
alignment in the save area must likewise be skipped in GPRs.

This patch fixes and simplifies this logic by:
- handling argument offset alignment right at the beginning
  of argument processing, using a new helper routine
  CalculateStackSlotAlignment (this avoids having to update
  PtrOff and other derived values later on)
- not tracking GPR_idx separately, but always computing the
  correct GPR_idx for each argument *from* its ArgOffset
- removing some redundant computation in LowerFormalArguments:
  MinReservedArea must equal ArgOffset after argument processing,
  so there's no use in computing it twice.

[This doesn't change the behavior of the current clang front-end,
since that never creates "byval align" arguments at the moment.
This will change with a follow-on patch, however.]


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212476 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-07 19:26:41 +00:00
Juergen Ributzka
75909261f0 [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.
The argument list vector is never used after it has been passed to the
CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and
pretty much as cheap as keeping a pointer to it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212135 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-01 22:01:54 +00:00
Craig Topper
521a69f182 Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211993 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-29 00:40:57 +00:00
Ulrich Weigand
1edaab996f [PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex
I've run into a bug where current LLVM at -O0 (with fast-isel)
generated invalid code like:

        ld 0, 20936(1)                  # 8-byte Folded Reload
        stw 12, 10348(0)
        stw 12, 10344(0)

The underlying vreg had been introduced as base register by the
Local Stack Slot Allocation pass.  That register was constrained
to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match
the ADDI instruction used to set it, but it was *not* constrained
to G8RC_NOX0 to fit the *use* of the register in an address.

That should have happened in PPCRegisterInfo::resolveFrameIndex.
This patch adds an appropriate constrainRegClass call.

Reviewed by Hal Finkel.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211897 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-27 13:04:12 +00:00
Eric Christopher
db1c494276 Remove extraneous includes from the target machines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-26 19:30:05 +00:00
Will Schmidt
eb3092083f add ppc64/pwr8 as target
includes handling DIR_PWR8 where appropriate
The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211779 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-26 13:36:19 +00:00
Rafael Espindola
c7abd27294 Move expression visitation logic up to MCStreamer.
Remove the duplicate from MCRecordStreamer. No functionality change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211714 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 15:45:33 +00:00
Rafael Espindola
d4feaf82bc Simplify the visitation of target expressions. No functionality change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-25 15:29:54 +00:00
Bill Schmidt
808d878a96 [PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction)
PR20071 identifies a problem in PowerPC's fast-isel implementation for
floating-point conversion to integer.  The fctiduz instruction was added in
Power ISA 2.06 (i.e., Power7 and later).  However, this instruction is being
generated regardless of which 64-bit PowerPC target is selected.

The intent is for fast-isel to punt to DAG selection when this instruction is
not available.  This patch implements that change.  For testing purposes, the
existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests
for the expected code generation.  Additionally, the existing test
fast-isel-conversion-p5.ll was found to be incorrectly expecting the
unavailable instruction to be generated.  I've removed these test variants
since we have adequate coverage in fast-isel-conversion.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211627 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-24 20:05:18 +00:00
Ulrich Weigand
b548b6bfc3 [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211497 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 14:15:53 +00:00
Ulrich Weigand
9a154bfe94 [PowerPC] Allow stack frames without parameter save area
The PPCFrameLowering::determineFrameLayout routine currently ensures
that every function that allocates a stack frame provides space for the
parameter save area (via PPCFrameLowering::getMinCallFrameSize).

This is actually not necessary.  There may be functions that never call
another routine but still allocate a frame; those do not require the
parameter save area.  In the future, with the ELFv2 ABI, even some
routines that do call other functions do not need to allocate the
parameter save area.

While it is not a bug to allocate the parameter area when it is not
needed, it is better to avoid it to save stack space.

Note that when any particular function call requires the parameter save
area, this space will already have been included by ABI code in the size
the CALLSEQ_START insn is annotated with, and therefore included in the
size returned by MFI->getMaxCallFrameSize().

This means that determineFrameLayout simply does not need to care about
the parameter save area.  (It still needs to ensure that every frame
provides the linkage area.)  This is implemented by this patch.

Note that this exposed a bug in the new fast-isel code where the parameter
area was *not* included in the CALLSEQ_START size; this is also fixed.

A couple of test cases needed to be adapted for the new (smaller) stack
frame size those tests now see.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 13:47:52 +00:00
Ulrich Weigand
899842d2af [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".

(Some of those were made explicit by the above refactoring patch, others
have been there all along.)

This patch fixes those places to pass "false" for IsDarwin.

No change in generated code expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211494 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 13:21:43 +00:00
Ulrich Weigand
c125f3c15a [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.

However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported.  Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.

(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4.  This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ...  I'll fix this in a follow-up
commit shortly.)

To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.

A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.

No change in generated code is expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 13:08:27 +00:00
Ulrich Weigand
fdb6eb65c7 [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 12:36:34 +00:00
Ulrich Weigand
69e4786797 [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211368 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-20 16:34:05 +00:00
Ulrich Weigand
ffbd906558 [PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12.  And indeed the code says:

  // R12 must contain the address of an indirect callee. 

But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point.  It seems this code and comment
is a remnant of code originally shared with the Darwin ABI ...

This patch simply removes the unnecessary load.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 18:33:36 +00:00
Ulrich Weigand
0c57babfc6 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 17:52:49 +00:00
Ulrich Weigand
336da8cdc5 [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211174 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 16:14:04 +00:00