Commit Graph

858 Commits

Author SHA1 Message Date
Ulrich Weigand
b548b6bfc3 [PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize
As of r211495, the only remaining users of getMinCallFrameSize are in
core ABI code (LowerFormalParameter / LowerCall).  This is actually a
good thing, since the details of the parameter save area are ABI specific.

With the new ELFv2 ABI in particular, the rules defining the size of the
save area will become significantly more complex, so it wouldn't make
sense to implement those outside ABI code that has all required
information.

In preparation, this patch eliminates the getMinCallFrameSize (and
associated getMinCallArgumentsSize) routines, and inlines them into all
callers.  Note that since nearly all call arguments are constant, this
allows simplifying the inlined copies to a single line everywhere.

No change in generate code expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211497 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 14:15:53 +00:00
Ulrich Weigand
899842d2af [PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls
As remarked in the commit message to r211493, in several places
throughout the 64-bit SVR4 ABI code there are calls to
PPCFrameLowering::getLinkageSize and getMinCallFrameSize
using an incorrect IsDarwin argument of "true".

(Some of those were made explicit by the above refactoring patch, others
have been there all along.)

This patch fixes those places to pass "false" for IsDarwin.

No change in generated code expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211494 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 13:21:43 +00:00
Ulrich Weigand
c125f3c15a [PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize
The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and
CalculateParameterAndLinkageAreaSize are currently used as subroutines
from both 64-bit SVR4 and Darwin ABI code.

However, the two ABIs are already quite different w.r.t. AltiVec
conventions, and they will become more different when the ELFv2 ABI is
supported.  Also, in general it seems better to disentangle ABI support
routines for different ABIs to avoid accidentally affecting one ABI when
intending to change only the other.

(Actually, the current code strictly speaking already contains a bug:
these routines call PPCFrameLowering::getMinCallFrameSize and
PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to
"true" even on 64-bit SVR4.  This bug currently has no adverse effect
since those routines always return the same for 64-bit SVR4 and 64-bit
Darwin, but it still seems wrong ...  I'll fix this in a follow-up
commit shortly.)

To remove this code sharing, I'm simply inlining both routines into all
call sites (there are just two each, one for 64-bit SVR4 and one for
Darwin), and simplifying due to constant parameters where possible.

A small piece of code that *does* make sense to share is refactored into
the new routine EnsureStackAlignment, now also called from 32-bit SVR4
ABI code.

No change in generated code is expected.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 13:08:27 +00:00
Ulrich Weigand
fdb6eb65c7 [PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4
Current 64-bit SVR4 code seems to have some remnants of Darwin code
in AltiVec argument handing.  This had the effect that AltiVec arguments
(or subsequent arguments) were not correctly placed in the parameter area
in some cases.

The correct behaviour with the 64-bit SVR4 ABI is:
- All AltiVec arguments take up space in the parameter area, just like
  any other arguments, whether vararg or not.
- They are always 16-byte aligned, skipping a parameter area doubleword
  (and the associated GPR, if any), if necessary.

This patch implements the correct behaviour and adds a test case.
(Verified against GCC behaviour via the ABI compat test suite.)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-23 12:36:34 +00:00
Ulrich Weigand
69e4786797 [PowerPC] Fix small argument stack slot offset for LE
When small arguments (structures < 8 bytes or "float") are passed in a
stack slot in the ppc64 SVR4 ABI, they must reside in the least
significant part of that slot.  On BE, this means that an offset needs
to be added to the stack address of the parameter, but on LE, the least
significant part of the slot has the same address as the slot itself.

This changes the PowerPC back-end ABI code to only add the small
argument stack slot offset for BE.  It also adds test cases to verify
the correct behavior on both BE and LE.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211368 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-20 16:34:05 +00:00
Ulrich Weigand
ffbd906558 [PowerPC] Remove unnecessary load of r12 in indirect call
When looking at the 64-bit SVR4 indirect call sequence, I noticed
an unnecessary load of r12.  And indeed the code says:

  // R12 must contain the address of an indirect callee. 

But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is
no need to load r12 at this point.  It seems this code and comment
is a remnant of code originally shared with the Darwin ABI ...

This patch simply removes the unnecessary load.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 18:33:36 +00:00
Ulrich Weigand
0c57babfc6 [PowerPC] Simplify and improve loading into TOC register
During an indirect function call sequence on the 64-bit SVR4 ABI,
generate code must load and then restore the TOC register.

This does not use a regular LOAD instruction since the TOC
register r2 is marked as reserved.  Instead, the are two
special instruction patterns:

 let RST = 2, DS = 2 in
 def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg),
                     "ld 2, 8($reg)", IIC_LdStLD,
                     [(PPCload_toc i64:$reg)]>, isPPC64;
 
 let RST = 2, DS = 10, RA = 1 in
 def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins),
                     "ld 2, 40(1)", IIC_LdStLD,
                     [(PPCtoc_restore)]>, isPPC64;

Note that these not only restrict the destination of the
load to r2, but they also restrict the *source* of the
load to particular address combinations.  The latter is
a problem when we want to support the ELFv2 ABI, since
there the TOC save slot is no longer at 40(1).

This patch replaces those two instructions with a single
instruction pattern that only hard-codes r2 as destination,
but supports generic addresses as source.  This will allow
supporting the ELFv2 ABI, and also helps generate more
efficient code for calls to absolute addresses (allowing
simplification of the ppc64-calls.ll test case).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 17:52:49 +00:00
Ulrich Weigand
336da8cdc5 [PowerPC] Do not use BLA with the 64-bit SVR4 ABI
The PowerPC back-end uses BLA to implement calls to functions at
known-constant addresses, which is apparently used for certain
system routines on Darwin.

However, with the 64-bit SVR4 ABI, this is actually incorrect.
An immediate function pointer value on this platform is not
directly usable as a target address for BLA:
- in the ELFv1 ABI, the function pointer value refers to the
  *function descriptor*, not the code address
- in the ELFv2 ABI, the function pointer value refers to the
  global entry point, but BL(A) would only be correct when
  calling the *local* entry point

This bug didn't show up since using immediate function pointer
values is not usually done in the 64-bit SVR4 ABI in the first
place.  However, I ran into this issue with a certain use case
of LLVM as JIT, where immediate function pointer values were
uses to implement callbacks from JITted code to helpers in
statically compiled code.

Fixed by simply not using BLA with the 64-bit SVR4 ABI.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211174 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-18 16:14:04 +00:00
Eric Christopher
cb7dc25df1 Remove an extraneous this-> to access the subtarget.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210849 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 22:38:20 +00:00
Eric Christopher
f6b9efa7db Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency.
Also remove an extra local subtarget in the initialization functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210848 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-12 22:38:18 +00:00
Bill Schmidt
b02d95cb66 [PPC64LE] Recognize shufflevector patterns for little endian
Various masks on shufflevector instructions are recognizable as
specific PowerPC instructions (vector pack, vector merge, etc.).
There is existing code in PPCISelLowering.cpp to recognize the correct
patterns for big endian code.  The masks for these instructions are
different for little endian code due to the big-endian numbering
employed by these instructions.  This patch adds the recognition code
for little endian.

I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for
this.  The existing recognizer test (vec_shuffle.ll) is unnecessarily
verbose and difficult to read, so I felt it was better to add a new
test rather than modify the old one.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210536 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-10 14:35:01 +00:00
Bill Schmidt
8e38e86266 [PPC64LE] Generate correct code for unaligned little-endian vector loads
The code in PPCTargetLowering::PerformDAGCombine() that handles
unaligned Altivec vector loads generates a lvsl followed by a vperm.
As we've seen in numerous other places, the vperm instruction has a
big-endian bias, and this is fixed for little endian by complementing
the permute control vector and swapping the input operands.  In this
case the lvsl is providing the permute control vector.  Rather than
generating an lvsl and a complement operation, it is sufficient to
generate an lvsr instruction instead.  Thus for LE code generation we
will generate an lvsr rather than an lvsl, and swap the other input
arguments on the vperm.

The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test
the code generation for PPC64 and PPC64LE, in addition to the existing
PPC32/G5 testing.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210493 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 22:00:52 +00:00
Bill Schmidt
4cef3fb022 [PPC64LE] Generate correct little-endian code for v16i8 multiply
The existing code in PPCTargetLowering::LowerMUL() for multiplying two
v16i8 values assumes that vector elements are numbered in big-endian
order.  For little-endian targets, the vector element numbering is
reversed, but the vmuleub, vmuloub, and vperm instructions still
assume big-endian numbering.  To account for this, we must adjust the
permute control vector and reverse the order of the input registers on
the vperm instruction.

The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed
on powerpc64 and powerpc64le targets as well as the original powerpc
(32-bit) target.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210474 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 16:06:29 +00:00
Bill Schmidt
6c9eb10784 [PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian
This patch fixes a couple of lowering issues for little endian
PowerPC.  The code for lowering BUILD_VECTOR contains a number of
optimizations that are only valid for big endian.  For now, we disable
those optimizations for correctness.  In the future, we will add
analogous optimizations that are correct for little endian.

When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to
make the now-familiar transformation of swapping the input operands
and complementing the permute control vector.  Correctness of this
transformation is tested by the accompanying test case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210336 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-06 14:06:26 +00:00
Eric Christopher
11b190e979 Omit else branch after return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210034 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-02 17:29:07 +00:00
Eric Christopher
c55e193cdd Have the TLOF creation take a Triple rather than needing a subtarget.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209937 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-31 00:07:32 +00:00
Eric Christopher
96241f26fc isSVR4ABI() returned !isDarwin() so just move that to the else
block and remove the unreachable code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209927 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 22:47:53 +00:00
Eric Christopher
46949d58b9 Rename CreateTLOF->createTLOF to match the rest of the file and the
rest of the targets with a similar function name.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209926 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-30 22:47:48 +00:00
Bill Schmidt
3f01f5296e [PATCH] Correct type used for VADD_SPLAT optimization on PowerPC
In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there
is an optimization for certain patterns to generate one or two vector
splats followed by a vector add or subtract.  This operation is
represented by a VADD_SPLAT in the selection DAG.  Prior to this
patch, it was possible for the VADD_SPLAT to be assigned the wrong
data type, causing incorrect code generation.  This patch corrects the
problem.

Specifically, the code previously assigned the value type of the
BUILD_VECTOR node to the newly generated VADD_SPLAT node.  This is
correct much of the time, but not always.  The problem is that the
call to isConstantSplat() may return a SplatBitSize that is not the
same as the number of bits in the original element vector type.  The
correct type to assign is a vector type with the same element bit size
as SplatBitSize.

The included test case shows an example of this, where the
BUILD_VECTOR node has a type of v16i8.  The vector to be built is {0,
16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}.  isConstantSplat
detects that we can generate a splat of 16 for type v8i16, which is
the type we must assign to the VADD_SPLAT node.  If we do not, we
generate a vspltisb of 8 and a vaddubm, which generates the incorrect
result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16}.  The correct code generation is a vspltish of 8 and a vadduhm.

This patch also corrected code generation for
CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked
as an XFAIL, so we can remove the XFAIL from the test case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209662 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-27 15:57:51 +00:00
Adam Nemet
c9b12d06ef [PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate
The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend.  We matched an immediate offset with STWX8 even though it only
supports register offset.

The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.

Many thanks to Bill Schmidt for testing this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-20 17:20:34 +00:00
Benjamin Kramer
bb81d9d5fa SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209123 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-19 13:12:38 +00:00
Saleem Abdulrasool
82b1114fef Target: remove old constructors for CallLoweringInfo
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern.  This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions.  No functional change beyond the removal of the old
constructors are intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209082 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-17 21:50:17 +00:00
Jay Foad
6b543713a2 Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208811 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-14 21:14:37 +00:00
Hal Finkel
70a83b490e [PowerPC] Add global named register support
Support for the intrinsics that read from and write to global named registers
is added for r1, r2 and r13 (depending on the subtarget).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208509 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-11 19:29:11 +00:00
Craig Topper
7ae9b5fc71 Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207616 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-30 07:17:30 +00:00
Craig Topper
7d811a53de Convert more SelectionDAG functions to use ArrayRef.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207397 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-28 05:57:50 +00:00
Craig Topper
c34a25d59d [C++] Use 'nullptr'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207394 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-28 04:05:08 +00:00
Craig Topper
a7f892b33b Convert SelectionDAG::getMergeValues to use ArrayRef.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207374 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 19:20:57 +00:00
Craig Topper
72c93595de Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207329 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-26 19:29:41 +00:00
Craig Topper
80d8db7a1f Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207327 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-26 18:35:24 +00:00
Craig Topper
c848b1bbcf [C++] Use 'nullptr'. Target edition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207197 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 05:30:21 +00:00
Reid Kleckner
710c1a449d Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207143 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-24 20:14:34 +00:00
Nick Lewycky
d63390cba1 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206255 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-15 07:22:52 +00:00
Hal Finkel
f4c3a5601a [PowerPC] Implement some additional TLI callbacks
Add implementations of:
  bool isLegalICmpImmediate(int64_t Imm) const
  bool isLegalAddImmediate(int64_t Imm) const
  bool isTruncateFree(Type *Ty1, Type *Ty2) const
  bool isTruncateFree(EVT VT1, EVT VT2) const
  bool shouldConvertConstantLoadToIntImm(const APInt &Imm, Type *Ty) const

Unfortunately, this regresses counter-register-based loop formation because
some of the loops now end up in forms were SE cannot compute loop counts.
However, nevertheless, the test-suite results favor committing:

SingleSource/Benchmarks/BenchmarkGame/puzzle: 26% speedup
MultiSource/Benchmarks/FreeBench/analyzer/analyzer: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan: 20% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv: 19% speedup
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv: 15% speedup
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2: 2% speedup

MultiSource/Benchmarks/VersaBench/bmm/bmm: 26% slowdown

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206120 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-12 21:52:38 +00:00
Hal Finkel
c0c10f20a2 [PowerPC] Don't return false from PPC::isVSLDOIShuffleMask
PPC::isVSLDOIShuffleMask should return -1, not false, when the shuffle
predicate should be false.

Noticed by inspection; no test case (yet).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205787 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-08 19:00:27 +00:00
Craig Topper
84f7f350c3 Make consistent use of MCPhysReg instead of uint16_t throughout the tree.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205610 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 05:16:06 +00:00
Hal Finkel
e2b3751924 [PowerPC] Don't ever expand BUILD_VECTOR of v2i64 with shuffles
If we have two unique values for a v2i64 build vector, this will always result
in two vector loads if we expand using shuffles. Only one is necessary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205231 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-31 17:48:16 +00:00
Hal Finkel
ee8e48d4c9 [PowerPC] Handle VSX v2i64 SIGN_EXTEND_INREG
sitofp from v2i32 to v2f64 ends up generating a SIGN_EXTEND_INREG v2i64 node
(and similarly for v2i16 and v2i8). Even though there are no sign-extension (or
algebraic shifts) for v2i64 types, we can handle v2i32 sign extensions by
converting two and from v2i64. The small trick necessary here is to shift the
i32 elements into the right lanes before the i32 -> f64 step. This is because
of the big Endian nature of the system, we need the i32 portion in the high
word of the i64 elements.

For v2i16 and v2i8 we can do the same, but we first use the default Altivec
shift-based expansion from v2i16 or v2i8 to v2i32 (by casting to v4i32) and
then apply the above procedure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205146 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-30 13:22:59 +00:00
Hal Finkel
7563821402 [PowerPC] Handle v2i64 comparisons
v2i64 is a legal type under VSX, however we don't have native vector
comparisons. We can handle eq/ne by casting it to an Altivec type, but
everything else must be expanded.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205106 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 16:04:40 +00:00
Hal Finkel
44b2b9dc1a [PowerPC] Add subregister classes for f64 VSX values
We had stored both f64 values and v2f64, etc. values in the VSX registers. This
worked, but was suboptimal because we would always spill 16-byte values even
through we almost always had scalar 8-byte values. This resulted in an
increase in stack-size use, extra memory bandwidth, etc. To fix this, I've
added 64-bit subregisters of the Altivec registers, and combined those with the
existing scalar floating-point registers to form a class of VSX scalar
floating-point registers. The ABI code has also been enhanced to use this
register class and some other necessary improvements have been made.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205075 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 05:29:01 +00:00
Hal Finkel
c9de9e60b9 [PowerPC] v2[fi]64 need to be explicitly passed in VSX registers
v2[fi]64 values need to be explicitly passed in VSX registers. This is because
the code in TRI that finds the minimal register class given a register and a
value type will assert if given an Altivec register and a non-Altivec type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205041 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-28 19:58:11 +00:00
Hal Finkel
6bdc4ebedd [PowerPC] Fix v2f64 vector extract and related patterns
First, v2f64 vector extract had not been declared legal (and so the existing
patterns were not being used). Second, the patterns for that, and for
scalar_to_vector, should really be a regclass copy, not a subregister
operation, because the VSX registers directly hold both the vector and scalar data.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204971 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 22:22:48 +00:00
Hal Finkel
276d854549 [PowerPC] Expand v2i64 shifts
These operations need to be expanded during legalization so that isel does not
crash. In theory, we might be able to custom lower some of these. That,
however, would need to be follow-up work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204963 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-27 21:26:33 +00:00
Hal Finkel
ee5f4bb6b3 [PowerPC] Generate VSX permutations for v2[fi]64 vectors
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204873 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 22:58:37 +00:00
Hal Finkel
6da0178737 [PowerPC] VSX loads and stores support unaligned access
I've not yet updated PPCTTI because I'm not sure what the actual relative cost
is compared to the aligned uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204848 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 19:39:09 +00:00
Hal Finkel
b397453155 [PowerPC] Use v2f64 <-> v2i64 VSX conversion instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204843 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 19:13:54 +00:00
Hal Finkel
c6940d4cb7 [PowerPC] Use VSX vector load/stores for v2[fi]64
These instructions have access to the complete VSX register file. In addition,
they "swap" the order of the elements so that element 0 (the scalar part) comes
first in memory and element 1 follows at a higher address.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204838 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 18:26:30 +00:00
Hal Finkel
7363d2223e [PowerPC] Add v2i64 as a legal VSX type
v2i64 needs to be a legal VSX type because it is the SetCC result type from
v2f64 comparisons. We need to expand all non-arithmetic v2i64 operations.

This fixes the lowering for v2f64 VSELECT.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204828 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 16:12:58 +00:00
Hal Finkel
159e7f4095 [PowerPC] Lower VSELECT using xxsel when VSX is available
With VSX there is a real vector select instruction, and so we should use it.
Note that VSELECT will still scalarize for v2f64 because the corresponding
SetCC result type (v2i64) is not currently a legal type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204801 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 12:49:28 +00:00
Hal Finkel
ab849adec4 [PowerPC] Initial support for the VSX instruction set
VSX is an ISA extension supported on the POWER7 and later cores that enhances
floating-point vector and scalar capabilities. Among other things, this adds
<2 x double> support and generally helps to reduce register pressure.

The interesting part of this ISA feature is the register configuration: there
are 64 new 128-bit vector registers, the 32 of which are super-registers of the
existing 32 scalar floating-point registers, and the second 32 of which overlap
with the 32 Altivec vector registers. This makes things like vector insertion
and extraction tricky: this can be free but only if we force a restriction to
the right register subclass when needed. A new "minipass" PPCVSXCopy takes care
of this (although it could do a more-optimal job of it; see the comment about
unnecessary copies below).

Please note that, currently, VSX is not enabled by default when targeting
anything because it is not yet ready for that.  The assembler and disassembler
are fully implemented and tested. However:

 - CodeGen support causes miscompiles; test-suite runtime failures:
      MultiSource/Benchmarks/FreeBench/distray/distray
      MultiSource/Benchmarks/McCat/08-main/main
      MultiSource/Benchmarks/Olden/voronoi/voronoi
      MultiSource/Benchmarks/mafft/pairlocalalign
      MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
      SingleSource/Benchmarks/CoyoteBench/almabench
      SingleSource/Benchmarks/Misc/matmul_f64_4x4

 - The lowering currently falls back to using Altivec instructions far more
   than it should. Worse, there are some things that are scalarized through the
   stack that shouldn't be.

 - A lot of unnecessary copies make it past the optimizers, and this needs to
   be fixed.

 - Many more regression tests are needed.

Normally, I'd fix these things prior to committing, but there are some
students and other contributors who would like to work this, and so it makes
sense to move this development process upstream where it can be subject to the
regular code-review procedures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203768 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-13 07:58:58 +00:00