975 Commits

Author SHA1 Message Date
Benjamin Kramer
30fa873958 Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230894 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-01 18:09:56 +00:00
Hal Finkel
e03aac601f [PowerPC] Use vector types for memcpy and friends (sometimes)
When using Altivec, we can use vector loads and stores for aligned memcpy and
friends. Starting with the P7 and VXS, we have reasonable unaligned vector
stores. Starting with the P8, we have fast unaligned loads too.

For QPX, we use vector loads are stores, but only for aligned memory accesses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230788 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-27 19:58:28 +00:00
Eric Christopher
acdd4442cb getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230699 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-26 22:38:43 +00:00
Eric Christopher
a01bc6a59f Remove an argument-less call to getSubtargetImpl from TargetLoweringBase.
This required plumbing a TargetRegisterInfo through computeRegisterProperties
and into findRepresentativeClass which uses it for register class
iteration. This required passing a subtarget into a few target specific
initializations of TargetLowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230583 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-26 00:00:24 +00:00
Hal Finkel
7840990de8 [PowerPC] Make LDtocL and friends invariant loads
LDtocL, and other loads that roughly correspond to the TOC_ENTRY SDAG node,
represent loads from the TOC, which is invariant. As a result, these loads can
be hoisted out of loops, etc. In order to do this, we need to generate
GOT-style MMOs for TOC_ENTRY, which requires treating it as a legitimate memory
intrinsic node type. Once this is done, the MMO transfer is automatically
handled for TableGen-driven instruction selection, and for nodes generated
directly in PPCISelDAGToDAG, we need to transfer the MMOs manually.

Also, we were not transferring MMOs associated with pre-increment loads, so do
that too.

Lastly, this fixes an exposed bug where R30 was not added as a defined operand of
UpdateGBR.

This problem was highlighted by an example (used to generate the test case)
posted to llvmdev by Francois Pichet.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230553 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-25 21:36:59 +00:00
Hal Finkel
a40d4ae478 [PowerPC] Cleanup unused target-specific SDAG nodes
We had somehow accumulated a few target-specific SDAG nodes dealing with PPC64
TOC access that were referenced only in TableGen patterns. The associated
(pseudo-)instructions are used, but are being generated directly. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230518 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-25 18:06:45 +00:00
Aaron Ballman
3cecbeccf2 Silencing a "result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)" warning in MSVC; NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230489 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-25 13:05:24 +00:00
Aaron Ballman
d7b05fe20f Silencing a -Wsign-compare warning triggered in MSVC; NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230488 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-25 13:02:23 +00:00
Hal Finkel
f8d179ba76 [PowerPC] Add support for the QPX vector instruction set
This adds support for the QPX vector instruction set, which is used by the
enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes
wide, holding 4 double-precision floating-point values. Boolean values, modeled
here as <4 x i1> are actually also represented as floating-point values
(essentially  { -1, 1 } for { false, true }). QPX shares many features with
Altivec and VSX, but is distinct from both of them. One major difference is
that, instead of adding completely-separate vector registers, QPX vector
registers are extensions of the scalar floating-point registers (lane 0 is the
corresponding scalar floating-point value). The operations supported on QPX
vectors mirrors that supported on the scalar floating-point values (with some
additional ones for permutations and logical/comparison operations).

I've been maintaining this support out-of-tree, as part of the bgclang project,
for several years. This is not the entire bgclang patch set, but is most of the
subset that can be cleanly integrated into LLVM proper at this time. Adding
this to the LLVM backend is part of my efforts to rebase bgclang to the current
LLVM trunk, but is independently useful (especially for codes that use LLVM as
a JIT in library form).

The assembler/disassembler test coverage is complete. The CodeGen test coverage
is not, but I've included some tests, and more will be added as follow-up work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230413 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-25 01:06:45 +00:00
Tim Northover
ca7e0787f0 CodeGen: convert CCState interface to using ArrayRefs
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.

There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230118 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 02:11:17 +00:00
Andrew Trick
4f7d60c1ea AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229413 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 18:10:47 +00:00
Duncan P. N. Exon Smith
4eba62d6c2 PowerPC: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229224 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 02:54:07 +00:00
Eric Christopher
7b93acde32 Stash the TargetMachine on the subtarget so we can access it later.
Clean up a subtarget function that has it passed in while we're at it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229164 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:23:04 +00:00
Eric Christopher
4015892a84 PPC LinkageSize can be computed at initialization time, do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229163 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:22:57 +00:00
Eric Christopher
2b4e1b9abf PPCFrameLowering's FramePointerOffset can be computed at initialization
time. Do so.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228998 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:38 +00:00
Eric Christopher
82eeeb5b94 The TOC save offset can be computed at compile time, do so and
propagate changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228997 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:36 +00:00
Eric Christopher
b947233818 The return save offset can be computed at initialization time - do
so and save the value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228996 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:27 +00:00
Hal Finkel
091acf253b [PowerPC] Mark jumps as expensive (using using CR bits)
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".

P7 test-suite speedups (no regressions):
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.626647% +/- 0.323583%
MultiSource/Benchmarks/Olden/power/power
	-18.2821% +/- 8.06481%

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228895 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 01:02:52 +00:00
Bill Schmidt
49b3971b70 [PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers.  The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction.  The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.

Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc.  It has also been verified with the
--no-tls-optimize flag workaround removed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228725 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 19:09:05 +00:00
Hal Finkel
9168f717c9 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:07:40 +00:00
Hal Finkel
885b67a5c3 [PowerPC] Generate pre-increment floating-point ld/st instructions
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228327 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:42:53 +00:00
Bill Schmidt
202b6045bf [PowerPC] Implement the vclz instructions for PWR8
Patch by Kit Barton.

Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes.  This is a fairly straightforward addition
after the changes made for vpopcnt:

 1. Add the correct definitions for the various instructions in
    PPCInstrAltivec.td
 2. Make the CTLZ operation legal on vector types when using P8Altivec
    in PPCISelLowering.cpp 

Test Plan

Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.

Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:24:47 +00:00
Bill Schmidt
8c775a4e7b [PowerPC] Implement the vpopcnt instructions for POWER8
Patch by Kit Barton.

Add the vector population count instructions for byte, halfword, word,
and doubleword sizes.  There are two major changes here:

    PPCISelLowering.cpp: Make CTPOP legal for vector types.
    PPCRegisterInfo.td: Added v2i64 to the VRRC register
      definition. This is needed for the doubleword variations of the
      integer ops that were added in P8. 

Test Plan

Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s

Test the generation of the vpopcnt instructions for various vector
data types.  When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types.  Testing for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions.  There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228046 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:58:23 +00:00
Bill Schmidt
1123a81009 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227976 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:16:01 +00:00
Hal Finkel
8f5c829c1e [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions
The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).

The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 15:03:28 +00:00
Eric Christopher
5882e66f5b Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl
without a Function argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227622 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 22:02:31 +00:00
Eric Christopher
04bcc11905 Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227113 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:03:15 +00:00
Hal Finkel
d1f1656447 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:20:27 +00:00
Hal Finkel
180f89537a [PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR moves
So we don't forget, once we support FPR <-> GPR moves on the P8, we'll likely
want to re-visit this part of the calling convention.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226401 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 14:31:10 +00:00
Hal Finkel
a01b583dbc [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226399 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:08:47 +00:00
Hal Finkel
962cff0c08 [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't, we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226369 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 03:57:34 +00:00
Hal Finkel
92cd0ca3b2 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226247 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 04:40:58 +00:00
Hal Finkel
94dc061e85 [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226207 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:17:34 +00:00
Hal Finkel
4572a0a0a2 [PowerPC] Add assembler support for mcrfs and friends
Fill out our support for the floating-point status and control register
instructions (mcrfs and friends). As it turns out, these are necessary for
compiling src/test/harness_fp.h in TBB for PowerPC.

Thanks to Raf Schietekat for reporting the issue!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226070 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 01:00:53 +00:00
Hal Finkel
ade705c6e5 Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support""
This re-applies r225808, fixed to avoid problems with SDAG dependencies along
with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs.
These problems caused the original regression tests to assert/segfault on many
(but not all) systems.

Original commit message:

This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225909 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-14 01:07:51 +00:00
Hal Finkel
ea55eceaed Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"
Reverting this while I investiage buildbot failures (segfaulting in
GetCostForDef at ScheduleDAGRRList.cpp:314).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225811 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 18:25:05 +00:00
Hal Finkel
232f393466 [PowerPC] Add StackMap/PatchPoint support
This commit does two things:

 1. Refactors PPCFastISel to use more of the common infrastructure for call
    lowering (this lets us take advantage of this common code for lowering some
    common intrinsics, stackmap/patchpoint among them).

 2. Adds support for stackmap/patchpoint lowering. For the most part, this is
    very similar to the support in the AArch64 target, with the obvious differences
    (different registers, NOP instructions, etc.). The test cases are adapted
    from the AArch64 test cases.

One difference of note is that the patchpoint call sequence takes 24 bytes, so
you can't use less than that (on AArch64 you can go down to 16). Also, as noted
in the docs, we take the patchpoint address to be the actual code address
(assuming the call is local in the TOC-sharing sense), which should yield
higher performance than generating the full cross-DSO indirect-call sequence
and is likely just as useful for JITed code (if not, we'll change it).

StackMaps and Patchpoints are still marked as experimental, and so this support
is doubly experimental. So go ahead and experiment!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 17:48:12 +00:00
Olivier Sallenave
9dd21f4380 Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225795 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-13 15:06:36 +00:00
Hal Finkel
b6bb7db62b [PowerPC] Fix calls to non-function objects
Looking at r225438 inspired me to see how the PowerPC backend handled the
situation (calling a bitcasted TLS global), and it turns out we also produced
an error (cannot select ...). What it means to "call" something that is not a
function is implementation and platform specific, but in the name of doing
something (besides crashing), this makes sure we do what GCC does (treat all
such calls as calls through a function pointer -- meaning that the pointer is
assumed, as is the convention on PPC, to point to a function descriptor
structure holding the actual code address along with the function's TOC pointer
and environment pointer). As GCC does, we now do the same for calling regular
(non-TLS) non-function globals too.

I'm not sure whether this is the most useful way to define the behavior, but at
least we won't be alone.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225617 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-12 04:34:47 +00:00
Hal Finkel
9ae5b7a40a [PowerPC] Mark zext of a small scalar load as free
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.

Fixes the initial complaint associated with PR22120.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225584 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 08:21:59 +00:00
Justin Hibbits
8ab13c61ed Remove some whitespace.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225583 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-10 07:50:31 +00:00
Hal Finkel
4e98296890 [PowerPC] Fold [sz]ext with fp_to_int lowering where possible
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225493 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-09 01:34:30 +00:00
Ahmed Bougacha
7fac1d945f [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225421 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-08 00:51:32 +00:00
Ahmed Bougacha
8065738154 [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.
A few loops do trickier things than just iterating on an MVT subset,
so I'll leave them be for now.
Follow-up of r225387.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225392 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-07 21:27:10 +00:00
Hal Finkel
8e9ba0e588 [PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).

This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.

Also, remove a related entry from the README.txt for which we now generate
reasonable code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 22:31:02 +00:00
Hal Finkel
7a7ed3cec2 [PowerPC] Add some missing names in getTargetNodeName
These are used for debugging output; NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225249 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 07:02:15 +00:00
Hal Finkel
10ae865847 [PowerPC] Improve int_to_fp(fp_to_int(x)) combining
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225248 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-06 06:01:57 +00:00
Hal Finkel
e05b232c20 [PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference
The existing code provided for specifying a global loop alignment preference.
However, the preferred loop alignment might depend on the loop itself. For
recent POWER cores, loops between 5 and 8 instructions should have 32-byte
alignment (while the others are better with 16-byte alignment) so that the
entire loop will fit in one i-cache line.

To support this, getPrefLoopAlignment has been made virtual, and can be
provided with an optional MachineLoop* so the target can inspect the loop
before answering the query. The default behavior, as before, is to return the
value set with setPrefLoopAlignment. MachineBlockPlacement now queries the
target for each loop instead of only once per function. There should be no
functional change for other targets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 17:58:24 +00:00
Hal Finkel
a1d22cc789 [PowerPC] Use 16-byte alignment for modern cores for functions/loops
Most modern PowerPC cores prefer that functions and loops start on
16-byte-aligned boundaries (*), so instruct block placement, etc. to make this
happen. The branch selector has also been adjusted so account for the extra
nops that might now be inserted before loop headers.

(*) Some cores actually prefer other alignments for small loops, but that will
    be addressed in a follow-up commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225115 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 14:58:25 +00:00
Hal Finkel
958b670c34 [PowerPC] Add support for the CMPB instruction
Newer POWER cores, and the A2, support the cmpb instruction. This instruction
compares its operands, treating each of the 8 bytes in the GPRs separately,
returning a 'mask' result of 0 (for false) or -1 (for true) in each byte.

Code generation support is added, in the form of a PPCISelDAGToDAG
DAG-preprocessing routine, that recognizes patterns close to what the
instruction computes (either exactly, or related by a constant masking
operation), and generates the cmpb instruction (along with any necessary
constant masking operation). This can be expanded if use cases arise.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@225106 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-03 01:16:37 +00:00