llvm-6502/test/CodeGen
Hal Finkel 94dc061e85 [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls
Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the
POWER7, A2 and earlier cores) are really pointers to a function descriptor, a
structure with three pointers: the actual pointer to the code to which to jump,
the pointer to the TOC needed by the callee, and an environment pointer. We
used to chain these loads, and make them opaque to the rest of the optimizer,
so that they'd always occur directly before the call. This is not necessary,
and in fact, highly suboptimal on embedded cores. Once the function pointer is
known, the loads can be performed ahead of time; in fact, they can be hoisted
out of loops.

Now these function descriptors are almost always generated by the linker, and
thus the contents of the descriptors are invariant. As a result, by default,
we'll mark the associated loads as invariant (allowing them to be hoisted out
of loops). I've added a target feature to turn this off, however, just in case
someone needs that option (constructing an on-stack descriptor, casting it to a
function pointer, and then calling it cannot be well-defined C/C++ code, but I
can imagine some JIT-compilation system doing so).

Consider this simple test:
  $ cat call.c

  typedef void (*fp)();
  void bar(fp x) {
    for (int i = 0; i < 1600000000; ++i)
      x();
  }

  $ cat main.c

  typedef void (*fp)();
  void bar(fp x);
  void foo() {}
  int main() {
    bar(foo);
  }

On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads
as invariant brings the execution time down to ~8 seconds from ~32 seconds with
the loads in the loop.

The difference on the POWER7 is smaller. Compiling with:

  gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2]

  clang -O3 -mcpu=native call.c main.c : ~5.3 seconds

  clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds
  (looks like we'd benefit from additional loop unrolling here, as a first
   guess, because this is faster with the extra loads)

The -mno-invariant-function-descriptors will be added to Clang shortly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226207 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-15 21:17:34 +00:00
..
AArch64 IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
ARM Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" 2015-01-15 20:32:09 +00:00
CPP
Generic getMangledTypeStr: clarify how it mangles types, and add tests 2015-01-14 23:05:17 +00:00
Hexagon [Hexagon] Updating indexed load-extend patterns and changing test to new expected output. 2015-01-15 21:07:52 +00:00
Inputs IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
Mips [mips] Fix a typo in the compare patterns for MIPS32r6/MIPS64r6. 2015-01-15 15:41:03 +00:00
MSP430
NVPTX Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. 2015-01-14 15:36:28 +00:00
PowerPC [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls 2015-01-15 21:17:34 +00:00
R600 R600/SI: Improve fpext / fptrunc test coverage 2015-01-15 19:39:42 +00:00
SPARC Use the integrated assembler by default on SPARC. 2015-01-14 07:53:39 +00:00
SystemZ Use the integrated assembler as default on SystemZ 2015-01-13 19:45:16 +00:00
Thumb IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00
Thumb2 [ARM] Fix a bug in constant island pass that was triggering an assertion. 2015-01-08 20:44:50 +00:00
X86 Revert "r226086 - Revert "r226071 - [RegisterCoalescer] Remove copies to reserved registers"" 2015-01-15 20:32:09 +00:00
XCore IR: Move MDLocation into place 2015-01-14 22:27:36 +00:00