Commit Graph

4277 Commits

Author SHA1 Message Date
Tim Northover
ca7e0787f0 CodeGen: convert CCState interface to using ArrayRefs
Everyone except R600 was manually passing the length of a static array
at each callsite, calculated in a variety of interesting ways. Far
easier to let ArrayRef handle that.

There should be no functional change, but out of tree targets may have
to tweak their calls as with these examples.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230118 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-21 02:11:17 +00:00
Eric Christopher
d53224014d Fix an asan use-after-free bug introduced by the asm printer
changes to remove non-Function based subtargets out of the asm
printer. For module level emission we'll need to construct up
an MCSubtargetInfo so that we can encode instructions for
emission.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230050 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 19:54:07 +00:00
Eric Christopher
dd38f4e94d Remove a use of the Subtarget in the darwin ppc asm printer.
EmitFunctionStubs is called from doFinalization and so can't
depend on the Subtarget existing. It's also irrelevant as
we know we're darwin since we're in the darwin asm printer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230039 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 18:53:42 +00:00
Eric Christopher
6de800e056 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230037 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 18:44:15 +00:00
Kit Barton
3e00ca983c I incorrectly marked the VORC instruction as isCommutable when I added it.
This fix removes the VORC instruction definition from the isCommutable block.

Phabricator review: http://reviews.llvm.org/D7772


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230020 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 15:54:58 +00:00
Eric Christopher
3ce9f152e4 Get the cached subtarget off the MachineFunction rather than
inquiring for a new one from the TargetMachine.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229998 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 08:24:34 +00:00
Eric Christopher
c9d0715997 Make the TargetMachine::getSubtarget that takes a Function argument
take a reference to match the getSubtargetImpl that takes a Function
argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229994 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 07:32:59 +00:00
Hal Finkel
2c5f9584ba [PowerPC] Loop Data Prefetching for the BG/Q
The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the
L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it
prefetches into its own L1P buffer, and the latency to access that buffer is
significantly higher than that to the L1 cache (although smaller than the
latency to the L2 cache). As a result, especially when multiple hardware
threads are not actively busy, explicitly prefetching data into the L1 cache is
advantageous.

I've been using this pass out-of-tree for data prefetching on the BG/Q for well
over a year, and it has worked quite well. It is enabled by default only for
the BG/Q, but can be enabled for other cores as well via a command-line option.

Eventually, we might want to add some TTI interfaces and move this into
Transforms/Scalar (there is nothing particularly target dependent about it,
although only machines like the BG/Q will benefit from its simplistic
strategy).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229966 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-20 05:08:21 +00:00
Kit Barton
31840a62af This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions.
Phabricator review: http://reviews.llvm.org/D7616

Commiting on Nemanja's behalf.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229694 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-18 16:21:46 +00:00
Eric Christopher
2d460786b1 Make the PowerPC AsmPrinter independent of global subtarget
initialization. Initialize the subtarget once per function and
migrate EmitStartOfAsmFile to either use attributes on the
TargetMachine or get information from all of the various
subtargets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229475 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 07:21:21 +00:00
Eric Christopher
c93485635f Add a FIXME to move IsLittleEndian to the target machine.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229472 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 06:45:17 +00:00
Eric Christopher
fb031eee53 Move ABI handling and 64-bitness to the PowerPC target machine.
This required changing how the computation of the ABI is handled
and how some of the checks for ABI/target are done.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229471 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-17 06:45:15 +00:00
Hal Finkel
ba51ae6864 [PowerPC] Support non-direct-sub/superclass VSX copies
Our register allocation has become better recently, it seems, and is now
starting to generate cross-block copies into inflated register classes. These
copies are not transformed into subregister insertions/extractions by the
PPCVSXCopy class, and so need to be handled directly by
PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not
quite (it was unnecessarily restricting itself to only the direct
sub/super-register-class case (not copying between, for example, something in
VRRC and the lower-half of VSRC which are super-registers of F8RC).

Triggering this behavior manually is difficult; I'm including two
bugpoint-reduced test cases from the test suite.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229457 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 23:46:30 +00:00
Andrew Trick
4f7d60c1ea AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229413 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-16 18:10:47 +00:00
Aaron Ballman
66981fe208 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229340 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-15 22:54:22 +00:00
Chandler Carruth
96db150cec Remove a variable only used in an assert and sink its initializer into
the assert. Fixes -Wunused-variable on non-asserts builds.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229250 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 09:14:44 +00:00
Duncan P. N. Exon Smith
4eba62d6c2 PowerPC: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229224 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-14 02:54:07 +00:00
Eric Christopher
40ccb77781 The base pointer save offset can be computed at initialization time,
do so and fix up the calls.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229169 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:48:53 +00:00
Eric Christopher
f7f32530e0 Move the target machine variable so that it's initialized early
enough we can use it to initialize frame lowering.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229168 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:48:51 +00:00
Eric Christopher
7b93acde32 Stash the TargetMachine on the subtarget so we can access it later.
Clean up a subtarget function that has it passed in while we're at it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229164 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:23:04 +00:00
Eric Christopher
4015892a84 PPC LinkageSize can be computed at initialization time, do so.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229163 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 22:22:57 +00:00
Chandler Carruth
417c5c172c [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229094 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 10:01:29 +00:00
Chandler Carruth
02d6288667 Re-sort #include lines using my handy dandy ./utils/sort_includes.py
script. This is in preparation for changes to lots of include lines.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@229088 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 09:09:03 +00:00
Eric Christopher
2b4e1b9abf PPCFrameLowering's FramePointerOffset can be computed at initialization
time. Do so.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228998 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:38 +00:00
Eric Christopher
82eeeb5b94 The TOC save offset can be computed at compile time, do so and
propagate changes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228997 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:36 +00:00
Eric Christopher
b947233818 The return save offset can be computed at initialization time - do
so and save the value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228996 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-13 00:39:27 +00:00
Olivier Sallenave
90e069dc29 Change max interleave factor to 12 for POWER7 and POWER8.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228973 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 22:57:58 +00:00
Benjamin Kramer
d913d9d2c3 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228930 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 15:35:40 +00:00
Hal Finkel
091acf253b [PowerPC] Mark jumps as expensive (using using CR bits)
On PowerPC, which has a full set of logical operations on (its multiple sets
of) condition-register bits, it is not profitable to break of complex
conditions feeding a jump into multiple jumps. We can turn off this feature of
CGP/SDAGBuilder by marking jumps as "expensive".

P7 test-suite speedups (no regressions):
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2
	-0.626647% +/- 0.323583%
MultiSource/Benchmarks/Olden/power/power
	-18.2821% +/- 8.06481%

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228895 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-12 01:02:52 +00:00
Bill Schmidt
03e1afd8fe Fix up r228725, missed change in PPCSubtarget definition
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228728 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 19:31:55 +00:00
Bill Schmidt
49b3971b70 [PowerPC] Fix reverted patch r227976 to avoid register assignment issues
See full discussion in http://reviews.llvm.org/D7491.

We now hide the add-immediate and call instructions together in a
separate pseudo-op, which is tagged to define GPR3 and clobber the
call-killed registers.  The PPCTLSDynamicCall pass prior to RA now
expands this op into the two separate addi and call ops, with explicit
definitions of GPR3 on both instructions, and explicit clobbers on the
call instruction.  The pass is now marked as requiring and preserving
the LiveIntervals and SlotIndexes analyses, and fixes these up after
the replacement sequences are introduced.

Self-hosting has been verified on LE P8 and BE P7 with various
optimization levels, etc.  It has also been verified with the
--no-tls-optimize flag workaround removed.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228725 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 19:09:05 +00:00
Hal Finkel
241ede07b0 [PowerPC] Support the (old) cntlz instruction alias
Some old assembly code uses the cntlz alias for cntlzw, binutils supports this,
and we should too. Fixes PR22519.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228719 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 18:45:02 +00:00
Eric Christopher
cd641756c3 Migrate PPCAsmPrinter's subtarget from reference to pointer in
preparation for making it MachineFunction dependent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228638 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 00:44:17 +00:00
Kit Barton
f60b0de42a This change implements the following three logical vector operations:
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.


Phabricator review: http://reviews.llvm.org/D7469


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228580 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 17:03:18 +00:00
Hal Finkel
05bd43dc6e [PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228488 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 07:32:58 +00:00
Hal Finkel
9168f717c9 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:07:40 +00:00
Benjamin Kramer
e003f1ac8c Make helper functions/classes/globals static. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228410 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 17:51:54 +00:00
Sylvestre Ledru
ec5eba025e Fix an incorrect identifier
Summary:
EIEIO is not a correct declaration and breaks the build under Debian HURD.
Instead, E_IEIO is used.

//
http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
Some additional classes of identifier names are reserved for future
extensions to the C language or the POSIX.1 environment. While using
these names for your own purposes right now might not cause a problem,
they do raise the possibility of conflict with future versions of the C
or POSIX standards, so you should avoid these names.
...
Names beginning with a capital ‘E’ followed a digit or uppercase letter
may be used for additional error code names. See Error Reporting.//

Reported here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965
And patch wrote by Svante Signell 
With this patch, LLVM, Clang & LLDB build under Debian HURD:
https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228331 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:57:02 +00:00
Hal Finkel
b8a6712c27 [PowerPC] Prepare loops for pre-increment loads/stores
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX
vector load/stores). Using these on embedded cores can be very important, but
most loops are not naturally set up to use them. We can often change that,
however, by placing loops into a non-canonical form. Generically, this means
transforming loops like this:

  for (int i = 0; i < n; ++i)
    array[i] = c;

to look like this:

  T *p = array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;

the key point is that addresses accessed are pulled into dedicated PHIs and
"pre-decremented" in the loop preheader. This allows the use of pre-increment
load/store instructions without loop peeling.

A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is
introduced to perform this transformation. I've used this code out-of-tree for
generating code for the PPC A2 for over a year. Somewhat to my surprise,
running the test suite + externals on a P7 with this transformation enabled
showed no performance regressions, and one speedup:

External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
	-2.32514% +/- 1.03736%

So I'm going to enable it on everything for now. I was surprised by this
because, on the POWER cores, these pre-increment load/store instructions are
cracked (and, thus, harder to schedule effectively). But seeing no regressions,
and feeling that it is generally easier to split instructions apart late than
it is to combine them late, this might be the better approach regardless.

In the future, we might want to integrate this functionality into LSR (but
currently LSR does not create new PHI nodes, so (for that and other reasons)
significant work would need to be done).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228328 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:43:00 +00:00
Hal Finkel
885b67a5c3 [PowerPC] Generate pre-increment floating-point ld/st instructions
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228327 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:42:53 +00:00
Bill Schmidt
202b6045bf [PowerPC] Implement the vclz instructions for PWR8
Patch by Kit Barton.

Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes.  This is a fairly straightforward addition
after the changes made for vpopcnt:

 1. Add the correct definitions for the various instructions in
    PPCInstrAltivec.td
 2. Make the CTLZ operation legal on vector types when using P8Altivec
    in PPCISelLowering.cpp 

Test Plan

Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.

Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:24:47 +00:00
Bill Schmidt
4351f76f81 Replace tabs with spaces from r228116. Oops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 06:14:38 +00:00
Bill Schmidt
89e8a17b4d [PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 05:51:56 +00:00
Bill Schmidt
8c775a4e7b [PowerPC] Implement the vpopcnt instructions for POWER8
Patch by Kit Barton.

Add the vector population count instructions for byte, halfword, word,
and doubleword sizes.  There are two major changes here:

    PPCISelLowering.cpp: Make CTPOP legal for vector types.
    PPCRegisterInfo.td: Added v2i64 to the VRRC register
      definition. This is needed for the doubleword variations of the
      integer ops that were added in P8. 

Test Plan

Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s

Test the generation of the vpopcnt instructions for various vector
data types.  When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types.  Testing for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions.  There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228046 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:58:23 +00:00
Bill Schmidt
1123a81009 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __get_tls_addr has taken
place.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227976 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:16:01 +00:00
Hal Finkel
a5c1f106b1 [PowerPC] Put PPCEarlyReturn into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCEarlyReturn into its own source file. NFC.

Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope
that it will be slightly less unwieldy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227775 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:58:46 +00:00
Hal Finkel
c4b84657f3 [PowerPC] Remove unnecessary include
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227772 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:03:13 +00:00
Hal Finkel
2a9d9584b4 [PowerPC] Put PPCVSXCopy into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXCopy into its own source file. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227771 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:01:29 +00:00
Hal Finkel
dad591a435 [PowerPC] Put PPCVSXFMAMutate into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXFMAMutate into its own source file. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227770 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 21:51:22 +00:00
Hal Finkel
2767054823 [PowerPC] Remove the PPCVSXCopyCleanup pass
This MI-level pass was necessary when VSX support was first being developed,
specifically, before the ABI code had been updated to use VSX registers for
arguments (the register assignments did not change, in a physical sense, but
the VSX super-registers are now used). Unfortunately, I never went back and
removed this pass after that was done. I believe this code is now effectively
dead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227767 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 21:20:58 +00:00