Commit Graph

4245 Commits

Author SHA1 Message Date
Eric Christopher
cd641756c3 Migrate PPCAsmPrinter's subtarget from reference to pointer in
preparation for making it MachineFunction dependent.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228638 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-10 00:44:17 +00:00
Kit Barton
f60b0de42a This change implements the following three logical vector operations:
veqv (vector equivalence)
vnand
vorc
I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions.
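
For reference, the three operations can be modelled on a single 32-bit lane as below. This is only a scalar sketch of the bitwise semantics; the function names are illustrative and are not identifiers from the patch.

```cpp
#include <cstdint>

// Scalar models of the three logical operations on one 32-bit lane.
// Names are hypothetical; the real instructions act on whole vector registers.
constexpr std::uint32_t eqv(std::uint32_t a, std::uint32_t b) {
  return ~(a ^ b); // equivalence: bit is 1 where a and b agree
}
constexpr std::uint32_t nand(std::uint32_t a, std::uint32_t b) {
  return ~(a & b); // complement of AND
}
constexpr std::uint32_t orc(std::uint32_t a, std::uint32_t b) {
  return a | ~b; // OR with the complement of the second operand
}
```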


Phabricator review: http://reviews.llvm.org/D7469


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228580 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-09 17:03:18 +00:00
Hal Finkel
05bd43dc6e [PowerPC] Handle loop predecessor invokes
If a loop predecessor has an invoke as its terminator, and the return value
from that invoke is used to determine the loop iteration space, then we can't
insert a computation based on that value in the loop predecessor prior to the
terminator (oops). If there's such an invoke, or just no predecessor for that
matter, insert a new loop preheader.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228488 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-07 07:32:58 +00:00
Hal Finkel
9168f717c9 Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups
Unfortunately, even with the workaround of disabling the linker TLS
optimizations in Clang restored (which has already been done), this still
breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native).

Bill is currently working on an alternate implementation to address the TLS
issue in a way that also fully elides the linker bug (which, unfortunately,
this approach did not fully), so I'm reverting this now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228460 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 23:07:40 +00:00
Benjamin Kramer
e003f1ac8c Make helper functions/classes/globals static. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228410 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-06 17:51:54 +00:00
Sylvestre Ledru
ec5eba025e Fix an incorrect identifier
Summary:
EIEIO is not a correct declaration and breaks the build under Debian HURD.
Instead, E_IEIO is used.

//
http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
Some additional classes of identifier names are reserved for future
extensions to the C language or the POSIX.1 environment. While using
these names for your own purposes right now might not cause a problem,
they do raise the possibility of conflict with future versions of the C
or POSIX standards, so you should avoid these names.
...
Names beginning with a capital ‘E’ followed a digit or uppercase letter
may be used for additional error code names. See Error Reporting.//

Reported here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965
Patch written by Svante Signell.
With this patch, LLVM, Clang & LLDB build under Debian HURD:
https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228331 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:57:02 +00:00
Hal Finkel
b8a6712c27 [PowerPC] Prepare loops for pre-increment loads/stores
PowerPC supports pre-increment load/store instructions (except for Altivec/VSX
vector load/stores). Using these on embedded cores can be very important, but
most loops are not naturally set up to use them. We can often change that,
however, by placing loops into a non-canonical form. Generically, this means
transforming loops like this:

  for (int i = 0; i < n; ++i)
    array[i] = c;

to look like this:

  T *p = &array[-1];
  for (int i = 0; i < n; ++i)
    *++p = c;

The key point is that addresses accessed are pulled into dedicated PHIs and
"pre-decremented" in the loop preheader. This allows the use of pre-increment
load/store instructions without loop peeling.
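
A self-contained C++ sketch of the two loop shapes follows. It is a source-level illustration only, not the IR the pass produces, and all names are hypothetical. Because forming a pointer before the start of an array is not valid C++, the helper assumes one slack element in front of the data.

```cpp
#include <cstddef>
#include <vector>

// Canonical form: the address is recomputed from the index on each iteration.
void fill_indexed(int *array, std::size_t n, int c) {
  for (std::size_t i = 0; i < n; ++i)
    array[i] = c;
}

// Pre-increment form: the pointer starts one element before the first store
// and is bumped before each access. Assumption for this sketch: array - 1
// lies inside the same allocation, so the pointer arithmetic is well defined.
void fill_preinc(int *array, std::size_t n, int c) {
  int *p = array - 1;
  for (std::size_t i = 0; i < n; ++i)
    *++p = c;
}

// Hypothetical helper: allocates n elements plus one slack element in front
// and fills the n elements with either loop shape.
std::vector<int> run(bool preinc, std::size_t n, int c) {
  std::vector<int> buf(n + 1, 0);
  int *array = buf.data() + 1;
  if (preinc)
    fill_preinc(array, n, c);
  else
    fill_indexed(array, n, c);
  return buf;
}
```

Both shapes write the same elements; only the way the address is carried from one iteration to the next differs.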

A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is
introduced to perform this transformation. I've used this code out-of-tree for
generating code for the PPC A2 for over a year. Somewhat to my surprise,
running the test suite + externals on a P7 with this transformation enabled
showed no performance regressions, and one speedup:

External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk
	-2.32514% +/- 1.03736%

So I'm going to enable it on everything for now. I was surprised by this
because, on the POWER cores, these pre-increment load/store instructions are
cracked (and, thus, harder to schedule effectively). But seeing no regressions,
and feeling that it is generally easier to split instructions apart late than
it is to combine them late, this might be the better approach regardless.

In the future, we might want to integrate this functionality into LSR (but
currently LSR does not create new PHI nodes, so (for that and other reasons)
significant work would need to be done).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228328 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:43:00 +00:00
Hal Finkel
885b67a5c3 [PowerPC] Generate pre-increment floating-point ld/st instructions
PowerPC supports pre-increment floating-point load/store instructions, both r+r
and r+i, and we had patterns for them, but they were not marked as legal. Mark
them as legal (and add a test case).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228327 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 18:42:53 +00:00
Bill Schmidt
202b6045bf [PowerPC] Implement the vclz instructions for PWR8
Patch by Kit Barton.

Add the vector count leading zeros instruction for byte, halfword,
word, and doubleword sizes.  This is a fairly straightforward addition
after the changes made for vpopcnt:

 1. Add the correct definitions for the various instructions in
    PPCInstrAltivec.td
 2. Make the CTLZ operation legal on vector types when using P8Altivec
    in PPCISelLowering.cpp 

Test Plan

Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the
instructions are being generated when the CTLZ operation is used in
LLVM.

Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s
and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.
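
As a reference for what the CTLZ operation computes per element, here is a scalar C++ model for the word-sized case. It is a sketch with hypothetical names, and it assumes a zero element yields the element width (32).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Leading-zero count of one 32-bit lane; returns 32 for a zero input.
unsigned clz32(std::uint32_t x) {
  unsigned n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Element-wise model of a count-leading-zeros on a vector of four words.
std::array<std::uint32_t, 4>
vec_clz_words(const std::array<std::uint32_t, 4> &v) {
  std::array<std::uint32_t, 4> out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = clz32(v[i]);
  return out;
}
```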


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228301 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-05 15:24:47 +00:00
Bill Schmidt
4351f76f81 Replace tabs with spaces from r228116. Oops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228117 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 06:14:38 +00:00
Bill Schmidt
89e8a17b4d [PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228116 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-04 05:51:56 +00:00
Bill Schmidt
8c775a4e7b [PowerPC] Implement the vpopcnt instructions for POWER8
Patch by Kit Barton.

Add the vector population count instructions for byte, halfword, word,
and doubleword sizes.  There are two major changes here:

    PPCISelLowering.cpp: Make CTPOP legal for vector types.
    PPCRegisterInfo.td: Added v2i64 to the VRRC register
      definition. This is needed for the doubleword variations of the
      integer ops that were added in P8. 

Test Plan

Test the instruction vpopcnt* encoding/decoding in ppc64-encoding-vmx.s

Test the generation of the vpopcnt instructions for various vector
data types.  When adding the v2i64 type to the Vector Register set, I
also needed to add the appropriate bit conversion patterns between
v2i64 and the existing vector types.  Tests for these conversions
were also added in the test case by passing a different vector type as
a parameter into the test functions.  There is also a run step that
will ensure the vpopcnt instructions are generated when the vsx
feature is disabled.
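
For the doubleword case that required adding v2i64, a scalar C++ model of the per-element population count might look like this. It is a sketch only; the names are hypothetical and do not come from the patch.

```cpp
#include <array>
#include <cstdint>

// Number of set bits in one 64-bit lane (clears the lowest set bit each step).
unsigned popcount64(std::uint64_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Element-wise model of a population count on a vector of two doublewords.
std::array<std::uint64_t, 2>
vec_popcnt_doublewords(const std::array<std::uint64_t, 2> &v) {
  std::array<std::uint64_t, 2> out{};
  out[0] = popcount64(v[0]);
  out[1] = popcount64(v[1]);
  return out;
}
```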


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@228046 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 21:58:23 +00:00
Bill Schmidt
1123a81009 [PowerPC] Yet another approach to __tls_get_addr
This patch is a third attempt to properly handle the local-dynamic and
global-dynamic TLS models.

In my original implementation, calls to __tls_get_addr were hidden
from view until the asm-printer phase, at which point the underlying
branch-and-link instruction was created with proper relocations.  This
mostly worked well, but I used some repellent techniques to ensure
that the TLS_GET_ADDR nodes at the SD and MI levels correctly received
input from GPR3 and produced output into GPR3.  This proved to work
badly in the presence of multiple TLS variable accesses, with the
copies to and from GPR3 being scheduled incorrectly and generally
creating havoc.

In r221703, I addressed that problem by representing the calls to
__tls_get_addr as true calls during instruction lowering.  This had
the advantage of removing all of the bad hacks and relying on the
existing call machinery to properly glue the copies in place. It
looked like this was going to be the right way to go.

However, as a side effect of the recent discovery of problems with
linker optimizations for TLS, we discovered cases of suboptimal code
generation with this strategy.  The problem comes when tls_get_addr is
called for the same address, and there is a resulting CSE
opportunity.  It turns out that in such cases MachineCSE will common
the addis/addi instructions that set up the input value to
tls_get_addr, but will not common the calls themselves.  MachineCSE
does not have any machinery to common idempotent calls.  This is
perfectly sensible, since presumably this would be done at the IR
level, and introducing calls in the back end isn't commonplace.  In
any case, we end up with two calls to __tls_get_addr when one would
suffice, and that isn't good.
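
The effect can be pictured at the source level with the sketch below. The function get_addr is a hypothetical stand-in for __tls_get_addr (same argument, same result), and the call counter exists only to make the number of calls observable.

```cpp
// Hypothetical stand-in for __tls_get_addr: idempotent for a given argument.
static int g_calls = 0;
static int g_block[4] = {10, 20, 30, 40};

int *get_addr(int /*descriptor*/) {
  ++g_calls; // instrumentation for the sketch only
  return g_block;
}

// Calls not commoned: each access makes its own call, even though the
// argument (and therefore the result) is the same.
int sum_two_calls() {
  int a = get_addr(0)[1];
  int b = get_addr(0)[2];
  return a + b;
}

// Calls commoned: one call, with the returned address reused.
int sum_one_call() {
  int *base = get_addr(0);
  return base[1] + base[2];
}
```

Both functions return the same value; the second simply reaches it with one call instead of two.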

I presumed that the original design would have allowed commoning of
the machine-specific nodes that hid the __tls_get_addr calls, so as
suggested by Ulrich Weigand, I went back to that design and cleaned it
up so that the copies were properly held together by glue
nodes.  However, it turned out that this didn't work either...the
presence of copies to physical registers kept the machine-specific
nodes from being commoned also.

All of which leads to the design presented here.  This is a return to
the original design, except that no attempt is made to introduce
copies to and from GPR3 during instruction lowering.  Virtual registers
are used until prior to register allocation.  At that point, a special
pass is run that identifies the machine-specific nodes that hide the
tls_get_addr calls and introduces the copies to and from GPR3 around
them.  The register allocator then coalesces these copies away.  With
this design, MachineCSE succeeds in commoning tls_get_addr calls where
possible, and we get nice optimal code generation (better than GCC at
the moment, which does not common these calls).

One additional problem must be dealt with:  After introducing the
mentions of the physical register GPR3, the aggressive anti-dependence
breaker sees opportunities to improve scheduling by selecting a
different register instead.  Flags must be used on the instruction
descriptions to tell the anti-dependence breaker to keep its hands in
its pockets.

One thing missing from the original design was recording a definition
of the link register on the GET_TLS_ADDR nodes.  Doing this was found
to be insufficient to force a stack frame to be created, which led to
looping behavior because two different LR values were stored at the
same address.  This appears to have been an oversight in
PPCFrameLowering::determineFrameLayout(), which is repaired here.

Because MustSaveLR() returns true for calls to builtin_return_address,
this changed the expected behavior of
test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but
formerly did not.  I've fixed the test case to reflect this.

There are existing TLS tests to catch regressions; the checks in
test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the
face of instruction scheduling with these changes, so I fixed that
up.

I've added a new test case based on the PrettyStackTrace module that
demonstrated the original problem. This checks that we get correct
code generation and that CSE of the calls to __tls_get_addr has taken
place.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227976 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-03 16:16:01 +00:00
Hal Finkel
a5c1f106b1 [PowerPC] Put PPCEarlyReturn into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCEarlyReturn into its own source file. NFC.

Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope
that it will be slightly less unwieldy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227775 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:58:46 +00:00
Hal Finkel
c4b84657f3 [PowerPC] Remove unnecessary include
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227772 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:03:13 +00:00
Hal Finkel
2a9d9584b4 [PowerPC] Put PPCVSXCopy into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXCopy into its own source file. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227771 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 22:01:29 +00:00
Hal Finkel
dad591a435 [PowerPC] Put PPCVSXFMAMutate into its own source file
PPCInstrInfo.cpp has ended up containing several small MI-level passes, and
this is making the file harder to read than necessary. Split out
PPCVSXFMAMutate into its own source file. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227770 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 21:51:22 +00:00
Hal Finkel
2767054823 [PowerPC] Remove the PPCVSXCopyCleanup pass
This MI-level pass was necessary when VSX support was first being developed,
specifically, before the ABI code had been updated to use VSX registers for
arguments (the register assignments did not change, in a physical sense, but
the VSX super-registers are now used). Unfortunately, I never went back and
removed this pass after that was done. I believe this code is now effectively
dead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227767 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 21:20:58 +00:00
Hal Finkel
604e1b770c [PowerPC] Add implicit ops to conditional returns in PPCEarlyReturn
When PPCEarlyReturn creates a conditional return, it should really copy implicit ops from the old return
instruction to the new one. This currently does not matter much, because we run
PPCEarlyReturn very late in the pipeline (there is nothing to do DCE on
definitions of those registers). However, for completeness, we should do it
anyway.

Noticed by inspection (and there should be no functional change); thus, no
test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227763 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 20:16:10 +00:00
Hal Finkel
3bafb64914 [PowerPC] VSX stores don't also read
The VSX store instructions were also picking up an implicit "may read" from the
default pattern, which was an intrinsic (and we don't currently have a way of
specifying write-only intrinsics).

This was causing MI verification to fail for VSX spill restores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227759 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 19:07:41 +00:00
Hal Finkel
ec716cecda [PowerPC] Better scheduling for isel on P7/P8
isel is actually a cracked instruction on the P7/P8, and must start a dispatch
group. The scheduling model should reflect this so that we don't bunch too many
of them together when possible.

Thanks to Bill Schmidt and Pat Haugen for helping to sort this out.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227758 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 17:52:16 +00:00
Hal Finkel
8f5c829c1e [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functions
The TOC base pointer is passed in r2, and we normally reserve this register so
that we can depend on it being there. However, for leaf functions, and
specifically those leaf functions that don't do any TOC access of their own
(which is generally due to accessing the constant pool, using TLS, etc.),
we can treat r2 as an ordinary callee-saved register (it must be callee-saved
because, for local direct calls, the linker will not insert any save/restore
code).

The allocation order has been changed slightly for PPC64/ELF systems to put r2
at the end of the list (while leaving it near the beginning for Darwin systems
to prevent unnecessary output changes). While r2 is allocatable, using it still
requires spill/restore traffic, and thus comes at the end of the list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227745 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 15:03:28 +00:00
Chandler Carruth
26bc071088 [multiversion] Remove the function parameter from the unrolling
preferences interface on TTI now that all of TTI is per-function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227741 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 14:31:23 +00:00
Chandler Carruth
b71d385494 [multiversion] Switch the TTI queries from TargetMachine to Subtarget
now that we have a correct and cached subtarget specific to the
function.

Also, finish providing a cached per-function subtarget in the core
LLVMTargetMachine -- that layer hadn't switched over yet.

The only use of the TargetMachine was to re-lookup a subtarget for
a particular function to work around the fact that TTI was immutable.
Now that it is per-function and we have a cached subtarget, use it.

This still leaves a few interfaces with real warts on them where we were
passing Function objects through the TTI interface. I'll remove these
and clean their usage up in subsequent commits now that this isn't
necessary.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227738 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 14:22:17 +00:00
Chandler Carruth
d0bfb83efb [multiversion] Remove the cached TargetMachine pointer from the
intermediate TTI implementation template and instead query up to the
derived class for both the TargetMachine and the TargetLowering.

Most of the derived types had a TLI cached already and there is no need
to store a less precisely typed target machine pointer.

This will in turn make it much cleaner to look up the TLI via
a per-function subtarget instead of the generic subtarget, and it will
pave the way toward pulling the subtarget used for unroll preferences
into the same form once we are *always* using the function to look up
the correct subtarget.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227737 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 14:01:15 +00:00
Chandler Carruth
6e89e1316a [multiversion] Switch all of the targets over to use the
TargetIRAnalysis access path directly rather than implementing getTTI.

This even removes getTTI from the interface. It's more efficient for
each target to just register a precise callback that creates their
specific TTI.

As part of this, all of the targets which are building their subtargets
individually per-function now build their TTI instance with the function
and thus look up the correct subtarget and cache it. NVPTX, R600, and
XCore currently don't leverage this functionality, but it's trivial for
them to add it now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 13:20:00 +00:00
Chandler Carruth
d12af8754e [multiversion] Remove a false freedom to leave the TargetMachine pointer
null.

For some reason some of the original TTI code supported a null target
machine. This seems to have been legacy, and I made matters worse when
refactoring this code by spreading that pattern further through the
various targets.

The TargetMachine can't actually be null, and it doesn't make sense to
support that use case. I've now consistently removed it and removed all
of the code trying to cope with that situation. This is probably good,
as several targets *didn't* cope with it being null despite the null
default argument in their constructors. =]

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227734 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 12:38:24 +00:00
Chandler Carruth
685c2add65 [PM] Remove a bunch of stale TTI creation method declarations. I nuked
their definitions, but forgot to clean up all the declarations which are
in different files.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227698 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 00:22:15 +00:00
Chandler Carruth
1937233a22 [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more than code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227685 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 11:17:59 +00:00
Chandler Carruth
a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
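
A minimal sketch of the general type-erasure idiom is shown below. Every name in it (CostInfo, Concept, Model, DummyImpl, FancyTargetImpl, getOpCost) is hypothetical and chosen for illustration; it is not the TTI code from this commit, only the shape of a value-semantic wrapper that hides the concrete implementation type.

```cpp
#include <memory>
#include <utility>

// Value-semantic wrapper: clients see one non-template type, while any
// implementation providing getOpCost(int) can be stored inside it.
class CostInfo {
  struct Concept {
    virtual ~Concept() = default;
    virtual int getOpCost(int opcode) const = 0;
  };

  // Model<T> adapts an arbitrary implementation type T to the Concept.
  template <typename T> struct Model final : Concept {
    explicit Model(T impl) : Impl(std::move(impl)) {}
    int getOpCost(int opcode) const override { return Impl.getOpCost(opcode); }
    T Impl;
  };

  std::unique_ptr<Concept> Ptr;

public:
  template <typename T>
  explicit CostInfo(T impl) : Ptr(new Model<T>(std::move(impl))) {}

  int getOpCost(int opcode) const { return Ptr->getOpCost(opcode); }
};

// Two unrelated implementations; neither derives from a common base class.
struct DummyImpl {
  int getOpCost(int) const { return 1; }
};
struct FancyTargetImpl {
  int getOpCost(int opcode) const { return opcode == 0 ? 1 : 4; }
};
```

The forwarding code in Model is the kind of boilerplate that makes this approach verbose: each interface function has to be repeated in the concept, the model, and the wrapper.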

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting it into the right form.
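
The CRTP delegation mentioned here can be sketched as follows. The class and method names are hypothetical and do not correspond to the classes in this commit; the point is only that a default in the base class calls back into the most-derived type without virtual dispatch.

```cpp
// Base class template parameterized on the most-derived type.
template <typename Derived> class CostBase {
public:
  // A default built on top of another hook. The static_cast routes the call
  // to Derived::getOpCost if the derived class provides one.
  int getLoopCost(int tripCount) const {
    return tripCount * static_cast<const Derived *>(this)->getOpCost();
  }

  // Fallback used when the derived class does not provide its own version.
  int getOpCost() const { return 1; }
};

// Uses every default from the base.
class GenericCost : public CostBase<GenericCost> {};

// Overrides one hook; the inherited getLoopCost picks it up.
class TargetCost : public CostBase<TargetCost> {
public:
  int getOpCost() const { return 3; }
};
```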

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is a *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 03:43:40 +00:00
Eric Christopher
e4c7a8188f Remove unused function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227624 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 22:02:36 +00:00
Eric Christopher
bc08db4cba Remove extraneous forward declaration.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227623 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 22:02:34 +00:00
Eric Christopher
5882e66f5b Use the cached subtargets and remove calls to getSubtarget/getSubtargetImpl
without a Function argument.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227622 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 22:02:31 +00:00
Eric Christopher
dd5e9f624b Use the cached subtarget in PPCFrameLowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227548 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 02:11:26 +00:00
Eric Christopher
87dd120c6a Migrate some of PPC away from the use of bare getSubtarget/getSubtargetImpl.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227547 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 02:11:24 +00:00
Eric Christopher
31f58f2c74 Migrate PPCRegisterInfo to use the cached subtarget.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227546 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-30 02:11:21 +00:00
Rafael Espindola
9936b80df5 Compute the ELF SectionKind from the flags.
Any code creating an MCSectionELF knows ELF and already provides the flags.

SectionKind is an abstraction used by common code that uses a plain
MCSection.

Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.
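
As a rough illustration of deriving a kind from the flags, consider the sketch below. The flag values are the standard ELF SHF_WRITE, SHF_ALLOC, and SHF_EXECINSTR bits; the Kind enum and kindFromFlags are hypothetical simplifications, and the real SectionKind distinguishes many more categories.

```cpp
#include <cstdint>

// Standard ELF section flag bits, under sketch-local names.
constexpr std::uint32_t kShfWrite = 0x1;
constexpr std::uint32_t kShfAlloc = 0x2;
constexpr std::uint32_t kShfExecInstr = 0x4;

// Hypothetical, heavily simplified set of section kinds.
enum class Kind { Metadata, Text, Data, ReadOnly };

// Derive the kind from the flags alone, so callers do not pass both.
Kind kindFromFlags(std::uint32_t flags) {
  if (!(flags & kShfAlloc))
    return Kind::Metadata; // not loaded at run time
  if (flags & kShfExecInstr)
    return Kind::Text;
  if (flags & kShfWrite)
    return Kind::Data;
  return Kind::ReadOnly;
}
```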

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227476 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 17:33:21 +00:00
Bill Schmidt
a5ea0b50a4 [PowerPC] Complete setting the baseline for ppc64le
Patch by Nemanja Ivanovic.

As was uncovered by the failing test case (when run on non-PPC
platforms), the feature set when compiling with -march=ppc64le was not
being picked up. This change ensures that if the -mcpu option is not
specified, the correct feature set is picked up regardless of whether
we are on PPC or not.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227455 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-29 15:59:09 +00:00
Eric Christopher
04bcc11905 Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling are often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be laid out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227113 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 19:03:15 +00:00
Bill Schmidt
c536eed4d8 [PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8
Test by Nemanja Ivanovic.

Since ppc64le implies POWER8 as a minimum, it makes sense that the
same features are included. Since the pwr8 processor model will likely
be getting new features until the implementation is complete, I
created a new list to add these updates to. This will include them in
both pwr8 and ppc64le.

Furthermore, it seems that it would make sense to compose the feature
lists for other processor models (pwr3 and up). Per discussion in the
review, I will make this change in a subsequent patch.

In order to test the changes, I've added an additional run step to
test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu
option. Since the feature lists are the same, the behaviour should be
unchanged.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227053 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-25 18:05:42 +00:00
Rafael Espindola
a23cc6a1ea Add r224985 back with fixes.
The fixes are to note that AArch64 has additional restrictions on when local
relocations can be used. In particular, ld64 requires that relocations to
cstring/cfstrings use linker visible symbols.

Original message:

In an assembly expression like

bar:
  .long L0 + 1

the intended semantics is that bar will contain a pointer one byte past L0.

In sections that are merged by content (strings, 4 byte constants, etc), a
single position in the section doesn't give the linker enough information.
For example, it would not be able to tell a relocation must point to the
end of a string, since that would look just like the start of the next.

The solution used in ELF is to use relocations with symbols if there is a non-zero
addend.

In MachO before this patch we would just keep all symbols in some sections.

This would miss some cases (only cstrings on x86_64 were implemented) and was
inefficient since most relocations have an addend of 0 and can be represented
without the symbol.

This patch implements the non-zero addend logic for MachO too.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226503 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 21:11:14 +00:00
Hal Finkel
e6e4c786b1 [PowerPC] Minor correction to r226432
We don't need to exclude patchpoints from the implicit r2 dependence in
FastISel because it is added as an implicit operand and, thus, should not
confuse the StackMap code.

By inspection / no test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226434 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:44:45 +00:00
Hal Finkel
d1f1656447 [PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2
Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call
instructions in order to represent the dependency on the TOC base pointer
value. Restricting this to ELF V2, however, does not seem to make sense: calls
under ELF V1 have the same dependence, and indirect calls have an r2 dependence
just as direct ones. Make sure the dependence is noted for all calls under both
ELF V1 and ELF V2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226432 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-19 07:20:27 +00:00
David Blaikie
341a7e245e std::unique_ptrify the MCStreamer argument to createAsmPrinter
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226414 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 20:29:04 +00:00
Hal Finkel
341223ee0a [PowerPC] Don't hard-code R2 as register when processing TOC relocations
Instructions that have high-order TOC relocations always carry R2 as their base
register, so it does not matter whether we take the register from the
instruction or just hard-code it in PPCAsmPrinter. In the future, however, we
might want to apply these relocations to instructions using a different
register, so taking the register from the instruction is a better thing to do.
No change in functionality here, however.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226403 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 15:59:44 +00:00
Hal Finkel
180f89537a [PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR moves
So that we don't forget: once we support FPR <-> GPR moves on the P8, we'll
likely want to revisit this part of the calling convention.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226401 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 14:31:10 +00:00
Hal Finkel
a01b583dbc [PowerPC] Initial PPC64 calling-convention changes for fastcc
The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is
designed to work with both prototyped and non-prototyped/varargs functions. As
a result, GPRs and stack space are allocated for every argument, even those
that are passed in floating-point or vector registers.

GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that
do not have their address taken) to use the 'fast' calling convention.

When functions are using the 'fast' calling convention, don't allocate GPRs for
arguments passed in other types of registers, and don't allocate stack space for
arguments passed in registers. Other changes for the fast calling convention
may be added in the future.
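
The difference between the two conventions can be sketched with a toy allocator in Python. This is a simplified model, not the PPCISelLowering code: the register counts, the one-doubleword-per-argument slot size, the function name, and the omission of vector arguments are all assumptions made for illustration.

```python
NUM_GPRS = 8   # assumption: r3-r10 carry integer arguments
NUM_FPRS = 13  # assumption: f1-f13 carry floating-point arguments
SLOT = 8       # assumption: every argument occupies one doubleword


def allocate(args, fast):
    """Return (gprs_used, fprs_used, stack_bytes) for a list of 'int'/'fp' args.

    Hypothetical helper for illustration only.
    """
    gprs = fprs = stack = 0
    for kind in args:
        if fast:
            # fastcc: an argument takes one register of its own class and
            # needs stack space only when no such register is left.
            if kind == "fp" and fprs < NUM_FPRS:
                fprs += 1
            elif kind == "int" and gprs < NUM_GPRS:
                gprs += 1
            else:
                stack += SLOT
        else:
            # Default convention: every argument also consumes a GPR (while
            # any remain) and a stack slot, even if it travels in an FPR.
            if kind == "fp" and fprs < NUM_FPRS:
                fprs += 1
            if gprs < NUM_GPRS:
                gprs += 1
            stack += SLOT
    return gprs, fprs, stack
```

For an argument list of one integer and two floating-point values, this model reserves three GPRs and 24 bytes of stack under the default convention, and one GPR and no stack under the fast one.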

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226399 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-18 12:08:47 +00:00
Chandler Carruth
de5df29556 [PM] Split the LoopInfo object apart from the legacy pass, creating
a LoopInfoWrapperPass to wire the object up to the legacy pass manager.

This switches all the clients of LoopInfo over and paves the way to port
LoopInfo to the new pass manager. No functionality change is intended
with this iteration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226373 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 14:16:18 +00:00
Hal Finkel
962cff0c08 [PowerPC] Don't list R11 as a patchpoint scratch register
R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is
reserved for use as an "environment pointer" for compilation models that
require such a thing. We don't; we also don't need a second scratch register,
and because we support only "local" patchpoint call targets, we might as well
let R11 be used for anyregcc patchpoints.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226369 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-17 03:57:34 +00:00
Hal Finkel
92cd0ca3b2 [PowerPC] Adjust PatchPoints for ppc64le
Bill Schmidt pointed out that some adjustments would be needed to properly
support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available
as a scratch register, so we need to use R12. R12 is also available under ELF
V1, so to maintain consistency, I flipped the order to make R12 the first
scratch register in the array under both ABIs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@226247 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-16 04:40:58 +00:00