Commit Graph

32 Commits

Author SHA1 Message Date
Hao Liu
5ab48a2f69 [AArch64] Revert r239711 again. We need to discuss how to share code between AArch64 and ARM backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239713 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-15 01:56:40 +00:00
Hao Liu
6024ab3b8f [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X.
This patch was firstly committed in r239514, then reverted in r239544 because of a syntax incompatible failure on OS X.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239711 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-15 01:35:49 +00:00
Rafael Espindola
688e7b3049 This reverts commit r239529 and r239514.
Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
Revert "Fixing MSVC 2013 build error."

The  test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239544 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-11 17:30:33 +00:00
Hao Liu
442f620296 [AArch64] Match interleaved memory accesses into ldN/stN instructions.
Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizor disables optimization on interleaved accesses by default, this optimization is also disabled by default. To enable it by "-aarch64-interleaved-access-opt=true"

E.g. Transform an interleaved load (Factor = 2):
       %wide.vec = load <8 x i32>, <8 x i32>* %ptr
       %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>  ; Extract even elements
       %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>  ; Extract odd elements
     Into:
       %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr)
       %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
       %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

E.g. Transform an interleaved store (Factor = 2):
       %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7>  ; Interleaved vec
       store <8 x i32> %i.vec, <8 x i32>* %ptr
     Into:
       %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
       %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
       call void aarch64.neon.st2(%v0, %v1, %ptr)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239514 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-11 09:05:02 +00:00
Wei Mi
cac51be31f [X86] Disable loop unrolling in loop vectorization pass when VF is 1.
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.

Differential Revision: http://reviews.llvm.org/D9515


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@236613 91177308-0d34-0410-b5e6-96231b3b80d8
2015-05-06 17:12:25 +00:00
Kevin Qin
40e66277f7 [AArch64] Enable partial & runtime unrolling on cortex-a57
For inner one of nested loops, it is more likely to be a hot loop,
and the runtime check can be promoted out from patch 0001, so the
overhead is less, we can try a doubled threshold to unroll more loops.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@231632 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-09 06:14:28 +00:00
Benjamin Kramer
30fa873958 Make some non-constant static variables non-static or fully const.
Otherwise we have to emit thread-safe initialization for them. NFC.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@230894 91177308-0d34-0410-b5e6-96231b3b80d8
2015-03-01 18:09:56 +00:00
Chandler Carruth
26bc071088 [multiversion] Remove the function parameter from the unrolling
preferences interface on TTI now that all of TTI is per-function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227741 91177308-0d34-0410-b5e6-96231b3b80d8
2015-02-01 14:31:23 +00:00
Chandler Carruth
1937233a22 [PM] Switch the TargetMachine interface from accepting a pass manager
base which it adds a single analysis pass to, to instead return the type
erased TargetTransformInfo object constructed for that TargetMachine.

This removes all of the pass variants for TTI. There is now a single TTI
*pass* in the Analysis layer. All of the Analysis <-> Target
communication is through the TTI's type erased interface itself. While
the diff is large here, it is nothing more that code motion to make
types available in a header file for use in a different source file
within each target.

I've tried to keep all the doxygen comments and file boilerplate in line
with this move, but let me know if I missed anything.

With this in place, the next step to making TTI work with the new pass
manager is to introduce a really simple new-style analysis that produces
a TTI object via a callback into this routine on the target machine.
Once we have that, we'll have the building blocks necessary to accept
a function argument as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227685 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 11:17:59 +00:00
Chandler Carruth
a6a87b595d [PM] Change the core design of the TTI analysis to use a polymorphic
type erased interface and a single analysis pass rather than an
extremely complex analysis group.

The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.

I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.

There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.

The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.

Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.

The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]

Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:

1) Improving the TargetMachine interface by having it directly return
   a TTI object. Because we have a non-pass object with value semantics
   and an internal type erasure mechanism, we can narrow the interface
   of the TargetMachine to *just* do what we need: build and return
   a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
   This will include splitting off a minimal form of it which is
   sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
   target machine for each function. This may actually be done as part
   of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
   easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
   easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
   just a bit messy and exacerbating the complexity of implementing
   the TTI in each target.

Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.

Differential Revision: http://reviews.llvm.org/D7293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227669 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-31 03:43:40 +00:00
Chad Rosier
13faabb6c5 Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@227149 91177308-0d34-0410-b5e6-96231b3b80d8
2015-01-26 22:51:15 +00:00
Kevin Qin
f73400a864 [AArch64] Enable partial & runtime unrolling on cortex-a57.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@219401 91177308-0d34-0410-b5e6-96231b3b80d8
2014-10-09 10:13:27 +00:00
Chad Rosier
ea64dce261 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218607 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-29 13:59:31 +00:00
Gerolf Hoflehner
e7648ae68d [AArch64] Revert r216141 for cyclone
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217540 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 20:31:57 +00:00
Sanjay Patel
87c977a52b Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217528 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-10 17:58:16 +00:00
Karthik Bhat
e637d65af3 Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-25 04:56:54 +00:00
James Molloy
824431f680 [LoopVectorize] Up the maximum unroll factor to 4 for AArch64
Only for Cortex-A57 and Cyclone for now, where it has shown wins.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216141 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-21 00:02:51 +00:00
James Molloy
72035e9a8e Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-05 12:30:34 +00:00
Eric Christopher
9f85dccfc6 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214781 91177308-0d34-0410-b5e6-96231b3b80d8
2014-08-04 21:25:23 +00:00
Tim Northover
8bfc50e4a9 AArch64: improve handling & modelling of FP_TO_XINT nodes.
There's probably no acatual change in behaviour here, just updating
the LowerFP_TO_INT function to be more similar to the reverse
implementation and updating costs to current CodeGen.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210985 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-15 09:27:15 +00:00
Tim Northover
94fe5c1fe2 AArch64: improve vector [su]itofp handling.
This somehow got missed in the AArch64 merge, so should fix a
performance regression since 3.4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210984 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-15 09:27:06 +00:00
Tim Northover
29f94c7201 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-24 12:50:23 +00:00
Tim Northover
9105f66d6f AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.

The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.

Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209576 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-24 12:42:26 +00:00
Craig Topper
0fd57f4b56 [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. AArch64 edition
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207510 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-29 07:58:34 +00:00
Craig Topper
c848b1bbcf [C++] Use 'nullptr'. Target edition.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207197 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 05:30:21 +00:00
Chandler Carruth
42e8630239 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Target/...
edition.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206842 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 02:41:26 +00:00
Jiangning Liu
a1da819896 This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64.
A new test case is also added for ARM64.

Patched by Z.Zheng



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206563 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 07:57:54 +00:00
Nuno Lopes
2ca626570f remove a bunch of unused private methods
found with a smarter version of -Wunused-member-function that I'm playwing with.
Appologies in advance if I removed someone's WIP code.

 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204560 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-23 17:09:26 +00:00
Chandler Carruth
436906ab3c [TTI] There is actually no realistic way to pop TTI implementations off
the stack of the analysis group because they are all immutable passes.
This is made clear by Craig's recent work to use override
systematically -- we weren't overriding anything for 'finalizePass'
because there is no such thing.

This is kind of a lame restriction on the API -- we can no longer push
and pop things, we just set up the stack and run. However, I'm not
invested in building some better solution on top of the existing
(terrifying) immutable pass and legacy pass manager.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203437 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-10 02:45:14 +00:00
Craig Topper
629b96cb4f Switch all uses of LLVM_OVERRIDE to just use 'override' directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-02 09:09:27 +00:00
Craig Topper
4eb03f049e Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202618 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-02 08:08:51 +00:00
Chad Rosier
fda2dfbca7 [AArch64] Add support for TargetTransformInfo Analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201793 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-20 16:00:08 +00:00