This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
%in1 = load <2 x i32>* %addr1, align 8
%extract = extractelement <2 x i32> %in1, i32 1
%out = or i32 %extract, 1
store i32 %out, i32* %dest, align 4
ret void
}
As it is, this IR generates the following assembly on armv7:
vldr d16, [r0] @vector load
vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles
orr r0, r0, #1 @ scalar bitwise or
str r0, [r1] @ scalar store
bx lr
Whereas we could generate much faster code:
vldr d16, [r0] @ vector load
vorr.i32 d16, #0x1 @ vector bitwise or
vst1.32 {d16[1]}, [r1:32] @ vector extract + store
bx lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220978 91177308-0d34-0410-b5e6-96231b3b80d8
Do a better job classifying symbols. This increases the consistency
between the COFF handling code and the ELF side of things.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220952 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch adds an llvm_call_once which is a wrapper around std::call_once on platforms where it is available and devoid of bugs. The patch also migrates the ManagedStatic mutex to be allocated using llvm_call_once.
These changes are philosophically equivalent to the changes added in r219638, which were reverted due to a hang on Win32 which was the result of a bug in the Windows implementation of std::call_once.
Reviewers: aaron.ballman, chapuni, chandlerc, rnk
Reviewed By: rnk
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D5922
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220932 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.
The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.
Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.
There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.
The new test checks that reading the same profile encoded as text or
raw, produces the same results.
Reviewers: bogner, dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6000
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220915 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This helps llvm-objdump -r to print out the symbol name along
with the relocation type on x86. Adjust existing tests from checking
for "Unknown" to check for the symbol now.
Test Plan: Adjusted test/Object tests.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5987
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220866 91177308-0d34-0410-b5e6-96231b3b80d8
This is a Microsoft calling convention that supports both x86 and x86_64
subtargets. It passes vector and floating point arguments in XMM0-XMM5,
and passes them indirectly once they are consumed.
Homogenous vector aggregates of up to four elements can be passed in
sequential vector registers, but this part is not implemented in LLVM
and will be handled in Clang.
On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as
integer register parameters and is callee cleanup. On x86_64, it
delegates to the normal win64 calling convention.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D5943
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220745 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed that it was untested, and forcing it on caused some tests to fail:
LLVM :: Linker/metadata-a.ll
LLVM :: Linker/prefixdata.ll
LLVM :: Linker/type-unique-odr-a.ll
LLVM :: Linker/type-unique-simple-a.ll
LLVM :: Linker/type-unique-simple2-a.ll
LLVM :: Linker/type-unique-simple2.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/unnamed-addr1-a.ll
LLVM :: Linker/visibility1.ll
If it is to be resurrected, it has to be fixed and we should probably have a
-preserve-source command line option in llvm-mc and run tests with and without
it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220741 91177308-0d34-0410-b5e6-96231b3b80d8
sets as keys into a cache of interference matrice values in the Interference
constraint adder.
Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220688 91177308-0d34-0410-b5e6-96231b3b80d8
These just delegate to the underlying vector type in the MapVector.
Also just add in some sanity unittests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220687 91177308-0d34-0410-b5e6-96231b3b80d8
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.
r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.
We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220652 91177308-0d34-0410-b5e6-96231b3b80d8
Apparently unique_ptr'ifying NodeMetadata exposed an issue in VS where it
occasionally tries to synthesize copy constructors instead of moves. Hopefully
explicitly deleting the copy constructor and defining the move constructor will
fix this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220649 91177308-0d34-0410-b5e6-96231b3b80d8
It broke the Windows build:
[1/19] Building CXX object lib\CodeGen\CMakeFiles\LLVMCodeGen.dir\RegAllocPBQP.cpp.obj
C:\bb-win7\ninja-clang-i686-msc17-R\llvm-project\llvm\include\llvm/CodeGen/RegAllocPBQP.h(132) : error C2248: 'std::unique_ptr<_Ty>::unique_ptr' : cannot access private member declared in class 'std::unique_ptr<_Ty>'
with
[
_Ty=unsigned int []
]
D:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\include\memory(1600) : see declaration of 'std::unique_ptr<_Ty>::unique_ptr'
with
[
_Ty=unsigned int []
]
This diagnostic occurred in the compiler generated function 'llvm::PBQP::RegAlloc::NodeMetadata::NodeMetadata(const llvm::PBQP::RegAlloc::NodeMetadata &)'
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220645 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. This just brings things more in-line with coding
standards, and makes ValuePool's functionality clearer (it's not tied to pooling
costs, and we may want to use it to hold other things in the future).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220641 91177308-0d34-0410-b5e6-96231b3b80d8
To do this, change the representation of lazy loaded functions.
The previous representation cannot differentiate between a function whose body
has been removed and one whose body hasn't been read from the .bc file. That
means that in order to drop a function, the entire body had to be read.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220580 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220570 91177308-0d34-0410-b5e6-96231b3b80d8
In post-commit review of r219442, Rafael pointed out that the comment style
of the newly introduced helper didn't follow LLVM's coding standard.
Modernize the whole file to the new standards.
Differential Revision: http://reviews.llvm.org/D5918
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220467 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we're sure the only root (non-abstract) scope is the current
function scope, there's no need for isCurrentFunctionScope, the property
can be tested directly instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220451 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently when emitting a label, a new data fragment is created for it if the
current fragment isn't a data fragment.
This change instead enqueues the label and attaches it to the next fragment
(e.g. created for the next instruction) if possible.
When bundle alignment is not enabled, this has no functionality change (it
just results in fewer extra fragments being created). For bundle alignment,
previously labels would point to the beginning of the bundle padding instead
of the beginning of the emitted instruction. This was not only less efficient
(e.g. jumping to the nops instead of past them) but also led to miscalculation
of the address of the GOT (since MC uses a label difference rather than
emitting a "." symbol).
Fixes https://code.google.com/p/nativeclient/issues/detail?id=3982
Test Plan: regression test attached
Reviewers: jvoung, eliben
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D5915
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220439 91177308-0d34-0410-b5e6-96231b3b80d8
When a call to a double-precision libm function has fast-math semantics
(via function attribute for now because there is no IR-level FMF on calls),
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.
We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.
I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.
Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.
This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850
Differential Revision: http://reviews.llvm.org/D5893
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220390 91177308-0d34-0410-b5e6-96231b3b80d8
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220341 91177308-0d34-0410-b5e6-96231b3b80d8
This requires incorporating __GNUC_PATCHLEVEL__ into our prerequisite
check, and renaming our __GNUC_PREREQ to LLVM_GNUC_PREREQ, since it is
now functionally different.
Patch by Chilledheart!
Differential Revision: http://reviews.llvm.org/D5879
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220332 91177308-0d34-0410-b5e6-96231b3b80d8
This enables targets to adapt their pass pipeline to the register
allocator in use. For example, with the AArch64 backend, using PBQP
with the cortex-a57, the FPLoadBalancing pass is no longer necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220321 91177308-0d34-0410-b5e6-96231b3b80d8
It is just too easy to use a virtual register intead of a NodeId without a
compiler warning. This does not fix the fundamental problem, i.e. both
have the same underlying types, but increases the likelyhood to detect it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220303 91177308-0d34-0410-b5e6-96231b3b80d8
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220277 91177308-0d34-0410-b5e6-96231b3b80d8
Every target we support has support for assembly that looks like
a = b - c
.long a
What is special about MachO is that the above combination suppresses the
production of a relocation.
With this change we avoid producing the intermediary labels when they don't
add any value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@220256 91177308-0d34-0410-b5e6-96231b3b80d8