Commit Graph

836 Commits

Author SHA1 Message Date
James Molloy
94c25519a2 [ARM] Teach the cost model that cross-class copies are costly.
Cross-class copies being expensive is actually a trait of the microarchitecture, but as I haven't yet seen an example of a microarchitecture where they're cheap it seems best to just enable this by default, covering the non-mcpu build case.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217674 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-12 13:29:40 +00:00
Hal Finkel
5f36b46bf0 Make use @llvm.assume for loop guards in ScalarEvolution
This adds a basic (but important) use of @llvm.assume calls in ScalarEvolution.
When SE is attempting to validate a condition guarding a loop (such as whether
or not the loop count can be zero), this check should also include dominating
assumptions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@217348 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-07 21:37:59 +00:00
Hal Finkel
bf301d5670 Add a CFL Alias Analysis implementation
This provides an implementation of CFL alias analysis (including some
supporting data structures). Currently, we don't have any extremely fancy
features, sans some interprocedural analysis (i.e. no field sensitivity, etc.),
and we do best sitting behind BasicAA + TBAA. In such a configuration, we take
~0.6-0.8% of total compile time, and give ~7-8% NoAlias responses to queries
TBAA and BasicAA couldn't answer when bootstrapping LLVM. In testing this on
other projects, we've seen up to 10.5% of queries dropped by BasicAA+TBAA
answered with NoAlias by this algorithm.

Patch by George Burgess IV (with minor modifications by me -- mostly adapting
some BasicAA tests), thanks!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216970 91177308-0d34-0410-b5e6-96231b3b80d8
2014-09-02 21:43:13 +00:00
Hal Finkel
8ef7b17dfc Add @llvm.assume, lowering, and some basic properties
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:

 - llvm.invariant(true) is dead.
 - llvm.invariant(false) is unreachable (this directly corresponds to the
   documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).

The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213973 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 21:13:35 +00:00
Hal Finkel
9f0a2a8bd5 Convert noalias parameter attributes into noalias metadata during inlining
This functionality is currently turned off by default.

Part of the motivation for introducing scoped-noalias metadata is to enable the
preservation of noalias parameter attribute information after inlining.
Sometimes this can be inferred from the code in the caller after inlining, but
often we simply lose valuable information.

The overall process if fairly simple:
 1. Create a new unqiue scope domain.
 2. For each (used) noalias parameter, create a new alias scope.
 3. For each pointer, collect the underlying objects. Add a noalias scope for
    each noalias parameter from which we're not derived (and has not been
    captured prior to that point).
 4. Add an alias.scope for each noalias parameter from which we might be
    derived (or has been captured before that point).

Note that the capture checks apply only if one of the underlying objects is not
an identified function-local object.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213949 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 15:50:08 +00:00
Hal Finkel
6f5c609076 Simplify and improve scoped-noalias metadata semantics
In the process of fixing the noalias parameter -> metadata conversion process
that will take place during inlining (which will be committed soon, but not
turned on by default), I have come to realize that the semantics provided by
yesterday's commit are not really what we want. Here's why:

void foo(noalias a, noalias b, noalias c, bool x) {
  *q = x ? a : b;
  *c = *q;
}

Generically, we know that *c does not alias with *a and with *b (so there is an
'and' in what we know we're not), and we know that *q might be derived from *a
or from *b (so there is an 'or' in what we know that we are). So we do not want
the semantics currently, where any noalias scope matching any alias.scope
causes a NoAlias return. What we want to know is that the noalias scopes form a
superset of the alias.scope list (meaning that all the things we know we're not
is a superset of all of things the other instruction might be).

Making that change, however, introduces a composibility problem. If we inline
once, adding the noalias metadata, and then inline again adding more, and we
append new scopes onto the noalias and alias.scope lists each time. But, this
means that we could change what was a NoAlias result previously into a MayAlias
result because we appended an additional scope onto one of the alias.scope
lists. So, instead of giving scopes the ability to have parents (which I had
borrowed from the TBAA implementation, but seems increasingly unlikely to be
useful in practice), I've given them domains. The subset/superset condition now
applies within each domain independently, and we only need it to hold in one
domain. Each time we inline, we add the new scopes in a new scope domain, and
everything now composes nicely. In addition, this simplifies the
implementation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213948 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-25 15:50:02 +00:00
Hal Finkel
16fd27b2c3 Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213864 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 14:25:39 +00:00
Hal Finkel
2c7c54c86c AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.

No functionality change intended.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-24 12:16:19 +00:00
Hal Finkel
8f609696e0 Improve BasicAA CS-CS queries (redux)
This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it
causes PR20303." with a fix for the bug in pr20303. As it turned out, the
relevant code was both wrong and over-conservative (because, as with the code
it replaced, it would return the overall ModRef mask even if just Ref had been
implied by the argument aliasing results). Hopefully, this correctly fixes both
problems.

Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
cleaned up a little and added in DSE's test directory). The BasicAA test has
also been updated to check for this error.

Original commit message:

BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.

Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.

This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213219 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-17 01:28:25 +00:00
Nick Lewycky
5508cfd02c Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213024 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-15 00:53:38 +00:00
Hal Finkel
04fe990190 Improve BasicAA CS-CS queries
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.

Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.

This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212572 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-08 23:16:49 +00:00
Andrea Di Biagio
60e9a53c21 [CostModel][x86] Improved cost model for alternate shuffles.
This patch:
 1) Improves the cost model for x86 alternate shuffles (originally
added at revision 211339);
 2) Teaches the Cost Model Analysis pass how to analyze alternate shuffles.

Alternate shuffles are a special kind of blend; on x86, we can often
easily lowered alternate shuffled into single blend
instruction (depending on the subtarget features).

The existing cost model didn't take into account subtarget features.
Also, it had a couple of "dead" entries for vector types that are never
legal (example: on x86 types v2i32 and v2f32 are not legal; those are
always either promoted or widened to 128-bit vector types).

The new x86 cost model takes into account what target features we have
before returning the shuffle cost (i.e. the number of instructions
after the blend is lowered/expanded).

This patch also teaches the Cost Model Analysis how to identify and analyze
alternate shuffles (i.e. 'SK_Alternate' shufflevector instructions):
 - added function 'isAlternateVectorMask';
 - added some logic to check if an instruction is a alternate shuffle and, in
   case, call the target specific TTI to get the corresponding shuffle cost;
 - added a test to verify the cost model analysis on alternate shuffles.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@212296 91177308-0d34-0410-b5e6-96231b3b80d8
2014-07-03 22:24:18 +00:00
Alp Toker
8aeca44558 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210496 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-09 22:42:55 +00:00
Tobias Grosser
5e66eea5ba ScalarEvolution: Derive element size from the type of the loaded element
Before, we where looking at the size of the pointer type that specifies the
location from which to load the element. This did not make any sense at all.

This change fixes a bug in the delinearization where we failed to delinerize
certain load instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@210435 91177308-0d34-0410-b5e6-96231b3b80d8
2014-06-08 19:21:20 +00:00
Sebastian Pop
421b2c571c remove constant terms
The delinearization is needed only to remove the non linearity induced by
expressions involving multiplications of parameters and induction variables.
There is no problem in dealing with constant times parameters, or constant times
an induction variable.

For this reason, the current patch discards all constant terms and multipliers
before running the delinearization algorithm on the terms. The only thing
remaining in the term expressions are parameters and multiply expressions of
parameters: these simplified term expressions are passed to the array shape
recognizer that will not recognize constant dimensions anymore: these will be
recognized as different strides in parametric subscripts.

The only important special case of a constant dimension is the size of elements.
Instead of relying on the delinearization to infer the size of an element,
compute the element size from the base address type. This is a much more precise
way of computing the element size than before, as we would have mixed together
the size of an element with the strides of the innermost dimension.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209691 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-27 22:41:45 +00:00
Dinesh Dwivedi
f31b009b3e Adding testcase for PR18886.
Differential Revision: http://reviews.llvm.org/D3837



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209645 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-27 06:44:25 +00:00
Tim Northover
29f94c7201 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209577 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-24 12:50:23 +00:00
Andrew Trick
7ce3a725d3 Test case comments. Fix sloppiness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209551 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-23 20:46:21 +00:00
Andrew Trick
ab0d042a74 Fix and improve SCEV ComputeBackedgeTankCount.
This is a follow-up to r209358: PR19799: Indvars miscompile due to an
incorrect max backedge taken count from SCEV.

That fix was incomplete as pointed out by Arnold and Michael Z. The
code was also too confusing. It needed a careful rewrite with more
unit tests. This version will also happen to optimize more cases.

<rdar://17005101> PR19799: Indvars miscompile...

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209545 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-23 19:47:13 +00:00
Andrew Trick
facca6e3f3 Fix a bug in SCEV's backedge taken count computation from my prior fix in Jan.
This has to do with the trip count computation for loops with multiple
exits, which is quite subtle. Most passes just ask for a single trip
count number, so we must be conservative assuming any exit could be
taken.  Normally, we rely on the "exact" trip count, which was
correctly given as "unknown". However, SCEV also gives a "max"
back-edge taken count. The loops max BE taken count is conservatively
a maximum over the max of each exit's non-exiting iterations
count. Note that some exit tests can be skipped so the max loop
back-edge taken count can actually exceed the max non-exiting
iterations for some exits. However, when we know the loop *latch*
cannot be skipped, we can directly use its max taken count
disregarding other exits. I previously took the minimum here without
checking whether the other exit could be skipped. The correct, and
simpler thing to do here is just to directly use the loop latch's max
non-exiting iterations as the loops max back-edge count.

In the problematic test case, the first loop exit had a max of zero
non-exiting iterations, but could be skipped. The loop latch was known
not to be skipped but had max of one non-exiting iteration. We
incorrectly claimed the loop back-edge could be taken zero times, when
it is actually taken one time.

Fixes Loop %for.body.i: <multiple exits> Unpredictable backedge-taken count.
Loop %for.body.i: max backedge-taken count is 1.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209358 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-22 00:37:03 +00:00
Filipe Cabecinhas
31c8059e5d Added tests for the cost of lowering VSELECT instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@209045 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-16 22:47:58 +00:00
Alp Toker
727273b11c Fix typos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208839 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-15 01:52:21 +00:00
Adam Nemet
45fc47013f [Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg
Tested by comparing make check VERBOSE=1 before and after to make sure
no tests are missed.  (VERBOSE=1 prints the list of tests.)

Only one test :( remains where .cpp is required:

tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp | FileCheck %s --check-prefix=STDOUT

The topic was discussed in this thread:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208621 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-12 19:57:31 +00:00
Sebastian Pop
2f5f1c2ccb do not assert when delinearization fails
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208615 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-12 19:01:53 +00:00
Sebastian Pop
dd2dc5b166 add testcase for r208237: do not collect undef terms
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208347 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-08 18:38:58 +00:00
Sebastian Pop
5026b2cc8b split delinearization pass in 3 steps
To compute the dimensions of the array in a unique way, we split the
delinearization analysis in three steps:

- find parametric terms in all memory access functions
- compute the array dimensions from the set of terms
- compute the delinearized access functions for each dimension

The first step is executed on all the memory access functions such that we
gather all the patterns in which an array is accessed. The second step reduces
all this information in a unique description of the sizes of the array. The
third step is delinearizing each memory access function following the common
description of the shape of the array computed in step 2.

This rewrite of the delinearization pass also solves a problem we had with the
previous implementation: because the previous algorithm was by induction on the
structure of the SCEV, it would not correctly recognize the shape of the array
when the memory access was not following the nesting of the loops: for example,
see polly/test/ScopInfo/multidim_only_ivs_3d_reverse.ll

; void foo(long n, long m, long o, double A[n][m][o]) {
;
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i][k][j] = 1.0;

Starting with this patch we no longer delinearize access functions that do not
contain parameters, for example in test/Analysis/DependenceAnalysis/GCD.ll

;;  for (long int i = 0; i < 100; i++)
;;    for (long int j = 0; j < 100; j++) {
;;      A[2*i - 4*j] = i;
;;      *B++ = A[6*i + 8*j];

these accesses will not be delinearized as the upper bound of the loops are
constants, and their access functions do not contain SCEVUnknown parameters.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208232 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-07 18:01:20 +00:00
Benjamin Kramer
2c06cd8612 TTI: Estimate @llvm.fmuladd cost as fmul + fadd when FMA's aren't legal on the target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@208115 91177308-0d34-0410-b5e6-96231b3b80d8
2014-05-06 18:36:23 +00:00
Duncan P. N. Exon Smith
96837f7232 Reapply "blockfreq: Approximate irreducible control flow"
This reverts commit r207287, reapplying r207286.

I'm hoping that declaring an explicit struct and instantiating
`addBlockEdges()` directly works around the GCC crash from r207286.
This is a lot more boilerplate, though.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207438 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-28 20:02:29 +00:00
Benjamin Kramer
3551384ae2 X86TTI: Adjust sdiv cost now that we can lower it on plain SSE2.
Includes a fix for a horrible typo that caused all SDIV costs to be
slightly off :)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207371 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-27 18:47:54 +00:00
Benjamin Kramer
d9ced7112e X86TTI: i16/i32 vector div with a constant (splat) divisor are reasonably cheap now.
Turn vectorization back on.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207320 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-26 14:53:05 +00:00
Duncan P. N. Exon Smith
cee7abfb2c Revert "blockfreq: Approximate irreducible control flow"
This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207287 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith
d905bba691 blockfreq: Approximate irreducible control flow
Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207286 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 23:08:57 +00:00
Duncan P. N. Exon Smith
39087bfbf0 blockfreq: Only one mass distribution per node
Remove the concepts of "forward" and "general" mass distributions, which
was wrong.  The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207195 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith
ebda12ef6f blockfreq: Use better branch weights in multiexit test
The branch weights were even before.  Make them different.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207193 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 04:38:37 +00:00
Duncan P. N. Exon Smith
e838db3104 blockfreq: Clean up irreducible testcases
Strip irreducible testcases to pure control flow.  The function calls
made the branch weights more believable but cluttered it up a lot.
There isn't going to be any constant analysis here, so just use dumb
branch logic to clarify the important parts.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@207192 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-25 04:38:35 +00:00
Duncan P. N. Exon Smith
846a14340c blockfreq: Skip irreducible backedges inside functions
The branch that skips irreducible backedges was only active when
propagating mass at the top-level.  In particular, when propagating mass
through a loop recognized by `LoopInfo` with irreducible control flow
inside, irreducible backedges would not be skipped.

Not sure where that idea came from, but the result was that mass was
lost until after loop exit.  Added a testcase that covers this case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206860 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-22 03:31:53 +00:00
Duncan P. N. Exon Smith
9a11d668f9 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206707, reapplying r206704.  The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206766 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-21 17:57:07 +00:00
Duncan P. N. Exon Smith
f44eda4764 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206704, as expected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206707 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith
f465370a49 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.

I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths).  There's a small
chance that this will appease the failing bots [1][2].  (If so, great!)

If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing.  Once I've triggered those builds, I'll
revert both commits so the bots go green again.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

<rdar://problem/14292693>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206704 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith
2033057de8 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206666, as planned.

Still stumped on why the bots are failing.  Sanitizer bots haven't
turned anything up.  If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer.  (In the meantime, I'll be
auditing my patch for undefined behaviour.)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206677 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith
036e26bc29 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206628, reapplying r206622 (and r206626).

Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce
on Darwin, and Chandler can't reproduce on Linux.  Asan and valgrind
don't tell us anything, but we're hoping the msan bot will catch it.

So, I'm applying this again to get more feedback from the bots.  I'll
leave it in long enough to trigger builds in at least the sanitizer
buildbots (it was failing for reasons unrelated to my commit last time
it was in), and hopefully a few others.... and then I expect to revert a
third time.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206666 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 22:30:03 +00:00
Duncan P. N. Exon Smith
ebb5d29473 Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2)
This reverts commit r206622 and the MSVC fixup in r206626.

Apparently the remotely failing tests are still failing, despite my
attempt to fix the nondeterminism in r206621.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206628 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith
54850bedf2 Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commit r206556, effectively reapplying commit r206548 and
its fixups in r206549 and r206550.

In an intervening commit I've added target triples to the tests that
were failing remotely [1] (but passing locally).  I'm hoping the mystery
is solved?  I'll revert this again if the tests are still failing
remotely.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206622 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 17:22:25 +00:00
Chandler Carruth
4c7edb1240 [LCG] Add support for building persistent and connected SCCs to the
LazyCallGraph. This is the start of the whole point of this different
abstraction, but it is just the initial bits. Here is a run-down of
what's going on here. I'm planning to incorporate some (or all) of this
into comments going forward, hopefully with better editing and wording.
=]

The crux of the problem with the traditional way of building SCCs is
that they are ephemeral. The new pass manager however really needs the
ability to associate analysis passes and results of analysis passes with
SCCs in order to expose these analysis passes to the SCC passes. Making
this work is kind-of the whole point of the new pass manager. =]

So, when we're building SCCs for the call graph, we actually want to
build persistent nodes that stick around and can be reasoned about
later. We'd also like the ability to walk the SCC graph in more complex
ways than just the traditional postorder traversal of the current CGSCC
walk. That means that in addition to being persistent, the SCCs need to
be connected into a useful graph structure.

However, we still want the SCCs to be formed lazily where possible.

These constraints are quite hard to satisfy with the SCC iterator. Also,
using that would bypass our ability to actually add data to the nodes of
the call graph to facilite implementing the Tarjan walk. So I've
re-implemented things in a more direct and embedded way. This
immediately makes it easy to get the persistence and connectivity
correct, and it also allows leveraging the existing nodes to simplify
the algorithm. I've worked somewhat to make this implementation more
closely follow the traditional paper's nomenclature and strategy,
although it is still a bit obtuse because it isn't recursive, using
an explicit stack and a tail call instead, and it is interruptable,
resuming each time we need another SCC.

The other tricky bit here, and what actually took almost all the time
and trials and errors I spent building this, is exactly *what* graph
structure to build for the SCCs. The naive thing to build is the call
graph in its newly acyclic form. I wrote about 4 versions of this which
did precisely this. Inevitably, when I experimented with them across
various use cases, they became incredibly awkward. It was all
implementable, but it felt like a complete wrong fit. Square peg, round
hole. There were two overriding aspects that pushed me in a different
direction:

1) We want to discover the SCC graph in a postorder fashion. That means
   the root node will be the *last* node we find. Using the call-SCC DAG
   as the graph structure of the SCCs results in an orphaned graph until
   we discover a root.

2) We will eventually want to walk the SCC graph in parallel, exploring
   distinct sub-graphs independently, and synchronizing at merge points.
   This again is not helped by the call-SCC DAG structure.

The structure which, quite surprisingly, ended up being completely
natural to use is the *inverse* of the call-SCC DAG. We add the leaf
SCCs to the graph as "roots", and have edges to the caller SCCs. Once
I switched to building this structure, everything just fell into place
elegantly.

Aside from general cleanups (there are FIXMEs and too few comments
overall) that are still needed, the other missing piece of this is
support for iterating across levels of the SCC graph. These will become
useful for implementing #2, but they aren't an immediate priority.

Once SCCs are in good shape, I'll be working on adding mutation support
for incremental updates and adding the pass manager that this analysis
enables.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206581 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 10:50:32 +00:00
Duncan P. N. Exon Smith
c7a3b95c0f Revert "blockfreq: Rewrite BlockFrequencyInfoImpl"
This reverts commits r206548, r206549 and r206549.

There are some unit tests failing that aren't failing locally [1], so
reverting until I have time to investigate.

[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206556 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith
cc1e1707b8 blockfreq: Rewrite BlockFrequencyInfoImpl
Rewrite the shared implementation of BlockFrequencyInfo and
MachineBlockFrequencyInfo entirely.

The old implementation had a fundamental flaw:  precision losses from
nested loops (or very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating.  This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue).  The old analysis gives non-sensical results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    ---- Block Freqs ----
     entry = 1.0
     for.cond1.preheader = 1.00103
     for.cond4.preheader = 5.5222
     for.body6 = 18095.19995
     for.inc8 = 4.52264
     for.inc11 = 0.00109
     for.end13 = 0.0

The new analysis gives correct results:

    Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
    block-frequency-info: nested_loops
     - entry: float = 1.0, int = 8
     - for.cond1.preheader: float = 4001.0, int = 32007
     - for.cond4.preheader: float = 16008001.0, int = 128064007
     - for.body6: float = 64048012001.0, int = 512384096007
     - for.inc8: float = 16008001.0, int = 128064007
     - for.inc11: float = 4001.0, int = 32007
     - for.end13: float = 1.0, int = 8

Most importantly, the frequency leaving each loop matches the frequency
entering it.

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.  I have
unit tests for these types out of tree, but it was decided in the review
to make the classes private to BlockFrequencyInfoImpl, and try to shrink
them (or remove them entirely) in follow-up commits.

The new algorithm should generally have a complexity advantage over the
old.  The previous algorithm was quadratic in the worst case.  The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear without it.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution.  Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes.  Mass is then distributed through the function, which is
now a DAG.  Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  - Irreducible control flow isn't modelled correctly.  LoopInfo and
    MachineLoopInfo ignore irreducible edges, so this algorithm will
    fail to scale accordingly.  There's a note in the class
    documentation about how to get closer.  See also the comments in
    test/Analysis/BlockFrequencyInfo/irreducible.ll.

  - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
    the 64-bit integer precision used downstream.

  - The "bias" calculation proposed on llvmdev is *not* incorporated
    here.  This will be added in a follow-up commit, once comments from
    this review have been handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206548 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-18 01:57:45 +00:00
Akira Hatanaka
268c0509a9 Fix a bug in which BranchProbabilityInfo wasn't setting branch weights of basic blocks inside loops correctly.
Previously, BranchProbabilityInfo::calcLoopBranchHeuristics would determine the weights of basic blocks inside loops even when it didn't have enough information to estimate the branch probabilities correctly. This patch fixes the function to exit early if it doesn't see any exit edges or back edges and let the later heuristics determine the weights.

This fixes PR18705 and <rdar://problem/15991090>.

Differential Revision: http://reviews.llvm.org/D3363



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206194 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 16:56:19 +00:00
Hal Finkel
1aee811d71 Don't assert in BasicTTI::getMemoryOpCost for non-simple types
BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting
AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for
example, non-power-of-two vector types return non-simple EVTs (not MVT::Other).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206150 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-14 05:59:09 +00:00
Sebastian Pop
d541e6e6ea in findGCD of multiply expr return the gcd
we used to return 1 instead of the gcd

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205800 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-08 21:21:05 +00:00
Hal Finkel
e6a5b33e6e [PowerPC] Adjust load/store costs in PPCTTI
This provides more realistic costs for the insert/extractelement instructions
(which are load/store pairs), accounts for the cheap unaligned Altivec load
sequence, and for unaligned VSX load/stores.

Bad news:
MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown

Good news:
SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup

Unfortunately, estimating the costs of the stack-based scalarization sequences
is hard, and adjusting these costs is like a game of whac-a-mole :( I'll
revisit this again after we have better codegen for vector extloads and
truncstores and unaligned load/stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205658 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-04 23:51:18 +00:00
Hal Finkel
d68b03bcd2 Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads
When a vector type legalizes to a larger vector type, and the target does not
support the associated extending load (or truncating store), then legalization
will scalarize the load (or store) resulting in an associated scalarization
cost.  BasicTTI::getMemoryOpCost needs to account for this.

Between this, and r205487, PowerPC on the P7 with VSX enabled shows:

MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup
SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup
SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup

(some of these are new; some of these, such as PAQ8p, just reverse regressions
that VSX support would trigger)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205495 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-03 00:53:59 +00:00
Hal Finkel
9263e6f08d Fix multi-register costs in BasicTTI::getCastInstrCost
For an cast (extension, etc.), the currently logic predicts a low cost if the
associated operation (keyed on the destination type) is legal (or promoted).
This is not true when the number of values required to legalize the type is
changing. For example, <8 x i16> being sign extended by <8 x i32> is not
generically cheap on PPC with VSX, even though sign extension to v4i32 is
legal, because two output v4i32 values are required compared to the single
v8i16 input value, and without custom logic in the target, this conversion will
scalarize.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205487 91177308-0d34-0410-b5e6-96231b3b80d8
2014-04-02 23:18:54 +00:00
Tim Northover
7b837d8c75 ARM64: initial backend import
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.

Everything will be easier with the target in-tree though, hence this
commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-29 10:18:08 +00:00
Arnold Schwaighofer
23463c9261 PR15967 Fix in basicaa for faulty returning no alias.
This commit consist of two parts.
The first part fix the PR15967. The wrong conclusion was made when the MaxLookup
limit was reached. The fix introduce a out parameter (MaxLookupReached) to
DecomposeGEPExpression that the function aliasGEP can act upon.
The second part is introducing the constant MaxLookupSearchDepth to make sure
that DecomposeGEPExpression and GetUnderlyingObject use the same search depth.
This is a small cleanup to clarify the original algorithm.

Patch by Karl-Johan Karlsson!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204859 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-26 21:30:19 +00:00
Benjamin Kramer
c1c74fb2b4 ScalarEvolution: Compute exit counts for loops with a power-of-2 step.
If we have a loop of the form
for (unsigned n = 0; n != (k & -32); n += 32) {}
then we know that n is always divisible by 32 and the loop must
terminate. Even if we have a condition where the loop counter will
overflow it'll always hold this invariant.

PR19183. Our loop vectorizer creates this pattern and it's also
occasionally formed by loop counters derived from pointers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204728 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-25 16:25:12 +00:00
Rafael Espindola
38048cdb1c Reject alias to undefined symbols in the verifier.
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.

MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.

For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203705 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-12 20:15:49 +00:00
Raul E. Silvera
6df2b69098 When analyzing vectors of element type that require legalization,
the legalization cost must be included to get an accurate
estimation of the total cost of the scalarized vector.
The inaccurate cost triggered unprofitable SLP vectorization on
32-bit X86.

Summary:
Include legalization overhead when computing scalarization cost

Reviewers: hfinkel, nadav

CC: chandlerc, rnk, llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203509 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-10 22:59:13 +00:00
Matt Arsenault
38c18efe41 Teach lint about address spaces
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203132 91177308-0d34-0410-b5e6-96231b3b80d8
2014-03-06 17:33:55 +00:00
Sebastian Pop
4449ed2a70 add -da-delinearize runs and checks to MIV testcases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201869 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-21 18:15:18 +00:00
Nico Rieck
c15d3a82ae Add extra CHECK prefix to tests with explicit prefix
These tests mistakenly assume that CHECK is still available even if an
explicit prefix is specified.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201492 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-16 13:28:15 +00:00
Nico Rieck
da39cf486a Actually call FileCheck in tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201491 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-16 13:27:39 +00:00
Nico Rieck
268e96a8a6 Fix broken CHECK lines
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201479 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-16 07:31:05 +00:00
Andrea Di Biagio
029a76b0a2 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-12 23:43:47 +00:00
Craig Topper
c7709a43ee Test case I forgot to 'add' for r201126.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201207 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-12 03:58:47 +00:00
Benjamin Kramer
cb27441554 ScalarEvolution: Analyze trip count of loops with a switch guarding the exit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201159 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-11 15:44:32 +00:00
Tim Northover
0c245b69f7 X86: add costs for 64-bit vector ext/trunc & rebalance
The most important part of this is probably adding any cost at all for
operations like zext <8 x i8> to <8 x i32>. Before they were being
recorded as extremely costly (24, I believe) which made LLVM fall back
on a 4-wide vectorisation of a loop.

It also rebalances the values for sext, zext and trunc. Lacking any
other sane metric that might work across CPU microarchitectures I went
for instructions. This seems to be in reasonable accord with the rest
of the table (sitofp, ...) though no doubt at least one value is
sub-optimal for some bizarre reason.

Finally, separate AVX and AVX2 values are provided where appropriate.
The CodeGen is quite different in many cases.

rdar://problem/15981990

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-06 18:18:36 +00:00
Chandler Carruth
57732bff1e [PM] Add a new "lazy" call graph analysis pass for the new pass manager.
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.

However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:

- Be lazy about scanning function definitions. The existing pass eagerly
  scans the entire module to build the initial graph. This new pass is
  significantly more lazy, and I plan to push this even further to
  maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
  indirect call from functions whose address is taken. This node creates
  a huge choke-point which would preclude good parallelization across
  the fanout of the SCC graph when we got to the point of looking at
  such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
  rather than value handles and tracking call instructions. This will
  require explicit update calls instead of some updates working
  transparently, but should end up being significantly more efficient.
  The explicit update calls ended up being needed in many cases for the
  existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
  for an SCC which is stable across updates. This is essential for the
  new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
  an SCC friendly order. This is a much simpler graph structure and
  should be more memory dense. It does limit the ways in which it is
  appropriate to use this analysis. I wish I had a better name than
  "call graph". I've commented extensively this aspect.

This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.

Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.

A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200903 91177308-0d34-0410-b5e6-96231b3b80d8
2014-02-06 04:37:03 +00:00
Nick Lewycky
4bfa6fecc1 Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200210 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 10:47:44 +00:00
Nick Lewycky
f2282cac95 Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits.
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.

Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200203 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-27 10:04:03 +00:00
Alp Toker
ae43cab6ba Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-24 17:20:08 +00:00
Arnold Schwaighofer
8963071a14 BasicAA: We need to check both access sizes when comparing a gep and an
underlying object of unknown size.

Fixes PR18460.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199351 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-16 04:53:18 +00:00
Benjamin Kramer
ccdb9c9483 Fix broken CHECK lines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199016 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-11 21:06:00 +00:00
Stepan Dyatkovskiy
3539d6d40c Fixed old typo in ScalarEvolution, that caused wrong SCEVs zext operation.
Detailed description is here:
http://llvm.org/bugs/show_bug.cgi?id=18000#c16

For participation in bugfix process special thanks to David Wiberg.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198863 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-09 12:26:12 +00:00
Arnold Schwaighofer
3036182e99 BasicAA: Use reachabilty instead of dominance for checking value equality in phi
cycles

This allows the value equality check to work even if we don't have a dominator
tree. Also add some more comments.

I was worried about compile time impacts and did not implement reachability but
used the dominance check in the initial patch. The trade-off was that the
dominator tree was required.
The llvm utility function isPotentiallyReachable cuts off the recursive search
after 32 visits. Testing did not show any compile time regressions showing my
worries unjustfied.

No compile time or performance regressions at O3 -flto -mavx on test-suite +
externals.

Addresses review comments from r198290.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198400 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-03 05:47:03 +00:00
Arnold Schwaighofer
1bdb320dae BasicAA: Fix value equality and phi cycles
When there are cycles in the value graph we have to be careful interpreting
"Value*" identity as "value" equivalence. We interpret the value of a phi node
as the value of its operands.
When we check for value equivalence now we make sure that the "Value*" dominates
all cycles (phis).

%0 = phi [%noaliasval, %addr2]
%l = load %ptr
%addr1 = gep @a, 0, %l
%addr2 = gep @a, 0, (%l + 1)
store %ptr ...

Before this patch we would return NoAlias for (%0, %addr1) which is wrong
because the value of the load is from different iterations of the loop.

Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time
regressions.

PR18068
radar://15653794

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198290 91177308-0d34-0410-b5e6-96231b3b80d8
2014-01-02 03:31:36 +00:00
Matt Arsenault
74c996cbd1 Use correct size for address space in BasicAA.
The tests just hit this with a different sized
address space since I haven't figured out how
to use this to break it.

I thought I committed this a long time ago,
and I'm not sure why missing this hasn't caused
any problems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194903 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-16 00:36:43 +00:00
Sebastian Pop
430b6eb419 improve dependence analysis testcases
print the name of the function on which the dependence analysis is performed
such that changes to the testcase are easier to review.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194528 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-12 22:47:30 +00:00
Sebastian Pop
5230ad61fd delinearization of arrays
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194527 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-12 22:47:20 +00:00
Andrew Trick
10bb82e54f Rewrite SCEV's backedge taken count computation.
Patch by Michele Scandale!

Rewrite of the functions used to compute the backedge taken count of a
loop on LT and GT comparisons.

I decided to split the handling of LT and GT cases becasue the trick
"a > b == -a < -b" in some cases prevents the trip count computation
due to the multiplication by -1 on the two operands of the
comparison. This issue comes from the conservative computation of
value range of SCEVs: taking the negative SCEV of an expression that
have a small positive range (e.g. [0,31]), we would have a SCEV with a
fullset as value range.

Indeed, in the new rewritten function I tried to better handle the
maximum backedge taken count computation when MAX/MIN expression are
used to handle the cases where no entry guard is found.

Some test have been modified in order to check the new value correctly
(I manually check them and reasoning on possible overflow the new
values seem correct).

I finally added a new test case related to the multiplication by -1
issue on GT comparisons.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194116 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-06 02:08:26 +00:00
Hal Finkel
e14fb07357 Consider (x == -1) unlikely in BranchProbabilityInfo
This adds another heuristic to BPI, similar to the existing heuristic that
considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by
Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and
equality comparisons with -1 are also unlikely to succeed. Local
experimentation supports this hypothesis: This yields a 1-2% speedup in the
test-suite sqlite benchmark on the PPC A2 core, with no significant
regressions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193855 91177308-0d34-0410-b5e6-96231b3b80d8
2013-11-01 10:58:22 +00:00
Benjamin Kramer
19ea37059a SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive.
We can't do this for the general case as saying a GEP with a negative index
doesn't have unsigned wrap isn't valid for negative indices.
  %gep = getelementptr inbounds i32* %p, i64 -1

But an inbounds GEP cannot run past the end of address space. So we check for
the very common case of a positive index and make GEPs derived from that NUW.
Together with Andy's recent non-unit stride work this lets us analyze loops
like

  void foo3(int *a, int *b) {
    for (; a < b; a++) {}
  }

PR12375, PR12376.

Differential Revision: http://llvm-reviews.chandlerc.com/D2033

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193514 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-28 07:30:06 +00:00
Shuxin Yang
69bd41dfe3 Revert r193251 : Use address-taken to disambiguate global variable and indirect memops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193489 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-27 03:08:44 +00:00
Benjamin Kramer
bb41c75ab5 X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate.
Also update the cost model.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193270 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-23 21:06:07 +00:00
Shuxin Yang
8e3851a6eb Use address-taken to disambiguate global variable and indirect memops.
Major steps include:
 1). introduces a not-addr-taken bit-field in GlobalVariable
 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable 
    dosen't have its address taken.
 3). AA use this info for disambiguation. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193251 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-23 17:28:19 +00:00
Manman Ren
b62e1033a4 Simplify testing case (Thanks Rafael for the testing case).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193177 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-22 18:15:50 +00:00
Manman Ren
11d78777d5 TBAA: fix PR17620.
We can have a struct type with a single field and the field does not start
with 0. In that case, we should correctly update the offset.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193137 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-22 01:40:25 +00:00
Matt Arsenault
4784bb6f44 Fix creating bitcasts between address spaces in SCEV.
The test before wasn't successfully testing this
since it was missing the datalayout piece to change
the size of the second address space.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193102 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-21 18:41:10 +00:00
Andrew Trick
a5c5bc9948 SCEV should use NSW to get trip count for positive nonunit stride loops.
SCEV currently fails to compute loop counts for nonunit stride
loops. This comes up frequently. It prevents loop optimization and
forces vectorization to insert extra loop checks.

For example:
void foo(int n, int *x) {
 for (int i = 0; i < n; i += 3) {
   x[i] = i;
   x[i+1] = i+1;
   x[i+2] = i+2;
 }
}

We need to properly handle the case in which limit > INT_MAX-stride. In
the above case: n > INT_MAX-3. In this case the loop counter will step
beyond the limit and overflow at the same time. However, knowing that
signed integer overlow in undefined, we can assume the loop test
behavior is arbitrary after overflow. This obeys both C undefined
behavior rules, and the more strict LLVM poison value rules.

I'm finally fixing this in response to Hal Finkel's persistence.
The most probable reason that we never optimized this before is that
we were being careful to handle case where the developer expected a
side-effect free infinite loop relying on overflow:

for (int i = 0; i < n; i += s) {
  ++j;
}
return j;

If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might
expect an infinite loop. However there are plenty of ways to achieve
this effect without relying on undefined behavior of signed overflow.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193015 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-18 23:43:53 +00:00
Chandler Carruth
dd5d86d992 Remove the very substantial, largely unmaintained legacy PGO
infrastructure.

This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.

Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more approriate for
runtimes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191835 91177308-0d34-0410-b5e6-96231b3b80d8
2013-10-02 15:42:23 +00:00
Matt Arsenault
81877ad396 Use CHECK-LABEL
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191713 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-30 23:31:55 +00:00
Manman Ren
9e81c3bdb2 TBAA: handle scalar TBAA format and struct-path aware TBAA format.
Remove the command line argument "struct-path-tbaa" since we should not depend
on command line argument to decide which format the IR file is using. Instead,
we check the first operand of the tbaa tag node, if it is a MDNode, we treat
it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA
format.

When clang starts to use struct-path aware TBAA format no matter whether
struct-path-tbaa is no, and we can auto-upgrade existing bc files, the support
for scalar TBAA format can be dropped.

Existing testing cases are updated to use the struct-path aware TBAA format.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191538 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-27 18:34:27 +00:00
Yi Jiang
cdfb43f0a6 X86 horizontal vector reduction cost model
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191021 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-19 17:48:48 +00:00
Arnold Schwaighofer
65457b679a Costmodel: Add support for horizontal vector reductions
Upcoming SLP vectorization improvements will want to be able to estimate costs
of horizontal reductions. Add infrastructure to support this.

We model reductions as a series of (shufflevector,add) tuples ultimately
followed by an extractelement. For example, for an add-reduction of <4 x float>
we could generate the following sequence:

 (v0, v1, v2, v3)
   \   \  /  /
     \  \  /
       +  +

 (v0+v2, v1+v3, undef, undef)
    \      /
 ((v0+v2) + (v1+v3), undef, undef)

 %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                           <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
 %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
 %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                          <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
 %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
 %r = extractelement <4 x float> %bin.rdx8, i32 0

This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)"
that will allow clients to ask for the cost of such a reduction (as backends
might generate more efficient code than the cost of the individual instructions
summed up). This interface is excercised by the CostModel analysis pass which
looks for reduction patterns like the one above - starting at extractelements -
and if it sees a matching sequence will call the cost model interface.

We will also support a second form of pairwise reduction that is well supported
on common architectures (haddps, vpadd, faddp).

 (v0, v1, v2, v3)
  \   /    \  /
 (v0+v1, v2+v3, undef, undef)
    \     /
 ((v0+v1)+(v2+v3), undef, undef, undef)

  %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef>
  %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
        <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
  %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
  %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
        <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
  %r = extractelement <4 x float> %bin.rdx.1, i32 0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190876 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-17 18:06:50 +00:00
Matt Arsenault
14807bd8c8 Teach ScalarEvolution about pointer address spaces
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190425 91177308-0d34-0410-b5e6-96231b3b80d8
2013-09-10 19:55:24 +00:00
Matt Arsenault
e3e5f77d77 Fix lint assert on integer vector division
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189290 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-26 23:29:33 +00:00
Bill Wendling
3c3ee1f8ac FileCheck-ize tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188971 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-22 00:51:19 +00:00
Daniel Dunbar
24ec2e5a72 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-16 00:37:11 +00:00
Bill Wendling
ac838d1c14 FileCheckize some of the testcases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187756 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-05 23:43:18 +00:00
Renato Golin
38ffffeebc Fixes ARM LNT bot from SLP change in O3
This patch fixes the multiple breakages on ARM test-suite after the SLP
vectorizer was introduced by default on O3. The problem was an illegal
vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple.

The guard protects this code from breaking (cause of the problems) but
doesn't fix the issue that is generating the odd vector in the first
place, which also needs to be investigated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187658 91177308-0d34-0410-b5e6-96231b3b80d8
2013-08-02 17:10:04 +00:00
Stephen Lin
0dfc166487 Add newlines at end of test files, no functionality change
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186263 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-13 22:00:58 +00:00
Hal Finkel
04b84c2f92 Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo
This fixes an oversight that Intrinsic::nearbyint was not being mapped to
ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to
nearbyint calls for some targets).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185783 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-08 03:24:07 +00:00
Nick Lewycky
dc89737bcd Extend 'readonly' and 'readnone' to work on function arguments as well as
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185735 91177308-0d34-0410-b5e6-96231b3b80d8
2013-07-06 00:29:58 +00:00
Jakob Stoklund Olesen
97be1d608e Minimize precision loss when computing cyclic probabilities.
Allow block frequencies to exceed 32 bits by using the new
BlockFrequency division function.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185236 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-28 22:40:43 +00:00
Preston Briggs
26ba495309 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185187 91177308-0d34-0410-b5e6-96231b3b80d8 2013-06-28 18:44:48 +00:00
Nadav Rotem
16d36a5cd1 CostModel: improve the cost model for load/store of non power-of-two types such as <3 x float>, which are popular in graphics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185085 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-27 17:52:04 +00:00
Jakob Stoklund Olesen
b1c0cc22dd Print block frequencies in decimal form.
This is easier to read than the internal fixed-point representation.

If anybody knows the correct algorithm for converting fixed-point
numbers to base 10, feel free to fix it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184881 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-25 21:57:38 +00:00
Arnold Schwaighofer
34eb2406b4 X86 cost model: Vectorizing integer division is a bad idea
radar://14057959

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184872 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-25 19:14:09 +00:00
Benjamin Kramer
75b5162154 BlockFrequency: Bump up the entry frequency a bit.
This is a band-aid to fix the most severe regressions we're seeing from basing
spill decisions on block frequencies, until we have a better solution.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184835 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-25 13:34:40 +00:00
Benjamin Kramer
b47aceaf06 Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability."
This reverts commit r184584. Breaks PPC selfhost.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184590 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-21 20:20:27 +00:00
Benjamin Kramer
93702a3b07 BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability.
Zero is used by BlockFrequencyInfo as a special "don't know" value. It also
causes a sink for frequencies as you can't ever get off a zero frequency with
more multiplies.

This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency
was propagated into an inner loop causing excessive spilling.

PR16402.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184584 91177308-0d34-0410-b5e6-96231b3b80d8
2013-06-21 19:30:05 +00:00
Andrew Trick
9c8e1f93b4 Unit test for SCEV fix r182989, PR16130.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183017 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-31 16:42:41 +00:00
Michael Kuperstein
9f5de6dadc Make BasicAliasAnalysis recognize the fact a noalias argument cannot alias another argument, even if the other argument is not itself marked noalias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182755 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-28 08:17:48 +00:00
Diego Novillo
77226a03dc Add a new function attribute 'cold' to functions.
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.

Added analysis and code generation tests.  Added documentation for the
new attribute.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182638 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-24 12:26:52 +00:00
Tim Northover
4c8850cd1c AArch64: use MCJIT by default and enable related tests.
This just enables some testing I'd missed after implementing MCJIT
support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181215 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-06 16:51:08 +00:00
Matt Arsenault
8b9dc21d6f Fix unchecked uses of DominatorTree in MemoryDependenceAnalysis.
Use unknown results for places where it would be needed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181176 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-06 02:07:24 +00:00
Tobias Grosser
333403abbd RegionInfo: Do not crash if unreachable block is found
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181025 91177308-0d34-0410-b5e6-96231b3b80d8
2013-05-03 15:48:34 +00:00
Manman Ren
e78d832097 TBAA: remove !tbaa from testing cases if not used.
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180743 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-29 22:42:01 +00:00
Manman Ren
a5b314c27a Struct-path aware TBAA: change the format of TBAAStructType node.
We switch the order of offset and field type to make TBAAStructType node
(name, parent node, offset) similar to scalar TBAA node (name, parent node).
TypeIsImmutable is added to TBAAStructTag node.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180654 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-27 00:26:11 +00:00
Arnold Schwaighofer
45c9e0b412 ARM cost model: Integer div and rem is lowered to a function call
Reflect this in the cost model. I observed this in MiBench/consumer-lame.

radar://13354716

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180576 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-25 21:16:18 +00:00
Jim Grosbach
0cb1019e9c Legalize vector truncates by parts rather than just splitting.
Rather than just splitting the input type and hoping for the best, apply
a bit more cleverness. Just splitting the types until the source is
legal often leads to an illegal result time, which is then widened and a
scalarization step is introduced which leads to truly horrible code
generation. With the loop vectorizer, these sorts of operations are much
more common, and so it's worth extra effort to do them well.

Add a legalization hook for the operands of a TRUNCATE node, which will
be encountered after the result type has been legalized, but if the
operand type is still illegal. If simple splitting of both types
ends up with the result type of each half still being legal, just
do that (v16i16 -> v16i8 on ARM, for example). If, however, that would
result in an illegal result type (v8i32 -> v8i8 on ARM, for example),
we can get more clever with power-two vectors. Specifically,
split the input type, but also widen the result element size, then
concatenate the halves and truncate again.  For example on ARM,
To perform a "%res = v8i8 trunc v8i32 %in" we transform to:
  %inlo = v4i32 extract_subvector %in, 0
  %inhi = v4i32 extract_subvector %in, 4
  %lo16 = v4i16 trunc v4i32 %inlo
  %hi16 = v4i16 trunc v4i32 %inhi
  %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
  %res = v8i8 trunc v8i16 %in16

This allows instruction selection to generate three VMOVN instructions
instead of a sequences of moves, stores and loads.

Update the ARMTargetTransformInfo to take this improved legalization
into account.

Consider the simplified IR:

define <16 x i8> @test1(<16 x i32>* %ap) {
  %a = load <16 x i32>* %ap
  %tmp = trunc <16 x i32> %a to <16 x i8>
  ret <16 x i8> %tmp
}

define <8 x i8> @test2(<8 x i32>* %ap) {
  %a = load <8 x i32>* %ap
  %tmp = trunc <8 x i32> %a to <8 x i8>
  ret <8 x i8> %tmp
}

Previously, we would generate the truly hideous:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #20
	bic	sp, sp, #7
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d24, d25}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	vld1.64	{d18, d19}, [r2:128]
	add	r1, r0, #16
	vmovn.i32	d22, q8
	vld1.64	{d16, d17}, [r1:128]
	vmovn.i32	d20, q9
	vmovn.i32	d18, q12
	vmov.u16	r0, d22[3]
	strb	r0, [sp, #15]
	vmov.u16	r0, d22[2]
	strb	r0, [sp, #14]
	vmov.u16	r0, d22[1]
	strb	r0, [sp, #13]
	vmov.u16	r0, d22[0]
	vmovn.i32	d16, q8
	strb	r0, [sp, #12]
	vmov.u16	r0, d20[3]
	strb	r0, [sp, #11]
	vmov.u16	r0, d20[2]
	strb	r0, [sp, #10]
	vmov.u16	r0, d20[1]
	strb	r0, [sp, #9]
	vmov.u16	r0, d20[0]
	strb	r0, [sp, #8]
	vmov.u16	r0, d18[3]
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	vldmia	sp, {d16, d17}
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	mov	sp, r7
	pop	{r7}
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	push	{r7}
	mov	r7, sp
	sub	sp, sp, #12
	bic	sp, sp, #7
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d20, d21}, [r0:128]
	vmovn.i32	d18, q8
	vmov.u16	r0, d18[3]
	vmovn.i32	d16, q10
	strb	r0, [sp, #3]
	vmov.u16	r0, d18[2]
	strb	r0, [sp, #2]
	vmov.u16	r0, d18[1]
	strb	r0, [sp, #1]
	vmov.u16	r0, d18[0]
	strb	r0, [sp]
	vmov.u16	r0, d16[3]
	strb	r0, [sp, #7]
	vmov.u16	r0, d16[2]
	strb	r0, [sp, #6]
	vmov.u16	r0, d16[1]
	strb	r0, [sp, #5]
	vmov.u16	r0, d16[0]
	strb	r0, [sp, #4]
	ldm	sp, {r0, r1}
	mov	sp, r7
	pop	{r7}
	bx	lr

Now, however, we generate the much more straightforward:
	.syntax unified
	.section	__TEXT,__text,regular,pure_instructions
	.globl	_test1
	.align	2
_test1:                                 @ @test1
@ BB#0:
	add	r1, r0, #48
	add	r2, r0, #32
	vld1.64	{d20, d21}, [r0:128]
	vld1.64	{d16, d17}, [r1:128]
	add	r1, r0, #16
	vld1.64	{d18, d19}, [r2:128]
	vld1.64	{d22, d23}, [r1:128]
	vmovn.i32	d17, q8
	vmovn.i32	d16, q9
	vmovn.i32	d18, q10
	vmovn.i32	d19, q11
	vmovn.i16	d17, q8
	vmovn.i16	d16, q9
	vmov	r0, r1, d16
	vmov	r2, r3, d17
	bx	lr

	.globl	_test2
	.align	2
_test2:                                 @ @test2
@ BB#0:
	vld1.64	{d16, d17}, [r0:128]
	add	r0, r0, #16
	vld1.64	{d18, d19}, [r0:128]
	vmovn.i32	d16, q8
	vmovn.i32	d17, q9
	vmovn.i16	d16, q8
	vmov	r0, r1, d16
	bx	lr

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-21 23:47:41 +00:00
Arnold Schwaighofer
9c63f0d687 X86 cost model: Exit before calling getSimpleVT on non-simple VTs
getSimpleVT can only handle simple value types.

radar://13676022

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-17 20:04:53 +00:00
Nadav Rotem
9eb366acba CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179413 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-12 21:15:03 +00:00
Manman Ren
4df1854f26 Aliasing rules for struct-path aware TBAA.
Added PathAliases to check if two struct-path tags can alias.
Added command line option -struct-path-tbaa.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179337 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-11 23:24:18 +00:00
Arnold Schwaighofer
813456527e X86 cost model: Model cost for uitofp and sitofp on SSE2
The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-08 18:05:48 +00:00
Arnold Schwaighofer
cd3d60c450 TargetLowering: Fix getTypeConversion handling of extended vector types
The code in getTypeConversion attempts to promote the element vector type
before it trys to split or widen the vector.
After it failed finding a legal vector type by promoting it would continue using
the promoted vector element type. Thereby missing legal splitted vector types.
For example the type v32i32 that has a legal split of 4 x v3i32 on x86/sse2
would be transformed to: v32i256 and from there on successively split to:
v16i256, v8i256, v1i256 and then finally ends up as an i64 type.
By resetting the vector element type to the original vector element type that
existed before the promotion the code will attempt to split the vector type to
smaller vector widths of the same type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-07 20:22:56 +00:00
Arnold Schwaighofer
2537f3c659 X86 cost model: Differentiate cost for vector shifts of constants
SSE2 has efficient support for shifts by a scalar. My previous change of making
shifts expensive did not take this into account marking all shifts as expensive.
This would prevent vectorization from happening where it is actually beneficial.

With this change we differentiate between shifts of constants and other shifts.

radar://13576547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-04 23:26:24 +00:00
Arnold Schwaighofer
6b6050b229 X86 cost model: Vector shifts are expensive in most cases
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.

For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.

radar://13130673
radar://13537826

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-03 21:46:05 +00:00
Benjamin Kramer
13497b3aa7 X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8
2013-04-01 10:23:49 +00:00
Andrew Trick
e74c2e86cb Fix SCEV forgetMemoizedResults should search and destroy backedge exprs.
Fixes PR15570: SEGV: SCEV back-edge info invalid after dead code removal.

Indvars creates a SCEV expression for the loop's back edge taken
count, then determines that the comparison is always true and
removes it.

When loop-unroll asks for the expression, it contains a NULL
SCEVUnknkown (as a CallbackVH).

forgetMemoizedResults should invalidate the loop back edges expression.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177986 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-26 03:14:53 +00:00
Jyotsna Verma
1f7fe80447 Disable profiling tests for Hexagon since it doesn't support JIT.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177917 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-25 21:15:11 +00:00
Manman Ren
a2e3834d16 Support in AAEvaluator to print alias queries of loads/stores with TBAA tags.
Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries
between pointers do not include TBAA tags.

Add testing case for "placement new". TBAA currently says NoAlias.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177772 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-22 22:34:41 +00:00
Michael Liao
f74e9bf650 Correct cost model for vector shift on AVX2
- After moving logic recognizing vector shift with scalar amount from
  DAG combining into DAG lowering, we declare to customize all vector
  shifts even vector shift on AVX is legal. As a result, the cost model
  needs special tuning to identify these legal cases.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-20 22:01:10 +00:00
Nadav Rotem
b05130e1b2 Optimize sext <4 x i8> and <4 x i16> to <4 x i64>.
Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com>



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 18:38:27 +00:00
Renato Golin
5ad5f5931e Improve long vector sext/zext lowering on ARM
The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.

This partially addresses PR14867.

Patch by Pete Couperus

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177380 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-19 08:15:38 +00:00
Arnold Schwaighofer
bf37bf9e21 ARM cost model: Make some vector integer to float casts cheaper
The default logic marks them as too expensive.

For example, before this patch we estimated:
  cost of 16 for instruction:   %r = uitofp <4 x i16> %v0 to <4 x float>

While this translates to:
  vmovl.u16 q8, d16
  vcvt.f32.u32  q8, q8

All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.

radar://13445992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 22:47:09 +00:00
Arnold Schwaighofer
01f2571014 ARM cost model: Correct cost for some cheap float to integer conversions
Fix cost of some "cheap" cast instructions. Before this patch we used to
estimate for example:
  cost of 16 for instruction:   %r = fptoui <4 x float> %v0 to <4 x i16>

While we would emit:
  vcvt.s32.f32  q8, q8
  vmovn.i32 d16, q8
  vuzp.8  d16, d17

All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.

radar://13434072

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-18 22:47:06 +00:00
Arnold Schwaighofer
5193e4ebe2 ARM cost model: Fix costs for some vector selects
I was too pessimistic in r177105. Vector selects that fit into a legal register
type lower just fine. I was mislead by the code fragment that I was using. The
stores/loads that I saw in those cases came from lowering the conditional off
an address.

Changing the code fragment to:

%T0_3 = type <8 x i18>
%T1_3 = type <8 x i1>

define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2,
                         %T1_3* %blend, %T0_3* %storeaddr) {
  %v0 = load %T0_3* %loadaddr
  %v1 = load %T0_3* %loadaddr2
==> FROM:
  ;%c = load %T1_3* %blend
==> TO:
  %c = icmp slt %T0_3 %v0, %v1
==> USE:
  %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1

  store %T0_3 %r, %T0_3* %storeaddr
  ret void
}

revealed this mistake.

radar://13403975

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177170 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15 18:31:01 +00:00
Arnold Schwaighofer
c0d8dc0eb6 ARM cost model: Fix cost of fptrunc and fpext instructions
A vector fptrunc and fpext simply gets split into scalar instructions.

radar://13192358

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177159 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-15 15:10:47 +00:00
Arnold Schwaighofer
d81511f0a6 ARM cost model: Increase cost of some vector selects we do terrible on
By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177105 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-14 19:17:02 +00:00
Arnold Schwaighofer
b6f4872d29 ARM cost model: Increase the cost for vector casts that use the stack
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
currently lowers those using stack accesses.

This was responsible for a significant degradation on
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
where we vectorize one loop to a vector factor of 16. After this patch we select
a vector factor of 4 which will generate reasonable code.

unsigned char cle[32];

void test(short c) {
  unsigned short compte;
  for (compte = 0; compte <= 31; compte++) {
    cle[compte] = cle[compte] ^ c;
  }
}

radar://13220512

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176898 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 21:19:22 +00:00
Jan Wen Voung
4323665bd8 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176873 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-12 16:27:52 +00:00
Jan Wen Voung
fa785cb22d Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
a orig_dir/Stats/. so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176733 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-08 22:56:31 +00:00
Shuxin Yang
985dac6579 Memory Dependence Analysis (not mem-dep test) take advantage of "invariant.load" metadata.
The "invariant.load" metadata indicates the memory unit being accessed is immutable.
A load annotated with this metadata can be moved across any store.

As I am not sure if it is legal to move such loads across barrier/fence, this
change dose not allow such transformation.

rdar://11311484

Thank Arnold for code review.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176562 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-06 17:48:48 +00:00
Arnold Schwaighofer
5f0d9dbdf4 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8
2013-03-02 04:02:52 +00:00
Benjamin Kramer
8611d4449a Cost model support for lowered math builtins.
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.

Differential Revision: http://llvm-reviews.chandlerc.com/D466

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-28 19:09:33 +00:00
Bill Wendling
351b7a10e2 Use references to attribute groups on the call/invoke instructions.
Listing all of the attributes for the callee of a call/invoke instruction is way
too much and makes the IR unreadable. Use references to attributes instead.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175877 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-22 09:09:42 +00:00
Elena Demikhovsky
52981c4b60 I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
 
 The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
 The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added a cost of this operations to the AVX costs table.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 12:42:54 +00:00
Bill Wendling
7ab6c76ad1 Modify the LLVM assembly output so that it uses references to represent function attributes.
This makes the LLVM assembly look better. E.g.:

     define void @foo() #0 { ret void }
     attributes #0 = { nounwind noinline ssp }


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175605 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-20 07:21:42 +00:00
Tim Northover
f13a7e2b21 AArch64: adjust tests which rely on a default JIT
Profiling tests *do* need a JIT. They'll pass if a cross-compiler targetting
AArch64 by default has been built, but fail if a native AArch64 compiler has
been build. Therefore XFAIL is inappropriate and we mark them unsupported.

ExecutionEngine tests are JIT by definition, they should also be unsupported.

Transforms/LICM only uses the interpreter to check the output is still sane
after optimisation. It can be switched to use an interpreter.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175433 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-18 11:08:37 +00:00
Arnold Schwaighofer
6851623c54 ARM cost model: Add vector reverse shuffle costs
A reverse shuffle is lowered to a vrev and possibly a vext instruction (quad
word).

radar://13171406

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@174933 91177308-0d34-0410-b5e6-96231b3b80d8
2013-02-12 02:40:39 +00:00