llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-08-25 00:29:20 +00:00

Author	SHA1	Message	Date
Hal Finkel	1aee811d71	Don't assert in BasicTTI::getMemoryOpCost for non-simple types BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@206150 91177308-0d34-0410-b5e6-96231b3b80d8	2014-04-14 05:59:09 +00:00
Sebastian Pop	d541e6e6ea	in findGCD of multiply expr return the gcd we used to return 1 instead of the gcd git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205800 91177308-0d34-0410-b5e6-96231b3b80d8	2014-04-08 21:21:05 +00:00
Hal Finkel	e6a5b33e6e	[PowerPC] Adjust load/store costs in PPCTTI This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores. Bad news: MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation) SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized) MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown Good news: SingleSource/Benchmarks/Shootout/ary3 - 54% speedup SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205658 91177308-0d34-0410-b5e6-96231b3b80d8	2014-04-04 23:51:18 +00:00
Hal Finkel	d68b03bcd2	Account for scalarization costs in BasicTTI::getMemoryOpCost for extending vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205495 91177308-0d34-0410-b5e6-96231b3b80d8	2014-04-03 00:53:59 +00:00
Hal Finkel	9263e6f08d	Fix multi-register costs in BasicTTI::getCastInstrCost For a cast (extension, etc.), the current logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended to <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205487 91177308-0d34-0410-b5e6-96231b3b80d8	2014-04-02 23:18:54 +00:00
Tim Northover	7b837d8c75	ARM64: initial backend import This adds a second implementation of the AArch64 architecture to LLVM, accessible in parallel via the "arm64" triple. The plan over the coming weeks & months is to merge the two into a single backend, during which time thorough code review should naturally occur. Everything will be easier with the target in-tree though, hence this commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@205090 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-29 10:18:08 +00:00
Arnold Schwaighofer	23463c9261	PR15967 Fix in basicaa for faulty returning no alias. This commit consists of two parts. The first part fixes PR15967. The wrong conclusion was made when the MaxLookup limit was reached. The fix introduces an out parameter (MaxLookupReached) to DecomposeGEPExpression that the function aliasGEP can act upon. The second part introduces the constant MaxLookupSearchDepth to make sure that DecomposeGEPExpression and GetUnderlyingObject use the same search depth. This is a small cleanup to clarify the original algorithm. Patch by Karl-Johan Karlsson! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204859 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-26 21:30:19 +00:00
Benjamin Kramer	c1c74fb2b4	ScalarEvolution: Compute exit counts for loops with a power-of-2 step. If we have a loop of the form for (unsigned n = 0; n != (k & -32); n += 32) {} then we know that n is always divisible by 32 and the loop must terminate. Even if we have a condition where the loop counter will overflow it'll always hold this invariant. PR19183. Our loop vectorizer creates this pattern and it's also occasionally formed by loop counters derived from pointers. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@204728 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-25 16:25:12 +00:00
Rafael Espindola	38048cdb1c	Reject alias to undefined symbols in the verifier. On ELF and COFF an alias is just another name for a position in the file. There is no way to refer to a position in another file, so an alias to undefined is meaningless. MachO currently doesn't support aliases. The spec has a N_INDR, which when implemented will have a different set of restrictions. Adding support for it shouldn't be harder than any other IR extension. For now, having the IR represent what is actually possible with current tools makes it easier to fix the design of GlobalAlias. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203705 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-12 20:15:49 +00:00
Raul E. Silvera	6df2b69098	When analyzing vectors of element type that require legalization, the legalization cost must be included to get an accurate estimation of the total cost of the scalarized vector. The inaccurate cost triggered unprofitable SLP vectorization on 32-bit X86. Summary: Include legalization overhead when computing scalarization cost Reviewers: hfinkel, nadav CC: chandlerc, rnk, llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2992 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203509 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-10 22:59:13 +00:00
Matt Arsenault	38c18efe41	Teach lint about address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@203132 91177308-0d34-0410-b5e6-96231b3b80d8	2014-03-06 17:33:55 +00:00
Sebastian Pop	4449ed2a70	add -da-delinearize runs and checks to MIV testcases git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201869 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-21 18:15:18 +00:00
Nico Rieck	c15d3a82ae	Add extra CHECK prefix to tests with explicit prefix These tests mistakenly assume that CHECK is still available even if an explicit prefix is specified. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201492 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-16 13:28:15 +00:00
Nico Rieck	da39cf486a	Actually call FileCheck in tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201491 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-16 13:27:39 +00:00
Nico Rieck	268e96a8a6	Fix broken CHECK lines git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201479 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-16 07:31:05 +00:00
Andrea Di Biagio	029a76b0a2	[Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called 'OK_NonUniformConstValue' to identify operands which are constants but not constant splats. The cost model now allows returning 'OK_NonUniformConstValue' for non splat operands that are instances of ConstantVector or ConstantDataVector. With this change, targets are now able to compute different costs for instructions with non-uniform constant operands. For example, On X86 the cost of a vector shift may vary depending on whether the second operand is a uniform or non-uniform constant. This patch applies the following changes: - The cost model computation now takes into account non-uniform constants; - The cost of vector shift instructions has been improved in X86TargetTransformInfo analysis pass; - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish between non-uniform and uniform constant operands. Added a new test to verify that the output of opt '-cost-model -analyze' is valid in the following configurations: SSE2, SSE4.1, AVX, AVX2. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201272 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-12 23:43:47 +00:00
Craig Topper	c7709a43ee	Test case I forgot to 'add' for r201126. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201207 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-12 03:58:47 +00:00
Benjamin Kramer	cb27441554	ScalarEvolution: Analyze trip count of loops with a switch guarding the exit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201159 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-11 15:44:32 +00:00
Tim Northover	0c245b69f7	X86: add costs for 64-bit vector ext/trunc & rebalance The most important part of this is probably adding any cost at all for operations like zext <8 x i8> to <8 x i32>. Before they were being recorded as extremely costly (24, I believe) which made LLVM fall back on a 4-wide vectorisation of a loop. It also rebalances the values for sext, zext and trunc. Lacking any other sane metric that might work across CPU microarchitectures I went for instructions. This seems to be in reasonable accord with the rest of the table (sitofp, ...) though no doubt at least one value is sub-optimal for some bizarre reason. Finally, separate AVX and AVX2 values are provided where appropriate. The CodeGen is quite different in many cases. rdar://problem/15981990 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200928 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-06 18:18:36 +00:00
Chandler Carruth	57732bff1e	[PM] Add a new "lazy" call graph analysis pass for the new pass manager. The primary motivation for this pass is to separate the call graph analysis used by the new pass manager's CGSCC pass management from the existing call graph analysis pass. That analysis pass is (somewhat unfortunately) over-constrained by the existing CallGraphSCCPassManager requirements. Those requirements make it really hard to cleanly layer the needed functionality for the new pass manager on top of the existing analysis. However, there are also a bunch of things that the pass manager would specifically benefit from doing differently from the existing call graph analysis, and this new implementation tries to address several of them: - Be lazy about scanning function definitions. The existing pass eagerly scans the entire module to build the initial graph. This new pass is significantly more lazy, and I plan to push this even further to maximize locality during CGSCC walks. - Don't use a single synthetic node to partition functions with an indirect call from functions whose address is taken. This node creates a huge choke-point which would preclude good parallelization across the fanout of the SCC graph when we got to the point of looking at such changes to LLVM. - Use a memory dense and lightweight representation of the call graph rather than value handles and tracking call instructions. This will require explicit update calls instead of some updates working transparently, but should end up being significantly more efficient. The explicit update calls ended up being needed in many cases for the existing call graph so we don't really lose anything. - Doesn't explicitly model SCCs and thus doesn't provide an "identity" for an SCC which is stable across updates. This is essential for the new pass manager to work correctly. - Only form the graph necessary for traversing all of the functions in an SCC friendly order. 
This is a much simpler graph structure and should be more memory dense. It does limit the ways in which it is appropriate to use this analysis. I wish I had a better name than "call graph". I've commented extensively on this aspect. This is still very much a WIP, in fact it is really just the initial bits. But it is about the fourth version of the initial bits that I've implemented with each of the others running into really frustrating problems. This looks like it will actually work and I'd like to split the actual complexity across commits for the sake of my reviewers. =] The rest of the implementation along with lots of wiring will follow somewhat more rapidly now that there is a good path forward. Naturally, this doesn't impact any of the existing optimizer. This code is specific to the new pass manager. A bunch of thanks are deserved for the various folks that have helped with the design of this, especially Nick Lewycky who actually sat with me to go through the fundamentals of the final version here. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200903 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-06 04:37:03 +00:00
Nick Lewycky	4bfa6fecc1	Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200210 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-27 10:47:44 +00:00
Nick Lewycky	f2282cac95	Teach SCEV to handle more cases of 'and X, CST', specifically where CST is any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits. Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary. Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200203 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-27 10:04:03 +00:00
Alp Toker	ae43cab6ba	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 17:20:08 +00:00
Arnold Schwaighofer	8963071a14	BasicAA: We need to check both access sizes when comparing a gep and an underlying object of unknown size. Fixes PR18460. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199351 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-16 04:53:18 +00:00
Benjamin Kramer	ccdb9c9483	Fix broken CHECK lines. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199016 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-11 21:06:00 +00:00
Stepan Dyatkovskiy	3539d6d40c	Fixed old typo in ScalarEvolution, that caused wrong SCEVs zext operation. Detailed description is here: http://llvm.org/bugs/show_bug.cgi?id=18000#c16 For participation in bugfix process special thanks to David Wiberg. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198863 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-09 12:26:12 +00:00
Arnold Schwaighofer	3036182e99	BasicAA: Use reachability instead of dominance for checking value equality in phi cycles This allows the value equality check to work even if we don't have a dominator tree. Also add some more comments. I was worried about compile time impacts and did not implement reachability but used the dominance check in the initial patch. The trade-off was that the dominator tree was required. The llvm utility function isPotentiallyReachable cuts off the recursive search after 32 visits. Testing did not show any compile time regressions, showing my worries unjustified. No compile time or performance regressions at O3 -flto -mavx on test-suite + externals. Addresses review comments from r198290. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198400 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-03 05:47:03 +00:00
Arnold Schwaighofer	1bdb320dae	BasicAA: Fix value equality and phi cycles When there are cycles in the value graph we have to be careful interpreting "Value" identity as "value" equivalence. We interpret the value of a phi node as the value of its operands. When we check for value equivalence now we make sure that the "Value" dominates all cycles (phis). %0 = phi [%noaliasval, %addr2] %l = load %ptr %addr1 = gep @a, 0, %l %addr2 = gep @a, 0, (%l + 1) store %ptr ... Before this patch we would return NoAlias for (%0, %addr1) which is wrong because the value of the load is from different iterations of the loop. Tested on x86_64 -mavx at O3 and O3 -flto with no performance or compile time regressions. PR18068 radar://15653794 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198290 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-02 03:31:36 +00:00
Matt Arsenault	74c996cbd1	Use correct size for address space in BasicAA. The tests just hit this with a different sized address space since I haven't figured out how to use this to break it. I thought I committed this a long time ago, and I'm not sure why missing this hasn't caused any problems. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194903 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-16 00:36:43 +00:00
Sebastian Pop	430b6eb419	improve dependence analysis testcases print the name of the function on which the dependence analysis is performed such that changes to the testcase are easier to review. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194528 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-12 22:47:30 +00:00
Sebastian Pop	5230ad61fd	delinearization of arrays git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194527 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-12 22:47:20 +00:00
Andrew Trick	10bb82e54f	Rewrite SCEV's backedge taken count computation. Patch by Michele Scandale! Rewrite of the functions used to compute the backedge taken count of a loop on LT and GT comparisons. I decided to split the handling of LT and GT cases because the trick "a > b == -a < -b" in some cases prevents the trip count computation due to the multiplication by -1 on the two operands of the comparison. This issue comes from the conservative computation of value range of SCEVs: taking the negative SCEV of an expression that has a small positive range (e.g. [0,31]), we would have a SCEV with a fullset as value range. Indeed, in the new rewritten function I tried to better handle the maximum backedge taken count computation when MAX/MIN expressions are used to handle the cases where no entry guard is found. Some tests have been modified in order to check the new value correctly (I manually checked them, and reasoning about possible overflow, the new values seem correct). I finally added a new test case related to the multiplication by -1 issue on GT comparisons. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194116 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-06 02:08:26 +00:00
Hal Finkel	e14fb07357	Consider (x == -1) unlikely in BranchProbabilityInfo This adds another heuristic to BPI, similar to the existing heuristic that considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper by Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid index, and equality comparisons with -1 are also unlikely to succeed. Local experimentation supports this hypothesis: This yields a 1-2% speedup in the test-suite sqlite benchmark on the PPC A2 core, with no significant regressions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193855 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-01 10:58:22 +00:00
Benjamin Kramer	19ea37059a	SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive. We can't do this for the general case as saying a GEP with a negative index doesn't have unsigned wrap isn't valid for negative indices. %gep = getelementptr inbounds i32* %p, i64 -1 But an inbounds GEP cannot run past the end of address space. So we check for the very common case of a positive index and make GEPs derived from that NUW. Together with Andy's recent non-unit stride work this lets us analyze loops like void foo3(int a, int b) { for (; a < b; a++) {} } PR12375, PR12376. Differential Revision: http://llvm-reviews.chandlerc.com/D2033 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193514 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-28 07:30:06 +00:00
Shuxin Yang	69bd41dfe3	Revert r193251 : Use address-taken to disambiguate global variable and indirect memops. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193489 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-27 03:08:44 +00:00
Benjamin Kramer	bb41c75ab5	X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. Also update the cost model. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193270 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-23 21:06:07 +00:00
Shuxin Yang	8e3851a6eb	Use address-taken to disambiguate global variable and indirect memops. Major steps include: 1). introduces a not-addr-taken bit-field in GlobalVariable 2). GlobalOpt pass sets "not-address-taken" if it proves a global variable doesn't have its address taken. 3). AA uses this info for disambiguation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193251 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-23 17:28:19 +00:00
Manman Ren	b62e1033a4	Simplify testing case (Thanks Rafael for the testing case). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193177 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-22 18:15:50 +00:00
Manman Ren	11d78777d5	TBAA: fix PR17620. We can have a struct type with a single field and the field does not start with 0. In that case, we should correctly update the offset. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193137 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-22 01:40:25 +00:00
Matt Arsenault	4784bb6f44	Fix creating bitcasts between address spaces in SCEV. The test before wasn't successfully testing this since it was missing the datalayout piece to change the size of the second address space. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193102 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-21 18:41:10 +00:00
Andrew Trick	a5c5bc9948	SCEV should use NSW to get trip count for positive nonunit stride loops. SCEV currently fails to compute loop counts for nonunit stride loops. This comes up frequently. It prevents loop optimization and forces vectorization to insert extra loop checks. For example: void foo(int n, int *x) { for (int i = 0; i < n; i += 3) { x[i] = i; x[i+1] = i+1; x[i+2] = i+2; } } We need to properly handle the case in which limit > INT_MAX-stride. In the above case: n > INT_MAX-3. In this case the loop counter will step beyond the limit and overflow at the same time. However, knowing that signed integer overflow is undefined, we can assume the loop test behavior is arbitrary after overflow. This obeys both C undefined behavior rules, and the more strict LLVM poison value rules. I'm finally fixing this in response to Hal Finkel's persistence. The most probable reason that we never optimized this before is that we were being careful to handle the case where the developer expected a side-effect free infinite loop relying on overflow: for (int i = 0; i < n; i += s) { ++j; } return j; If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might expect an infinite loop. However there are plenty of ways to achieve this effect without relying on undefined behavior of signed overflow. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193015 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-18 23:43:53 +00:00
Chandler Carruth	dd5d86d992	Remove the very substantial, largely unmaintained legacy PGO infrastructure. This was essentially work toward PGO based on a design that had several flaws, partially dating from a time when LLVM had a different architecture, and with an effort to modernize it abandoned without being completed. Since then, it has bitrotted for several years further. The result is nearly unusable, and isn't helping any of the modern PGO efforts. Instead, it is getting in the way, adding confusion about PGO in LLVM and distracting everyone with maintenance on essentially dead code. Removing it paves the way for modern efforts around PGO. Among other effects, this removes the last of the runtime libraries from LLVM. Those are being developed in the separate 'compiler-rt' project now, with somewhat different licensing specifically more appropriate for runtimes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191835 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-02 15:42:23 +00:00
Matt Arsenault	81877ad396	Use CHECK-LABEL git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191713 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-30 23:31:55 +00:00
Manman Ren	9e81c3bdb2	TBAA: handle scalar TBAA format and struct-path aware TBAA format. Remove the command line argument "struct-path-tbaa" since we should not depend on command line argument to decide which format the IR file is using. Instead, we check the first operand of the tbaa tag node, if it is a MDNode, we treat it as struct-path aware TBAA format, otherwise, we treat it as scalar TBAA format. When clang starts to use struct-path aware TBAA format no matter whether struct-path-tbaa is on, and we can auto-upgrade existing bc files, the support for scalar TBAA format can be dropped. Existing testing cases are updated to use the struct-path aware TBAA format. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191538 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-27 18:34:27 +00:00
Yi Jiang	cdfb43f0a6	X86 horizontal vector reduction cost model git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191021 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-19 17:48:48 +00:00
Arnold Schwaighofer	65457b679a	Costmodel: Add support for horizontal vector reductions Upcoming SLP vectorization improvements will want to be able to estimate costs of horizontal reductions. Add infrastructure to support this. We model reductions as a series of (shufflevector,add) tuples ultimately followed by an extractelement. For example, for an add-reduction of <4 x float> we could generate the following sequence: (v0, v1, v2, v3) \ \ / / \ \ / + + (v0+v2, v1+v3, undef, undef) \ / ((v0+v2) + (v1+v3), undef, undef) %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7 %r = extractelement <4 x float> %bin.rdx8, i32 0 This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)" that will allow clients to ask for the cost of such a reduction (as backends might generate more efficient code than the cost of the individual instructions summed up). This interface is exercised by the CostModel analysis pass which looks for reduction patterns like the one above - starting at extractelements - and if it sees a matching sequence will call the cost model interface. We will also support a second form of pairwise reduction that is well supported on common architectures (haddps, vpadd, faddp).
(v0, v1, v2, v3) \ / \ / (v0+v1, v2+v3, undef, undef) \ / ((v0+v1)+(v2+v3), undef, undef, undef) %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef> %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef> %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1 %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1 %r = extractelement <4 x float> %bin.rdx.1, i32 0 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190876 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-17 18:06:50 +00:00
Matt Arsenault	14807bd8c8	Teach ScalarEvolution about pointer address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190425 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-10 19:55:24 +00:00
Matt Arsenault	e3e5f77d77	Fix lint assert on integer vector division git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189290 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-26 23:29:33 +00:00
Bill Wendling	3c3ee1f8ac	FileCheck-ize tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188971 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-22 00:51:19 +00:00
Daniel Dunbar	24ec2e5a72	[tests] Cleanup initialization of test suffixes. - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188513 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-16 00:37:11 +00:00
Bill Wendling	ac838d1c14	FileCheckize some of the testcases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187756 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-05 23:43:18 +00:00
Renato Golin	38ffffeebc	Fixes ARM LNT bot from SLP change in O3 This patch fixes the multiple breakages on ARM test-suite after the SLP vectorizer was introduced by default on O3. The problem was an illegal vector type on ARMTTI::getCmpSelInstrCost() <3 x i1> which is not simple. The guard protects this code from breaking (cause of the problems) but doesn't fix the issue that is generating the odd vector in the first place, which also needs to be investigated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187658 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-02 17:10:04 +00:00
Stephen Lin	0dfc166487	Add newlines at end of test files, no functionality change git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186263 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-13 22:00:58 +00:00
Hal Finkel	04b84c2f92	Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo This fixes an oversight that Intrinsic::nearbyint was not being mapped to ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to nearbyint calls for some targets). git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185783 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-08 03:24:07 +00:00
Nick Lewycky	dc89737bcd	Extend 'readonly' and 'readnone' to work on function arguments as well as functions. Make the function attributes pass add it to known library functions and when it can deduce it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185735 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-06 00:29:58 +00:00
Jakob Stoklund Olesen	97be1d608e	Minimize precision loss when computing cyclic probabilities. Allow block frequencies to exceed 32 bits by using the new BlockFrequency division function. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185236 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-28 22:40:43 +00:00
Preston Briggs	26ba495309	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185187 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-28 18:44:48 +00:00
Nadav Rotem	16d36a5cd1	CostModel: improve the cost model for load/store of non power-of-two types such as <3 x float>, which are popular in graphics. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185085 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-27 17:52:04 +00:00
Jakob Stoklund Olesen	b1c0cc22dd	Print block frequencies in decimal form. This is easier to read than the internal fixed-point representation. If anybody knows the correct algorithm for converting fixed-point numbers to base 10, feel free to fix it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184881 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-25 21:57:38 +00:00
Arnold Schwaighofer	34eb2406b4	X86 cost model: Vectorizing integer division is a bad idea radar://14057959 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184872 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-25 19:14:09 +00:00
Benjamin Kramer	75b5162154	BlockFrequency: Bump up the entry frequency a bit. This is a band-aid to fix the most severe regressions we're seeing from basing spill decisions on block frequencies, until we have a better solution. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184835 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-25 13:34:40 +00:00
Benjamin Kramer	b47aceaf06	Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability." This reverts commit r184584. Breaks PPC selfhost. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184590 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-21 20:20:27 +00:00
Benjamin Kramer	93702a3b07	BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability. Zero is used by BlockFrequencyInfo as a special "don't know" value. It also causes a sink for frequencies as you can't ever get off a zero frequency with more multiplies. This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero frequency was propagated into an inner loop causing excessive spilling. PR16402. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184584 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-21 19:30:05 +00:00
Andrew Trick	9c8e1f93b4	Unit test for SCEV fix r182989, PR16130. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@183017 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-31 16:42:41 +00:00
Michael Kuperstein	9f5de6dadc	Make BasicAliasAnalysis recognize the fact a noalias argument cannot alias another argument, even if the other argument is not itself marked noalias. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182755 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-28 08:17:48 +00:00
Diego Novillo	77226a03dc	Add a new function attribute 'cold' to functions. Other than recognizing the attribute, the patch does little else. It changes the branch probability analyzer so that edges into blocks postdominated by a cold function are given low weight. Added analysis and code generation tests. Added documentation for the new attribute. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@182638 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-24 12:26:52 +00:00
Tim Northover	4c8850cd1c	AArch64: use MCJIT by default and enable related tests. This just enables some testing I'd missed after implementing MCJIT support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181215 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-06 16:51:08 +00:00
Matt Arsenault	8b9dc21d6f	Fix unchecked uses of DominatorTree in MemoryDependenceAnalysis. Use unknown results for places where it would be needed git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181176 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-06 02:07:24 +00:00
Tobias Grosser	333403abbd	RegionInfo: Do not crash if unreachable block is found git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@181025 91177308-0d34-0410-b5e6-96231b3b80d8	2013-05-03 15:48:34 +00:00
Manman Ren	e78d832097	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180743 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-29 22:42:01 +00:00
Manman Ren	a5b314c27a	Struct-path aware TBAA: change the format of TBAAStructType node. We switch the order of offset and field type to make TBAAStructType node (name, parent node, offset) similar to scalar TBAA node (name, parent node). TypeIsImmutable is added to TBAAStructTag node. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180654 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-27 00:26:11 +00:00
Arnold Schwaighofer	45c9e0b412	ARM cost model: Integer div and rem is lowered to a function call Reflect this in the cost model. I observed this in MiBench/consumer-lame. radar://13354716 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@180576 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-25 21:16:18 +00:00
Jim Grosbach	0cb1019e9c	Legalize vector truncates by parts rather than just splitting. Rather than just splitting the input type and hoping for the best, apply a bit more cleverness. Just splitting the types until the source is legal often leads to an illegal result type, which is then widened and a scalarization step is introduced which leads to truly horrible code generation. With the loop vectorizer, these sorts of operations are much more common, and so it's worth extra effort to do them well. Add a legalization hook for the operands of a TRUNCATE node, which will be encountered after the result type has been legalized, but if the operand type is still illegal. If simple splitting of both types ends up with the result type of each half still being legal, just do that (v16i16 -> v16i8 on ARM, for example). If, however, that would result in an illegal result type (v8i32 -> v8i8 on ARM, for example), we can get more clever with power-two vectors. Specifically, split the input type, but also widen the result element size, then concatenate the halves and truncate again. For example on ARM, To perform a "%res = v8i8 trunc v8i32 %in" we transform to: %inlo = v4i32 extract_subvector %in, 0 %inhi = v4i32 extract_subvector %in, 4 %lo16 = v4i16 trunc v4i32 %inlo %hi16 = v4i16 trunc v4i32 %inhi %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16 %res = v8i8 trunc v8i16 %in16 This allows instruction selection to generate three VMOVN instructions instead of a sequence of moves, stores and loads. Update the ARMTargetTransformInfo to take this improved legalization into account. 
Consider the simplified IR: define <16 x i8> @test1(<16 x i32>* %ap) { %a = load <16 x i32>* %ap %tmp = trunc <16 x i32> %a to <16 x i8> ret <16 x i8> %tmp } define <8 x i8> @test2(<8 x i32>* %ap) { %a = load <8 x i32>* %ap %tmp = trunc <8 x i32> %a to <8 x i8> ret <8 x i8> %tmp } Previously, we would generate the truly hideous: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: push {r7} mov r7, sp sub sp, sp, #20 bic sp, sp, #7 add r1, r0, #48 add r2, r0, #32 vld1.64 {d24, d25}, [r0:128] vld1.64 {d16, d17}, [r1:128] vld1.64 {d18, d19}, [r2:128] add r1, r0, #16 vmovn.i32 d22, q8 vld1.64 {d16, d17}, [r1:128] vmovn.i32 d20, q9 vmovn.i32 d18, q12 vmov.u16 r0, d22[3] strb r0, [sp, #15] vmov.u16 r0, d22[2] strb r0, [sp, #14] vmov.u16 r0, d22[1] strb r0, [sp, #13] vmov.u16 r0, d22[0] vmovn.i32 d16, q8 strb r0, [sp, #12] vmov.u16 r0, d20[3] strb r0, [sp, #11] vmov.u16 r0, d20[2] strb r0, [sp, #10] vmov.u16 r0, d20[1] strb r0, [sp, #9] vmov.u16 r0, d20[0] strb r0, [sp, #8] vmov.u16 r0, d18[3] strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] vldmia sp, {d16, d17} vmov r0, r1, d16 vmov r2, r3, d17 mov sp, r7 pop {r7} bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: push {r7} mov r7, sp sub sp, sp, #12 bic sp, sp, #7 vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d20, d21}, [r0:128] vmovn.i32 d18, q8 vmov.u16 r0, d18[3] vmovn.i32 d16, q10 strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] ldm sp, {r0, r1} mov sp, r7 pop {r7} bx lr Now, however, we 
generate the much more straightforward: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: add r1, r0, #48 add r2, r0, #32 vld1.64 {d20, d21}, [r0:128] vld1.64 {d16, d17}, [r1:128] add r1, r0, #16 vld1.64 {d18, d19}, [r2:128] vld1.64 {d22, d23}, [r1:128] vmovn.i32 d17, q8 vmovn.i32 d16, q9 vmovn.i32 d18, q10 vmovn.i32 d19, q11 vmovn.i16 d17, q8 vmovn.i16 d16, q9 vmov r0, r1, d16 vmov r2, r3, d17 bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d18, d19}, [r0:128] vmovn.i32 d16, q8 vmovn.i32 d17, q9 vmovn.i16 d16, q8 vmov r0, r1, d16 bx lr git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179989 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-21 23:47:41 +00:00
Arnold Schwaighofer	9c63f0d687	X86 cost model: Exit before calling getSimpleVT on non-simple VTs getSimpleVT can only handle simple value types. radar://13676022 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179714 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-17 20:04:53 +00:00
Nadav Rotem	9eb366acba	CostModel: increase the default cost of supported floating point operations from one to two. Fixed a few tests that change because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179413 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-12 21:15:03 +00:00
Manman Ren	4df1854f26	Aliasing rules for struct-path aware TBAA. Added PathAliases to check if two struct-path tags can alias. Added command line option -struct-path-tbaa. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179337 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-11 23:24:18 +00:00
Arnold Schwaighofer	813456527e	X86 cost model: Model cost for uitofp and sitofp on SSE2 The costs are overfitted so that I can still use the legalization factor. For example the following kernel has about half the throughput vectorized than unvectorized when compiled with SSE2. Before this patch we would vectorize it. unsigned short A[1024]; double B[1024]; void f() { int i; for (i = 0; i < 1024; ++i) { B[i] = (double) A[i]; } } radar://13599001 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@179033 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-08 18:05:48 +00:00
Arnold Schwaighofer	cd3d60c450	TargetLowering: Fix getTypeConversion handling of extended vector types The code in getTypeConversion attempts to promote the element vector type before it tries to split or widen the vector. After it failed to find a legal vector type by promoting, it would continue using the promoted vector element type, thereby missing legal split vector types. For example the type v32i32 that has a legal split of 4 x v3i32 on x86/sse2 would be transformed to: v32i256 and from there on successively split to: v16i256, v8i256, v1i256 and then finally ends up as an i64 type. By resetting the vector element type to the original vector element type that existed before the promotion the code will attempt to split the vector type to smaller vector widths of the same type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178999 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-07 20:22:56 +00:00
Arnold Schwaighofer	2537f3c659	X86 cost model: Differentiate cost for vector shifts of constants SSE2 has efficient support for shifts by a scalar. My previous change of making shifts expensive did not take this into account marking all shifts as expensive. This would prevent vectorization from happening where it is actually beneficial. With this change we differentiate between shifts of constants and other shifts. radar://13576547 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178808 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-04 23:26:24 +00:00
Arnold Schwaighofer	6b6050b229	X86 cost model: Vector shifts are expensive in most cases The default logic does not correctly identify costs of casts because they are marked as custom on x86. For some cases, where the shift amount is a scalar we would be able to generate better code. Unfortunately, when this is the case the value (the splat) will get hoisted out of the loop, thereby making it invisible to ISel. radar://13130673 radar://13537826 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178703 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-03 21:46:05 +00:00
Benjamin Kramer	13497b3aa7	X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@178459 91177308-0d34-0410-b5e6-96231b3b80d8	2013-04-01 10:23:49 +00:00
Andrew Trick	e74c2e86cb	Fix SCEV forgetMemoizedResults should search and destroy backedge exprs. Fixes PR15570: SEGV: SCEV back-edge info invalid after dead code removal. Indvars creates a SCEV expression for the loop's back edge taken count, then determines that the comparison is always true and removes it. When loop-unroll asks for the expression, it contains a NULL SCEVUnknown (as a CallbackVH). forgetMemoizedResults should invalidate the loop back edges expression. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177986 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-26 03:14:53 +00:00
Jyotsna Verma	1f7fe80447	Disable profiling tests for Hexagon since it doesn't support JIT. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177917 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-25 21:15:11 +00:00
Manman Ren	a2e3834d16	Support in AAEvaluator to print alias queries of loads/stores with TBAA tags. Add "evaluate-tbaa" to print alias queries of loads/stores. Alias queries between pointers do not include TBAA tags. Add testing case for "placement new". TBAA currently says NoAlias. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177772 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-22 22:34:41 +00:00
Michael Liao	f74e9bf650	Correct cost model for vector shift on AVX2 - After moving logic recognizing vector shift with scalar amount from DAG combining into DAG lowering, we declare to customize all vector shifts even vector shift on AVX is legal. As a result, the cost model needs special tuning to identify these legal cases. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177586 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-20 22:01:10 +00:00
Nadav Rotem	b05130e1b2	Optimize sext <4 x i8> and <4 x i16> to <4 x i64>. Patch by Ahmad, Muhammad T <muhammad.t.ahmad@intel.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177421 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-19 18:38:27 +00:00
Renato Golin	5ad5f5931e	Improve long vector sext/zext lowering on ARM The ARM backend currently has poor codegen for long sext/zext operations, such as v8i8 -> v8i32. This patch addresses this by performing a custom expansion in ARMISelLowering. It also adds/changes the cost of such lowering in ARMTTI. This partially addresses PR14867. Patch by Pete Couperus git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177380 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-19 08:15:38 +00:00
Arnold Schwaighofer	bf37bf9e21	ARM cost model: Make some vector integer to float casts cheaper The default logic marks them as too expensive. For example, before this patch we estimated: cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float> While this translates to: vmovl.u16 q8, d16 vcvt.f32.u32 q8, q8 All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the instruction sequences emitted get longer. radar://13445992 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177334 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-18 22:47:09 +00:00
Arnold Schwaighofer	01f2571014	ARM cost model: Correct cost for some cheap float to integer conversions Fix cost of some "cheap" cast instructions. Before this patch we used to estimate for example: cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16> While we would emit: vcvt.s32.f32 q8, q8 vmovn.i32 d16, q8 vuzp.8 d16, d17 All other costs are left to the values assigned by the fallback logic. These costs are mostly reasonable in the sense that they get progressively more expensive as the instruction sequences emitted get longer. radar://13434072 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177333 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-18 22:47:06 +00:00
Arnold Schwaighofer	5193e4ebe2	ARM cost model: Fix costs for some vector selects I was too pessimistic in r177105. Vector selects that fit into a legal register type lower just fine. I was misled by the code fragment that I was using. The stores/loads that I saw in those cases came from lowering the conditional off an address. Changing the code fragment to: %T0_3 = type <8 x i18> %T1_3 = type <8 x i1> define void @func_blend3(%T0_3* %loadaddr, %T0_3* %loadaddr2, %T1_3* %blend, %T0_3* %storeaddr) { %v0 = load %T0_3* %loadaddr %v1 = load %T0_3* %loadaddr2 ==> FROM: ;%c = load %T1_3* %blend ==> TO: %c = icmp slt %T0_3 %v0, %v1 ==> USE: %r = select %T1_3 %c, %T0_3 %v0, %T0_3 %v1 store %T0_3 %r, %T0_3* %storeaddr ret void } revealed this mistake. radar://13403975 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177170 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-15 18:31:01 +00:00
Arnold Schwaighofer	c0d8dc0eb6	ARM cost model: Fix cost of fptrunc and fpext instructions A vector fptrunc and fpext simply gets split into scalar instructions. radar://13192358 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177159 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-15 15:10:47 +00:00
Arnold Schwaighofer	d81511f0a6	ARM cost model: Increase cost of some vector selects we do terrible on By terrible I mean we store/load from the stack. This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update) where we decide to vectorize a loop with a VF of 8 resulting in a 25% degradation on a cortex-a8. LV: Found an estimated cost of 2 for VF 8 For instruction: icmp slt i32 LV: Found an estimated cost of 2 for VF 8 For instruction: select i1, i32, i32 The bug that tracks the CodeGen part is PR14868. radar://13403975 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@177105 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-14 19:17:02 +00:00
Arnold Schwaighofer	b6f4872d29	ARM cost model: Increase the cost for vector casts that use the stack Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend currently lowers those using stack accesses. This was responsible for a significant degradation on MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 where we vectorize one loop to a vector factor of 16. After this patch we select a vector factor of 4 which will generate reasonable code. unsigned char cle[32]; void test(short c) { unsigned short compte; for (compte = 0; compte <= 31; compte++) { cle[compte] = cle[compte] ^ c; } } radar://13220512 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176898 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-12 21:19:22 +00:00
Jan Wen Voung	4323665bd8	Revert the test moves from 176733. Use "REQUIRES: asserts" instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176873 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-12 16:27:52 +00:00
Jan Wen Voung	fa785cb22d	Disable statistics on Release builds and move tests that depend on -stats. Summary: Statistics are still available in Release+Asserts (any +Asserts builds), and stats can also be turned on with LLVM_ENABLE_STATS. Move some of the FastISel stats that were moved under DEBUG() back out of DEBUG(), since stats are disabled across the board now. Many tests depend on grepping "-stats" output. Move those into a orig_dir/Stats/. so that they can be marked as unsupported when building without statistics. Differential Revision: http://llvm-reviews.chandlerc.com/D486 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176733 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-08 22:56:31 +00:00
Shuxin Yang	985dac6579	Memory Dependence Analysis (not mem-dep test) take advantage of "invariant.load" metadata. The "invariant.load" metadata indicates the memory unit being accessed is immutable. A load annotated with this metadata can be moved across any store. As I am not sure if it is legal to move such loads across barrier/fence, this change does not allow such transformation. rdar://11311484 Thanks to Arnold for code review. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176562 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-06 17:48:48 +00:00
Arnold Schwaighofer	5f0d9dbdf4	X86 cost model: Adjust cost for custom lowered vector multiplies This matters for example in following matrix multiply: int **mmult(int rows, int cols, int **m1, int **m2, int **m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176403 91177308-0d34-0410-b5e6-96231b3b80d8	2013-03-02 04:02:52 +00:00
Benjamin Kramer	8611d4449a	Cost model support for lowered math builtins. We make the cost for calling libm functions extremely high as emitting the calls is expensive and causes spills (on x86) so performance suffers. We still vectorize important calls like ceilf and friends on SSE4.1. and fabs. Differential Revision: http://llvm-reviews.chandlerc.com/D466 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@176287 91177308-0d34-0410-b5e6-96231b3b80d8	2013-02-28 19:09:33 +00:00
Bill Wendling	351b7a10e2	Use references to attribute groups on the call/invoke instructions. Listing all of the attributes for the callee of a call/invoke instruction is way too much and makes the IR unreadable. Use references to attributes instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175877 91177308-0d34-0410-b5e6-96231b3b80d8	2013-02-22 09:09:42 +00:00
Elena Demikhovsky	52981c4b60	I optimized the following patterns: sext <4 x i1> to <4 x i64> sext <4 x i8> to <4 x i64> sext <4 x i16> to <4 x i64> I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns: (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT))) The sext_in_reg (v4i32 x) may be lowered to shl+sar operations. The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution. I also added a cost of this operations to the AVX costs table. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@175619 91177308-0d34-0410-b5e6-96231b3b80d8	2013-02-20 12:42:54 +00:00


739 Commits