llvm-6502

mirror of https://github.com/c64scene-ar/llvm-6502.git synced 2024-07-23 16:30:00 +00:00

Author	SHA1	Message	Date
Rafael Espindola	034b8f9d31	Trivial cleanup: reuse existing variable. Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764. Patch by Matt Arsenault. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201425 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-14 19:02:01 +00:00
Chandler Carruth	8615ab4a4a	[LPM] Switch LICM to actively use LCSSA in addition to preserving it. Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently. There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it and we need LICM to preserve it while we are using it. We can restore the old SSAUpdater driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes. However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201148 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-11 12:52:27 +00:00
Quentin Colombet	921f0b1d66	[CodeGenPrepare] Undo changes that happened for the profitability check. The addressing mode matcher checks at some point the profitability of folding an instruction into the addressing mode. When the instruction to be folded has several uses, it checks that the instruction can be folded in each use. To do so, it creates a new matcher for each use and check if the instruction is in the list of the matched instructions of this new matcher. The new matchers may promote some instructions and this has to be undone to keep the state of the original matcher consistent. A test case will follow. <rdar://problem/16020230> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201121 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-11 01:59:02 +00:00
Benjamin Kramer	299918ad48	Make succ_iterator a real random access iterator and clean up a couple of users. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201088 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-10 14:17:42 +00:00
Juergen Ributzka	6f1819f2e6	[Constant Hoisting] Fix insertion point for constant materialization. The bitcast instruction during constant materialization was not placed correcly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201009 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-08 00:20:49 +00:00
Juergen Ributzka	1368e659d7	[Constant Hoisting] Don't update the use list while traversing it - DOH! This fix first traverses the whole use list of the constant expression and keeps track of the instructions that need to be updated. Then perform the fixup afterwards. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201008 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-08 00:20:45 +00:00
Quentin Colombet	30c0f72237	[CodeGenPrepare] Move away sign extensions that get in the way of addressing mode. Basically the idea is to transform code like this: %idx = add nsw i32 %a, 1 %sextidx = sext i32 %idx to i64 %gep = gep i8* %myArray, i64 %sextidx load i8* %gep Into: %sexta = sext i32 %a to i64 %idx = add nsw i64 %sexta, 1 %gep = gep i8* %myArray, i64 %idx load i8* %gep That way the computation can be folded into the addressing mode. This transformation is done as part of the addressing mode matcher. If the matching fails (not profitable, addressing mode not legal, etc.), the matcher will revert the related promotions. <rdar://problem/15519855> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200947 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-06 21:44:56 +00:00
Nick Lewycky	44e40408ee	A memcpy out of an fresh alloca is a no-op, delete it. Patch by Patrick Walton! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200907 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-06 06:29:19 +00:00
Paul Robinson	2684ddd72e	Disable most IR-level transform passes on functions marked 'optnone'. Ideally only those transform passes that run at -O0 remain enabled, in reality we get as close as we reasonably can. Passes are responsible for disabling themselves, it's not the job of the pass manager to do it for them. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200892 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-06 00:07:05 +00:00
Duncan P. N. Exon Smith	483727da48	cleanup: scc_iterator consumers should use isAtEnd No functional change. Updated loops from: for (I = scc_begin(), E = scc_end(); I != E; ++I) to: for (I = scc_begin(); !I.isAtEnd(); ++I) for teh win. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200789 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-04 19:19:07 +00:00
Nick Lewycky	e4d1a3e352	Self-memcpy-elision and memcpy of constant byte to memset transforms don't care how many bytes you were trying to transfer. Sink that safety test after those transforms. Noticed by inspection. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200726 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-04 00:18:54 +00:00
Chandler Carruth	115fd30b24	[LPM] Apply a really big hammer to fix PR18688 by recursively reforming LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promition SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200612 91177308-0d34-0410-b5e6-96231b3b80d8	2014-02-01 13:35:14 +00:00
Chandler Carruth	a403ceb205	[LPM] Fix PR18643, another scary place where loop transforms failed to preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is buidling. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we tried a very brute force to update LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the deges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least one of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges it needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, many thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200393 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-29 13:16:53 +00:00
Chandler Carruth	6a67a3f3ec	[LPM] Fix PR18642, a pretty nasty bug in IndVars that "never mattered" because of the inside-out run of LoopSimplify in the LoopPassManager and the fact that LoopSimplify couldn't be "preserved" across two independent LoopPassManagers. Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI node because it thought it was rewriting (via SCEV) the incoming value to a loop invariant value. While it may well be invariant for the current loop, it may be rewritten in terms of an enclosing loop's values. This in and of itself is fine, as the LCSSA PHI node in the enclosing loop for the inner loop value we're rewriting will have its own LCSSA PHI node if used outside of the enclosing loop. With me so far? Well, the current loop and the enclosing loop may share an exiting block and exit block, and when they do they also share LCSSA PHI nodes. In this case, its not valid to RAUW through the LCSSA PHI node. Expected crazy test included. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200372 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-29 04:40:19 +00:00
Reid Kleckner	59bec0e3c0	Update optimization passes to handle inalloca arguments Summary: I searched Transforms/ and Analysis/ for 'ByVal' and updated those call sites to check for inalloca if appropriate. I added tests for any change that would allow an optimization to fire on inalloca. Reviewers: nlewycky Differential Revision: http://llvm-reviews.chandlerc.com/D2449 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200281 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-28 02:38:36 +00:00
Benjamin Kramer	08aa38d39b	ConstantHoisting: We can't insert instructions directly in front of a PHI node. Insert before the terminating instruction of the dominating block instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200218 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-27 13:11:43 +00:00
Chandler Carruth	3d69cf57e1	[LPM] Make LCSSA a utility with a FunctionPass that applies it to all the loops in a function, and teach LICM to work in the presance of LCSSA. Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes and unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run. Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies. The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it and it actually would give up on moving instructions in many cases when they were used by an LCSSA phi node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly. The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer. The test updates are essentially accomodating LCSSA phi nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling. With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager is rotate, LICM, and unswitch. There is one thing that I really don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and using tools like SSAUpdater instead. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200067 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-25 04:07:24 +00:00
Juergen Ributzka	943ce55f39	Revert "Revert "Add Constant Hoisting Pass" (r200034)" This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200062 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-25 02:02:55 +00:00
Hans Wennborg	503793e834	Revert "Add Constant Hoisting Pass" (r200034) This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemnted in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200058 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-25 01:18:18 +00:00
Juergen Ributzka	96172cb4a4	Add Constant Hoisting Pass Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200034 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 20:18:00 +00:00
Juergen Ributzka	dc6f9b9a4f	Revert "Add Constant Hoisting Pass" This reverts commit r200022 to unbreak the build bots. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200024 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 18:40:30 +00:00
Juergen Ributzka	fb282c68b7	Add Constant Hoisting Pass This pass identifies expensive constants to hoist and coalesces them to better prepare it for SelectionDAG-based code generation. This works around the limitations of the basic-block-at-a-time approach. First it scans all instructions for integer constants and calculates its cost. If the constant can be folded into the instruction (the cost is TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't consider it expensive and leave it alone. This is the default behavior and the default implementation of getIntImmCost will always return TCC_Free. If the cost is more than TCC_BASIC, then the integer constant can't be folded into the instruction and it might be beneficial to hoist the constant. Similar constants are coalesced to reduce register pressure and materialization code. When a constant is hoisted, it is also hidden behind a bitcast to force it to be live-out of the basic block. Otherwise the constant would be just duplicated and each basic block would have its own copy in the SelectionDAG. The SelectionDAG recognizes such constants as opaque and doesn't perform certain transformations on them, which would create a new expensive constant. This optimization is only applied to integer constants in instructions and simple (this means not nested) constant cast experessions. For example: %0 = load i64* inttoptr (i64 big_constant to i64*) Reviewed by Eric git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200022 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 18:23:08 +00:00
Alp Toker	ae43cab6ba	Fix known typos Sweep the codebase for common typos. Includes some changes to visible function names that were misspelt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200018 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 17:20:08 +00:00
Chandler Carruth	4296ce5662	[LPM] Fix a logic error in LICM spotted by inspection. We completely skipped promotion in LICM if the loop has a preheader or dedicated exits, but not both. We hoist if there is a preheader, and sink if there are dedicated exits, but either hoisting or sinking can move loop invariant code out of the loop! I have no idea if this has a practical consequence. If anyone has ideas for a test case, let me know. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199966 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 02:24:47 +00:00
Chandler Carruth	42e23de4db	[cleanup] Use the type-based preservation method rather than a string literal that bakes a pass name and forces parsing it in the pass manager. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199963 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-24 01:59:49 +00:00
Chandler Carruth	aaf44af769	[LPM] Make LoopSimplify no longer a LoopPass and instead both a utility function and a FunctionPass. This has many benefits. The motivating use case was to be able to compute function analysis passes after running LoopSimplify (to avoid invalidating them) and then to run other passes which require LoopSimplify. Specifically passes like unrolling and vectorization are critical to wire up to BranchProbabilityInfo and BlockFrequencyInfo so that they can be profile aware. For the LoopVectorize pass the only things in the way are LoopSimplify and LCSSA. This fixes LoopSimplify and LCSSA is next on my list. There are also a bunch of other benefits of doing this: - It is now very feasible to make more passes preserve LoopSimplify because they can simply run it after changing a loop. Because subsequence passes can assume LoopSimplify is preserved we can reduce the runs of this pass to the times when we actually mutate a loop structure. - The new pass manager should be able to more easily support loop passes factored in this way. - We can at long, long last observe that LoopSimplify is preserved across SCEV. This halves the number of times we run LoopSimplify!!! Now, getting here wasn't trivial. First off, the interfaces used by LoopSimplify are all over the map regarding how analysis are updated. We end up with weird "pass" parameters as a consequence. I'll try to clean at least some of this up later -- I'll have to have it all clean for the new pass manager. Next up I discovered a really frustrating bug. LoopUnroll claims to preserve LoopSimplify. That's actually a lie. But the way the LoopPassManager ends up running the passes, it always ran LoopSimplify on the unrolled-into loop, rectifying this oversight before any verification could kick in and point out that in fact nothing was preserved. So I've added code to the unroller to actually simplify the surrounding loop when it succeeds at unrolling. The only functional change in the test suite is that we now catch a case that was previously missed because SCEV and other loop transforms see their containing loops as simplified and thus don't miss some opportunities. One test case has been converted to check that we catch this case rather than checking that we miss it but at least don't get the wrong answer. Note that I have #if-ed out all of the verification logic in LoopSimplify! This is a temporary workaround while extracting these bits from the LoopPassManager. Currently, there is no way to have a pass in the LoopPassManager which preserves LoopSimplify along with one which does not. The LPM will try to verify on each loop in the nest that LoopSimplify holds but the now-Function-pass cannot distinguish what loop is being verified and so must try to verify all of them. The inner most loop is clearly no longer simplified as there is a pass which didn't even attempt to preserve it. =/ Once I get LCSSA out (and maybe LoopVectorize and some other fixes) I'll be able to re-enable this check and catch any places where we are still failing to preserve LoopSimplify. If this causes problems I can back this out and try to commit all of this at once, but so far this seems to work and allow much more incremental progress. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199884 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-23 11:23:19 +00:00
Matt Arsenault	88a9f0476c	Handle an addrspacecast case in memcpyopt git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199836 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-22 21:53:19 +00:00
Tim Northover	b9b629cbaa	Loop strength reduce: fix function name. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199801 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-22 13:27:00 +00:00
Chandler Carruth	c04f2c99ab	[SROA] Fix a bug which could cause the common type finding to return inconsistent results for different orderings of alloca slices. The fundamental issue is that it is just always a mistake to return early from this function. There is no effective early exit to leverage. This patch stops trynig to do so and simplifies the code a bit as a consequence. Original diagnosis and patch by James Molloy with some name tweaks by me in part reflecting feedback from Duncan Smith on the mailing list. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199771 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-21 23:16:05 +00:00
Chandler Carruth	e1a5243053	Fix a really nasty SROA bug with how we handled out-of-bounds memcpy intrinsics. Reported on the list by Evan with a couple of attempts to fix, but it took a while to dig down to the root cause. There are two overlapping bugs here, both centering around the circumstance of discovering a memcpy operand which is known to be completely outside the bounds of the alloca. First, we need to kill the other side of the memcpy if it was added to this alloca. Otherwise we'll factor it into our slicing and try to rewrite it even though we know for a fact that it is dead. This is made more tricky because we can visit the sides in either order. So we have to both kill the other side and skip instructions marked as dead. The latter really should be goodness in every case, but here is a matter of correctness. Second, we need to actually remove the uses of the alloca by the memcpy when queuing it for later deletion. Otherwise it may still be using the alloca when we go to promote it (if the rewrite re-uses the existing alloca instruction). Do this by factoring out the use-clobbering used when for nixing a Phi argument and re-using it across the operands of a to-be-deleted instruction. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199590 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-19 12:16:54 +00:00
Quentin Colombet	9b24eeee01	[opt][PassInfo] Allow opt to run passes that need target machine. When registering a pass, a pass can now specify a second construct that takes as argument a pointer to TargetMachine. The PassInfo class has been updated to reflect that possibility. If such a constructor exists opt will use it instead of the default constructor when instantiating the pass. Since such IR passes are supposed to be rare, no specific support has been added to this commit to allow an easy registration of such a pass. In other words, for such pass, the initialization function has to be hand-written (see CodeGenPrepare for instance). Now, codegenprepare can be tested using opt: opt -codegenprepare -mtriple=mytriple input.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199430 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-16 21:44:34 +00:00
Chandler Carruth	7f2eff792a	[PM] Split DominatorTree into a concrete analysis result object which can be used by both the new pass manager and the old. This removes it from any of the virtual mess of the pass interfaces and lets it derive cleanly from the DominatorTreeBase<> template. In turn, tons of boilerplate interface can be nuked and it turns into a very straightforward extension of the base DominatorTree interface. The old analysis pass is now a simple wrapper. The names and style of this split should match the split between CallGraph and CallGraphWrapperPass. All of the users of DominatorTree have been updated to match using many of the same tricks as with CallGraph. The goal is that the common type remains the resulting DominatorTree rather than the pass. This will make subsequent work toward the new pass manager significantly easier. Also in numerous places things became cleaner because I switched from re-running the pass (!!! mid way through some other passes run!!!) to directly recomputing the domtree. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199104 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-13 13:07:17 +00:00
Chandler Carruth	2073b0a63c	[PM] Pull the generic graph algorithms and data structures for dominator trees into the Support library. These are all expressed in terms of the generic GraphTraits and CFG, with no reliance on any concrete IR types. Putting them in support clarifies that and makes the fact that the static analyzer in Clang uses them much more sane. When moving the Dominators.h file into the IR library I claimed that this was the right home for it but not something I planned to work on. Oops. So why am I doing this? It happens to be one step toward breaking the requirement that IR verification can only be performed from inside of a pass context, which completely blocks the implementation of verification for the new pass manager infrastructure. Fixing it will also allow removing the concept of the "preverify" step (WTF???) and allow the verifier to cleanly flag functions which fail verification in a way that precludes even computing dominance information. Currently, that results in a fatal error even when you ask the verifier to not fatally error. It's awesome like that. The yak shaving will continue... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199095 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-13 10:52:56 +00:00
Chandler Carruth	56e1394c88	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that evn Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199082 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-13 09:26:24 +00:00
Chandler Carruth	9f20a4c6ce	Re-sort #include lines again, prior to moving headers around. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@199080 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-13 08:04:33 +00:00
Diego Novillo	4b2b2da9c7	Extend and simplify the sample profile input file. 1- Use the line_iterator class to read profile files. 2- Allow comments in profile file. Lines starting with '#' are completely ignored while reading the profile. 3- Add parsing support for discriminators and indirect call samples. Our external profiler can emit more profile information that we are currently not handling. This patch does not add new functionality to support this information, but it allows profile files to provide it. I will add actual support later on (for at least one of these features, I need support for DWARF discriminators in Clang). A sample line may contain the following additional information: Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This is currently only emitted by GCC and we just ignore it. Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example, 130: 7 foo:3 bar:2 baz:7 The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(). With baz() being the relatively more frequent call target. Differential Revision: http://llvm-reviews.chandlerc.com/D2355 4- Simplify format of profile input file. This implements earlier suggestions to simplify the format of the sample profile file. The symbol table is not necessary and function profiles do not need to know the number of samples in advance. Differential Revision: http://llvm-reviews.chandlerc.com/D2419 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198973 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-10 23:23:51 +00:00
Diego Novillo	0de8cecb84	Propagation of profile samples through the CFG. This adds a propagation heuristic to convert instruction samples into branch weights. It implements a similar heuristic to the one implemented by Dehao Chen on GCC. The propagation proceeds in 3 phases: 1- Assignment of block weights. All the basic blocks in the function are initial assigned the same weight as their most frequently executed instruction. 2- Creation of equivalence classes. Since samples may be missing from blocks, we can fill in the gaps by setting the weights of all the blocks in the same equivalence class to the same weight. To compute the concept of equivalence, we use dominance and loop information. Two blocks B1 and B2 are in the same equivalence class if B1 dominates B2, B2 post-dominates B1 and both are in the same loop. 3- Propagation of block weights into edges. This uses a simple propagation heuristic. The following rules are applied to every block B in the CFG: - If B has a single predecessor/successor, then the weight of that edge is the weight of the block. - If all the edges are known except one, and the weight of the block is already known, the weight of the unknown edge will be the weight of the block minus the sum of all the known edges. If the sum of all the known edges is larger than B's weight, we set the unknown edge weight to zero. - If there is a self-referential edge, and the weight of the block is known, the weight for that edge is set to the weight of the block minus the weight of the other incoming edges to that block (if known). Since this propagation is not guaranteed to finalize for every CFG, we only allow it to proceed for a limited number of iterations (controlled by -sample-profile-max-propagate-iterations). It currently uses the same GCC default of 100. Before propagation starts, the pass builds (for each block) a list of unique predecessors and successors. This is necessary to handle identical edges in multiway branches. Since we visit all blocks and all edges of the CFG, it is cleaner to build these lists once at the start of the pass. Finally, the patch fixes the computation of relative line locations. The profiler emits lines relative to the function header. To discover it, we traverse the compilation unit looking for the subprogram corresponding to the function. The line number of that subprogram is the line where the function begins. That becomes line zero for all the relative locations. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198972 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-10 23:23:46 +00:00
Chandler Carruth	560e3955c3	Put the functionality for printing a value to a raw_ostream as an operand into the Value interface just like the core print method is. That gives a more conistent organization to the IR printing interfaces -- they are all attached to the IR objects themselves. Also, update all the users. This removes the 'Writer.h' header which contained only a single function declaration. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198836 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-09 02:29:41 +00:00
Chandler Carruth	bc65a8d518	Move the LLVM IR asm writer header files into the IR directory, as they are part of the core IR library in order to support dumping and other basic functionality. Rename the 'Assembly' include directory to 'AsmParser' to match the library name and the only functionality left their -- printing has been in the core IR library for quite some time. Update all of the #includes to match. All of this started because I wanted to have the layering in good shape before I started adding support for printing LLVM IR using the new pass infrastructure, and commandline support for the new pass infrastructure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198688 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 12:34:26 +00:00
Chandler Carruth	974a445bd9	Re-sort all of the includes with ./utils/sort_includes.py so that subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198685 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 11:48:04 +00:00
Andrew Trick	b4e0c9b85d	Reapply r198654 "indvars: sink truncates outside the loop." This doesn't seem to have actually broken anything. It was paranoia on my part. Trying again now that bots are more stable. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198678 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 06:59:12 +00:00
Andrew Trick	2352abb7c6	Revert "indvars: sink truncates outside the loop." This reverts commit r198654. One of the bots reported a SciMark failure. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198659 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 01:50:58 +00:00
Andrew Trick	ced88c5918	indvars: sink truncates outside the loop. This is a follow up of the r198338 commit that added truncates for lcssa phi nodes. Sinking the truncates below the phis cleans up the loop and simplifies subsequent analysis within the indvars pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198654 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 01:02:55 +00:00
Andrew Trick	f86063e7ec	80 col. comment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198653 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-07 01:02:52 +00:00
Alp Toker	395f7c2505	Add missed cleanup from r198456 All other uses of this macro in LLVM/clang have been moved to the function definition so follow suite (and the usage advice) here too for consistency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198516 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-04 22:47:48 +00:00
Nico Weber	c3d3f0c696	Add a LLVM_DUMP_METHOD macro. The motivation is to mark dump methods as used in debug builds so that they can be called from lldb, but to not do so in release builds so that they can be dead-stripped. There's lots of potential follow-up work suggested in the thread "Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev, but everyone seems to agreen on this subset. Macro name chosen by fair coin toss. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198456 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-03 22:53:37 +00:00
David Peixotto	dace98d805	Fix loop rerolling pass failure with non-consant loop lower bound The loop rerolling pass was failing with an assertion failure from a failed cast on loops like this: void foo(int A, int B, int m, int n) { for (int i = m; i < n; i+=4) { A[i+0] = B[i+0] * 4; A[i+1] = B[i+1] * 4; A[i+2] = B[i+2] * 4; A[i+3] = B[i+3] * 4; } } The code was casting the SCEV-expanded code for the new induction variable to a phi-node. When the loop had a non-constant lower bound, the SCEV expander would end the code expansion with an add insted of a phi node and the cast would fail. It looks like the cast to a phi node was only needed to get the induction variable value coming from the backedge to compute the end of loop condition. This patch changes the loop reroller to compare the induction variable to the number of times the backedge is taken instead of the iteration count of the loop. In other words, we stop the loop when the current value of the induction variable == IterationCount-1. Previously, the comparison was comparing the induction variable value from the next iteration == IterationCount. This problem only seems to occur on 32-bit targets. For some reason, the loop is not rerolled on 64-bit targets. PR18290 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198425 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-03 17:20:01 +00:00
Hal Finkel	ac8ba0c0fd	Disable compare sinking in CodeGenPrepare when multiple condition registers are available As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively sinks compares to reduce pressure on the condition register(s), for targets such as PowerPC with multiple condition registers, this may not be the right thing to do. This adds an HasMultipleConditionRegisters boolean to TLI, and CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is true. This functionality will be used by the PowerPC backend in an upcoming commit. Especially when the PowerPC backend starts tracking individual condition register bits as separate allocatable entities (which will happen in this upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is significantly suboptimial. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198354 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-02 21:13:43 +00:00
Andrew Trick	5f8e79e6d2	indvars: cleanup the IV visitor. It does more than gather sext/zext info. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198353 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-02 21:12:11 +00:00
Andrew Trick	fcbe3d9501	indvars: insert truncate at loop boundary to avoid redundant IVs. When widening an IV to remove s/zext, we generally try to eliminate the original narrow IV. However, LCSSA phi nodes outside the loop were still using the original IV. Clean this up more aggressively to avoid redundancy in generated code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@198338 91177308-0d34-0410-b5e6-96231b3b80d8	2014-01-02 19:29:38 +00:00
Andrew Trick	c7b0b7dc8f	Add support to indvars for optimizing sadd.with.overflow. Split sadd.with.overflow into add + sadd.with.overflow to allow analysis and optimization. This should ideally be done after InstCombine, which can perform code motion (eventually indvars should run after all canonical instcombines). We want ISEL to recombine the add and the check, at least on x86. This is currently under an option for reducing live induction variables: -liv-reduce. The next step is reducing liveness of IVs that are live out of the overflow check paths. Once the related optimizations are fully developed, reviewed and tested, I do expect this to become default. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197926 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-23 23:31:49 +00:00
Richard Sandiford	166acc9489	Fix Scalarizer insertion point when replacing PHIs with insertelements If the Scalarizer scalarized a vector PHI but could not scalarize all uses of it, it would insert a series of insertelements to reconstruct the vector PHI value from the scalar ones. The problem was that it would emit these insertelements immediately after the PHI, even if there were other PHIs after it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197909 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-23 14:51:56 +00:00
Richard Sandiford	b09beed540	Fix Scalarizer handling of vector GEPs with multiple index operands The old code only worked for one index operand. Also handle "inbounds". git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@197908 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-23 14:45:00 +00:00
NAKAMURA Takumi	e1d55bb5d5	Add proper dependencies to LLVMBuild.txt in llvm/lib. I'll prune redundant deps in LLVMBuild.txt, later. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196881 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-10 05:39:34 +00:00
Jakub Staszak	7ae72bfd94	Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces overall time of LLVM compilation by ~1%. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196667 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-07 21:20:17 +00:00
Michael Gottesman	f3f9cff0fb	Change std::deque => std::vector. No functionality change. There is no reason to use std::deque here over std::vector. Thus given the performance differences inbetween the two it makes sense to change deque to vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196524 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-05 18:42:12 +00:00
Alp Toker	087ab613f4	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@196471 91177308-0d34-0410-b5e6-96231b3b80d8	2013-12-05 05:44:44 +00:00
Diego Novillo	d0d8d6462a	Refactor some code in SampleProfile.cpp I'm adding new functionality in the sample profiler. This will require more data to be kept around for each function, so I moved the structure SampleProfile that we keep for each function into a separate class. There are no functional changes in this patch. It simply provides a new home where to place all the new data that I need to propagate weights through edges. There are some other name and minor edits throughout. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195780 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-26 20:37:33 +00:00
Matt Arsenault	08e1b756df	StructurizeCFG: Fix verification failure with some loops. If the beginning of the loop was also the entry block of the function, branches were inserted to the entry block which isn't allowed. If this occurs, create a new dummy function entry block that branches to the start of the loop. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195493 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-22 19:24:39 +00:00
Matt Arsenault	7575fdd7a4	StructurizeCFG: Fix inverting a branch on an argument git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195492 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-22 19:24:37 +00:00
Richard Sandiford	0f778794c8	Add a Scalarizer pass. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195471 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-22 16:58:05 +00:00
Chandler Carruth	ed1951e79f	Fix an issue where SROA computed different results based on the relative order of slices of the alloca which have exactly the same size and other properties. This was found by a perniciously unstable sort implementation used to flush out buggy uses of the algorithm. The fundamental idea is that findCommonType should return the best common type it can find across all of the slices in the range. There were two bugs here previously: 1) We would accept an integer type smaller than a byte-width multiple, and if there were different bit-width integer types, we would accept the first one. This caused an actual failure in the testcase updated here when the sort order changed. 2) If we found a bad combination of types or a non-load, non-store use before an integer typed load or store we would bail, but if we found the integere typed load or store, we would use it. The correct behavior is to always use an integer typed operation which covers the partition if one exists. While a clever debugging sort algorithm found problem #1 in our existing test cases, I have no useful test case ideas for #2. I spotted in by inspection when looking at this code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@195118 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-19 09:03:18 +00:00
Hal Finkel	b7dabccbce	Fix ndebug-build unused variable in loop rerolling git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194941 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-17 01:21:54 +00:00
Hal Finkel	bebe48dbfe	Add a loop rerolling pass This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this: for (int i = 0; i < 3200; i += 5) { a[i] += alpha * b[i]; a[i + 1] += alpha * b[i + 1]; a[i + 2] += alpha * b[i + 2]; a[i + 3] += alpha * b[i + 3]; a[i + 4] += alpha * b[i + 4]; } and turn them into this: for (int i = 0; i < 3200; ++i) { a[i] += alpha * b[i]; } and loops like this: for (int i = 0; i < 500; ++i) { x[3i] = foo(0); x[3i+1] = foo(0); x[3*i+2] = foo(0); } and turn them into this: for (int i = 0; i < 1500; ++i) { x[i] = foo(0); } There are two motivations for this transformation: 1. Code-size reduction (especially relevant, obviously, when compiling for code size). 2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target. The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases. This pass is not currently enabled by default at any optimization level. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194939 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-16 23:59:05 +00:00
Alexey Samsonov	4223b96010	Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194568 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-13 13:09:39 +00:00
Diego Novillo	563b29f8db	SampleProfileLoader pass. Initial setup. This adds a new scalar pass that reads a file with samples generated by 'perf' during runtime. The samples read from the profile are incorporated and emmited as IR metadata reflecting that profile. The profile file is assumed to have been generated by an external profile source. The profile information is converted into IR metadata, which is later used by the analysis routines to estimate block frequencies, edge weights and other related data. External profile information files have no fixed format, each profiler is free to define its own. This includes both the on-disk representation of the profile and the kind of profile information stored in the file. A common kind of profile is based on sampling (e.g., perf), which essentially counts how many times each line of the program has been executed during the run. The SampleProfileLoader pass is organized as a scalar transformation. On startup, it reads the file given in -sample-profile-file to determine what kind of profile it contains. This file is assumed to contain profile information for the whole application. The profile data in the file is read and incorporated into the internal state of the corresponding profiler. To facilitate testing, I've organized the profilers to support two file formats: text and native. The native format is whatever on-disk representation the profiler wants to support, I think this will mostly be bitcode files, but it could be anything the profiler wants to support. To do this, every profiler must implement the SampleProfile::loadNative() function. The text format is mostly meant for debugging. Records are separated by newlines, but each profiler is free to interpret records as it sees fit. Profilers must implement the SampleProfile::loadText() function. Finally, the pass will call SampleProfile::emitAnnotations() for each function in the current translation unit. This function needs to translate the loaded profile into IR metadata, which the analyzer will later be able to use. This patch implements the first steps towards the above design. I've implemented a sample-based flat profiler. The format of the profile is fairly simplistic. Each sampled function contains a list of relative line locations (from the start of the function) together with a count representing how many samples were collected at that line during execution. I generate this profile using perf and a separate converter tool. Currently, I have only implemented a text format for these profiles. I am interested in initial feedback to the whole approach before I send the other parts of the implementation for review. This patch implements: - The SampleProfileLoader pass. - The base ExternalProfile class with the core interface. - A SampleProfile sub-class using the above interface. The profiler generates branch weight metadata on every branch instructions that matches the profiles. - A text loader class to assist the implementation of SampleProfile::loadText(). - Basic unit tests for the pass. Additionally, the patch uses profile information to compute branch weights based on instruction samples. This patch converts instruction samples into branch weights. It does a fairly simplistic conversion: Given a multi-way branch instruction, it calculates the weight of each branch based on the maximum sample count gathered from each target basic block. Note that this assignment of branch weights is somewhat lossy and can be misleading. If a basic block has more than one incoming branch, all the incoming branches will get the same weight. In reality, it may be that only one of them is the most heavily taken branch. I will adjust this assignment in subsequent patches. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194566 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-13 12:22:21 +00:00
Shuxin Yang	e26299d76e	Correct a glitch in r194424 which may invalidate iterator. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194457 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-12 08:33:03 +00:00
Shuxin Yang	6c7a7c6474	Fix PR17952. The symptom is that an assertion is triggered. The assertion was added by me to detect the situation when value is propagated from dead blocks. (We can certainly get rid of assertion; it is safe to do so, because propagating value from dead block to alive join node is certainly ok.) The root cause of this bug is : edge-splitting is conducted on the fly, the edge being split could be a dead edge, therefore the block that split the critial edge needs to be flagged "dead" as well. There are 3 ways to fix this bug: 1) Get rid of the assertion as I mentioned eariler 2) When an dead edge is split, flag the inserted block "dead". 3) proactively split the critical edges connecting dead and live blocks when new dead blocks are revealed. This fix go for 3) with additional 2 LOC. Testing case was added by Rafael the other day. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194424 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-11 22:00:23 +00:00
Bill Wendling	855c29d82c	Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308." This causes PR17852. This reverts commit `d93e8a06b2`. Conflicts: test/Transforms/GVN/cond_br2.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194348 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-10 07:34:34 +00:00
Hal Finkel	ab09d1e0ea	Remove dead code from LoopUnswitch LoopUnswitch's code simplification routine has logic to convert conditional branches into unconditional branches, after unswitching makes the condition constant, and then remove any blocks that renders dead. Unfortunately, this code is dead, currently broken, and furthermore, has never been alive (at least as far back at 2006). No functionality change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194277 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-08 19:58:21 +00:00
Hal Finkel	c88eb08d02	Add a runtime unrolling parameter to the LoopUnroll pass constructor As with the other loop unrolling parameters (the unrolling threshold, partial unrolling, etc.) runtime unrolling can now also be controlled via the constructor. This will be necessary for moving non-trivial unrolling late in the pass manager (after loop vectorization). No functionality change intended. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@194027 91177308-0d34-0410-b5e6-96231b3b80d8	2013-11-05 00:08:03 +00:00
Matt Arsenault	9effcbb879	Teach scalarrepl about address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193720 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-30 22:54:58 +00:00
Matt Arsenault	b7ff48e374	Fix GVN creating bitcast between address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193710 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-30 19:05:41 +00:00
Andrew Trick	4d4bbaf997	Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop. Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu (affecting trunk and 3.3) When SCEV expands a recurrence outside of a loop it attempts to scale by the stride of the recurrence. Chained recurrences don't work that way. We could compute binomial coefficients, but would hve to guarantee that the chained AddRec's are in a perfectly reduced form. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193438 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-25 21:35:56 +00:00
Juergen Ributzka	d084153a8f	Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks. Reviewed by Andy git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193303 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-24 05:29:56 +00:00
Andrew Trick	577ac566c4	Clarify comments in genLoopLimit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193292 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-24 00:43:38 +00:00
Matt Arsenault	244d245974	Use more type helper functions git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193109 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-21 19:43:56 +00:00
Bill Wendling	3e033f2923	Don't eliminate a partially redundant load if it's in a landing pad. A landing pad can be jumped to only by the unwind edge of an invoke instruction. If we eliminate a partially redundant load in a landing pad, it will create a basic block that violates this constraint. It then leads to other problems down the line if it tries to merge that basic block with the landing pad. Avoid this by not eliminating the load in a landing pad. PR17621 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@193064 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-21 04:09:17 +00:00
Tom Stellard	af7ae9d689	StructurizeCFG: Add dependency on LowerSwitch pass Switch instructions were crashing the StructurizeCFG pass, and it's probably easier anyway if we don't need to handle them in this pass. Reviewed-by: Christian König <christian.koenig@amd.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191841 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-02 17:04:59 +00:00
Chandler Carruth	dd5d86d992	Remove the very substantial, largely unmaintained legacy PGO infrastructure. This was essentially work toward PGO based on a design that had several flaws, partially dating from a time when LLVM had a different architecture, and with an effort to modernize it abandoned without being completed. Since then, it has bitrotted for several years further. The result is nearly unusable, and isn't helping any of the modern PGO efforts. Instead, it is getting in the way, adding confusion about PGO in LLVM and distracting everyone with maintenance on essentially dead code. Removing it paves the way for modern efforts around PGO. Among other effects, this removes the last of the runtime libraries from LLVM. Those are being developed in the separate 'compiler-rt' project now, with somewhat different licensing specifically more approriate for runtimes. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191835 91177308-0d34-0410-b5e6-96231b3b80d8	2013-10-02 15:42:23 +00:00
Robert Wilhelm	3f4f420ab7	Even more spelling fixes for "instruction". git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191611 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-28 13:42:22 +00:00
Benjamin Kramer	d721520e4c	Push analysis passes to InstSimplify when they're around anyways. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191309 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-24 16:37:40 +00:00
Benjamin Kramer	7f80b75b96	Drop spurious handle in comment. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191172 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-22 11:24:58 +00:00
Benjamin Kramer	1ce1525ed4	SROA: Handle casts involving vectors of pointers and integer scalars. SROA wants to convert any types of equivalent widths but it's not possible to convert vectors of pointers to an integer scalar with a single cast. As a workaround we add a bitcast to the corresponding int ptr type first. This type of cast used to be an edge case but has become common with SLP vectorization. Fixes PR17271. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191143 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-21 20:36:04 +00:00
Shuxin Yang	d93e8a06b2	Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308. The problem of r191017 is that when GVN fabricate a val-number for a dead instruction (in order to make following expr-PRE happy), it forget to fabricate a leader-table entry for it as well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191118 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-20 23:12:57 +00:00
Joerg Sonnenberger	fc572d87d2	Revert r191017, it results in segmentation faults in Qt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191104 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-20 20:33:57 +00:00
Shuxin Yang	1bc7315c02	GVN proceeds in the presence of dead code. This is how it ignores the dead code: 1) When a dead branch target, say block B, is identified, all the blocks dominated by B is dead as well. 2) The PHIs of those blocks in dominance-frontier(B) is updated such that the operands corresponding to dead predecessors are replaced by "UndefVal". Using lattice's jargon, the "UndefVal" is the "Top" in essence. Phi node like this "phi(v1 bb1, undef xx)" will be optimized into "v1" if v1 is constant, or v1 is an instruction which dominate this PHI node. 3) When analyzing the availability of a load L, all dead mem-ops which L depends on disguise as a load which evaluate exactly same value as L. 4) The dead mem-ops will be materialized as "UndefVal" during code motion. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@191017 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-19 17:22:51 +00:00
Matt Arsenault	4b28ee2088	MemCpyOptimizer: Use max legal int size instead of pointer size If there are no legal integers, assume 1 byte. This makes more sense than using the pointer size as a guess for the maximum GPR width. It is conceivable to want to use some 64-bit pointers on a target where 64-bit integers aren't legal. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190817 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-16 22:43:16 +00:00
Chandler Carruth	3748de6e2d	Remove the long, long defunct IR block placement pass. This pass was based on the previous (essentially unused) profiling infrastructure and the assumption that by ordering the basic blocks at the IR level in a particular way, the correct layout would happen in the end. This sometimes worked, and mostly didn't. It also was a really naive implementation of the classical paper that dates from when branch predictors were primarily directional and when loop structure wasn't commonly available. It also didn't factor into the equation non-fallthrough branches and other machine level details. Anyways, for all of these reasons and more, I wrote MachineBlockPlacement, which completely supercedes this pass. It both uses modern profile information infrastructure, and actually works. =] git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190748 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-14 09:28:14 +00:00
Hal Finkel	4f7e2c38e8	Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190542 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-11 19:25:43 +00:00
Matt Arsenault	11250c1194	Teach loop-idiom about address space pointer sizes git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190491 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-11 05:09:42 +00:00
Matt Arsenault	f834dce7c7	Add braces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190490 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-11 05:09:35 +00:00
Eli Friedman	22647a0783	Get rid of unused isPodLike definitions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190461 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-11 00:36:54 +00:00
Eli Friedman	5912a12519	Fix mistake in r190442. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190446 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-10 23:09:24 +00:00
Eli Friedman	63a9660a41	Remove unused functions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190442 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-10 22:42:31 +00:00
Matt Arsenault	14807bd8c8	Teach ScalarEvolution about pointer address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190425 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-10 19:55:24 +00:00
Matt Arsenault	4598bd53ab	Use type helper functions. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190113 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-06 00:37:24 +00:00
Matt Arsenault	ce8e4647bf	Teach CodeGenPrepare about address spaces git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@190112 91177308-0d34-0410-b5e6-96231b3b80d8	2013-09-06 00:18:43 +00:00
Hal Finkel	f208398528	Revert: r189565 - Add getUnrollingPreferences to TTI Revert unintentional commit (of an unreviewed change). Original commit message: Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189566 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-29 03:33:15 +00:00
Hal Finkel	32f258b96a	Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189565 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-29 03:29:57 +00:00
Richard Sandiford	a8a7099c18	Turn MipsOptimizeMathLibCalls into a target-independent scalar transform ...so that it can be used for z too. Most of the code is the same. The only real change is to use TargetTransformInfo to test when a sqrt instruction is available. The pass is opt-in because at the moment it only handles sqrt. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@189097 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-23 10:27:02 +00:00
Nick Lewycky	6c1fa7caae	Revert r187191, which broke opt -mem2reg on the testcases included in PR16867. However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146 when SROA started sending more things directly down the PromoteMemToReg path. In order to revert r187191, I also revert dependent revisions r187296, r187322 and r188146. Fixes PR16867. Does not add the testcases from that PR, but both of them should get added for both mem2reg and sroa when this revert gets unreverted. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188327 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-13 22:51:58 +00:00
Peter Collingbourne	4f96b7e147	Reapply r188119 now that the bug it exposed is fixed. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188217 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-12 22:38:43 +00:00
Chandler Carruth	5b854f1ea5	Re-instate r187323 which fast-tracks promotable allocas as soon as the SROA-based analysis has enough information. This should work now that both mem2reg and the SSAUpdater-based AllocaPromoter have been updated to be able to promote the types of allocas that the SROA analysis detects. I've included tests for the AllocaPromoter that were only possible to write once we fast-tracked promotable allocas without rewriting them. This includes a test both for r187347 and r188145. Original commit log for r187323: """ Now that mem2reg understands how to cope with a slightly wider set of uses of an alloca, we can pre-compute promotability while analyzing an alloca for splitting in SROA. That lets us short-circuit the common case of a bunch of trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for typical frontend-generated IR sequneces I'm seeing. It gets the new SROA to within 20% of ScalarRepl for such code. My current benchmark for these numbers is PR15412, but it fits the general pattern of IR emitted by Clang so it should be widely applicable. """ git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188146 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-11 02:17:11 +00:00
Chandler Carruth	37508bb842	Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with the more general set of patterns that are now handled by mem2reg and that we can detect quickly while doing SROA's initial analysis. Notably, this allows it to promote through no-op bitcast and GEP sequences. A core part of the SSAUpdater approach is the ability to test whether a particular instruction is part of the set being promoted. Testing this becomes significantly more complex in the world where the operand to every load and store isn't the alloca itself. I ended up using the approach of walking up the def-chain until we find the alloca. I benchmarked this against keeping a set of pointer operands and keeping a set of the loads and stores we care about, and this one seemed faster although the difference was very small. No test case yet because currently the rewriting always "fixes" the inputs to not require this. The next patch which re-enables early promotion of easy cases in SROA will include a test case that specifically exercises this aspect of the alloca promoter. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188145 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-11 01:56:15 +00:00
Chandler Carruth	3c7a446059	Reformat some bits of AllocaPromoter and simplify the name and type of our visiting datastructures in the AllocaPromoter/SSAUpdater path of SROA. Also shift the order if clears around to be more consistent. No functionality changed here, this is just a cleanup. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188144 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-11 01:03:18 +00:00
Arnold Schwaighofer	5cf14916c3	Revert r188119 "Kill some duplicated code for removing unreachable BBs." It is breaking builbots with libgmalloc enabled on Mac OS X. $ cd llvm ; mkdir release ; cd release $ ../configure --enable-optimized —prefix=$PWD/install $ make $ make check $ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \ gmalloc_path=/usr/lib/libgmalloc.dylib \ ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188142 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-10 20:16:06 +00:00
Peter Collingbourne	835738ce54	Kill some duplicated code for removing unreachable BBs. This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp to Utils/Local.cpp and uses it to replace the implementation of llvm::removeUnreachableBlocks, which appears to do a strict subset of what removeUnreachableBlocksFromFn does. Differential Revision: http://llvm-reviews.chandlerc.com/D1334 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@188119 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-09 22:47:24 +00:00
Benjamin Kramer	c11b107f21	JumpThreading: Turn a select instruction into branching if it allows to thread one half of the select. This is a common pattern coming out of simplifycfg generating gross code. a: ; preds = %entry %sel = select i1 %cmp1, double %add, double 0.000000e+00 br label %b b: %cond5 = phi double [ %sel, %a ], [ %sub, %entry ] %cmp6 = fcmp oeq double %cond5, 0.000000e+00 br i1 %cmp6, label %if.then, label %if.end becomes a: br i1 %cmp1, label %b, label %if.then b: %cond5 = phi double [ %sub, %entry ], [ %add, %a ] %cmp6 = fcmp oeq double %cond5, 0.000000e+00 br i1 %cmp6, label %if.then, label %if.end Skipping block b completely if possible. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187880 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-07 10:29:38 +00:00
Jakub Staszak	7198ee6f62	Adjust file to the coding standard. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187808 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-06 17:03:42 +00:00
Tom Stellard	01d7203ef8	Factor FlattenCFG out from SimplifyCFG Patch by: Mei Ye git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187764 91177308-0d34-0410-b5e6-96231b3b80d8	2013-08-06 02:43:45 +00:00
Chandler Carruth	e1361ec325	Teach the AllocaPromoter which is wrapped around the SSAUpdater infrastructure to do promotion without a domtree the same smarts about looking through GEPs, bitcasts, etc., that I just taught mem2reg about. This way, if SROA chooses to promote an alloca which still has some noisy instructions this code can cope with them. I've not used as principled of an approach here for two reasons: 1) This code doesn't really need it as we were already set up to zip through the instructions used by the alloca. 2) I view the code here as more of a hack, and hopefully a temporary one. The SSAUpdater path in SROA is a real sore point for me. It doesn't make a lot of architectural sense for many reasons: - We're likely to end up needing the domtree anyways in a subsequent pass, so why not compute it earlier and use it. - In the future we'll likely end up needing the domtree for parts of the inliner itself. - If we need to we could teach the inliner to preserve the domtree. Part of the re-work of the pass manager will allow this to be very powerful even in large SCCs with many functions. - Ultimately, computing a domtree has gotten significantly faster since the original SSAUpdater-using code went into ScalarRepl. We no longer use domfrontiers, and much of domtree is lazily done based on queries rather than eagerly. - At this point keeping the SSAUpdater-based promotion saves a total of 0.7% on a build of the 'opt' tool for me. That's not a lot of performance given the complexity! So I'm leaving this a bit ugly in the hope that eventually we just remove all of this nonsense. I can't even readily test this because this code isn't reachable except through SROA. When I re-instate the patch that fast-tracks allocas already suitable for promotion, I'll add a testcase there that failed before this change. Before that, SROA will fix any test case I give it. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187347 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-29 09:06:53 +00:00
Chandler Carruth	65f12f1d05	Temporarily revert r187323 until I update SSAUpdater to match mem2reg. I forgot that we had two totally independent things here. :: sigh :: git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187327 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-28 09:05:49 +00:00
Chandler Carruth	cea60aff34	Now that mem2reg understands how to cope with a slightly wider set of uses of an alloca, we can pre-compute promotability while analyzing an alloca for splitting in SROA. That lets us short-circuit the common case of a bunch of trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for typical frontend-generated IR sequneces I'm seeing. It gets the new SROA to within 20% of ScalarRepl for such code. My current benchmark for these numbers is PR15412, but it fits the general pattern of IR emitted by Clang so it should be widely applicable. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187323 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-28 08:27:12 +00:00
Chandler Carruth	6c3a95dab5	Thread DataLayout through the callers and into mem2reg. This will be useful in a subsequent patch, but causes an unfortunate amount of noise, so I pulled it out into a separate patch. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187322 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-28 06:43:11 +00:00
Chandler Carruth	89934cbd34	Don't use all the #ifdefs to hide the stats counters and instead rely on their being optimized out in debug mode. Realistically, this just isn't going to be the slow part anyways. This also fixes unused variable warnings that are breaking LLD build bots. =/ I didn't see these at first, and kept losing track of the fact that they were broken. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187297 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-27 10:17:49 +00:00
Nick Lewycky	81e480463d	Reimplement isPotentiallyReachable to make nocapture deduction much stronger. Adds unit tests for it too. Split BasicBlockUtils into an analysis-half and a transforms-half, and put the analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable into llvm::isPotentiallyReachable and move it into Analysis/CFG. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187283 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-27 01:24:00 +00:00
Tom Stellard	57e6b2d1f3	SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions Merge consecutive if-regions if they contain identical statements. Both transformations reduce number of branches. The transformation is guarded by a target-hook, and is currently enabled only for +R600, but the correctness has been tested on X86 target using a variety of CPU benchmarks. Patch by: Mei Ye git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187278 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-27 00:01:07 +00:00
Benjamin Kramer	6a565e5be6	TRE: Move class into anonymous namespace. While there shrink a dangerously large SmallPtrSet. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187050 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-24 16:12:08 +00:00
Chandler Carruth	b7f27824fb	Fix a problem I introduced in r187029 where we would over-eagerly schedule an alloca for another iteration in SROA. This only showed up with a mixture of promotable and unpromotable selects and phis. Added a test case for this. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187031 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-24 12:12:17 +00:00
Chandler Carruth	9b3b286247	Fix PR16687 where we were incorrectly promoting an alloca that had pending speculation for a phi node. The problem here is that we were using growth of the specluation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack. Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid after speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculating before promotion. Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@187029 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-24 09:47:28 +00:00
Nick Lewycky	1579a0f8a6	Remove extraneous null statement. No functionality change! git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186893 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-22 23:38:27 +00:00
Jakub Staszak	eb0588b992	Use switch instead of if. No functionality change. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186892 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-22 23:38:16 +00:00
Jakub Staszak	85f6cbd1a5	OldPtr is llvm::Instruction. Remove unneeded cast<>. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186880 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-22 22:10:43 +00:00
Jakub Staszak	dca13e0b3f	Change tabs to spaces. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186877 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-22 21:11:30 +00:00
Matt Arsenault	1f4492e0b0	Fix spelling and grammar git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186858 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-22 18:59:58 +00:00
Benjamin Kramer	916cde6416	SROA: Microoptimization: Remove dead entries first, then sort. While there replace an explicit struct with std::mem_fun. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186761 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-20 08:38:34 +00:00
Chandler Carruth	47042bcc26	Cleanup the stats counters for the new implementation. These actually count the right things and have the right names. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186667 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-19 10:57:36 +00:00
Chandler Carruth	fbf2a02622	Fix another assert failure very similar to PR16651's test case. This test case came from Benjamin and found the parallel bug in the vector promotion code. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186666 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-19 10:57:32 +00:00
Chandler Carruth	c09228dba3	Try to move to a more reasonable set of naming conventions given the new implementation of the SROA algorithm. We were using the term 'partition' in many places that no longer ever represented an actual partition, but rather just an arbitrary slice of an alloca. No functionality change intended here. Mostly just renaming of types, functions, variables, and rewording of comments. Several comments were rewritten to make a lot more sense in the new structure of things. The stats are still weird and not reflective of how this really works. I'll fix those up in a separate patch as it is a touch more semantic of a change... git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186659 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-19 09:13:58 +00:00
Chandler Carruth	df5ed3f642	A long overdue cleanup in SROA to use 'DL' instead of 'TD' for the DataLayout variables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186656 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-19 07:21:28 +00:00
Chandler Carruth	8f0a1cecc5	Fix PR16651, an assert introduced in my recent re-work of the innards of SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile). Further reduced testcase added to the tests. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186655 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-19 07:12:23 +00:00
Chandler Carruth	f7c45ce3f5	Reapply r186316 with a fix for one bug where the code could walk off the end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186565 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-18 07:15:00 +00:00
Craig Topper	4172a8abba	Add 'const' qualifiers to static const char* variables. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186371 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-16 01:17:10 +00:00
Stephen Lin	f7b6f55e4c	Remove trailing whitespace git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186333 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-15 17:55:02 +00:00
Chandler Carruth	ebf72b3301	Revert r186316 while I track down an ASan failure and an assert from a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186332 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-15 17:36:21 +00:00
Chandler Carruth	ea2e90df15	Reimplement SROA yet again. Same fundamental principle, but a totally different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186316 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-15 10:30:19 +00:00
Craig Topper	a0ec3f9b7b	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186274 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-14 04:42:23 +00:00
Andrew Trick	16404cc817	LFTR improvement to avoid truncation. This is a reimplemntation of the patch originally in r186107. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186215 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-12 22:08:48 +00:00
Andrew Trick	807e6c71a8	Cleanup LFTR logic. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186214 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-12 22:08:44 +00:00
Andrew Trick	7137909128	Cleanup: rename a variable to make the logic easier to follow. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186213 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-12 22:08:41 +00:00
Chandler Carruth	6f0ec20e8f	Revert "indvars: Improve LFTR by eliminating truncation when comparing against a constant." This reverts commit r186107. It didn't handle wrapping arithmetic in the loop correctly and thus caused the following C program to count from 0 to UINT64_MAX instead of from 0 to 255 as intended: #include <stdio.h> int main() { unsigned char first = 0, last = 255; do { printf("%d\n", first); } while (first++ != last); } Full test case and instructions to reproduce with just the -indvars pass sent to the original review thread rather than to r186107's commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186152 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-12 11:18:55 +00:00
Andrew Trick	53b28f8623	indvars: Improve LFTR by eliminating truncation when comparing against a constant. Patch by Michele Scandale! Adds a special handling of the case where, during the loop exit condition rewriting, the exit value is a constant of bitwidth lower than the type of the induction variable: instead of introducing a trunc operation in order to match correctly the operand types, it allows to convert the constant value to an equivalent constant, depending on the initial value of the induction variable and the trip count, in order have an equivalent comparison between the induction variable and the new constant. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186107 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-11 17:08:59 +00:00
Michael Gottesman	03fddb710e	Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas. Without the changes introduced into this patch, if TRE saw any allocas at all, TRE would not perform TRE or mark callsites with the tail marker. Because TRE runs after mem2reg, this inadequacy is not a death sentence. But given a callsite A without escaping alloca argument, A may not be able to have the tail marker placed on it due to a separate callsite B having a write-back parameter passed in via an argument with the nocapture attribute. Assume that B is the only other callsite besides A and B only has nocapture escaping alloca arguments (NOTE B may have other arguments that are not passed allocas). In this case not marking A with the tail marker is unnecessarily conservative since: 1. By assumption A has no escaping alloca arguments itself so it can not access the caller's stack via its arguments. 2. Since all of B's escaping alloca arguments are passed as parameters with the nocapture attribute, we know that B does not stash said escaping allocas in a manner that outlives B itself and thus could be accessed indirectly by A. With the changes introduced by this patch: 1. If we see any escaping allocas passed as a capturing argument, we do nothing and bail early. 2. If we do not see any escaping allocas passed as captured arguments but we do see escaping allocas passed as nocapture arguments: i. We do not perform TRE to avoid PR962 since the code generator produces significantly worse code for the dynamic allocas that would be created by the TRE algorithm. ii. If we do not return twice, mark call sites without escaping allocas with the tail marker. NOTE This excludes functions with escaping nocapture allocas. 3. If we do not see any escaping allocas at all (whether captured or not): i. If we do not have usage of setjmp, mark all callsites with the tail marker. ii. If there are no dynamic/variable sized allocas in the function, attempt to perform TRE on all callsites in the function. Based off of a patch by Nick Lewycky. rdar://14324281. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@186057 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-11 04:40:01 +00:00
Benjamin Kramer	34ae5725c0	Reassociate: Remove unnecessary default operator=. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185757 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-06 15:10:13 +00:00
Sylvestre Ledru	23191804e8	Remove a useless declarations (found by scan-build) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185709 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-05 15:58:12 +00:00
Craig Topper	6227d5c690	Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185606 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-04 01:31:24 +00:00
Craig Topper	365ef0b197	Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@185540 91177308-0d34-0410-b5e6-96231b3b80d8	2013-07-03 15:07:05 +00:00
Nick Lewycky	34b96d1576	dbgs() << Instruction doesn't print a newline on the end any more. Update these debug statements to add a missing newline. Also canonicalize to '\n' instead of "\n"; the latter calls a function with a loop the former does not. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184897 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-26 00:30:18 +00:00
Bob Wilson	a1fe2948ed	Fix SROA to avoid unnecessary scalar conversions for 1-element vectors. When a 1-element vector alloca is promoted, a store instruction can often be rewritten without converting the value to a scalar and using an insertelement instruction to stuff it into the new alloca. This patch just adds a check to skip that conversion when it is unnecessary. This turns out to be really important for some ARM Neon operations where <1 x i64> is used to get around the fact that i64 is not a legal type. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@184870 91177308-0d34-0410-b5e6-96231b3b80d8	2013-06-25 19:09:50 +00:00

1 2 3 4 5 ...

5888 Commits