llvm-6502/lib/Analysis
Benjamin Kramer b6fdd022b7 PR13095: Give an inline cost bonus to functions using byval arguments.
We give a bonus for every argument because the argument setup is not needed
anymore when the function is inlined. With this patch we interpret byval
arguments as a compact representation of many arguments. The byval argument
setup is implemented in the backend as an inline memcpy, so to model the
cost as accurately as possible we take the number of pointer-sized elements
in the byval argument and give a bonus of 2 instructions for every one of
those. The bonus is capped at 8 elements, which is the number of stores
at which the x86 backend switches from an expanded inline memcpy to a real
memcpy. It would be better to use the real memcpy threshold from the backend,
but it's not available via TargetData.
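
As a rough illustration of the arithmetic described above, the following
sketch computes the bonus in units of instructions. The function name, the
constant names, and the choice to round partial elements up are assumptions
made for this example; none of them is taken from InlineCost.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the values 2 and 8 come from the description above.
const std::uint64_t kBonusPerElement = 2;  // instructions per pointer-sized element
const std::uint64_t kMaxElements = 8;      // cap on the number of elements counted

// Returns the inline bonus, in instructions, for one byval argument.
// Assumption: a partially filled trailing element counts as a whole element.
std::uint64_t byvalBonusInstructions(std::uint64_t typeSizeInBits,
                                     std::uint64_t pointerSizeInBits) {
  assert(pointerSizeInBits > 0 && "pointer size must be non-zero");
  std::uint64_t elements = typeSizeInBits / pointerSizeInBits;
  if (typeSizeInBits % pointerSizeInBits != 0)
    ++elements;  // round up
  if (elements > kMaxElements)
    elements = kMaxElements;  // apply the 8-element cap
  return kBonusPerElement * elements;
}
```

Under these assumptions, a 24-byte struct passed byval earns a bonus of 12
instructions with 32-bit pointers and 6 with 64-bit pointers, while a 128-byte
struct hits the cap and earns 16 in both cases.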

This change brings the performance of c-ray in line with gcc 4.7. The included
test case tries to reproduce the c-ray problem to catch regressions for this
benchmark early; its performance is dominated by the inline decision of a
specific call.

This only has a small impact on most code, more on x86 and ARM than on x86_64
due to the way the ABI works. When building LLVM for x86 it gives a small
inline cost boost to virtually any function using StringRef or STL allocators,
but only a 0.01% increase in overall binary size. The size of gcc compiled by
clang actually shrank by a couple of bytes with this patch applied, but not
significantly.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@161413 91177308-0d34-0410-b5e6-96231b3b80d8
2012-08-07 11:13:19 +00:00
IPA RefreshCallGraph: ignore 'invoke intrinsic'. IntrinsicInst doesn't recognize invoke, and shouldn't at this point, since the rest of the LLVM codebase doesn't expect invoke of intrinsics 2012-06-29 17:49:32 +00:00
AliasAnalysis.cpp Move the capture analysis from MemoryDependencyAnalysis to a more general place 2012-05-14 20:35:04 +00:00
AliasAnalysisCounter.cpp Persuade GCC that there is nothing worth warning about here (there isn't). 2012-02-05 14:20:11 +00:00
AliasAnalysisEvaluator.cpp
AliasDebugger.cpp
AliasSetTracker.cpp Reduce use list thrashing by using DenseMap's find_as for maps with ValueHandle keys. 2012-06-30 22:37:15 +00:00
Analysis.cpp
BasicAliasAnalysis.cpp refactor the MemoryBuiltin analysis: 2012-06-21 15:45:28 +00:00
BlockFrequencyInfo.cpp
BranchProbabilityInfo.cpp
CaptureTracking.cpp Fix indentation. 2012-05-10 23:38:07 +00:00
CFGPrinter.cpp
CMakeLists.txt Update the CMake files. 2012-06-29 09:01:47 +00:00
CodeMetrics.cpp A pile of long over-due refactorings here. There are some very, *very* 2012-05-04 00:58:03 +00:00
ConstantFolding.cpp When constant folding GEP expressions, keep the address space information of pointers. 2012-07-30 07:25:20 +00:00
DbgInfoPrinter.cpp Move lib/Analysis/DebugInfo.cpp to lib/VMCore/DebugInfo.cpp and 2012-06-28 00:05:13 +00:00
DominanceFrontier.cpp
DomPrinter.cpp remove the blank line from previous ci. 2012-02-04 03:18:47 +00:00
InlineCost.cpp PR13095: Give an inline cost bonus to functions using byval arguments. 2012-08-07 11:13:19 +00:00
InstCount.cpp
InstructionSimplify.cpp Fix PR13412, a nasty miscompile due to the interleaved 2012-08-07 10:59:59 +00:00
Interval.cpp
IntervalPartition.cpp
IVUsers.cpp IVUsers should only generate SCEV's for values that are safe to speculate. 2012-07-13 23:33:05 +00:00
LazyValueInfo.cpp make LazyValueInfo analyze the default case of switch statements (we know that in the default branch the value cannot be any of the switch cases) 2012-06-28 16:13:37 +00:00
LibCallAliasAnalysis.cpp
LibCallSemantics.cpp
Lint.cpp Always compute all the bits in ComputeMaskedBits. 2012-04-04 12:51:34 +00:00
LLVMBuild.txt
Loads.cpp enhance jump threading to preserve TBAA information when PRE'ing loads, 2012-03-13 18:07:41 +00:00
LoopDependenceAnalysis.cpp
LoopInfo.cpp Enable the new LoopInfo algorithm by default. 2012-06-26 04:11:38 +00:00
LoopPass.cpp Enable the new LoopInfo algorithm by default. 2012-06-26 04:11:38 +00:00
Makefile
MemDepPrinter.cpp Mark some static arrays as const. 2012-05-24 06:35:32 +00:00
MemoryBuiltins.cpp fix PR13390: do not loop forever with self-referencing self instructions 2012-07-27 18:21:15 +00:00
MemoryDependenceAnalysis.cpp refactor the MemoryBuiltin analysis: 2012-06-21 15:45:28 +00:00
ModuleDebugInfoPrinter.cpp Move lib/Analysis/DebugInfo.cpp to lib/VMCore/DebugInfo.cpp and 2012-06-28 00:05:13 +00:00
NoAliasAnalysis.cpp
PathNumbering.cpp Move llvm/Support/TypeBuilder.h -> llvm/TypeBuilder.h. This completes 2012-07-15 23:45:24 +00:00
PathProfileInfo.cpp
PathProfileVerifier.cpp
PHITransAddr.cpp Uniformize the InstructionSimplify interface by ensuring that all routines 2012-03-13 11:42:19 +00:00
PostDominators.cpp
ProfileEstimatorPass.cpp
ProfileInfo.cpp
ProfileInfoLoader.cpp Remove unused private member variables uncovered by the recent changes to clang's -Wunused-private-field. 2012-07-20 22:05:57 +00:00
ProfileInfoLoaderPass.cpp Round 2 of dead private variable removal. 2012-06-06 19:47:08 +00:00
ProfileVerifierPass.cpp
README.txt
RegionInfo.cpp Implement the block_iterator of Region based on df_iterator. 2012-08-02 14:20:02 +00:00
RegionPass.cpp Rename the Region::block_iterator to Region::block_node_iterator, and 2012-05-04 20:55:23 +00:00
RegionPrinter.cpp Rename the Region::block_iterator to Region::block_node_iterator, and 2012-05-04 20:55:23 +00:00
ScalarEvolution.cpp Stay rational; don't assert trying to take the square root of a negative value. 2012-08-01 09:14:36 +00:00
ScalarEvolutionAliasAnalysis.cpp
ScalarEvolutionExpander.cpp Fix a typo (the the => the) 2012-07-23 08:51:15 +00:00
ScalarEvolutionNormalization.cpp
SparsePropagation.cpp Taken into account Duncan's comments for r149481 dated by 2nd Feb 2012: 2012-03-08 07:06:20 +00:00
Trace.cpp
TypeBasedAliasAnalysis.cpp
ValueTracking.cpp PHINode::hasConstantValue(): return undef if the PHI is fully recursive. 2012-07-03 21:15:40 +00:00

Analysis Opportunities:

//===---------------------------------------------------------------------===//

In test/Transforms/LoopStrengthReduce/quadradic-exit-value.ll, the
ScalarEvolution expression for %r is this:

  {1,+,3,+,2}<loop>

Outside the loop, this could be evaluated simply as (%n * %n); however,
ScalarEvolution currently evaluates it as

  (-2 + (2 * (trunc i65 (((zext i64 (-2 + %n) to i65) * (zext i64 (-1 + %n) to i65)) /u 2) to i64)) + (3 * %n))

In addition to being much more complicated, it involves i65 arithmetic,
which is very inefficient when expanded into code.
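
The equivalence of the two forms can be checked numerically. The sketch below
steps the recurrence {1,+,3,+,2} directly and also evaluates the expanded
expression. It assumes that the exit value corresponds to iteration %n - 1
(which is what makes the closed form come out as %n * %n) and that %n is small
enough for the product to fit in 64 bits, so the i65 widening is not modelled.
All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value of {1,+,3,+,2} on 0-based iteration `iter`, found by stepping the
// recurrence: the value grows by `step`, and `step` itself grows by 2.
std::uint64_t addRecValue(std::uint64_t iter) {
  std::uint64_t value = 1, step = 3;
  for (std::uint64_t i = 0; i < iter; ++i) {
    value += step;
    step += 2;
  }
  return value;
}

// The expanded form quoted above, using wrapping 64-bit arithmetic in place
// of i65. Only meaningful here for small n >= 1 (assumption of this sketch).
std::uint64_t expandedExitValue(std::uint64_t n) {
  assert(n >= 1 && "sketch only covers n >= 1");
  std::uint64_t half = ((n - 2) * (n - 1)) / 2;
  return static_cast<std::uint64_t>(-2) + 2 * half + 3 * n;
}

// True if both forms equal n * n for every n in [1, limit].
bool formsAgreeUpTo(std::uint64_t limit) {
  for (std::uint64_t n = 1; n <= limit; ++n)
    if (addRecValue(n - 1) != n * n || expandedExitValue(n) != n * n)
      return false;
  return true;
}
```

Calling formsAgreeUpTo with a modest limit exercises both forms against the
simple product.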

//===---------------------------------------------------------------------===//

In formatValue in test/CodeGen/X86/lsr-delayed-fold.ll,
ScalarEvolution is forming this expression:

((trunc i64 (-1 * %arg5) to i32) + (trunc i64 %arg5 to i32) + (-1 * (trunc i64 undef to i32)))

This could be folded to

(-1 * (trunc i64 undef to i32))
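
The fold relies on the first two terms cancelling in wrapping i32 arithmetic.
The sketch below illustrates this with fixed-width unsigned integers. It
stands in for undef with an arbitrary fixed value, which is a simplification
of LLVM's undef semantics, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `trunc i64 ... to i32`.
std::uint32_t trunc32(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}

// The expression as currently formed; `u` stands in for undef (assumption).
std::uint32_t unfolded(std::uint64_t arg5, std::uint64_t u) {
  std::uint32_t negArg = trunc32(0 - arg5);  // trunc(-1 * %arg5)
  std::uint32_t posArg = trunc32(arg5);      // trunc(%arg5)
  std::uint32_t negU = 0u - trunc32(u);      // -1 * trunc(undef)
  return negArg + posArg + negU;
}

// The proposed folded form: only the undef term remains.
std::uint32_t folded(std::uint64_t u) {
  return 0u - trunc32(u);
}

// Aborts if the two forms disagree for the given inputs.
void checkFold(std::uint64_t arg5, std::uint64_t u) {
  assert(unfolded(arg5, u) == folded(u));
}
```

Truncation commutes with negation modulo 2^32, so the two %arg5 terms sum to
zero for every input, not just the ones tried here.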

//===---------------------------------------------------------------------===//