.. Mirror of https://github.com/c64scene-ar/llvm-6502.git, synced 2025-10-30 16:17:05 +00:00
   git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235827 91177308-0d34-0410-b5e6-96231b3b80d8
   184 lines, 9.0 KiB, ReStructuredText

	
			184 lines
		
	
	
		
			9.0 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
=====================================
Performance Tips for Frontend Authors
=====================================

.. contents::
   :local:
   :depth: 2

Abstract
========

The intended audience of this document is developers of language frontends
targeting LLVM IR.  This document is home to a collection of tips on how to
generate IR that optimizes well.  As with any optimizer, LLVM has its strengths
and weaknesses.  In some cases, surprisingly small changes in the source IR
can have a large effect on the generated code.

Avoid loads and stores of large aggregate type
================================================

LLVM currently does not optimize loads and stores of large :ref:`aggregate
types <t_aggregate>` (i.e. structs and arrays) well.  As an alternative,
consider loading individual fields from memory.

Aggregates that are smaller than the largest (performant) load or store
instruction supported by the targeted hardware are well supported.  These can
be an effective way to represent collections of small packed fields.

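A minimal sketch of the field-by-field alternative, assuming a frontend that
emits textual IR as Python strings.  The helper ``load_fields`` and the
``%Point`` type are hypothetical, and the IR uses the opaque ``ptr`` syntax of
current LLVM releases (older releases spell pointer types as ``%Point*`` and
so on).  Instead of one ``load`` of the whole struct, it emits a
``getelementptr`` plus a scalar ``load`` for each field:

.. code-block:: python

   def load_fields(struct_ty, field_tys, base_ptr, prefix="f"):
       """Emit one GEP plus one scalar load per field of a struct in memory.

       Hypothetical frontend helper. Returns (ir_lines, value_names).
       """
       lines = []
       values = []
       for i, ty in enumerate(field_tys):
           addr = f"%{prefix}{i}.addr"
           val = f"%{prefix}{i}"
           # "i64 0" steps through the pointer; "i32 i" selects field i
           # (struct field indices must be i32 constants).
           lines.append(f"{addr} = getelementptr inbounds {struct_ty}, "
                        f"ptr {base_ptr}, i64 0, i32 {i}")
           lines.append(f"{val} = load {ty}, ptr {addr}")
           values.append(val)
       return lines, values


   # Instead of "%pt = load %Point, ptr %p" (a whole-aggregate load):
   ir, vals = load_fields("%Point", ["i64", "double"], "%p")
   print("\n".join(ir))

Each resulting value is a plain scalar, so later passes see ordinary loads
rather than one large aggregate load.
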

Prefer zext over sext when legal
==================================

On some architectures (x86-64 is one), sign extension can involve an extra
instruction whereas zero extension can be folded into a load.  LLVM will try to
replace a sext with a zext when it can be proven safe, but if you have
information in your source language about the range of an integer value, it can
be profitable to use a zext rather than a sext.

Alternatively, you can :ref:`specify the range of the value using metadata
<range-metadata>` and LLVM can do the sext to zext conversion for you.

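A sketch of both options under the same assumptions (textual IR built as
Python strings; ``widen_to_i64`` and ``load_with_range`` are hypothetical
names).  A frontend that knows a value is non-negative can pick ``zext``
directly, or it can attach ``!range`` metadata to the load and let LLVM make
the sext-to-zext decision.  The range is half-open, so ``!{i32 0, i32 1024}``
states that 0 <= value < 1024:

.. code-block:: python

   def widen_to_i64(dest, value, known_nonnegative):
       """Widen an i32 value to i64 (hypothetical frontend helper).

       For a value known to be non-negative, zext and sext give the same
       result, so prefer zext, which some targets can fold into a load.
       """
       op = "zext" if known_nonnegative else "sext"
       return f"{dest} = {op} i32 {value} to i64"


   def load_with_range(dest, ptr, lo, hi, md_id):
       """Load an i32 and record that it lies in the half-open range [lo, hi)."""
       load = f"{dest} = load i32, ptr {ptr}, !range !{md_id}"
       md = f"!{md_id} = !{{i32 {lo}, i32 {hi}}}"
       return load, md


   # An index the source language guarantees is non-negative:
   print(widen_to_i64("%i.wide", "%i", known_nonnegative=True))
   # A value the source language guarantees is in [0, 1024):
   print(*load_with_range("%n", "%n.addr", 0, 1024, md_id=0), sep="\n")

The second form is useful when the fact belongs to the loaded value itself,
since every later use of ``%n`` can benefit from it, not just one extension.
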

Zext GEP indices to machine register width
============================================

Internally, LLVM often promotes the width of GEP indices to machine register
width.  When it does so, it will default to using sign extension (sext)
operations for safety.  If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.

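A minimal sketch of the manual widening, again emitting textual IR from
Python with a hypothetical helper and assuming a 64-bit target, so that
machine register width is ``i64``:

.. code-block:: python

   def emit_indexed_gep(dest, elem_ty, base, index_i32, index_nonnegative):
       """Compute &base[index] with the i32 index widened to i64 up front.

       Hypothetical frontend helper. Assumes `dest` is a named value such as
       "%elt.addr" (numbered values like "%3" cannot take a suffix).
       """
       wide = f"{dest}.idx"
       # Use zext only when the source language guarantees index >= 0;
       # otherwise fall back to sext, matching LLVM's own conservative choice.
       ext = "zext" if index_nonnegative else "sext"
       return [
           f"{wide} = {ext} i32 {index_i32} to i64",
           f"{dest} = getelementptr inbounds {elem_ty}, ptr {base}, i64 {wide}",
       ]


   # An unsigned (or otherwise provably non-negative) array index:
   print("\n".join(emit_indexed_gep("%elt.addr", "double", "%arr", "%i", True)))

Because the index already has register width, LLVM has no narrower index to
promote and does not need to insert its own sext.
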

Other things to consider
=========================

#. Make sure that a DataLayout is provided (this will likely become required in
   the near future, but is certainly important for optimization).

#. Add nsw/nuw flags as appropriate.  Reasoning about overflow is
   generally hard for an optimizer, so providing these facts from the frontend
   can be very impactful.

#. Use fast-math flags on floating point operations if legal.  If you don't
   need strict IEEE floating point semantics, there are a number of additional
   optimizations that can be performed.  This can be highly impactful for
   floating point intensive computations.

#. Use inbounds on GEPs.  This can help to disambiguate some aliasing queries.

#. Add noalias/align/dereferenceable/nonnull to function arguments and return
   values as appropriate.

#. Mark functions as readnone/readonly or noreturn/nounwind when known.  The
   optimizer will try to infer these flags, but may not always be able to.
   Manual annotations are particularly important for external functions that
   the optimizer cannot analyze.

#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
   analysis); prefer GEPs.

#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
   intrinsics where possible.  Common profitable uses are for stack-like data
   structures (thus allowing dead store elimination) and for describing
   lifetimes of allocas (thus allowing smaller stack sizes).

#. Use pointer aliasing metadata, especially TBAA metadata, to communicate
   otherwise-non-deducible pointer aliasing facts.

#. Use the "most-private" possible linkage types for the functions being defined
   (private, internal or linkonce_odr preferably).

#. Mark invariant locations using !invariant.load and TBAA's constant flags.

#. Prefer globals over inttoptr of a constant address; this gives you
   dereferenceability information.  In MCJIT, use getSymbolAddress to provide
   the actual address.

#. Be wary of ordered and atomic memory operations.  They are hard to optimize
   and may not be well optimized by the current optimizer.  Depending on your
   source language, you may consider using fences instead.

#. If calling a function which is known to throw an exception (unwind), use
   an invoke with a normal destination which contains an unreachable
   instruction.  This form conveys to the optimizer that the call returns
   abnormally.  For an invoke which neither returns normally nor requires unwind
   code in the current function, you can use a noreturn call instruction if
   desired.  This is generally not required because the optimizer will convert
   an invoke with an unreachable unwind destination to a call instruction.

#. If your language uses range checks, consider using the IRCE pass.  It is not
   currently part of the standard pass order.

#. For languages with numerous rarely executed guard conditions (e.g. null
   checks, type checks, range checks) consider adding an extra execution or
   two of LoopUnswitch and LICM to your pass order.  The standard pass order,
   which is tuned for C and C++ applications, may not be sufficient to remove
   all dischargeable checks from loops.

#. Use profile metadata to indicate statically known cold paths, even if
   dynamic profiling information is not available.  This can make a large
   difference in code placement and thus the performance of tight loops.

#. When generating code for loops, try to avoid terminating the header block of
   the loop earlier than necessary.  If the terminator of the loop header
   block is a loop-exiting conditional branch, the effectiveness of LICM will
   be limited for loads not in the header.  (This is due to the fact that LLVM
   may not know such a load is safe to speculatively execute and thus can't
   lift an otherwise loop-invariant load unless it can prove the exiting
   condition is not taken.)  It can be profitable, in some cases, to emit such
   instructions into the header even if they are not used along a rarely
   executed path that exits the loop.  This guidance specifically does not
   apply if the condition which terminates the loop header is itself invariant,
   or can be easily discharged by inspecting the loop index variables.

#. In hot loops, consider duplicating instructions from small basic blocks
   which end in highly predictable terminators into their successor blocks.
   If a hot successor block contains instructions which can be vectorized
   with the duplicated ones, this can provide a noticeable throughput
   improvement.  Note that this is not always profitable and does involve a
   potentially large increase in code size.

#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
   of predecessors).  Among other issues, the register allocator is known to
   perform badly when confronted with such structures.  The only exception to
   this guidance is that a unified return block with high in-degree is fine.

#. When checking a value against a constant, emit the check using a consistent
   comparison type.  The GVN pass *will* optimize redundant equalities even if
   the type of comparison is inverted, but GVN only runs late in the pipeline.
   As a result, you may miss the opportunity to run other important
   optimizations.  Improvements to EarlyCSE to remove this issue are tracked in
   Bug 23333.

#. Avoid using arithmetic intrinsics unless you are *required* by your source
   language specification to emit a particular code sequence.  The optimizer
   is quite good at reasoning about general control flow and arithmetic; it is
   not anywhere near as strong at reasoning about the various intrinsics.  If
   profitable for code generation purposes, the optimizer will likely form the
   intrinsics itself late in the optimization pipeline.  It is *very* rarely
   profitable to emit these directly in the language frontend.  This item
   explicitly includes the use of the :ref:`overflow intrinsics <int_overflow>`.

#. Avoid using the :ref:`assume intrinsic <int_assume>` until you've
   established that a) there's no other way to express the given fact and b)
   that fact is critical for optimization purposes.  Assumes are a great
   prototyping mechanism, but they can have negative effects on both compile
   time and optimization effectiveness.  The former is fixable with enough
   effort, but the latter is fairly fundamental to their designed purpose.

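Several of the items above amount to linkage choices, attributes and flags
attached to IR the frontend already emits.  As a hedged sketch (textual IR
built from Python; the ``Facts`` record, the helpers and ``@pair_sum`` are
hypothetical, and current opaque-``ptr`` syntax is assumed), a frontend can
turn facts it knows from the source language into ``internal`` linkage,
argument attributes, ``nounwind``, ``inbounds`` GEPs, fast-math flags and
``nsw``:

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Facts:
       """What a hypothetical frontend knows about one function."""
       exported: bool = True      # visible outside this module?
       arg_noalias: bool = False  # is %v the only way to reach its memory?
       strict_fp: bool = True     # does the language require IEEE semantics?


   def emit_pair_sum(facts):
       """Return IR for a function summing the two doubles %v points at."""
       linkage = "" if facts.exported else "internal "
       # %v always points at two doubles (16 bytes) that are only read.
       arg_attrs = "nonnull readonly align 8 dereferenceable(16)"
       if facts.arg_noalias:
           arg_attrs = "noalias " + arg_attrs
       fmf = "" if facts.strict_fp else "fast "
       return "\n".join([
           f"define {linkage}double @pair_sum(ptr {arg_attrs} %v) nounwind {{",
           "entry:",
           "  %y.addr = getelementptr inbounds double, ptr %v, i64 1",
           "  %x = load double, ptr %v, align 8",
           "  %y = load double, ptr %y.addr, align 8",
           f"  %s = fadd {fmf}double %x, %y",
           "  ret double %s",
           "}",
       ])


   def emit_add(dest, a, b, no_signed_wrap):
       """Integer add; nsw is only valid if signed overflow cannot happen
       (or is undefined behavior) in the source language."""
       flag = " nsw" if no_signed_wrap else ""
       return f"{dest} = add{flag} i32 {a}, {b}"


   print(emit_pair_sum(Facts(exported=False, arg_noalias=True, strict_fp=False)))
   print(emit_add("%n1", "%n", "1", no_signed_wrap=True))

Each annotation is only emitted when the corresponding fact holds; emitting one
that is false (for example ``fast`` in a language that requires strict IEEE
semantics) makes the optimizer's output incorrect rather than merely slower.
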
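For the profile-metadata item, a sketch under the same assumptions
(``emit_guard`` is hypothetical, and the 2000:1 weights are an illustrative
choice, not values LLVM requires) of a rarely failing guard whose failure path
is marked cold with ``branch_weights`` metadata.  The first weight applies to
the true successor and the second to the false one:

.. code-block:: python

   def emit_guard(cond, ok_label, fail_label, md_id, hot=2000, cold=1):
       """Branch on a guard (null check, range check, ...) that almost never fails.

       Returns (branch, metadata). Static estimates are enough here; no
       dynamic profile is required.
       """
       br = (f"br i1 {cond}, label %{ok_label}, label %{fail_label}, "
             f"!prof !{md_id}")
       md = f'!{md_id} = !{{!"branch_weights", i32 {hot}, i32 {cold}}}'
       return br, md


   # %nonnull is true on the common path; %throw_npe is the cold slow path.
   branch, weights = emit_guard("%nonnull", "ok", "throw_npe", md_id=1)
   print(branch)
   print(weights)
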

p.s. If you want to help improve this document, patches expanding any of the
above items into standalone sections of their own with a more complete
discussion would be very welcome.

Adding to this document
=======================

If you run across a case that you feel deserves to be covered here, please send
a patch to `llvm-commits
<http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_ for review.

If you have questions on these items, please direct them to `llvmdev
<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_.  The more relevant
context you are able to give to your question, the more likely it is to be
answered.