mirror of
				https://github.com/c64scene-ar/llvm-6502.git
				synced 2025-10-25 10:27:04 +00:00 
			
		
		
		
	git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@137243 91177308-0d34-0410-b5e6-96231b3b80d8
		
			
				
	
	
		
			300 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			300 lines
		
	
	
		
			14 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 | |
|                       "http://www.w3.org/TR/html4/strict.dtd">
 | |
| <html>
 | |
| <head>
 | |
|   <title>LLVM Atomic Instructions and Concurrency Guide</title>
 | |
|   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 | |
|   <link rel="stylesheet" href="llvm.css" type="text/css">
 | |
| </head>
 | |
| <body>
 | |
| 
 | |
| <h1>
 | |
|   LLVM Atomic Instructions and Concurrency Guide
 | |
| </h1>
 | |
| 
 | |
| <ol>
 | |
|   <li><a href="#introduction">Introduction</a></li>
 | |
|   <li><a href="#loadstore">Load and store</a></li>
 | |
|   <li><a href="#ordering">Atomic orderings</a></li>
 | |
|   <li><a href="#otherinst">Other atomic instructions</a></li>
 | |
|   <li><a href="#iropt">Atomics and IR optimization</a></li>
 | |
|   <li><a href="#codegen">Atomics and Codegen</a></li>
 | |
| </ol>
 | |
| 
 | |
| <div class="doc_author">
 | |
|   <p>Written by Eli Friedman</p>
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="introduction">Introduction</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p>Historically, LLVM has not had very strong support for concurrency; some
 | |
| minimal intrinsics were provided, and <code>volatile</code> was used in some
 | |
| cases to achieve rough semantics in the presence of concurrency.  However, this
 | |
| is changing; there are now new instructions which are well-defined in the
 | |
| presence of threads and asynchronous signals, and the model for existing
 | |
| instructions has been clarified in the IR.</p>
 | |
| 
 | |
| <p>The atomic instructions are designed specifically to provide readable IR and
 | |
|    optimized code generation for the following:</p>
 | |
| <ul>
 | |
|   <li>The new C++0x <code><atomic></code> header.</li>
 | |
|   <li>Proper semantics for Java-style memory, for both <code>volatile</code> and
 | |
|       regular shared variables.</li>
 | |
|   <li>gcc-compatible <code>__sync_*</code> builtins.</li>
 | |
|   <li>Other scenarios with atomic semantics, including <code>static</code>
 | |
|       variables with non-trivial constructors in C++.</li>
 | |
| </ul>
 | |
| 
 | |
| <p>This document is intended to provide a guide to anyone either writing a
 | |
|    frontend for LLVM or working on optimization passes for LLVM with a guide
 | |
|    for how to deal with instructions with special semantics in the presence of
 | |
|    concurrency.  This is not intended to be a precise guide to the semantics;
 | |
|    the details can get extremely complicated and unreadable, and are not
 | |
|    usually necessary.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="loadstore">Load and store</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of 
 | |
|    optimizations, but can have unintuitive results in a concurrent environment.
 | |
|    For a frontend writer, the rule is essentially that all memory accessed 
 | |
|    with basic loads and stores by multiple threads should be protected by a
 | |
|    lock or other synchronization; otherwise, you are likely to run into
 | |
|    undefined behavior. (Do not use volatile as a substitute for atomics; it
 | |
|    might work on some platforms, but does not provide the necessary guarantees
 | |
|    in general.)</p>
 | |
| 
 | |
| <p>From the optimizer's point of view, the rule is that if there
 | |
|    are not any instructions with atomic ordering involved, concurrency does not
 | |
|    matter, with one exception: if a variable might be visible to another
 | |
|    thread or signal handler, a store cannot be inserted along a path where it
 | |
|    might not execute otherwise. Note that speculative loads are allowed;
 | |
|    a load which is part of a race returns <code>undef</code>, but is not
 | |
|    undefined behavior.</p>
 | |
| 
 | |
| <p>For cases where simple loads and stores are not sufficient, LLVM provides
 | |
|    atomic loads and stores with varying levels of guarantees.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="ordering">Atomic orderings</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p>In order to achieve a balance between performance and necessary guarantees,
 | |
|    there are six levels of atomicity. They are listed in order of strength;
 | |
|    each level includes all the guarantees of the previous level except for
 | |
|    Acquire/Release.</p>
 | |
| 
 | |
| <p>Unordered is the lowest level of atomicity. It essentially guarantees that
 | |
|    races produce somewhat sane results instead of having undefined behavior. 
 | |
|    This is intended to match the Java memory model for shared variables. It 
 | |
|    cannot be used for synchronization, but is useful for Java and other 
 | |
|    "safe" languages which need to guarantee that the generated code never 
 | |
|    exhibits undefined behavior.  Note that this guarantee is cheap on common
 | |
|    platforms for loads of a native width, but can be expensive or unavailable
 | |
|    for wider loads, like a 64-bit load on ARM. (A frontend for a "safe"
 | |
|    language would normally split a 64-bit load on ARM into two 32-bit
 | |
|    unordered loads.) In terms of the optimizer, this prohibits any
 | |
|    transformation that transforms a single load into multiple loads, 
 | |
|    transforms a store into multiple stores, narrows a store, or stores a
 | |
|    value which would not be stored otherwise.  Some examples of unsafe
 | |
|    optimizations are narrowing an assignment into a bitfield, rematerializing
 | |
|    a load, and turning loads and stores into a memcpy call. Reordering 
 | |
|    unordered operations is safe, though, and optimizers should take 
 | |
|    advantage of that because unordered operations are common in
 | |
|    languages that need them.</p>
 | |
| 
 | |
| <p>Monotonic is the weakest level of atomicity that can be used in
 | |
|    synchronization primitives, although it does not provide any general
 | |
|    synchronization. It essentially guarantees that if you take all the
 | |
|    operations affecting a specific address, a consistent ordering exists.
 | |
|    This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>; see 
 | |
|    those standards for the exact definition.  If you are writing a frontend, do
 | |
|    not use the low-level synchronization primitives unless you are compiling
 | |
|    a language which requires it or are sure a given pattern is correct. In
 | |
|    terms of the optimizer, this can be treated as a read+write on the relevant 
 | |
|    memory location (and alias analysis will take advantage of that).  In 
 | |
|    addition, it is legal to reorder non-atomic and Unordered loads around 
 | |
|    Monotonic loads. CSE/DSE and a few other optimizations are allowed, but
 | |
|    Monotonic operations are unlikely to be used in ways which would make
 | |
|    those optimizations useful.</p>
 | |
| 
 | |
| <p>Acquire provides a barrier of the sort necessary to acquire a lock to access
 | |
|    other memory with normal loads and stores. This corresponds to the 
 | |
|    C++0x/C1x <code>memory_order_acquire</code>. It should also be used for
 | |
|    C++0x/C1x <code>memory_order_consume</code>. This is a low-level 
 | |
|    synchronization primitive. In general, optimizers should treat this like
 | |
|    a nothrow call.</p>
 | |
| 
 | |
| <p>Release is similar to Acquire, but with a barrier of the sort necessary to
 | |
|    release a lock. This corresponds to the C++0x/C1x
 | |
|    <code>memory_order_release</code>. In general, optimizers should treat this
 | |
|    like a nothrow call.</p>
 | |
| 
 | |
| <p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a Release barrier.
 | |
|    This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>. In general,
 | |
|    optimizers should treat this like a nothrow call.</p>
 | |
| 
 | |
| <p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides Acquire and/or
 | |
|    Release semantics, and in addition guarantees a total ordering exists with
 | |
|    all other SequentiallyConsistent operations. This corresponds to the
 | |
|    C++0x/C1x <code>memory_order_seq_cst</code>, and Java volatile.  The intent
 | |
|    of this ordering level is to provide a programming model which is relatively
 | |
|    easy to understand. In general, optimizers should treat this like a
 | |
|    nothrow call.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="otherinst">Other atomic instructions</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
 | |
|    atomic load followed by an atomic store (where the store is conditional for
 | |
|    <code>cmpxchg</code>), but no other memory operation can happen between
 | |
|    the load and store.  Note that our cmpxchg does not have quite as many
 | |
|    options for making cmpxchg weaker as the C++0x version.</p>
 | |
| 
 | |
| <p>A <code>fence</code> provides Acquire and/or Release ordering which is not
 | |
|    part of another operation; it is normally used along with Monotonic memory
 | |
|    operations.  A Monotonic load followed by an Acquire fence is roughly
 | |
|    equivalent to an Acquire load.</p>
 | |
| 
 | |
| <p>Frontends generating atomic instructions generally need to be aware of the
 | |
|    target to some degree; atomic instructions are guaranteed to be lock-free,
 | |
|    and therefore an instruction which is wider than the target natively supports
 | |
|    can be impossible to generate.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="iropt">Atomics and IR optimization</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p>Predicates for optimizer writers to query:
 | |
| <ul>
 | |
|   <li>isSimple(): A load or store which is not volatile or atomic.  This is
 | |
|       what, for example, memcpyopt would check for operations it might
 | |
|       transform.
 | |
|   <li>isUnordered(): A load or store which is not volatile and at most
 | |
|       Unordered. This would be checked, for example, by LICM before hoisting
 | |
|       an operation.
 | |
|   <li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
 | |
|       that they return true for any operation which is volatile or at least
 | |
|       Monotonic.
 | |
|   <li>Alias analysis: Note that AA will return ModRef for anything Acquire or
 | |
|       Release, and for the address accessed by any Monotonic operation.
 | |
| </ul>
 | |
| 
 | |
| <p>There are essentially two components to supporting atomic operations. The
 | |
|    first is making sure to query isSimple() or isUnordered() instead
 | |
|    of isVolatile() before transforming an operation.  The other piece is
 | |
|    making sure that a transform does not end up replacing, for example, an 
 | |
|    Unordered operation with a non-atomic operation.  Most of the other 
 | |
|    necessary checks automatically fall out from existing predicates and
 | |
|    alias analysis queries.</p>
 | |
| 
 | |
| <p>Some examples of how optimizations interact with various kinds of atomic
 | |
|    operations:
 | |
| <ul>
 | |
|   <li>memcpyopt: An atomic operation cannot be optimized into part of a
 | |
|       memcpy/memset, including unordered loads/stores.  It can pull operations
 | |
|       across some atomic operations.
 | |
|   <li>LICM: Unordered loads/stores can be moved out of a loop.  It just treats
 | |
|       monotonic operations like a read+write to a memory location, and anything
 | |
|       stricter than that like a nothrow call.
 | |
|   <li>DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores
 | |
|       can be DSE'ed in some cases, but it's tricky to reason about, and not
 | |
|       especially important.
 | |
|   <li>Folding a load: Any atomic load from a constant global can be
 | |
|       constant-folded, because it cannot be observed.  Similar reasoning allows
 | |
|       scalarrepl with atomic loads and stores.
 | |
| </ul>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| <h2>
 | |
|   <a name="codegen">Atomics and Codegen</a>
 | |
| </h2>
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <div>
 | |
| 
 | |
| <p>Atomic operations are represented in the SelectionDAG with
 | |
|    <code>ATOMIC_*</code> opcodes.  On architectures which use barrier
 | |
|    instructions for all atomic ordering (like ARM), appropriate fences are
 | |
|    split out as the DAG is built.</p>
 | |
| 
 | |
| <p>The MachineMemOperand for all atomic operations is currently marked as
 | |
|    volatile; this is not correct in the IR sense of volatile, but CodeGen
 | |
|    handles anything marked volatile very conservatively.  This should get
 | |
|    fixed at some point.</p>
 | |
| 
 | |
| <p>The implementation of atomics on LL/SC architectures (like ARM) is currently
 | |
|    a bit of a mess; there is a lot of copy-pasted code across targets, and
 | |
|    the representation is relatively unsuited to optimization (it would be nice
 | |
|    to be able to optimize loops involving cmpxchg etc.).</p>
 | |
| 
 | |
| <p>On x86, all atomic loads generate a <code>MOV</code>.
 | |
|    SequentiallyConsistent stores generate an <code>XCHG</code>, other stores
 | |
|    generate a <code>MOV</code>. SequentiallyConsistent fences generate an
 | |
|    <code>MFENCE</code>, other fences do not cause any code to be generated.
 | |
|    cmpxchg uses the <code>LOCK CMPXCHG</code> instruction.
 | |
|    <code>atomicrmw xchg</code> uses <code>XCHG</code>,
 | |
|    <code>atomicrmw add</code> and <code>atomicrmw sub</code> use
 | |
|    <code>XADD</code>, and all other <code>atomicrmw</code> operations generate
 | |
|    a loop with <code>LOCK CMPXCHG</code>.  Depending on the users of the
 | |
|    result, some <code>atomicrmw</code> operations can be translated into
 | |
|    operations like <code>LOCK AND</code>, but that does not work in
 | |
|    general.</p>
 | |
| 
 | |
| <p>On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
 | |
|    SequentiallyConsistent semantics require barrier instructions
 | |
|    for every such operation. Loads and stores generate normal instructions.
 | |
|    <code>atomicrmw</code> and <code>cmpxchg</code> generate LL/SC loops.</p>
 | |
| 
 | |
| </div>
 | |
| 
 | |
| <!-- *********************************************************************** -->
 | |
| 
 | |
| <hr>
 | |
| <address>
 | |
|   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
 | |
|   src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
 | |
|   <a href="http://validator.w3.org/check/referer"><img
 | |
|   src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
 | |
| 
 | |
|   <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
 | |
|   Last modified: $Date: 2011-08-09 02:07:00 -0700 (Tue, 09 Aug 2011) $
 | |
| </address>
 | |
| 
 | |
| </body>
 | |
| </html>
 |