Sphinxify the Atomics documentation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@159416 91177308-0d34-0410-b5e6-96231b3b80d8
parent 16eeb6f5eb
commit 3bf24bdb1b
docs/Atomics.html (deleted; 569 lines removed)
docs/Atomics.rst (new file, 441 lines)
@@ -0,0 +1,441 @@
.. _atomics:

==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

.. contents::
   :local:

Introduction
============

Historically, LLVM has not had very strong support for concurrency; some minimal
intrinsics were provided, and ``volatile`` was used in some cases to achieve
rough semantics in the presence of concurrency. However, this is changing;
there are now new instructions which are well-defined in the presence of threads
and asynchronous signals, and the model for existing instructions has been
clarified in the IR.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The new C++0x ``<atomic>`` header. (`C++0x draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://java.sun.com/docs/books/jls/third_edition/html/memory.html>`_)

* gcc-compatible ``__sync_*`` builtins. (`Description
  <http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.

Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order. A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.
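
For example, in the following minimal IR sketch (the function and its pointer
argument are hypothetical), the first ``seq_cst`` store is dead and may be
erased, while a pair of volatile stores in the same position would both have to
remain:

.. code-block:: llvm

  ; Sketch only: two back-to-back SequentiallyConsistent stores to the same
  ; address. The first store may be erased; volatile stores could not be.
  define void @double_store(i32* %p) {
    store atomic i32 1, i32* %p seq_cst, align 4
    store atomic i32 2, i32* %p seq_cst, align 4
    ret void
  }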

This document is intended to provide a guide for anyone either writing a frontend
for LLVM or working on optimization passes for LLVM, describing how to deal
with instructions with special semantics in the presence of concurrency. This
is not intended to be a precise guide to the semantics; the details can get
extremely complicated and unreadable, and are not usually necessary.

.. _Optimization outside atomic:

Optimization outside atomic
===========================

The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section specifically goes into the one optimizer restriction which applies in
concurrent environments, which gets a bit more of an extended description
because any optimization dealing with stores needs to be aware of it.

From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it might not execute
otherwise. Take the following example:

.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }

The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former to the latter: it could
indirectly introduce undefined behavior if another thread can access ``x`` at
the same time. (This example is particularly of interest because before the
concurrency model was implemented, LLVM would perform this transformation.)

Note that speculative loads are allowed; a load which is part of a race returns
``undef``, but does not have undefined behavior.

Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store. Note that
LLVM's cmpxchg does not provide quite as many options as the C++0x version.

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load.
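
A minimal sketch of what these instructions look like in IR follows; the global
``@x`` and the particular orderings chosen here are arbitrary examples, not
recommendations:

.. code-block:: llvm

  @x = global i32 0

  define void @instruction_zoo(i32* %p) {
    %v   = load atomic i32* @x acquire, align 4      ; atomic load
    store atomic i32 %v, i32* %p release, align 4    ; atomic store
    %old = cmpxchg i32* @x, i32 0, i32 1 acq_rel     ; compare-and-exchange
    %sum = atomicrmw add i32* @x, i32 1 monotonic    ; read-modify-write
    fence seq_cst                                    ; stand-alone fence
    ret void
  }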

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.

.. _Atomic orderings:

Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)

.. _NotAtomic:

NotAtomic
---------

NotAtomic is the obvious: a load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return undef.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary, and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)

Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable. Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute. (Per the C/C++ standards,
  volatile does provide some limited guarantees around asynchronous signals, but
  atomics are generally a better solution.)

Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.

Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store. This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes. (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvmdev.)

Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees the operation to be lock-free, so it does not depend on the data being
part of a special atomic structure or on a separate per-process global
lock. Note that code generation will fail for unsupported atomic operations; if
you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for loads of a native width, but can be expensive or unavailable for
  wider loads, like a 64-bit store on ARM. (A frontend for Java or other "safe"
  languages would normally split a 64-bit store on ARM into two 32-bit unordered
  stores.)

Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise. Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy
  call. Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.

Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored. A normal load or store instruction is usually sufficient, but
  note that an unordered load or store cannot be split into multiple
  instructions (or an instruction which does multiple memory operations, like
  ``LDRD`` on ARM).
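
For instance, a frontend for a "safe" language might emit something like the
following sketch (the ``@shared`` global and function are hypothetical) for a
plain access to a shared field:

.. code-block:: llvm

  @shared = global i32 0

  define i32 @update_shared(i32 %v) {
    ; Unordered keeps a racing program well-defined, and maps to plain loads
    ; and stores on common targets for native-width types.
    store atomic i32 %v, i32* @shared unordered, align 4
    %r = load atomic i32* @shared unordered, align 4
    ret i32 %r
  }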

Monotonic
---------

Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct. Generally, these would
  either be used for atomic operations which do not protect other memory (like
  an atomic counter), or along with a ``fence``.

Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.

Notes for code generation
  Code generation is essentially the same as that for unordered for loads and
  stores. No fences are required. ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.
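
The atomic counter mentioned above is the typical use; a minimal sketch
(hypothetical ``@counter`` global):

.. code-block:: llvm

  @counter = global i32 0

  define i32 @bump() {
    ; Monotonic is sufficient because the counter does not protect any other
    ; memory; the old value is still returned atomically.
    %old = atomicrmw add i32* @counter, i32 1 monotonic
    ret i32 %old
  }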

Acquire
-------

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.

Relevant standard
  This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be
  used for C++0x/C1x ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.

Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics. The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.). Putting
  such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.
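
As an illustration, taking a simple spinlock might look like the following
sketch (hypothetical ``@lock`` global, 0 meaning free and 1 meaning held); the
Acquire ordering keeps the critical section's memory accesses from moving above
the exchange:

.. code-block:: llvm

  @lock = global i32 0

  define void @take_lock() {
  entry:
    br label %spin

  spin:
    %old = atomicrmw xchg i32* @lock, i32 1 acquire
    %free = icmp eq i32 %old, 0
    br i1 %free, label %locked, label %spin

  locked:
    ret void
  }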

Release
-------

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++0x/C1x ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.

Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  IR because they are extremely difficult to use correctly.
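
Continuing the spinlock sketch from the Acquire section, releasing the lock is a
single Release store, which keeps the critical section's effects visible before
the lock is observed as free:

.. code-block:: llvm

  define void @drop_lock(i32* %lock) {
    store atomic i32 0, i32* %lock release, align 4
    ret void
  }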

AcquireRelease
--------------

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).

Relevant standard
  This corresponds to the C++0x/C1x ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.

Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.

Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.
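
A sketch of a read-modify-write that wants both barriers, for example handing a
value to another thread while also consuming the one it published (the function
and slot are hypothetical):

.. code-block:: llvm

  define i32 @swap_slot(i32* %slot, i32 %v) {
    ; acq_rel: the store half publishes earlier writes (Release) and the load
    ; half synchronizes with the previous owner's Release (Acquire).
    %old = atomicrmw xchg i32* %slot, i32 %v acq_rel
    ret i32 %old
  }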

SequentiallyConsistent
----------------------

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. For
  SequentiallyConsistent loads and stores, the same reorderings are allowed as
  for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered.

Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations and SequentiallyConsistent stores require Release
  barriers. Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.
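
The classic pattern that needs SequentiallyConsistent rather than plain
Acquire/Release is the store-then-load handshake sketched below (hypothetical
flags ``@a`` and ``@b``); with weaker orderings, a thread running ``@thread1``
and another running its mirror image could both read 0:

.. code-block:: llvm

  @a = global i32 0
  @b = global i32 0

  define i32 @thread1() {
    store atomic i32 1, i32* @a seq_cst, align 4
    ; the mirror-image thread stores @b and then loads @a
    %r = load atomic i32* @b seq_cst, align 4
    ret i32 %r
  }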

Atomics and IR optimization
===========================

Predicates for optimizer writers to query:

* ``isSimple()``: A load or store which is not volatile or atomic. This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered. This would be checked, for example, by LICM before hoisting an
  operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done. If your pass should
optimize some atomic operations (Unordered operations in particular), make sure
it doesn't replace an atomic load or store with a non-atomic operation.

Some examples of how optimizations interact with various kinds of atomic
operations:

* ``memcpyopt``: An atomic operation cannot be optimized into part of a
  memcpy/memset, including unordered loads/stores. It can pull operations
  across some atomic operations.

* LICM: Unordered loads/stores can be moved out of a loop. It just treats
  monotonic operations like a read+write to a memory location, and anything
  stricter than that like a nothrow call.

* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important.

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed (see the sketch below). Similar reasoning allows
  scalarrepl with atomic loads and stores.
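
A sketch of the load-folding case (hypothetical constant global ``@c``): because
``@c`` can never change, even a ``seq_cst`` load from it can simply be replaced
by the constant:

.. code-block:: llvm

  @c = constant i32 42

  define i32 @folded() {
    %v = load atomic i32* @c seq_cst, align 4   ; foldable to "ret i32 42"
    ret i32 %v
  }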

Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences are split out as the DAG is built.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.

Common architectures have some way of representing at least a pointer-sized
lock-free ``cmpxchg``; such an operation can be used to implement all the other
atomic operations which can be represented in IR up to that size. Backends are
expected to implement all those operations, but not operations which cannot be
implemented in a lock-free manner. It is expected that backends will give an
error when given an operation which cannot be implemented. (The LLVM code
generator is not very helpful here at the moment, but hopefully that will
change.)
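
As an illustration of that reduction, an ``atomicrmw add`` can be expressed as a
``cmpxchg`` loop; this is only a sketch of the idea, not the code the backends
actually emit:

.. code-block:: llvm

  define i32 @rmw_add_via_cmpxchg(i32* %p, i32 %v) {
  entry:
    %init = load atomic i32* %p monotonic, align 4
    br label %loop

  loop:
    %old = phi i32 [ %init, %entry ], [ %seen, %loop ]
    %new = add i32 %old, %v
    ; cmpxchg yields the value that was in memory; retry until %new was installed
    %seen = cmpxchg i32* %p, i32 %old, i32 %new seq_cst
    %ok = icmp eq i32 %seen, %old
    br i1 %ok, label %done, label %loop

  done:
    ret i32 %old
  }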

The implementation of atomics on LL/SC architectures (like ARM) is currently a
bit of a mess; there is a lot of copy-pasted code across targets, and the
representation is relatively unsuited to optimization (it would be nice to be
able to optimize loops involving cmpxchg etc.).

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.

On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.). At the moment, the IR does not provide any way to represent a
weak ``cmpxchg`` which would not require a loop.