From 2eb93b3f03fb1167f7d40a1d5a17e84bb95d0510 Mon Sep 17 00:00:00 2001
From: Chandler Carruth
+ These intrinsic functions expand the "universal IR" of LLVM to represent + hardware constructs for atomic operations and memory synchronization. This + provides an interface to the hardware, not an interface to the programmer. It + is aimed at a low enough level to allow any programming models or APIs which + need atomic behaviors to map cleanly onto it. It is also modeled primarily on + hardware behavior. Just as hardware provides a "universal IR" for source + languages, it also provides a starting point for developing a "universal" + atomic operation and synchronization IR. +
++ These do not form an API such as high-level threading libraries, + software transactional memory systems, atomic primitives, and intrinsic + functions as found in BSD, GNU libc, atomic_ops, APR, and other system and + application libraries. The hardware interface provided by LLVM should allow + a clean implementation of all of these APIs and parallel programming models. + No one model or paradigm should be selected above others unless the hardware + itself ubiquitously does so. +
+ This is an overloaded intrinsic. You can use llvm.atomic.lcs on any + integer bit width. However, not all targets support all bit widths. +
+declare i8 @llvm.atomic.lcs.i8.i8p.i8.i8( i8* <ptr>, i8 <cmp>, i8 <val> )
+declare i16 @llvm.atomic.lcs.i16.i16p.i16.i16( i16* <ptr>, i16 <cmp>, i16 <val> )
+declare i32 @llvm.atomic.lcs.i32.i32p.i32.i32( i32* <ptr>, i32 <cmp>, i32 <val> )
+declare i64 @llvm.atomic.lcs.i64.i64p.i64.i64( i64* <ptr>, i64 <cmp>, i64 <val> )
+
+ This loads a value in shared memory and compares it to a given value. If they + are equal, it stores a new value into the shared memory. +
++ The llvm.atomic.lcs intrinsic takes three arguments. The result as + well as both cmp and val must be integer values with the + same bit width. The ptr argument must be a pointer to a value of + this integer type. While any bit width integer may be used, targets may only + lower representations they support in hardware. +
++ This entire intrinsic must be executed atomically. It first loads the value + in shared memory pointed to by ptr and compares it with the value + cmp. If they are equal, val is stored into the shared + memory. The loaded value is yielded in all cases. This provides the + equivalent of an atomic compare-and-swap operation within the SSA framework. +
+%ptr      = malloc i32
+            store i32 4, i32* %ptr
+
+%val1     = add i32 4, 4
+%result1  = call i32 @llvm.atomic.lcs( i32* %ptr, i32 4, i32 %val1 )
+                                        ; yields {i32}:result1 = 4
+%stored1  = icmp eq i32 %result1, 4     ; yields {i1}:stored1 = true
+%memval1  = load i32* %ptr              ; yields {i32}:memval1 = 8
+
+%val2     = add i32 1, 1
+%result2  = call i32 @llvm.atomic.lcs( i32* %ptr, i32 5, i32 %val2 )
+                                        ; yields {i32}:result2 = 8
+%stored2  = icmp eq i32 %result2, 5     ; yields {i1}:stored2 = false
+%memval2  = load i32* %ptr              ; yields {i32}:memval2 = 8
+
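+
+ Operations with no dedicated intrinsic can be built from a compare-and-swap
+ retry loop. The following is an illustrative sketch of an atomic signed
+ maximum, not part of this patch: @atomic_max is a hypothetical function, the
+ unsuffixed intrinsic name follows the examples above, and a target that
+ supports the i32 bit width is assumed.
+
+define i32 @atomic_max(i32* %ptr, i32 %val) {
+entry:
+  %first = load i32* %ptr                 ; initial guess at the current value
+  br label %retry
+
+retry:
+  %old   = phi i32 [ %first, %entry ], [ %seen, %retry ]
+  %isbig = icmp sgt i32 %old, %val
+  %new   = select i1 %isbig, i32 %old, i32 %val
+  ; publish %new only if memory still holds %old
+  %seen  = call i32 @llvm.atomic.lcs( i32* %ptr, i32 %old, i32 %new )
+  %won   = icmp eq i32 %seen, %old        ; equal means the store happened
+  br i1 %won, label %exit, label %retry
+
+exit:
+  ret i32 %old                            ; the value in memory when the exchange succeeded
+}
+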
+ This is an overloaded intrinsic. You can use llvm.atomic.ls on any + integer bit width. However, not all targets support all bit widths. +
+declare i8 @llvm.atomic.ls.i8.i8p.i8( i8* <ptr>, i8 <val> )
+declare i16 @llvm.atomic.ls.i16.i16p.i16( i16* <ptr>, i16 <val> )
+declare i32 @llvm.atomic.ls.i32.i32p.i32( i32* <ptr>, i32 <val> )
+declare i64 @llvm.atomic.ls.i64.i64p.i64( i64* <ptr>, i64 <val> )
+
+ This intrinsic loads the value stored in shared memory at ptr and + yields the value from memory. It then stores the value in val in the + shared memory at ptr. +
++ The llvm.atomic.ls intrinsic takes two arguments. Both the + val argument and the result must be integers of the same bit width. + The first argument, ptr, must be a pointer to a value of this + integer type. The targets may only lower integer representations they + support. +
++ This intrinsic loads the value pointed to by ptr, yields it, and + stores val back into ptr atomically. This provides the + equivalent of an atomic swap operation within the SSA framework. +
+%ptr      = malloc i32
+            store i32 4, i32* %ptr
+
+%val1     = add i32 4, 4
+%result1  = call i32 @llvm.atomic.ls( i32* %ptr, i32 %val1 )
+                                        ; yields {i32}:result1 = 4
+%stored1  = icmp eq i32 %result1, 4     ; yields {i1}:stored1 = true
+%memval1  = load i32* %ptr              ; yields {i32}:memval1 = 8
+
+%val2     = add i32 1, 1
+%result2  = call i32 @llvm.atomic.ls( i32* %ptr, i32 %val2 )
+                                        ; yields {i32}:result2 = 8
+%stored2  = icmp eq i32 %result2, 8     ; yields {i1}:stored2 = true
+%memval2  = load i32* %ptr              ; yields {i32}:memval2 = 2
+
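+
+ One common use of an atomic swap is a test-and-set spin lock. The sketch
+ below is illustrative only: @acquire_lock is a hypothetical function, a lock
+ word of 0 is assumed to mean "unlocked", and a real lock would also need the
+ memory barriers described later in this section.
+
+define void @acquire_lock(i32* %lock) {
+entry:
+  br label %spin
+
+spin:
+  ; atomically write 1 and observe what the lock word held before
+  %old  = call i32 @llvm.atomic.ls( i32* %lock, i32 1 )
+  %free = icmp eq i32 %old, 0             ; 0 means the lock was free
+  br i1 %free, label %acquired, label %spin
+
+acquired:
+  ret void
+}
+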
+ This is an overloaded intrinsic. You can use llvm.atomic.las on any + integer bit width. However, not all targets support all bit widths. +
+declare i8 @llvm.atomic.las.i8.i8p.i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.las.i16.i16p.i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.las.i32.i32p.i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.las.i64.i64p.i64( i64* <ptr>, i64 <delta> )
+
+ This intrinsic adds delta to the value stored in shared memory at + ptr. It yields the original value at ptr. +
++ The intrinsic takes two arguments, the first a pointer to an integer value + and the second an integer value. The result is also an integer value. These + integer types can have any bit width, but they must all have the same bit + width. The targets may only lower integer representations they support. +
++ This intrinsic does a series of operations atomically. It first loads the + value stored at ptr, then adds delta and stores the result + back to ptr. It yields the original value stored at ptr. +
+%ptr      = malloc i32
+            store i32 4, i32* %ptr
+%result1  = call i32 @llvm.atomic.las( i32* %ptr, i32 4 )
+                                        ; yields {i32}:result1 = 4
+%result2  = call i32 @llvm.atomic.las( i32* %ptr, i32 2 )
+                                        ; yields {i32}:result2 = 8
+%result3  = call i32 @llvm.atomic.las( i32* %ptr, i32 5 )
+                                        ; yields {i32}:result3 = 10
+%memval   = load i32* %ptr              ; yields {i32}:memval = 15
+
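+
+ A typical use of an atomic add is handing out distinct indices to threads
+ appending into a shared buffer concurrently. A sketch, where @claim_slot and
+ %next are hypothetical names:
+
+define i32 @claim_slot(i32* %next) {
+  ; advance the shared cursor; every caller receives a distinct index
+  %mine = call i32 @llvm.atomic.las( i32* %next, i32 1 )
+  ret i32 %mine
+}
+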
+ This is an overloaded intrinsic. You can use llvm.atomic.lss on any + integer bit width. However, not all targets support all bit widths. +
+declare i8 @llvm.atomic.lss.i8.i8p.i8( i8* <ptr>, i8 <delta> )
+declare i16 @llvm.atomic.lss.i16.i16p.i16( i16* <ptr>, i16 <delta> )
+declare i32 @llvm.atomic.lss.i32.i32p.i32( i32* <ptr>, i32 <delta> )
+declare i64 @llvm.atomic.lss.i64.i64p.i64( i64* <ptr>, i64 <delta> )
+
+ This intrinsic subtracts delta from the value stored in shared + memory at ptr. It yields the original value at ptr. +
++ The intrinsic takes two arguments, the first a pointer to an integer value + and the second an integer value. The result is also an integer value. These + integer types can have any bit width, but they must all have the same bit + width. The targets may only lower integer representations they support. +
++ This intrinsic does a series of operations atomically. It first loads the + value stored at ptr, then subtracts delta and stores the + result back to ptr. It yields the original value stored + at ptr. +
+%ptr      = malloc i32
+            store i32 32, i32* %ptr
+%result1  = call i32 @llvm.atomic.lss( i32* %ptr, i32 4 )
+                                        ; yields {i32}:result1 = 32
+%result2  = call i32 @llvm.atomic.lss( i32* %ptr, i32 2 )
+                                        ; yields {i32}:result2 = 28
+%result3  = call i32 @llvm.atomic.lss( i32* %ptr, i32 5 )
+                                        ; yields {i32}:result3 = 26
+%memval   = load i32* %ptr              ; yields {i32}:memval = 21
+
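+
+ An atomic subtract maps naturally onto reference counting, where exactly one
+ thread must observe the count reaching zero and free the object. A sketch,
+ where @release_ref is a hypothetical function:
+
+define i1 @release_ref(i32* %refcount) {
+  ; drop one reference and observe the prior count
+  %old  = call i32 @llvm.atomic.lss( i32* %refcount, i32 1 )
+  %last = icmp eq i32 %old, 1             ; the caller that saw 1 dropped the last reference
+  ret i1 %last
+}
+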
+
+declare void @llvm.memory.barrier( i1 <ll>, i1 <ls>, i1 <sl>, i1 <ss> )
+
+ The llvm.memory.barrier intrinsic guarantees ordering between + specific pairs of memory access types. +
++ The llvm.memory.barrier intrinsic requires four boolean arguments. + Each argument enables a specific barrier as listed below. +
+   ll: load-load barrier
+   ls: load-store barrier
+   sl: store-load barrier
+   ss: store-store barrier
+
+ This intrinsic causes the system to enforce some ordering constraints upon + the loads and stores of the program. This barrier does not indicate + when any events will occur; it only enforces an order in + which they occur. For any of the specified pairs of load and store operations + (e.g. load-load, or store-load), all of the first operations preceding the + barrier will complete before any of the second operations following the + barrier begin. Specifically, the semantics for each pairing are as follows: +
+   ll: All loads before the barrier must complete before any load after the barrier begins.
+   ls: All loads before the barrier must complete before any store after the barrier begins.
+   sl: All stores before the barrier must complete before any load after the barrier begins.
+   ss: All stores before the barrier must complete before any store after the barrier begins.
+
+%ptr      = malloc i32
+            store i32 4, i32* %ptr
+
+%result1  = load i32* %ptr              ; yields {i32}:result1 = 4
+            call void @llvm.memory.barrier( i1 false, i1 true, i1 false, i1 false )
+                                        ; guarantee the above finishes
+            store i32 8, i32* %ptr      ; before this begins
+
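+
+ As a second illustration, a store-store barrier is what a producer needs in
+ order to publish data before raising a ready flag. The sketch below uses
+ hypothetical @publish, %data, and %ready names:
+
+define void @publish(i32* %data, i32* %ready) {
+  store i32 42, i32* %data
+  ; ss barrier: the data store must complete before the flag store begins
+  call void @llvm.memory.barrier( i1 false, i1 false, i1 false, i1 true )
+  store i32 1, i32* %ready
+  ret void
+}
+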