C++ Atomic Types and Operations

ISO/IEC JTC1 SC22 WG21 N2145 = 07-0005 - 2007-01-12

Hans-J. Boehm, Hans.Boehm@hp.com, boehm@acm.org
Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

Introduction

We present an interface and minimal implementation for standard atomic types and operations.

Rationale

The standard needs atomic types for two reasons.

Lock-free concurrent data structures
Lock-free concurrent data structures are difficult to design and implement correctly. As such, there is a significant need for programmers capable of doing so to write portably, and for that they need standards.
Inter-thread memory synchronization
Ocassionally synchronization with locks offers insufficient performance, and other, more specific idioms suffice.

The traditional shared-memory notion that every store is instantly visible to all threads

The traditional notion that the memory operations seen by a single shared memory is just an interleaving of the actions performed by each thread induces an unacceptable performance loss on modern systems. Therefore programmers must have a mechanism to indicate when stores in one thread should be communicated to another. This mechanism must necessarily be atomic, and we use the proposed interface to provide that mechanism.

Specifically, a program that wishes to communicate the fact that a particular piece of data prepared by one thread is ready to be examined by another thread, needs a shared variable flag, that both

Although the second aspect is often glossed over, it is usually not automatic with modern hardware and compilers, and is just as important as the first in ensuring correctness.

Prior Approaches

N1875 presented an atomic statement, which would require the compiler to select the appropriate atomic operations. We chose to abandon this approach because:

N2047 presented a template-based C++ library of atomic types layered on top of a C library of atomic operations on plain integral types. We chose to abandon this approach because:

Current Approach

We propose to add atomic types and operations purely as a library API. In practice, this API would have to be implemented largely with either compiler intrinsics or assembly code. (As such, this proposal should be implemented by compiler vendors, not library vendors, much as the exception facilities are implemented by compiler vendors.)

The proposal defines atomic types and defines all atomic operations on those types, which enables enforcement of a single synchronization protocol for all operations on an object.

The proposal defines the types as POD structs and the core functions on pointers to those structs, so that types and core functions are usable from both C and C++. That is, a header included from both C and C++ can declare a variable of an atomic type and provide inline functions that operate on them. The proposal additionally provides member operators so that C++ programmers may use a more concise syntax.

The proposal defines the core functions as overloaded functions in C++ and as type-generic macros in C. This approach helps programmers avoid changing code when an atomic type changes size.

The proposal defines additional template types to aid type-safe use of atomic pointers and to simplify the wrapping of user-defined types.

The proposal defines feature queries to determine whether or not a type's operations are lock-free. In some cases, both the decision to use a lock-free algorithm, and sometimes the choice of lock-free algorithm, depends on the availability of underlying hardware primitives. In other cases, e.g. when dealing with asynchronous signals, it may be important to know that operations like compare-and-swap are really lock-free, because a lock-based emulation might result in deadlock. We provide two kinds of feature queries, compile-time preprocessor macros and run-time functions. The former provide general characteristics of the implementation. The latter provide per-type information.

To facilitate inter-process communication via shared memory, it is our intent that lock-free operations also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation may not depend on any per-process state. While such a definition is beyond the scope of the standard, a clear statement of our intent will enable a portable expression of class of a programs already extant.

Memory Model Interaction

Synchronization operations in the memory model (N2052, N2138) may be either acquire or release operations, or both. These operations govern the communication of non-atomic stores between threads. A release operation ensures that prior memory operations become visible to the thread performing subsequent acquire operation on the same object.

Rather than have atomic operations implicitly provide both acquire and release semantics, and thus be fully ordered, we choose to complicate the interface by adding explicit ordering specifications to various operations. Many comparable packages do not, and instead provide only a single version of operations, like compare-and-swap, which implicitly include a full memory fence.

Unfortunately, the extra ordering constraint introduced by the single version is almost never completely necessary. For example, an atomic increment operation may be used simply to count the number of times a function is called, as in a profiler. This requires atomicity, but no ordering constraints. And on many architectures (PowerPC, ARM, Itanium, Alpha, though not X86), the extra ordering constraints are at least moderately expensive.

It is also unclear how a convention requiring full memory fences would properly interact with an interface that supported simple atomic loads and stores. Here a full memory fence would generally multiply the cost by a large factor. (The current gcc interface does not seem to support simple atomic loads and stores explicitly, which makes it unclear how to support e.g. lock-based emulation, or architectures on which the relevant loads and stores are not implicitly atomic.)

There are two possible approaches to specifying ordering constraints:

Both approaches appear to have their merits. We chose the latter for several reasons:

Some architectures provide fences that are limited to loads or stores. We have, so far, not included them, since it seems to be hard to find cases in which both:

Most architectures provide additional ordering guarantees if one memory operation is dependent on another. In fact, these are critical for efficient implementation of languages like Java. They are not reflected in our proposal (except for the "inter-thread-ordered-before" implications from the memory model).

In this case, there is near-universal agreement that it would be nice to have some such guarantees. The difficulty is that we have not been able to formulate such a guarantee that both makes sense at the C++ source level, and does not interfere with optimization of code in compilation units that do not mention atomics. The fundamental issues are:

Summary of Types and Operations

The proposal includes a very simple atomic flag type, providing two operations, test-and-set-acquire and clear-release. This type is the minimum hardware-implemented type needed to conform to this standard. The remaining types can be emulated with the atomic flag, though with less than ideal properties. Few programmers should be using this type.

The proposal includes several standard scalar atomic types, boolean, address, and many integral types. To support generic data and algorithms, the proposal also includes a generic atomic type, a partial specialization for pointers (based on the atomic address), and full specializations for integral types (based on the atomic scalars).

The proposal defines several atomic functions, which serve the requirements of both C and C++ programmers. Because these functions are clumsy and potentially unsafe, the proposal includes member operator and member function definitions that are syntactically simpler and semantically safer. In particular, they provide only the strongest memory synchronization.

functions methods operators
load store swp cmp
swp
fetch and swp cmp
swp
cvt = ++ -- += -= &= |= ^=
add sub and ior xor
type        
boolean Y Y Y Y - - - - - Y Y Y Y - - - - - - -
<generic> Y Y Y Y - - - - - Y Y Y Y - - - - - - -
address Y Y Y Y Y Y - - - Y Y Y Y - - Y Y - - -
<pointer> Y Y Y Y Y Y - - - Y Y Y Y Y Y Y Y - - -
integral Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
memory        
relaxed Y Y Y Y Y Y Y Y Y - - - - - - - - - - -
acquire Y - Y Y Y Y Y Y Y - - Y - - - - - - - -
release - Y Y Y Y Y Y Y Y - - - - - - - - - - -
ordered - Y Y Y Y Y Y Y Y Y Y - Y Y Y Y Y Y Y Y

The fetch-and-operation functions return the original stored value. This approach is required for fetch-and-inclusive-or and fetch-and-and because there is no means to compute to the original stored value. In contrast to the core functions, the modifying assignment operators return the new value. We do this for consistency with normal assignment operators. Unlike normal assignment operators, though, the atomic assignments return values rather than references. The reason is that another thread might intervene between an assignment and a subsequent read. Rather than introduce this classic parallel programming bug, we return a value.

The functions and operations are defined to work with volatile objects, so that variables that should be volatile can also be atomic. The volatile qualifier, however, is not required for atomicity.

The normal signed integral addition operations have undefined behavior in the event of an overflow. For atomic variables, this undefined behavior introduces significant burden. Rather than leave overflow undefined, we recognize the defacto behavior of modern systems and define atomic fetch-and-add (-sub) to use two's-complement arithmetic. We are aware of no implementation of C++ for which this definition is a problem.

Implementability

We believe that there is ample evidence for implementatability.

The Intel/gcc __sync intrinsics provide evidence for compiler implementability of the proposed interface.

(We do not advocate standardizing these intrisics as is. They provide far less control over memory ordering than we advocated above. For example, they provide no way to atomically increment a counter without imposing unnecessary ordering constraints. The lack of appropriate ordering control appears to already have resulted in implementation shortcuts, e.g. gcc does not implement __sync_synchronize() as a full memory barrier on X86, in spite of the documentation. We believe a number of issues were not fully understood when that design was developed, and it could could greatly benefit from another iteration at this stage.)

Other packages, particularly Boehm's atomic_ops package provide evidence of efficient implementability over a range of architectures.

This proposal includes atomic integers smaller than a machine word, even though many architectures do not have such operations. For machines that implement a word-based compare-and-swap operation, the effect of operations can be achieved by loading the containing word, modifying the sub-word in place, and performing a compare-and-swap on the containing word. In the event that no compare-and-swap is available, the implementation may need to either implement smaller types with larger types or use locking algorithms.

The remaining issue is the burden on implementors to produce a minimally conforming implementation. The minimum hardware support required is the atomic test-and-set and clear operations that form the basis of the atomic flag type. This proposal includes an example implementation based on that minimal hardware support, and thus shows the vendor work required.

Current Weakness

The POD definition of atomic types implies compiler-defined copy-assignment operators that are not atomic and should not be used.

Defining a private copy-assignment operator or defining a correct copy-assignment operator, under the 1998 language, loses the C compatibility guarantee because the type would no longer be a POD. This consequence is stronger than necessary because adding an assignment operator implies no change in representation. N2102 addresses this issue. Along those same lines, but unaddressed by N2102, is that defining an assignment operator loses the aggregate initialization syntax, which is necessary for C static initialization. N2100 may address this issue in its final form.

Likewise, the specializations of the generic template are derived classes, and as such are no longer PODs and no longer statically initializable. Static initialization is extremely important for concurrent programming as dynamic initialization tends to introduce race conditions. Again, N2102 addresses this issue.

While the interface does not define the name of fields, such a name could be determined by reading the header file. Adding a private label would disable PODness. Again, N2102 addresses this issue.

Remaining Issues

There is a general feeling that the fully ordered operations should yield a total store order. At present, it is unclear that hardware will support this definition. This property is also not emergent from an operation having both acquire and release semantics.

This proposal uses the term "relaxed" when an atomic operation does not have acquire, release, or ordered semantics. The term is not well-liked, but alternatives have problems. Suggestions are welcome.

The proposal does not standardize any flag wait functions, despite using one. This issue as that any function will often be inappropriate. Should the proposal define a set of wait functions for common situations? Example situations include a non-preemptive kernel execution and preemptive space execution.

There are 470 functions and 206 methods defined by this proposal. While the methods generally have trival implementations, the functions do not. Will vendors commit to implementing this large a definition? Note that both the prototype presented here and the atomic_ops package provide moderately compact implementations, in spite of the interface sizes.

The names of some functions are fairly verbose. Some shortening is available, though care must be taken to avoid inviting novices to use what they do not understand.

We have not implemented Herb Sutter's suggestion to put the ordered operations in their own sub-namespace. Given the verboseness of the current names, it seems redundant.

Notes on the Presentation

The proposal marks the defined interface with the <code> font tag, which typically renders in a teletype font.

The proposal marks the example implemenation with the <var> font tag within the <code> font tag, which typically renders in an italic teletype font. This example implementation is not part of the standard; it is evidence of implementability.

The embedded source is a bash script that generates the C and C++ source files. We have taken this approach because the definitions have a high degree of redundancy, which would otherwise interfere with the readability of the document.

To extract the bash script from the HTML source, use the following sed script. (The bash script will also generate the sed script.)


echo n2145.sed
cat <<EOF >n2145.sed

1,/<code>/		d
/<\/code>/,/<code>/	d
			s|<var>||g
			s|</var>||g
			s|&lt;|<|g
			s|&gt;|>|g
			s|&amp;|\&|g

EOF

To compile the enclosed sources and examples, use the following Makefile. (The bash script will also generate the Makefile.)


echo Makefile
cat <<EOF >Makefile

default : test

n2145.bash : n2145.html
	sed -f n2145.sed n2145.html > n2145.bash

stdatomic.h cstdatomic impatomic.h impatomic.c n2145.c : n2145.bash
	bash n2145.bash

impatomic.o : impatomic.h impatomic.c
	gcc -std=c99 -c impatomic.c

n2145.c.exe : n2145.c stdatomic.h impatomic.o
	gcc -std=c99 -o n2145.c.exe n2145.c impatomic.o

n2145.c++.exe : n2145.c stdatomic.h impatomic.o
	g++ -o n2145.c++.exe n2145.c impatomic.o

test : n2145.c.exe n2145.c++.exe

clean :
	rm -f n2145.bash stdatomic.h cstdatomic impatomic.h impatomic.c
	rm -f impatomic.o n2145.c.exe n2145.c++.exe

EOF

Interface and Implementation

This proposal embeds both the interface and an example, minimally-conforming, implementation. The implementation uses a hash table of flags and does busy waiting on the flags.

Implementation Files

As is common practice, we place the common portions of the C and C++ standard headers in a separate implementation header.

The implementation header includes standard headers to obtain basic typedefs.


echo impatomic.h includes
cat <<EOF >impatomic.h

#ifdef __cplusplus
#include <cstddef>
namespace std {
#else
#include <stddef.h>
#include <stdbool.h>
#endif

EOF

The corresponding implementation source includes the implementation header and stdint.h.


echo impatomic.c includes
cat <<EOF >impatomic.c

#include <stdint.h>
#include "impatomic.h"

EOF

Flag Type and Operations

The atomic flag is the minimum hardware-implemented type needed to conform to this standard. The remaining types can be emulated with the atomic flag, though with less than ideal properties.

To aid the emulated implementation, the example implementation includes a predefined hashtable of locks implemented via flags.


echo impatomic.h flag
cat <<EOF >>impatomic.h

typedef struct atomic_flag
{
#ifdef __cplusplus
	bool test_and_set() volatile;
	void clear() volatile;
#endif
	bool __f__;
} atomic_flag;

#define ATOMIC_FLAG_INIT { false }

#ifdef __cplusplus
extern "C" {
#endif

extern void atomic_flag_clear_release( atomic_flag volatile * __a__ );
extern bool atomic_flag_test_and_set_acquire( atomic_flag volatile * __a__ );
extern void __atomic_flag_wait_acquire__( atomic_flag volatile * __a__ );
extern atomic_flag volatile * __atomic_flag_for_address__( void volatile * __z__ )
__attribute__((const));

#ifdef __cplusplus
}
#endif

#ifdef __cplusplus

inline bool atomic_flag::test_and_set() volatile
{ return atomic_flag_test_and_set_acquire( this ); }

inline void atomic_flag::clear() volatile
{ atomic_flag_clear_release( this ); }

#endif

EOF

The clear and test-and-set operations must be lock-free, and hence address-free. The wait operation may be implemented with busy-waiting, and hence must be used only with care.

The for_address function returns the address of a flag. Multiple argument values may yield a single flag, and the implementation of locks may use these flags, so no operation should attempt to hold any flag or lock while holding a flag for an address.

The prototype implementation of flags uses the __sync macros from the GNU C/C++ compiler, when available, and otherwise uses a non-atomic implementation with the expectation that vendors will replace it. It might even be implemented with, for example, pthread_mutex_trylock, in which case the internal flag wait function might just be pthread_mutex_lock. This would of course tend to make atomic_flag larger than necessary.

The prototype implementation of flags is implemented in C.


echo impatomic.c flag
cat <<EOF >>impatomic.c

#if defined(__GNUC__)
#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 0)
#define USE_SYNC
#endif
#endif

void atomic_flag_clear_release( atomic_flag volatile * __a__ )
{
#ifdef USE_SYNC
    __sync_lock_release( &(__a__->__f__) );
#else
    __a__->__f__ = false;
#endif
} 

bool atomic_flag_test_and_set_acquire( atomic_flag volatile * __a__ )
{
#ifdef USE_SYNC
    return __sync_lock_test_and_set( &(__a__->__f__), 1 );
#else
    bool result = __a__->__f__;
    __a__->__f__ = true;
    return result;
#endif
}

Note that the following implementation of wait is almost always wrong -- it has high contention. Some form of exponential backoff prevents excessive contention.


void __atomic_flag_wait_acquire__( atomic_flag volatile * __a__ )
{ while ( atomic_flag_test_and_set_acquire( __a__ ) ); }

#define LOGSIZE 4

static atomic_flag volatile __atomic_flag_anon_table__[ 1 << LOGSIZE ] =
{
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT,
};

atomic_flag volatile * __atomic_flag_for_address__( void volatile * __z__ )
{
    uintptr_t __u__ = (uintptr_t)__z__;
    __u__ += (__u__ >> 2) + (__u__ << 4);
    __u__ += (__u__ >> 7) + (__u__ << 5);
    __u__ += (__u__ >> 17) + (__u__ << 13);
    if ( sizeof(uintptr_t) > 4 ) __u__ += (__u__ >> 31);
    __u__ &= ~((~(uintptr_t)0) << LOGSIZE);
    return __atomic_flag_anon_table__ + __u__;
}

EOF

Implementation Macros

The remainder of the example implementation uses the following macros. These macros exploit GNU extensions for value-returning blocks (AKA statement expressions) and __typeof__.

The macros rely on data fields of atomic structs being named __f__.


echo impatomic.h macros implementation
cat <<EOF >>impatomic.h

#define _ATOMIC_LOAD_( __a__ ) \\
({ __typeof__((__a__)->__f__) volatile * __p__ = &((__a__)->__f__); \\
   atomic_flag volatile * __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_acquire__( __g__ ); \\
   __typeof__((__a__)->__f__) __r__ = *__p__; \\
   atomic_flag_clear_release( __g__ ); \\
   __r__; })

#define _ATOMIC_STORE_( __a__, __m__ ) \\
({ __typeof__((__a__)->__f__) volatile * __p__ = &((__a__)->__f__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   atomic_flag volatile * __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_acquire__( __g__ ); \\
   *__p__ = __v__; \\
   atomic_flag_clear_release( __g__ ); \\
   __v__; })

#define _ATOMIC_MODIFY_( __a__, __o__, __m__ ) \\
({ __typeof__((__a__)->__f__) volatile * __p__ = &((__a__)->__f__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   atomic_flag volatile * __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_acquire__( __g__ ); \\
   __typeof__((__a__)->__f__) __r__ = *__p__; \\
   *__p__ __o__ __v__; \\
   atomic_flag_clear_release( __g__ ); \\
   __r__; })

#define _ATOMIC_CMPSWP_( __a__, __e__, __m__ ) \\
({ __typeof__((__a__)->__f__) volatile * __p__ = &((__a__)->__f__); \\
   __typeof__(__e__) __q__ = (__e__); \\
   __typeof__(__m__) __v__ = (__m__); \\
   bool __r__; \\
   atomic_flag volatile * __g__ = __atomic_flag_for_address__( __p__ ); \\
   __atomic_flag_wait_acquire__( __g__ ); \\
   __typeof__((__a__)->__f__) __t__ = *__p__; \\
   if ( __t__ == *__q__ ) { *__p__ = __v__; __r__ = true; } \\
   else { *__q__ = __t__; __r__ = false; } \\
   atomic_flag_clear_release( __g__ ); \\
   __r__; })

EOF

Feature Macros

To facilitate optimal storage use, the proposal supplies a feature macro that indicates general lock-free status of scalar atomic types. A value of 0 indicates that the scalar types are never lock-free. A value of 1 indicates that the scalar types are sometimes lock-free. A value of 2 indicates that the scalar types are always lock-free.


echo impatomic.h macros feature
cat <<EOF >>impatomic.h

#define ATOMIC_SCALAR_LOCK_FREE 0

EOF

(This could be two boolean macros. Comments?)

Types

The standard defines atomic types corresponding to booleans, addresses, integers, and for C++, wider characters. These atomic types are defined in terms of a base type.

The base types have two names in this proposal, a short name usually embedded within other identifiers, and a long name for the base type. The mapping between them is as follows.


bool="bool"
address="void *"

INTEGERS="char schar uchar short ushort int uint long ulong llong ullong"
char="char"
schar="signed char"
uchar="unsigned char"
short="short"
ushort="unsigned short"
int="int"
uint="unsigned int"
long="long"
ulong="unsigned long"
llong="long long"
ullong="unsigned long long"

CHARACTERS="wchar_t"
wchar_t="wchar_t"

In addition to types, some operations also need two names, one for embedding within other identifiers, and one consisting of the operator.


ADR_OPERATIONS="add sub"
INT_OPERATIONS="add sub and ior xor"
add="+"
sub="-"
and="&"
ior="|"
xor="^"

Boolean


echo impatomic.h type boolean
cat <<EOF >>impatomic.h

typedef struct atomic_bool
{
#ifdef __cplusplus
	bool lock_free() volatile;
	bool operator =( bool __v__ ) volatile;
	operator bool() volatile;
	bool swap( bool __v__ ) volatile;
	bool compare_swap( bool * __r__, bool __v__ ) volatile;
#endif
	bool __f__;
} atomic_bool;

EOF

Address


echo impatomic.h type address
cat <<EOF >>impatomic.h

typedef struct atomic_address
{
#ifdef __cplusplus
	bool lock_free() volatile;
	void * operator =( void * __v__ ) volatile;
	operator void *() volatile;
	void * operator +=( ptrdiff_t __v__ ) volatile;
	void * operator -=( ptrdiff_t __v__ ) volatile;
	void * swap( void * __v__ ) volatile;
	bool compare_swap( void * * __r__, void * __v__ ) volatile;
#endif
	void * __f__;
} atomic_address;

EOF

Integers


echo impatomic.h type integers
for TYPEKEY in ${INTEGERS}
do
TYPENAME=${!TYPEKEY}
cat <<EOF >>impatomic.h

typedef struct atomic_${TYPEKEY}
{
#ifdef __cplusplus
	bool lock_free() volatile;
	${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile;
	operator ${TYPENAME}() volatile;
	${TYPENAME} operator ++( int ) volatile;
	${TYPENAME} operator --( int ) volatile;
	${TYPENAME} operator ++() volatile;
	${TYPENAME} operator --() volatile;
	${TYPENAME} operator +=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator -=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator &=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator |=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator ^=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} swap( ${TYPENAME} __v__ ) volatile;
	bool compare_swap( ${TYPENAME} * __r__, ${TYPENAME} __v__ ) volatile;
#endif
	${TYPENAME} __f__;
} atomic_${TYPEKEY};

EOF
done

Integer Typedefs

The following typedefs support atomic versons of the cstdint and stdint.h types.


echo impatomic.h typedefs integers
cat <<EOF >>impatomic.h

typedef atomic_schar atomic_int_least8_t;
typedef atomic_uchar atomic_uint_least8_t;
typedef atomic_short atomic_int_least16_t;
typedef atomic_ushort atomic_uint_least16_t;
typedef atomic_int atomic_int_least32_t;
typedef atomic_uint atomic_uint_least32_t;
typedef atomic_llong atomic_int_least64_t;
typedef atomic_ullong atomic_uint_least64_t;

typedef atomic_schar atomic_int_fast8_t;
typedef atomic_uchar atomic_uint_fast8_t;
typedef atomic_short atomic_int_fast16_t;
typedef atomic_ushort atomic_uint_fast16_t;
typedef atomic_int atomic_int_fast32_t;
typedef atomic_uint atomic_uint_fast32_t;
typedef atomic_llong atomic_int_fast64_t;
typedef atomic_ullong atomic_uint_fast64_t;

typedef atomic_long atomic_intptr_t;
typedef atomic_ulong atomic_uintptr_t;

typedef atomic_long atomic_ssize_t;
typedef atomic_ulong atomic_size_t;

typedef atomic_long atomic_ptrdiff_t;

typedef atomic_llong atomic_intmax_t;
typedef atomic_ullong atomic_uintmax_t;

EOF

Characters


echo impatomic.h type characters
cat <<EOF >>impatomic.h

#ifdef __cplusplus

EOF

for TYPEKEY in ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat <<EOF >>impatomic.h

typedef struct atomic_${TYPEKEY}
{
#ifdef __cplusplus
	bool lock_free() volatile;
	${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile;
	operator ${TYPENAME}() volatile;
	${TYPENAME} operator ++( int ) volatile;
	${TYPENAME} operator --( int ) volatile;
	${TYPENAME} operator ++() volatile;
	${TYPENAME} operator --() volatile;
	${TYPENAME} operator +=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator -=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator &=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator |=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} operator ^=( ${TYPENAME} __v__ ) volatile;
	${TYPENAME} swap( ${TYPENAME} __v__ ) volatile;
	bool compare_swap( ${TYPENAME} * __r__, ${TYPENAME} __v__ ) volatile;
#endif
	${TYPENAME} __f__;
} atomic_${TYPEKEY};

EOF
done

cat <<EOF >>impatomic.h

#else

typedef atomic_int atomic_wchar_t;

#endif

EOF

Generic Type

The primary problem with a generic atomic template is that effective use of machine operations requires three properties of their parameter types:

These properties are subsumed by the 1998 definition of POD, but that definition is more restrictive than required.

In the present definition, there is no mechanism to enforce the properties parameter type. Roland Schwarz suggests using a template union to enfore POD parameter types. Unfortunately, that approach also prevents the derivation of specializations of atomic for the types above, which is unacceptable.

The intent is that vendors will specialize a fully-general locking implementation of a generic atomic template with implementations using hardware primitives when those primitives are applicable. This specialization can be accomplished by defining a base template with the size and alignment of the parameter type as additional template parameters, and then specializing on those two arguments.


echo impatomic.h type generic
cat <<EOF >>impatomic.h

#ifdef __cplusplus

template< typename T > struct atomic
{
	bool lock_free() volatile;
	operator T () volatile;
	T operator =( T __v__ ) volatile;
	T swap( T __v__ ) volatile;
	bool compare_swap( T * __r__, T __v__ ) volatile;
private:
	T __f__;
};

#endif
EOF

Pointer Partial Specialization


echo impatomic.h type pointer
cat <<EOF >>impatomic.h

#ifdef __cplusplus

template< typename T > struct atomic< T * > : atomic_address
{
	bool lock_free() volatile;
	operator T *() volatile;
	T operator =( T * __v__ ) volatile;
	T * operator ++( int ) volatile;
	T * operator --( int ) volatile;
	T * operator ++() volatile;
	T * operator --() volatile;
	T * operator +=( ptrdiff_t __v__ ) volatile;
	T * operator -=( ptrdiff_t __v__ ) volatile;
	T * swap( T * __v__ ) volatile;
	bool compare_swap( T * * __r__, T * __v__ ) volatile;
};

#endif
EOF

Full Specializations

We provide full specializations of the generic atomic for integers and characters. These specializations derive from the specific atomic types to enable implicit reference conversions. The implicit assignment of the derived class prevents inheriting the base class assignments, and so the assignment must be explicitly redeclared.


echo impatomic.h type specializations
cat <<EOF >>impatomic.h

#ifdef __cplusplus

EOF

for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat <<EOF >>impatomic.h

template<> struct atomic< ${TYPENAME} > : atomic_${TYPEKEY}
{
	${TYPENAME} operator =( ${TYPENAME} __v__ ) volatile;
};

EOF
done

cat <<EOF >>impatomic.h

#endif

EOF

Core Functions

The following functions provide the core interface. They are load, store, swap, compare-and-swap, fetch-and-add, fetch-and-subtract, fetch-and-and, fetch-and-inclusive-or, and fetch-and-exclusive-or. These operations are further specialized for the various synchronization operations that apply.

In C++, these operations are implemented as overloaded functions.


cat <<EOF >>impatomic.h

#ifdef __cplusplus

EOF

echo impatomic.h functions ordinary basic
for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat <<EOF >>impatomic.h

inline bool atomic_lock_free( atomic_${TYPEKEY} volatile * __a__ )
{ return false; }

EOF

for READ in relaxed acquire
do

cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_load_${READ}( atomic_${TYPEKEY} volatile * __a__ )
{ return _ATOMIC_LOAD_( __a__ ); }

EOF
done

for WRITE in relaxed release ordered 
do
cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_store_${WRITE}
( atomic_${TYPEKEY} volatile * __a__, ${TYPENAME} __m__ )
{ return _ATOMIC_STORE_( __a__, __m__ ); }

EOF
done

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_swap_${MODIFY}
( atomic_${TYPEKEY} volatile * __a__, ${TYPENAME} __m__ )
{ return _ATOMIC_MODIFY_( __a__, =, __m__ ); }

inline bool atomic_compare_swap_${MODIFY}
( atomic_${TYPEKEY} volatile * __a__, ${TYPENAME} * __e__, ${TYPENAME} __m__ )
{ return _ATOMIC_CMPSWP_( __a__, __e__, __m__ ); }

EOF
done
done

echo impatomic.h functions template basic
cat <<EOF >>impatomic.h

template< typename T >
inline bool atomic_lock_free( atomic<T> volatile * __a__ )
{ return false; }

EOF

for READ in relaxed acquire
do

cat <<EOF >>impatomic.h

template< typename T >
inline T atomic_load_${READ}( atomic<T> volatile * __a__ )
{ return _ATOMIC_LOAD_( __a__ ); }

template< typename T >
inline T * atomic_load_${READ}( atomic<T *> volatile * __a__ )
{ return static_cast<T *>( _ATOMIC_LOAD_( __a__ ) ); }

EOF
done

for WRITE in relaxed release ordered 
do
cat <<EOF >>impatomic.h

template< typename T >
inline T atomic_store_${WRITE}
( atomic<T> volatile * __a__, T __m__ )
{ return _ATOMIC_STORE_( __a__, __m__ ); }

EOF
done

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

template< typename T >
inline T atomic_swap_${MODIFY}
( atomic<T> volatile * __a__, T __m__ )
{ return _ATOMIC_MODIFY_( __a__, =, __m__ ); }

template< typename T >
inline T * atomic_swap_${MODIFY}
( atomic<T *> volatile * __a__, T * __m__ )
{ return _ATOMIC_MODIFY_( __a__, =, __m__ ); }

template< typename T >
inline bool atomic_compare_swap_${MODIFY}
( atomic<T> volatile * __a__, T * __e__, T __m__ )
{ return _ATOMIC_CMPSWP_( __a__, __e__, __m__ ); }

template< typename T >
inline bool atomic_compare_swap_${MODIFY}
( atomic<T *> volatile * __a__, T * * __e__, T * __m__ )
{ return _ATOMIC_CMPSWP_( __a__, (void * * )__e__, __m__ ); }

EOF
done

echo impatomic.h functions address fetch
TYPEKEY=address
TYPENAME=${!TYPEKEY}

for FNKEY in ${ADR_OPERATIONS}
do
OPERATOR=${!FNKEY}

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_fetch_${FNKEY}_${MODIFY}
( atomic_${TYPEKEY} volatile * __a__, ptrdiff_t __m__ )
{ ${TYPENAME} volatile * __p__ = &((__a__)->__f__);
  atomic_flag volatile * __g__ = __atomic_flag_for_address__( __p__ );
  __atomic_flag_wait_acquire__( __g__ );
  ${TYPENAME} __r__ = *__p__;
  *__p__ = (${TYPENAME})((char *)(*__p__) ${OPERATOR} __m__);
  atomic_flag_clear_release( __g__ );
  return __r__; }

EOF
done
done

echo impatomic.h functions integer fetch
for TYPEKEY in ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_fetch_${FNKEY}_${MODIFY}
( atomic_${TYPEKEY} volatile * __a__, ${TYPENAME} __m__ )
{ return _ATOMIC_MODIFY_( __a__, ${OPERATOR}=, __m__ ); }

EOF
done
done
done

For C, we need type-generic macros.


cat <<EOF >>impatomic.h

#else

EOF

echo impatomic.h type-generic macros basic
cat <<EOF >>impatomic.h

#define atomic_lock_free( __a__ ) \\
false

EOF

for READ in relaxed acquire
do
cat <<EOF >>impatomic.h

#define atomic_load_${READ}( __a__ ) \\
_ATOMIC_LOAD_( __a__ )

EOF
done

for WRITE in relaxed release ordered 
do
cat <<EOF >>impatomic.h

#define atomic_store_${WRITE}( __a__, __m__ ) \\
_ATOMIC_STORE_( __a__, __m__ )

EOF
done

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

#define atomic_swap_${MODIFY}( __a__, __m__ ) \\
_ATOMIC_MODIFY_( __a__, =, __m__ )

#define atomic_compare_swap_${MODIFY}( __a__, __e__, __m__ ) \\
_ATOMIC_CMPSWP_( __a__, __e__, __m__ )

EOF
done

echo impatomic.h type-generic macros fetch
for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

for MODIFY in relaxed acquire release ordered 
do
cat <<EOF >>impatomic.h

#define atomic_fetch_${FNKEY}_${MODIFY}( __a__, __m__ ) \\
_ATOMIC_MODIFY_( __a__, ${OPERATOR}=, __m__ )

EOF
done
done

cat <<EOF >>impatomic.h

#endif

EOF

Operators and Methods

The core functions are difficult to use, and so the proposal includes member operator and member function definitions that are syntactically simpler and semantically safer.


cat <<EOF >>impatomic.h

#ifdef __cplusplus

EOF

echo impatomic.h methods ordinary basic
for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

cat <<EOF >>impatomic.h

inline bool atomic_${TYPEKEY}::lock_free() volatile
{ return false; }

inline atomic_${TYPEKEY}::operator ${TYPENAME}() volatile
{ return atomic_load_acquire( this ); }

inline ${TYPENAME} atomic_${TYPEKEY}::operator =
( ${TYPENAME} __m__ ) volatile
{ return atomic_store_ordered( this, __m__ ); }

inline ${TYPENAME} atomic_${TYPEKEY}::swap
( ${TYPENAME} __m__ ) volatile
{ return atomic_swap_ordered( this, __m__ ); }

inline bool atomic_${TYPEKEY}::compare_swap
( ${TYPENAME} * __e__, ${TYPENAME} __m__ ) volatile
{ return atomic_compare_swap_ordered( this, __e__, __m__ ); }

EOF
done

echo impatomic.h methods template basic
cat <<EOF >>impatomic.h

template< typename T >
inline bool atomic<T>::lock_free() volatile
{ return false; }

template< typename T >
inline atomic<T>::operator T () volatile
{ return atomic_load_acquire( this ); }

template< typename T >
inline T atomic<T>::operator =( T __v__ ) volatile
{ return atomic_store_ordered( this, __v__ ); }

template< typename T >
inline T atomic<T>::swap( T __v__ ) volatile
{ return atomic_swap_ordered( this, __v__ ); }

template< typename T >
inline bool atomic<T>::compare_swap( T * __r__, T __v__ ) volatile
{ return atomic_compare_swap_ordered( this, __r__, __v__ ); }

template< typename T >
inline bool atomic<T *>::lock_free() volatile
{ return false; }

template< typename T >
inline atomic<T *>::operator T *() volatile
{ return atomic_load_acquire( this ); }

template< typename T >
inline T atomic<T *>::operator =( T * __v__ ) volatile
{ return atomic_store_ordered( this, __v__ ); }

template< typename T >
inline T * atomic<T *>::swap( T * __v__ ) volatile
{ return atomic_swap_ordered( this, __v__ ); }

template< typename T >
inline bool atomic<T *>::compare_swap( T * * __r__, T * __v__ ) volatile
{ return atomic_compare_swap_ordered( this, __r__, __v__ ); }

EOF

echo impatomic.h methods address fetch
TYPEKEY=address
TYPENAME=${!TYPEKEY}

for FNKEY in ${ADR_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_${TYPEKEY}::operator ${OPERATOR}=
( ptrdiff_t __m__ ) volatile
{ return reinterpret_cast<${TYPENAME}>(
 reinterpret_cast<char*>( atomic_fetch_${FNKEY}_ordered( this, __m__ ) )
 ${OPERATOR} __m__ ); }

EOF
done

echo impatomic.h methods pointer fetch
cat <<EOF >>impatomic.h

template< typename T > T * atomic<T *>::operator ++( int ) volatile
{ return atomic_fetch_add_ordered( this, sizeof(T) ); }

template< typename T > T * atomic<T *>::operator --( int ) volatile
{ return atomic_fetch_sub_ordered( this, sizeof(T) ); }

template< typename T > T * atomic<T *>::operator ++() volatile
{ return atomic_fetch_add_ordered( this, sizeof(T) ) + sizeof(T); }

template< typename T > T * atomic<T *>::operator --() volatile
{ return atomic_fetch_sub_ordered( this, sizeof(T) ) - sizeof(T); }

template< typename T >
T * atomic<T *>::operator +=( ptrdiff_t __v__ ) volatile
{ return atomic_fetch_add_ordered( this, sizeof(T) * __v__ ) + sizeof(T) * __v__; }

template< typename T >
T * atomic<T *>::operator -=( ptrdiff_t __v__ ) volatile
{ return atomic_fetch_sub_ordered( this, sizeof(T) * __v__ ) - sizeof(T) * __v__; }

EOF

echo impatomic.h methods integer character fetch
for TYPEKEY in ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}

cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_${TYPEKEY}::operator ++( int ) volatile
{ return atomic_fetch_add_ordered( this, 1 ); }

inline ${TYPENAME} atomic_${TYPEKEY}::operator --( int ) volatile
{ return atomic_fetch_sub_ordered( this, 1 ); }

inline ${TYPENAME} atomic_${TYPEKEY}::operator ++() volatile
{ return atomic_fetch_add_ordered( this, 1 ) + 1; }

inline ${TYPENAME} atomic_${TYPEKEY}::operator --() volatile
{ return atomic_fetch_sub_ordered( this, 1 ) - 1; }

EOF

for FNKEY in ${INT_OPERATIONS}
do
OPERATOR=${!FNKEY}

cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic_${TYPEKEY}::operator ${OPERATOR}=
( ${TYPENAME} __m__ ) volatile
{ return atomic_fetch_${FNKEY}_ordered( this, __m__ ) ${OPERATOR} __m__; }

EOF
done
done

echo impatomic.h specialization assignments
for TYPEKEY in bool address ${INTEGERS} ${CHARACTERS}
do
TYPENAME=${!TYPEKEY}
cat <<EOF >>impatomic.h

inline ${TYPENAME} atomic< ${TYPENAME} >::operator =
( ${TYPENAME} __v__ ) volatile
{ return atomic_${TYPEKEY}::operator =( __v__ ); }

EOF
done

cat <<EOF >>impatomic.h

#endif

EOF

Implementation Header Cleanup


echo impatomic.h close namespace
cat <<EOF >>impatomic.h

#ifdef __cplusplus
} // namespace std
#endif

EOF

Standard Headers

The C standard header.


echo stdatomic.h
cat <<EOF >stdatomic.h

#include "impatomic.h"

#ifdef __cplusplus

EOF

for TYPEKEY in flag bool address ${INTEGERS} ${CHARACTERS}
do
cat <<EOF >>stdatomic.h

using std::atomic_${TYPEKEY};

EOF
done

cat <<EOF >>stdatomic.h

using std::atomic;

#endif

EOF

The C++ standard header.


echo cstdatomic
cat <<EOF >cstdatomic

#include "impatomic.h"

EOF

Examples of Use

The following program shows example uses of the atomic types. These examples also serve as tests for the interface definition.


echo n2145.c include
cat <<EOF >n2145.c

#include "stdatomic.h"

EOF

Atomic Flag


echo n2145.c flag
cat <<EOF >>n2145.c

atomic_flag af = ATOMIC_FLAG_INIT;

void flag_example( void )
{
	if ( ! atomic_flag_test_and_set_acquire( & af ) )
		atomic_flag_clear_release( & af );
#ifdef __cplusplus
	if ( ! af.test_and_set() )
		af.clear();
#endif
}

EOF

Lazy Initialization

For lazy initialization, a thread that does not do initialization may need to wait on the thread that does. (Lazy initialization is similar to double-checked locking.) For this example, we busy wait on a boolean. Busy waiting like this is usually ill-advised, but it sufficies for the example.


echo n2145.c lazy
cat <<EOF >>n2145.c

atomic_bool lazy_ready = { false };
atomic_bool lazy_assigned = { false };
int lazy_value;

int lazy_example( void )
{
	if ( ! atomic_load_acquire( & lazy_ready ) ) {
		/* the value is not yet ready */
		if ( atomic_swap_relaxed( & lazy_assigned, true ) ) {
			/* initialization assigned to another thread; wait */
			while ( ! atomic_load_acquire( & lazy_ready ) );
		}
		else { /* initialization not assigned; this thread does it */
			lazy_value = 42;
			atomic_store_release( & lazy_ready, true );
		}
	}
	return lazy_value;
}

EOF

Atomic Integer


echo n2145.c integer
cat <<EOF >>n2145.c

atomic_ulong volatile aulv = { 0 };
atomic_ulong auln = { 1 };
#ifdef __cplusplus
atomic< unsigned long > taul; // unable to statically initialize
#endif

void integer_example( void )
{
	atomic_ulong a = { 3 };
	unsigned long x = atomic_load_acquire( & auln );
	atomic_store_release( & aulv, x );
	unsigned long y = atomic_fetch_add_relaxed( & aulv, 1 );
	unsigned long z = atomic_fetch_xor_ordered( & auln, 4 );
#ifdef __cplusplus
	x = auln;
	aulv = x;
	auln += 1;
	aulv ^= 4;
	auln = aulv; // Misuse of assignment; not atomic.
	aulv -= auln++;
	auln |= --aulv;
	aulv &= 7;
	atomic_store_release( & taul, 7 );
	x = atomic_load_acquire( & taul );
	y = atomic_fetch_add_relaxed( & taul, 1 );
	z = atomic_fetch_xor_ordered( & taul, 4 );
	x = taul;
	auln = taul; // Misuse of assignment; not atomic.
	taul = aulv; // Misuse of assignment; not atomic.
	taul = x;
	taul += 1;
	taul ^= 4;
	taul -= taul++;
	taul |= --taul;
	taul &= 7;
#endif
}

EOF

Event Counter

An event counter is not part of the communication between threads, and so it can use faster primitives.


echo n2145.c event
cat <<EOF >>n2145.c

#ifdef __cplusplus

struct event_counter
{
	void inc() { atomic_fetch_add_relaxed( &au, 1 ); }
	unsigned long get() { atomic_load_relaxed( &au ); }
	atomic_ulong au;
};
event_counter ec = { 0 };

void generate_events()
{
	ec.inc();
	ec.inc();
	ec.inc();
}

int read_events()
{
	return ec.get();
}

int event_example()
{
	generate_events(); // possibly in multiple threads
	// join all other threads, ensuring that final value is written
	return read_events();
}

#endif

EOF

An important point here is that this is safe, and we are guaranteed to see exactly the final value, because the thread joins force the necessary ordering between the inc calls and the get call.

List Insert

The insertion into a shared linked list can be accomplished in a lock-free manner with compare-and-swap, provided that compare-and-swap is lock-free. (Note that adding a correct "remove" operation without a garbage collector is harder than it seems.)


echo n2145.c list
cat <<EOF >>n2145.c

#ifdef __cplusplus

struct data;
struct node
{
	node * next;
	data * value;
};

atomic< node * > head; // unable to statically initialize head

void list_example( data * item )
{
	node * candidate = new node;
	candidate->value = item;
	candidate->next = head;
	while ( ! head.compare_swap( &candidate->next, candidate ) );
}

#endif

EOF

Update

The best algorithm for updating a variable may depend on whether or not atomics are lock-free. This update can be accomplished in a lock-free manner with compare-and-swap. This example uses the feature macro to generate minimal code when the lock-free status is known a priori.


echo n2145.c update
cat <<EOF >>n2145.c

#if ATOMIC_SCALAR_LOCK_FREE <= 1
atomic_flag pseudo_mutex = ATOMIC_FLAG_INIT;
unsigned long regular_variable = 1;
#endif
#if ATOMIC_SCALAR_LOCK_FREE >= 1
atomic_ulong atomic_variable = { 1 };
#endif

void update( unsigned long v )
{
#if ATOMIC_SCALAR_LOCK_FREE == 1
	if ( atomic_lock_free( &atomic_variable ) ) {
#endif
#if ATOMIC_SCALAR_LOCK_FREE != 0
		atomic_variable = v;
#endif
#if ATOMIC_SCALAR_LOCK_FREE == 1
	} else {
#endif
#if ATOMIC_SCALAR_LOCK_FREE != 2
		__atomic_flag_wait_acquire__( &pseudo_mutex );
		regular_variable = v;
		atomic_flag_clear_release( &pseudo_mutex );
#endif
#if ATOMIC_SCALAR_LOCK_FREE == 1
	}
#endif
}

EOF

Main


echo n2145.c main
cat <<EOF >>n2145.c

int main()
{
}

EOF