Proposed Text for Bidirectional Fences

Changes from N2633

Renamed atomic_memory_fence to atomic_thread_fence.
Renamed atomic_compiler_fence to atomic_signal_fence.
Defined atomic_*_fence(memory_order_consume) as an acquire fence.
Moved the portions of the text that modify synchronizes-with into 1.10 [intro.multithreaded].
Changed the release sequence definition to include relaxed read-modify-write operations. The PowerPC architecture, which was the original reason for the restriction, is now believed to propagate B-cumulativity across lwarx/stwcx loops. Relaxed RMW operations are no longer believed to break a release sequence on PowerPC.
Changed the specification of memory_order_seq_cst fences to interact properly with other memory_order_seq_cst operations.
Moved the portions of the text that describe memory_order_seq_cst fences into 29.1 [atomics.order].

Proposed Text

All edits are relative to N2691.

Chapter 1 edits

Change 1.10 [intro.multithreaded] p4 as follows:

The library defines a number of atomic operations (clause 29) and operations on locks (clause 30) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization operation on one or more memory locations is either a consume operation, an acquire operation, a release operation, or both an acquire and release operation~~, on one or more memory locations; the semantics of these are described below~~. A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence or both an acquire and release fence. In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics~~, also described below~~. [ Note: For example, a call that acquires a lock will perform an acquire operation on the locations comprising the lock. Correspondingly, a call that releases the same lock will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A. We do not include “relaxed” atomic operations as synchronization operations although, like synchronization operations, they cannot contribute to data races. —end note ]

Change 1.10 [intro.multithreaded] p6 as follows:

A release sequence on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation

is performed by the same thread that performed the release, or

is ~~a non-relaxed~~ an atomic read-modify-write operation.
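As an informal, non-normative illustration of the revised definition, the sketch below has a relaxed read-modify-write performed by a second thread continue the release sequence headed by a release store, so that an acquire load reading the RMW's value still sees the earlier ordinary write. The function and variable names are hypothetical, and the code assumes the `<atomic>` header and `std::thread` of published C++ rather than the `<cstdatomic>` of this draft.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function; not part of the proposed wording.
int read_through_relaxed_rmw() {
    int payload = 0;          // ordinary, non-atomic data
    int seen = -1;            // written by the reader, inspected after join()
    std::atomic<int> m{0};

    std::thread releaser([&] {
        payload = 42;
        m.store(1, std::memory_order_release);   // heads the release sequence
    });

    std::thread bumper([&] {
        // Relaxed read-modify-write by a different thread: changes 1 to 2.
        int expected = 1;
        while (!m.compare_exchange_weak(expected, 2, std::memory_order_relaxed)) {
            expected = 1;     // a failed attempt overwrote expected
            std::this_thread::yield();
        }
    });

    std::thread reader([&] {
        // Acquire load that reads the value written by the relaxed RMW.
        while (m.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        seen = payload;       // ordered after payload = 42 under the revised rule
    });

    releaser.join();
    bumper.join();
    reader.join();
    return seen;
}
```

Under the previous wording the relaxed compare-exchange would have ended the release sequence, and the read of `payload` would have been a data race.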

Change 1.10 [intro.multithreaded] p7 as follows:

Certain library calls synchronize with other library calls performed by another thread. In particular, an evaluation A that performs a release operation on an object M synchronizes with an evaluation B that performs an acquire operation on M and reads a value written by any side effect in the release sequence headed by A. [ Note: Except in the specified cases, reading a later value does not necessarily ensure visibility as described below. Such a requirement would sometimes interfere with efficient implementation. —end note ] [ Note: The specifications of the synchronization operations define when one reads the value written by another. For atomic variables, the definition is clear. All operations on a given lock occur in a single total order. Each lock acquisition “reads the value written” by the last lock release. —end note ]

A release fence R synchronizes with an acquire fence A if there exist evaluations X and Y such that X is sequenced after R and performs an atomic modification operation on an object M, Y is sequenced before A and performs an atomic operation on M, and Y reads the value written by X or a value written by any side effect in the release sequence X would head had it been a release operation.
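The fence-to-fence case can be sketched as follows (non-normative). The comments label the fences and atomic operations with the letters R, X, Y and A used in the paragraph above; the function name is hypothetical, and `<atomic>` is assumed in place of this draft's `<cstdatomic>`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function; both atomic accesses are relaxed and all
// ordering comes from the two fences.
int fence_to_fence() {
    int payload = 0;                  // ordinary, non-atomic data
    int seen = -1;
    std::atomic<bool> ready{false};   // the object M

    std::thread writer([&] {
        payload = 7;
        std::atomic_thread_fence(std::memory_order_release);   // R
        ready.store(true, std::memory_order_relaxed);          // X, sequenced after R
    });

    std::thread reader([&] {
        while (!ready.load(std::memory_order_relaxed)) {       // Y, reads X's value
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);   // A, sequenced after Y
        seen = payload;               // R synchronizes with A, so this reads 7
    });

    writer.join();
    reader.join();
    return seen;
}
```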

A release fence R synchronizes with an evaluation A that performs an acquire operation on an object M if there exists an evaluation X such that X is sequenced after R, X performs an atomic modification operation on M and A reads the value written by X or a value written by any side effect in the release sequence X would head had it been a release operation.
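A minimal non-normative sketch of this second case: a pointer is published with a relaxed store that follows a release fence, and the consumer uses an ordinary acquire load. The `config` type and the function name are hypothetical, and `<atomic>` is assumed in place of `<cstdatomic>`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct config { int value; };        // hypothetical payload type

// Hypothetical demo function.
int publish_after_release_fence() {
    config cfg{0};
    int seen = -1;
    std::atomic<config*> slot{nullptr};   // the object M

    std::thread writer([&] {
        cfg.value = 5;
        std::atomic_thread_fence(std::memory_order_release);   // R
        slot.store(&cfg, std::memory_order_relaxed);           // X, sequenced after R
    });

    std::thread reader([&] {
        config* p = nullptr;
        // A: acquire load that eventually reads the pointer written by X.
        while ((p = slot.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = p->value;             // ordered after cfg.value = 5
    });

    writer.join();
    reader.join();
    return seen;
}
```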

An evaluation R that is a release operation on an object M synchronizes with an acquire fence A if an evaluation Y, sequenced before A, performs an atomic operation on M and reads the value written by R, or a value written by any side effect in the release sequence headed by R.
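This third case covers a common polling pattern, sketched non-normatively below: the reader spins with relaxed loads and issues a single acquire fence only once the expected value has been observed. Names are hypothetical, and `<atomic>` is assumed in place of `<cstdatomic>`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function.
int poll_then_acquire_fence() {
    int payload = 0;                  // ordinary, non-atomic data
    int seen = -1;
    std::atomic<int> stage{0};        // the object M

    std::thread writer([&] {
        payload = 11;
        stage.store(1, std::memory_order_release);             // R, a release operation
    });

    std::thread reader([&] {
        while (stage.load(std::memory_order_relaxed) != 1) {   // Y, relaxed polling
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);   // A, sequenced after Y
        seen = payload;               // R synchronizes with A
    });

    writer.join();
    reader.join();
    return seen;
}
```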

Chapter 29 edits

Remove the

void fence(memory_order) const volatile;

members from all types in [atomics].

Remove the

void atomic_flag_fence(const volatile atomic_flag *object, memory_order order);

function.

Remove the

void atomic_fence(const volatile atomic_type*, memory_order);

functions.

Remove the definition

extern const atomic_flag atomic_global_fence_compatibility;

Change 29.1 [atomics.order] p2 as follows:

The memory_order_seq_cst operations that load a value are acquire operations on the affected locations. The memory_order_seq_cst operations that store a value are release operations on the affected locations. In addition, in a consistent execution, there must be a single total order S on all memory_order_seq_cst operations and fences, consistent with the happens before order and modification orders for all affected locations, such that each memory_order_seq_cst operation observes either the last preceding modification according to this order S, or the result of an operation that is not memory_order_seq_cst. [ Note: Although it is not explicitly required that S include locks, it can always be extended to an order that does include lock and unlock operations, since the ordering between those is already included in the happens before ordering. —end note ]

If a memory_order_seq_cst fence F is sequenced before an atomic operation A on an object M, A observes either the last memory_order_seq_cst modification of M preceding F in the total order S, or a later modification of M in its modification order.

If an atomic modification operation A of an object M is sequenced before a memory_order_seq_cst fence F, and a memory_order_seq_cst operation B on M follows F in S, B observes either the effects of A on M, or a later modification of M in its modification order.

If an atomic modification operation A of an object M is sequenced before a memory_order_seq_cst fence Fa, a memory_order_seq_cst fence Fb is sequenced before an atomic operation B on M, and Fb follows Fa in S, B observes either the effects of A on M, or a later modification of M in its modification order.
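The two-fence rule can be illustrated, non-normatively, with the classic store-buffering pattern: each thread stores to one variable, issues a memory_order_seq_cst fence, and then loads the other variable, with every atomic access relaxed. Whichever fence comes later in S, the load that follows it must observe the other thread's store, so both loads cannot return 0. The function name is hypothetical, and `<atomic>` is assumed in place of `<cstdatomic>`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo function: runs one round and reports whether the
// outcome excluded by the fence rules (both loads reading 0) occurred.
bool both_loads_saw_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);                 // A for object x
        std::atomic_thread_fence(std::memory_order_seq_cst);   // one of Fa / Fb
        r1 = y.load(std::memory_order_relaxed);                // B for object y
    });

    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);                 // A for object y
        std::atomic_thread_fence(std::memory_order_seq_cst);   // the other fence
        r2 = x.load(std::memory_order_relaxed);                // B for object x
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Replacing either fence with a weaker one would make the both-zero outcome permissible again.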

Add

// 29.6, fences
void atomic_thread_fence(memory_order);
void atomic_signal_fence(memory_order);

to the synopsis of <cstdatomic>.

Add a new section, [atomic.fences], with the following contents:

29.6 Fences

This section introduces synchronization primitives called fences. Their synchronization properties are described in [intro.multithreaded] and [atomics.order].

void atomic_thread_fence(memory_order mo);

Effects: Depending on the value of mo, this operation:

has no effects, if mo == memory_order_relaxed;

is an acquire fence, if mo == memory_order_acquire || mo == memory_order_consume;

is a release fence, if mo == memory_order_release;

is both an acquire fence and a release fence, if mo == memory_order_acq_rel;

is a sequentially consistent acquire and release fence, if mo == memory_order_seq_cst.
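The mapping in the Effects clause above can be restated as a small lookup, shown here only as a reading aid. The `fence_kind` enumeration and the `classify_fence` helper are hypothetical and are not proposed for the library; the code assumes the `<atomic>` header in place of `<cstdatomic>`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical classification of what atomic_thread_fence(mo) does.
enum class fence_kind {
    none,                       // no effects
    acquire,
    release,
    acquire_release,
    seq_cst_acquire_release     // also takes part in the total order S
};

// Hypothetical helper mirroring the Effects list for atomic_thread_fence.
inline fence_kind classify_fence(std::memory_order mo) {
    switch (mo) {
    case std::memory_order_relaxed: return fence_kind::none;
    case std::memory_order_consume: return fence_kind::acquire;   // treated as acquire
    case std::memory_order_acquire: return fence_kind::acquire;
    case std::memory_order_release: return fence_kind::release;
    case std::memory_order_acq_rel: return fence_kind::acquire_release;
    case std::memory_order_seq_cst: return fence_kind::seq_cst_acquire_release;
    }
    return fence_kind::none;    // unreachable for the six defined orders
}
```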

void atomic_signal_fence(memory_order mo);

Effects: equivalent to atomic_thread_fence(mo), except that synchronizes-with relationships are established only between a thread and a signal handler executed in the same thread.

[Note: atomic_signal_fence can be used to specify the order in which actions performed by the thread become visible to the signal handler. — end note]

[Note: Compiler optimizations or reorderings of loads and stores are inhibited in the same way as with atomic_thread_fence, but the hardware fence instructions that atomic_thread_fence would have inserted are not emitted. — end note]
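A minimal non-normative sketch of the intended use: a thread writes ordinary data, issues a release signal fence, sets a flag, and then raises a signal handled in the same thread; the handler checks the flag, issues an acquire signal fence, and reads the data. All names are hypothetical. The sketch assumes that `std::atomic<bool>` and `std::atomic<int>` are lock-free on the target, uses `<atomic>` in place of `<cstdatomic>`, and relies on `std::raise` running the handler synchronously.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

// Hypothetical globals shared between the thread and its signal handler.
static int g_payload = 0;                    // ordinary data written by the thread
static std::atomic<bool> g_ready{false};     // assumed lock-free
static std::atomic<int> g_seen{-1};          // assumed lock-free

extern "C" void on_sigint(int) {
    if (g_ready.load(std::memory_order_relaxed)) {
        // Pairs with the release signal fence in the interrupted thread.
        std::atomic_signal_fence(std::memory_order_acquire);
        g_seen.store(g_payload, std::memory_order_relaxed);
    }
}

// Hypothetical demo function.
int signal_fence_demo() {
    std::signal(SIGINT, on_sigint);
    g_payload = 99;
    // Keeps the compiler from moving the payload write below the flag store.
    std::atomic_signal_fence(std::memory_order_release);
    g_ready.store(true, std::memory_order_relaxed);
    std::raise(SIGINT);                      // handler runs in this thread before raise returns
    std::signal(SIGINT, SIG_DFL);            // restore the default disposition
    return g_seen.load(std::memory_order_relaxed);
}
```

Because the handler runs in the interrupted thread, no hardware fence is needed; only compiler reordering has to be constrained.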

Thanks to Hans Boehm, Lawrence Crowl, Paul McKenney, Clark Nelson and Raul Silvera for reviewing this paper.
