Comments on the C++ Memory Model Following a Partial Formalization Attempt

ISO/IEC JTC1 SC22 WG21 DRAFT N2955 = 09-0145 - 2009-09-28

Mark Batty, mjb220@cl.cam.ac.uk

Motivation for Suggestions
Mixed Accesses to Memory Locations
Release Sequence and Visible Sequence of Side Effects
The Conditions for Sequential Consistency
Reading the Initial State
Restriction on Compiler Hoisting
Collected Suggestions for Change to the Standard
    1.10p2
    1.10p4
    1.10p6
    1.10p12
    1.10p13
    1.10p14
    29.3p9
References

Motivation for Suggestions

The issues highlighted in this document were first raised in [BattySewell09], an attempt to formalize part of the memory model as it is described in the current working draft of the standard [N2914], using the HOL proof assistant [HOL], paying particular attention to the subset of the standard corresponding to the sequentially consistent model of [BoehmAdve08]. Such formalization work commonly uncovers ambiguities, inconsistencies, or omissions. In this case, we have identified a number of places where some local clarification or improvement may be appropriate.

Mixed Accesses to Memory Locations

The atomic memory operations in section 29 [Atomics] are the programmer's interface to the explicitly synchronized memory operations. They are presented as a set of types with functions for operating on them. It does not seem sufficiently clear what provision there is for non-atomic access to objects of this type.

At first glance the Standard seems to allow non-atomic accesses at atomic locations. In 1.10 [intro.multithread] paragraph 14 this is implied from the definition of a data race (note that memory operations in conflict act on the same location):

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens-before the other. Any such data race results in undefined behavior.

How then would a programmer write down a non-atomic access to the atomic location? There are no operations built into the C++ library functions on the atomic types for non-atomic access. Non-atomic access will have to take place on the object through an lvalue of some other type. 3.10 [basic.lval] paragraph 15 lists the types of lvalue that would not result in undefined behavior:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined

All of these types are slight variations of the original type and are still atomic, except for the char and unsigned char types. This suggests that any non-atomic access to an atomic location would have to be through a char or unsigned char lvalue. There is evidence in the standard to suggest that using such an lvalue to modify an atomic object would result in undefined behavior, but it is not definitive.

Each atomic type in section 29 has a deleted copy constructor, therefore has no trivial copy constructor, and ought not to be trivially copyable. This is not certain however from the definition in 9p6 [class] which defines a trivially copyable class, but does not explicitly require such a class to have a trivial copy constructor. Section 3.9 [basic.types] leaves memcpy undefined over non-trivially copyable types. If memcpy is undefined over the atomics, this suggests that other non-atomic memory accesses might not work either.

This argument is not very strong, but if it does indeed hold, then the following note, to be appended to 1.10p4, will clarify that mixed accesses are not allowed:

( Note: Atomic and locking objects are not trivially copyable [3.9], so executions that access atomic or locking objects as non-atomic objects, through an array of character type, for example, shall have undefined behaviour - end note )

The possibility of mixed accesses should be removed from the definition of a data race in 1.10p14:

The execution of a program contains a data race if it contains two non-atomic conflicting actions in different threads, at least one of which is not atomic, and neither happens-before the other. Any such data race results in undefined behavior.

Release Sequence and Visible Sequence of Side Effects

The standard presents several definitions with the phrase "maximal contiguous sub-sequence of side effects in the modification order of M". The usages of this phrase seem unclear, in the definitions of a visible sequence of side effects and a release sequence in 1.10p13 and 1.10p6 respectively:

The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens-before it. [our emphasis]

A release sequence on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation is performed by the same thread that performed the release, or is an atomic read-modify-write operation. [our emphasis]

Early in the process of formalization, the word 'maximal' was understood to be specifying that the first operation in the sequence is maximal in the modification order. This was an error in interpretation rather than in the prose of the standard, but it might improve the clarity of the paragraphs to change the word 'maximal' for the words 'maximal-length'. So respectively 1.10p13 and 1.10p6 become:

The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens-before it.

A release sequence on an atomic object M is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation is performed by the same thread that performed the release, or is an atomic read-modify-write operation.

The Conditions for Sequential Consistency

The standard alludes to supporting a simplified sequentially consistent model of program execution although it does not explicitly specify it. It would be valuable for programmers to be guaranteed sequential consistency with minimal well-specified conditions.

The standard does promise that programmers can rely on a sequentially consistent execution of memory operations, but only if they do not use atomic operations at all. The note that makes this claim is in 1.10p14:

It can be shown that programs that correctly use simple locks to prevent all data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved, with each observed value of an object being the last value assigned in that interleaving. This is normally referred to as "sequential consistency". However, this applies only to race-free programs, and race-free programs cannot observe most program transformations that do not change single-threaded program semantics. [...]

In 1.10p12 the standard alludes to requiring sequential consistency with slightly weaker conditions, describing the correspondence of data races in restricted executions to data races in a sequentially consistent execution.

This states that operations on ordinary variables are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined here, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution.

Given the models described in [BoehmAdve08] and [BattySewell09] correspond to data race free executions in the model of the standard over a restricted set of the atomic operations, it seems that the language regarding the conditions for sequential consistency could be made stronger.

The suggestion for change is that the note in 1.10p14 be changed to:

It can be shown that programs that correctly use simple locks and atomic memory_order_seq_cst operations to prevent all data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved [...]

Reading the Initial State

There is only one paragraph discussing reads from the initial state in the standard. In 1.10p2 it is mentioned that reads might read from the initial state.

The value of an object visible to a thread T at a particular point might be the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.

Given the weakness of the memory model it does not seem enough to leave the reader to come up with a common-sense semantics for reads from the initial state. The standard should provide explicit conditions for such reads, for the non atomic memory objects, 1.10p12 states:

The value of a non-atomic scalar object M as determined by evaluation B, shall be the value stored by the visible side effect A.

Which could be changed to:

The value of a non-atomic scalar object M as determined by evaluation B, shall be the value stored by the visible side effect A, if one exists, or the initial value of the object otherwise.

Similarly, 1.10p13 describes the values determined for evaluations on atomic objects:

The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens before it.

The value of an atomic object M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B.

In the atomic case, if there is no visible side effect with respect to an evaluation, then we assume that the evaluation could take the value of any side effect that it does not happen before or the initial value of the object. Altering 1.10p13 to incorporate this gives:

The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every subsequent side effect, it is not the case that B happens before it.

If there are no visible side effects with respect to B and there are some side effects that B does not happen before, then B has an initial side effect sequence, a maximal-length contiguous sub-sequence of side effects in the modification order of M where for each side effect, it is not the case that B happens before it.

The value of an atomic object M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B, or if no such sequence exists, the initial value or any value from the initial side effect sequence.

Returning now to 1.10p2, the word 'might' suggests that there may be some ambiguity over what values are read. The intended meaning is not that the value of an object might be the initial value at any arbitrary point, nor is the intention that the value might not be the initial value at the initial point. Given the edits of 10.1p12 and 10.1p13 above, 10.1p2 could be changed slightly to remove the word 'might':

The value of an object visible to a thread T at a particular point might be is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.

Restriction on Compiler Hoisting

Consider paragraph 29.3p9 [atomics.order]:

Implementations shall not move an atomic operation out of an unbounded loop.

The phrase "moving an atomic operation" may be meaningful in the context of a specific implementation, but the presentation of the rest of the model has been more general. The word "moved" seems to refer to memory operations being moved by a compiler (or some abstract interpreter), whereas the style of the rest of the standard is more axiomatic: given some particular candidate execution, the memory model determines whether it is valid or not.

Even leaving that aside, paragraph 29.3p9 mentions an unbounded loop. This term is not used anywhere else in the standard. Given the context it seems to be referring to iteration statements that are not assumed to terminate by 6.5p5 [stmt.iter]. Any iteration statement containing an atomic operation is not assumed to terminate by paragraph 6.5p5, so 29.3p9 is redundant.

There seems to be a serious question as to just what is intended here. If the clause is meant to prevent compilers from hoisting atomic memory actions out of the body of a loop, then this restriction is overly strong. If a given transformation would produce observably different behavior, then 1.9p1 [intro.execution] dictates that the transformation would not be valid:

The semantic descriptions in this International Standard define a parametrized non-deterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

The suggestion for change is to remove the second sentence in 29.3p9 entirely.

Collected Suggestions for Change to the Standard

This section collects all of the suggestions for change to the standard described in this document. The relevant paragraphs label is given then the whole paragraph is reproduced with any added text in bold and any removed text struck through.

1.10p2

The value of an object visible to a thread T at a particular point might be is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below. ( Note: In some cases, there may instead be undefined behavior. Much of this section is motivated by the desire to support atomic operations with explicit and detailed visibility constraints. However, it also implicitly supports a simpler view for more restricted programs. - end note )

1.10p4

The library defines a number of atomic operations (paragraph 29) and operations on locks (paragraph 30) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization operation on one or more memory locations is either a consume operation, an acquire operation, a release operation, or both an acquire and release operation. A synchronization operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence. In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics. ( Note: For example, a call that acquires a lock will perform an acquire operation on the locations comprising the lock. Correspondingly, a call that releases the same lock will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A. ``Relaxed'' atomic operations are not synchronization operations even though, like synchronization operations, they cannot contribute to data races. - end note ) ( Note: Atomic and locking objects are not trivially copyable [3.9], so executions that access atomic or locking objects as non-atomic objects, through an array of character type, for example, will have undefined behaviour - end note )

1.10p6

A release sequence on an atomic object M is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first operation is a release, and every subsequent operation is performed by the same thread that performed the release, or is an atomic read-modify-write operation.

1.10p12

A visible side effect A on an object M with respect to a value computation B of M satisfies the conditions:

The value of a non-atomic scalar object M, as determined by evaluation B, shall be the value stored by the visible side effect A if one exists, or the initial value of the object otherwise. ( Note: If there is ambiguity about which side effect to a non-atomic object is visible, then there is a data race, and the behavior is undefined. - end note ) ( Note: This states that operations on ordinary variables are not visibly reordered. This is not actually detectable without data races, but it is necessary to ensure that data races, as defined here, and with suitable restrictions on the use of atomics, correspond to data races in a simple interleaved (sequentially consistent) execution. - end note )

1.10p13

The visible sequence of side effects on an atomic object M, with respect to a value computation B of M, is a maximal-length contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B and for every subsequent side effect, it is not the case that B happens-before it.

If there are no visible side effects with respect to B and there are some side effects that B does not happen before, then B has an initial side effect sequence, a maximal-length contiguous sub-sequence of side effects in the modification order of M where for each side effect, it is not the case that B happens before it.

The value of an atomic object M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B, or if no such sequence exists, the initial value or any value from the initial side effect sequence. Furthermore, if a value computation A of an atomic object M happens before a value computation B of M, and the value computed by A corresponds to the value stored by side effect X, then the value computed by B shall either equal the value computed by A, or be the value stored by side effect Y, where Y follows X in the modification order of M. ( Note: This effectively disallows compiler reordering of atomic operations to a single object, even if both operations are relaxed loads. By doing so, we This effectively makes the cache coherence guarantee provided by most hardware available to C++ atomic operations. - end note ) ( Note: The visible sequence depends on the happens before relation, which depends on the values observed by loads of atomics, which we are restricting here. The intended reading is that there must exist an association of atomic loads with modifications they observe that, together with suitably chosen modification orders and the happens before relation derived as described above, satisfy the resulting constraints as imposed here. - end note )

1.10p14

The execution of a program contains a data race if it contains two non-atomic conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. ( Note: It can be shown that programs that correctly use simple locks and atomic memory_order_seq_cst operations to prevent all data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved, with each observed value of an object being the last value assigned in that interleaving. This is normally referred to as "sequential consistency". However, this applies only to race-free programs, and race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. - end note )

29.3p9

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time. Implementations shall not move an atomic operation out of an unbounded loop.

References

[N2914]
Working Draft, Standard for Programming Language C++, ISO/IEC JTC1 SC22 WG21 N2914 = 09-0104 - 2009-06-22, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf. Accessed 2009/09/28.
[BoehmAdve08]
Foundations of the C++ Concurrency Memory Model, H.-J. Boehm and S. Adve, In Proc. PLDI, pages 68-78, New York, NY, USA, 2008. ACM. http://www.hpl.hp.com/techreports/2008/HPL-2008-56.pdf. Accessed 2009/09/28.
[HOL]
The HOL 4 system. http://hol.sourceforge.net/.
[BattySewell09]
Notes on the Draft C++ Memory Model, Mark Batty, Peter Sewell, http://www.cl.cam.ac.uk/~mjb220/cpp/cpp2440.pdf