Doc. No.: WG21/N2171
J16/07-0031
Date: 2007-03-12
Reply to: Clark Nelson, Hans-J. Boehm
Phone: +1-503-712-8433, +1-650-857-3406
Email: clark.nelson@intel.com, Hans.Boehm@hp.com

Sequencing and the concurrency memory model (revised)

This paper is a revision of N2052. Significant changes relative to that paper are detailed below.

This paper is also a successor to, but not a revision of, N1944. N1944 was basically an exploratory paper, despite the amount of nearly-WD-ready text proposed; its style of presentation was very heavy on explanation and motivation. Consequently, it is certain to be useful as a tutorial introduction and/or rationale for this paper.

But based on the amount of positive feedback received, the exploratory phase could hopefully be considered complete. Furthermore, some of the feedback received would have been difficult to address in a document organized as N1944 was. It now seems highly desirable to have a cohesive presentation of the changed WD text, emphasizing the result rather than the process. This paper also presents work on aspects of sequencing explicitly related to concurrency, addressing other feedback on N1944.

This paper should also be viewed as a successor to N1942, the memory model proposal. Again, much of the explanatory material from N1942 is not repeated here. In an attempt to simplify, some of the terminology has changed from N1942.

Significant changes since N2052

Some editorial notes were added, pointing out significant implications and observations which may interest the reader.

A statement on the sequencing of value computations has been added to 1.9p16.

Transitivity was added to the definition of the "precedes" relation in 1.10p9.

The section on library thread-safety has been deleted, in anticipation of a more comprehensive paper by a more qualified author.

Sequencing of compound assignment and post-increment

The following troublesome example was deleted from the proposed wording for 1.9p17:

int increment_x() { return x++; }
x++ + increment_x();                // Evaluation order unspecified; x may be incremented only once
increment_x() + increment_x();      // x is incremented twice

Instead, wording was added explicitly forbidding the sequencing of an indeterminately-sequenced function call "within" a postfix increment (therefore decrement) or compound assignment (therefore prefix increment or decrement); see 5.2.6p1 and 5.17p1.

This change was motivated by a combination of factors. Firstly, it is not clear from the existing standard whether the interpretation that allows the outcome described in the example is correct or not; therefore, the status quo is a matter of opinion, with different experts expressing different opinions.

Secondly, it is clear that the existing standard describes the example as having unspecified behavior, but not undefined behavior. Unspecified behavior is described in the C++ standard as applying to "a well-formed program construct and correct data". If the example expression is truly as unreliable as is claimed, it seems unhelpful (if not disingenuous) to classify it into such a benign-sounding category.

Between the alternatives of reclassifying the expression as having undefined behavior, and of tightening the sequencing rules sufficiently to eliminate the possibility of (apparently) losing a side effect, the latter is far less radical, easier to specify, more conducive to writing reliable programs, and probably has minimal if any impact on existing implementations.
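
As an illustration of the tightened rule, the following sketch revisits the deleted example. The names x and increment_x come from that example; the Result type and the run_postfix_demo wrapper are hypothetical, added only so the outcome can be inspected. Under the proposed wording the order of the two operands is still unspecified, but the postfix ++ can no longer be split around the call, so both increments survive.

```cpp
#include <cassert>

static int x = 0;

// From the deleted example: increments x inside a function call.
static int increment_x() { return x++; }

struct Result { int sum; int final_x; };  // hypothetical helper type

// Hypothetical wrapper around the formerly troublesome expression.
Result run_postfix_demo() {
    x = 0;
    // The call is indeterminately sequenced relative to x++, and postfix ++
    // is a single evaluation with respect to it. Whichever operand runs
    // first yields 0 and the other yields 1.
    int sum = x++ + increment_x();
    return Result{sum, x};
}
```

Calling run_postfix_demo() is expected to leave x equal to 2 in either evaluation order; an implementation that read x, ran the call, and only then stored x + 1 would lose an increment, which is the outcome the new wording forbids.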

Concurrency memory model changes

Some sections of the threads memory model (1.10) have been in flux recently and are subject to further debate.

The notion of a "modification order" was added recently to ensure that the perceived value of a single atomic variable does not "flip-flop" in unexpected ways. The precise way in which this should be integrated into the rest of the model was not clear, though we have significantly more confidence in the current approach than its predecessors.

There have also been recent back-and-forth changes in the definition of "synchronizes with", and particularly in whether synchronizes-with relationships should exist between "relaxed" accesses to the same atomic variable. This impacts whether atomic read-modify-write accesses can effectively be used as fences to order other accesses, and whether synchronization operations on variables accessed by only a single thread can be effectively optimized by the compiler. The latter in turn impacts how well the compiler can combine "small" threads. Some of this may also need to be revisited if explicit fences are added to the library, as proposed in N2153.

Significant changes in the proposed wording since N1944

The WD text proposed in N1944 introduced ambiguity in the use of the term "evaluation". Most new uses of that term were intended to reflect usage in mathematics, as in the computation of a value, without side effects. This usage is inconsistent with C/C++ tradition, and the way the term is used in the standard. So when it is necessary to talk about evaluations that do not have side effects, the term "value computation" is now used.

There is a new paragraph defining and explaining the "sequenced before" relation; see 1.9p14.

To reflect the consensus from the discussion in Berlin, a note has been added clearly stating that there is no requirement of consistency for operations whose sequencing is not constrained; see 1.9p16.

The statement of the "no interleaving" rule for functions has been updated; see 1.9p17. Also, an example has been added pointing out a possibly-surprising interpretation of "unspecified behavior".

Resolutions are proposed for several questions raised but not answered in N1944, mostly in the section "Fixes for miscellaneous sequencing issues".

Rearranging the text of "Program execution"

The changes proposed in N1944 were mainly in section 1.9 (Program execution) and various locations in clause 5 (Expressions), plus a couple of spots in clause 12 (Special member functions). The "undefined behavior" rule, a key paragraph in the understanding of sequencing, which basically describes what may be called an "intra-thread data race", is currently in 5p4, which is widely separated from the bulk of the discussion of the principles of sequencing in 1.9. Furthermore, it would seem logical to describe concurrency — and particularly inter-thread data races — in a new section building on and immediately following 1.9. Therefore we propose to move the "undefined behavior" rule from 5p4 to 1.9.

Within 1.9 with the changes proposed in N1944, the bulk of the discussion of sequencing is in p15-16. Paragraph 8, which currently contains the "no overlap" rule for function execution, should be merged into p16, which discusses many other sequencing constraints on function calls. And if, as proposed, the references to sequence points and evaluation are removed from p11 (the "least requirements"), then the definitions in p7 are not needed until p15; moving paragraph 7 down would result in a more cohesive presentation.

Finally, it could be argued that cohesiveness would be increased still further by moving the discussion of reassociation (concerning implications of the "as-if" rule) to immediately follow the "least requirements" (which is basically the normative statement of the "as-if" rule), instead of showing up in the middle of the discussion of expressions and sequencing.

This table shows the proposed shifting of content, assuming regular paragraph (re-)numbering. The letters in the central columns are just tags, intended to illustrate how text moves around (in lieu of arrows): the tag stays with the content.

Paragraph number | Old content                                    | Tag      | Tag   | New content
-----------------+------------------------------------------------+----------+-------+------------------------------------------------------------------------
1.9p7            | Definitions of "side effect", "sequence point" | A        | C     | Effect of asynchronous signal
1.9p8            | "No overlap" rule for function execution       | B        | C     | Allocation of automatic objects
1.9p9            | Effect of asynchronous signal                  | C        | C     | The "least requirements"
1.9p10           | Allocation of automatic objects                | C        | E     | Note concerning reassociation
1.9p11           | The "least requirements"                       | C        | D     | Definition of "full-expression"
1.9p12           | Definition of "full-expression"                | D        | D     | Note concerning default arguments
1.9p13           | Note concerning default arguments              | D        | A     | Definition of "side effect", "evaluation"
1.9p14           | Note concerning reassociation                  | E        | [new] | Definition of "sequenced before"
1.9p15           | Sequencing between full-expressions            | F        | F     | Sequencing between full-expressions
1.9p16           | Sequencing constraints on function calls       | G        | 5p4   | The "undefined behavior" rule
1.9p17           | Operators that impose a sequence point         | [delete] | G+B   | Sequencing constraints on function calls, including the "no overlap" rule

The text proposed for "Program execution"

So here is the proposed reading of section 1.9, beginning with p6 (just for the sake of context). Each paragraph is introduced with its proposed paragraph number, and an explanation of its source. Text from the current working draft to be replaced or deleted is stricken through. Replacement or added text is underlined. Footnotes are presented here in the same style as examples and notes. If the introductory paragraphs, editorial notes and stricken text were deleted, the result would be a longish block of consecutive paragraphs, as proposed for the standard.

1.9p6 (unchanged):

The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions. An implementation can offer additional library I/O functions as an extension. [ Footnote: Implementations that do so should treat calls to those functions as "observable behavior" as well. —end footnote ]

Editorial note: This definition of observable behavior is not clearly consistent with the "least requirements" described in the proposed 1.9p9 below (and is arguably incorrect, especially for multithreaded programs). Core issue 612 has been opened to consider this inconsistency, and any corrections necessary for multithreading will be drafted in accordance with its resolution.

1.9p7 (unchanged from the current p9, except for the addition of an omitted word):

When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile std::sig_atomic_t are unspecified, and the value of any object not of type volatile std::sig_atomic_t that is modified by the handler becomes undefined.
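
A minimal sketch of the one pattern this paragraph leaves well defined: a handler that touches only an object of type volatile std::sig_atomic_t. The names got_signal, on_signal and run_signal_demo are hypothetical. The signal is raised synchronously with std::raise, which keeps the fragment deterministic but is only an approximation of an asynchronous interruption.

```cpp
#include <cassert>
#include <csignal>

// The only kind of object whose value remains specified across a signal.
static volatile std::sig_atomic_t got_signal = 0;

// Handler: modifies nothing except the sig_atomic_t flag.
extern "C" void on_signal(int) { got_signal = 1; }

// Hypothetical helper: installs the handler, raises SIGINT, reports the flag.
int run_signal_demo() {
    got_signal = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // handler runs before raise returns
    std::signal(SIGINT, SIG_DFL);  // restore the default disposition
    return got_signal;
}
```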

1.9p8 (unchanged from the current p10):

An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).

1.9p9 (original text from p11):

The least requirements on a conforming implementation are:

[ Note: more stringent correspondences between abstract and actual semantics may be defined by each implementation. —end note ]

1.9p10 (unchanged from p14):

[ Note: operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.11) For example, in the following fragment

[unchanged text omitted]

However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. —end note ]

1.9p11 (original text from p12):

A full-expression is an expression that is not a subexpression of another expression. If a language construct is defined to produce an implicit call of a function, a use of the language construct is considered to be an expression for the purposes of this definition. A call to a destructor generated at the end of the lifetime of an object other than a temporary object is an implicit full-expression. Conversions applied to the result of an expression in order to satisfy the requirements of the language construct in which the expression appears are also considered to be part of the full-expression. [ Example:

[unchanged example omitted]

1.9p12 (unchanged from p13):

[ Note: the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument. —end note ]

1.9p13 (original text from p7):

Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. Evaluation of an expression (or sub-expression) in general includes both value computations (including fetching a value previously assigned to an object) and initiation of side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. [ Footnote: Note that some aspects of sequencing in the abstract machine are unspecified; the preceding restriction upon side effects applies to that particular execution sequence in which the actual code is generated. Also note that when a call to a library I/O function returns, the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) may not have completed yet. —end footnote ]

1.9p14 (new paragraph):

"Sequenced before" is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread, which induces a partial order among those evaluations. Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced. [ Note: The execution of unsequenced evaluations can overlap. —end note ] Evaluations A and B are indeterminately sequenced when either A is sequenced before B, or B is sequenced before A, but it is unspecified which. [ Note: Indeterminately sequenced evaluations shall not overlap, but either could be executed first. —end note ]

1.9p15 (original text from p15):

There is a sequence point at the completion of evaluation of each full-expression. Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [ Footnote: As specified in 12.2, after the "end-of-full-expression" sequence point after a full-expression is evaluated, a sequence of zero or more invocations of destructor functions for temporary objects takes place, usually in reverse order of the construction of each temporary object. —end footnote ]

1.9p16 (original text from clause 5 paragraph 4):

Except where noted, the order of evaluation evaluations of operands of individual operators, and of subexpressions of individual expressions, and the order in which side effects take place, is unspecified are unsequenced. [ Footnote: The precedence of operators is not directly specified, but it can be derived from the syntax. —end footnote ] [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] Except where noted, the value computations of the operands of an operator are sequenced before the value computation of the result of the operator. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object, or a value computation using the value of the same scalar object, the behavior is undefined. [ Example:

i = v[i++];       // the behavior is undefined
i = 7, i++, i++;  // i becomes 9
i = ++i + 1;      // the behavior is undefined
i = i + 1;        // the value of i is incremented

—end example ]
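
For comparison, the undefined expressions in the example can be rewritten so that every side effect is sequenced. The following is a sketch only; the function names are hypothetical, and the rewrites pick one plausible intent for each original expression.

```cpp
#include <cassert>

// Hypothetical sequenced alternative to:  i = v[i++];
int index_then_assign(const int* v, int i) {
    int old = i++;   // the increment completes with this full-expression
    i = v[old];      // a later full-expression, sequenced after the increment
    return i;
}

// The comma operator sequences each step, so this line is well defined.
int comma_chain() {
    int i;
    i = 7, i++, i++;  // i becomes 9
    return i;
}

// Hypothetical sequenced alternative to:  i = ++i + 1;
int increment_then_add(int i) {
    ++i;
    i = i + 1;
    return i;
}
```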

Editorial note: It has been pointed out that, under this proposed wording, unsequenced read accesses to a single volatile object (clearly) entail undefined behavior, which was not clearly the case with the previous wording. The key difference is that the new words refer to a "side effect", which definitely includes reading a volatile object, whereas the previous words referred to modifying an object "by the evaluation of an expression", which is ambiguous with respect to reading a volatile object — since such an action is a side effect, modification of the object accessed (or of some other volatile object) is possible but not inevitable.

1.9p17 (original text is p16 with p8 inserted):

When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of any expressions or statements expression or statement in the body of the called function body. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. —end note ] There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function. [ Footnote: The sequence point at the function return is not explicitly specified in ISO C, and can be considered redundant with sequence points at full-expressions, but the extra clarity is important in C++. In C++, there are more ways in which a called function can terminate its execution, such as the throw of an exception. —end footnote ] Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed. Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function. [ Footnote: In other words, function executions do not "interleave" with each other. —end footnote ] Several contexts in C++ cause evaluation of a function call, even though no corresponding function call syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit sequencing constraints on the execution of the called function (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the function might be.
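
The "no interleaving" rule can be observed with two calls in one expression. In the sketch below (all names hypothetical), f and g are indeterminately sequenced: either may run first, but the events recorded by one call can never be split by events from the other.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> trace;  // hypothetical event log

static int f() { trace.push_back("f:begin"); trace.push_back("f:end"); return 1; }
static int g() { trace.push_back("g:begin"); trace.push_back("g:end"); return 2; }

// Hypothetical helper: evaluates f() + g() and returns the recorded order.
std::vector<std::string> run_sequencing_demo() {
    trace.clear();
    int sum = f() + g();  // the two calls are indeterminately sequenced
    (void)sum;
    return trace;
}
```

A portable program may rely on the log being either f:begin, f:end, g:begin, g:end or the same with g first, but not on which of the two it is.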

Deleted as redundant with descriptions of operators (original text from p17):

In the evaluation of each of the expressions

a && b
a || b
a ? b : c
a , b

using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after the evaluation of the first expression. [ Footnote: The operators indicated in this paragraph are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation, and the operands form an argument list, without an implied sequence point between them. —end footnote ]

The definition of "memory location"

New paragraphs inserted as 1.7p3 et seq.:

A memory location is either an object of scalar type, or a maximal sequence of adjacent bit-fields all having non-zero width. Two threads of execution can update and access separate memory locations without interfering with each other.

[Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be concurrently updated by two threads of execution without interference. The same applies to two bit-fields, if one is declared inside a nested struct declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to concurrently update two bit-fields in the same struct if all fields between them are also bit-fields, no matter what the sizes of those intervening bit-fields happen to be. —end note ]

[Example: A structure declared as struct {char a; int b:5, c:11, :0, d:8; struct {int ee:8;} e;} contains four separate memory locations: The field a, and bit-fields d and e.ee are each separate memory locations, and can be modified concurrently without interfering with each other. The bit-fields b and c together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but b and a, for example, can be. —end example.]
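
The structure from the example can be exercised as follows. This is a sketch under two assumptions that are not part of this paper: the type is given the hypothetical name S, and threads are created with std::thread from the later C++11 library. Each thread updates a different memory location, so no synchronization is needed between them.

```cpp
#include <cassert>
#include <thread>

// The structure from the example, under a hypothetical name.
struct S {
    char a;
    int b : 5, c : 11, : 0, d : 8;
    struct { int ee : 8; } e;
};

// Hypothetical helper: three threads update three distinct memory locations.
S run_memory_location_demo() {
    S s = S();  // value-initialized: all members start at zero
    std::thread t1([&s] { for (int i = 1; i <= 100; ++i) s.a = static_cast<char>(i); });
    std::thread t2([&s] { for (int i = 1; i <= 100; ++i) s.d = i; });
    std::thread t3([&s] { for (int i = 1; i <= 100; ++i) s.e.ee = i; });
    t1.join();
    t2.join();
    t3.join();
    // Updating s.b in one thread and s.c in another would NOT be safe:
    // b and c together form a single memory location.
    return s;
}
```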

Multi-threaded executions and data races

Insert a new section between 1.9 and 1.10, titled "Multi-threaded executions and data races".

1.10p1:

Under a hosted implementation, a C++ program can have more than one thread of execution (a.k.a. thread) running concurrently. Each thread executes a single function according to the rules expressed in this standard. The execution of the entire program consists of an execution of all of its threads. [Note: Usually the execution can be viewed as an interleaving of all its threads. However some kinds of atomic operations, for example, allow executions inconsistent with a simple interleaving, as described below. —end note ] Under a freestanding implementation, it is implementation-defined whether a program can have more than one thread of execution.

1.10p2:

The execution of each thread proceeds as defined by the remainder of this standard. The value of an object visible to a thread T at a particular point might be the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.

1.10p3:

Two expression evaluations conflict if one of them modifies a memory location and the other one accesses or modifies the same memory location.

1.10p4:

The library defines a number of operations, such as operations on locks and atomic objects, that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another. A synchronization operation is either an acquire operation or a release operation, or both, on one or more memory locations. [Note: For example, a call that acquires a lock will perform an acquire operation on the locations comprising the lock. Correspondingly, a call that releases the same lock will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform an acquire operation on A. —end note ]

1.10p5-6, previously containing the definition of "inter-thread ordered before", have been deleted from this revision. Subsequent paragraphs will be renumbered eventually.

This was rewritten in terms of "synchronizes with", which is restricted to synchronization operations, instead of explicitly including store-load dependencies in a "communicates with" relation as in N1944. This version is intended to be equivalent, since we insist that "happens before" together with store-load dependencies remains acyclic. We need that property for the proof that race-free programs are sequentially consistent, and for one of the examples.

1.10p7:

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. An evaluation A that performs a release operation on an object M synchronizes with an evaluation B that performs an acquire operation on M and reads either the value written by A or a later value in the modification order of M. [Note: The specifications of the synchronization operations define when one reads the value written by another. For atomic variables, the definition is clear. All operations on a given lock occur in a single total order. Each lock acquisition "reads the value written" by the last lock release. —end note ]
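
A sketch of the synchronizes-with relation in use. The atomic type and the memory_order arguments are those of the later C++11 library (std::atomic, std::thread), which this paper does not define; they are assumed here purely for illustration, and run_release_acquire_demo is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: hands an ordinary int from one thread to another.
int run_release_acquire_demo() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready(false);  // plays the role of the atomic object M
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // prior side effect
        ready.store(true, std::memory_order_release);  // release operation A
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // acquire operation B
            std::this_thread::yield();
        seen = payload;  // B read the value written by A, so 42 is visible
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Because the acquire load that exits the loop reads the value stored by the release, A synchronizes with B, and the earlier assignment to payload is visible to the consumer without any further locking.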

1.10p8:

An evaluation A happens before an evaluation B if:

1.10p9:

An evaluation A precedes an evaluation B if:

1.10p10:

A multi-threaded execution is consistent if each thread observes values of objects that obey the following constraints:

[Note: The first condition implies that a read operation B cannot "see" an assignment A if B happens before A. It also prevents a cyclic situation in which, for example, x and y are initially zero, one thread evaluates x = y; while another evaluates y = x;, each sees the result of the other thread, and both x and y obtain a value of 42. The second condition effectively asserts that later assignments hide earlier ones if there is a well-defined order between them. —end note ]

1.10p11:

An execution contains an inter-thread data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any inter-thread data race results in undefined behavior. A multi-threaded program that does not contain a data race exhibits the behavior of a consistent execution. [Note: It can be shown that programs that correctly use simple locks to prevent all inter-thread data races, and use no other synchronization operations, behave as though the executions of their constituent threads were simply interleaved, with each observed value of an object being the last value assigned in that interleaving. This is normally referred to as "sequential consistency". However, this applies only to race-free programs, and race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. —end note ]
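
The note's claim about simple locks can be sketched as follows, assuming std::mutex and std::thread from the later C++11 library (not part of this paper). The name run_locked_counter is hypothetical. Without the lock, the two increments would be conflicting non-atomic actions with neither happening before the other, which is an inter-thread data race.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical helper: two threads increment a shared counter under a lock.
long run_locked_counter(int per_thread) {
    long counter = 0;
    std::mutex m;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> hold(m);  // acquire; released at scope exit
            ++counter;                            // conflicting accesses, now ordered
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

With the lock in place the program is race-free, so it behaves as some interleaving of the two threads and no increment is lost.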

1.10p12:

[Note: Compiler transformations that introduce assignments to a potentially shared memory location that would not be modified by the abstract machine are generally precluded by this standard, since such an assignment might overwrite another assignment by a different thread in cases in which an abstract machine execution would not have encountered a data race. —end note ]
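
The kind of transformation this note rules out can be written out by hand. In the sketch below, both function names are hypothetical, and transformed shows what a compiler might otherwise be tempted to generate. The two functions are indistinguishable in a single-threaded program, but the second stores to x even when cond is false.

```cpp
#include <cassert>

// What the programmer wrote: x is written only when cond is true.
void original(bool cond, int& x) {
    if (cond) x = 1;
}

// A hypothetical transformation, valid for single-threaded code only.
void transformed(bool cond, int& x) {
    int saved = x;
    x = 1;                 // speculative store the abstract machine never performs
    if (!cond) x = saved;  // "undo" step: may overwrite another thread's assignment
}
```

If another thread assigns to x between the speculative store and the undo step while cond is false, that assignment is silently lost, even though the source program contains no data race.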

Various other changes in the base language are no doubt needed, but are not yet clear. There may be something of a consensus that thread-safety of static initialization should be explicitly indicated with a new keyword such as "async". Exception issues should probably be deferred to the thread API proposal.

Sequencing for specific operators

It seems appropriate to remind the reader, at this point in the paper, that the proposal is to move 5p4 from its current location.

5.2.2p8 (function call); deleted as redundant with (new) 1.9p17:

The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified.

5.2.6p1 (post-increment):

The value obtained by applying of a postfix ++ expression is the value that the of its operand had before applying the operator. [ Note: the value obtained is a copy of the original value —end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete object type. After the result is noted, the The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. [ Note: this use is deprecated, see Annex D. —end note ] The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. —end note ] The result is an rvalue. The type of the result is the cv-unqualified version of the type of the operand. See also 5.7 and 5.17.

5.14p2 (logical AND operator), and also 5.15p2 (logical OR operator):

The result is a bool. All side effects of the first expression except for destruction of temporaries (12.2) happen before the second expression is evaluated. If the second expression is evaluated, every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second expression.

5.16p1 (conditional operator):

Conditional expressions group right-to-left. The first expression is implicitly converted to bool (clause 4). It is evaluated and if it is true, the result of the conditional expression is the value of the second expression, otherwise that of the third expression. All side effects of the first expression except for destruction of temporaries (12.2) happen before the second or third expression is evaluated. Only one of the second and third expressions is evaluated. Every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second or third expression.
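
The sequencing guarantees for the logical and conditional operators can be sketched as follows. The function names are hypothetical; each expression relies on the first operand, including its side effects, being complete before the next operand is evaluated.

```cpp
#include <cassert>

// Logical AND: the increment is complete before the right operand reads i.
bool and_sees_increment() {
    int i = 1;
    return (i++ == 1) && (i == 2);
}

// Logical OR: the right operand is not evaluated when the left is true.
bool or_skips_right_operand() {
    int calls = 0;
    bool r = true || (++calls, false);
    return r && calls == 0;
}

// Conditional: j has already been incremented when the second operand reads it.
int conditional_sees_increment() {
    int j = 0;
    return (j++ == 0) ? j : -1;
}
```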

5.17p1 (assignment and compound assignment operators):

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue with the type and value of the left operand after the assignment has taken place an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. —end note ]
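
A sketch of the "single evaluation" rule for compound assignment, using hypothetical names x, bump_x and run_compound_demo. The call to bump_x is indeterminately sequenced relative to x += 1, so the value of the sum is unspecified, but the call cannot fall between the read and the write performed by the compound assignment.

```cpp
#include <cassert>

static int x = 0;                    // hypothetical shared variable
static int bump_x() { return ++x; }  // modifies x inside a function call

// Hypothetical helper: returns the final value of x.
int run_compound_demo() {
    x = 0;
    int sum = (x += 1) + bump_x();  // value of sum depends on evaluation order
    (void)sum;
    return x;                       // both modifications always take effect
}
```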

5.18p1 (comma operator):

A pair of expressions separated by a comma is evaluated left-to-right and the value of the left expression is discarded. The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are not applied to the left expression. All side effects (1.9) of the left expression, except for the destruction of temporaries (12.2), are performed before the evaluation of the right expression. Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand; the result is an lvalue if its right operand is an lvalue, and is a bit-field if its right operand is an lvalue and a bit-field.
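
Two properties of the comma operator described above can be sketched briefly; the function names are hypothetical. The first function depends on the left operand's side effect being sequenced before the right operand, and the second on the result being an lvalue when the right operand is one.

```cpp
#include <cassert>

// The store to i is sequenced before i + 1 is computed.
int comma_demo() {
    int i = 0;
    int k = (i = 3, i + 1);
    return k;
}

// The result of (a, b) is an lvalue referring to b.
int comma_lvalue_demo() {
    int a = 1, b = 2;
    (a, b) = 10;         // assigns to b; the value of a is discarded
    return a * 100 + b;
}
```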

Sequencing for destruction of temporaries

12.2p3:

When an implementation introduces a temporary object of a class that has a non-trivial constructor (12.1, 12.8), it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be called for a temporary with a non-trivial destructor (12.4). Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception. The value computations and side effects of destroying a temporary object are associated only with the full-expression, not with any specific subexpression.
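
The point of destruction can be observed with a small tracing class. The sketch below is illustrative only: Traced, use, temp_log and temporaries_demo are hypothetical names, and the log is simply a way to record the order of events.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> temp_log;   // records the order of events

struct Traced {
    std::string name;
    explicit Traced(const std::string& n) : name(n) { temp_log.push_back("+" + name); }
    ~Traced() { temp_log.push_back("-" + name); }
    int value() const { return 1; }
};

int use(int v) { temp_log.push_back("use"); return v; }

// Hypothetical helper, for illustration only.
std::vector<std::string> temporaries_demo() {
    temp_log.clear();
    // The Traced temporary is created inside the argument expression, but
    // it is destroyed only as the last step of the whole full-expression,
    // that is, after use() has returned.
    int r = use(Traced("t").value());
    temp_log.push_back("next");
    (void)r;
    return temp_log;
}
```

The recorded order is expected to be "+t", "use", "-t", "next": the destruction belongs to the full-expression, not to the subexpression that created the temporary.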

12.2p4:

There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression. The first context is when a default constructor is called to initialize an element of an array. If the constructor has one or more default arguments, the destruction of any temporary created in a default argument expression is sequenced before the construction of the next array element, if any.
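
The first context can be sketched as follows. The types Arg and Elem and the log are hypothetical and serve only to make the order of construction and destruction visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> array_log;   // records the order of events

struct Arg {
    Arg()  { array_log.push_back("+arg"); }
    ~Arg() { array_log.push_back("-arg"); }
};

struct Elem {
    // Default constructor whose default argument creates a temporary.
    Elem(const Arg& = Arg()) { array_log.push_back("elem"); }
};

// Hypothetical helper, for illustration only.
std::vector<std::string> array_default_arg_demo() {
    array_log.clear();
    Elem elems[2];   // each element is default-constructed in turn
    (void)elems;
    return array_log;
}
```

The temporary created for the first element is expected to be destroyed before the second element is constructed, giving "+arg", "elem", "-arg", "+arg", "elem", "-arg".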

12.2p5:

The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except as specified below. A temporary bound to a reference member in a constructor’s ctor-initializer (12.6.2) persists until the constructor exits. A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full expression containing the call. A temporary bound to the returned value in a function return statement (6.6.3) persists until the function exits. In all these cases, the temporaries created during the evaluation of the expression initializing the reference, except the temporary to which the reference is bound, are destroyed at the end of the full-expression in which they are created and in the reverse order of the completion of their construction. The destruction of a temporary whose lifetime is not extended by being bound to a reference is sequenced before the destruction of any temporary which is constructed earlier in the same full-expression. If the lifetime of two or more temporaries to which references are bound ends at the same point, these temporaries are destroyed at that point in the reverse order of the completion of their construction. In addition, the destruction of temporaries bound to references shall take into account the ordering of destruction of objects with static or automatic storage duration (3.7.1, 3.7.2); that is, if obj1 is an object with the same storage duration as the temporary and created before the temporary is created the temporary shall be destroyed before obj1 is destroyed; if obj2 is an object with the same storage duration as the temporary and created after the temporary is created the temporary shall be destroyed after obj2 is destroyed. [ Example:

Fixes for miscellaneous sequencing issues

3.6.2p1 (initialization of non-local objects):

Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static storage duration and an object of POD type with static storage duration can be initialized with a constant expression (5.19); this is called constant initialization. Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization. Static initialization shall be performed before any dynamic initialization takes place. Dynamic initialization of an object is either ordered or unordered. Definitions of explicitly specialized class template static data members have ordered initialization. Other class template static data members (i.e., implicitly or explicitly instantiated specializations) have unordered initialization. Other objects defined in namespace scope have ordered initialization. Objects defined within a single translation unit and with ordered initialization shall be initialized in the order of their definitions in the translation unit. The order of initialization is unspecified for objects with unordered initialization and for objects defined in different translation units. An unordered initialization is indeterminately sequenced with respect to every other dynamic initialization. [ Note: 8.5.1 describes the order in which aggregate members are initialized. The initialization of local static objects is described in 6.7. —end note ]
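
The ordering rule for objects with ordered initialization in a single translation unit can be sketched as follows. The names init_log and Announce are hypothetical; the log lives in a function-local static so that it exists before any of the namespace-scope objects is dynamically initialized.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static: constructed on first call, so it is available
// to the dynamic initializations below regardless of their order.
std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

struct Announce {
    explicit Announce(const char* name) { init_log().push_back(name); }
};

// Namespace-scope objects with ordered (dynamic) initialization:
// initialized in the order of their definitions in this translation unit.
Announce first("first");
Announce second("second");
Announce third("third");
```

Within this translation unit the log is expected to read "first", "second", "third". No such guarantee applies across translation units or to objects with unordered initialization.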

8.5.1p17 (aggregate initialization); new paragraph:

The full-expressions in an initializer-clause are evaluated in the order in which they appear.
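
A minimal sketch of what the new paragraph guarantees follows; Pair, next_value and aggregate_demo are hypothetical names used only for the example.

```cpp
#include <cassert>

struct Pair { int a; int b; int c; };

static int counter = 0;
int next_value() { return ++counter; }   // visible side effect per call

// Hypothetical helper, for illustration only.
Pair aggregate_demo() {
    counter = 0;
    // Each initializer is evaluated in the order in which it appears,
    // so the three calls happen left to right.
    Pair p = { next_value(), next_value(), next_value() };
    return p;
}
```

With in-order evaluation, the members are expected to receive 1, 2 and 3 respectively.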

12.6.2p3 (mem-initializers):

The expression-list in a mem-initializer is used to initialize the base class or non-static data member subobject denoted by the mem-initializer-id. The semantics of a mem-initializer are as follows:

[unchanged example omitted]

There is a sequence point (1.9) after the initialization of each base and member. The initialization of each base and member constitutes a full-expression. Any expression in a mem-initializer is evaluated as part of the full-expression that performs the initialization.
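
One observable consequence of treating each initialization as a full-expression is that a temporary created in one mem-initializer is destroyed before the next member is initialized. The sketch below illustrates this; Temp, Widget and the log are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> mem_log;   // records the order of events

struct Temp {
    std::string name;
    explicit Temp(const std::string& n) : name(n) { mem_log.push_back("+" + name); }
    ~Temp() { mem_log.push_back("-" + name); }
    int value() const { return 1; }
};

struct Widget {
    int a;
    int b;
    // Each mem-initializer is its own full-expression, so the temporary
    // used for a is gone before the initialization of b begins.
    Widget() : a(Temp("A").value()), b(Temp("B").value()) { mem_log.push_back("body"); }
};

// Hypothetical helper, for illustration only.
std::vector<std::string> mem_init_demo() {
    mem_log.clear();
    Widget w;
    (void)w;
    return mem_log;
}
```

The recorded order is expected to be "+A", "-A", "+B", "-B", "body".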

14.2 (template arguments):

template-argument:
constant-expression
type-id
id-expression

Semantics of some non-terminating loops

Concern has been expressed about whether it is safe and legal for a compiler to optimize based on the assumption that a loop will terminate. The canonical example:

for (T * p = q; p != 0; p = p->next)
    ++count;
x = 42;

Is it valid for the compiler to move the assignment to x above the loop? If the loop terminates, clearly yes, because the overall effect of the code doesn't change; furthermore, in the absence of synchronization, there is no guarantee that the assignment to x will not be visible to a different thread before any assignments to count.
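
For the terminating case, the equivalence can be sketched by writing out both orderings by hand. The Node and Result types and the two function names are assumptions made for the example; the second function shows the kind of reordering a compiler might perform, not what any particular compiler does.

```cpp
#include <cassert>

struct Node { Node* next; };
struct Result { int count; int x; };

// Source order: the loop runs first, then the assignment to x.
Result count_then_assign(Node* q) {
    Result r = {0, 0};
    for (Node* p = q; p != 0; p = p->next)
        ++r.count;
    r.x = 42;
    return r;
}

// Reordered: the assignment to x is moved above the loop.
Result assign_then_count(Node* q) {
    Result r = {0, 0};
    r.x = 42;
    for (Node* p = q; p != 0; p = p->next)
        ++r.count;
    return r;
}
```

For any finite list, the two functions return the same count and the same x, which is why the reordering is unobjectionable when the loop terminates. The sketch is single-threaded and says nothing about the non-terminating case.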

But what if the loop doesn't terminate? For example, may a user assume that a non-terminating loop effects synchronization, and may therefore be used to prevent a data race? Clearly, a loop that contains any explicit synchronizations must be assumed to interact with a different thread, and a loop that contains a volatile access or a call to an I/O function must be assumed to interact with the environment, so optimization opportunities for such a loop are already limited. But what about a simple loop, as above?

If such a loop does not terminate, then clearly neither the loop itself nor any code following the loop can have any observable behavior. Moreover, as the "least requirements" refer to data written to files "at program termination", the presence of a non-terminating loop may even nullify observable behavior preceding entry to the loop (for example, because of buffered output). For these reasons, there are problems with concluding that a strictly-conforming program can contain any non-terminating loop. We therefore conclude that a compiler is free to assume that a simple loop will terminate, and to optimize based on that assumption.