Sequence points and related issues ================================== SC22/WG14/N1008 Clive Feather Last changed 2003-04-30 Introduction This paper describes a "Consensus Model" for addressing the issues of sequence points and the evaluation of expressions. It then points out the major problems with the current wording of the Standard and proposes changes. The Consensus Model While there is still debate on many aspects of these matters, consensus appears to have emerged on certain features of the model. This "consensus model" works thus. Evaluation of an expression implies the presence of one or more "events"; the consensus model does not cover the details of what events exist or how they relate to the expression (N925 is an example of such a model that is compatible with the consensus model), but it is agreed that the significant ones are "read" and "write" events and sequence points. For example, the expression "x = y + z" involves read events for y and z and a write event for x. The order that events occur in is generally unspecified, but is constrained in two important ways: (1) causality - in "x = y + z", the write event for x must come after the two read events; (2) sequencing - where two events are required by the Standard to come in a particular order; for example, in "x = 0, y = 0", the write event for x must come before that for y. Because the latter is often described by the Standard in terms of sequence points, these become a third significant type of event. Having identified the events, it is necessary (in principle) to examine every possible ordering that is consistent with the above constraints. * If any possible ordering could cause two write events, or a write event followed by a read event, for the same object to occur with no intervening sequence point, the behaviour is undefined. Note that: - this does not apply to two read events, or when the read event precedes the write event; - the requirement is on the programmer to ensure there is no such ordering possible, not on the implementation to choose a different order. * Otherwise the expression is legitimate. However, if two write events, or a read event and a write event, for the same object can occur either way round, it is unspecified which of the two values written is the final value of the object, or whether the value read is the one written or is the previous value; this may in turn affect other results. As an example of the first rule, the expression "x = x++" involves two write events for x and no contraints on their relative order. Therefore the behaviour is undefined. Similarly, "x++ * -x" involves two read events for x and one write event; while the read event associated with "x++" must come before the write event, that associated with "-x" need not and so the behaviour is undefined. On the other hand, "x++" is legitimate because the write event must be after the read event, and "x = 1, x = 2" is legitimate because the sequence point separates the two writes. The rationale behind these rules is based on optimisation and parallel execution. When re-organising code, an optimiser may well want to move a read to an earlier point than it is absolutely needed, or save a generated value in a register and move the write to a later point. Sequence points represent the limits of how such moves can affect the behaviour of code (the optimiser can move the read or write further, but must produce results that are "as if" it had not). "Parallel execution" refers not just to systems with more than one processor, but to any arrangement (such as pipelining or auto-spilling of registers) where two actions can be happening at once. Where the same memory location is accessed through two different paths at the same time, these systems may not guarantee the results - for example, the value actually stored might be the bitwise-OR of the two values being written. Similarly, if an object consists of more than one bus-width of data, the two writes might be interleaved in an unpredictable manner. Standard text The Standard has the following to say about sequence points in general: 5.1.2.3: [#2...] At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. [#4] When the processing of the abstract machine is interrupted by receipt of a signal, only the values of objects as of the previous sequence point may be relied on. Objects that may be modified between the previous sequence point and the next sequence point need not have received their correct values yet. [#5] The least requirements on a conforming implementation are: -- At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred. 6.5: [#1] An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. [#2] Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.70) [#3] The grouping of operators and operands is indicated by the syntax.71) Except as specified later (for the function- call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified. 6.8: [#2] A statement specifies an action to be performed. Except as indicated, statements are executed in sequence. 7.1.4: 156 Such macros might not contain the sequence points that the corresponding function calls do. and, as an example of how they are used in specifying execution sequence: 6.5.2.4: [#2...] The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point. Problems with the Standard text The problems with the above extracts can be summarised as follows: (1) It is unclear that the Consensus Model can be derived from the above. (2) In particular, it is unclear how to apply terms such as "previous sequence point" to expressions like "x + (y, z)", where a sub-expression contains a sequence point but the whole expression involves an unspecified order of evaluation. (3) A strict reading of the wording "shall be read only to determine the value to be stored" would prevent the value being used for any other purpose as well, such as determining the value to be stored in another object. (4) Though WG14 has stated that "statements are executed in sequence" means that the execution of a function call is atomic with respect to the surrounding expression (that is, the events of the expression all occur strictly before or strictly after the events of the function call), it is far from clear how to derive this from the Standard. Wording proposals These proposals are not in order in the Standard, but rather in conceptual order. First we need to make it clear that sequence points are not global, but must be read in context. 5.1.2.3: [#2...] At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. + Note that some evaluations are unordered with respect to a given + sequence point and, unless stated otherwise, may therefore occur + before or after that sequence point. Next we address the basic rules for when access to an object in an expression involves undefined behaviour. This is done by introducing the concept of "collateral expressions", which are expressions whose relative order is not specified and whose sequence points do not affect one another. We change the first three paragraphs of 6.5: 6.5: [#1] An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. * [#2] The precedence of operators and operands is indicated by the syntax.71) Except as specified later (for the function- call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified. + Where the order of evaluation of two expressions is + unspecified, they are /collateral expressions/. If E1 and E2 + are collateral subexpressions and E3 is a subexpression of E2, + then E1 and E3 are also collateral expressions (this + definition applies recursively). * [#3] If an object has its stored value (or two overlapping * objects have the overlapping part of their stored values) * accessed by two expressions, and one of the accesses modifies * the value, then the expressions shall not be collateral * expressions (but one may be within a function called by a * collateral expression of the other; see 6.5.2.2). Furthermore, * either the accesses shall be separated by a sequence point or * else the other access shall be a read and the prior value shall be read only to determine the value to be stored.70) (this last part will be re-addressed later on). There are a couple of places where multiple expressions occur and it is important to add wording to clarify which are collateral expressions. These are array declarations in 6.7.5.2: [#5] If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. + All of the size expressions in a full declarator that are + evaluated are collateral expressions of each other. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated. and initializer lists in 6.7.8: + [#23] The expressions in an initializer list are collateral + expressions of each other, and therefore the order in which any side effects occur among the initialization list expressions is unspecified.130) Next we address the issue of making function calls atomic. This is done by amending 6.5.2.2#10: * [#10] The order of evaluation of the function designator * and the actual arguments is unspecified, but there are sequence * points before and after the actual call. + All side-effects within the function shall take place between + these two sequence calls. All side-effects in collateral + expressions of the function call shall either be complete before + the function is called or shall not take place until after the + function returns. As well as the additional text, there are two changes to the existing words: the reference to subexpressions of the arguments is dropped (since it has no effect) and a sequence point is added after the call (making it clear that it is "before" any subsequent subexpression). We can then add some further examples: + [#13] EXAMPLE Given: + + static int x; + int get_x (void) { return x; } + int put_x (int y) { x = y; return 0; } + + then each of the following are legitimate, though it is + unspecified which of two values results: + + x = 4; int z = x + put_x(8); // z could be 4 or 8 + x = 4; int z = x++ + get_x(); // z could be 8 or 9 + + Similarly, in: + + x = 4; int z = x++ + put_x(8); + + there are three possibilities: + * increment stored before call: x becomes 8, z becomes 4 + * increment stored after call: x becomes 5, z becomes 4 + * increment all done after call: x becomes 9, z becomes 8 + + Note that the following expressions, with apparently similar + intent, all involve undefined behaviour: + + z = x + (x = 8, 0); + z = x++ + x; + z = x++ + (x = 8, 0); The above change also allows us to drop a paragraph in 7.1.4: - [#3] There is a sequence point immediately before a library - function returns. since this sequence point was only put there to ensure there is one at the end of any function call. We might also consider extended a related footnote: 156 Such macros might not contain the sequence points that the corresponding function calls do. + Thus sin(x)+sin(y) might involve undefined behaviour even though + (sin)(x)+(sin)(y) does not. Finally, the last part of 6.5#2 (6.5#3 after the above changes) still has the problem described earlier - a strict reading of the wording "shall be read only to determine the value to be stored" would prevent the value being used for any other purpose as well. The change: [...] one of the accesses modifies the value [...] else the other access shall be a read * used to determine the value that is stored.70) would help here, though it still isn't clear what that means. However, I believe (though cannot at this stage prove) that saying: [...] one of the accesses modifies the value [...] else the other access shall be a read * which is part of the evaluation of an operand of the ++, --, * or assignment operator which makes the modification.70) would, when combined with the wording from above: then the expressions shall not be collateral expressions be sufficient. In other words, the following wording for 6.5#3 would establish all the essential features of the Consensus Model: [#3] If an object has its stored value (or two overlapping objects have the overlapping part of their stored values) accessed by two expressions, and one of the accesses modifies the value, then the expressions shall not be collateral expressions (but one may be within a function called by a collateral expression of the other; see 6.5.2.2). Furthermore, either the accesses shall be separated by a sequence point or else the other access shall be a read which is part of the evaluation of an operand of the ++, --, or assignment operator which makes the modification.