ISO/ IEC JTC1/SC22/WG14 N822

SC22/WG14 N822 1998-05-03

                                 Annex S
                               (normative)
                     Formal model of Sequence Points


S.1 Introduction

Subclause 6.3 reads in part:
    Between the previous and next sequence point an object
    shall have its stored value modified at most once by the
    evaluation of an expression. Furthermore, the prior value
    shall be accessed only to determine the value to be stored.

This annex defines a formal algorithm for determining whether any given
expression or set of expressions obeys this requirement, and therefore
whether or not it involves unspecified or undefined behaviour.

An implementation is not required to follow the model defined in this
annex. In particular the model assumes serial evaluation, but parallel
evaluation is equally valid. However, an implementation shall ensure -
for all expressions that are not determined to be undefined in this
model - that expressions are evaluated in a way that will yield one of
the results specified by this model, and that each volatile object is
accessed exactly the number of times determined by the model.


S.2 Basic concepts

This model operates in terms of "events". An expression or set of
expressions give rise to a number of events and some constraints on
their order of occurence. All possible arrangements that meet those
constraints are then examined. If any of these arrangements involves
undefined or unspecified behavior then so does the expression or
set of expressions.

S.2.1 Events

An event is one of:
- reading the value of a byte
- writing a value to a byte
- designation of a byte in an object
- a function call
- a sequence point.

A "write" event does not mean that the value is stored immediately,
but that it will be stored before the next sequence point. The model
automatically takes this into account.

S.2.2 Function calls

Where an expression involves a function call, that call is "atomic" for
the purposes of this model. There must be a sequence point immediately
before (see 6.3.2.2) and after each function call (either because it
ends in a full expression, or because it is required by 7.1.8), and so
it can be seen that the effects of the call - for the purposes of the
surrounding expression - can be determined purely by the read and write
events involved in it, ignoring their ordering. These events cannot be
interspersed with events from outside the call. Therefore this model
treats a function call as being a sequence point.

S.2.3 Short circuits and optimisation

The && || and ?: operators are "short circuiting" - the way in which an
expression containing them is evaluated depends on the operands. The
model described in this annex handles this.

Other expressions can sometimes be optimised, for example because the
result of an operator can be determined from only some of the operands;
the most obvious case is multiplication by constant zero. Not only does
this model require the other operand be evaluated in this case, but all
the implications of this model must continue to hold even where an
optimising implementation could store values in objects earlier than the
model would allow. [*]

[*] Of course, provided the relevant side-effects take place as if the
model were followed, any optimisation that yields a valid result (as
specified in S.1) is permitted.

S.2.4 Notation

Events are written as follows:
  R(a)      read the byte at address a;
  W(a)      write the byte at address a;
  L(a)      an lvalue designating the byte at address a;
  F(e)      the call to the function designated by e;
  S         a sequence point;
  D         a dummy event with no effect.

If it is necessary to distinguish separate events of the same kind, this
is done by placing a number or name in braces after the event (such as
"R(a,n){5}" or "S{pre}").

In addition the following notation is used:
  X < Y     for each event P in those indicated by X, and each event Q
            in those indicated by Y, P must occur before Q;
  R(a,n)    R(a), R(a+1), ... R(a+n-1)
  W(a,n)    W(a), W(a+1), ... W(a+n-1)
  L(a,n)    L(a), L(a+1), ... L(a+n-1)
  E(e)      the events and constraints of expression e;
  V(e)      the events and constraints of expression e, but with each
            L(a) event replaced by a (different) dummy event D;
  Z(e)      the size (in bytes) of an object with the same type as the
            expression e (or 1 if e has function type).


S.3 Operation of the model

The model is operated in the following stages:
(1) transforming each expression
(2) identifying events and constraints
(3) analysis of all possible orders of events

Where more than one expression is involved, each is processed through
the first two stages separately, and then combined as given in S.4
before the last stage.

S.3.1 Transformation

The first stage of analysis of an expression is to transform it to a
canonical form. Then:
    each expression of the form             is replaced by the form

    e->field                                (*(e)).field
    e1 [e2]                                 *((e1)+(e2))
    &*e                                     e
    e1 && e2       where e1 is zero         (e1)
                   where e1 is nonzero      ((e1) , (e2))
    e1 || e2       where e1 is zero         ((e1) , (e2))
                   where e1 is nonzero      (e1)
    e1 ? e2 : e3   where e1 is zero         ((e1) , (e3))
                   where e1 is nonzero      ((e1) , (e2))

Note that the && || and ?: operators require multiple analyses to be
made; the expression involves undefined behavior in those situations
where the corresponding case does; if the value of e1 is unspecified,
the expression involves undefined behavior if either alternative does.

Finally, wherever subclause 6.3 requires an lvalue not of array type to
be converted to the value stored in the designated object, replace the
lvalue expression
    e
by the expression
    $e
and where it requires an lvalue of array type to be converted to a
pointer to the first element of the array, or an lvalue of function
type to be converted to a pointer to the function, replace it by the
expression
    @e
(where $ and @ are notional operators introduced by the model).

S.3.2 Event identification

Each sub-expression gives rise to a set of events and constraints on
events as follows.

A constant or string literal does not give rise to any events. A
parenthesised expression gives rise to the same events and constraints
that the unparenthesised version would.

An identifier x gives rise to L(a,Z(x)), where a is the address of x.
This includes the case where x has an array type. However, an identifier
with function type does not give rise to any events.

A compound literal (type){e1,e2,...} gives rise to L(a,Z(type)) plus
the constraints E(e1) < L(a,Z(type)), E(e2) < L(a,Z(type)), etc, where
a is the address of the object generated by the compound literal.

A type name with variably modified type and containing the expressions
e1, e2, etc. (in array declarators) gives rise to all the events and
constraints from V(e1), V(e2), etc. (with no contraints between these).

All other expressions give rise to events and constraints as follows:

Expression       Events and constraints
---------------------------------------
$e               E(e), but replacing each L(a) by R(a)
@e               V(e)
e0(e1,e2,...)    E(e0), E(e1), E(e2), etc. plus new F(e0),
                   E(e0) < F(e0), E(e1) < F(e0), E(e2) < F(e0), etc.
e.field          E(e), but if this contains L(a,Z(e)) for some address a
                   then that is replaced by L(a+b,Z(e.field)), where b
                   is the offset of the field in the structure or union
                   type
++e   --e      ) E(e), but replacing each L(a) by the pair [R(a), W(a)],
e++   e--      ) with the constraint R(a) < W(a) for each such pair
&e               V(e)
*e               E(e) plus new L(a,Z(*e)), where a is the address
                   pointed to, and the constraint E(e) < L(a,Z(*e))
sizeof e       ) V(e) or V(type) if e or type has variably modified type
sizeof(type)   )   otherwise none
+e  -e  !e  ~e   E(e)
(type)e          V(e), E(type)
e1<op>e2         E(e1), E(e2)
e1,e2            E(e1), E(e2), plus new S
                   E(e1) < S, S < E(e2)
e1=e2            E(e1) but replacing each L(a) by W(a), E(e2),
                   E(e2) < W(a) for each W(a) created by the replacement
e1<op=>e2        E(e1) but replacing each L(a) by the pair [R(a), W(a)],
                   E(e2), and for each W(a) created by the replacement
                   the constraints [E(e2), R(a)] < W(a) where the R(a)
                   is the corresponding event created by the replacement

    <op>   is any of  * / % + - << >> < > <= >= == != & ^ |
    <op=>  is any of  *= /= %= += -= <<= >>= &= ^= |=

S.3.3 Event analysis

When all events and constraints in the expression have been determined,
every possible arrangement of the remaining events[*] that obeys the
constraints is considered.

[*] There will be no L(a) events remaining, since they represent
lvalues and these will have been made the operand of some operator,
possibly a $ operator.

If an arrangement contains W(a) followed by either R(a) or W(a) (for the
same address a) without an intervening S or F event, the expression
involves undefined behavior.

Otherwise, if two different arrangements would have different effects
then it is unspecified which results. [*]

[*] This will occur if one arrangement contains W(a){1} followed by
either R(a){2} or W(a){3} and the other arrangement contains either
R(a){2} or W(a){3} followed by W(a){1} (in each case possibly with other
events intervening).

If an R(a) event applies to a volatile object, this is an access to the
object. If there is more than one R(a) event for the same volatile
object, the model shows which possible orderings can apply to the
accesses for each purpose; if the value of the expression depends on
the order of these accesses and the model permits more than one order,
which order is used is unspecified.

S.4 Application

For each expression statement, and for each full expression in a
selection, iteration, or jump statement, the expression shall be
considered separately in its normal place in the sequence of execution.

For each array declarator involving a variably modified type, and for
each initializer list, the events of all the individual expressions
shall be considered together with no constraints applying between
events from two different full expressions.

S.4.1 Signals

If program execution is interrupted by a signal where the handler
returns to the caller, the behavior shall be as if an F event for the
signal handler were inserted at an unspecified place in the sequence
of events forming the full expression or array declarator being
evaluated at the time of the signal.

S.4.2 Floating-point Environment

For the purposes of this model, the floating-point exception flags
constitute a single object and the remainder of the floating-point
environment another single object. Any W event to the latter implies a
W event to the former with the same constraints and with no constraint
between the two.

If the state of the FENV_ACCESS pragma is on, then any operation that
sets a floating-point exception flag shall imply a W event for the
floating-point exception flags ("WX") with the following constraints:

Operations       Constraints
----------------------------
$e             ) WX shall occur after every event associated with
+e  -e  !e  ~e ) the sub-expression
e1<op>e2       )

++e   --e      ) WX shall have the same constraints as the W events
e++   e--      ) generated in the analysis of the sub-expression, but
e1=e2          ) may occur before any or all of those W events.
e1<op=>e2      )

Notwithstanding S.3.3, if an arrangement contains more than one W event
for the floating-point exception flags without an intervening S or F
event, the behaviour shall not be undefined but it is unspecified which
of the flags associated with the W events will be set (though at least
one of those flags shall be set). [*]

[*] If it were undefined to write twice to the flags between sequence
points, it would not be practical to use the facilities of the
<fenv.h> header in complicated expressions.


S.5 Examples

In these examples the type int and all pointer types have a size of 1,
double has a size of 3, and addresses are in the range 1000 to 1999. In
each case it is the final line that is being analysed.

(1)
    int x, y, z;
    x = y + z;

The canonical form is:
    x = $y + $z

The events and constraints identified by each sub-expression are:
    x:  L(1000){1}
    y:  L(1234){2}
    z:  L(1666){3}
    $y: R(1234){4}
    $z: R(1666){5}
    +:  {4}, {5}
    =:  W(1000){6}, {4}, {5}, [{4}, {5}] < {6}
Since each address only appears in one R or W event, there is no
undefined behavior.

(2)
    int z, y;
    x = y++;    // already in canonical form

The events and constraints identified by each sub-expression are:
    x:      L(1000){1}
    y:      L(1234){2}
    y++:    R(1234){3}, W(1234){4}, {3} < {4}
    =:      W(1000){5}, {3}, {4}, {3} < {4}, [{3}, {4}] < {5}
There is only one possible ordering:
    R(1234) : W(1234) : W(1000)
Again each address only appears in one event, so there is no undefined
behavior.

(3)
    int x;
    x = ++x;    // already in canonical form

The events and constraints identified by each sub-expression are:
    x:   L(1000){1}  [the "x" that is the left operand of =]
    x:   L(1000){2}  [the "x" that is the operand of ++]
    ++x: R(1000){3}, W(1000){4}, {3} < {4}
    =:   W(1000){5}, {3}, {4}, {3} < {4}, [{3}, {4}] < {5}
Even though there is still only one possible ordering, this time the
same address appears in two separate W events without an intervening
sequence point, and so the behavior is undefined.

(4)
    int x;
    x += x * x;

The canonical form is:
    x += $x * $x;

The events and constraints identified by each sub-expression are:
    x:  L(1000){1}    [the "x" that is the left operand of =]
    x:  L(1000){2}    [the "x" that is the left operand of *]
    x:  L(1000){3}    [the "x" that is the right operand of *]
    $x: R(1000){4}    [derived from {2}]
    $x: R(1000){5}    [derived from {3}]
    *:  {4}, {5}
    +=: R(1000){6}, W(1000){7}, {4}, {5}, [{4}, {5}, {6}] < {7}
The address 1000 appears in all four of the remaining events, and this
time there are six permitted orderings but none of the 6 involve
undefined behavior, and so the expression is valid. If x had volatile-
qualified type it would be unspecified which of the three reads would
be used for each of the three operands of the (implied) x + x * x.

(5)
    int x;
    extern int f(int);
    x = f(x++);

The canonical form is:
    x = (@f)(x++)

The events and constraints identified by each sub-expression are:
    x:    L(1000){1}    [the "x" that is the left operand of =]
    x:    L(1000){2}    [the "x" that is operand of ++]
    x++:  R(1000){3}, W(1000){4}, {3} < {4}
    f:    none
    @f:   none
    call: {3}, {4}, F(f){5}, {3} < {4}, [{3}, {4}] < {5}
    =:    W(1000){6}, {3}, {4}, {5}, {3} < {4}, [{3}, {4}] < {5},
          [{3}, {4}, {5}] < {6}
The four events have one possible ordering:
    R(1000) : W(1000) : F(f) : W(1000)
It is therefore clear that x is incremented before the function call
starts execution, that the result of the function is stored in x, and
that the expression is valid. If the function has some other form of
access to x, it will see the incremented value, and any value it stores
in x will be lost.

If the expression is replaced by:
    x = 2 * f(x++);
the analysis is effectively the same - the multiplication by a constant
adds no new events. And if it is replaced by:
    x = 0 * f(x++);
the analysis continues to remain the same; an optimising implemenation
can determine that x is set to 0, but it must still make the incremented
value of x available to the function.

(6)
    int x, y;
    (x=y) + x;

The canonical form is:
    (x=$y)+$x

The events and constraints identified by each sub-expression are:
    x:    L(1000){1}    [the "x" that is the left operand of =]
    y:    L(1234){2}
    x:    L(1000){3}    [the "x" that is the right operand of +]
    $y:   R(1234){4}
    =:    W(1000){5}, {4}, {4} < {5}
    $x:   R(1000){6}
    +:    {4}, {5}, {6}, {4} < {5}
There are three possible orderings:
    {4} : {5} : {6}    R(1234) : W(1000) : R(1000)
    {4} : {6} : {5}    R(1234) : R(1000) : W(1000)
    {6} : {4} : {5}    R(1000) : R(1234) : W(1000)
The first of the three involves reading address 1000 after writing it -
without an intervening sequence point - and so the expression is
undefined.

(7)
    int x, y, z;
    (x=y) + (x=z);

The canonical form is:
    (x=$y)+(x=$z)

The events and constraints identified by each sub-expression are:
    x:    L(1000){1}    [the "x" that is set to "y"]
    y:    L(1234){2}
    x:    L(1000){3}    [the "x" that is set to "z"]
    z:    L(1666){4}
    $y:   R(1234){5}
    x=$y: W(1000){6}, {5}, {5} < {6}
    $z:   R(1666){7}
    x=$z: W(1000){8}, {7}, {7} < {8}
    +:    {5}, {6}, {7}, {8}
There are six possible orderings, but all involve both {6} and {8} not
separated by a sequence point, and so the expression is undefined.

(8) To demonstrate the rules for arrays:

    double x [5];
    int y = 3;
    x[y] /= (double) &x[y];

The canonical form is:
    *(@x+$y) /= (double)(@x+$y);

The events and constraints identified by each sub-expression are:
    x:      L(1000,15){1}    [the "x" to the left of the =]
    y:      L(1666){2}       [the "y" to the left of the =]
    x:      L(1000,15){3}    [the "x" to the right of the =]
    y:      L(1666){4}       [the "y" to the right of the =]
    @x:     D{5}             [derived from {1}]
    $y:     R(1666){6}       [derived from {2}]
    +:      {5}, {6}
    x[y]:   {5}, {6}, L(1009,3){7}, [{5}, {6}] < {7}
    @x:     D{8}             [derived from {3}]
    $y:     R(1666){9}       [derived from {4}]
    +:      {8}, {9}
    double: none
    cast:   {8}, {9}
    /=:     {5}, {6}, R(1009,3){10}, W(1009,3){11},
            {5} < {10}, {5} < {11}, {6} < {10}, {6} < {11},
            {8}, {9}, [{8}, {9}, {10}] < {11}

The only purpose of D events is to preserve ordering in cases like
[{22} < D, D < {23}] and so {5} and {8} can be ignored. This leaves
the events:
    R(1666){6}, R(1666){9}, R(1009,3){10}, W(1009,3){11}
and the constraints:
    {6} < {10}, {6} < {11}, {9} < {11}, {10} < {11}
This permits three orderings:
    R(1666){6} : R(1009,3) : R(1666){9} : W(1009,3)
    R(1666){6} : R(1666){9} : R(1009,3) : W(1009,3)
    R(1666){9} : R(1666){6} : R(1009,3) : W(1009,3)
None of these involve undefined behavior.

(9) To demonstrate the rules for structures:

    int x;
    struct s { double p; int q; double r; } y;
    x = y.q;

The canonical form is:
    x = $(y.q)

The events and constraints identified by each sub-expression are:
    x:      L(1000){1}
    y:      L(1230,7){2}
    y.q:    L(1233,1){3}
    $(y.q): R(1233){4}
    =:      L(1233,1){5}, {4}, {4} < {5}

(10)
    struct s { double p; int q; int r; } *x, y;
    x = &y;
    x->q = x->r;

The canonical form is:
    (*$x).q = $((*$x).r)

The events and constraints identified by each sub-expression are:
    x:       L(1000){1}     [part of the left operand of the =]
    x:       L(1000){2}     [part of the right operand of the =]
    $x:      R(1000){3}     [derived from {1}]
    *$x:     {3}, L(1220,5){4}, {3} < {4}
    x->q:    {3}, L(1223,1){5}, {3} < {5}
    $x:      R(1000){6}     [derived from {2}]
    *$x:     {6}, L(1220,5){7}, {6} < {7}
    x->r:    {6}, L(1224,1){8}, {6} < {8}
    $(x->r): {6}, R(1224,1){9}, {6} < {9}
    =:       {3}, W(1223,1){10}, {3} < {10}, {6}, {9}, {6} < {9},
             [{6}, {9}] < {10}

These constraints allow three orderings:
    {3} : {6} : {9} : {10}
    {6} : {3} : {9} : {10}
    {6} : {9} : {3} : {10}
and it is clear that all of these are valid.

(11)
    int x;
    x++ && x--;

This expression has two canonical forms, depending on whether x is or
is not initially zero:
    x++             // x is initially 0
    x++ , x--       // x is not initially 0

In this case it is clear that only the latter needs to be evaluated
but this might not be the case in more complex expression. Doing so,
the events and constraints identified by each sub-expression are:
    x:   L(1000){1}     [the "x" in x++]
    x:   L(1000){2}     [the "x" in x--]
    x++: R(1000){3}, W(1000){4}, {3} < {4}
    x--: R(1000){5}, W(1000){6}, {5} < {6}
    ,:   {3}, {4}, {3} < {4}, {5}, {6}, {5} < {6}, S{7},
         [{3}, {4}] < {7}, {7} < [{5}, {6}]

There is only one possible ordering, and this has two W events to the
same address, but these are separated by an S event and so the
expression is valid. It is clear that the decrement must apply to the
incremented variable.

(12)
    int x, y;
    x++ * y++ ? x-- : y--;

Again there are two canonical forms:
    x++ * y++ , x--    // x and y both initially nonzero
    x++ * y++ , y--    // either x or y initially zero

In each case the analysis follows the lines above. The first case
yields:
    R(1000){1}, W(1000){2}, R(1234){3}, W(1234){4}, S{5}, W(1000){6},
    {1} < {2}, {1} < {5}, {2} < {5}, {3} < {4}, {3} < {5}, {4} < {5},
    {5} < {6}

There are six possible orderings, but all place a sequence point between
the first and second writes to x.

(13)
    int x[2], *y;
    y = x;
    *y = f(y++);

The canonical form is:
    *$y = (@f)(y++)

The events and constraints identified by each sub-expression are:
    y:    L(1234){1}   [the "y" on the left of the =]
    y:    L(1234){2}   [the "y" that is the operand of y++]
    y++:  R(1234){3}, W(1234){4}, {3} < {4}
    f:    none
    @f:   none
    call: {3}, {4}, {3} < {4}, F(f){5}, [{3}, {4}] < {5}
    $y:   R(1234){6}   [derived from {1}]
    *$y:  {6}, L(X){7}, {6} < {7}
    =:    {6}, W(X){8}, {6} < {8}, {3}, {4}, {5}, {3} < {4},
          [{3}, {4}] < {5}, [{3}, {5}] < {8}
where X is the location pointed to by y at the point of the * operator.
One possible ordering is:
    R(1234){3} : W(1234){4} : R(1234){6} : F(f){5} : W(X){8}
and therefore the expression involves undefined behavior.

(14)
    int x[2], y;
    y = 0;
    x[y] = f(y++);

The canonical form is:
    *(@x+$y) = (@f)(y++)

The events and constraints identified by each sub-expression are:
    x:    L(1000,2){1}
    y:    L(1234){2}   [the "y" on the left of the =]
    y:    L(1234){3}   [the "y" that is the operand of y++]
    @x:   D{4}
    $y:   R(1234){5}   [derived from {1}]
    +:    {4}, {5}
    x[y]: {4}, {5}, L(X){6}, [{4}, {5}] < {6}
            where X is 1000 plus the value determined at {5}
    y++:  R(1234){7}, W(1234){8}, {7} < {8}
    f:    none
    @f:   none
    call: {7}, {8}, {7} < {8}, F(f){9}, [{7}, {8}] < {9}
    =:    {4}, {5}, W(X){10}, [{4}, {5}] < {10}, {7}, {8}, {9},
          {7} < {8}, {8} < {9}, {7} < {9}, [{7}, {8}, {9}] < {10}

One possible ordering is:
    D{4} : R(1234){7} : W(1234){8} : R(1234){5} : F(f){9} : W(X){10}
and therefore the expression involves undefined behavior.

(15)
    int x = 5;
    int a [x][x++];

There are two expressions to be considered, and their canonical forms
are:
    $x
    x++

The first generates the event:
    R(1000){1}
and the second the events and constraints:
    R(1000){2}, W(1000){3}, {2} < {3}
There are no constraints between {1} and the other two events, and so
they can occur in any of three orders. One of these arrangements puts
W(1000) before R(1000) and so the declaration as a whole involves
undefined behavior.