WG14/N925
A formal model of sequence points and related issues
Working draft
Clive Feather
Last changed 2000-09-28
Introduction
This paper is the latest form of my proposals for a formal model of
sequence points, previously known as Annex S and appearing in FCD1 (1998)
as Annex D.
In this paper the proposed text for the new Annex is interspersed with
discussion and with examples. The exact situation at any point is indicated
as follows:
[[Normative]] The text following would be normative text (or a footnote)
in Annex S within the amended Standard.
[[Change]] The text following describes other changes required to the
Standard to retain consistency.
[[Discussion]] The text following discusses and explains the proposals,
and would not appear in the Standard.
[[Examples]] The text following contains examples which could be
included or omitted as appears desirable.
[[Discussion]]
The Standard puts certain restrictions on expressions based on the concept
of a "sequence point". For example:
Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.
The problem with wording like this is that, while it appears simple enough
at first glance, it actually runs into a number of problems. For example:
x =3D y++;
The value of y is accessed not only to determine the value to be stored in
y, but also to determine the value to be stored in x. Or:
(x +=3D 0) + (x *=3D 1);
Neither assignment to x alters the stored value, and so it could be argued
that they do not modify it. Finally:
(0, x =3D 1, 1) + (2, x =3D 3, 3);
It is not clear here where the "previous" and "next" sequence points are.
There are other, related, issues involved in interpreting complicated
expressions which are also addressed here. Annex S would be a normative
Annex that provides a full interpretation of such expressions and would
override any other related text.
The Annex operates by providing a formal procedure for analysing an
expression to determine its status. This procedure could, in principle, be
carried out by an automaton, and for any expression produces one of three
results:
(1) The expression is well-formed and its result and side effects are
well-defined.
(2) The expression is well-formed, but there are several choices in the
order of evaluation of subexpressions, and so it is unspecified which
of these applies; thus the result and order of side-effects is also
unspecified to this extent.
(3) The expression involves undefined behaviour.
[[Change]]
Append the following to 6.5 paragraph 2, moving the reference to footnote
70 to the end of the appended text.
This rule shall be interpreted according to Annex S.
[[Normative]]
Annex S
(normative)
Formal model of sequence points and related issues
--------------------------------------------------
S.1 Introduction
[#1] This annex defines a formal algorithm for determining
whether any given expression or set of expressions meets
various requirements of the Standard such as that in 6.5:
Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be
stored.
An implementation is not required to follow exactly the model
defined in this annex. In particular, the model assumes serial
evaluation, but parallel evaluation is equally valid. However, an
implementation must ensure - for all expressions that are not
determined by the model to involve undefined behaviour - that
expressions are evaluated in a way that will yield one of the
results (including side-effects) specified by this model.
S.2 Basic concepts
S.2.1 Events and rules
The model is defined in terms of "events" and "rules", which are
concepts used only for this purpose and have no other meaning
in this International Standard.
Each subexpression of the expressions being analysed has
associated with it one or more events. An event consists of:
* a type, which is one of D, F, L, R, S, or W;
* in the case of types L, R, and W only, the address of a byte and,
possibly, a designation of one or more bits within that byte;
* in the case of type F only, a designator for a function.
Two different events may have the same type, associated address, or both
and remain separate events. An event may be associated with more than one
subexpression, and in each such subexpression it may be either "central"
or "incidental" (this is its "status" with respect to that subexpression).
Where an expression is said to "inherit" the events of a subexpression
then, unless stated explicitly to the contrary, all the events associated
with the subexpression are also associated with the larger expression with
the same status.
Events are shown in this Annex in one of the following ways:
{11:R 1234} an event of type R and address 1234, referred
to elsewhere as "event 11";
{12:S} an event of type S, referred to elsewhere as
"event 12";
{11} another use of the same "event 11";
{x:L 567 1-3} an event of type L referring only to bits 1-3 of
address 567, referred to elsewhere as "event x".
A rule is a requirement that one event "comes before" another. As such,
the rules generated in analysing an expression put a partial ordering on
the generated events. Given two events X and Y, the notation "X < Y"
means "X comes before Y". The notation "X =3D Y" is also used and means that
any rule that applies to X also applies to Y. Rules have the normal
properties of a partial ordering.
Rules have global effect in the model. That is, if a rule is introduced in
the analysis of one subexpression, it applies to all other subexpressions
that those two events are associated with and thus to the overall expressio=
n.
[*] Thus, given events X, Y, and Z:
X < X never happens;
if X < Y is a rule, then Y < X never happens;
if X < Y and Y < Z, then X < Z.
But it is possible for neither X < Y nor Y < X to be a rule.
Similarly:
if X =3D Y, then Y =3D X and vice versa;
if X < Y and Y =3D Z are rules, then X < Z;
if X =3D Y and Y < Z are rules, then X < Z;
X < Y and X =3D Y never happen together.
[[Discussion]]
Events are abstractions of the processes that make up the evaluation of an
expression. Each event is assumed to be atomic, and a rule is a statement
that one event must occur before a different one. The event types have the
following meanings:
D =3D a dummy event, existing only to preserve order relationships
F x =3D a call to the function designated by x
L x =3D designates the byte or bits at x with no action being taken
R x =3D reading the value of the byte or bits at x
S =3D a sequence point
W x =3D writing a value to the byte or bits at x
A W event does not mean that the value has to be stored immediately at the
point in the model that the event occurs, but merely that it must be
stored sometime before the next sequence point. The model automatically
takes this into account; if the expression is not undefined, the value will
not be required again before the next sequence point.
Central events are those that represent the "value" of the expression,
while incidental events are others that need to be tracked in the analysis.
Every function call has a sequence point immediately before it (see
6.5.2.2) and another immediately after it (either because it ends with a
full expression or because 7.1.4 requires one). Thus the effects of a
function call on the enclosing expression depend only on which locations
are read and written, and not on the order on which those reads and writes
occur. For the purposes of the rest of the model, the events making up the
function call form an atomic whole.
[[Examples]]
If each of the variables x, y, and z occupy only one byte, then the
expression "x =3D y + z" involves 6 events and 5 rules:
{1:L x} {2} =3D {4}
{2:L y} {3} =3D {5}
{3:L z} {1} =3D {6}
{4:R y} {4} < {6}
{5:R z} {5} < {6}
{6:W x}
The events associated with each subexpression are:
central incidental
x {1} none
y {2} none
$y {4} none
z {3} none
$z {5} none
y + z {4} {5} none
x =3D y + z {4} {5} {6} none
(The notation $y is explained below.)
In future examples, unless stated otherwise, all variables occupy one byte.
[[Normative]]
S2.2 Optimization
A particular implementation may be able to optimize an expression so that
one or more subexpressions need not be evaluated (such as "0 * (a + b)").
Nonetheless the implications of this model must be followed as if the
expression were evaluated fully.
[[Discussion]]
This rule is my particular interpretation of the Standard, and WG14 may
wish to take a different view of both this and other interpretations made
elsewhere.
There are further implications involved; for example, the expression
"x =3D 0 * f(x)" is well-defined according to this Annex, using the old val=
ue
of x for the function argument. The model would forbid compilers to produce
code corresponding to the following rewrites unless the implementer can
show that they cannot affect the result:
(f (0), x =3D 0)
(x =3D 0, f (x))
(x =3D 0, f (0))
(__temp =3D x, x =3D 0, f (__temp))
the first three because the wrong value is used for the parameter, and the
last one (in fact, the last three) because f might have some other access
to x (for example because it is a variable with file scope).
[[Normative]]
S.3 Process
The model requires an expression or set of expressions to be analysed in
three stages:
* converting each expression to a "canonical form", as described in S.3.1;
* determining the associated events and rules, as described in S.3.2;
* evaluating the status of the expression, as described in S.3.3.
S.3.1 Canonical form
Each expression is first converted to a "canonical form". First:
each expression of the form is replaced by the form
e->field (*(e)).field
e1[e2] *((e1)+(e2))
&*e e
e1 && e2 ((e1) ? (e2) : 0)
e1 || e2 ((e1) ? 1 : (e2))
sizeof e } one of { 1
sizeof type } (see below) { @(e) or @(type)
(each of these being done recursively).
A sizeof expression is replaced by an @ exprssion if the operand has
variable length array type, and by the constant 1 otherwise.
Next, wherever subclause 6.3.2.1 requires an lvalue not of array type to
be converted to the value stored in the designated object, the lvalue
expression e is replaced by the expression $(e), and where it requires an
expression of array type to be converted to a pointer to the first element
of the array, or an expression of function type to be converted to a
pointer to the function, it is replaced by the expression @(e) ($ and @
are notional operators introduced for the purpose of the analysis).
Finally, each expression of the form "e1 ? e2 : e3" is replaced by two
separate analyses:
((e1) , (e2)) where e1 is non-zero
((e1) , (e3)) where e1 is zero
(this also being done recursively). If the value of e1 is unspecified,
the overall result of the analysis is the worst case of the individual
options (e.g. if either is undefined, so is the whole expression).
[[Discussion]]
Converting the expression to canonical form does three things. Firstly,
certain operators that can be written in the form of other operators are
done so. Strictly speaking this is not necessary - the equivalent process
could be applied to the methods of subclause S.3.2, but doing it this way
makes it less likely that an inconsistency might be introduced.
The two notional operators $ and @ provide an explicit hook for the
conversions described in 6.3.2.1. Without them it would, in effect, be
necessary to repeat their definitions everywhere an lvalue might occur.
The sizeof operator usually involves no evaluation, and so a sizeof
expression can simply be replaced by a constant. However, if the operand
has variable length array type, it is evaluated. Since the result (of
evaluating the operand, not of the sizeof operator) will be an lvalue
designating an array, the notional operator @ is applied to it.
The ? operator (and therefore the && and || operators) requires two
completely different paths of analysis depending on the value of the first
expression, because either the second argument or the third is evaluated,
but not both.
I have considered the possibility that (a ? b : c) could be analysed as if
it were (a, b, c). However, I am not convinced that this would always yield
the same results.
[[Normative]]
S.3.2 Determining the events
Wherever this subclause describes the introduction of a rule, the notation
[=3DXX] (where X varies) is used. This notation is used only to refer to
these rules in the examples, and has no other effect.
S.3.2.1 Primary expressions
A constant, string literal, or identifier with function type has no events
associated with it. A parenthesised expression has the same events
associated with it that the unparenthesised version does.
An identifier with object (including array and incomplete) type has one
event of type L for each byte it occupies; these are all central events.
If the object has register storage class it still has a notional address,
even if that address cannot be determined using the & or @ operators.
S.3.2.2 Type names
If a type name does not have variably modified type, it has no events
associated with it. If it does have variably modified type, then it
inherits the events from each full expression in array declarators within
the type name.
S.3.2.3 Compound Literals
A compound literal inherits the events of the type name and of all the
expressions in the initializer. It also has one new event of type L for
each byte of the resulting object.
For each inherited event {i} that was a central event, for each new
event {c}, there is a rule {i} < {c} [=3DCP].
The new events are central events of the compound literal; all inherited
events become incidental events.
[[Examples]]
Take the compound literal "(int [2]){ x, y }" and assume:
* x has central event {x1} and incidental event {x2};
* y has central events {y1} and {y2} but no incidental events;
* the resulting object occupies addresses 1001 and 1002.=20
The compound literal has central events:
{c1:L 1001}
{c2:L 1002}
and incidental events:
{x1} {x2} {y1} and {y2}
and there are 6 rules introduced:
{x1} < {c1} {x1} < {c2}
{y1} < {c1} {y1} < {c2}
{y2} < {c1} {y2} < {c2}
[[Discussion]]
A compound literal expression is an lvalue, and therefore its important
events are L events that designate the resulting object. However, this
object cannot be used until it is initialized, and so rules are required to
enforce this. The events of the initializer expressions become incidental to
the compound literal, because once the compound literal exists the initial
values don't affect its behaviour in other expressions.
[[Normative]]
S.3.2.4 Nominal operators and address operators
The expression "$e" inherits all central events that do not have type L,
and all incidental events, of the subexpression e. For each central event
of type L of the subexpression (say {i:L addr}) the expression has a new
central event of type R and with the same address ({c:R addr}), and there
is a rule {i} =3D {c} [=3DRV].
The expression "@e" or "&e" inherits all central events that do not have
type L, and all incidental events, of the subexpression e. For each
central event of type L of the subexpression (say {i:L addr}) the
expression has a new central event of type D ({c:D}) and there is a rule
{i} =3D {c} [=3DLV].
The expression "*e" inherits all the events of the subexpression e. In
addition, if the resulting lvalue has object (including incomplete) type
then the expression has a new central event of type L for each byte of the
lvalue designated. For each inherited central event {i}, for each new
event {c}, there is a rule {i} < {c} [=3DP].
[[Discussion]]
The expression "$e" represents taking the value of an lvalue. As such, each
byte of the lvalue needs to be read - represented by the new R events -
and the result is no longer an lvalue - represented by omitting the
existing L events from the expression. Any other events associated with the
expression are simply carried up without change.
The expression "@e" represents the conversion of an lvalue to an address.
Again, the result is not an lvalue and so the L events must be dropped.
Since no action takes place, they are replaced by D (dummy placeholder)
events to ensure that any other rules constraining the order of events
still take place in the right order. Similarly, the expression "&e"
represents the conversion of an lvalue to an address and is handled in the
same way.
On the other hand, the expression "*e" represents the conversion of an
address to an lvalue; it is not possible to use the lvalue until after
the subexpression e has been evaluated. Because the value of e is, in
general, variable, it is not normally possible to determine the address of
the lvalue until the program is executed. In this case the analysis will
have to consider all possibilities, and the final status of the expression
may be (for example) "undefined if x is 7 and well-defined otherwise".
It should be noted that both & and * operators might be eliminated during
the conversion to canonical form; for example, "&*x" is equivalent to "x",
while "&y[z]" is equivalent to "y+z".
[[Normative]]
S.3.2.5 Function calls
The function call operator inherits all the events of the subexpressions,
both that designating the function and those providing the arguments. There
is one new event {f:F function}, using the function actually called. For
each central event {i} that is inherited, there is a rule {i} < {f} [=3DFC].
The inherited events are all incidental events of the function call
expression.
S.3.2.6 Structures and unions
In general, the events associated with an expression "e.field" are a subset
of those associated with the subexpression e. All events not of type L are
simply inherited. Events of type L are inherited if and only if they are
for the bytes comprising the particular field of the structure or union.
This rule is modified slightly in the case of bit-field members of a
structure (but not of a union). The storage unit containing the member is
held in one or more bytes of the structure, but in some bytes only some of
the bits are used. For each such byte, any event of type L {i:L addr} for
that byte is not inherited. Instead, a new event {n:L addr bits} is
created, with the same status, type L, and address augmented by the bits
used by that field, and there is a rule {i} =3D {n} [=3DBF].
[[Examples]]
Suppose that a given structure type is 8 bytes long, while field f is 3
bytes long starting at offset 4. Then, if the expression e has events
{e0:L 2000} {e1:L 2001} {e2:L 2002} {e3:L 2003}
{e4:L 2004} {e5:L 2005} {e6:L 2006} {e7:L 2007}
designating an 8 byte structure starting at address 2000, then the
expression "e.f" inherits only three of those events: {e4} {e5} and {e6}.
Suppose further that field b is a bit-field occupying all of offset 1 and
bits 0 to 3 of offset 2. Then the expression "e.b" inherits the event {e1}
and has a new event {e2b:L 2002 0-3}, with the rule {e2} =3D {e2b}.
[[Discussion]]
A structure or union member consists of some of the bytes of the structure
or union. Therefore the behaviour of an expression containing a dot operator
is same as that of an expression containing the entire structure or union,
except that only the relevant bytes are affected.
In the case of structures, it is possible to simultaneously write to two or
more members, including bit-fields. Therefore special rules are required
to handle bit-fields. In the case of unions this is not possible, and so
these rules are not needed.
[[Normative]]
S.3.2.7 Increment and decrement operators
The expressions "++e", "e++", "--e", and "e--" are all handled identically.
The expression inherits all central events that do not have type L, and all
incidental events, of the subexpression e. For each central event of type L
of the subexpression (say {i:L addr}) the expression has two new central
events {r:R addr} and {w:W addr}, both with the same address, but of types
R and W respectively. There are rules:
{i} =3D {r} [=3DPI1]
{r} < {w} [=3DPI2]
[[Discussion]]
These rules are equivalent to those for the expression e +=3D 1. It might be
a better approach to do this during the creation of the canonical form.
[[Normative]]
S.3.2.8 Arithmetic operators
The following operators simply inherit the events of their operand(s):
unary: + - ! ~
binary: + - * / % << >> < > <=3D >=3D =3D=3D !=3D & ^ |
[[Discussion]]
These operators do not use or create lvalues, and in the case of the binary
operators they do not impose any order of evaluation on their operands.
[[Normative]]
S.3.2.9 Cast operators
The cast operator inherits the events of its operand and of the type name.
[[Discussion]]
The operator does not create or use an lvalue. While the the type name has
to be evaluated before the cast is completed, it does not have to be
evaluated before or after the value of the operand has been determined. So
there is no ordering imposed between the two.
[[Normative]]
S.3.2.10 Comma operator
The comma operator inherits the events of its operands and has a new
central event {s:S} of type S.
For each event {il} inherited from the left operand there is a rule:
{il} < {s} [=3DCL]
For each event {ir} inherited from the right operand there is a rule:
{s} < {ir} [=3DCR]
[[Discussion]]
The new event represents the sequence point.
[[Normative]]
S.3.2.11 Assignment operators
An assignment expression inherits all the events of the two operands except
for central events of the left operand that have type L. The inherited
events become incidental events.
For each central event of the left operand of type L (say {i:L addr}),
the following happens:
* for a simple assignment operator, there is a new central event of type
W and the same address ({w:W addr}), plus a rule {i} < {w} [=3DA1];
* for a compound assignment operator, there is a new central event of type
W and a new incidental event of type R, both with the same address
({w:W addr} and {r:R addr}), plus rules {i} < {r} [=3DA2] and
{r} < {w} [=3DA3].
The new type W events are the only central events of the expression.
For each event {e} inherited from the right operand, for each new type W
event {w} introduced in the previous paragraph, there is a rule {e} < {w}
[=3DA4].
[[Discussion]]
The central events of the left operand that have type L designate the bytes
that will be assigned to. Therefore we need to generate a type W event for
each such byte. This analysis makes one major assumption: that an assignment
cannot happen until the address to be written to and the value to be written
have both been determined.
The value of the assignment is the value written, and therefore the
corresponding W events are the only central events of the resulting
expression; any other event involved in either operand is retained, but it
will not affect what happens when the result is used.
The analysis for "x +=3D y" is the same as that for "x =3D x + (y)", except
that the events inherited from x are only done so once, not twice.
[[Normative]]
S.3.3 Evaluating the status
Once the events and rules associated with the overall expression have been
determined, the status of the expression is evaluated in the following way.
(In the following, the concept of "same address" takes account of the
designation of specific bits: if two events have the same address, both have
bits designated, but they have no bits in common, it is as if they had
different addresses.)
First, all events not associated with the overall expression are changed to
type D. Next, every possible ordering of the events, other than those of
type D, is examined in turn. If the ordering is inconsistent with any
rule (including any rule involving type D events), it is discarded. Those
orderings that are consistent with all rules are called the "permitted
orderings".
If any permitted ordering contains an event of type W followed by an event
of either type R or type W for the same address and the events are not
separated by an event of either type F or type S, then the expression
involves undefined behaviour.
Otherwise:
* If there are two or more events of type W for a given address and two
permitted orderings have different ones last, the value stored in the
object at that address is unspecified; it may be any of the values written
by the subexpressions first generating those events.
* If there is an event of type R and one of type W for a given address,
and two permitted orderings arrange them different ways round, the value
used by the subexpression that first generates the event of type R is
unspecified; it may be either the value written by the subexpression first
generating the event of type W, or it may be the previous value of the
object.
* The above two cases also apply if either or both of the events are
replaced by (distinct) events of type F where the function designated reads
or writes the appropriate object.
Otherwise the expression is well-defined and always has the same value and
produces the same side-effects.
[[Discussion]]
If there are two events of type W for the same address, the corresponding
object is being written to twice by the expression. There are three cases:
(1) It is possible to have the two events not separated by an event of
type F or type S. In this case there is an ordering where there are two
writes to the same object without a sequence point intervening, so the
behaviour is undefined.
(2) There is always an intervening sequence point but the two events can
occur in either order. In this case it is unspecified which order they
occur in, and so which one provides the last write to the object. If one
of the events is the last write to the object, then the final value of the
object will be unspecified.
(3) There is always an intervening sequence point and the order is always
the same. In this case the expression is well-defined as far as these
events are concerned.
If there are two events, one of type R and one of type W, for the same
address, the corresponding object is both read and written. In this case:
(1) If the write can be followed by the read without an intervening event
of type F or type S (a sequence point) the behaviour is undefined.
(2) If this cannot happen but the events can nonetheless be in either
order, the actual order is unspecified.
(3) Otherwise either the object is always read first, or there is always a
sequence point between the write and the subsequent read, but in either
case the expression is well-defined as far as these events are concerned.
[[Normative]]
If an event of type R applies to a volatile object, the object is being
read. If there is more than one such event for the same object, the value
of the expression may depend on the order they occur. If the permitted
orderings allow more than one arrangement for these events, the value of
the expression is unspecified.
[[Discussion]]
There is an obvious mapping from R events to reads of a volatile object,
and if the events can occur in more than one order it implies that the
corresponding reads can also do so.
I understand that this area was seen as contentious in previous drafts of
this document, and have tried to say as little as possible.
[[Examples]]
EXAMPLE 1
int x, y;
x =3D y++; // already in canonical form
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x {1:L x} none
y {2:L y} none
y++ {3:R y} none {2} =3D {3} [=3DPI1]
{4:W y} {3} < {4} [=3DPI2]
x =3D y++ {5:W x} {3} {4} {1} < {5} [=3DA1]
{3} < {5} [=3DA4]
{4} < {5} [=3DA4]
The only permitted ordering is:
{3} {4} {5}
The only location referred to more than once is y, and the R event occurs
before the W event. So the expression is well-defined.
EXAMPLE 2
int x;
x =3D ++x; // already in canonical form
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
x [RHS] {2:L x} none
++x {3:R x} none {2} =3D {3} [=3DPI1]
{4:W x} {3} < {4} [=3DPI2]
=3D {5:W x} {3} {4} {1} < {5} [=3DA1]
{3} < {5} [=3DA4]
{4} < {5} [=3DA4]
The only permitted ordering is:
{3} {4} {5}
This contains events {4} and {5}, both of which are type W events for x,
without an intervening sequence point. So the expression involves
undefined behaviour. This is not altered by the fact that the same value
is written both times.
EXAMPLE 3
int x;
x +=3D x * x;
The canonical form is:
x +=3D $x * $x;
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
x [RHS 1] {2:L x} none
$x [RHS 1] {3:R x} none {2} =3D {3} [=3DRV]
x [RHS 2] {4:L x} none
$x [RHS 2] {5:R x} none {4} =3D {5} [=3DRV]
$x * $x {3} {5}
+=3D {7:W x} {6:R x} {1} < {6} [=3DA2]
{3} {5} {6} < {7} [=3DA3]
{3} < {7} [=3DA4]
{5} < {7} [=3DA4]
The permitted orderings are:
{3} {5} {6} {7}
{3} {6} {5} {7}
{5} {3} {6} {7}
{5} {6} {3} {7}
{6} {3} {5} {7}
{6} {5} {3} {7}
Since {7} is the only W event and is always last, the expression is
well-defined.
EXAMPLE 4
int x;
extern int f(int);
x =3D f(x++);
The canonical form is:
x =3D (@f)(x++)
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
f none none
@f none none
x [RHS] {2:L x} none
x++ {3:R x} none {2} =3D {3} [=3DPI1]
{4:W x} {3} < {4} [=3DPI2]
(@f)(x++) {5:F f} {3} {4} {3} < {5} [=3DFC]
{4} < {5} [=3DFC]
=3D {6:W x} {3} {4} {5} {1} < {6} [=3DA1]
{3} < {6} [=3DA4]
{4} < {6} [=3DA4]
{5} < {6} [=3DA4]
The only permitted ordering is:
{3:R x} {4:W x} {5:F f} {6:W x}
The two writes to x are separated by an event of type F (the sequence point
in a function call). The read occurs before the first write. Therefore the
expression is well-defined.
Note that it is clear that x is incremented before the function call starts
execution and that the final value of x will be the result of the function.
If the function has some other form of access to x, it will see the
incremented value, and any value it stores in x will be overwritten.
If the expression is replaced by:
x =3D 2 * f(x++);
the analysis is effectively the same - multiplication by a constant adds no
new events or rules. And if it is replaced by:
x =3D 0 * f(x++);
the analysis continues to remain the same; an optimizing implementation can
determine that x is set to 0, but it must still make the incremented value
of x available to the function.
EXAMPLE 5
int x, y;
(x=3Dy) + x;
The canonical form is:
(x=3D$y)+$x
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
y {2:L y} none
$y {3:R y} none {2} =3D {3} [=3DRV]
x =3D $y {4:W x} {3} {1} < {4} [=3DA1]
{3} < {4} [=3DA4]
x [RHS] {5:L x} none
$x {6:R x} none {5} =3D {6} [=3DRV]
(x=3D$y)+$x {4} {6} {3}
The permitted orderings are:
{3} {4} {6}
{3} {6} {4}
{6} {3} {4}
Since the first of these has {4:W x} immediately before {6:R x}, the
expression is undefined.
EXAMPLE 6
int x, y, z;
(x=3Dy) + (x=3Dz);
The canonical form is:
(x=3D$y)+(x=3D$z)
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
y {2:L y} none
$y {3:R y} none {2} =3D {3} [=3DRV]
x =3D $y {4:W x} {3} {1} < {4} [=3DA1]
{3} < {4} [=3DA4]
x [RHS] {5:L x} none
z {6:L z} none
$z {7:R z} none {6} =3D {7} [=3DRV]
x =3D $z {8:W x} {7} {5} < {8} [=3DA1]
{7} < {8} [=3DA4]
(x=3Dy)+(x=3Dz) {4} {8} {3} {7}
The permitted orderings are:
{3} {4} {7} {8}
{3} {7} {4} {8}
{3} {7} {8} {4}
{7} {3} {4} {8}
{7} {3} {8} {4}
{7} {8} {3} {4}
Since the second of these has {4:W x} immediately before {8:W x}, the
expression is undefined. (In fact, all six involve a forbidden arrangement
of events.)
EXAMPLE 7 To demonstrate the rules for arrays:
double x [5];
int y =3D 3;
x[y] /=3D (double) &x[y];
The canonical form is:
*(@x+$y) /=3D (double)(@x+$y);
The analysis proceeds as follows ("e" is the array element identified by
x[y]):
associated events new rules
central incidental introduced
[LHS]
x {1:L x} none
@x {2:D} none {1} =3D {2} [=3DLV]
y {3:L y} none
$y {4:R y} none {3} =3D {4} [=3DRV]
(@x+$y) {2} {4} none
*(@x+$y) {5:L e} none {2} < {5} [=3DP]
{2} {4} {4} < {5} [=3DP]
[RHS]
x {6:L x} none
@x {7:D} none {6} =3D {7} [=3DLV]
y {8:L y} none
$y {9:R y} none {8} =3D {9} [=3DRV]
(@x+$y) {7} {9} none
(double) none none
cast {7} {9} none
/=3D {11:W e} {2} {4} {5} < {10} [=3DA2]
{7} {9} {10} < {11} [=3DA3]
{10:R e} {7} < {11} [=3DA4]
{9} < {11} [=3DA4]
The permitted orderings are:
{4} {9} {10} {11}
{4} {10} {9} {11}
{9} {4} {10} {11}
There are no type W events for x or y; the only object written to is e. The
events involving e are {10} and {11}, and they are always in that order.
Therefore the expression is well-defined.
EXAMPLE 8 To demonstrate the rules for structures:
int x;
struct s { double p; int q; double r; } y;
x =3D y.q;
The canonical form is:
x =3D $(y.q)
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x {1:L x} none
y {2:L y.p} none
{3:L y.q}
{4:L y.r}
y.q {3} none
$(y.q) {5:R y.q} none {3} =3D {5} [=3DRV]
x =3D $(y.q) {6:W x} {5} {1} < {6} [=3DA1]
{5} < {6} [=3DA4]
There is only one permitted ordering:
{5} {6}
EXAMPLE 9
struct s { double p; int q; int r; } *x, y;
x =3D &y;
x->q =3D x->r;
The canonical form is:
(*$x).q =3D $((*$x).r)
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
$x {2:R x} none {1} =3D {2} [=3DRV]
*$x {2} none {2} < {3} [=3DP]
{3:L y.p} {2} < {4} [=3DP]
{4:L y.q} {2} < {5} [=3DP]
{5:L y.r}
(*$x).q {4} none
x [RHS] {6:L x} none
$x {7:R x} none {6} =3D {7} [=3DRV]
*$x {7} none {7} < {8} [=3DP]
{8:L y.p} {7} < {9} [=3DP]
{9:L y.q} {7} < {10} [=3DP]
{10:L y.r}
(*$x).r {10} none
$((*$x).r) {11:R y.r} none {10} =3D {11} [=3DRV]
=3D {12:W y.q} {11} {4} < {12} [=3DA1]
{11} < {12} [=3DA4]
There is only one permitted ordering:
{11} {12}
EXAMPLE 10 To illustrate the rules for bit-fields:
struct { int x : 10; int y : 3; } s;
s.y =3D 4;
s.x =3D s.y++;
This is already in canonical form.
This example assumes that s occupies two bytes of memory, called s.1 and
s.2 in this description. The bit-field x occupies all of the first and
bits 0 and 1 of the second, while the bit-field y occupies bits 2-4 of
that byte. The analysis proceeds as follows:
associated events new rules
central incidental introduced
s [LHS] {1:L s.1} none
{2:L s.2}
s.x {1} none {2} =3D {3} [=3DBF]
{3:L s.2 0-1}
s [RHS] {4:L s.1} none
{5:L s.2}
s.y {6:L s.2 2-4} none {5} =3D {6} [=3DBF]
s.y++ {7:R s.2 2-4} none {6} =3D {7} [=3DPI1]
{8:W s.2 2-4} {7} < {8} [=3DPI2]
s.x =3D s.y++ {9:W s.1} {7} {8} {1} < {9} [=3DA1]
{10:W s.2 0-1} {3} < {10} [=3DA1]
{7} < {9} [=3DA4]
{8} < {9} [=3DA4]
{7} < {10} [=3DA4]
{8} < {10} [=3DA4]
The permitted orderings are:
{7} {8} {9} {10}
{7} {8} {10} {9}
Since the three W events all identify different locations in memory (even
if two of them are parts of the same storage unit) the expression is
well-defined.
However, if the type of s was a union instead of a structure, there would
be no designation of bit-fields in the events. Events {8} and {10} would
both be type W for address s.2, and the behaviour is undefined.
EXAMPLE 11
int x;
x++ && x--;
This expression has two canonical forms, depending on
whether x is or is not initially zero:
x++ // x is initially 0
x++ , x-- // x is not initially 0
In this example it is clear that only the latter needs to be analysed in
detail, but this might not be the case in more complex expressions. Doing
the analysis:
associated events new rules
central incidental introduced
x [LHS] {1:L x} none
x++ {2:R x} none {1} =3D {2} [=3DPI1]
{3:W x} {2} < {3} [=3DPI2]
x [RHS] {4:L x} none
x-- {5:R x} none {4} =3D {5} [=3DPI1]
{6:W x} {5} < {6} [=3DPI2]
x++, x-- {2} {3} {5} none {2} < {7} [=3DCL]
{6} {7:S} {3} < {7} [=3DCL]
{7} < {5} [=3DCR]
{7} < {6} [=3DCR]
The only permitted ordering is:
{2} {3} {7} {5} {6}
The W event {3} is immediately followed by a type S event, so the
expression is well-defined. It is clear that the decrement must apply to
the incremented variable.
EXAMPLE 12
int x, y;
x++ * y++ ? x-- : y--;
Again there are two canonical forms:
x++ * y++ , x-- // x and y both initially nonzero
x++ * y++ , y-- // either x or y initially zero
In each case the analysis follows the lines of the previous example, and
there is always a type S event between the writes on the left hand side and
the reads and writes on the right hand side.
EXAMPLE 13
int x[2], *y;
y =3D x;
*y =3D f(y++);
The canonical form is:
*$y =3D (@f)(y++)
The analysis proceeds as follows:
associated events new rules
central incidental introduced
y [LHS] {1:L y} none
$y {2:R y} none {1} =3D {2} [=3DRV]
*$y {2} {3:L e} none {2} < {3} [=3DP]
f none none
@f none none
y [RHS] {4:L y} none
y++ {5:R y} none {4} =3D {5} [=3DPI1]
{6:W y} {5} < {6} [=3DPI2]
(@f)(y++) {7:F} {5} {6} {5} < {7} [=3DFC]
{6} < {7} [=3DFC]
=3D {8:W e} {2} {5} {6} {7} {3} < {8} [=3DA1]
{5} < {8} [=3DA4]
{6} < {8} [=3DA4]
{7} < {8} [=3DA4]
(where e is the location pointed to by y at the moment of the * operation).
The permitted orderings are:
{2} {5} {6} {7} {8}
{5} {2} {6} {7} {8}
{5} {6} {2} {7} {8}
{5} {6} {7} {2} {8}
The third of these involves a read of y (event {2}) immediately after a
write (event {6}), and so the behaviour is undefined.
[[Normative]]
S.4 Application
S.4.1 Expressions
For each expression statement, and for each full expression in a selection,
iteration, or jump statement, the expression is considered separately in
its normal place in the sequence of execution.
For each full declarator involving a variably modified type, and for each
initializer list outside a compound literal, the events of all the
individual expressions are inherited together with no additional rules.
[[Example]]
Consider the declaration:
int a [x][x++];
There are two expressions, whose canonical forms are:
$x
x++
The analysis of the first proceeds as follows:
associated events new rules
central incidental introduced
x {1:L x} none
$x {2:R x} none {1} =3D {2} [=3DRV]
and the second as follows:
associated events new rules
central incidental introduced
x {3:L x} none
x++ {4:R x} none {3} =3D {4} [=3DPI1]
{5:W x} {4} < {5} [=3DPI2]
The overall set of events is therefore {2}, {4}, and {5}, and the permitted
orderings are:
{2} {4} {5}
{4} {2} {5}
{4} {5} {2}
The last one has a W event for x followed by an R event, so this
declaration involves undefined behaviour.
[[Normative]]
S.4.2 Signals
If program execution is interrupted by a signal where the handler returns
to the caller, the behavior is as if an F event for the signal handler were
inserted at an unspecified place in the sequence of events forming the full
expression or array declarator being evaluated at the time of the signal.
S.4.3 Floating-point environment
There are four additional event types for addressing the floating-point
environment: RF, WF, RV, and WV. These represent reads of and changes to
the floating-point exception flags (F) and to the remainder of the floating
point environment (V) respectively.
Any function in that reads or alters these flags generates events
of the appropriate types.
Wherever an event {v:WV} is generated, an incidental event {f:WF} is also
generated, with the rule: {v} =3D {f} [=3DFP1].
If the state of the FENV_ACCESS pragma is on, then any operation that sets
a floating-point exception flag causes an event {f:WF} to be created. The
events associated with the subexpression based on the operator causing the
flag to be set are listed, and for every event {e} other than a central
event of type W, the rule {e} < {f} [=3DFP2] is added.
[[Example]]
In the expression:
x + y * z
suppose that the multiplication sets a floating-point exception flag. The
expression is analysed in the following way. The canonical form is:
$x + $y * $z
The analysis proceeds as follows:
associated events new rules
central incidental introduced
x {1:L x} none
$x {2:R x} none {1} =3D {2} [=3DRV]
y {3:L y} none
$y {4:R y} none {3} =3D {4} [=3DRV]
z {5:L z} none
$z {6:R z} none {5} =3D {6} [=3DRV]
y * z {4} {6} none {4} < {7} [=3DFP2]
{7:WF} {6} < {7} [=3DFP2]
x + y * z {2} {4} {6} none
[[Normative]]
The principles of S.3.3 are applied to RV and WV events as though they were
R and W events for a specific address, and to RF and WF events as though
they were R and W events for another address. However, if a permitted
ordering contains two or more WF events without an intervening event of
type S or F, the expression does not involve undefined behaviour for this
reason. Instead, it is unspecified which combination of flags will actually
be set by the expression, so long as at least one of them is.[*]
[*] If it were undefined to write twice to the flags between sequence
points, it would not be practical to use the facilities of the
header in complicated expressions.
[[Commentary]]
Nick Maclaren was kind enough to suggest semantics for floating-point in an
earlier version of this document. The present semantics are my attempt to
adapt his work to the present notation, and could no doubt be improved
greatly.
--=_Turnpike_JDbkYaZKcs45s0Jw=
Content-Type: text/plain;charset=us-ascii
Content-Transfer-Encoding: quoted-printable
--=20
Clive D.W. Feather | Internet Expert | Work:
Tel: +44 20 8371 1138 | Demon Internet | Home:
Fax: +44 20 8371 1037 | Thus plc | Web:
Written on my laptop; please observe the Reply-To address
--=_Turnpike_JDbkYaZKcs45s0Jw=--
--=_Turnpike_IDLkYeZKcs45AzWh=
Content-Type: application/pgp-signature
Content-Disposition: attachment; filename=signature.asc
-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1
iQEVAwUBOeLHCyNAHP3TFZrhAQEV0gf+IhIkDyBPC9ycsBD1ZsHDAGTyX3HyNkrJ
WM2s4HBNffQ9uhW6flp0YzFWdtCd4Zg5ycNWFXhURxJ+fwkQkRazEwtEfWqvYaY8
GERmMdjXGBvKYARN90sEyFCW0Ta5Bcto7zwr7nyIo3oI4HMOq1SR/7X8N5XOsDLg
RdhAtH/jj+NhxnBlIAhJP32HxVV49I45QSdyun/u/Tv3t5kIXK0oa66Iq5ZQYvSe
k1Cck++ggP2NOEf0QBjxa3qRq+AwGc9Ut2Kr2HsLhviUqs32tCwP4KYkklzRzdt/
SU5vRLxxhKhgOrp70V2Pxia0DY9NrP/JwGn9jll8rMb+x2En1HouKg==
=OUDA
-----END PGP SIGNATURE-----
--=_Turnpike_IDLkYeZKcs45AzWh=--