Document:  n926 
Title:     Sequence Point Analysis 
Author:    Raymond Mak, IBM Corp. 
E-mail:    rmak@ca.ibm.com 
Date:      17-Nov-2000 
  

(Please use fixed width font to print; otherwise the trees will become
all chopped up.) 
  

Abstract 

An idea was presented at the WG14 Toronto meeting to use abstract
syntax tree to tackle the sequence point problem. This paper documents
that presentation and expands the idea into an analysis tool. 

Abstract syntax tree is a tool in the description of expression
semantics. The properties of trees are well understood; and trees have
a natural graphical representation. The notation described in this
paper can be used to facilitate discussions about the sequence point
issues. 

The essential ideas presented below were inspired by Feather's work in
n925. 
  

0. Introduction 

There are read and write events occurring during the evaluation of an
expression. The Standard imposes an ordering on some of these events,
but leaves others unspecified. The result is a partial ordering. Under
certain conditions, an implementation has more than one way to
evaluate an expression. 

An analysis of an expression involves two steps: 

a) Determine the partial ordering of the events. 
b) Basing on (a), determine the status of the expression. 

We use an abstract syntax tree (or syntax tree for short) to determine
the partial ordering, and use a set of rules to determine the status. 
  

1. Abstract syntax tree 

Write the expression as an abstract syntax tree. For example: 

   x = x * y 

becomes 

     = 
    /  \ 
   x    * 
       / \ 
      x   y 
  

The tree is a directed acyclic graph; each edge joining two nodes has
an arrow pointing towards the leaf side. (The arrow is not drawn.) 
  

2. Partial ordering 

The directed edges define a partial ordering. This ordering imposes a
restriction on the order of evaluation. In the above example, the
=-node must be evaluated after the *-node. 

If two nodes cannot be compared, there is no ordering restriction. (We
will add rules later on to handle special ordering semantics; e.g. the
one imposed by &&.) 
  

3. Read and Write Events 

If the operation writes values to address locations, annotate the node
with the addresses being written to. For conciseness, we denote the
address location by an lvalue expression written inside a pair of
square brackets. 

      = [x] 
     / \ 
    x   * 
       / \ 
      x   y 
  

During the evaluation of an expression, whenever an lvalue is
converted to an rvalue, mark it with parentheses. 

      = [x] 
     / \ 
    x   * 
       / \ 
     (x) (y) 
  

If y itself is a subtree, for example if the expression is: 

    x = x * a[i]; 

annotate the rvalue at the operator: 
  

      = [x] 
     / \ 
    x   * 
       / \ 
     (x)  \ 
           \ 
           [] (*(a+i)) 
          /  \ 
        (a)  (i) 
  

The notation will become obvious as we go through the examples later. 
  

4. Relationship between nodes 

We are interested in nodes representing read and write accesses of the
same or overlapping address location. 

When there are two accesses to the same or overlapping address
location, we need to find out their relationship; i.e. whether one
must come before the other, and whether there is an intervening
sequence point. The annotated syntax tree described above is the
starting point to analyze this relationship. 

4.1 Any two nodes in a tree are connected by a unique path, consisting
of the directed edges. In our earlier example, x=x+y, the path joining
the x-locations is: 
  

      = [x] 
       \ 
        * 
       / 
     (x) 
  

For ease of presentation, we can also straighten out the path into a
horizontal line like the following: 

   =[x] -->-- * -->-- (x) 

The arrows are put onto the edge to show the ordering. 

4.2 Edges in a path need not be in the same direction. For example,
the path connecting (x) and (y) is: 

        * 
       / \ 
     (x) (y) 
  

If we straighten it out and show the arrows: 

    (x) --<-- * -->-- (y) 

The edges points to different directions. 

4.3 If the edges point to the same direction, the two nodes can be
compared; the direction of the edges gives the ordering. In the
example in 4.1, [x] > (x). That is, [x] must happen after (x). 

If the edges point to different directions, the two nodes cannot be
compared. In the example in 4.2, (x) and (y) has no ordering; there is
no guarantee on the relative timing of their evaluations. 

Note that there can be at most one node in a path with edges pointing
to different directions. 
  

5. The status of an expression 

5.1 The status of an expression is determined as follows. 

Examine all paths joining nodes with the same or overlapping address
location. For each path, determine the status of the path (i.e.
determine the relationship between the two nodes): 
  

[Rule] 

R.1 If the two nodes are ordered: 

R.1.1 If the smaller one (the one happens earlier) is a read, or if
there is a sequence point in between, it is well-defined. 
R.1.2. Otherwise it is undefined. 

R.2 If the two nodes are not ordered: 

R.2.1 If both are reads, it is well-defined. 
R.2.2 If there is a write guarded by a function call operator (see 5.2
below), it is unspecified. 
R.2.3 Otherwise it is undefined. 

[End Rule] 
  

5.2 Events happening inside a function body are considered 'guarded'
by the function call operator. These include both the reads and writes
events. There is a sequence point before the function returns; and
function calls do not overlap (DR087). These two together justify
R.2.2. We will say more about this in 12.4. 

Note: Another way of looking at this is that function calls are
evaluated as though they were 'atomic' to the rest of the expression. 
  

5.3 After all paths are examined, the status of the expression is: 
 - undefined if there is an undefined path, otherwise 
 - unspecified if there is an unspecified path, 
 - otherwise it is well-defined. 

Note that the true status of an expression has to be determined during
runtime, as we do not know in all cases whether two lvalues designates
the same or overlapping address location. 
  

6. Sequence point 

Use 'S' to denote a sequence point, either put it beside an edge or
inside a node. 

6.1 For operators &&, ||, ?: and comma, put it beside the left edge.
For example, (x && y) is: 
  
  

        && 
       /  \ 
      /S   \ 
     /      \ 
    x        y 
  

6.2 For the call operator, put it inside the node. For example, the
tree for f(x) is: 

        ( ) 
        /\ 
       /  \ 
      f    x 

Putting in the sequence point, it becomes: 

        (S) 
        /\ 
       /  \ 
      f    x 
  

There is a sequence point right before the function is actually
called, but none immediately after each of the evaluations of f or x.
Therefore the sequence point cannot be put on an edge; also the order
of evaluation of f and x (and other arguments if present) is
unspecified. 

A sequence point is an event occurring during the evaluation of an
expression. A sequence point on an edge leading to a subtree forces
all side effects occurring inside it to take effect before the
evaluation leaves the subtree. In a sense, the sequence point is a
'fence' guarding the subtree below it. 

There are two sequence points involved in a function call - one right
after the evaluation of the function designator and all the arguments,
another one before the function returns. To make the notation compact,
the S represents both of them. Also, we hang the function designator
and the arguments under the left parentheses on one side of the S to
illustrate the fact that the S is not involved when these nodes are
compared among themselves. 
  

7. Order of evaluation 

7.1 For operators other than &&, ||, ?: and comma, the order of
evaluation of the subexpression is unspecified. 

7.2 For &&, ||, ?: and comma, the order of evaluation is from left to
right. This means that every nodes in the left subtree is ordered
before all the nodes in the right one. 

We should keep two closely related concepts separate. The first is the
concept of sequence point, which signifies the latest point in time
when all side effects occurred during the evaluation of a subtree must
be applied. The second is the concept of 'time-sequencing', which
imposes an ordering among the events happening during the evaluation
of a subtree. 

Sequence point by itself does not impose an ordering. It is just a
flushing point for the side effects. The evaluation order of &&, ||,
?: and comma is an additional requirement imposed by the operators.
(e.g. C99 6.5.13p4 for &&.) 
  

8. An algorithm 

After we have identified all the reads and writes, and annotated the
synatx tree, the operator of a node is not important anymore. We can
replace them with a generic 'o' without lost of information. For
operators that impose an evaluation order, we can replace them with a
'^'. So an algorithm can be devised as follows. 

A.1 Draw the abstract syntax tree of the full expression. Partial
expressions cannot be analyzed. 
A.2 Identify all the reads and writes and annotate the tree. 
A.3 Put sequence points on the left edge of &&, ||, ?: and comma, and
inside the ()-node. 
A.4 Replace &&, ||, ? and comma with the generic '^'. Replace other
operators with the generic 'o'. The colon in ?: is left unchanged.
(See A.6.) 
A.5 Determine the ordering relationship between two nodes as follows: 
A.5.1 Connect the two nodes with a path. 
A.5.2 If there is a o-node on the path with edges pointing to
different directions: 

    n1 ... ----<---- o ---->---- ... n2 

the two nodes (at the ends of the path) are not ordered. 
A.5.3 If there is a ^-node on the path with edges pointing to
different directions: 

    n1 ... ----<---- ^ ---->---- ... n2 

the node on the left is less than the node on the right. (i.e. the
event on the left must happen before the one on the right.) 
A.5.4 Otherwise the two nodes are ordered. The ordering is defined by
the directed edge. 
(Note that A.5.2 and A.5.3 are mutually exclusive. There can be at
most one node, 'o' or '^', with edges pointing to different
directions.) 

A.6 The ?: consists of two subtrees merged together. We use the
following notation: 

   x ? a : b; 
  

          ? 
         / \ 
        /S  \ 
       /     \ 
      x       : 
             / \ 
            /   \ 
           a     b 

which becomes, after A.4: 

          ^ 
         / \ 
        /S  \ 
       /     \ 
      x       : 
             / \ 
            /   \ 
           a     b 
  

The colon indicates the merge. The colon has a special property;
unlike any other operators we don't have to compare the nodes between
the two sub-trees under the colon because only one of them will be
evaluated. 
  

Note: As we will see later in the examples, we don't have to actually
replace the operators with '^' and 'o'. These are just convenient
devices to describe the algorithm. The notation is clear and flexible
enough to allow different variations of presentation. Also, it is not
necessary to expand all the subexpressions into subtrees; sometimes it
is more convenient to just write out an subexpression as-is as a leaf
node. 
  

[EXAMPLES] 

Before we proceed further, let us use some specific examples to
illustrate how the notation is being used. Following are selected
examples from n925. The numbering of the examples is the same as n925.

  

EXAMPLE 2 

    int x; 
    x = ++x; 

          = [x] 
        /   \ 
       /     \ 
      x      ++ [x] 
            / 
           / 
         (x) 

There is a path with two writes without an intervening sequence point:

  =[x] --->--- ++[x] 

The expression is undefined. 
  

EXAMPLE 3 

    x += x * x; 

          += [x] 
         /  \ 
        /    \ 
      (x)    * 
            /  \ 
          (x)  (x) 

All paths have read before write. (The write is at the highest point
in the tree.) This is well-defined. 
  

EXAMPLE 4 

    int x; 
    extern int f(int); 
    x = f(x++); 

          = [x] 
        /   \ 
       /     \ 
      x      (S) 
             /\ 
            /  \ 
           f   ++ [x] 
               / 
             (x) 

There is path with two ordered writes: 

  =[x] --->--- (S) --->--- ++ [x] 

There is an intervening sequence point. The expression is
well-defined. 
  

EXAMPLE 5 

    int x,y; 
    (x=y) + x; 

          + 
        /   \ 
       /     \ 
      = [x]  (x) 
     / \ 
    x  (y) 
  

There is an undefined path: 

  =[x] ---<--- + --->--- (x) 
  
  

EXAMPLE 6 

    int x, y, z; 
    (x=y) + (x=z); 
  

           + 
          /  \ 
         /    \ 
        /      \ 
       = [x]   = [x] 
      / \     / \ 
     x  (y)  x  (z) 
  

There is an undefined path - two writes without a sequence point: 

  =[x] ---<--- + --->--- =[x] 
  
  
  

EXAMPLE 9 

    struct s { double p; int q; int r; } *x, y; 
    x = &y; 
    x->q = x->r; 
  

            = [(*x).q] 
           /  \ 
          /    \ 
         /      \ 
        ->      -> ((*x).r) 
       /  \    /  \ 
      x   q   x    r 
  

Well-defined. 
  

EXAMPLE 11 

    int x; 
    x++ && x--; 
  

         && 
        /  \ 
       /S   \ 
      /      \ 
    ++ [x]   -- [x] 
    /         \ 
   (x)        (x) 

The path in question is: 

          ^ 
        /   \ 
       /S    \ 
      /       \ 
    ++ [x]   -- [x] 

or to straighten it out horizontally: 

  ++[x] ---<--- ^ --->--- --[x] 
           S 

There is a sequence point between the two ordered writes. The
expression is well-defined. 
  

EXAMPLE 12 

    int x, y; 
    x++ * y++ ? x-- : y --; 
  

                ? 
               / \ 
              /   \ 
             /S    :------- 
            /       \      \ 
           /         \      \ 
          *          --[x]  --[y] 
         / \          \      \ 
        /   \          \      \ 
       /     \         (x)    (y) 
     ++[x]   ++[y] 
      \       \ 
     (x)     (y) 
  
  

There are two paths involving two writes. But both have a sequence
point in between and the notes are ordered: 

  ++[x] --<-- * --<-- ^ -->-- : --[x] 
                  S 

(The [y] path is similar.) 
  

EXAMPLE 13 
  

    int x[2], *y; 
    y=x; 
    *y = f(y++); 
  

          = [*y] 
        /   \ 
       /     \ 
      *      (S) 
     /       /\ 
    /       /  \ 
  (y)      f   ++ [y] 
                 \ 
                 (y) 
  

There is a path with both a read and a write: 
  

  (y) --<-- * --<-- = -->--(S)-->-- ++[y] 
  

Even though there is a sequence point in between, the two nodes are
not ordered. The expression is undefined. 
  

Example T1 
(This example was provided by O'Riordan at the Toronto meeting.) 

    *( *p=2, p) = 7; 
  

                  = [*p] 
                /   \ 
               /     \ 
              *       7 
             / 
            , 
           / \ 
          /   \ 
         /S   (p) 
        / 
       = [*p] 
      / \ 
     *   2 
    / 
  (p) 
  

The two writes are related as follows: 

    [*p] --<-- , --<-- * --<-- = [*p] 
           S 

They are ordered with a sequence point in between. This path is
well-defined. Other paths involving reads are also well-defined. 
  

[END OF EXAMPLES] 
  

8.1 Cast, sizeof and compound literal 

Use 'cast' and 'sizeof' to represent the cast and sizeof operators. We
also use a 'type' node to group the subexpressions of a variably
modified type. For example: 

    x = sizeof( char[x] ); 

      = [x] 
     / \ 
    x   \ 
       sizeof 
        / 
      type 
      / 
     (x) 
  

The type name can be used to label the node instead of using the
generic label 'type'. The left operand of a cast operator is the
'type' node and the right operand the cast-expression; the evaluation
order of the two is unspecified. 

Similarly, use a 'lit' node to represent a compound literal. 
  

9. Function call 

9.1 If a function call has side effects, including writes to
floating-point environment and flags, or reads that are relevant to
the analysis of the expression, annotate them at the call operator
node and treat them like other reads and writes. For example: 

    int x, y=0; 
    int foo(void) { x = y; return x;} 
    x = foo(); 
  

           = [x] 
          /  \ 
         x    \ 
              (S) [x](y) 
             / 
            f 
  

The two writes are related as follows: 

    =[x] ----->----- (S) [x] 

They are ordered with an intervening sequence point. 

9.2 Two function calls cannot overlap (DR087). This imposes a
restriction on the ordering of the events between two function calls.
In our notation, a side effect annotated beside a call operator is
considered 'guarded' by that call operator. 

Consider the following two examples. 

    int x=1; 
    x++ * x--; 

This is undefined as there are two writes without an intervening
sequence point. On the other hand: 

    int x=1; 
    void f() {x++;} 
    void g() {x--;} 
    f() * g(); 
  

        * 
       /  \ 
      /    \ 
     /      \ 
   (S)[x]  (S)[x] 
   /       / 
  f       g 
  

This is unspecified as both writes are guarded by function call
operators. (R.2.2) 
  

10. Floating-point flags 

10.1 If an operator accesses the floating-point environment or flags,
annotate them at the operator. 

Use FG to denote the floating-point flags. FG=1 means set, FG=0 means
clear, and (FG) means read. There are more than one flags; so FG=1
doesn't mean setting all flags, just those that are affected. If there
is ambiguity, we will use FG1, FG2, etc. to identify individual flags.
Likewise for FG=0. 

For example, if x++ sets a floating-point flag: 

                 ++ [x] FG=1 
                   \ 
                    \ 
                    (x) 

FG=x is a write event. 
  

10.2 Stickiness 

Floating-point flags are sticky in the sense that they can be set more
than once without an intervening sequence point and the result is
still well-defined. Likewise for clears. But we need to distinguish
between set and clear. 

That is, we treat floating-point flags updates like any other write
events, except that two writes producing the same result are allowed
without an intervening sequence point; the result in this case is
still well-defined. 

Add the following rule for floating-point flags. 

[Rule] 

R.3 Read and write events to floating-point flags follow the same rule
as other reads and writes, except for the following. If two FG writes
are both sets (FG=1), or are both clears (FG=0), they are well-defined
regardless of whether they are ordered, and regardless of whether
there is an intervening sequence point. 

[End Rule] 

Note: More work is needed to make sure the above rule reflects the
Standard's intention. 
Also note that floating-point flags can only be cleared by library
functions; i.e. clears are always guarded by call operators. 

10.3 Atomic 

It might be useful to generalize the rule for floating-point flags to
cover sig_atomic_t. Two writes to a sig_atomic_t object is
well-defined if they are writing the same value, even if they are
unordered or have no intervening sequence point. 

Individual floating-point flags behave as if they are atomic. 
  

11. Volatile 

11.1 The black-box 

The collection of all volatile address locations accessible to the
program forms a black-box to the program. Accessing any such locations
implies writes to *all* volatile locations. 

An example would illustrate. 

    volatile int x,y; 
    int i; 
    i = f(y)+x; 

          = 
        /   \ 
       i     \ 
              + 
             / \ 
            /   \ 
           /    (x) [x][y] 
          / 
         ( ) 
         /\ 
        /  \ 
       f   (y) [y][x] 
  

The read access on (x) implies writes to [x] and [y]. The same applies
for the read on (y). 

After identifying all the volatile accesses, and annotating the write
events accordingly, the analysis of the expression simply proceeds as
before. 
  

11.2 

Volatile access is a difficult problem. The above attempts to handle
it in the same way as other reads and writes. We have said more than
the Standard; the latter simply says volatile accesses are side
effects (C99 5.1.2.3p2, 6.7.3p6 and footnote 114). But our
interpretation seems be consistent with the sequence point semantics
discussed so far. 

The analysis in 11.1 is pessimistic, but is the best we can do in the
absence of information about the black-box. If the programmer knows
about what goes behind the volatile objects, a tighter analysis could
be done by partitioning the set of volatile objects into disjoin sets.
Accessing an object in one set would not affect objects in the other
sets. Such partition is implementation defined. 

We can use V to collectively represent all volatile objects; or use
V1, V2, etc. to represent a partition. This way, we can use [V] and
(V) to annotate volatile accesses in the syntax tree instead of
listing out all the individual objects. 
  

11.3 Signal 

This paper does not address signal handling. This is better done after
the bugs in the tool are ironed out. But we would like to make a quick
comment. 

If an operation raises an interrupt, and if the interrupt handler
returns to the point of interruption, we can treat the interrupt
handler as a function call and annotate the read and write events
occurring inside the handler beside the operator. Analysis would then
proceed as before. There are two questions : 
a) Where is the sequence point for the handler's side effects? 
b) Does the "no overlap" rule for function calls apply? 
  
  

12 The order of evaluation revisited 

12.1 In general, the order of evaluation of the subexpressions is
unspecified (C99 6.5p3). For example: 

     int x = 1; 
     (x++ , x) + (x-- , x) 
  

               + 
             /   \ 
            /     \ 
           /       \ 
          ,         , 
         / \       / \ 
        /S  \     /S  \ 
      x++   x    x--   x 
  

The + operator does not impose an order of evaluation. So the 'order
of evaluation is unspecified'. But what exactly does it mean, though ?

There are two ways to look at this. Ignore what the Standard says for
the moment; 
just consider what are the possible definitions here. 

12.2 Tree traversal view 

One way of looking at it is the tree-traversal view. That is, an
evaluation of the expression is a traversal of the tree. We start at
the +-node and then traverse and evaluate as we go along. Since the
+-node does not impose an evaluation order, either the left or the
right subtree can be traversed first. But if we go down one subtree,
we will finish that one before we proceed onto the next one. In the
above example, if we list out the events (reads and writes) we
encounter on the route, we will find that we always have a sequence
point after a write event before we encounter another access. The two
traversal order thus give two results. 

This, of course, is not what the standard says. 

12.3 Events interleaving view 

Another way of looking at it is the events interleaving view. There
are events happening inside the left and right subtree of the +-node.
Within the subtrees, the events have a certain ordering imposed on
them due to the operations. At the +-node, however, the order of
evaluation is unspecified; the events on the left can occur at an
unspecified time relative to the events on the right. To put it in
another way, the events on the left can interleave with those on the
right, subjected to the ordering restrictions imposed by their own
subtree. 

Let us take a more detailed look at the example in 12.1. Following the
notations in n925, let us denote the write event on the left of the +
by {3: W x} and the write event on the right by (6: W x}, and the two
sequence points by {7: S} and {8: S} respectively. 

(The number before the colon identifies the event. W means write, S
means sequence point; the x refer to the address locations affected.
Event numbers that are skipped correspond to events that do not
concern us in our discussion here.) 

There is an ordering imposed on the events on the left (e.g. 3<7) and
the events on the right (e.g. 6<8); but there is no ordering
restriction between 3,7 and 6,8. That is, we can interleave 3,7 with
6,8. Obviously there is a permitted ordering that put the two writes
together without an intervening sequence point. For example, at the
+-node, the following sequence of events is permitted: 

    3 6 7 8 

The expression is therefore undefined. 

The sequence of events happening on one side can be seen as the start
and end points of time intervals. The interval between events 3 and 7
([3,7]) marks the period of time the side effect of writing to x has
to take place. 

Likewise for the interval [6,8]. Since 3 and 6 can each start at an
unspecified time relative to each other, the two time interval can
overlap. Overlapping access intervals involving at least one write
event causes undefined behavior. 

This gives rise to rule R.2.3. 
  

12.4 Function call 

12.4.1 There is a sequence point before the actual call, and another
one before the function returns. Also two function calls cannot
overlap. There is a lot of semantics jammed into our compact notation.
We will go through the call operation in detail here. 

Let us use an example: 

    int v=0; 
    void f (int x) { v += x }; 
    v += f(v); 
  

         += [v] 
        /  \ 
      (v)   \ 
            (S) [v] 
            /\ 
           /  \ 
          /    \ 
        (f)    (v) 
  

The sequence of events is as follows: 

    - Evaluate f and v (in an unspecified order), and then 
    - a sequence point, and then 
    - branch to the function, and then 
    - a sequence point, and then return. 
  

The different nodes are related as follows. 

(f) and (v) (function designator and the arguments): 

     (f)  --<-- ( -->-- (v) 

They are unordered with no intervening sequence point. (The S inside
(S) does not apply. Please refer to the appendix for further comments
about the notation.) 
  

(v) and [v] (the argument and the function side effect): 

    (v) --<-- (S) [v] 

They are ordered with an intervening sequence point (this is the first
sequence point). (Well-defined) 

[v] and [v] (the function side effect and the assignment): 

   +=[v] -->-- (S) [v] 

They are ordered with an intervening sequence point (this is the
second sequence point). (Well-defined) 

(v) and [v] (the left of += and the function side effect): 

          += 
        /   \ 
      (v)    \ 
             (S) [v] 
  
  

    (v) --<-- += -->-- (S) [v] 

They are unordered. The S is the second sequence point. [v] is guarded
by a call operator. (R.2.2, unspecified.) 
  

12.4.2 Two function calls cannot overlap. This is the interpretation
of DR087. This makes function calls somewhat like the tree traversal
described in 12.2. 

Consider the expression: 

    f(1) + f(2) 

where f is defined as in the previous example. 

As discussed in 12.3, the time interval between a write event and the
corresponding sequence point event is what we are interested in. In
essence, if two such time intervals for the same address location
overlap, we have undefined behavior. 

In the above expression, if we draw the time line from left to right,
and use n925's notation for events, then the two write events for v
can be represented as follows: 
(Events 1 & 2 belong to the left f(); 3 & 4 to the right.) 
  

   -------------- time ------------------------> 
  

   {1:W v} ....... {2:S} 
                           {3:W v} ....... {4:S} 
  

or 

                           {1:W v} ....... {2:S} 
   {3:W v} ....... {4:S} 
  

But they cannot overlap like this: 

           {1:W v} ....... {2:S} 
   {3:W v} ....... {4:S} 
  

This is the rationale behind R.2.2. 

The tree diagram for the expression is: 

           + 
          / \ 
         /   \ 
        /     \ 
       /       \ 
     (S)[v]    (S)[v] 
     /\        /\ 
    /  \      /  \ 
   f    1     f   2 

The two write nodes are related as follows: 
  

    [v](S) --<--- + --->--- (S)[v] 
  

They are unordered, but both are guarded by function calls. The path
is unspecified. 
  

13. Conclusion and future work 

We described a tool for analyzing sequence point semantics. The basic
device in this tool was the annotated syntax tree notation, which was
used with a set of rules to provide an interpretation of the Standard.

The tree notation can be used independent of the rules. This enables
us to ask what-if questions, and to examine the effect of different
variations of the rules on an expression. Further work is needed to
study the rules for volatile and for floating-point flags. This paper
did not address signal handling; that area is left open for future
work. 
  

ACKNOWLEDGMENT 

Special thanks to Hugh Redelmeier and Fred Tydeman for their time in
reviewing this paper. Their comments and suggestions had been
invaluable. I did not address all their concerns, and they had not
seen the final version of this paper. Any errors and inadequacies were
mine.  
  

APPENDIX 

X.1 The sequence point notation 

A comment about the notation for sequence point. Let us use an
example: 

      x && y 

        && 
       /  \ 
      /S   \ 
     /      \ 
    x        y 
  

If we treat the evaluation of the expression as a traversal of the
syntax tree, then we start from the &&-node and walk down the
left-side of the left-edge, around the x-node, then up again along the
right-side of the left-edge, and then proceed onto the right subtree.
During this walking tour, we encounter the sequence point on our way
up from the x-node, not on our way down. This is why the S is put on
the right-side beside the edge. 

For a function call: 

       f(x) 

        (S) 
        /\ 
       /  \ 
      f    x 

If we walk this tree, similar situation arises. We walk the f, x (and
other arguments if present) in an unspecified order; but there is no
sequence point along the way until we come back to the ()-node, where
we encounter a sequence point right before we call the function, and
then another one later when it returns. So we put the S inside the
pair of parentheses. 

Also we hang the function designator and the arguments off the left
parentheses of the call operator. This is to represent the fact that
the sequence point does not play a role in their evaluations: 

        ( S ) 
        /\ 
       /  \ 
      f    x 

The call operator is a complex operation; it can be broken down into
three parts: the evaluation of the function designator and the
arguments, the actual calling, and the return. 

To represent these operations explicitly, we need to 'slice' the call
operator into two halves. The left parentheses represents the first
part of the operation (evaluation of the function designator and
arguments). The right parentheses represents the third (the function
return). 

f and x are thus related as follows: 

      f ---<--- ( --->--- (x) 

Only the left half of the call operator is involved, and there is no
sequence point. 

And, 

    x = f(x) 

      = [x] 
       \ 
        \ 
       (S) 
       /\ 
      /  \ 
     f   (x) 

(x) and [x] are related by: 

    (x) ---<--- (S) ---<--- =[x] 
  

As we have seen in the various examples before, it is not necessary to
really distinguish the left/right half of the call operator in an
actual analysis. There is no ambiguity as to which part of the
operation is involved, and which sequence point is relevant. 
  

X.2 The partial ordering 

The key to sequence point analysis is to determine the order of
evaluation; this order is a partial order. The syntax tree notation
presented in this paper is a tool to find out this partial ordering.
However, for the sake of simplicity, certain details are hidden and
not explicitly shown. This appendix provides a more complete
description. 

X.2.1 

Suppose e is an expression. Consider the set of all subexpressions of
e; denote this set by P(e). We take it for granted that subexpressions
are defined and exist. Use e1, e2, etc., to stand for subexpressions;
and 0 to stand for the empty subexpression. Both e and 0 are in P(e). 

A partial ordering can be defined for elements of P(e) according to
the specification of the Standard. If this partial ordering is
inconsistent, the Standard has an error. The Standard does not talk
about the empty subexpression; we add the element 0 and define it to
compare smaller than all other elements. 

We use < to represent the partial order and write e1 < e2 if e1 is
evaluated earlier than e2. (e1 is smaller than e2.) 

For two elements e1 and e2 with e1 < e2, if there is no distinct third
element lying in between, we say e2 immediately succeeds e1. We can
then draw a graphical representation of the partial ordering as
follows (using transitive reduction). 

Draw all subexpressions as nodes of a graph. Two nodes are connected
by a directed edge if one immediately succeeds the other; the
direction of the edge points to the smaller node. In general this
graph is not a tree. 

If we ignore the comma, &&, || and ?: expressions, and if we ignore 0,
the graph of the partial ordering is a tree; and it has a 1-1
correspondence with the syntax tree. Furthermore, the partial ordering
is a lattice, as any two elements has a unique least upper bound and a
unique greatest lower bound. The upper bound of the whole lattice is
the root of the tree, and the lower bound the 0. 

If we include the comma, &&, || and ?: expressions, the situation
becomes more complicated and the graph is no longer a tree (even if we
ignore the 0). But the partial ordering remains to be a lattice. The
upper bound of the whole lattice is still the whole expression e. 

Note: We don't have to worry about the conditional expressions where
part of the expression may not be evaluated. We are concerned only
with the ordering here, not with the actual evaluation of the
subexpressions. All nodes involved are represented in a single partial
ordering. 
  

X.2.2 Correspondence with n925 

Let us use an example to illustrate the partial ordering. 

    u * v  + w * x 

The subexpressions are: 

    e1 = u * v 
    e2 = w * x 
    e  = e1 + e2 
    0  = empty 

u, v, w, x are also subexpressions. They represent themselves. 
The partial ordering is shown below: 
  

          e 
         / \ 
        /   \ 
       e1   e2 
      / \   / \ 
     u  v  x   y 
     \   | |   / 
      \  | |  / 
       \ \ / / 
        \ | / 
          0 
  

If we start from 0 and follow all the edges leaving away from it and
trace through the paths towards e, we get four sequences of nodes.
Each sequence is a total ordering. Note that some nodes are duplicated
among the sequences. The evaluation order of the whole expression is
derived by interleaving the (distinct) nodes and merging the sequences
into a single one, subjected to the ordering within each sequence.
There could be more than one way to do the interleaving. 

If we follow n925 and annotate the events identified by the model
beside the nodes in the lattice, we obtain a correspondence with the
model. For conciseness, can we annotate only the new events as they
come into existence at a particular node. The node then inherits all
the events from nodes smaller than it. The rules derived during the
analysis define the same partial ordering as the lattice. 

Certain details are ignored in the above. n925 makes a distinction
between central and incidental events, and not all events are
inherited under all circumstances. But we should be able to add these
details without affecting the general theme. 

The last stage of the analysis in n925 is to determine all the
permitted ordering of the events. This is equivalent to all the
possible interleaving of the sequences as described above. 

X.2.3 Correspondence with the abstract syntax tree notation 

For expressions without comma, &&, || and ?: subexpressions, the
correspondence between the lattice and the abstract syntax tree is
obvious. 

For expressions with comma, &&, || or ?:, the syntax tree notation
avoids the complexities of drawing the full partial ordering by
introducing step A.5.2 in the algorithm in section 8. The syntax tree
together with A.5 gives the same partial ordering as the lattice. 

X.2.4 Interpretation of the Standard 

The expression syntax and semantics defines the evaluation order of
the subexpressons. This partial ordering should be well-defined (i.e.
consistent); otherwise the Standard is defective. As long as the
partial ordering is well-defined, the Standard cannot be 'wrong' or
ambiguous in the following sense. 

Whatever the Standard does not specify, or whatever the Standard
specifies as unspecified, the corresponding relationship among the
subexpressions is simply not there in the partial ordering. The only
effect this has in the analysis is that there are more permitted ways
of interleaving the events. More expressions become undefined. 

The rules listed in section 5 is an interpretation of the Standard. We
want an interpretation that gives a set of well-defined expressions
matching our expectations (as programmers). In this respect, the
Standard is unclear in the areas of floating-point flags, volatile and
signal handling. The Standard is also not adequate in describing
function calls. Even though DR087 suggested function calls do not
overlap, the Standard does not explicitly talk about this. 
  
  

===== End =====