Doc Number: X3J16/93-0158 WG21/N0365 Date: September 28, 1993 Project: Programming Language C++ Ref Doc: 93-0062/N0269 Reply to: Sam Kendall Sun Microsystems Laboratories, Inc. Sam.Kendall@East.Sun.COM EXPRESSION ATTRIBUTES Sam Kendall Sun Microsystems Laboratories, Inc. 1. INTRODUCTION We all know that every expression has two attributes: a type, and a boolean "lvalue-ness". This is true in ISO C as well as in C++. But C++ expressions have several attributes besides those two. For example, some functions have default arguments; the list of default arguments is another attribute. I have found that trying to analyze expressions solely in terms of attributes yields a great deal of clarity. I am dealing here with type analysis only. In other words, I'm not trying to define the full meaning of each expression; I am just trying to provide a foundation to decide very clearly, for each expression, whether it is well-formed and what its type is. If we can get that far in understanding expressions we will have clarified a great deal. The basic setup should be familiar to any student of programming languages and compilers. (See Aho, Sethi and Ullman chapter 2 for more information.) We consider each expression as a syntax tree. Our aim is to assign _attributes_ to each node in a bottom-up fashion. The attributes of leaf expressions are determined by name lookup and other rules; for example, when I use the identifier `x', name lookup determines which declared function(s) or variable `x' I am naming, and from the associated declaration(s) we get the attributes. The rules for assigning attributes to leaf expressions are not the focus of this paper, which is why I am hand-waving. The attributes of interior nodes are determined solely from the attributes of IMMEDIATE child nodes. This is a key point; it is our source of clarity. I have not worked this out, but I think a type conversion will be considered to ADD a node between some node (the expression being converted) and its parent. This "conversion node" has attributes which are typically different from those of its immediate child. Overload selection may also be handled this way. One point of notation: I use "f" (as in "f3.7") to indicate sections of the WP (working paper). Thanks to Tom Plum and Scott Turner for excellent feedback; I have tried to incorporate it with varying success. 1.1 Relationship to the WP These attributes are all implied by the WP. For example, the sentence "The address-of operator & may not be applied to a bit-field" in f9.6 implies the existence of the boolean attribute I call "Bit_field". Even for this simple attribute there are cases the WP leaves ambiguous; see the example `&(i ? i : b)' below. I'd like my terminology to be as close to the WP as possible. If you catch me departing from the WP unnecessarily please let me know. I'm not sure what changes should be made to the WP as a result of the clarifications that arise from thinking about expression attributes. It is easy to say "we should rewrite the WP in terms of attributes", and this would surely make the WP less ambiguous, but it would also be a ton of work, and the increased formalism would probably leave the WP less accessible to novices. The ultimate point of all this is to be able to write a more-or-less formal definition of expression type analysis. With luck this definition will encompass the built-in operators as well as user-defined operators and functions. I think we need to write this before we can tell whether the relevant parts of the WP need to be rewritten, or whether they can be edited into clarity. 1.2 What about Merging Lvalue and Type? Alternate schemes are possible. For example, in 92-0053/N0130 I proposed (in this paper's terms) to fold the Lvalue and Bit_field attributes into Type. I still think folding Lvalue (though not Bit_field) into Type is a good idea. But I'm not following that line of thought here. In this paper I stay as close to the WP as I can. 2. THE ATTRIBUTES Each expression (and subexpression) has the following attributes. Attribute Value Section In ISO C also? --------- ----- ------- -------------- Overload_set a set of 4-tuples as follows no [ Type a C++ type as in 3.6.2 except 3.6.2 yes no "constant" type modifier (see note below); never a reference type Lvalue boolean: is the expression an 3.7 yes lvalue? Default_args nil 8.2.6 no OR a sequence of boolean: does each argument have a default? NS_member nil 9.2 no OR a class type: the expression is a nonstatic member of that class ] Constant nil 3.6.2, yes OR 5.19 a value of type Type: the value of the constant expression Bit_field boolean: is the expression 9.6 yes a bit-field? Class_qualifier nil 5.3/2 no OR a class type: the expression is class-qualified Was_reference boolean: was a reference type - no modifier stripped? 2.1 Rationale for Each Attribute For most expressions the Overload_set is a singleton set, so as a shorthand we often talk about, for example, the type of an expression. But an overloaded function name or operator may have many types; some may be lvalues while others are not; it may have more than one sequence of default arguments, and some of the overloadings may be nonstatic members while others are not. For this reason Type, Lvalue, Default_args, and NS_member are not independent attributes. The other attributes are not in Overload_set because they have only one value even for an overloaded function name. The reasons for Type and Lvalue are obvious. Default_args allows default arguments to be omitted in calls. Notice that Default_args is a sequence of booleans, not of "expressions". This is because for type analysis we don't care what the default argument is, only whether it is present. NS_member allows nonstatic member names to appear as the right-hand operand of `.' and `->', while disallowing other names from appearing there. NS_member also governs the appearance of nonstatic member names in the bodies of member functions (and elsewhere). Bit_field allows us to enforce the prohibition on reference to bit-field and pointer to bit-field. Constant allows the integer constant expression 0 to be implicitly converted to pointer type, while disallowing other constant expressions, and nonconstant integers, from being so converted. Constant also allows error-checking of contexts requiring a constant expression, such as case labels. Class_qualifier allows the expressions `&m' and `&C::m' to have different types (pointer and pointer to member, respectively, for `m' a nonstatic member of `C'). Was_reference is necessary in analyzing some expressions involving members. Given this class: struct A { A(); ~A(); int i; int& ir; }; Note that the Types of `i' and `ir', are the same. We use Was_reference to tell them apart: `A().ir' is an lvalue while `A().i' is not; and `&A::i' is an `int A::*', while it's not clear what `&A::ir' is. (See 93-0135/N0342 section 4 for a little bit more information.) 2.2 Why Constant is not part of Type f3.6.2 essentially states that Constant is part of Type, not an independent attribute. (We are talking about constant expressions, not about the `const' type modifier.) However, constant-ness is essentially orthogonal to type. Constant-ness is not like a type modifier; if it were, "constant int" would be a type distinct from "int", and for a "constant int" to be used as an "int" there would have to be a type conversion. But there are no such conversions mentioned in the WP. For this reason, and because there is no syntax to talk about a "constant" type, I have made Constant a separate attribute. The same needs to happen in the WP. 2.3 The List is Not Enough for ID-Expressions My list of attributes isn't enough to define the right-hand side of the `.' and `->' operators. For that more attributes come into play, including whether `m' is a static member. In the grammar, the right-hand side of `.' and `->' is an id-expression. We can view these additional attributes (not mentioned again) as attributes of id-expressions, but not of other kinds of expressions. 3. CONTEXT Each expression occurs in a context. For purposes of type analysis the context of an expression consists of the type system (all visible type declarations) and a _class context_. A class context is either nil or a possibly-cv-qualified class type. For example: void f(); struct A { void g() { ...; // class context is `A' } void h() const { ...; // class context is `const A' } }; main() { ...; // class context is nil } 4. EXAMPLES This section is driven entirely by examples. The first, "simple" examples are just to illustrate the concepts. The second, "tough" examples are an attempt to show ambiguities that attributes can help us with. I haven't succeeded in understanding all of the examples, but they can serve as a basis for discussion. 4.1 Notation and Defaults Our notation is to show the attributes as a list: Name1: value1; Name2: value2 A boolean attribute is false unless its name appears, in which case it is true. So for example "Lvalue" is short for "Lvalue: true", and if no "Lvalue" appears in the list then "Lvalue: false" is implied. Other attributes are nil unless otherwise indicated. The Overload_set is delimited using { } and [ ] if necessary, but when it is the singleton set the delimiters are omitted. Note: I am NOT proposing to introduce this notation into the WP. 4.2 Simple Examples Examples are based on this program fragment: int i; struct A { int m; int b :2; void g(); void cg() const; ~A(); }; A a; void f(); void f(int); int& r = i; void A::g() { // All expressions below are in this context. } Expression Attributes ---------- ---------- i Type: int; Lvalue m Type: int; Lvalue; NS_member: A &m Type: int* A::m Type: int; Lvalue; NS_member: A; Class_qualifier: A &A::m Type: int A::* this->m Type: int; Lvalue r Type: int; Lvalue; Was_reference Expression Attributes ---------- ---------- f { [ Type: void () ], [ Type: void (int) ] }; Lvalue &f { [ Type: void (*)() ], [ Type: void (*)(int) ] } g Type: void () ; NS_member: A; Lvalue cg Type: void () const; NS_member: A; Lvalue Expression Attributes ---------- ---------- b Type: int; Lvalue; Bit_field b = 0 Type: int; Lvalue; Bit_field i ? i : b Type: int; Lvalue; Bit_field &(b = 0) not well-formed since unary & does not allow bit-fields &(i ? i : b) not well-formed since unary & does not allow bit-fields Note: how Bit_field propagates through the assignment and ?: operators is not clearly described in the WP, but these are reasonable interpretations. Expression Attribute ---------- --------- A::~A Type: void () const volatile; NS_member: A; Class_qualifier: A Note: Thinking of a destructor as having const volatile type (when it appears in an expression) is a possible way to show that it is well-formed to invoke it explicitly on const or volatile objects, eg: volatile A va; va.A::~A(); Note also that a destructor is not an lvalue, since you cannot take its address. 4.3 Tough Examples >>> Example 1 struct A { void f(); }; void (A::*fp1)() = &A::f; // ok void (A::*fp2)() = &(A::f); // well-formed or not? >>> Example 1 Attributes Expression Attributes ---------- ---------- A::f Type: void (); NS_member: A; Class_qualifier: A (A::f) EITHER the same OR the same except that Class_qualifier is nil &A::f Type: void (A::*)() &(A::f) EITHER Type: void (A::*)() OR not well-formed since if a member function's address is taken Class_qualifier must be non-nil >>> Example 1 Discussion f5.3/2 favors the second outcome (not well-formed), since it claims that the syntax `& qualified-id' is necessary to produce a pointer-to-member. On the other hand, one expects `( E )' to mean the same as `E' for any expression `E'. Cfront 3.0 implements the first outcome (well-formed). Since compilers are typically implemented using attributes much like what this letter discusses, we can expect most of them to follow cfront in this matter. Rewording f5.3/2 in terms of the Class_qualifier attribute would clarify this area. Then f5.1/5, which defines the meaning of `( expression )', can either state that Class_qualifier is preserved by the parens (giving us the first outcome), or that it is made nil (giving us the second outcome). >>> Example 2 struct A { void f(); static void f(int); }; void (A::*fp1)() = &A::f; // gets void f() void (*fp2)(int) = &A::f; // gets static void f(int) void A::f() { void (*fp3)(int) = &f; // not well-formed } >>> Example 2 Attributes Expression Attributes ---------- ---------- A::f { [ Type: void (); NS_member: A ], [ Type: void (int); Lvalue ] }; Class_qualifier: A &A::f { [ Type: void (A::*)() ], [ Type: void (*)(int) ] } &f { [ Type: void (); NS_member: A ], [ Type: void (int); Lvalue ] } >>> Example 2 Discussion It is surprising enough to find that we can have a static and a nonstatic member function with the same name. But `&A::f' is an expression with two types, one of which is a pointer type and the other of which is a pointer-to-member type. This may seem bizarre. `&f' is not well-formed because one of the two `f's is not an lvalue. This is an example of why Lvalue and NS_member need to be part of the Overload_set and not independent attributes. In f5.3/2 and f13.3 the WP gives the rules for this example, but it is never said that the rule for address-of (both address-of qualified-id and normal address-of) can apply to overloaded names, in particular to overloaded names some of which are nonstatic member and some of which are not. Rewording in terms of attributes would clarify this example. >>> Example 3 struct A { void f(A&); void f(const A&); void g(A&) const; void g(const A&); }; void h(A a, const A ca) { a.f(a); // calls f(A&) by 13.2/12 a.g(a); // ambiguous ca.g(a); // calls g(A&) const } >>> Example 3 Attributes Expression Attributes ---------- ---------- f { [ Type: void (A&); NS_member: A ], [ Type: void (const A&); NS_member: A ] } g { [ Type: void (A&) const; NS_member: A ], [ Type: void (const A&); NS_member: A ] } a.f { [ Type: void (A&) ], [ Type: void (const A&) ] } a.g { [ Type: void (A&) const ], [ Type: void (const A&) ] } ca.g Type: void (A&) const OR Type: void (A&) >>> Example 3 Discussion This is a straightforward application of the rules of f13.2. Thinking about it in terms of attributes is interesting, though. Whether the function type of `ca.g' still has its const qualifier is an interesting question. It doesn't matter in this example, but intuitively the const qualifier has "served its purpose" after being used to select one of the two `g's. It probably matters in Example 4. >>> Example 4 struct A { void f() const; static void f(); }; void A::f() const { f(); // ? } >>> Example 4 Attributes Expression Attributes ---------- ---------- f { [ Type: void () const; NS_member: A ], [ Type: void () Lvalue ] }; >>> Example 4 Discussion This is a difficult one. It is not specified by f13.2 how to resolve overloaded static and nonstatic member functions. >>> Example 5 struct B; struct A { void operator+(B&); }; struct B : A {}; void operator+(B&, A&); void f(B b1, B b2) { b1 + b2; // ambiguous } >>> Example 5 Attributes Expression Attributes ---------- ---------- + { [ Type: void (B&); NS_member: A ], [ Type: void (B&, A&); Lvalue ] }; b1 Type: B; Lvalue b2 Type: B; Lvalue >>> Example 5 Discussion Obviously overloaded operators are very special. They are the only case where a (nonstatic) member function and a non-member function "compete" for the same arguments. "+" is not an expression, of course, but we need some way to talk about the set of overloadings introduced by that operator. 5. WORDING IN THE WP 5.1 Introducing the Attributes I don't suggest we introduce this notation into the WP. Instead, I suggest we introduce the attributes in a less formal way, as is done with lvalues and non-lvalues. We may want to use the word "property" instead; it has a friendlier sound but means the same thing. The section introducing the attributes might be something like this: 3.8 Other Expression Properties An expression's type, and whether it is an lvalue, are two _properties_ of the expression. Expressions have a number of other properties. Some expressions have a list of default arguments (f8.2.6). Some expressions are nonstatic members of some class (f9.2). Some expressions are constant expressions with some value (f5.19). Some expressions are bit-field expressions (f9.6). Some expressions are class-qualified by a given class (f5.3). Some expressions are formerly-reference expressions, meaning that they are derived from an identifier of reference type. I am unsure how to introduce the overload set. 5.2 Adding Bit-Field into the Rest of the WP Here is an example of how we would add one attribute -- Bit_field -- to other areas of the WP. Doing so would clarify some of the bit-field examples in section 4.2 above. Unfortunately some other attributes are quite a bit harder to add. In f5.1/5 (Primary Expressions), add to the last sentence or a bit-field. To f5.16/3 (Conditional Operator), append this sentence: The result is a bit-field if it is an lvalue and the second and third operands are both bit-fields. To f5.17/1 (Assignment Operators), append this sentence: The result is a bit-field if the left operand is a bit-field. In f5.18/1 (Comma Operator), add to the last sentence: ; the result is a bit-field if its right operand is. The part bit-fields play in argument matching must also be clarified. Just as a non-lvalue cannot be bound to a nonconst reference, so a bit-field cannot be bound to *any* reference. But is this checked during, or after, overloading resolution? It makes a difference, as in this example: void f(long); void f(int&); struct A { int i; int b :2; }; A a; f(a.i); // calls f(int&) f(a.b); // either not well-formed or calls f(long) 5.3 The Overload Set As Scott Turner has pointed out, the Overload_set is not really accounted for in the WP. We don't want to say The overload set must be a singleton set. in countless locations of the WP. Instead, we can say something like Except in the function call position of a function call operator or as an initializer, an expression's overload set must have a cardinality of 1. Because of this restriction, some sections of this WP can refer to "the type" of an expression. This is not enough, though. This issue, and many others, awaits a comprehensive revision of argument matching, initialization, and built-in operators. If built-in operators can be expressed as overloadings, then the number of expression contexts will be greatly reduced and this issue will become more tractable.