Doc Number:  X3J16/94-0111
             WG21/N0498
Date:        March 30, 1994
Project:     Programming Language C++

A NOTE ON CONVERSION SEQUENCES

Samuel C. Kendall
Sun Microsystems Laboratories, Inc.
sam.kendall@east.sun.com


0. INTRODUCTION

This note is part of the work of Tom Plum's subgroup of the Core WG.

This note has five sections.  Section 1 introduces a notation I have
found helpful in thinking about standard conversion sequences.
Section 2 gives an additional rule for ordering standard conversion
sequences, supplementing Tom Wilcox's excellent revised clause [over],
94-0080/N0467.  Section 3 explores issues related to rvalues of
incomplete type and proposes a (new?) rule.  Section 4 explores
another couple of issues and proposes two adjustments to Tom Wilcox's
[over].  And finally, section 5 gives a number of examples of
conversion sequences and how they are ranked.


1. CONVERSION SEQUENCE NOTATION

This is a notation I and others have found helpful in discussing
conversion sequences.  I hope it will continue to be useful.  Don't
worry, this notation is NOT intended to go into the WP.

In explaining the notation I also explain something about what
conversions and conversion sequences are.

In type analysis a conversion is NOT just an ordered pair

    [Type1, Type2]

or (in more familiar notation)

    Type1 --> Type2

It is more than that.  Formally, it is a seven-tuple

                               Category
    Type1, Lvalue1, Constant1 --------> Type2, Lvalue2, Constant2

where each LvalueK is a boolean lvalue-ness attribute; each ConstantK
is either NIL (indicating "not a constant") or a constant value of
type TypeK; and Category is the conversion category (see below).

But writing it like that is too cumbersome.
Instead, we write conversions as in these examples:

    rval D* --> rval B*
        conversion of pointer-to-derived to pointer-to-base

    lval int[5] --> rval int*
        conversion of an array to a pointer to the first element

    constant 0 int --> constant 0 void*
        a null pointer conversion

If "constant" is not present, that means either that that side of the
conversion is not a constant, or that the Constant attribute is
irrelevant to this example.  If neither "lval" nor "rval" is present,
the Lvalue attribute is obvious or irrelevant to the example.  For
example, all constants are rvalues, so we don't write the "rval" in
the last example above.

The (revised) conversion categories are:

    Lvalue Conversions            User-Defined Conversions
    Rvalue Conversions            Ellipsis Conversions
    Qualification Conversions
    Promotions
    Standard Conversions

(of course, the two on the right never appear in standard conversion
sequences).  We write the first letter of the category in or above
the arrow, e.g.:

    rval D* -s-> rval B*
    lval int[5] -l-> rval int*
    constant 0 int -s-> constant 0 void*

So far we have shown only single conversions.  We write conversion
*sequences* in the obvious way, e.g.:

    short s;
    int i = s;      // lval short -l-> rval short -p-> rval int


2. PREFERRING SHORTER SEQUENCES

This section gives a rule not in Tom Wilcox's revised [over.ics.rank];
it applies after most of those rules have already "had their say",
i.e., it is a tie-breaker in ranking standard conversion sequences:

    If standard conversion sequences S1 and S2 have the same initial
    and final type, lvalue-ness, and constant-ness, and S1 is shorter
    (has fewer conversions) than S2, then S1 is better than S2.
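The tie-breaker above can be sketched as a small data model.  This is
only an illustration under stated assumptions: the names Operand,
Conversion, Sequence and shorter_wins are hypothetical, the Constant
attribute is reduced to a boolean (the rule only asks about
constant-ness), and none of the other ranking rules in [over.ics.rank]
are modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the note's notation. None of these names come from
// the paper, from [over], or from any compiler.
enum class Category { Lvalue, Rvalue, Qualification, Promotion, Standard };

// One end of a conversion. The paper's Constant attribute is NIL or a value;
// this sketch keeps only "is it a constant?", which is all the rule needs.
struct Operand {
    std::string type;
    bool is_lvalue;
    bool is_constant;
};

bool same_operand(const Operand& a, const Operand& b) {
    return a.type == b.type && a.is_lvalue == b.is_lvalue &&
           a.is_constant == b.is_constant;
}

struct Conversion {
    Category category;
    Operand result;
};

struct Sequence {
    Operand initial;
    std::vector<Conversion> steps;  // empty means a zero-length sequence
};

const Operand& final_operand(const Sequence& s) {
    return s.steps.empty() ? s.initial : s.steps.back().result;
}

// Section 2 tie-breaker only: true if s1 beats s2 by being shorter.
bool shorter_wins(const Sequence& s1, const Sequence& s2) {
    return same_operand(s1.initial, s2.initial) &&
           same_operand(final_operand(s1), final_operand(s2)) &&
           s1.steps.size() < s2.steps.size();
}

// Example 2.1: constant 0 int -s-> constant 0 volatile int*
Sequence direct_sequence() {
    return Sequence{Operand{"int", false, true},
                    {Conversion{Category::Standard,
                                Operand{"volatile int*", false, true}}}};
}

// Example 2.1, longer route through int* and a qualification conversion.
Sequence longer_sequence() {
    return Sequence{Operand{"int", false, true},
                    {Conversion{Category::Standard, Operand{"int*", false, true}},
                     Conversion{Category::Qualification,
                                Operand{"volatile int*", false, true}}}};
}

// Usage: the direct sequence wins, and the relation is not symmetric.
void check_example_2_1() {
    assert(shorter_wins(direct_sequence(), longer_sequence()));
    assert(!shorter_wins(longer_sequence(), direct_sequence()));
}
```

Keeping the sequence as an initial operand plus a vector of steps lets
a zero-length sequence be represented directly as an empty vector.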
* Example 2.1:

    volatile int* p = 0;

We initialize using the sequence

    constant 0 int -s-> constant 0 volatile int*

rather than the longer sequence

    constant 0 int -s-> constant 0 int* -q-> constant 0 volatile int*

* Example 2.2:

    struct B {};
    struct D : B {};
    D d;
    const B& r = d;

We initialize using the sequence

    lval D -s-> lval B -q-> lval const B

rather than the longer sequence

    lval D -l-> rval D -s-> rval B -q-> rval const B -r-> lval const B


3. CONVERTING AND INITIALIZING WITH INCOMPLETE TYPES

Here are three examples of how errors happen when trying to make
rvalues of incomplete object types.  I was unable to find a discussion
of this in [dcl.init] or [class.copy], but I may have missed it.

* Example 3.1:

    struct A;
    void f(A);
    extern A a;
    f(a);           // ill-formed

The conversion sequence is

    lval A -l-> rval A

We can make a simple rule to explain why this example is ill-formed,
putting it in terms of conversions (my preference):

    Error-checking rule #1: an lvalue T cannot become an rvalue T if
    T is an incomplete type.

or in terms of initialization:

    Error-checking rule #2: an rvalue of incomplete type T cannot be
    used to initialize a variable or parameter of non-reference type.

Either of these rules is workable and explains example 3.1, but there
is more to it; read on!

* Example 3.2:

    struct A;
    void f(A);
    void f(A&);
    extern A a;
    f(a);           // ambiguous, or f(A&) since f(A) would be
                    // erroneous?  Ambiguous!

Here are the conversion sequences:

    lval A -l-> rval A      // for f(A)
    lval A                  // for f(A&), the identity sequence

Applied naively, our error-checking rules would cause an error "too
soon": f(A) would be thrown out, and f(A&) would be called.  So we
must delay applying the rule until after overload resolution.  The
rule is applied at the same time as

    - access checking
    - checking for whether a bit-field was bound to a reference

* Example 3.3:

    struct A;
    extern A a;
    void f(...);
    f(a);           // ill-formed

The ellipsis conversion sequence is

    lval A -l-> rval A -e-> "..."
Error-checking rule #1 correctly yields an error.  Error-checking
rule #2 does not apply unless it is enhanced to explicitly mention
ellipsis.

    Error-checking rule #2 (fixed): an rvalue of incomplete type T
    cannot be used to initialize a variable or parameter of
    non-reference type, nor to 'initialize' ellipsis.

This is awkward.  For this reason I recommend error-checking rule #1.


4. TWO MINOR ADJUSTMENTS TO TOM WILCOX'S [over]

* Example 4.1:

    void f(int&);
    void f(int);
    int i;
    f(i);           // ambiguous

The sequences are

    S1:  lval int   (identity conversion sequence)
    S2:  lval int -l-> rval int

In spite of the fact that S1 "is a proper subsequence of" S2, we want
the call to be ambiguous.  So the rule in [over.ics.rank] should
become something like

    -- S1 is a proper subsequence of S2, AND S1 does not differ from
       S2 only in lacking an Lvalue Conversion or only in lacking an
       Rvalue Conversion, or else ....

That's the first adjustment.

The second adjustment concerns the precise formulation of the
"identity conversion sequence".  As I have S1 above, it is a
zero-length conversion sequence.  However, Tom's [over] sort-of
implies that there is a category "Identity Conversions": there are no
zero-length standard conversion sequences.  Instead, S1 becomes

    S1':  lval int -i-> lval int

If we analyze S1' vs. S2, the problem that led to the first adjustment
does not arise.  I haven't been able to think of examples where it
does.  However, I find the identity conversion to be a kludge; I'd
prefer that we allow zero-length standard conversion sequences,
because they are mathematically more regular and thus easier to think
about.

My second adjustment consists of two alternatives.  I recommend the
first one, and have used it throughout this paper.
    EITHER MAKE the first adjustment AND clarify [over] to specify
    that standard conversion sequences can have zero length,

    OR DO NOT make the first adjustment, BUT clarify [over] to
    specify that there are no zero-length standard conversion
    sequences, that an identity conversion is inserted instead.


5. MORE EXAMPLES

Most of these come from discussions between Tom Wilcox and me.  They
are intended as a resource for people writing up the various rules,
and for people thinking about trying to "improve" the rules.  (Tom,
Bill Gibbons, and Steve Adamczyk, this means you!)

Here's my advice: programmers sometimes get their overloadings to work
without understanding why they work; or they have long forgotten why
their overloadings work.  If you change the rules, even to improve
them, be extremely careful not to break code.

* Example 5.1:

    void f(const int&);
    void f(long);
    f(5);           // f(const int&)

We pick f(const int&) using the sequence

    rval int -q-> rval const int -r-> lval const int

The alternative is a shorter, but worse, sequence

    rval int -s-> rval long

This is one example of why we don't simply rank conversion sequences
by their length.

* Example 5.2:

I tried to come up with a simple example where a promotion wins over a
standard conversion due to promotions being "better" than standard
conversions.  I couldn't, because usually the subsequence rule causes
the promotion to win (see example 5.4).  But here is a more
complicated example of a promotion winning over a standard conversion:

    struct A {
        operator short();
        operator int*();
    };
    A a;
    void f(int);
    void f(void*);
    f(a);           // f(int) wins

The sequences are

    lval A -u-> rval short -p-> rval int
    lval A -u-> rval int* -s-> rval void*

We compare these u-d sequences by comparing the std sequences
following the u-d conversion, EVEN THOUGH those sequences start from
DIFFERENT initial types.  One of those sequences consists of a
promotion, the other of a standard conversion; so the promotion wins.
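The outcome of Example 5.1 is easy to observe directly.  The following
is a minimal sketch, assuming a compiler that implements the overload
rules of ISO C++ as eventually published (which need not match the
draft wording discussed in this note).  The Picked tag and the
function example_5_1 are hypothetical additions so that the chosen
overload can be inspected; the declarations in the example return
void.

```cpp
#include <cassert>

// Hypothetical tag type: lets the caller see which overload was chosen.
enum class Picked { ConstIntRef, Long };

Picked f(const int&) { return Picked::ConstIntRef; }
Picked f(long)       { return Picked::Long; }

// Example 5.1: the call f(5) with the two overloads above.
Picked example_5_1() { return f(5); }

// Usage: the reference overload is selected even though, in the note's
// notation, its sequence is the longer one.
void check_example_5_1() {
    assert(example_5_1() == Picked::ConstIntRef);
}
```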
* Example 5.3:

Here is an example of the strangeness caused by null pointer
conversions:

    void f(long);
    void f(char*);
    f(0);           // ambiguous: constant int 0 -s-> constant char*
                    //       vs.  int -s-> long

* Example 5.4:

For compatibility with most existing implementations, a "small"
arithmetic type must be promoted before it can be "demoted" again
using a standard conversion.  For example:

    void f(short);
    void f(int);
    f('c');         // ok, f(int) (perhaps surprising)

We get to short via

    char -p-> int -s-> short

(we assume we are on a machine where char promotes to int).  We get to
int via

    char -p-> int

Since the latter is a subsequence of the former, it is better.

* Example 5.5:

There is one case where a promotion may be followed by a non-numeric
standard conversion:

    void f(char*);
    void f(int);
    f('\0');        // ok, f(int)

Cfront 3.x and Turbo C++ prefer f(int), apparently because the
conversion sequences are

    constant 0 char --p--> constant 0 int --s--> constant 0 char*
    constant 0 char --p--> constant 0 int

* Example 5.6:

But if we change example 5.5 slightly:

    void f(char*);
    void f(short);
    f('\0');        // ambiguous

then it's ambiguous:

    constant 0 char --p--> constant 0 int --s--> constant 0 char*
    constant 0 char --p--> constant 0 int --s--> constant 0 short

At issue is whether the "null pointer" conversions are

    [1] all integral and enum constant 0 -s-> T* constant 0

or

    [2] {int,unsigned,long,unsigned long} constant 0 -s-> T* constant 0

The current [conv.ptr] says [1].  Borland and cfront implement [2],
and that is what I have documented.

* Example 5.7:

User-defined conversions that overlap standard conversions, or
threaten to, often confuse people.  But they can be handled
straightforwardly using the existing rules.

    struct B {};
    struct D : B {
        operator B&();
    };
    D d;
    B& r = d;       // ok, uses lval D -s-> lval B

The user-defined conversion is not used because a standard conversion
sequence is better than a user-defined conversion sequence.
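Examples 5.3, 5.4 and 5.5 can be probed on a present-day compiler.
This is a sketch under stated assumptions: it relies on C++11 or
later, it reports what ISO C++ as published does (not necessarily what
the draft rules in this note say), and the namespaces, the Picked tag,
the Overloads struct and the resolves helper are all hypothetical.
Example 5.6 is left out on purpose, because its outcome depends on
exactly the [conv.ptr] question raised above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag type: lets the caller see which overload was chosen.
enum class Picked { Short, Int, CharPtr };

namespace ex5_3 {
// The overload set is wrapped in a struct so the call can be made dependent.
struct Overloads {
    static void f(long);
    static void f(char*);
};

// Expression SFINAE: this overload exists only if S::f(0) resolves to
// exactly one function. An ambiguous call is a substitution failure.
template <class S>
auto resolves(int) -> decltype(S::f(0), std::true_type{});
template <class S>
std::false_type resolves(...);

constexpr bool unambiguous = decltype(resolves<Overloads>(0))::value;
static_assert(!unambiguous, "Example 5.3: f(0) is expected to be ambiguous");
}  // namespace ex5_3

namespace ex5_4 {
Picked f(short) { return Picked::Short; }
Picked f(int)   { return Picked::Int; }
Picked call()   { return f('c'); }      // char argument
}  // namespace ex5_4

namespace ex5_5 {
Picked f(char*) { return Picked::CharPtr; }
Picked f(int)   { return Picked::Int; }
Picked call()   { return f('\0'); }     // char argument with value 0
}  // namespace ex5_5

// Usage: both character arguments select f(int).
void check_examples() {
    assert(ex5_4::call() == Picked::Int);
    assert(ex5_5::call() == Picked::Int);
}
```

The ambiguity check is done at compile time because an ambiguous call
written directly would simply fail to compile; wrapping it in a
dependent expression turns that failure into a testable boolean.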
* Example 5.8:

Here is another one with a user-defined conversion that is "like" a
built-in conversion:

    struct A {};
    struct B : A {};
    struct C : A {};
    struct D : B, C {
        operator A&();
    };
    D d;
    A& r = d;       // ok, uses d.operator A&()

The user-defined conversion sequence is very direct:

    lval D -u-> lval A

There is no standard conversion lval D -s-> lval A, since A is an
ambiguous base of D.

* Example 5.9:

Here is an example of user-defined conversions and the built-in
assignment operator.

    struct A {
        operator int&();
    };
    A a;
    int i;
    i = a;          // ok, lval A -u-> lval int -l-> rval int
    a = i;          // ill-formed: no u-d conversions on lhs of assignment

We disallow user-defined conversions on the left-hand side of
assignment in order to make built-in assignment consistent with member
assignment operators.

* Example 5.10:

This example is interesting because it involves an overloaded function
name as the argument to an overloaded function.  Otherwise it is
pretty straightforward.

    typedef void F1();
    typedef void F2(int);
    extern F1 f;
    extern F2 f;
    struct A { A(F1&); };
    struct B { B(F2&); };
    void g(A);
    void g(const B&);
    g(f);           // g(A)

The user-defined conversion sequences are:

    S1  lval F1 -u-> rval A
    S2  lval F2 -u-> rval B -q-> rval const B -r-> lval const B

g(A) is preferred because the (0-length) second standard conversion
sequence in S1 is better than the second standard conversion sequence
in S2, because the latter standard conversion sequence has a
qualification conversion.