Doc Number: X3J16/94-0111
WG21/N0498
Date: March 30, 1994
Project: Programming Language C++
A NOTE ON CONVERSION SEQUENCES
Samuel C. Kendall
Sun Microsystems Laboratories, Inc.
sam.kendall@east.sun.com
0. INTRODUCTION
This note is part of the work of Tom Plum's subgroup of the Core WG.
This note has five sections. Section 1 introduces a notation I have
found helpful in thinking about standard conversion sequences. Section
2 gives an additional rule for ordering standard conversion sequences,
supplementing Tom Wilcox's excellent revised clause [over],
94-0080/N0467. Section 3 explores issues related to rvalues of
incomplete type and proposes a (new?) rule. Section 4 explores another
couple of issues and proposes two adjustments to Tom Wilcox's [over].
And finally, section 5 gives a number of examples of conversion
sequences and how they are ranked.
1. CONVERSION SEQUENCE NOTATION
This is a notation I and others have found helpful in discussing
conversion sequences. I hope it will continue to be useful. Don't
worry, this notation is NOT intended to go into the WP.
In explaining the notation I also explain something about what
conversions and conversion sequences are.
In type analysis a conversion is NOT just an ordered pair
[Type1, Type2]
or (in more familiar notation)
Type1 --> Type2
it is more than that. Formally, it is a seven-tuple
                          Category
Type1, Lvalue1, Constant1 --------> Type2, Lvalue2, Constant2
where each LvalueK is a boolean lvalue-ness attribute; each ConstantK
is either NIL (indicating "not a constant") or a constant value of type
TypeK; and Category is the conversion category (see below).
But writing it like that is too cumbersome. Instead, we write
conversions as in these examples:
rval D* --> rval B*         conversion of pointer-to-derived to
                            pointer-to-base
lval int[5] --> rval int*   conversion of an array to a pointer to
                            the first element
constant 0 int --> constant 0 void*
                            a null pointer conversion
If "constant" is not present, that means either that that side of the
conversion is not a constant, or that the Constant attribute is
irrelevant to this example. If neither "lval" nor "rval" is present, the
Lvalue attribute is obvious or irrelevant to the example. For example,
all constants are rvalues, so we don't write the "rval" in the last
example above.
The (revised) conversion categories are:
Lvalue Conversions          User-Defined Conversions
Rvalue Conversions          Ellipsis Conversions
Qualification Conversions
Promotions
Standard Conversions
(of course, the two on the right never appear in standard conversion
sequences). We write the first letter of the category in or above the
arrow, eg:
rval D* -s-> rval B*
lval int[5] -l-> rval int*
constant 0 int -s-> constant 0 void*
So far we have shown only single conversions. We write conversion
*sequences* in the obvious way, eg:
short s;
int i = s; // lval short -l-> rval short -p-> rval int
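As a rough illustration only, the seven-tuple can be modelled as a small data structure. Every name below (Category, Operand, Conversion, letter, short_to_int) is hypothetical and chosen for this sketch; types and constant values are stored as text purely to keep the sketch short.

```cpp
// Hypothetical model of the seven-tuple; none of these names come from the WP.
enum class Category {
    Lvalue, Rvalue, Qualification, Promotion, Standard, UserDefined, Ellipsis
};

// One side of a conversion: TypeK, LvalueK, ConstantK.
struct Operand {
    const char* type;       // the type, spelled as text (a simplification)
    bool        is_lvalue;  // the boolean lvalue-ness attribute
    const char* constant;   // nullptr models NIL; otherwise the value as text
};

// Type1, Lvalue1, Constant1 --Category--> Type2, Lvalue2, Constant2
struct Conversion {
    Operand  from;
    Category category;
    Operand  to;
};

// The letter written in the arrow, e.g. 'p' for -p->.
constexpr char letter(Category c) {
    return c == Category::Lvalue        ? 'l'
         : c == Category::Rvalue        ? 'r'
         : c == Category::Qualification ? 'q'
         : c == Category::Promotion     ? 'p'
         : c == Category::Standard      ? 's'
         : c == Category::UserDefined   ? 'u'
                                        : 'e';
}

// int i = s;   // lval short -l-> rval short -p-> rval int
constexpr Conversion short_to_int[] = {
    {{"short", true,  nullptr}, Category::Lvalue,    {"short", false, nullptr}},
    {{"short", false, nullptr}, Category::Promotion, {"int",   false, nullptr}},
};

static_assert(letter(short_to_int[0].category) == 'l',
              "first step is an Lvalue Conversion");
static_assert(letter(short_to_int[1].category) == 'p',
              "second step is a Promotion");
```

A sequence is then just an ordered list of such conversions in which each step's "to" side matches the next step's "from" side.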
2. PREFERRING SHORTER SEQUENCES
This section gives a rule not in Tom Wilcox's revised
[over.ics.rank]. It applies after most of those rules have already "had
their say"; that is, it is a tie-breaker in ranking standard conversion
sequences:
If standard conversion sequences S1 and S2 have the same initial
and final type, lvalue-ness, and constant-ness, and S1 is
shorter (has fewer conversions) than S2, then S1 is better than
S2.
* Example 2.1:
volatile int* p = 0;
We initialize using the sequence
constant 0 int -s-> constant 0 volatile int*
rather than the longer sequence
constant 0 int -s-> constant 0 int*
               -q-> constant 0 volatile int*
* Example 2.2:
struct B {};
struct D : B {};
D d;
const B& r = d;
We initialize using the sequence
lval D -s-> lval B -q-> lval const B
rather than the longer sequence
lval D -l-> rval D
       -s-> rval B
       -q-> rval const B
       -r-> lval const B
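The tie-breaker of this section can be sketched as a predicate. This is only a model under stated assumptions: Sequence, same_text, same_ends and better_by_length are hypothetical names, a sequence is reduced to its two ends plus its length, and type identity is approximated by comparing the spelling of the type.

```cpp
// Hypothetical summary of a standard conversion sequence:
// its initial and final attributes, and how many conversions it contains.
struct Sequence {
    const char* initial_type;
    bool        initial_lvalue;
    bool        initial_constant;
    const char* final_type;
    bool        final_lvalue;
    bool        final_constant;
    int         length;  // number of conversions in the sequence
};

// Compare two type spellings character by character.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// Same initial and final type, lvalue-ness, and constant-ness.
constexpr bool same_ends(const Sequence& x, const Sequence& y) {
    return same_text(x.initial_type, y.initial_type)
        && x.initial_lvalue == y.initial_lvalue
        && x.initial_constant == y.initial_constant
        && same_text(x.final_type, y.final_type)
        && x.final_lvalue == y.final_lvalue
        && x.final_constant == y.final_constant;
}

// The rule: with identical ends, the shorter sequence is better.
// When the ends differ, the rule says nothing and this returns false.
constexpr bool better_by_length(const Sequence& s1, const Sequence& s2) {
    return same_ends(s1, s2) && s1.length < s2.length;
}

// Example 2.1: volatile int* p = 0;
constexpr Sequence direct   = {"int", false, true, "volatile int*", false, true, 1};
constexpr Sequence indirect = {"int", false, true, "volatile int*", false, true, 2};

static_assert(better_by_length(direct, indirect), "the one-step sequence wins");
static_assert(!better_by_length(indirect, direct), "the two-step sequence does not");
```

Because the predicate requires identical ends, it cannot by itself prefer a short sequence to one type over a longer sequence to a different type.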
3. CONVERTING AND INITIALIZING WITH INCOMPLETE TYPES
Here are three examples of how errors happen when trying to make rvalues
of incomplete object types. I was unable to find a discussion of this
in [dcl.init] or [class.copy], but I may have missed it.
* Example 3.1:
struct A;
void f(A);
extern A a;
f(a); // ill-formed
The conversion sequence is
lval A -l-> rval A
We can make a simple rule to explain why this example is ill-formed,
putting it in terms of conversions (my preference):
Error-checking rule #1: an lvalue T cannot become an rvalue T if
T is an incomplete type.
or in terms of initialization:
Error-checking rule #2: an rvalue of incomplete type T
cannot be used to initialize a variable or parameter of
non-reference type.
Either of these rules is workable and explains example 3.1, but there
is more to it; read on!
* Example 3.2:
struct A;
void f(A);
void f(A&);
extern A a;
f(a); // ambiguous, or f(A&) since f(A) would be
// erroneous? Ambiguous!
Here are the conversion sequences:
lval A -l-> rval A // for f(A)
lval A // for f(A&), the identity sequence
Applied naively, our error-checking rules would cause an error "too
soon": f(A) would be thrown out, and f(A&) would be called. So we must
delay applying the rule until after overload resolution. The rule is
applied at the same time as
- access checking
- checking for whether a bit-field was bound to a reference
* Example 3.3:
struct A;
extern A a;
void f(...);
f(a); // ill-formed
The ellipsis conversion sequence is
lval A -l-> rval A -e-> "..."
Error-checking rule #1 correctly yields an error. Error-checking rule
#2 does not apply unless it is enhanced to explicitly mention ellipsis.
Error-checking rule #2 (fixed): an rvalue of incomplete type T
cannot be used to initialize a variable or parameter of
non-reference type, nor to 'initialize' ellipsis.
This is awkward. For this reason I recommend error-checking rule #1.
4. TWO MINOR ADJUSTMENTS TO TOM WILCOX'S [over]
* Example 4.1:
void f(int&);
void f(int);
int i;
f(i); // ambiguous
The sequences are
S1: lval int (identity conversion sequence)
S2: lval int -l-> rval int
In spite of the fact that S1 "is a proper subsequence of" S2, we want the
call to be ambiguous. So the rule in [over.ics.rank] should become
something like
-- S1 is a proper subsequence of S2, AND S1 does not differ from
S2 only in lacking an Lvalue Conversion or only in lacking an
Rvalue Conversion, or else ....
That's the first adjustment.
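The ambiguity wanted in example 4.1 can be observed on a current compiler without provoking a hard error, by asking whether the call expression is well-formed in an unevaluated context. The helper f_call_resolves is a hypothetical name invented for this sketch; it reports false both when no overload is viable and when the call is ambiguous, so the second assertion is there to rule out the "no viable overload" reading.

```cpp
#include <type_traits>
#include <utility>

void f(int&);
void f(int);

// Hypothetical helper: true when f(arg) selects exactly one best overload.
// The primary template is used when the call expression is ill-formed.
template <class Arg, class = void>
struct f_call_resolves : std::false_type {};

// Chosen only when f(std::declval<Arg>()) is a well-formed call.
template <class Arg>
struct f_call_resolves<Arg, decltype(void(f(std::declval<Arg>())))>
    : std::true_type {};

// lval int: S1 (identity) and S2 (lvalue-to-rvalue) tie, so the call is ambiguous.
static_assert(!f_call_resolves<int&>::value,
              "f(i) with an lvalue int is ambiguous");

// rval int: f(int&) cannot bind, so f(int) is the only viable function.
static_assert(f_call_resolves<int>::value,
              "f(rvalue int) resolves to f(int)");
```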
The second adjustment concerns the precise formulation of the "identity
conversion sequence". As I have S1 above, it is a zero-length
conversion sequence. However, Tom's [over] sort-of implies that there
is a category "Identity Conversions": there are no zero-length standard
conversion sequences. Instead, S1 becomes
S1': lval int -i-> lval int
If we analyze S1' vs. S2, the problem that led to the first adjustment
does not arise. I haven't been able to think of examples where it does.
However, I find the identity conversion to be a kludge; I'd prefer that
we allow zero-length standard conversion sequences, because they are
mathematically more regular and thus easier to think about.
My second adjustment consists of two alternatives. I recommend the
first one, and have used it throughout this paper.
EITHER
MAKE the first adjustment AND clarify [over] to specify that
standard conversion sequences can have zero length,
OR
DO NOT make the first adjustment, BUT clarify [over] to specify
that there are no zero-length standard conversion sequences, and
that an identity conversion is inserted instead.
5. MORE EXAMPLES
Most of these come from discussions between Tom Wilcox and me. They
are intended as a resource for people writing up the various rules, and
for people thinking about trying to "improve" the rules. (Tom, Bill
Gibbons, and Steve Adamczyk, this means you!) Here's my advice:
programmers sometimes get their overloadings to work without
understanding why they work; or they have long forgotten why their
overloadings work. If you change the rules, even to improve them, be
extremely careful not to break code.
* Example 5.1:
void f(const int&);
void f(long);
f(5); // f(const int&)
We pick f(const int&) using the sequence
rval int -q-> rval const int -r-> lval const int
The alternative is a shorter, but worse, sequence
rval int -s-> rval long
This is one example of why we don't simply rank conversion sequences by
their length.
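The outcome of example 5.1 can be checked at compile time by giving each overload a distinguishing return value. The tags 1 and 2 are arbitrary choices for this sketch, and the check confirms only which function is selected, not the notation used above to describe the sequences.

```cpp
// Each overload returns a tag identifying itself (tags are arbitrary).
constexpr int f(const int&) { return 1; }
constexpr int f(long)       { return 2; }

// f(5) selects f(const int&), even though the sequence to long is shorter.
static_assert(f(5) == 1, "f(5) calls f(const int&)");

// A long argument, by contrast, selects f(long).
static_assert(f(5L) == 2, "f(5L) calls f(long)");
```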
* Example 5.2:
I tried to come up with a simple example where a promotion wins over a
standard conversion due to promotions being "better" than standard
conversions. I couldn't, because usually the subsequence rule causes
the promotion to win (see example 5.4).
But here is a more complicated example of a promotion winning over a
standard conversion:
struct A { operator short(); operator int*(); };
A a;
void f(int);
void f(void*);
f(a); // f(int) wins
The sequences are
lval A -u-> rval short -p-> rval int
lval A -u-> rval int* -s-> rval void*
We compare these u-d sequences by comparing the std sequences following
the u-d conversion, EVEN THOUGH those sequences start from DIFFERENT
initial types. One of those sequences consists of a promotion, the
other of a standard conversion; so the promotion wins.
* Example 5.3:
Here is an example of the strangeness caused by null pointer
conversions:
void f(long);
void f(char*);
f(0); // ambiguous: constant 0 int -s-> constant 0 char*
      //        vs. int -s-> long
* Example 5.4:
For compatibility with most existing implementations, a "small"
arithmetic type must be promoted before it can be "demoted" again using
a standard conversion. For example:
void f(short);
void f(int);
f('c'); // ok, f(int) (perhaps surprising)
We get to short via
char -p-> int -s-> short
(we assume we are on a machine where char promotes to int). We get to
int via
char -p-> int
Since the latter is a subsequence of the former, it is better.
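Example 5.4 can be confirmed in the same compile-time style. The return values are arbitrary tags introduced for this sketch, and it assumes, as the text does, an implementation where char promotes to int.

```cpp
// Each overload returns a tag identifying itself (tags are arbitrary).
constexpr int f(short) { return 1; }
constexpr int f(int)   { return 2; }

// char -p-> int beats char -p-> int -s-> short.
static_assert(f('c') == 2, "f('c') calls f(int), not f(short)");
```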
* Example 5.5:
There is one case where a promotion may be followed by a non-numeric
standard conversion:
void f(char*);
void f(int);
f('\0'); // ok, f(int)
Cfront 3.x and Turbo C++ prefer f(int), apparently because the
conversion sequences are
constant 0 char --p--> constant 0 int --s--> constant 0 char*
constant 0 char --p--> constant 0 int
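A compile-time check of example 5.5, again using arbitrary return tags invented for this sketch. One caveat, stated as an assumption: under later revisions of the language a character literal may not count as a null pointer constant at all, in which case f(char*) is not even viable. Either way the call is expected to select f(int), which is all the check establishes.

```cpp
// Each overload returns a tag identifying itself (tags are arbitrary).
constexpr int f(char*) { return 1; }
constexpr int f(int)   { return 2; }

// The promotion to int is preferred over any route to char*.
static_assert(f('\0') == 2, "f('\\0') calls f(int)");
```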
* Example 5.6:
But if we change example 5.5 slightly:
void f(char*);
void f(short);
f('\0'); // ambiguous
then it's ambiguous:
constant 0 char --p--> constant 0 int --s--> constant 0 char*
constant 0 char --p--> constant 0 int --s--> constant 0 short
At issue is whether the "null pointer" conversions are
[1] all integral and enum constant 0 -s-> T* constant 0
or
[2] {int,unsigned,long,unsigned long} constant 0 -s->
T* constant 0
The current [conv.ptr] says [1]. Borland and cfront implement [2], and
that is what I have documented.
* Example 5.7:
User-defined conversions that overlap standard conversions, or threaten
to, often confuse people. But they can be handled straightforwardly
using the existing rules.
struct B {};
struct D : B { operator B&(); };
D d;
B& r = d; // ok, uses lval D -s-> lval B
The user-defined conversion is not used because a standard conversion
sequence is better than a user-defined conversion sequence.
* Example 5.8:
Here is another one with a user-defined conversion that is "like" a
built-in conversion:
struct A {};
struct B : A {};
struct C : A {};
struct D : B, C { operator A&(); };
D d;
A& r = d; // ok, uses d.operator A&()
The user-defined conversion sequence is very direct:
lval D -u-> lval A
There is no standard conversion lval D -s-> lval A, since A is an
ambiguous base of D.
* Example 5.9:
Here is an example of user-defined conversions and the built-in
assignment operator.
struct A { operator int&(); };
A a;
int i;
i = a; // ok, lval A -u-> lval int -l-> rval int
a = i; // ill-formed: no u-d conversions on lhs of assignment
We disallow user-defined conversions on the left-hand side of assignment
in order to make built-in assignment consistent with member assignment
operators.
* Example 5.10:
This example is interesting because it involves an overloaded function
name as the argument to an overloaded function. Otherwise it is pretty
straightforward.
typedef void F1();
typedef void F2(int);
extern F1 f;
extern F2 f;
struct A { A(F1&); };
struct B { B(F2&); };
void g(A);
void g(const B&);
g(f); // g(A)
The user-defined conversion sequences are:
S1 lval F1 -u-> rval A
S2 lval F2 -u-> rval B -q-> rval const B -r-> lval const B
g(A) is preferred because the (0-length) second standard conversion
sequence in S1 is better than the second standard conversion sequence
in S2, because the latter standard conversion sequence has a
qualification conversion.