					Doc Number:	X3J16/94-0111
							WG21/N0498
					Date:		March 30, 1994
					Project:	Programming Language C++


			     A NOTE ON CONVERSION SEQUENCES

				   Samuel C. Kendall
			  Sun Microsystems Laboratories, Inc.
				sam.kendall@east.sun.com


	0.  INTRODUCTION

	   This note is part of the work of Tom Plum's subgroup of the Core WG.

	   This note has five sections.  Section 1 introduces a notation I have
	found helpful in thinking about standard conversion sequences.  Section
	2 gives an additional rule for ordering standard conversion sequences,
	supplementing Tom Wilcox's excellent revised clause [over],
	94-0080/N0467.  Section 3 explores issues related to rvalues of
	incomplete type and proposes a (new?) rule.  Section 4 explores another
	couple of issues and proposes two adjustments to Tom Wilcox's [over].
	And finally, section 5 gives a number of examples of conversion
	sequences and how they are ranked.


	1.  CONVERSION SEQUENCE NOTATION

	   This is a notation I and others have found helpful in discussing
	conversion sequences.  I hope it will continue to be useful.  Don't
	worry, this notation is NOT intended to go into the WP.

	   In explaining the notation I also explain something about what
	conversions and conversion sequences are.

	   In type analysis a conversion is NOT just an ordered pair

		[Type1, Type2]

	or (in more familiar notation)

		Type1 --> Type2

	it is more than that.  Formally, it is a seven-tuple

					  Category
		Type1, Lvalue1, Constant1 --------> Type2, Lvalue2, Constant2

	where each LvalueK is a boolean lvalue-ness attribute; each ConstantK
	is either NIL (indicating "not a constant") or a constant value of type
	TypeK; and Category is the conversion category (see below).

	   But writing it like that is too cumbersome.  Instead, we write
	conversions as in these examples:

		rval D* --> rval B*	conversion of pointer-to-derived to
					pointer-to-base

		lval int[5] --> rval int*
					conversion of an array to a pointer to
					the first element

		constant 0 int --> constant 0 void*
					a null pointer conversion

	If "constant" is not present, that means either that that side of the
	conversion is not a constant, or that the Constant attribute is
	irrelevant to this example.  If neither "lval" nor "rval" is present, the
	Lvalue attribute is obvious or irrelevant to the example.  For example,
	all constants are rvalues, so we don't write the "rval" in the last
	example above.

	   The (revised) conversion categories are:

		Lvalue Conversions		User-Defined Conversions
		Rvalue Conversions		Ellipsis Conversions
		Qualification Conversions
		Promotions
		Standard Conversions

	(of course, the two on the right never appear in standard conversion
	sequences).  We write the first letter of the category in or above the
	arrow, eg:

		rval D* -s-> rval B*
		lval int[5] -l-> rval int*
		constant 0 int -s-> constant 0 void*

	So far we have shown only single conversions.  We write conversion
	*sequences* in the obvious way, eg:

		short s;
		int i = s;	// lval short -l-> rval short -p-> rval int
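
	   The notation above can be modelled directly in code.  The following
	is only a sketch under my own assumptions: the names Category, Operand,
	Step, Sequence, show, and short_to_int are hypothetical and appear
	nowhere in the WP, and an empty string stands in for a NIL constant.  It
	prints a sequence in the arrow notation of this section.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Conversion categories from the list above, keyed by their arrow letter.
enum class Category : char {
    Lvalue = 'l', Rvalue = 'r', Qualification = 'q', Promotion = 'p',
    Standard = 's', UserDefined = 'u', Ellipsis = 'e'
};

// One side of a conversion: type, lvalue-ness, and an optional constant.
// An empty `constant` string plays the role of NIL (assumption of this sketch).
struct Operand {
    std::string type;
    bool is_lvalue;
    std::string constant;
};

// A conversion sequence: a starting operand plus zero or more steps.
struct Step { Category category; Operand result; };
struct Sequence { Operand start; std::vector<Step> steps; };

std::string show(const Operand& o) {
    if (!o.constant.empty())              // constants are rvalues; omit "rval"
        return "constant " + o.constant + " " + o.type;
    return (o.is_lvalue ? "lval " : "rval ") + o.type;
}

std::string show(const Sequence& s) {
    std::string out = show(s.start);
    for (const Step& st : s.steps) {
        out += " -";
        out += static_cast<char>(st.category);   // category letter in the arrow
        out += "-> ";
        out += show(st.result);
    }
    return out;
}

// The sequence for "short s; int i = s;" from the text above.
Sequence short_to_int() {
    Sequence s{Operand{"short", true, ""}, {}};
    s.steps.push_back(Step{Category::Lvalue, Operand{"short", false, ""}});
    s.steps.push_back(Step{Category::Promotion, Operand{"int", false, ""}});
    return s;
}

void check_notation() {
    assert(show(short_to_int()) == "lval short -l-> rval short -p-> rval int");
}
```

	A Sequence with an empty list of steps prints as just its starting
	operand, which matches how a zero-length sequence is written later in
	this note.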


	2.  PREFERRING SHORTER SEQUENCES

	   This section gives a rule not in Tom Wilcox's revised
	[over.ics.rank]; it applies after most of those rules have already "had
	their say"; that is, it is a tie-breaker in ranking standard conversion
	sequences:

		If standard conversion sequences S1 and S2 have the same initial
		and final type, lvalue-ness, and constant-ness, and S1 is
		shorter (has fewer conversions) than S2, then S1 is better than
		S2.

	* Example 2.1:

		volatile int* p = 0;

	We initialize using the sequence

		constant 0 int -s-> constant 0 volatile int*

	rather than the longer sequence

		constant 0 int -s-> constant 0 int*
			       -q-> constant 0 volatile int*

	* Example 2.2:

		struct B {};
		struct D : B {};
		D d;
		const B& r = d;

	We initialize using the sequence

		lval D -s-> lval B -q-> lval const B

	rather than the longer sequence

		lval D -l-> rval D
		       -s-> rval B
		       -q-> rval const B
		       -r-> lval const B
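
	   As a rough illustration of the tie-breaker, here is a sketch that
	models only this one rule.  Everything in it is an assumption made for
	the example: Seq, Rank, shorter_wins, and example_2_1 are hypothetical
	names, and an endpoint is represented simply by its printed form in the
	notation of section 1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model: a sequence is a starting endpoint plus the
// endpoint reached after each conversion (one entry per conversion).
struct Seq {
    std::string start;
    std::vector<std::string> results;
    const std::string& last() const {
        return results.empty() ? start : results.back();
    }
};

enum class Rank { Better, Worse, NoDecision };

// The section 2 tie-breaker only: is s1 better than s2?
Rank shorter_wins(const Seq& s1, const Seq& s2) {
    // The rule applies only when both ends agree in type, lvalue-ness,
    // and constant-ness; here that is a comparison of the printed forms.
    if (s1.start != s2.start || s1.last() != s2.last())
        return Rank::NoDecision;
    if (s1.results.size() < s2.results.size()) return Rank::Better;
    if (s1.results.size() > s2.results.size()) return Rank::Worse;
    return Rank::NoDecision;
}

// Example 2.1: the one-step null pointer conversion against the two-step one.
Rank example_2_1() {
    Seq direct{"constant 0 int", {"constant 0 volatile int*"}};
    Seq longer{"constant 0 int",
               {"constant 0 int*", "constant 0 volatile int*"}};
    return shorter_wins(direct, longer);
}

void check_tie_breaker() {
    assert(example_2_1() == Rank::Better);
}
```

	The function deliberately returns NoDecision when the endpoints
	differ, since the rule says nothing about such pairs.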


	3.  CONVERTING AND INITIALIZING WITH INCOMPLETE TYPES

	Here are three examples of how errors happen when trying to make rvalues
	of incomplete object types.  I was unable to find a discussion of this
	in [dcl.init] or [class.copy], but I may have missed it.

	* Example 3.1:

		struct A;
		void f(A);
		extern A a;
		f(a);		// ill-formed

	The conversion sequence is

		lval A -l-> rval A

	We can make a simple rule to explain why this example is ill-formed,
	putting it in terms of conversions (my preference):

		Error-checking rule #1: an lvalue T cannot become an rvalue T if
		T is an incomplete type.

	or in terms of initialization:

		Error-checking rule #2: an rvalue of incomplete type T
		cannot be used to initialize a variable or parameter of
		non-reference type.

	Either of these rules is workable and explains example 3.1, but there
	is more to it; read on!
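
	   To make example 3.1 concrete, here is a small sketch that a current
	compiler should accept.  The function names (by_value, by_reference,
	and so on) and the member x are my own assumptions, added only so the
	example can run.  It shows that binding a reference needs no rvalue of
	type A, whereas the by-value call has to wait until A is complete.

```cpp
#include <cassert>

struct A;                    // A is incomplete from here...
extern A a;                  // declaring an object of incomplete type is fine
int by_value(A);             // so is declaring a by-value parameter
int by_reference(A&);

// OK while A is incomplete: the argument stays "lval A" (identity sequence).
int call_by_reference() { return by_reference(a); }

// Ill-formed at this point, so left commented out: it would need
// lval A -l-> rval A while A is still incomplete.
// int too_early() { return by_value(a); }

struct A { int x; };         // ...to here; A is now complete
A a = {42};

int by_value(A v)      { return v.x; }
int by_reference(A& r) { return r.x; }

// OK now: the lvalue-to-rvalue step has a complete type to work with.
int call_by_value() { return by_value(a); }

void check_incomplete() {
    assert(call_by_reference() == 42);
    assert(call_by_value() == 42);
}
```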

	* Example 3.2:

		struct A;
		void f(A);
		void f(A&);
		extern A a;
		f(a);		// ambiguous, or f(A&) since f(A) would be
				// erroneous?  Ambiguous!

	Here are the conversion sequences:

		lval A -l-> rval A	// for f(A)
		lval A			// for f(A&), the identity sequence

	Applied naively, our error-checking rules would cause an error "too
	soon": f(A) would be thrown out, and f(A&) would be called.  So we must
	delay applying the rule until after overload resolution.  The rule is
	applied at the same time as

	 - access checking
	 - checking for whether a bit-field was bound to a reference

	* Example 3.3:

		struct A;
		extern A a;
		void f(...);
		f(a);		// ill-formed

	The ellipsis conversion sequence is

		lval A -l-> rval A -e-> "..."

	Error-checking rule #1 correctly yields an error.  Error-checking rule
	#2 does not apply unless it is enhanced to explicitly mention ellipsis.

		Error-checking rule #2 (fixed): an rvalue of incomplete type T
		cannot be used to initialize a variable or parameter of
		non-reference type, nor to 'initialize' ellipsis.

	This is awkward.  For this reason I recommend error-checking rule #1.


	4.  TWO MINOR ADJUSTMENTS TO TOM WILCOX'S [over]

	* Example 4.1:

		void f(int&);
		void f(int);
		int i;
		f(i);			// ambiguous

	The sequences are

	S1:	lval int			(identity conversion sequence)
	S2:	lval int -l-> rval int

	In spite of the fact that S1 "is a proper subsequence of" S2, we want the
	call to be ambiguous.  So the rule in [over.ics.rank] should become
	something like

		-- S1 is a proper subsequence of S2, AND S1 does not differ from
		   S2 only in lacking an Lvalue Conversion or only in lacking an
		   Rvalue Conversion, or else ....

	That's the first adjustment.
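
	   The ambiguity in example 4.1 can be observed without provoking a
	compile error.  The sketch below is an illustration under my own
	assumptions: the trait f_is_callable and the helper
	lvalue_call_is_ambiguous are hypothetical names, and the technique (a
	failed overload resolution inside decltype counts as "not callable") is
	a modern idiom, not something from [over].  It only reports whether a
	call to f resolves to exactly one function.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

void f(int&);   // declarations only; never called at run time
void f(int);

// Primary template: assume the call f(Arg) does not resolve.
template <class Arg, class = void>
struct f_is_callable : std::false_type {};

// Chosen only when overload resolution for f(Arg) succeeds unambiguously.
template <class Arg>
struct f_is_callable<Arg, decltype(void(f(std::declval<Arg>())))>
    : std::true_type {};

// lval int: both f(int&) and f(int) are viable and neither is better.
static_assert(!f_is_callable<int&>::value, "f(i) should be ambiguous");

// rval int: f(int&) is not viable, so f(int) is the only candidate.
static_assert(f_is_callable<int>::value, "f(rvalue) should pick f(int)");

bool lvalue_call_is_ambiguous() { return !f_is_callable<int&>::value; }
```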

	The second adjustment concerns the precise formulation of the "identity
	conversion sequence".  As I have S1 above, it is a zero-length
	conversion sequence.  However, Tom's [over] sort-of implies that there
	is a category "Identity Conversions": there are no zero-length standard
	conversion sequences.  Instead, S1 becomes

	S1':	lval int -i-> lval int

	If we analyze S1' vs. S2, the problem that led to the first adjustment
	does not arise.  I haven't been able to think of examples where it does.
	However, I find the identity conversion to be a kludge; I'd prefer that
	we allow zero-length standard conversion sequences, because they are
	mathematically more regular and thus easier to think about.

	My second adjustment consists of two alternatives.  I recommend the
	first one, and have used it throughout this paper.

	EITHER

		MAKE the first adjustment AND clarify [over] to specify that
		standard conversion sequences can have zero length,

	OR

		DO NOT make the first adjustment, BUT clarify [over] to specify
		that there are no zero-length standard conversion sequences, and
		that an identity conversion is inserted instead.


	5.  MORE EXAMPLES

	   Most of these come from discussions between Tom Wilcox and me.  They
	are intended as a resource for people writing up the various rules, and
	for people thinking about trying to "improve" the rules.  (Tom, Bill
	Gibbons, and Steve Adamczyk, this means you!)  Here's my advice:
	programmers sometimes get their overloadings to work without
	understanding why they work; or they have long forgotten why their
	overloadings work.  If you change the rules, even to improve them, be
	extremely careful not to break code.

	* Example 5.1:

		void f(const int&);
		void f(long);
		f(5);	// f(const int&)

	We pick f(const int&) using the sequence

		rval int -q-> rval const int -r-> lval const int

	The alternative is a shorter, but worse, sequence

		rval int -s-> rval long

	This is one example of why we don't simply rank conversion sequences by
	their length.
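
	   Example 5.1 can be checked directly.  In this sketch the enum Chosen
	and the function check_example_5_1 are hypothetical additions; each
	overload returns a tag so that the selected function can be observed at
	run time.

```cpp
#include <cassert>

enum class Chosen { ConstIntRef, Long };

Chosen f(const int&) { return Chosen::ConstIntRef; }
Chosen f(long)       { return Chosen::Long; }

void check_example_5_1() {
    // The longer sequence ending in a reference binding wins over int -> long.
    assert(f(5) == Chosen::ConstIntRef);
    // With a long argument, f(long) needs no conversion and wins instead.
    assert(f(5L) == Chosen::Long);
}
```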

	* Example 5.2:

	I tried to come up with a simple example where a promotion wins over a
	standard conversion due to promotions being "better" than standard
	conversions.  I couldn't, because usually the subsequence rule causes
	the promotion to win (see example 5.4).

	   But here is a more complicated example of a promotion winning over a
	standard conversion:

		struct A { operator short(); operator int*(); };
		A a;
		void f(int);
		void f(void*);
		f(a);		// f(int) wins

	The sequences are

		lval A -u-> rval short -p-> rval int
		lval A -u-> rval int*  -s-> rval void*

	We compare these u-d sequences by comparing the std sequences following
	the u-d conversion, EVEN THOUGH those sequences start from DIFFERENT
	initial types.  One of those sequences consists of a promotion, the
	other of a standard conversion; so the promotion wins.

	* Example 5.3:

	Here is an example of the strangeness caused by null pointer
	conversions:

		void f(long);
		void f(char*);
		f(0);		// ambiguous: constant 0 int -s-> constant 0 char*
				// vs. int -s-> long

	* Example 5.4:

	For compatibility with most existing implementations, a "small"
	arithmetic type must be promoted before it can be "demoted" again using
	a standard conversion.  For example:

		void f(short);
		void f(int);
		f('c');			// ok, f(int) (perhaps surprising)

	We get to short via

		char -p-> int -s-> short

	(we assume we are on a machine where char promotes to int).  We get to
	int via

		char -p-> int

	Since the latter is a subsequence of the former, it is better.
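
	   Here is a runnable version of example 5.4.  The enum Chosen and the
	function check_example_5_4 are hypothetical names added for the test,
	and, as above, the sketch assumes a machine where char promotes to int.
	It checks only which overload is selected, not the route taken to it.

```cpp
#include <cassert>

enum class Chosen { Short, Int };

Chosen f(short) { return Chosen::Short; }
Chosen f(int)   { return Chosen::Int; }

void check_example_5_4() {
    // The promotion char -> int beats any route that ends at short.
    assert(f('c') == Chosen::Int);
    // A genuine short argument still selects f(short).
    short s = 1;
    assert(f(s) == Chosen::Short);
}
```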

	* Example 5.5:

	There is one case where a promotion may be followed by a non-numeric
	standard conversion:

		void f(char*);
		void f(int);
		f('\0');		// ok, f(int)

	Cfront 3.x and Turbo C++ prefer f(int), apparently because the
	conversion sequences are

		constant 0 char -p-> constant 0 int -s-> constant 0 char*
		constant 0 char -p-> constant 0 int
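
	   The outcome of example 5.5 can be checked with the sketch below.
	The enum Chosen and the function check_example_5_5 are hypothetical
	names.  The result should be f(int) whether or not a given compiler
	treats '\0' as a null pointer constant: if it does, the promotion alone
	is the better sequence; if it does not, f(char*) is not viable at all.

```cpp
#include <cassert>

enum class Chosen { CharPtr, Int };

Chosen f(char*) { return Chosen::CharPtr; }
Chosen f(int)   { return Chosen::Int; }

void check_example_5_5() {
    // The character constant zero selects f(int).
    assert(f('\0') == Chosen::Int);
    // An actual char array decays to char* and selects f(char*).
    char buffer[1] = {'x'};
    assert(f(buffer) == Chosen::CharPtr);
}
```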

	* Example 5.6:

	But if we change example 5.5 slightly:

		void f(char*);
		void f(short);
		f('\0');		// ambiguous

	then it's ambiguous:

		constant 0 char -p-> constant 0 int -s-> constant 0 char*
		constant 0 char -p-> constant 0 int -s-> constant 0 short

	At issue is whether the "null pointer" conversions are

		[1]  all integral and enum constant 0 -s-> T* constant 0

	or

		[2]  {int,unsigned,long,unsigned long} constant 0 -s->
		     T* constant 0

	The current [conv.ptr] says [1].  Borland and cfront implement [2], and
	that is what I have documented.

	* Example 5.7:

	User-defined conversions that overlap standard conversions, or threaten
	to, often confuse people.  But they can be handled straightforwardly
	using the existing rules.

		struct B {};
		struct D : B { operator B&(); };
		D d;
		B& r = d;		// ok, uses lval D -s-> lval B

	The user-defined conversion is not used because a standard conversion
	sequence is better than a user-defined conversion sequence.

	* Example 5.8:

	Here is another one with a user-defined conversion that is "like" a
	built-in conversion:

		struct A {};
		struct B : A {};
		struct C : A {};
		struct D : B, C { operator A&(); };
		D d;
		A& r = d;		// ok, uses d.operator A&()

	The user-defined conversion sequence is very direct:

		lval D -u-> lval A

	There is no standard conversion lval D -s-> lval A, since A is an
	ambiguous base of D.

	* Example 5.9:

	Here is an example of user-defined conversions and the built-in
	assignment operator.

		struct A { operator int&(); };
		A a;
		int i;
		i = a;	// ok, lval A -u-> lval int -l-> rval int
		a = i;	// ill-formed: no u-d conversions on lhs of assignment

	We disallow user-defined conversions on the left-hand side of assignment
	in order to make built-in assignment consistent with member assignment
	operators.
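
	   A runnable variant of example 5.9 follows.  It is a sketch under my
	own assumptions: the member target and the functions read_through and
	write_through are hypothetical, added so that the conversion function
	has something to return.  It shows the conversion being used on the
	right-hand side of assignment, and that its result is an lvalue.

```cpp
#include <cassert>

struct A {
    int* target;                          // hypothetical storage for the demo
    operator int&() { return *target; }
};

// Right-hand side: lval A -u-> lval int -l-> rval int
int read_through(A& a) {
    int i = 0;
    i = a;
    return i;
}

// The u-d conversion yields an lvalue, so a reference can bind to it.
void write_through(A& a, int value) {
    int& r = a;                           // lval A -u-> lval int
    r = value;
}

// Ill-formed, so left commented out: no u-d conversion is applied to the
// left-hand side, and A has no assignment from int.
// void bad(A& a, int i) { a = i; }

void check_example_5_9() {
    int storage = 3;
    A a = {&storage};
    assert(read_through(a) == 3);
    write_through(a, 7);
    assert(storage == 7);
}
```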

	* Example 5.10:

	This example is interesting because it involves an overloaded function
	name as the argument to an overloaded function.  Otherwise it is pretty
	straightforward.

		typedef void F1();
		typedef void F2(int);
		extern F1 f;
		extern F2 f;
		struct A { A(F1&); };
		struct B { B(F2&); };
		void g(A);
		void g(const B&);
		g(f);			// g(A)

	The user-defined conversion sequences are:

	S1:	lval F1 -u-> rval A
	S2:	lval F2 -u-> rval B -q-> rval const B -r-> lval const B

	g(A) is preferred because the (0-length) second standard conversion
	sequence in S1 is better than the second standard conversion sequence
	in S2, because the latter standard conversion sequence has a
	qualification conversion.