N698 J11/97-061

Implementation Defined Integral Types

Randy Meyers and Doug Gwyn
23 June 1997

1 Introduction

Doug Gwyn distributed a proposal (N713) via the reflector to allow implementation defined integral types to be used in the standard headers. Doug and I discussed the proposed wording changes in N713 and produced this updated version. Early versions of this paper were also distributed to Clive Feather, Frank Farance, and Douglas Walls. Clive provided particularly valuable feedback about the representation of unsigned integers and about issues raised in his paper N691.

This paper raises no issues that have not already appeared in previous proposals before the committee. This version of the proposal incorporates some ideas from N606 by Frank Farance and N669 by Clive Feather.

2 Overview of Proposal

Implementation defined integral types are incorporated into the Standard by allowing implementations to add additional types to the set of "signed integer types." By existing wording in the Standard, the implementation must supply the corresponding unsigned integer types.

By definition, the implementation defined signed and unsigned integer types are integral types, basic types, scalar types, and arithmetic types. All of the statements made in the Standard about those type classes automatically apply to the implementation defined integer types. The same wording in the Standard that defines the properties of the standard integer types defines the properties of the implementation defined integer types as well. For convenience, the terms "extended signed integer types", "extended unsigned integer types", and "extended integer types" are defined.

The term "precision" is defined to solve an existing problem: the Standard confuses the "size" of an integer type with its ability to represent values. Two integer types of the same size might have different amounts of padding, and thus be unable to represent the same values.
The integral promotions and usual arithmetic conversions have been made less implementation defined than in Doug's original proposal. The new usual arithmetic conversions have the following properties:

1. The results for the standard types do not change.
2. When a standard type and an implementation defined type meet, if the signed or unsigned version of the standard type can represent the values of the implementation defined type, then the result is the (signed or unsigned) standard type.
3. The new rules are a generalization of the old rules, and retain their spirit.
4. The new rules behave like the old rules even for unusual implementations that use the same representation for all the standard types, or have unsigned types that are just the signed types with the sign bit ignored, or have unsigned representations that are much "bigger" than their signed counterparts.

This paper actually contains two equivalent alternative wordings for the integral promotions and usual arithmetic conversions, for the committee to choose between. Sections 4.1, 5.1, and 6.1 of this paper make up the first alternative wording. The second alternative consists of either Section 4.1 or 4.2, plus Sections 5.2 and 6.2.

The paper contains some optional sections on constants (Section 7), uniqueness of types (Section 8), preprocessor arithmetic (Section 9), and the grammar (Section 10). These sections may be voted in or out without hurting the integrity of the proposal.

Note: Text surrounded by *asterisks* should be italicized, while text surrounded by {braces} should be set in Courier font.

3 Allow Implementation Defined Integral Types

Replace Section 6.1.2.5 (Types), paragraph 3:

There are five *signed integer types*, designated as {signed char}, {short int}, {int}, {long int}, and {long long int}. (The signed integer and other types may be designated in several additional ways, as described in 6.5.2.)
with:

There are five *standard signed integer types*, designated as {signed char}, {short int}, {int}, {long int}, and {long long int}. (These and other types may be designated in several additional ways, as described in 6.5.2.) There may also be implementation-defined *extended signed integer types*. [reference first new footnote] The standard and extended signed integer types are collectively called just *signed integer types*. [reference second new footnote]

Add first new footnote:

Implementation defined keywords must have the form of an identifier reserved for any use as described in 7.1.3.

Add second new footnote:

Therefore, any statement in this Standard about the signed integer types also applies to the extended signed integer types.

After the following in Section 6.1.2.5 (Types), paragraph 5:

For each of the signed integer types, there is a corresponding (but different) *unsigned integer type* (designated with the keyword {unsigned}) that uses the same amount of storage (including sign information) and has the same alignment requirements.

add:

The unsigned integer types that correspond to the standard signed integer types are the *standard unsigned integer types*. The unsigned integer types that correspond to the extended signed integer types are the *extended unsigned integer types*. The extended signed integer types and extended unsigned integer types are collectively called the *extended integer types*.

4 Define Precision For Integer Types

Existing wording in the Standard refers to the "size" of integer types in a problematic fashion. For example, Section 6.2.1.2 (Signed and unsigned integers), paragraph 2, defines integer conversions as follows:

When a signed integer is converted to an unsigned integer with equal or greater size, if the value of the signed integer is nonnegative, its value is unchanged.
If integers are allowed to have padding (bits in their representation that do not participate in the value stored in the integer), then the above wording fails to consider the case of two integers that are the same size but use a different number of bits to store the value.

Frank Farance suggested in N606 that a new term, precision, be defined for integer types. This proposal contains two alternative definitions from which the committee can choose.

4.1 Precision Definition 1

This definition of "precision" special-cases the definition for the unsigned types in order to make the first of the alternative wordings below for the integral promotions and usual arithmetic conversions work (this definition also works for the second alternative for the promotions and conversions).

After the following in Section 6.1.2.5 (Types), paragraph 16:

The representations of integral types shall define values by use of a pure binary numeration system.25

Add:

The *precision* of a signed integer type is the number of bits it uses to represent values excluding the sign bit and any padding. The precision of an unsigned integer type is considered to be the same as that of the corresponding signed integer type, although the number of bits used to represent values may be greater. The precision of an enumerated type is the precision of the compatible integral type. Regardless of its representation, the precision of {char} is considered to be the precision of {signed char} and {unsigned char}.

4.2 Precision Definition 2

This definition of precision contains no special cases. It only works with the second alternative wording for the integral promotions and usual arithmetic conversions.

After the following in Section 6.1.2.5 (Types), paragraph 16:

The representations of integral types shall define values by use of a pure binary numeration system.25

Add:

The *precision* of an integral type is the number of bits it uses to represent values excluding the sign bit (if any) and any padding.
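To make the distinction between size and precision concrete, the following sketch computes precision in the sense of Definition 2 by counting the value bits in each type's maximum value from <limits.h>, and compares it with the storage size {sizeof(T) * CHAR_BIT}. The helper names (value_bits, precision_int, STORAGE_BITS, and so on) are hypothetical and illustrative only; they are not part of the proposed wording. Because a pure binary representation makes every maximum value 2^N - 1, counting the bits of the maximum excludes the sign bit and any padding automatically.

```c
#include <limits.h>

/* Hypothetical helper: number of value bits in a pure binary type whose
   largest value is max.  Since max is always 2^N - 1, this returns N. */
int value_bits(unsigned long long max)
{
    int bits = 0;
    while (max != 0) {
        max >>= 1;   /* drop one value bit per iteration */
        bits++;
    }
    return bits;
}

/* Precision in the sense of Definition 2: the sign bit and any padding
   bits are excluded because they do not contribute to the maximum value. */
int precision_int(void)
{
    return value_bits((unsigned long long)INT_MAX);
}

int precision_unsigned_int(void)
{
    return value_bits(UINT_MAX);
}

int precision_long_long(void)
{
    return value_bits((unsigned long long)LLONG_MAX);
}

int precision_unsigned_long_long(void)
{
    return value_bits(ULLONG_MAX);
}

/* Storage size in bits: includes the sign bit and any padding bits. */
#define STORAGE_BITS(T) ((int)(sizeof(T) * CHAR_BIT))
```

On a typical implementation with a 32-bit {int} and no padding, precision_int() returns 31, precision_unsigned_int() returns 32, and STORAGE_BITS(int) is 32. Under Definition 1, {unsigned int} would instead be considered to have the precision of {int}, even though its maximum value uses one more bit; an implementation with padding bits would show a larger gap between precision and storage size.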
5 Integral Promotions

This section gives two alternative wordings for the integral promotions. The first alternative is based exclusively on precision. The second alternative is based on a new concept called the integral conversion rank of types. This ranking, once defined, allows the promotions and conversions to be expressed more succinctly.

5.1 Integral Promotions Alternative 1

Change Section 6.2.1.1 (Characters and integers), paragraph 1:

A {char}, a {short int}, or an {int} bit-field, or their signed or unsigned varieties, or an enumeration type, may be used in an expression wherever an {int} or {unsigned int} may be used. If an {int} can represent all values of the original type, the value is converted to an {int}; otherwise, it is converted to an {unsigned int}. These are called the *integral promotions*.37 All other arithmetic types are unchanged by the integral promotions.

to:

The following may be used in an expression wherever an {int} or {unsigned int} may be used:

-- An integral type whose precision is less than or equal to the precision of {int} and {unsigned int}
-- A bit-field of type {int}, {signed int}, or {unsigned int}

If an {int} can represent all values of the original type, the value is converted to an {int}; otherwise, it is converted to an {unsigned int}. These are called the *integral promotions*.37 All other types are unchanged by the integral promotions.

Note that Section 6.1.2.5 paragraph 16 defines integral types as char, the signed and unsigned integer types, and the enumerated types.

5.2 Integral Promotions Alternative 2

Replace Section 6.2.1.1 (Characters and integers), paragraph 1:

A {char}, a {short int}, or an {int} bit-field, or their signed or unsigned varieties, or an enumeration type, may be used in an expression wherever an {int} or {unsigned int} may be used. If an {int} can represent all values of the original type, the value is converted to an {int}; otherwise, it is converted to an {unsigned int}.
These are called the *integral promotions*.37 All other arithmetic types are unchanged by the integral promotions.

with the following paragraphs:

Every integral type has an *integral conversion rank* defined as follows:

-- No two signed integer types shall have the same rank, even if they have the same representation.
-- The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
-- The rank of any standard signed integer type shall be greater than the rank of any extended signed integer type with the same precision.
-- The rank of {long long int} shall be greater than the rank of {long int}, which shall be greater than the rank of {int}, which shall be greater than the rank of {short int}, which shall be greater than the rank of {signed char}.
-- The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
-- The rank of {char} shall equal the rank of {signed char} and {unsigned char}.
-- The rank of any enumerated type shall equal the rank of the compatible integer type.
-- The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation-defined, but still subject to the other rules for determining the integral conversion rank.
-- For all integral types *T1*, *T2*, and *T3*, if *T1* has greater rank than *T2* and *T2* has greater rank than *T3*, then *T1* has greater rank than *T3*.

The following may be used in an expression wherever an {int} or {unsigned int} may be used:

-- An object or expression with an integral type whose integral conversion rank is less than the rank of {int} and {unsigned int}.
-- A bit-field of type {int}, {signed int}, or {unsigned int}.

If an {int} can represent all values of the original type, the value is converted to an {int}; otherwise, it is converted to an {unsigned int}.
These are called the *integral promotions*.37 All other types are unchanged by the integral promotions.

Note that Section 6.1.2.5 paragraph 16 defines integral types as char, the signed and unsigned integer types, and the enumerated types.

6 Usual Arithmetic Conversions

This section gives two alternative wordings for the usual arithmetic conversions. The first is based on precision. The second is based on integral conversion rank.

6.1 Usual Arithmetic Conversions Alternative 1

Starting with the following text in Section 6.2.1.7 (Usual arithmetic conversions), paragraph 1:

Otherwise, the integral promotions are performed on both operands. Then the following rules are applied:

delete to the end of paragraph 1 and replace with:

Otherwise, the integral promotions are performed on both operands. Then the following rules are applied to the promoted operands:

If the operands have different precisions, the operand with less precision is converted to the type of the other operand.

Otherwise, the operands have the same precision:

If either operand has type {long long int} or {unsigned long long int}, then both operands are converted to {unsigned long long int} if either operand has an unsigned integer type. Otherwise, both operands are converted to {long long int}.

Otherwise, if one operand has type {long int} or {unsigned long int}, then both operands are converted to {unsigned long int} if either operand has an unsigned integer type. Otherwise, both operands are converted to {long int}.

Otherwise, if one operand has type {int} or {unsigned int}, then both operands are converted to {unsigned int} if either operand has an unsigned integer type. Otherwise, both operands are converted to {int}.

Otherwise, if both operands have the same type, then no further conversion is needed.
Otherwise, if one operand has signed integer type and the other operand has the corresponding unsigned integer type, then the operand with the signed integer type is converted to the type of the operand that has unsigned integer type.

Otherwise, both operands have extended integer types with the same precision. There shall be an implementation defined ranking of all extended signed integer types that have the same precision. No two extended signed integer types shall have the same rank, even if they have the same representation. The unsigned integer type that corresponds to an extended signed integer type shall have the same rank as that signed integer type. Then, if either operand has an unsigned integer type, both operands are converted to the unsigned integer type that is or corresponds to the operand type with greater rank. Otherwise, the operand with the type of lesser rank is converted to the type of the operand whose type has greater rank.

Take care in reading the above. Remember, the case where the types have different precisions is handled before all of the conditional clauses. This removes the need, present in the Standard's current wording, to discuss what happens when long long, long, and/or int have the same versus different "sizes".

6.2 Usual Arithmetic Conversions Alternative 2

Starting with the following text in Section 6.2.1.7 (Usual arithmetic conversions), paragraph 1:

Otherwise, the integral promotions are performed on both operands. Then the following rules are applied:

delete to the end of paragraph 1 and replace with:

Otherwise, the integral promotions are performed on both operands. Then the following rules are applied to the promoted operands:

If both operands have the same type, then no further conversion is needed.

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integral conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

7 Allow "Big" Constants To Have Extended Integral Type

The wording proposed in this section is optional. The rest of the proposal is consistent if this section is not voted in.

The existing wording in the Standard permits an implementation to give a constant that is too big for {long long} or {unsigned long long} an extended integer type. No diagnostic is required. Section 6.1.3.2 (Integer constants), paragraph 5, in Semantics says:

The type of an integer constant is the first of the corresponding list in which its value can be represented. Unsuffixed decimal: {int}, {long int}, {long long int}, {int}; unsuffixed octal or hexadecimal: {int}, {unsigned int}, {long int}, {unsigned long int}, {long long int}, {unsigned long long int}; suffixed by the letter {u} or {U}: {unsigned int}, {unsigned long int}, {unsigned long long int}; suffixed by the letter {l} or {L}: {long int}, {unsigned long int}, {long long int}, {unsigned long long int}; suffixed by both the letters {u} or {U} and {l} or {L}: {unsigned long int}, {unsigned long long int}; suffixed by {ll} or {LL}: {long long int}, {unsigned long long int}; suffixed by both {u} or {U} and {ll} or {LL}: {unsigned long long int}.

Section 6.1.3 (Constants), paragraph 2, is the only constraint:

The value of a constant shall be in the range of representable values for its type.
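The rule quoted above, that a constant takes the first type in its list that can represent its value, can be observed directly. The following sketch uses a hypothetical TYPE_NAME macro built on C11 {_Generic} (a feature that postdates this paper) to report the type a compiler assigned to a constant, and a hypothetical is_signed_name helper. It assumes a compiler following the C99 lists, which agree with the Draft 10 lists quoted above for the cases shown (unsuffixed decimal and hexadecimal constants, and constants with a {u} or {LL} suffix).

```c
#include <string.h>

/* Hypothetical helper: name the type the compiler assigned to an expression. */
#define TYPE_NAME(x) _Generic((x),                       \
    int: "int",                                           \
    unsigned int: "unsigned int",                         \
    long: "long int",                                     \
    unsigned long: "unsigned long int",                   \
    long long: "long long int",                           \
    unsigned long long: "unsigned long long int",         \
    default: "other")

/* Hypothetical helper: nonzero if name is one of the signed types above. */
int is_signed_name(const char *name)
{
    return strcmp(name, "int") == 0
        || strcmp(name, "long int") == 0
        || strcmp(name, "long long int") == 0;
}

/* Expected results, assuming a 32-bit int:
   TYPE_NAME(2147483647)  "int"           first entry of the decimal list fits
   TYPE_NAME(0x80000000)  "unsigned int"  hex list tries unsigned int before long int
   TYPE_NAME(2147483648)  "long int" or "long long int"
                                          the decimal list has no unsigned entry
   TYPE_NAME(100u)        "unsigned int"  the u suffix starts the list at unsigned int */
```

The contrast between 0x80000000 and 2147483648 shows why the proposed wording distinguishes lists that contain only signed types, only unsigned types, or both: the same value may receive a signed or an unsigned type depending on how it is written.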
If the constant is too big for the types in its list, then the program violates a semantics rule and is not strictly conforming. An implementation is allowed to extend the language to give meaning to any program that is not strictly conforming. In this case, the extension is to give the constant an extended integer type. As long as the extended integer type can represent the value of the constant, the constraint is not violated, and no diagnostic is required.

The Standard would benefit if it provided more direction to implementations about which extended integer types are appropriate for the different forms of constants. At the end of Section 6.1.3.2 (Integer constants), paragraph 5, add:

If an integer constant cannot be represented by any type in its list, it may have an extended integer type, if the extended integer type can represent its value. If all of the types in the list for the constant are signed, the extended integer type shall be signed. If all of the types in the list for the constant are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned.

Note: Draft 10 erroneously has an extra {int} at the end of the list for unsuffixed decimal constants. Also, most people who have reviewed the list object to the fact that decimal constants suffixed by L or LL are allowed to be unsigned. Perhaps the committee voted in undesirable wording.

8 Uniqueness of types

The wording proposed in this section is optional. The rest of the proposal is consistent if this section is not voted in.

Section 6.1.2.5 (Types), paragraph 10 says:

The type {char}, the signed and unsigned integer types, and the floating types are collectively called the *basic types*. Even if the implementation defines two or more basic types to have the same representation, they are nevertheless different types.

Microsoft has keywords that are synonyms for standard types.
For example, __int16 is a synonym for short int and unsigned __int16 is a synonym for unsigned short int. Such synonyms are not distinct types, merely funny names for existing types, similar in some ways to a typedef, and so the above paragraph does not apply to them. A footnote would clarify this.

Add a new footnote to the end of Section 6.1.2.5 (Types), paragraph 10:

An implementation may define new keywords that provide alternative ways to designate a basic (or any other) type. An alternative way to designate a basic type does not violate the requirement that all basic types be different. Implementation defined keywords must have the form of an identifier reserved for any use as described in 7.1.3.

9 Preprocessor arithmetic

The wording proposed in this section is optional. The rest of the proposal is consistent if this section is not voted in.

It seems wise to require preprocessing arithmetic to be performed in the largest integral type that the implementation supports. Replace the following sentences from Section 6.8.1 (Conditional inclusion), paragraph 4:

The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of 6.4 using arithmetic that has at least the ranges specified in 5.2.4.2, except that {int} and {long}, and {unsigned int} and {unsigned long}, act as if they have the same representation as, respectively, {long long} and {unsigned long long}.

with:

The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of 6.4 using arithmetic that has at least the ranges specified in 5.2.4.2, except that the signed integer types and the unsigned integer types act as if they have the same representation as, respectively, {intmax_t} and {uintmax_t} defined in the <inttypes.h> header.
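The effect of the replacement wording above can be checked with a short sketch, assuming a compiler whose #if arithmetic follows this rule (C99 and later compilers do, taking {intmax_t} and {uintmax_t} from <stdint.h> rather than <inttypes.h>). In a #if expression the constant 0xFFFFFFFF behaves as an {intmax_t}, at least 64 bits wide, so adding 1 does not wrap. In ordinary code on an implementation with a 32-bit {unsigned int}, the same constant has type {unsigned int} and the sum wraps to 0. The macro PP_SUM_EXCEEDS_MAX and the function runtime_sum_exceeds_max are hypothetical names used only for this illustration.

```c
/* Evaluated by the preprocessor: every integer type acts like intmax_t or
   uintmax_t, so 0xFFFFFFFF + 1 is 4294967296 and the comparison is true. */
#if 0xFFFFFFFF + 1 > 0xFFFFFFFF
#define PP_SUM_EXCEEDS_MAX 1
#else
#define PP_SUM_EXCEEDS_MAX 0
#endif

/* Hypothetical helper: the same expression evaluated at run time.  With a
   32-bit unsigned int, 0xFFFFFFFF has type unsigned int, the sum wraps to
   0, and the function returns 0. */
int runtime_sum_exceeds_max(void)
{
    return 0xFFFFFFFF + 1 > 0xFFFFFFFF;
}
```

The two results differ only because the preprocessor evaluates the expression in the widest available types, which is the behavior the replacement wording requires.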
Add Forward reference: Largest integral types (7.4.3)

Note: the above forward reference may have to be adjusted to reflect the rewrite of the section on <inttypes.h>.

10 Syntax for Declarations

The wording proposed in this section is optional. The rest of the proposal is consistent if this section is not voted in.

The Standard requires that a violation of a syntax rule cause an implementation to issue a diagnostic. This section proposes extending the grammar to permit implementation defined keywords to be type specifiers. This change is only needed if the committee wishes to remove the requirement that an implementation issue a diagnostic when user code (as opposed to headers) uses implementation defined keywords as type specifiers.

Note that the standard headers are not files (Section 7.1.2 footnote 112), and the committee has always held that the headers may be implemented as a binary representation of the specified contents of the header. Issues of syntax do not really apply to headers, and so implementations are free to use extended syntax in the standard headers without issuing a diagnostic (an implementation may use a pragma to suppress such diagnostics while in the header). Thus, implementations may use extended integer types in their headers without this proposed change to the grammar. Gwyn, Meyers, and Feather do not feel the wording change in this section is necessary.

After the following line in Section 6.5.2 (Type specifiers), paragraph 1:

{long}

add new line:

*extended-signed-integer-type*

Add new Syntax rule:

*extended-signed-integer-type*:
*identifier*

Corresponding changes should be made in Section B.2.2, page 352.
After the following in Section 6.5.2 (Type specifiers), paragraph 2:

-- {unsigned long long}, or {unsigned long long int}

add two new list items:

-- an identifier reserved for any use in 7.1.3 that designates an implementation-defined extended signed integer type, or the same identifier preceded by {signed}
-- the same identifier preceded by {unsigned}

Add the following Forward reference: reserved identifiers (7.1.3)

11 Index Entries

New entries in the index should be made for the following terms:

1. standard signed integer types
2. extended signed integer types
3. standard unsigned integer types
4. extended unsigned integer types
5. extended integer types
6. precision
7. integral conversion rank (if the corresponding change is made)