ISO/IEC JTC1/SC22/WG14 N739

General wording issues (clauses 1 to 6)
First Revision
Clive D.W. Feather

Abstract
========

This document is an attempt to identify all the minor issues I can find in
clauses 1 to 6 of the Standard.

This revision is an update to use draft 10 pre 1 as the starting point. Where
issues are still open or are undiscussed, I have added material and the
original wording.

=======================================================================

Item 1:

The term "access" is not well defined. From context, it sometimes appears
to mean "read the value", and sometimes "read or write the value". This
ambiguity sometimes makes it hard to understand what is actually meant.

There needs to be a definition in clause 3, and all uses of the term need
to be checked for the read-only / read-write problem. Probably the best
approach is to define it as "read or write", and to find and fix the places
where "read" is meant.

An example of the "read" usage is 6.3.2.3 paragraph 5:

    With one exception, if a member of a union object is accessed after
    a value has been stored in a different member of the object, the
    behaviour is implementation-defined.

where writing is clearly meant to be excluded. An example of the "read or
write" usage is 6.3 paragraph 6:

    ... If a value is stored into an object ... the type of the lvalue
    becomes the effective type of the object for that access and for
    subsequent accesses ...

where writing is clearly meant to be included.

An example where this causes problems with interpreting the Standard is
6.5.3. Paragraph 11 reads:

    A reference to a value means either an access to or a modification
    of the value.

So "access" presumably means read, but not write. But then paragraph 6
reads:

    What constitutes an access to an object that has volatile-qualified
    type is implementation-defined.

So what constitutes a write to a volatile object is *not*
implementation-defined?

There are other instances; this is the first one that comes to mind.
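
As an illustration only (it is not part of the proposed wording), the three kinds of reference to a volatile object that the definition of "access" needs to cover can be sketched in C as follows. The object status_reg and the function names are hypothetical, chosen for this sketch.

```c
/* Hypothetical volatile object standing in for a device register. */
static volatile unsigned int status_reg = 0u;

/* A read: the stored value is fetched and nothing is stored. */
unsigned int read_status(void)
{
    return status_reg;
}

/* A write: a new value is stored and the old value is not needed. */
void write_status(unsigned int value)
{
    status_reg = value;
}

/* A read followed by a write of the same object (read-modify-write). */
void set_flag(unsigned int mask)
{
    status_reg |= mask;
}
```

If "access" means only "read", then only read_status and the read half of set_flag fall under the volatile rule quoted above; if it means "read or write", all three functions do.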

====

Item 2:

Change the first part of paragraph 1 of subclause 5.1.2.2.1 to:

    The function called at program startup is named /main/. The
    implementation declares no prototype for this function. It shall be
    defined either with no parameters:

    ...

        int main (int argc, char *argv[]) { /* ... */ }

    or equivalent [*], or in some other implementation-defined manner.

    [*] Thus /int/ can be replaced by a typedef-name defined as /int/,
    or the type of argv can be written as /char **argv/, and so on.

This will make it clear that, while these are the only permitted strictly
conforming alternatives, extensions are allowed but must be documented.

====

Item 3:

Examples 2 and 6 in subclause 5.1.2.3 need rewording. At present they use
the term "exception" to mean something like a visible overflow trap,
whereas 6.3 makes it clear that an "exception" occurs on overflow even when
the result is silently wrapped.

In example 2, change:

    Provided the addition of two /chars/ can be done without creating an
    overflow exception, ...

to:

    Provided the addition of two /chars/ can be done without overflow,
    or with overflow wrapping silently to produce the correct result, ...

In example 6, change:

    On a machine in which overflows produce an exception ...

to:

    On a machine in which overflows produce an explicit trap ...

and change:

    However on a machine in which overflows do not produce an exception
    and in which the results of overflows are reversible,

to:

    However, on a machine in which overflow silently generates some
    value and where positive and negative overflows cancel,

====

Item 4:

In 5.2.1 paragraph 2, delete the final "literal". The zero character
terminates strings, but does not occur in a string literal (which is a
syntactic construct). Add a forward reference to "string" in 7.1.1.

====

Item 5:

Subclause 6.1.2 treats the term "identifier" as representing the sequence
of characters. On the other hand, subclause 6.1.2.1 treats the term as
representing that sequence within a given scope.
Thus in:

    {
        int fred;       /* fred-1 */
        {
            int fred;   /* fred-2 */
        }
    }

6.1.2 paragraph 8 treats fred-1 and fred-2 as being the same identifier,
while 6.1.2.1 treats them as different.

In 6.1.2 paragraph 4, change:

    An identifier denotes an object ... or a macro parameter.

to:

    An identifier can denote an object ... or a macro parameter. The
    same identifier can denote different entities at different points
    in the program.

In 6.1.2.1 paragraph 1, change:

    An identifier is /visible/ (i.e. can be used) only within a region
    of program text called its scope.

to:

    For each different entity that an identifier designates, the
    identifier is /visible/ (i.e. can be used) only within a region of
    program text called its scope. Different entities designated by the
    same identifier either have non-overlapping scopes, or are in
    different name spaces.

In paragraph 3, change:

    If an outer declaration of a lexically identical identifier exists
    in the same name space, it is hidden until the current scope
    terminates, after which it again becomes visible.

to:

    If an identifier designates two different entities in the same name
    space, the scopes might overlap. If so, the scope of one entity (the
    /inner scope/) will be a strict subset of the scope of the other
    entity (the /outer scope/). Within the inner scope, the identifier
    designates the entity declared in the inner scope; the entity
    declared in the outer scope is /hidden/ (and not visible) within the
    inner scope.

Insert a new paragraph between paragraphs 3 and 4:

    Each occurrence of an identifier designates the entity in the
    relevant name space whose declaration is visible at the point that
    the identifier occurs. Unless explicitly stated otherwise, where
    this International Standard uses the term "identifier" to refer to
    some entity (as opposed to the syntactic construct), it is that
    entity that is referred to.

In 7.1.3 paragraph 2, change:

    If the program declares or defines an identifier with the same name
    as an identifier reserved in that context ...

to:

    If the program declares or defines an identifier that is reserved in
    that context ...

====

Item 6a:

In 6.1.2.5, append to paragraph 11:

    The implementation shall define /char/ to have the same range,
    representation, and behaviour as one of /signed char/ and
    /unsigned char/. [*]

    [*] CHAR_MIN, defined in <limits.h>, will have one of the values 0
    or SCHAR_MIN, and this can be used to distinguish the two options.
    Irrespective of the choice made, /char/ is a separate type from the
    other two, and is not compatible with either.

This clarifies that there are only two differently-behaving types, not
three.

====

Item 6b:

In 6.1.2.5, change the last sentence of paragraph 2 from:

    If other quantities are stored in a /char/ object, the behaviour is
    undefined; the values are treated as either signed or nonnegative
    integers.

to:

    If any other character is stored in a /char/ object, the resulting
    value is implementation-defined but shall be within the range of
    values that can be represented in that type.

====

Item 7:

The rules for composite type handle an incomplete array meeting a complete
one, but not the equivalent situation with an incomplete structure or
union.

Replace subclause 6.1.2.6 paragraph 3, first bullet point, with:

    - If one type is complete and the other type is incomplete, the
      composite type is a complete type.

====

Item 8:

Add the following to the end of subclause 6.2.2.3:

    An integer may be converted to any pointer type. The result is
    implementation-defined, and might not be a pointer to an object of
    that type. [59]

    Any pointer type may be converted to an integral type; the result is
    implementation-defined, and need not be in the range of values of
    any integral type. If the resulting value cannot be represented in
    the destination type, the behaviour is undefined. [*]

    [*] Thus if the conversion is to /unsigned int/ but yields a
    negative value, the behaviour is undefined.

    A pointer to a complete or incomplete object type may be converted
    to a pointer to a different complete or incomplete object type. If
    the resulting pointer is not correctly aligned for the pointed to
    type, the behaviour is undefined. Otherwise, when converted back
    again, the result shall compare equal to the original pointer. [*]

    [*] All pointers to character types are correctly aligned. In
    general, the concept "correctly aligned" is transitive: if a pointer
    to type A is correctly aligned for a pointer to type B, which in
    turn is correctly aligned for a pointer to type C, then a pointer to
    type A is correctly aligned for a pointer to type C.

    A pointer to a function ... [this paragraph, taken from 6.3.4,
    remains unchanged].

Delete 6.3.4 paragraph 4, and add the following paragraph to the
constraints (after paragraph 2):

    Conversions that involve pointers, other than where permitted by the
    constraints of 6.3.16.1, shall be specified by means of an explicit
    cast.

====

Item 9a:

The following code is technically illegal:

    union { int i; float f; } u;
    u.f = 1.0;
    u.i = 42;
    printf ("%d", u.i);

In 6.3.2.3 paragraph 5, replace:

    With one exception, if a member of a union object is accessed after
    a value has been stored in a different member of the object, the
    behaviour is implementation-defined. [53]
    One special guarantee is made ...

with:

|   With one exception, if the value of a member of a union object is
|   used when the most recent store to the object was to a different
|   member, the behaviour is implementation-defined. [53]
    One special guarantee is made ...

This item ignores the issues of what implementation-defined means; item 9b
deals with that part.

====

Item 9b:

If a union is read from a member other than the one last stored into, the
result is currently implementation-defined. Because the result might cause
a trap of some kind (e.g. invalid pointer), it should be undefined
behaviour in most circumstances; the wording should broadly follow 6.3 on
this matter.
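
As an illustration only, the following C sketch shows the two cases that the wording has to distinguish: reading back the member that was last stored, and reading the same object through a member of character type. The union tag float_bytes and the function names are hypothetical, and the byte values seen by sum_bytes_of depend on the implementation's representation of /float/.

```c
#include <stddef.h>

/* Hypothetical union overlaying a float with its object representation. */
union float_bytes {
    float f;
    unsigned char bytes[sizeof(float)];
};

/* Reads the member that was most recently stored: always well defined. */
float round_trip(float value)
{
    union float_bytes u;
    u.f = value;
    return u.f;
}

/* Stores into f, then uses the value of the character-array member.
   The individual byte values are representation-dependent. */
unsigned int sum_bytes_of(float value)
{
    union float_bytes u;
    unsigned int sum = 0u;
    size_t i;

    u.f = value;
    for (i = 0; i < sizeof u.bytes; i++) {
        sum += u.bytes[i];
    }
    return sum;
}
```

Reading a member of pointer or floating type after storing a different member is the case where a trap is plausible, which is why it is not exercised here.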

In 6.3.2.3, replace paragraph 5 (either the original or the replacement
from item 9a) with:

|   With two exceptions, if the value of a member of a union object is
|   used when the most recent store to the object was to a member whose
|   type does not have the same alignment and representation, the
|   behaviour is undefined. If either member has character type or is an
|   array of character type, the behaviour is implementation-defined. [53]
|   Furthermore, a special guarantee is made ...

====

Item 10:

Replace subclause 6.5.2 paragraph 4 by:

    Each of the comma-separated sets designates the same type, except
    that for bit-fields, it is implementation-defined whether the
    specifier /int/ is the same type as /signed int/ or is the same type
    as /unsigned int/.

Replace subclause 6.5.2.1 paragraph 8 by:

    A bit-field shall have a type that is a qualified or unqualified
    version of /signed int/ or /unsigned int/. A bit field is
    interpreted as a signed or unsigned integral type consisting of the
    specified number of bits. [*]

    [*] As specified in 6.5.2 above, if the actual type specifier used
    is /int/ or there is no type specifier, or is a typedef-name defined
    using either of these, then it is implementation-defined whether the
    bit-field is signed or unsigned.

This eliminates the duplicate wording in these two places, and also makes
it clear that there is not a potential third signedness of bitfield.

If my proposals for representation of types are accepted, there may need
to be further wording adjustments in the second alteration.

====

Item 11:

In subclause 6.5.2.1, change paragraph 3 from:

    The expression that specifies the width of a bit-field shall be an
    integral constant expression that has nonnegative value that shall
    not exceed the number of bits in an ordinary object of compatible
    type. If the value is zero, the declaration shall have no
    declarator.

to:

    The expression that specifies the width of a bit-field shall be an
    integral constant expression that has nonnegative value that
|   shall not exceed the number of bits in an object of the type
|   that would be specified if the colon and expression had been
|   omitted. If the value is zero, the declaration shall have no
    declarator.

The current wording doesn't say *what* the type is compatible with.

====

Item 12:

Subclause 6.5.2.2 allows an enumerated type (say /enum e/) to be compatible
with /long/ or even /unsigned long long/. On the other hand, subclause
6.2.1.1 states that the type converts to /int/ or /unsigned int/ as part of
the integral promotions. This produces the apparent contradiction that two
compatible types promote differently!

There are two alternative approaches to solving this.

(A) Change subclause 6.5.2.2 paragraph 4 from:

    Each enumerated type shall be compatible with an integer type. The
    choice of type is implementation-defined, but shall be capable of
    representing the values of all the members of the enumeration.

to:

|   Each enumerated type shall be compatible with one of the following
|   types:
|       signed char     unsigned char
|       signed short    unsigned short
|       signed int      unsigned int
    The choice of type is implementation-defined, but shall be capable
    of representing the values of all the members of the enumeration.

(B) Change subclause 6.2.1.1 paragraph 1 from:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
    unsigned versions, or an enumeration type, may be used in an
    expression wherever an /int/ or /unsigned int/ may be used. If an
    /int/ can represent all values of the original type, the value is
    converted to an /int/; otherwise, it is converted to an /unsigned
    int/. These are called the /integral promotions/.[37] All other
    arithmetic types are unchanged by the integral promotions.

to:

    A /char/, a /short int/, or an /int/ bit-field, or their signed or
|   unsigned versions, may be used in an expression wherever an /int/ or
    /unsigned int/ may be used. If an /int/ can represent all values of
    the original type, the value is converted to an /int/;
    otherwise, it is converted to an /unsigned
|   int/. These are called the /integral promotions/.[37]
|   An enumeration type may be used in an expression wherever the type
|   that it is compatible with may be used. The integral promotions
|   cause the value to be converted in the same way as that compatible
|   type would be.
    All other arithmetic types are unchanged by the integral promotions.

and in subclause 6.5.2.2, change the first sentence of paragraph 4 from:

    Each enumerated type shall be compatible with an integer type.

to:

|   Each enumerated type shall be compatible with some signed or
|   unsigned integral type.

[At present, enumerated types *are* integer types; the intent is to make
them clearly compatible with one of the 10 types named in 6.1.2.5.]

====

Item 13:

Change 6.5.7 paragraph 12 from:

    ... the first member of a union. ...

to:

    ... the first named member of a union. ...

[This isn't strictly necessary, but makes things clearer.]

====

Item 14:

Now that implicit int has been removed from the Standard, there is no
longer a good rationale for allowing functions with an object return type
to execute a return statement without an expression.

Change subclause 6.6.6.4 as follows. Append to the Constraints:

    A /return/ statement without an expression shall only appear in a
    function whose return type is /void/.

Change paragraph 2, last sentence, from:

    A function may have any number of /return/ statements, with and
    without expressions.

to:

    A function may have any number of /return/ statements.
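
As an illustration only, the effect of the new constraint on user code can be sketched in C as follows. The function names are hypothetical, and the rejected form is shown only in a comment because it would violate the proposed constraint.

```c
/*
 * Form that the proposed constraint would reject (comment only):
 *
 *     int find_index(const int *values, int count, int key)
 *     {
 *         if (count <= 0)
 *             return;          (no expression in a non-void function)
 *         ...
 *     }
 */

/* Conforming form: every return statement carries an expression. */
int find_index(const int *values, int count, int key)
{
    int i;

    for (i = 0; i < count; i++) {
        if (values[i] == key) {
            return i;
        }
    }
    return -1;  /* explicit "not found" value instead of a bare return */
}

/* A return without an expression remains valid in a void function. */
void clear_all(int *values, int count)
{
    int i;

    if (count <= 0) {
        return;
    }
    for (i = 0; i < count; i++) {
        values[i] = 0;
    }
}
```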

There are two alternative approaches to the remainder of the changes (the
above changes are to be made in either case):

(A) Change 6.6.6.4 paragraph 4 from:

    If a /return/ statement without an expression is executed, and the
    value of the function call is used by the caller, the behaviour is
    undefined. Reaching the } that terminates a function is equivalent
    to executing a /return/ statement without an expression.

to:

    If the } that terminates a function is reached, and the value of the
    function call is used by the caller, the behaviour is undefined.

and change the last sentence of subclause 5.1.2.2.3 from:

    If the /main/ function executes a return that specifies no value,
    the termination status returned to the host environment is
    undefined.

to:

    If the /main/ function executes a return that specifies no value,
|   the termination status returned to the host environment is unspecified.

[The concept of undefined value is carefully avoided elsewhere.]

(B) Delete 6.6.6.4 paragraph 4 entirely, insert the following Constraint in
6.6.6.4 at the end:

    In a function whose return type is not /void/, the last statement
    before the terminating } shall have one of the following forms:
    - a /return/ statement with an expression;
    - a /goto/ statement;
    - a block in which the last statement before the terminating } is,
      recursively, one of these forms;
    - an /if/ statement with an /else/, in which each substatement is,
      recursively, one of these forms;
    - a /switch/ statement which is not the smallest enclosing /switch/
      or iteration statement of a /break/ statement, and in which the
      switch body is, recursively, one of these forms;
    - an iteration statement which is not the smallest enclosing
      /switch/ or iteration statement of a /break/ statement, and in
      which the controlling expression (/expression-2/ for a /for/
      statement) is, or is replaced by, a non-zero constant expression.

and delete the last sentence of subclause 5.1.2.2.3:

    If the /main/ function executes a return that specifies no value,
    the termination status returned to the host environment is
    undefined.
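
As an illustration only, the following C sketch gives one function for each of three of the forms listed in alternative (B): an /if/ with an /else/, a /switch/ that encloses no /break/, and an iteration statement with an omitted controlling expression. The function names are hypothetical, and this is my reading of the proposed constraint rather than text taken from it.

```c
/* if/else form: each substatement ends, recursively, in a return with
   an expression (the else branch is a block ending in another if/else). */
int sign_of(int x)
{
    if (x < 0) {
        return -1;
    } else {
        if (x > 0)
            return 1;
        else
            return 0;
    }
}

/* switch form: no break statement has this switch as its smallest
   enclosing switch, and the switch body is a block ending in a return. */
int days_for_kind(int kind)
{
    switch (kind) {
    case 0:
        return 28;
    case 1:
        return 30;
    default:
        return 31;
    }
}

/* Iteration form: expression-2 of the for statement is omitted, so it is
   replaced by a non-zero constant, and the loop contains no break.
   Intended for small positive n only; overflow is not guarded against. */
int power_of_two_at_least(int n)
{
    int p = 1;

    for (;;) {
        if (p >= n) {
            return p;
        }
        p *= 2;
    }
}
```

In each function the closing } of the function body cannot be reached, which is the property the constraint is meant to make checkable at translation time.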