Document Number: WG14 N825/X3J11 98-024
WG14/N825

C9X Public Comment WG14/N825
==================

Sponsoring National Body: J11
Date: 98/05/15
Author: Tom MacDonald
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%                                   %%
%%  Problems With Undefined Behavior %%
%%                                   %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

As I understand it, the intents of "undefined behavior" in the current
Draft are:

  - let a programmer know something is not portable
  - often an outright error
  - no diagnostic required
  - if implementation elects to issue a diagnostic, it has to be a
    warning and not a fatal error (i.e., program is translated into
    something)

Seems like there are some conflicting statements in the C9X Draft:

    3.18 Undefined behavior

    [#1] Behavior, upon use of a nonportable or erroneous program
    construct, of erroneous data, or of indeterminately valued
    objects, for which this International Standard imposes no
    requirements. Permissible undefined behavior ranges from
    ignoring the situation completely with unpredictable results,
    to behaving during translation or program execution in a
    documented manner characteristic of the environment (with or
    without the issuance of a diagnostic message), to terminating
    a translation or execution (with the issuance of a diagnostic
    message).

The paragraph above indicates the implementation can terminate the
translation process if undefined behavior is detected. Paragraph 3 in
3.18 contains contradictory statements:

    [#3] The implementation must successfully translate a given
    program unless a syntax error is detected, a constraint is
    violated, or it can determine that every possible execution of
    that program would result in undefined behavior.

Another problem with paragraph 3 above is that there are 8 phases of
translation.
Translation Phase 7 says:

    ... The resulting tokens are syntactically and semantically
    analyzed and translated as a translation unit.

Paragraph 3 above indicates the implementation must successfully
translate the entire program. Typically the translator only translates
through phase 7, and phase 8 creates the program image using the
output of the translator:

    8. All external object and function references are resolved.
       Library components are linked to satisfy external references
       to functions and objects not defined in the current
       translation. All such translator output is collected into a
       program image which contains information needed for
       execution in its execution environment.

So, here's a scenario:

The following include file cannot be found by the translator

    #include

and "6.1.7 Header names" says this is undefined behavior. At this
point the implementation is allowed to behave in an unpredictable way,
producing unpredictable results. Seems like one of those unpredictable
results is producing the following output:

    command not found

What does it mean to say "that every possible execution results in
undefined behavior" for such a case? It's not obvious.

What should we do?

*Warning* radical suggestion ahead!!!

Let's delete paragraph 3 above. I'm not sure it accomplishes whatever
we, as a committee, wanted it to accomplish. It also changes one of
the original motivations for undefined behavior. Originally, one of
the intents of undefined behavior was to allow an implementation to
extend C in a particular way, but not force other vendors to extend in
the same way. We always said that a vendor can just reject that
program if it's undefined behavior. Now the vendor must successfully
translate the program (assuming we fix existing wording problems).

The problem now is that a vendor cannot issue a fatal error at
translation time if undefined behavior is found.
Granted they can issue a warning, but it's easy to miss a warning when
a recompilation of a large application occurs.

The current wording places a burden on the implementors. When
customer X complains that Vendor A successfully compiled a program
containing an obvious error, the vendor is forced to explain this
decision. Customer support is expensive and vendors try to minimize
its cost. Paragraph 3 appears, from the vendor point of view, to be an
attempt to significantly increase customer support costs.

Remember, you cannot fail to translate just because the following
occur:

  - An unmatched ' or " character is encountered on a logical source
    line during tokenization (6.1).

  - A reserved keyword token is used in translation phase 7 or 8 for
    some purpose other than as a keyword (6.1.1).

  - The reserved token complex or imaginary is used before
    <complex.h> is included (6.1.1).

  - The first character of an identifier is a digit (6.1.2).

  - The same identifier has both internal and external linkage in
    the same translation unit (6.1.2.2).

  - A block containing a variably modified object having automatic
    storage duration is entered by a jump to a labeled statement
    (6.1.2.4).

  - The whole-number and fraction parts of a floating constant are
    both omitted (6.1.3.1).

  - For a function call without a function prototype, the function
    is defined without a function prototype, and the types of the
    arguments after promotion are not compatible with those of the
    parameters after promotion (6.3.2.2).

  - A pointer is converted to other than an integer or pointer type
    (6.3.4).

  - An expression is shifted by a negative number or by an amount
    greater than or equal to the width of the promoted expression
    (6.3.7).

  - An expression that is required to be an integer constant
    expression does not have an integer type, contains casts
    (outside operands to sizeof operators) other than conversions of
    arithmetic types to integer types, or has operands that are not
    integer constants, enumeration constants, character constants,
    fixed-length sizeof expressions, or immediately-cast floating
    constants (6.4).

  - A constant expression in an initializer does not evaluate to one
    of the following: an arithmetic constant expression, a null
    pointer constant, an address constant, or an address constant
    for an object type plus or minus an integer constant expression
    (6.4).

  - An arithmetic constant expression does not have arithmetic type,
    contains casts (outside operands to sizeof operators) other than
    conversions of arithmetic types to arithmetic types, or has
    operands that are not integer constants, floating constants,
    enumeration constants, character constants, or sizeof
    expressions (6.4).

  - An address constant is created neither explicitly using the
    unary & operator or an integer constant cast to pointer type,
    nor implicitly by the use of an expression of array or function
    type (6.4).

  - An identifier for an object is declared with no linkage and the
    type of the object is incomplete after its declarator, or after
    its init-declarator if it has an initializer (6.5).

  - A function is declared at block scope with an explicit
    storage-class specifier other than extern (6.5.1).

  - A structure or union is defined as containing no named members
    (6.5.2.1).

  - A bit-field is declared with a type other than a qualified or
    unqualified version of signed int or unsigned int (6.5.2.1).

  - A tag is declared with the bracketed list twice within the same
    scope (6.5.2.3).

  - etcetera ...

Many customers are not going to understand why the vendors
successfully translated their application when some obvious error
occurred. Vendors will be forced to provide non-standard ways of
getting fatal errors for obvious mistakes.
Paragraph 3 seems to be doing a disservice to both vendors and users.
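
To make the "every possible execution" wording concrete, here is a
minimal sketch based on the shift case from the list above (6.3.7).
It is hypothetical and not taken from the Draft: the names shift_left
and shift_left_checked are invented for illustration, and the width
computation assumes unsigned int has no padding bits. The first
function is undefined only for some argument values, so a translator
cannot show that every execution is undefined and, under paragraph 3,
would have to translate it. The second shows the kind of guard a
programmer would need to write instead.

```c
#include <limits.h>

/* Hypothetical example.  The shift is undefined (6.3.7) only when
   count is negative or not less than the width of unsigned int.
   Callers that pass a valid count get well-defined behavior, so
   not every execution of a program using this is undefined. */
unsigned int shift_left(unsigned int value, int count)
{
    return value << count;
}

/* Guarded variant: stores the shifted value in *result and returns 1
   when the shift is defined; returns 0 and leaves *result untouched
   otherwise.  Assumes unsigned int has no padding bits, so its width
   is sizeof(unsigned int) * CHAR_BIT. */
int shift_left_checked(unsigned int value, int count,
                       unsigned int *result)
{
    int width = (int)(sizeof(unsigned int) * CHAR_BIT);

    if (count < 0 || count >= width)
        return 0;               /* shift would be undefined */

    *result = value << count;
    return 1;
}
```

A call such as shift_left(3u, 1) is well defined and yields 6, while
shift_left(3u, -1) is the undefined case; shift_left_checked(3u, -1,
&r) simply returns 0 without performing the shift.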