WG14 Document Number: N890
Date: 09-Sept-1999

Defect report against C99 (this assumes that the current C9X FDIS is approved as C99).

What is the accuracy of decimal string to/from "binary" (non-decimal) floating-point conversions? What is the accuracy of hexadecimal string to/from "decimal" (non-power-of-2) floating-point conversions?

In the following, the phrase "decimal to binary" shall cover any pair of bases that are not both a power of the same number. It also shall cover both the string to internal floating-point and internal floating-point to string conversions.

There are three basic cases to consider:

    Translation time: decimal string to internal binary
    Run-time: decimal string to internal binary (scanf family, strtod family)
    Run-time: internal binary to decimal string (printf family)

For each of those basic cases, there are two generic sub-cases: base 10 to base 2 and base 2 to base 10.

Background:

6.4.4.2 Floating constants:

Paragraph 8 now has:

    For decimal floating constants, and also for hexadecimal floating constants when FLT_RADIX is not a power of 2, the result is either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner. For hexadecimal constants when FLT_RADIX is a power of 2, the result is correctly rounded.

What happens if the result is exactly representable? I believe that 3 distinct result values are allowed (the exact value, and the two representations adjacent to the exact value).

What happens if the same value, written in the same form, is used multiple places in the program? I believe that the same source form can be converted to different values in different places in the program.

Must decimal constants converted to a decimal radix be correctly rounded? I believe they should and I believe that this was an oversight.
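The three result values that the current wording appears to allow for an exactly representable constant can be made concrete with a small sketch. The helper name classify_one is hypothetical (it is not part of any standard), and the sketch assumes that the sum and difference below are exactly representable in double, as they are for the usual binary formats. The reference values are built from the integer 1 so that they do not themselves depend on how the constant 1.0 is converted.

```c
#include <float.h>

/* Hypothetical helper (not from the standard): report which of the
   three values permitted by the current wording of 6.4.4.2 the
   floating constant 1.0 was converted to.
   Returns  0 for the exact value,
           +1 for the adjacent value above,
           -1 for the adjacent value below,
            2 for anything else. */
int classify_one(void)
{
    /* Converting the integer 1 to double is exact, so these reference
       values do not rely on the conversion of the constant 1.0. */
    const double exact = (double)1;
    const double above = exact + DBL_EPSILON;             /* 1.0 + DBL_EPSILON */
    const double below = exact - DBL_EPSILON / FLT_RADIX; /* 1.0 - DBL_EPSILON/FLT_RADIX */

    const double c = 1.0; /* the floating constant under test */

    if (c == exact)
        return 0;
    if (c == above)
        return 1;
    if (c == below)
        return -1;
    return 2;
}
```

A caller would expect classify_one() to return 0 on any implementation that converts exactly representable constants exactly; as I read the current text, the results +1 and -1 would also be conforming.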
7.19.6.1 The fprintf function:

Paragraph 8 on "f,F" and "e,E" conversion specifiers says:

    The value is rounded to the appropriate number of digits.

Does that mean round to nearest, round by truncating, round by adding 0.5 and truncating, round as per the current rounding direction, or something else? Must the rounding used for f,F match the rounding used for e,E? Since there is no explicit allowance for multiple values (as there is in 6.4.4.2 Floating constants), must the value produced be as if the infinitely precise value were rounded (with the rounding producing an error less than or equal to 0.5 ulp for nearest and less than 1.0 ulp otherwise)?

7.19.6.2 The fscanf function:

Paragraph 10 discusses conversion. Paragraph 12 on "a,e,f,g" conversion specifiers discusses format. Neither discusses the accuracy of the decimal to binary conversion, i.e., it is not specified. What is the accuracy of floating-point string to internal representation conversions? Is it the same as translation time? Is it the same as strtod? Is it undefined behavior if the value is not exactly representable? Is it round to nearest? Is it affected by the current rounding mode, e.g., correctly rounded?

7.20.1.3 The strtod ... functions:

What is the required accuracy of strtod family functions? It appears to be either not specified or the same as 6.4.4.2. It appears to depend upon what paragraph 4 "interpreted as a floating constant according to the rules of 6.4.4.2" means. Paragraph 8 recommends that hexadecimal forms, when converted to base 10, be one of the two values adjacent to the hex source value. Unfortunately, this means that strtod("0x1.p0",(char **)NULL) would be converted into either 1.0+DBL_EPSILON or 1.0-DBL_EPSILON/FLT_RADIX, instead of the correct value 1.0 (this appears to be an oversight that forgot about exactly representable values).
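The run-time questions above can be probed on a given implementation with a sketch such as the following. The helper names hex_one_is_exact and strtod_matches_sscanf are hypothetical and are introduced only for illustration; the calls themselves (strtod, sscanf with %lf) are the standard ones. The sketch observes what one implementation does; it does not show what the standard requires.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtod converts the exactly
   representable hexadecimal subject "0x1.p0" to exactly 1, else 0.
   The reference value is built from the integer 1, whose conversion
   to double is exact. */
int hex_one_is_exact(void)
{
    double v = strtod("0x1.p0", (char **)NULL);
    return v == (double)1;
}

/* Hypothetical helper: returns 1 if strtod and sscanf("%lf") produce
   the same value for the subject string s, else 0. A subject that
   sscanf cannot convert, or one that yields a NaN, reports 0. */
int strtod_matches_sscanf(const char *s)
{
    double from_strtod = strtod(s, (char **)NULL);
    double from_sscanf = 0;

    if (sscanf(s, "%lf", &from_sscanf) != 1)
        return 0;
    return from_strtod == from_sscanf;
}
```

For example, strtod_matches_sscanf("1.57") asks whether the two run-time paths agree on a value that is not exactly representable in a binary format. Nothing in the current text, as far as I can tell, obliges either helper to return 1.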
Suggestions for C99 standard:

Change 6.4.4.2 Floating constants:

Second half of paragraph 3 to be:

    For decimal floating constants when FLT_RADIX is not a power of 10, and also for hexadecimal floating constants when FLT_RADIX is not a power of 2, the result is either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner. For hexadecimal constants when FLT_RADIX is a power of 2 and for decimal constants when FLT_RADIX is a power of 10, the result is correctly rounded.

Add a new paragraph in semantics:

    All floating constants of the same syntactic form and semantic value shall convert to the same representation.

Add a new paragraph in semantics:

    If a constant can be represented exactly in its evaluation format, then it shall be converted to that exact representation.

Changes to 7.19.6.1 The fprintf function:

Add near paragraph 11 before Recommended practice:

    The roundings used by %f, %F, %e, and %E shall be the same and shall have an accuracy of better than 1 ulp in round to nearest and better than 2 ulp in other roundings.

In paragraph 12 (Recommended practice), add: "For a and A conversions" at the start of the sentence. Also add: "and the value is not exactly representable" after "power of 2".

Changes to 7.19.6.2 The fscanf function:

In paragraph 12, "a,e,f,g" conversion specifier, add the sentence:

    The accuracy of this conversion shall be no worse than that of strtold for the same subject.

Change 7.20.1.3 The strtod ... functions:

In paragraph 4, change "rules of 6.4.4.2" to "rules of 6.4.4.2 (including accuracy requirements)".

Add to paragraph 5:

    If the subject sequence has the decimal form and FLT_RADIX is a power of 10, the value resulting from the conversion is correctly rounded.

In paragraph 8 under Recommended practice, add the phrase "and the result is not exactly representable" after the first comma.
Add a third recommended practice paragraph:

    Conversions done by strtod family functions and fscanf family functions of the same valid floating-point subject string shall produce the same value.

An alternative (not liked by this author) to all of the above is to add to 5.2.4.2.2 Characteristics of floating types in paragraph 4 before "and": ", binary-decimal conversions(footnote),".

    footnote: binary-decimal covers both string to internal representations and internal to string representations, and covers any pair of bases.

Suggestions for C99 rationale:

Add to 6.4.4.2:

    In C89, exactly representable floating-point constants, such as 1.0, were not required to convert to their exact representation. In addition, floating-point constants of the same syntactic form were not required to convert to the same representation throughout a program. That is, the decimal constant 1.0 could be converted to the value 1.0 - DBL_EPSILON / FLT_RADIX in one place in the program, to the value 1.0 in a second place in the program, and to the value 1.0 + DBL_EPSILON in a third place in the program.

    In C99, exactly representable values must be converted to that exact value. Also, all floating-point constants of the same syntactic form must now convert to the same value. That is, all 1.57's must convert the same, but 1.570 still need not convert to the same value as 1.57 in a program (but it will if the recommended practice is followed).
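The 1.57 example in the proposed rationale can be written out as a short sketch. The object and function names below are hypothetical and serve only as illustration. The first check corresponds to the proposed requirement (same syntactic form, same value); the second corresponds to the case that would still be left to recommended practice.

```c
/* Two separate occurrences of the same source form. */
static const double first_use  = 1.57;
static const double second_use = 1.57;

/* A different source form with the same mathematical value. */
static const double other_form = 1.570;

/* Hypothetical helper: returns 1 if both occurrences of 1.57 converted
   to the same value, which the proposed wording would require. */
int same_form_agrees(void)
{
    return first_use == second_use;
}

/* Hypothetical helper: returns 1 if 1.570 converted to the same value
   as 1.57. The proposed wording would still not require this; it is
   expected only where the recommended practice is followed. */
int other_form_agrees(void)
{
    return first_use == other_form;
}
```

Under the proposed changes, a program could rely on same_form_agrees() returning 1, while other_form_agrees() would remain a quality-of-implementation matter.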