#  JTC1/SC22/WG14 N890

```
WG14 Document Number: N890
Date: 09-Sept-1999

Defect report against C99 (this assumes that the current C9X FDIS is
approved as C99).

What is the accuracy of decimal string to/from "binary" (non-decimal)
floating-point conversions?

What is the accuracy of hexadecimal string to/from "decimal"
(non-power-of-2) floating-point conversions?

In the following, the phrase "decimal to binary" shall cover any pair of
bases that are not both a power of the same number.  It also shall cover
both the string to internal floating-point and internal floating-point
to string conversions.

There are three basic cases to consider:
Translation time:  decimal string to internal binary
Run-time:  decimal string to internal binary (scanf family, strtod family)
Run-time:  internal binary to decimal string (printf family)

For each of those basic cases, there are two generic sub-cases: base 10
to base 2 and base 2 to base 10.

Background:

6.4.4.2 Floating constants:

Paragraph 3 now has: For decimal floating constants, and also for
hexadecimal floating constants when FLT_RADIX is not a power of 2, the
result is either the nearest representable value, or the larger or
smaller representable value immediately adjacent to the nearest
representable value, chosen in an implementation-defined manner.  For
hexadecimal constants when FLT_RADIX is a power of 2, the result is
correctly rounded.

What happens if the result is exactly representable? I believe that 3
distinct result values are allowed (the exact value, and the two
representations adjacent to the exact value).

What happens if the same value, written in the same form, is used
in multiple places in the program? I believe that the same source form
can be converted to different values in different places in the program.

Must decimal constants converted to a decimal radix be correctly
rounded? I believe they should and I believe that this was an oversight.

7.19.6.1 The fprintf function:

Paragraph 8 on "f,F" and "e,E" conversion specifiers says: The value is
rounded to the appropriate number of digits.

Does that mean round to nearest, round by truncating, round by adding
0.5 and truncating, round as per the current rounding direction, or
something else? Must the rounding used for f,F match the rounding used
for e,E?
Since there is no explicit allowance for multiple values (as there is in
6.4.4.2 Floating constants), must the value produced be as if the
infinitely precise value were rounded (and the rounding produce an error
less than or equal to 0.5 ulp for nearest and less than 1.0 ulp otherwise)?

7.19.6.2 The fscanf function:

Paragraph 10 discusses conversion.  Paragraph 12 on "a,e,f,g" conversion
specifiers discusses format.  Neither discusses the accuracy of the
decimal to binary conversion, i.e., it is not specified.

What is the accuracy of floating-point string to internal representation
conversions? Is it the same as translation time? Is it the same as
strtod? Is it undefined behavior if the value is not exactly
representable? Is it round to nearest? Is it affected by the current
rounding mode, i.e., correctly rounded?

7.20.1.3 The strtod ...  functions:

What is the required accuracy of strtod family functions? It appears to
be either not specified or the same as 6.4.4.2.  It appears to depend
upon what paragraph 4 "interpreted as a floating constant according to
the rules of 6.4.4.2" means.

Paragraph 8 recommends that hexadecimal forms, when converted to
base-10, be one of the two values adjacent to the hex source value.
Unfortunately, read literally, this means that
strtod("0x1.p0",(char **)NULL) should be converted to one of the two
values adjacent to 1.0 rather than to 1.0 itself (this appears to be an
oversight that forgot about exactly representable values).

Suggestions for C99 standard:

Change 6.4.4.2 Floating constants:

Second half of paragraph 3 to be: For decimal floating constants when
FLT_RADIX is not a power of 10, and also for hexadecimal floating
constants when FLT_RADIX is not a power of 2, the result is either the
nearest representable value, or the larger or smaller representable
value immediately adjacent to the nearest representable value, chosen in
an implementation-defined manner.  For hexadecimal constants when
FLT_RADIX is a power of 2 and for decimal constants when FLT_RADIX is a
power of 10, the result is correctly rounded.

Add a new paragraph in semantics: All floating constants of the same
syntactic form and semantic value shall convert to the same
representation.

Add a new paragraph in semantics: If a constant can be represented
exactly in its evaluation format, then it shall be converted to that
exact representation.

Changes to 7.19.6.1 The fprintf function:

Add near paragraph 11 before Recommended practice: The roundings used by
%f, %F, %e, and %E shall be the same and shall have an accuracy of
better than 1 ulp in round to nearest and better than 2 ulp in other
roundings.

In paragraph 12 (Recommended practice), add: "For a and A conversions"
at the start of the sentence.  Also add: "and the value is not exactly
representable" after "power of 2".

Changes to 7.19.6.2 The fscanf function:

In paragraph 12, "a,e,f,g" conversion specifier, add the sentence: The
accuracy of this conversion shall be no worse than that of strtold for
the same subject.

Change 7.20.1.3 The strtod ...  functions:

In paragraph 4, change "rules of 6.4.4.2" to "rules of 6.4.4.2
(including accuracy requirements)".

Add to paragraph 5: If the subject sequence has the decimal form and
FLT_RADIX is a power of 10, the value resulting from the conversion is
correctly rounded.

In paragraph 8 under Recommended practice, add the phrase "and the
result is not exactly representable" after the first comma.

Add a third recommended practice paragraph: Conversions done by strtod
family functions and fscanf family functions of the same valid
floating-point subject string shall produce the same value.

An alternative (not liked by this author) to all of the above is to add
to 5.2.4.2.2 Characteristics of floating types <float.h> in paragraph 4
before "and": ", binary-decimal conversions(footnote),".

footnote: binary-decimal covers both string to internal representations
and internal to string representations, and covers any pair of bases.

Suggestions for C99 rationale:

Add to 6.4.4.2: In C89, exactly representable floating-point constants,
such as 1.0, were not required to convert to their exact representation.
In addition, floating-point constants of the same syntactic form were
not required to convert to the same representation throughout a program.
That is, the decimal constant 1.0 could be converted to the value 1.0 -
DBL_EPSILON / FLT_RADIX in one place in the program, to the value 1.0 in
a second place in the program, and to the value 1.0 + DBL_EPSILON in a
third place in the program.  In C99, exactly representable values must
be converted to that exact value.  Also, all floating-point constants of
the same syntactic form must now convert to the same value.  That is,
all 1.57's must convert the same, but 1.570 still need not convert to
the same value as 1.57 in a program (but it will if the recommended
practice is followed).
```
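
The questions raised in the report can be probed on a given implementation with a few small helpers. The following is only a sketch, not part of N890: the function names `classify_near_one`, `strtod_matches_sscanf`, and `f_and_e_round_alike` are hypothetical, and the last helper assumes FLT_RADIX is 2 so that 0.125 is exactly representable. It checks whether a converted value is 1.0 or one of the two neighbors named in the rationale text, whether strtod and sscanf agree on the same subject string, and whether %f and %e round an exact tie the same way.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Classify v relative to 1.0 (hypothetical helper):
 *    0  v is exactly 1.0
 *   -1  v is 1.0 - DBL_EPSILON / FLT_RADIX (the neighbor just below 1.0)
 *    1  v is 1.0 + DBL_EPSILON (the neighbor just above 1.0)
 *    2  anything else */
int classify_near_one(double v)
{
    if (v == 1.0)
        return 0;
    if (v == 1.0 - DBL_EPSILON / FLT_RADIX)
        return -1;
    if (v == 1.0 + DBL_EPSILON)
        return 1;
    return 2;
}

/* Return 1 if strtod and sscanf("%lf") produce the same double for the
 * subject string s, and 0 if they differ or sscanf fails to convert. */
int strtod_matches_sscanf(const char *s)
{
    double from_scanf = 0.0;
    double from_strtod;

    assert(s != NULL);
    from_strtod = strtod(s, (char **)NULL);
    if (sscanf(s, "%lf", &from_scanf) != 1)
        return 0;
    return from_strtod == from_scanf;
}

/* Return 1 if %.2f and %.1e keep the same last digit for 0.125.
 * Assumption: FLT_RADIX is 2, so 0.125 is an exact tie between
 * 0.12 and 0.13 and exposes the rounding rule that printf uses. */
int f_and_e_round_alike(void)
{
    char f_buf[32];
    char e_buf[32];

    snprintf(f_buf, sizeof f_buf, "%.2f", 0.125); /* "0.12" or "0.13" */
    snprintf(e_buf, sizeof e_buf, "%.1e", 0.125); /* "1.2e-01" or "1.3e-01" */
    return f_buf[3] == e_buf[2];                  /* compare last kept digit */
}
```

On an implementation that follows the suggested changes, `classify_near_one(strtod("0x1.p0", (char **)NULL))` would be expected to return 0, and the other two helpers would be expected to return 1. A different result would point at one of the gaps the report describes rather than at a bug in the caller.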