**Submitter:** Fred Tydeman (US)

**Submission Date:** 1999-10-20

**Source:** NCITS J11

**Reference Document:** N/A

**Version:** 1.4

**Date:** 2001-09-18 14:47:22

**Subject:** Accuracy of decimal string to/from "binary"
(non-decimal) floating-point conversions

**Summary**

What is the accuracy of decimal string to/from "binary"
(non-decimal) floating-point conversions?

What is the accuracy of hexadecimal string to/from "decimal" (non-power-of-2) floating-point conversions?

In the following, the phrase "decimal to binary" shall cover any pair of bases that are not both a power of the same number. It also shall cover both the string to internal floating-point and internal floating-point to string conversions.

There are two basic cases to consider at run-time:

- decimal string to internal binary (**scanf** family, **strtod** family)
- internal binary to decimal string (**printf** family)

For each of those basic cases, there are two generic sub-cases: base 10 to base 2 and base 2 to base 10.

**Background**

7.19.6.1 The `fprintf` function:

Paragraph 8 on "`e,E`" and "`f,F`" conversion specifiers says: The value is rounded to the appropriate number of digits.

Does that mean round to nearest, round by truncating, round by add 0.5 and truncate, round as per the current rounding direction, or something else? Must the rounding used for `f,F` match the rounding used for `e,E`? Since there is no explicit allowance for multiple values (as there is in 6.4.4.2 Floating constants), must the value produced be as if the infinitely precise value were rounded (and the rounding produce an error less than or equal to 0.5 units in the last place (ulp) for nearest and less than 1.0 ulp otherwise)?

For round to nearest, IEEE-754 (IEC-60559) requires that the maximum error be 0.5 ulp for a large subset of its values and 0.97 ulp for all values. For the other roundings, the maximum error allowed by IEEE-754 is 1.47 ulp. The fourth committee draft (1999-09-30) of ISO/IEC 10967-2 (LIA-2) appears to require the maximum error be in the range 0.5 to 0.75 ulp. These bounds appear to apply to both directions of conversions.
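The ambiguity in "rounded to the appropriate number of digits" can be observed on a given implementation with exact halfway cases. The following is a minimal sketch, not part of the submission: the enum labels and the function name `probe_printf_ties` are hypothetical, and the probe only reports which rule the output is consistent with under the rounding direction in effect when it runs.

```c
#include <stdio.h>

/* Hypothetical labels for this sketch; they are not names from the standard. */
enum tie_rule { TIE_TO_EVEN, TIE_UPWARD, TIE_DOWNWARD, TIE_UNRECOGNIZED };

/* 0.125 and 0.375 are exactly representable in binary floating point, so
   printing them with two fractional digits forces a genuine halfway case. */
enum tie_rule probe_printf_ties(void)
{
    char lo[16], hi[16];

    snprintf(lo, sizeof lo, "%.2f", 0.125);   /* "0.12" or "0.13" */
    snprintf(hi, sizeof hi, "%.2f", 0.375);   /* "0.37" or "0.38" */

    char a = lo[3], b = hi[3];                /* last printed digit of each */

    if (a == '2' && b == '8') return TIE_TO_EVEN;   /* nearest, ties to even */
    if (a == '3' && b == '8') return TIE_UPWARD;    /* add 0.5 and truncate, or round up */
    if (a == '2' && b == '7') return TIE_DOWNWARD;  /* truncation, or round down */
    return TIE_UNRECOGNIZED;
}
```

Two probes cannot distinguish every rule listed above (for example, "add 0.5 and truncate" and rounding toward positive infinity give the same digits here), so the result is a hint about the implementation rather than a classification.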

7.19.6.2 The `fscanf` function:

Paragraph 10 discusses conversion. Paragraph 12 on "`a,e,f,g`" conversion specifiers discusses format. Neither discusses the accuracy of the decimal to binary conversion, i.e., it is not specified.

What is the accuracy of floating-point string to internal representation conversions? Is it the same as translation time? Is it the same as `strtod`? Is it undefined behavior if the value is not exactly representable? Is it round to nearest? Is it affected by the current rounding mode, e.g., correctly rounded?

7.20.1.3 The `strtod` ... functions:

What is the required accuracy of `strtod` family functions? It appears to be either not specified or the same as 6.4.4.2. It appears to depend upon what paragraph 4 "interpreted as a floating constant according to the rules of 6.4.4.2" means.
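Whether `strtod` at run time agrees with the translator's handling of the same floating constant can be checked directly on a given implementation. The sketch below is illustrative only; `strtod_matches_constant` is a hypothetical helper name, and a match on one platform says nothing about what the standard requires.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtod() converts `text` to exactly the
   same double as `constant`, which the caller writes as a floating constant
   with the same spelling so that the translator performs the conversion. */
int strtod_matches_constant(const char *text, double constant)
{
    char *end;
    double parsed = strtod(text, &end);

    /* end == text means no valid subject sequence was found */
    return end != text && parsed == constant;
}
```

A call such as `strtod_matches_constant("0.1", 0.1)` compares the run-time and translation-time conversions of a value that is not exactly representable in binary.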

**Suggested Changes**

Changes to 7.19.6.1 The `fprintf` function:

Add near paragraph 11 before Recommended practice:

The roundings used by `%e`, `%f`, `%F`, and `%E` shall be the same and shall have an accuracy of better than 1 ulp in round to nearest and better than 2 ulp in other roundings.

Changes to 7.19.6.2 The `fscanf` function:

In paragraph 12, "`a,e,f,g`" conversion
specifier, add the sentence:

The accuracy of this conversion shall be no worse than that of `strtold` for the same subject.

Change 7.20.1.3 The `strtod` ...
functions:

In paragraph 4, change "rules of 6.4.4.2" to "rules of 6.4.4.2 (including accuracy requirements)"

Add a third recommended practice paragraph:

Conversions done by `strtod` family functions and `fscanf` family functions of the same valid floating-point subject string shall produce the same value.
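The property proposed in this paragraph can be expressed as a small check. This is a sketch under stated assumptions: `scanf_agrees_with_strtod` is a hypothetical name, `sscanf` stands in for the `fscanf` family, and the input is assumed to be a subject string that both functions accept in full.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if sscanf("%lf") and strtod() produce the
   same double for `text`, and 0 if they differ or if either conversion fails. */
int scanf_agrees_with_strtod(const char *text)
{
    double from_scanf;
    char *end;
    double from_strtod = strtod(text, &end);

    if (end == text || sscanf(text, "%lf", &from_scanf) != 1)
        return 0;                      /* no valid subject sequence */

    return from_scanf == from_strtod;
}
```

The comparison uses `==`, so a NaN subject would report disagreement even when both functions behave identically; the sketch is meant for finite values.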

An alternative (not liked by this author) to all of the
above is to add to 5.2.4.2.2 Characteristics of floating types
`<float.h>` in paragraph 4 before "and": ",
binary-decimal conversions(footnote),".

footnote: binary-decimal covers both string to internal representations and internal to string representations, and covers any pair of bases.

**Committee Discussion**

5.2.4.2.2 paragraph 4 (which covers the accuracy of
`+`,

6.3.1.5 para. 1 implies that the different widths of F.P. types must have similar representations differing only in number of bits in exponent, mantissa, and padding.

In 7.19.6.1 `f,F` format, the value is rounded
to the appropriate number of digits, which indicates that the
displayed value differs from the "numerical" value only with
regard to that rounding. (Of course,

7.20.1.3 says that the numeric string is interpreted as a value according to the rules in 6.4.4.2 for floating constants.

Details of rounding are not specified, although certain modes are described in 5.2.4.2.2.

The latitude allowed for inexactness by the standard applies
only to precision of representation and to rounding mode.

**Technical Corrigendum**

Change 5.2.4.2.2 paragraph #4 to:

The accuracy of the floating-point operations (`+`, `-`, `*`, `/`) and of the library functions in `<math.h>` and `<complex.h>` that return floating-point results is implementation defined, as is the accuracy of the conversion between floating-point internal representations and string representations performed by the library routines in `<stdio.h>`, `<stdlib.h>` and `<wchar.h>`. The implementation may state that the accuracy is unknown.
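Because the accuracy of both conversion directions is implementation defined, a program that depends on it can only test the implementation it runs on. One common probe is a round trip through a string. The sketch below assumes that `double` is IEEE 754 binary64, for which 17 significant decimal digits are enough to recover the value when both conversions are correctly rounded; `survives_round_trip` is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints x with 17 significant digits (an assumption
   tied to IEEE 754 binary64) and converts the string back with strtod().
   Returns 1 if the original value is recovered exactly, 0 otherwise. */
int survives_round_trip(double x)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.17g", x);

    if (n < 0 || (size_t)n >= sizeof buf)
        return 0;                      /* formatting failed or was truncated */

    return strtod(buf, NULL) == x;
}
```

A failed round trip shows that at least one direction is not correctly rounded on that implementation, but a successful one does not bound the error of either direction on its own.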