Defect Report #063
Submission Date: 01 Dec 93
Submittor: Project Editor (P.J. Plauger)
Source: Thomas Plum
[This is Defect Report #056, resubmitted for administrative reasons.]
The following requirement is implied in several places, but not explicitly
stated. It should be explicitly affirmed, or alternative wording adopted.
The representation of floating-point values (such as floating-point
constants, the results of floating-point expressions, and floating-point
values returned by library functions) shall be accurate to one unit
in the last position, as defined in the implementation's <float.h>.
Discussion: The values in <float.h> aren't required to document
the underlying bitwise representations. If you want to know how many
bits, or bytes, a floating-point value occupies, use sizeof.
The <float.h> values document the mathematical properties of
the representation, the behaviors that the programmer can count upon
in analyzing algorithms.
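The distinction between storage size and documented precision can be sketched in C. This is only an illustration; the helper names double_storage_bits and double_claimed_digits are hypothetical and do not come from the C Standard.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Bits of storage a double occupies, including any padding bits.
   This describes the bitwise representation only. */
size_t double_storage_bits(void)
{
    return sizeof(double) * CHAR_BIT;
}

/* Number of base-FLT_RADIX digits of precision that <float.h>
   documents for double. This describes the mathematical model. */
int double_claimed_digits(void)
{
    return DBL_MANT_DIG;
}
```

The two numbers need not agree: the storage size is an upper limit on what the representation could hold, while DBL_MANT_DIG, DBL_DIG, and DBL_EPSILON state what the programmer may rely upon.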
It is a quality-of-implementation question as to whether the implementation
delivers accurate bits throughout the bitwise representation, or alternatively,
delivers considerably less accuracy. The point being clarified is
that <float.h> documents the delivered precision, not the theoretically
The C Standard imposes no requirement on the accuracy of floating-point
The C Standard speaks directly to the matter of floating-point accuracy
only in one or two areas. Subclause 6.2.1.4 Floating types,
page 35, says of conversions from one floating type to one with less
range and/or precision:
If the value being converted is in the range of values that
can be represented but cannot be represented exactly, the result is
either the nearest higher or nearest lower value, chosen in an implementation-defined manner.
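A minimal sketch of this conversion rule follows. It assumes that float carries less precision than double (common, but not required), and it uses volatile objects so that each value is actually stored in its declared type. The function name narrowing_picks_a_neighbor is hypothetical.

```c
#include <float.h>

/* Converts x = 1 + DBL_EPSILON to float. When float is narrower than
   double, x is not exactly representable as a float, and the two float
   values that bracket it are 1.0f and 1.0f + FLT_EPSILON.
   Returns 1 if the converted value is one of those two neighbors. */
int narrowing_picks_a_neighbor(void)
{
    volatile double x = 1.0 + DBL_EPSILON;
    volatile float lower = 1.0f;
    volatile float upper = 1.0f + FLT_EPSILON;
    volatile float f = (float)x;

    return f == lower || f == upper;
}
```

Which of the two neighbors is chosen is left to the implementation; the sketch checks only that the result is one of them.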
And in subclause 6.2.1.5 Usual arithmetic conversions, page
The values of floating operands and of the results of floating
expressions may be represented in greater precision and range than
that required by the type; the types are not changed thereby.
Otherwise, arithmetic for both integer and floating types is defined
in terms of the usual terminology of mathematics. Nothing in the C
Standard suggests that floating arithmetic is excused from the conventional
rules of arithmetic.
Nevertheless, it is commonplace for the functions declared in <math.h>
to deliver results less accurate than the underlying representation
can support. It is not uncommon even for simple arithmetic expressions
to do the same. And still, implementations document in <float.h>
properties of the underlying representation, not the effective
range and precision reliably delivered. The C community has typically
tolerated a certain laxity in this area.
Probably the most useful response would be to amend the C Standard
by adding two requirements on implementations:
Require that an implementation document the maximum errors it
permits in arithmetic operations and in evaluating math functions.
These should be expressed in terms of ``units in the least-significant
position'' (ULP) or ``lost bits of precision.''
Establish an upper bound for these errors that all implementations
must adhere to.
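To make the ULP measure concrete, the following sketch computes the size of one unit in the last place of a double and expresses an error as a multiple of it. It assumes FLT_RADIX is 2 and that the values are finite and normalized. The names ulp_of and ulp_error are hypothetical helpers, not part of any library.

```c
#include <float.h>

/* Size of one unit in the last place of x.
   Assumes FLT_RADIX == 2 and a finite, normalized x.
   Returns 0 for x == 0. */
double ulp_of(double x)
{
    double p = 1.0;

    if (x < 0.0)
        x = -x;
    if (x == 0.0)
        return 0.0;

    /* Find the power of two p with p <= x < 2p. */
    while (x < p)
        p /= 2.0;
    while (x >= 2.0 * p)
        p *= 2.0;

    return p * DBL_EPSILON;
}

/* Error of approx relative to reference, in ULPs of reference.
   The reference value must be nonzero. */
double ulp_error(double approx, double reference)
{
    double diff = approx - reference;

    if (diff < 0.0)
        diff = -diff;
    return diff / ulp_of(reference);
}
```

An implementation documenting its maximum errors in these terms would state, for each operation or function, the largest value that a measure like ulp_error can take against the mathematically exact result.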
The state of the art, as the Committee understands it, is:
correctly rounded results for arithmetic operations (no loss of precision)
1 ULP for functions such as sqrt, sin, and cos
(loss of 1 bit of precision)
4-6 ULP (loss of 2-3 bits of precision) for other math functions.
Since not all commercially viable machines and implementations
meet these exacting requirements, the C Standard should be somewhat
The Committee would, however, suggest a requirement no more liberal
than a loss of 3 bits of precision, out of kindness to users. An implementation
with worse performance can always conform by providing a more conservative
version of <float.h>, even if that is not a desirable approach
in the general case.
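As a rough illustration of what a more conservative <float.h> could look like, the sketch below derives reduced precision figures from a given number of lost bits. It assumes FLT_RADIX is 2 and uses the relation for decimal digits, floor((p - 1) * log10(2)), with log10(2) approximated as 0.30103. The helper names are hypothetical.

```c
/* Mantissa digits to document when lost_bits low-order bits
   cannot be trusted. Assumes FLT_RADIX == 2. */
int conservative_mant_dig(int mant_dig, int lost_bits)
{
    return mant_dig - lost_bits;
}

/* Decimal digits corresponding to p binary digits of precision:
   floor((p - 1) * log10(2)), computed in integer arithmetic. */
int conservative_dig(int p)
{
    return (int)((p - 1) * 30103L / 100000L);
}

/* Epsilon to document when lost_bits low-order bits cannot be
   trusted: each lost bit doubles the spacing that can be relied on. */
double conservative_epsilon(double epsilon, int lost_bits)
{
    while (lost_bits-- > 0)
        epsilon *= 2.0;
    return epsilon;
}
```

For example, an implementation with a 53-bit significand that loses 3 bits of precision might document 50 mantissa digits, 14 decimal digits, and an epsilon 8 times larger than the representation alone would suggest.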
The Committee should revisit this issue during the revision of the