WG14/N808 J11/98-007 DR 63 (and 56): Floating-Point accuracy Proposed wording for the Response for DR 63 (and 56). The following wording should be incorporated into a future version of the standard in section 5.2.4.2.2 <float.h>, page 25, between paragraphs 3 and 4: The accuracy of the floating-point operations (+, -, *, /) and of the math library (<math.h> and <complex.h>) functions that return a floating-point result is implementation defined. The implementation shall document the accuracy as the worst case error in terms of units in the last place (ULPs), or it shall say that the accuracy is "unknown." Rationale (section 5.2.4.2.2) In documenting the accuracy of the math library functions, an implementation is allowed to break the domain of a function into disjoint regions and document the worst case error in each region. An implementation should document the accuracy of the complex operations and functions by listing the worst case error in both the real and the imaginary parts. The error between two floating-point numbers (such as computed and infinitly precise) in ULPs (Unit in the Last Places) depends on the radix and the precision used in representing the number, but not the exponent. Thus, the ULP of a value x, with exponent e, precision p, and radix r, is r**(1-p). With a decimal radix and 3 digits of precision, the computed value 3.14e-2 differs from the value 3.1416e-2 by 0.16 ULPs. The C9X committee discussed the idea of allowing the programmer to find out the accuracy of floating-point operations and math functions during compilation (say via macro symbols) or during execution (via a function call), but neither got enough support (even though this is what some users want) to warrant the change to the standard. Either would require over one hundred macro symbols to name every math function (such as ULP_SINF, ULP_SIN, and ULP_SIND just for the sin function) and that was a large change for a small utility. The values in <float.h> should be in terms of the hardware representation used to store floating-point values (not in terms of the effective accuracy of operations). That is, a poor quality implementation cannot say (in <float.h>) that it has less precision than the hardware representation, so that it could claim 1-ULP accuracy for its operations. The committee could not agree on upper limits on accuracy that all conforming imlementations must meet, eg, addition is no worst than 2 ULPs for all implementations. It is a quality of implementation issue. Implementations that conform to IEC-559 have 0.5 ULP accuracy in round to nearest mode, and 1.0 ULP accuracy in the other three rounding modes, for the basic arithmetic operations and sqrt. For other floating-point arithmetics, it is a rare implementation that has worse than 1-ULP accuracy for the basic arithmetic operations. The accuracy of binary-decimal converions and format conversions are discussed elsewhere in the standard. For the math library functions, currently, fast correctly rounded 0.5 ULP accuracy is still a research problem. Some imlementation provide two math libraries: one being fast and sloppy, the other being slow and accurate. Some math functions, such as those that do argument reduction modulo an approximation of pi, have good accuracy for small arguments, and poor accuracy for large arguments. It is not unusual for an implementation of the trig functions to have zero bits correct in the computed result for large arguments. That is why the committee allows an implementation to break the domain of the function into disjoint sections and specify the worst case accuracy in each section. The value of an ULP of a positive number x that is a power of the radix is the difference between the next representable number (in the model) larger than x and x, rather than the difference between x and the closest representable number smaller than x. These two potential definitions of an ULP's value differ by a factor of the radix.