WG14 N1052

CRITIQUE OF WG14/N1016
DECIMAL FLOATING-POINT ARITHMETIC

P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com

Both WG14 and WG21 have accepted WG14/N1016 as the basis for parallel non-normative Technical Reports, adding decimal floating-point arithmetic to C and C++. The decimal formats are based on work done at IBM plus current standardization work within IEEE -- a revision of IEEE 754, the widely adopted standard for binary floating-point arithmetic. The revised standard IEEE 754R will describe both binary and decimal formats.

N1016 proposes adding three more basic types to C (and C++), for decimal floating-point to coexist alongside whatever an implementation currently uses for float, double, and long double. That also involves:

-- adding literal formats for the new types, such as 1.0DF

-- adding promotion and conversion rules between the new types and existing basic types

-- adding macros to <fenv.h> to describe new rounding modes and exceptions

-- adding a new header <decfloat.h> to describe properties of the new types a la <float.h>

-- adding macros to <math.h> to describe huge values and NaN for the new types

-- adding versions of all the math functions in <math.h> for the new types

-- adding half a dozen new functions to <math.h> to perform operations particular to decimal floating-point on the new types

-- adding new conversion specifications, such as %GLD, to the formatted input/output conversions in <stdio.h> and <wchar.h>

-- adding strto* functions to <stdlib.h> for the new types

-- adding wcsto* functions to <wchar.h> for the new types

-- adding the relevant macros to <tgmath.h> for the new functions added to <math.h>

There is no provision for new complex types in C based on the decimal floating-point types.

Adding three new types is clearly a major change to C. C++ could avoid the proliferation of names by overloading existing names, but it shares all the other problems. It is not at all clear to me that much need exists for having two sets of floating-point types in the same program. It is certainly not clear to me that whatever need exists is worth the high cost of adding all these types. Lower-cost alternatives exist.

Probably the cheapest is simply to define a binding to IEEE 754R, much like the current C Annex F for IEC 60559 (the international version of IEEE 754). If we do this, most of the above list evaporates. But I believe we should still add a few items to the C library, and the obvious analogs to C++, independent of whether the binding is provided by the compiler. These involve:

-- adding macros to <fenv.h> to describe *some optional* new rounding modes but *no* new exceptions

-- adding half a dozen new functions to <math.h> to perform operations demonstrably useful for decimal floating-point but not unreasonable even for other floating-point formats

-- adding the relevant macros to <tgmath.h> for just the half a dozen new functions added to <math.h>

An implementation that chooses this option would also get C complex decimal in the bargain.

Defining a binding does not oblige compiler vendors to switch to decimal floating-point, however. For programmers to get access to this new technology, they would have to wait for a vendor to feel motivated to make major changes to both compiler and library. And it is not just the vendor who pays a price:

-- The major cost for the unconcerned user is a slight reduction in average precision, due mainly to the greater "wobble" inherent in decimal vs. binary format. The major payoff is fewer surprises of the form (10.0 * 0.1 != 1.0), and perhaps faster floating-point input/output.
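   To see the kind of surprise at issue, consider ordinary binary doubles
   on an IEC 60559 implementation: because 0.1 has no finite binary
   representation, summing ten copies of it does not yield exactly 1.0.
   The little program below is merely illustrative:

      #include <stdio.h>

      int main(void)
          {   /* accumulate ten rounded copies of binary 0.1 */
          double sum = 0.0;
          int i;

          for (i = 0; i < 10; ++i)
              sum += 0.1;
          printf("%d %.17g\n", sum == 1.0, sum);
          return (0);
          }

   On a typical IEC 60559 implementation this prints 0 (false) and
   0.99999999999999989. With a decimal format, each 0.1 and each partial
   sum is exact, and the result is exactly 1.0.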
-- A bigger cost falls on those programmers who need to convert often between decimal floating-point and existing formats. These could be a rare and special breed, but there might also be a distributed cost in performance and complexity among users who have to access databases that store floating-point results in non-decimal encoded formats. Experts have to write the converters; many non-experts might have to use them.

So if all we do is define a binding, it could take a long time for decimal floating-point to appear in the marketplace. But a reasonably cheap alternative can mitigate this problem. Simply define a way to add decimal floating-point as a pure bolt-on to C and C++ -- a library-only package that can work with existing C and C++ compilers.

For C, this means adding one or more new headers that define three structured types and a slew of functions for manipulating them. For C++, the solution can look much like the existing standard header <complex> -- a template class plus operators and functions that manipulate it, with the three IEEE 754R decimal formats as explicit instantiations.

The major compromise in a bolt-on solution is the weaker integration of decimal floating-point with the rest of the language and library. C suffers most because it doesn't permit operator overloading for user-defined types. (C++ seems to be doing just fine with complex as a library-defined type.) The payoff is a much greater chance that vendors will supply implementations sooner rather than later.

I believe the best thing is to do both of these lightweight things, instead of adding three more floating-point types to the C and C++ languages. Implementing a TR of this form assures programmers that they can reap the benefits of decimal floating-point one way or the other. And such a TR provides a road map for how best to supply decimal floating-point for both the short and long term.

A FEW DETAILED CRITICISMS OF N1016

The header <decfloat.h> should not define names that differ arbitrarily from existing names in <float.h> (e.g. DEC32_COEFF_DIG, where <float.h> spells the corresponding name FLT_MANT_DIG).

-----

The rounding modes in <fenv.h> have even more confusing differences in naming. In C99, for example, "down" means "toward -infinity", while in N1016 it means "toward zero". Here's a Rosetta Stone:

   N1016                    C99            (meaning)
   FE_DEC_ROUND_DOWN        FE_TOWARDZERO  (toward zero)
   FE_DEC_ROUND_HALF_EVEN   FE_TONEAREST   (ties to even)
   FE_DEC_ROUND_CEILING     FE_UPWARD      (toward +inf)
   FE_DEC_ROUND_FLOOR       FE_DOWNWARD    (toward -inf)
   FE_DEC_ROUND_HALF_UP                    (ties away from zero)
   FE_DEC_ROUND_HALF_DOWN                  (ties toward zero)
   FE_DEC_ROUND_UP                         (away from zero)

Only the last two modes are optional in N1016.

-----

Similarly, for floating-point exceptions, we have:

   N1016                      C99
   FE_DEC_DIVISION_BY_ZERO    FE_DIVBYZERO
   FE_DEC_INVALID_OPERATION   FE_INVALID
   FE_DEC_INEXACT             FE_INEXACT
   FE_DEC_OVERFLOW            FE_OVERFLOW
   FE_DEC_UNDERFLOW           FE_UNDERFLOW

There's no good reason for the differences in the first two lines.

-----

Many of the functions cited in N1016 are *not* present in the latest draft of IEEE 754R, as advertised. I had to hunt down specifics at:

   http://www2.hursley.ibm.com/decimal/decbits.pdf
   http://www2.hursley.ibm.com/decimal/decarith.pdf

-----

There's no reason for <math.h> to have HUGE_VALF, etc. followed by DEC32_HUGE, etc. Once again, the names should not differ arbitrarily. It's also not clear why there should be a DEC_NAN and not a DEC_INF (or DEC_INFINITY). Either both are easily generated as inline expressions (0D/0D, 1D/0D, etc.) or neither is. I favor defining both as macros (perhaps involving compiler magic).
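   For illustration only, both macros could be defined as inline
   expressions along the following lines, assuming N1016's DF literal
   suffix; the names and definitions here are a sketch, not proposal
   text:

      /* hypothetical additions to <math.h>; compiler magic would serve as well */
      #define DEC_INFINITY (1.0DF / 0.0DF)  /* decimal +infinity */
      #define DEC_NAN      (0.0DF / 0.0DF)  /* decimal quiet NaN */

   As with 0./0. in binary, such expressions can raise spurious
   exceptions when evaluated, which is itself an argument for the
   compiler magic.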
-----

N1016 calls for the interesting function:

   T divide_integerxx(T x, T y);

(where xx stands for d32, d64, or d128). This generates an integer quotient only if it's exactly representable. But N1016 doesn't require the corresponding remainder function. I suggest loosely following the pattern of remquo and adding an optional pointer to where to return the remainder:

   T divide_integerxx(T x, T y, T *prem);

Or we could follow the pattern of remquo even more closely and replace this function with remainder_integerxx, which returns the quotient on the side.

-----

N1016 calls for the function:

   T remainder_nearxx(T x, T y);

This has the same specification as the C99 remainder function. It should share the same root name (e.g. remainderd32, after remainderf).

-----

N1016 calls for the function:

   T round_to_integerxx(T x);

This has the same specification as the C99 rint function. It should share the same root name.

-----

N1016 calls for the interesting function:

   T normalizexx(T x);

This shifts the coefficient right until the least-significant decimal digit is nonzero, or it changes a zero value to canonical form. It's not clear what should happen if such a shift would cause an overflow, but that behavior must be specified. (It's also not clear what the purpose of this function is in the best of circumstances, but maybe I haven't read and played enough to understand.)

Finally, this function does exactly the opposite of what "normalize" has meant as a term of art in floating-point for many decades. The name suggests that the coefficient is shifted *left* until the *most-significant* decimal digit is nonzero. (And I can even think of uses for that operation.) Either the spec should change or the name.

-----

N1016 calls for the function:

   bool check_quantumxx(T x, T y);

This returns true only if x and y have the same exponent. There is no spec for this function in N1016, but it is clearly the same as the function same_quantum in the defining document from IBM. I see no good reason for changing the name.

-----

N1016 calls for the interesting function:

   T quantizexx(T x, T y);

This changes x, as need be, to have the same exponent as y. It is, in effect, a "round to N decimal places" function. In conjunction with the rounding mode, it provides the much-touted "proper" decimal rounding rules to match various government rules and commercial practices.

It's not entirely clear to me (yet) why a similar function couldn't reap much the same benefit even when used with binary floating-point. (I hasten to add that there are still other good reasons for using decimal floating-point instead.) In any event, I believe that it can and should be generalized so that its application to binary floating-point makes equal sense.

Unfortunately, the function in its current form gets its parameter N from the *decimal exponent* of y. That supports cute notation such as:

   price = quantized32(price * (1DF + tax_rate), 1.00DF);

assuming that you find "1.00DF" more revealing than "2". But it thus relies heavily on the ability to write literals (or generate values) with a known decimal exponent. An earlier version evidently required/permitted you to write the number of decimal digits instead.

I've long used an internal library function that truncates binary floating-point values to a specified number of binary places (plus or minus). I could see a real benefit in having both binary and decimal versions of "quantize" that apply to floating-point values of either base. But this particular form does not generalize at all well. A sketch of the sort of generalization I have in mind follows.
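   The binary version might look like this; the name and the choice of
   a signed count of places are mine, purely for illustration, and
   overflow or underflow in the scaling is ignored:

      #include <math.h>

      /* round x to "places" binary places, honoring the current
         rounding mode via rint; a decimal version would count decimal
         places and scale by powers of ten instead */
      double quantize2(double x, int places)
          {
          return (scalbn(rint(scalbn(x, places)), -places));
          }

   Thus quantize2(x, 2) rounds x to a multiple of 1/4, and
   quantize2(x, -3) rounds it to a multiple of 8 -- the analog, in
   base 2, of rounding a price to a whole number of cents.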
-----

N1016 does *not* call for several other functions that suggest themselves. These include decimal equivalents of frexp and ldexp, and possibly an exp10. I know from past experience that some of these are highly useful in writing math functions; but I need more experience writing IEEE 754R decimal floating-point math functions before I can make really informed recommendations.

Nevertheless, the absence of such functions, and the lapses in the functions presented in N1016, suggest to me the need for more experience in using the decimal stuff from IEEE 754R as a general floating-point data type before we freeze a TR of any sort, for either C or C++.