WG14 N1052
CRITIQUE OF WG14/N1016
DECIMAL FLOATING-POINT ARITHMETIC
P.J. Plauger
Dinkumware, Ltd.
pjp@dinkumware.com
Both WG14 and WG21 have accepted WG14/N1016 as the basis for parallel
non-normative Technical Reports, adding decimal floating-point arithmetic
to C and C++. The decimal formats are based on work done at IBM plus
current standardization work within IEEE -- a revision of IEEE 754, the
widely adopted standard for binary floating-point arithmetic. The revised
standard IEEE 754R will describe both binary and decimal formats.
N1016 proposes adding three more basic types to C (and C++), for
decimal floating-point to coexist alongside whatever an implementation
currently uses for float, double, and long double. That also involves:
-- adding literal formats for the new types, such as 1.0DF
-- adding promotion and conversion rules between the new types and
existing basic types
-- adding macros to <fenv.h> to describe new rounding modes and exceptions
-- adding a new header <decfloat.h> to describe properties of the new types
a la <float.h>
-- adding macros to <math.h> to describe huge values and NaN for the new types
-- adding versions of all the math functions in <math.h> for the new types
-- adding half a dozen new functions to <math.h> to perform operations
particular to decimal floating-point on the new types
-- adding new conversion specifications, such as %GLD, to the formatted
input/output conversions in <stdio.h> and <wchar.h>
-- adding strto* functions to <stdlib.h> for the new types
-- adding wcsto* functions to <wchar.h> for the new types
-- adding the relevant macros to <tgmath.h> for the new functions added
to <math.h>
There is no provision for new complex types in C based on the decimal
floating-point types.
Adding three new types is clearly a major change to C. C++ could avoid
the proliferation of names by overloading existing names, but it shares
all the other problems. It is not at all clear to me that much need
exists for having two sets of floating-point types in the same program.
It is certainly not clear to me that whatever need exists is worth the
high cost of adding all these types.
Lower-cost alternatives exist. Probably the cheapest is simply to define
a binding to IEEE 754R, much like the current C Annex F for IEC 60559 (the
international version of IEEE 754). If we do this, most of the above
list evaporates. But I believe we should still add a few items to the
C library, and the obvious analogs to C++, independent of whether the
binding is provided by the compiler. These involve:
-- adding macros to <fenv.h> to describe *some optional* new rounding
modes but *no* new exceptions
-- adding half a dozen new functions to <math.h> to perform operations
demonstrably useful for decimal floating-point but not unreasonable
even for other floating-point formats
-- adding the relevant macros to <tgmath.h> for just the half a dozen
new functions added to <math.h>
An implementation that chooses this option would also get C complex decimal
in the bargain.
Defining a binding does not oblige compiler vendors to switch to
decimal floating-point, however. For programmers to get access to this
new technology, they would have to wait for a vendor to feel motivated
to make major changes to both compiler and library. And it is not just
the vendor who pays a price:
-- The major cost for the unconcerned user is a slight reduction in
average precision, due mainly to the greater "wobble" inherent in decimal
vs. binary format. The major payoff is fewer surprises of the form
(10.0 * 0.1 != 1.0), and perhaps faster floating-point input/output.
-- A bigger cost falls on those programmers who need to convert often between
decimal floating-point and existing formats. These could be a rare and
special breed, but there might also be a distributed cost in performance
and complexity among users who have to access databases that store
floating-point results in non-decimal encoded formats. Experts have to
write the converters; many non-experts might have to use them.
So if all we do is define a binding, it could take a long time for
decimal floating-point to appear in the marketplace. But a reasonably
cheap alternative can mitigate this problem. Simply define a way to
add decimal floating-point as a pure bolt-on to C and C++ -- a library-only
package that can work with existing C and C++ compilers. For C, this
means adding one or more new headers that define three structured types
and a slew of functions for manipulating them. For C++, the solution
can look much like the existing standard header <complex> -- a template
class plus operators and functions that manipulate it, with the three
IEEE 754R decimal formats as explicit instantiations.
The major compromise in a bolt-on solution is the weaker integration
of decimal floating-point with the rest of the language and library.
C suffers most because it doesn't permit operator overloading for
user-defined types. (C++ seems to be doing just fine with complex as
a library-defined type.) The payoff is a much greater chance that vendors
will supply implementations sooner rather than later.
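As a rough illustration of the C side of such a bolt-on, here is a minimal
sketch of one structured type and a few functions that manipulate it. The
names dec_sketch, dec_add, dec_mul, and dec_equal are hypothetical; they are
not taken from N1016 or IEEE 754R. The representation (a 64-bit integer
coefficient with a power-of-ten exponent) is an assumption chosen for
brevity, and the sketch ignores the real encodings, rounding, overflow,
infinities, and NaNs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical library-only decimal type: value = coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Return 10^n for small n >= 0 (no overflow check in this sketch). */
static int64_t pow10_i64(int32_t n)
{
    int64_t p = 1;
    while (n-- > 0)
        p *= 10;
    return p;
}

/* Rescale a and b to their smaller exponent so the coefficients line up. */
static void dec_align(dec_sketch *a, dec_sketch *b)
{
    if (a->exp > b->exp) {
        a->coeff *= pow10_i64(a->exp - b->exp);
        a->exp = b->exp;
    } else if (b->exp > a->exp) {
        b->coeff *= pow10_i64(b->exp - a->exp);
        b->exp = a->exp;
    }
}

dec_sketch dec_add(dec_sketch a, dec_sketch b)
{
    dec_align(&a, &b);
    a.coeff += b.coeff;
    return a;
}

dec_sketch dec_mul(dec_sketch a, dec_sketch b)
{
    dec_sketch r = { a.coeff * b.coeff, a.exp + b.exp };
    return r;
}

/* Compare by value, so 10 * 10^-1 compares equal to 1 * 10^0. */
int dec_equal(dec_sketch a, dec_sketch b)
{
    dec_align(&a, &b);
    return a.coeff == b.coeff;
}

/* Usage: in C every operation is a function call, not an operator. */
void dec_usage(void)
{
    dec_sketch ten = { 10, 0 }, tenth = { 1, -1 }, one = { 1, 0 };
    assert(dec_equal(dec_mul(ten, tenth), one)); /* 10 * 0.1 == 1 exactly */
}
```

The call-heavy notation in dec_usage is the weaker integration in practice:
an expression that would read 10 * 0.1 with a built-in type becomes a nested
function call.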
I believe the best thing is to do both of these lightweight things,
instead of adding three more floating-point types to the C and C++
languages. Implementing a TR of this form assures programmers that they
can reap the benefits of decimal floating-point one way or the other.
And such a TR provides a road map for how best to supply decimal
floating-point for both the short and long term.
A FEW DETAILED CRITICISMS OF N1016
The header <decfloat.h> should not define names that differ arbitrarily
from existing names in <float.h> (e.g. DEC32_COEFF_DIG).
-----
The rounding modes in <fenv.h> have even more confusing differences
in naming. In C99, for example, "down" means "toward -infinity",
while in N1016 it means "toward zero". Here's a Rosetta Stone:
N1016                     C99             (meaning)
FE_DEC_ROUND_DOWN         FE_TOWARDZERO   (toward zero)
FE_DEC_ROUND_HALF_EVEN    FE_TONEAREST    (ties to even)
FE_DEC_ROUND_CEILING      FE_UPWARD       (toward +inf)
FE_DEC_ROUND_FLOOR        FE_DOWNWARD     (toward -inf)
FE_DEC_ROUND_HALF_UP                      (ties away from zero)
FE_DEC_ROUND_HALF_DOWN                    (ties toward zero)
FE_DEC_ROUND_UP                           (away from zero)
Only the last two modes are optional in N1016.
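To make the "meaning" column concrete, the following sketch applies each of
the seven roundings to a signed integer coefficient while dropping its last
decimal digit. The enum round_mode_sketch, its constants, and the function
drop_digit are hypothetical names invented for this illustration; they are
not the N1016 or C99 macros, and nothing here touches the floating-point
environment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mode names, one per row of the table above. */
typedef enum {
    RM_TOWARD_ZERO,     /* N1016 "down"      */
    RM_HALF_EVEN,       /* N1016 "half even" */
    RM_CEILING,         /* toward +inf       */
    RM_FLOOR,           /* toward -inf       */
    RM_HALF_UP,         /* ties away from zero */
    RM_HALF_DOWN,       /* ties toward zero  */
    RM_AWAY_FROM_ZERO   /* N1016 "up"        */
} round_mode_sketch;

/* Drop the last decimal digit of v, rounding according to mode. */
int64_t drop_digit(int64_t v, round_mode_sketch mode)
{
    int64_t q = v / 10;             /* C99 division truncates toward zero */
    int64_t r = v % 10;             /* has the sign of v, or is zero */
    int64_t mag = r < 0 ? -r : r;   /* magnitude of the dropped digit */
    int64_t away = v < 0 ? -1 : 1;  /* one step away from zero */
    int bump = 0;

    switch (mode) {
    case RM_TOWARD_ZERO:    bump = 0; break;
    case RM_HALF_EVEN:      bump = mag > 5 || (mag == 5 && q % 2 != 0); break;
    case RM_CEILING:        bump = r > 0; break;
    case RM_FLOOR:          bump = r < 0; break;
    case RM_HALF_UP:        bump = mag >= 5; break;
    case RM_HALF_DOWN:      bump = mag > 5; break;
    case RM_AWAY_FROM_ZERO: bump = mag != 0; break;
    }
    return bump ? q + away : q;
}

/* Usage: the tie 2.5 goes three different ways depending on the mode. */
void rounding_usage(void)
{
    assert(drop_digit(25, RM_HALF_EVEN) == 2);
    assert(drop_digit(25, RM_HALF_UP) == 3);
    assert(drop_digit(25, RM_HALF_DOWN) == 2);
}
```

Note that "down" and "floor" differ only for negative values: drop_digit(-21,
RM_TOWARD_ZERO) yields -2, while drop_digit(-21, RM_FLOOR) yields -3.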
-----
Similarly, for floating-point exceptions, we have:
N1016                       C99
FE_DEC_DIVISION_BY_ZERO     FE_DIVBYZERO
FE_DEC_INVALID_OPERATION    FE_INVALID
FE_DEC_INEXACT              FE_INEXACT
FE_DEC_OVERFLOW             FE_OVERFLOW
FE_DEC_UNDERFLOW            FE_UNDERFLOW
There's no good reason for the differences in the first two lines.
-----
Many of the functions cited in N1016 are *not* present in the
latest draft of IEEE 754R, as advertised. I had to hunt down
specifics at:
http://www2.hursley.ibm.com/decimal/decbits.pdf
http://www2.hursley.ibm.com/decimal/decarith.pdf
-----
There's no reason for <math.h> to have HUGE_VALF, etc. followed
by DEC32_HUGE, etc. Once again, the names should not differ
arbitrarily.
It's also not clear why there should be a DEC_NAN and not a
DEC_INF (or DEC_INFINITY). Either both are easily generated
as inline expressions (0D/0D, 1D/0D, etc.) or neither is.
I favor defining both as macros (perhaps involving compiler
magic).
-----
N1016 calls for the interesting function:
T divide_integerxx(T x, T y);
(where xx stands for d32, d64, or d128).
This generates an integer quotient only if it's exactly representable.
But N1016 doesn't require the corresponding remainder function. I suggest
loosely following the pattern of remquo and adding an optional pointer to
where to return the remainder:
T divide_integerxx(T x, T y, T *prem);
Or we could follow the pattern of remquo even closer and replace this
function with remainder_integerxx that returns the quotient on the side.
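The three-argument form suggested above might behave as in the following
sketch. Everything here is an assumption made for illustration: the type
dec_sketch (an integer coefficient times a power of ten), the name
divide_integer_sketch, truncation of the quotient toward zero, and a
remainder that takes the sign of x. The check that the quotient is exactly
representable is omitted, and y is assumed to be nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decimal value: coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Return 10^n for small n >= 0 (no overflow check in this sketch). */
static int64_t pow10_i64(int32_t n)
{
    int64_t p = 1;
    while (n-- > 0)
        p *= 10;
    return p;
}

/*
 * Return the integer quotient of x / y, truncated toward zero. If prem is
 * not NULL, also store the remainder x - q * y through it, so the pointer
 * is optional in the manner described in the text.
 * Precondition: y.coeff != 0.
 */
dec_sketch divide_integer_sketch(dec_sketch x, dec_sketch y, dec_sketch *prem)
{
    dec_sketch q;

    /* Bring both operands to the smaller of the two exponents. */
    if (x.exp > y.exp) {
        x.coeff *= pow10_i64(x.exp - y.exp);
        x.exp = y.exp;
    } else if (y.exp > x.exp) {
        y.coeff *= pow10_i64(y.exp - x.exp);
        y.exp = x.exp;
    }

    q.coeff = x.coeff / y.coeff;    /* C99 division truncates toward zero */
    q.exp = 0;
    if (prem != NULL) {
        prem->coeff = x.coeff % y.coeff;
        prem->exp = x.exp;          /* remainder keeps the common exponent */
    }
    return q;
}

/* Usage: 7.5 / 2 gives quotient 3 and remainder 1.5. */
void divide_usage(void)
{
    dec_sketch x = { 75, -1 }, y = { 2, 0 }, rem;
    dec_sketch q = divide_integer_sketch(x, y, &rem);
    assert(q.coeff == 3 && q.exp == 0);
    assert(rem.coeff == 15 && rem.exp == -1);
}
```

A caller that does not need the remainder passes NULL for prem.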
-----
N1016 calls for the function:
T remainder_nearxx(T x, T y);
This has the same specification as the C99 remainder function.
It should share the same root name (e.g. remainderd32, after remainderf).
-----
N1016 calls for the function:
T round_to_integerxx(T x, T y);
This has the same specification as the C99 rint function.
It should share the same root name.
-----
N1016 calls for the interesting function:
T normalizexx(T x);
This shifts the coefficient right until the least-significant decimal
digit is nonzero, or it changes a zero value to canonical form. It's not
clear what should happen if such a shift would cause an overflow,
but that behavior must be specified. (It's also not clear what the
purpose of this function is in the best of circumstances, but maybe
I haven't read and played enough to understand.)
Finally, this function does exactly the opposite of what "normalize"
has meant as a term of art in floating-point for many decades. The
name suggests that the coefficient is shifted *left* until the
*most-significant* decimal digit is nonzero. (And I can even think
of uses for that operation.) Either the spec should change or the
name.
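The two senses of "normalize" can be contrasted in a short sketch. The type
dec_sketch, both function names, and the limits SKETCH_EXP_MAX and
SKETCH_DIGITS are hypothetical. Stopping the right shift at the exponent
limit is only one possible answer to the overflow question raised above,
and mapping zero to exponent 0 is an assumed canonical form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decimal value: coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Assumed limits for the sketch: a 7-digit coefficient, exponent cap 90. */
enum { SKETCH_DIGITS = 7, SKETCH_EXP_MAX = 90 };

/* The N1016 sense as described in the text: strip trailing zero digits. */
dec_sketch strip_zeros_sketch(dec_sketch x)
{
    if (x.coeff == 0) {
        x.exp = 0;              /* assumed canonical form of zero */
        return x;
    }
    /* Assumption: stop at the exponent limit instead of overflowing. */
    while (x.coeff % 10 == 0 && x.exp < SKETCH_EXP_MAX) {
        x.coeff /= 10;
        x.exp += 1;
    }
    return x;
}

/* The traditional sense: shift left until the top digit is nonzero. */
dec_sketch left_normalize_sketch(dec_sketch x)
{
    int64_t limit = 1;
    int i;

    for (i = 1; i < SKETCH_DIGITS; ++i)
        limit *= 10;            /* 10^(SKETCH_DIGITS - 1) */
    if (x.coeff == 0)
        return x;
    /* No lower exponent bound is enforced in this sketch. */
    while (x.coeff < limit && x.coeff > -limit) {
        x.coeff *= 10;
        x.exp -= 1;
    }
    return x;
}

/* Usage: 12.00 becomes 12 one way and 1200000 * 10^-5 the other. */
void normalize_usage(void)
{
    dec_sketch x = { 1200, -2 };
    dec_sketch r = strip_zeros_sketch(x);
    dec_sketch l = left_normalize_sketch(x);
    assert(r.coeff == 12 && r.exp == 0);
    assert(l.coeff == 1200000 && l.exp == -5);
}
```

Both results denote the same numeric value; they differ only in which
member of the cohort of equal-valued representations is chosen.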
-----
N1016 calls for the function:
bool check_quantumxx(double x, double y);
This returns true only if x and y have the same exponent. There
is no spec for this function in N1016, but it is clearly the same
as the function same_quantum in the defining document from IBM.
I see no good reason for changing the name.
-----
N1016 calls for the interesting function:
T quantizexx(T x, T y);
This changes x, as need be, to have the same exponent as y. It is,
in effect, a "round to N decimal places" function. In conjunction
with the rounding mode, it provides the much-touted "proper" decimal
rounding rules to match various government rules and commercial
practices. It's not entirely clear to me (yet) why a similar
function couldn't reap much the same benefit even when used with
binary floating-point. (I hasten to add that there are still other
good reasons for using decimal floating-point instead.) In any
event, I believe that it can and should be generalized so that
its application to binary floating-point makes equal sense.
Unfortunately, the function in its current form gets its parameter
N from the *decimal exponent* of y. That supports cute notation such
as:
price = quantized32(price * (1DF + tax_rate), 1.00DF);
assuming that you find "1.00DF" more revealing than "2". But it
thus relies heavily on the ability to write literals (or generate
values) with a known decimal exponent. An earlier version evidently
required/permitted you to write the number of decimal digits instead.
I've long used an internal library function that truncates binary
floating-point values to a specified number of binary places (plus or
minus). I could see a real benefit in having both binary and
decimal versions of "quantize" that apply to floating-point
values of either base. But this particular form does not generalize
at all well.
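The difference between the two calling conventions can be sketched as
follows. The type dec_sketch and all three function names are hypothetical.
The sketch rounds ties to even unconditionally, whereas the N1016 function
is described as working in conjunction with the current rounding mode, and
it performs no overflow checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decimal value: coeff * 10^exp. */
typedef struct {
    int64_t coeff;
    int32_t exp;
} dec_sketch;

/* Return 10^n for small n >= 0 (no overflow check in this sketch). */
static int64_t pow10_i64(int32_t n)
{
    int64_t p = 1;
    while (n-- > 0)
        p *= 10;
    return p;
}

/* Rescale x to target_exp; dropped digits are rounded ties-to-even. */
dec_sketch quantize_exp_sketch(dec_sketch x, int32_t target_exp)
{
    if (target_exp > x.exp) {
        int64_t div = pow10_i64(target_exp - x.exp);
        int64_t q = x.coeff / div;
        int64_t r = x.coeff % div;
        int64_t twice = 2 * (r < 0 ? -r : r);   /* compare |r| to div / 2 */
        if (twice > div || (twice == div && q % 2 != 0))
            q += (x.coeff < 0) ? -1 : 1;
        x.coeff = q;
    } else {
        x.coeff *= pow10_i64(x.exp - target_exp);
    }
    x.exp = target_exp;
    return x;
}

/* N1016-style form: the target exponent is read from the exponent of y. */
dec_sketch quantize_like_sketch(dec_sketch x, dec_sketch y)
{
    return quantize_exp_sketch(x, y.exp);
}

/* Digit-count form: n digits after the decimal point (n may be negative). */
dec_sketch quantize_places_sketch(dec_sketch x, int32_t n)
{
    return quantize_exp_sketch(x, -n);
}

/* Usage: 21.639175 rounded to cents, written both ways. */
void quantize_usage(void)
{
    dec_sketch total = { 21639175, -6 };    /* 19.99 * 1.0825 */
    dec_sketch cents = { 100, -2 };         /* plays the role of 1.00DF */
    dec_sketch a = quantize_like_sketch(total, cents);
    dec_sketch b = quantize_places_sketch(total, 2);
    assert(a.coeff == 2164 && a.exp == -2);
    assert(b.coeff == a.coeff && b.exp == a.exp);
}
```

Only the digit-count form has an obvious analog in another base, since it
does not depend on the second argument carrying a decimal exponent.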
-----
N1016 does *not* call for several other functions that suggest themselves.
These include decimal equivalents of frexp and ldexp, and possibly an
exp10. I know from past experience that some of these are highly useful
in writing math functions; but I need more experience writing IEEE 754R
decimal floating-point math functions before I can make really
informed recommendations.
Nevertheless, the absence of such functions, and the lapses in the
functions presented in N1016, suggest to me the need for more
experience in using the decimal stuff from IEEE 754R as a general
floating-point data type before we freeze a TR of any sort, for either
C or C++.