Audience: SG6
S. Davis Herring <>
Los Alamos National Laboratory
November 20, 2018

# Introduction

SG6 discussed John McFarlane’s P0037R5 on Friday in San Diego, and my realization was that much of the difficulty in specifying a fixed-point number type hinges on identifying the “natural” behavior for multiplication and division. The group asked me to write about the different possibilities; the hope is that specific choices can then be made to increase the focus of future development.

The most obvious choice of type for a homogeneous binary operation (in the absence of special auto-widening) is that of the inputs. (For a C++ fixed-point type, we expect the type to include the allocation of its digits to integer and fractional parts.) This choice is subject to overflow and imprecision: 12.5×23.6 is 295 (which is too big for the dd.f format of the inputs), and 01.3×02.4 is 3.12 (which lies between dd.f values). Nonetheless, for fixed-width arithmetic it might seem to be a reasonable a priori default.

Note that fixed-point formats can include leading or trailing zeroes. For example, 0.0fff or ddd00 store only three digits each; they can be said to have negative numbers of integer and fractional digits, respectively. The example values below avoid such formats (and use decimal digits) for clarity.

# Underlying integer operations

In Jacksonville, I asked John about the choice made by his `fixed_point` template: in particular, why the multiplication of two values of format dddd.ff (with the same width of result) was dd.ffff rather than rounding to fit back into the same dddd.ff format. In particular, I was bothered by the increased risk of overflow from the smaller maximum value (e.g., even 1.52×0.07 overflows d.ff). He explained that the multiplication is multiplication of the underlying integers, along with a separate (static) operation of adjusting the exponents; it therefore shares the underlying integer multiplication’s properties of being exact and overflow-prone in the absence of widening.

P0037R5 says that `fixed_point` division is analogously defined as division of the underlying integers with a separate exponent selection: dividing dd.fff by d.f gives ddd.ff. (For example, dividing 12.345 by 0.1 is 123.45.) Continuing the analogy with integer operations, this truncates and cannot overflow for a same-sized result (disregarding −1 and 0 divisors); this necessitates the increased maximum value of the result (when there are any fractional digits in the divisor). Despite these familiar properties, the loss of precision can be as surprising as the overflow from multiplication: 5.2163/1.6604 is just 3, because 5.2163/0.0003 needs to be 17387 (and so the 3 is actually 00003). More generally, dividing any format by itself yields an integer.

Division of decimal IEEE 754 numbers, where the exponent is (partially) independent of the value, affords an interesting comparison here; it starts with the `fixed_point` rule, but shifts more digits into the significand as needed to avoid (or minimize) rounding. For a type with a static exponent, of course widening is the only means of adding those additional digits.

## Precise division

There are two strategies for getting more fractional digits in a quotient: shift the divisor right, reducing its precision (5.2163/0001.6 is 03.260) or shift the dividend left, risking overflow (001.00/000.07 is 00014, but 1.0000/000.07 is 014.28). The overflow from shifting corresponds to that from multiplication; it may of course be avoided by increasing the width of the dividend by the shift distance (5.216300000/1.6604 is 00003.14159). The division never overflows (as above), but narrowing back to the original format might (2.718300000/0.2327 is 00011.68156, which doesn’t fit back into d.ffff).

Lawrence Crowl in P0106R0 points out that shifting the dividend left by (at least) the width of the divisor guarantees that no quotient rounds (or even truncates) to zero. (There is a typo in the paper; the resolution parameter is supposed to be @a-\$b. John’s P0554R0 contradicts P0037R5 and says that `fixed_point` does this same shift.) We may call this “quasi-exact division”; it is analogous to multiplication in that the width of the result is the sum of those of the operands. Since this shift distance is known statically, it can be used to select a wider integer type (assuming a suitable trait for the width of the divisor). (Shifting by twice the width of the divisor guarantees that any change to the dividend or divisor (but not both) changes the result.)

# Width reduction

It is then a basic issue for these two operations that an “exact” result without overflow requires doubling the width. Applications with statically bounded expression depths can use width-tracking class templates (e.g., `integral` or `elastic_integer`) to statically preclude overflow and imprecision with fixed-width (possibly primitive) underlying types. (Note that division cannot reduce the dividend’s width because the divisor might be 1.) However, any loop with an accumulator absolutely must reduce the width of its results, as must anything with a large expression depth to avoid the memory and time cost of over-wide arithmetic.

Whether such reduction should discard the most significant digits (risking overflow), the least significant digits (risking imprecision), or some of each is of course application-specific. (E.g., when dividing 12.345 by 1.1, discarding one of each to get the format (d)dd.fff(f) best captures the result as 11.222.) In the absence of shifting, `int` discards high digits for multiplication and low digits for division; P0037R5 does the same, or with an `elastic_integer` representation type discards none for multiplication.

## When to shift

The P0037R5 approach follows “don’t pay for what you don’t use” by performing division without any shifts. However, shifts may still happen later:

``````template<class T>   // C++03 style
T normalize(T a,T b,T x) {
return (x-a)/(b-a);
}
auto f() {
using fix=fixed_point<int,-16>;
return normalize(fix(1.25),fix(2),fix(1.5));
}``````

Such shifting after division is pernicious: this `f` returns what should be 1/3 with 16 fractional bits, but all are zeroes shifted in by the conversion in `normalize`.

Even when producing a widened result, initial shifting (as used by P0106R0’s `nonnegative`/`negatable`) is necessary to avoid such behavior. Width reduction requires shifting the results of both multiplication and division. Width-reduced division may nonetheless be performed with one shift because dividend digits never influence more-significant quotient digits; trailing quotient digits may be prevented by reducing the precision shift by the same number of digits. (If the left shift is performed with signed arithmetic, an optimizer could notionally take advantage of undefined behavior for overflow to combine these.)

# The result type of `/`

Generic code cannot decorate operands to guide the widening or shifting for an operator and prevent surprising results like 1.5/0.25 being zero (tens). Neither can it perform conversions on the operator’s result, except perhaps back to an input type (which requires shifting to avoid false precision). In particular, the likelihood of code like

``````auto score(auto a,auto b,auto x,auto y) {
return (a*x+b*y)/(x*x+y*y);
}``````

argues against any approach based on expression templates (especially those containing references to their operands).

The only other means of customizing an operator is to provide alternative implementations in namespaces; this too is unusable in generic code (and precludes having a default at all). The most broadly useful value-type default for each operator must therefore be chosen. For division, the quasi-exact precision is the obvious candidate: it is usefully consistent to have multiplication and division widen their results identically.

## Sharp edges

The requisite shift may overflow a fixed-width (perhaps primitive) representation type for fixed-point numbers. Support for such underlying types is a deliberate feature of `fixed_point`, but may prove too dangerous. One could make the shifting behavior depend on the underlying integer type, but the inconsistency would be hard to teach.

Note that far more than the quasi-exact precision is wanted in certain cases (e.g., a pre-computed reciprocal of a small integer with 32-bit accuracy). Users of a generic function that might perform such a division must pad the dividend on the right (or, with automatic widening, the divisor on the left) with zeroes to avoid silent precision loss.

# Conclusion

The shifting recommended by P0106R0 and P0554R0 (but not P0037R5) has a non-zero cost and may preclude the direct use of primitive representation types. However, it is the best default in an environment that relies very heavily on defaults. It may be useful to provide convenient function templates that determine and apply the necessary adjustments to obtain a given precision or avoid shifting.

# Acknowledgments

Thanks to Hubert Tong for suggesting the comparison to decimal floating-point arithmetic. Thanks to Matthias Kretz for pointing out that shifting for precision could be widening. Thanks to Rachel Ertl for helping derive conclusion from observation.