Document number: LEWG, EWG, SG14, SG6: P0037R0
Project: Programming Language C++, Library Evolution WG, SG14
Reply-to: John McFarlane, email@example.com
This proposal introduces a system for performing binary fixed-point arithmetic using built-in integral types.
Floating-point types are an exceedingly versatile and widely supported method of expressing real numbers on modern architectures.
However, there are certain situations where fixed-point arithmetic is preferable. Some systems lack native floating-point registers and must emulate them in software. Many others are capable of performing some or all operations more efficiently using integer arithmetic. Certain applications can suffer from the variability in precision which comes from a dynamic radix point . In situations where a variable exponent is not desired, it takes valuable space away from the significand and reduces precision.
Built-in integer types provide the basis for an efficient representation of binary fixed-point real numbers. However, laborious, error-prone steps are required to normalize the results of certain operations and to convert to and from fixed-point types.
A set of tools for defining and manipulating fixed-point types is proposed. These tools are designed to make work easier for those who traditionally use integers to perform low-level, high-performance fixed-point computation.
This proposal is a pure library extension. It does not require changes to any standard classes, functions or headers.
The design is driven by the following aims in roughly descending order:
Fixed-point numbers are specializations of
where the template parameters are described as follows.
ReprTypeType Template Parameter
This parameter identifies the capacity and signedness of the
underlying type used to represent the value. In other words, the size
of the resulting type will be
sizeof(ReprType) and it will be
is_signed<ReprType>::value is true. The default is
ReprType must be a fundamental integral type and should not be the
largest size. Suitable types include:
In limited situations,
std::uint64_t can be used.
The reasons for these limitations relate to the difficulty in finding
a type that is suitable for performing lossless integer
ExponentNon-Type Template Parameter
The exponent of a fixed-point type is the equivalent of the exponent
field in a floating-point type and shifts the stored value by the
requisite number of bits necessary to produce the desired range. The
default value of
Exponent is zero, giving
fixed_point<T> the same
The resolution of a specialization of
and the minimum and maximum values are
Any usage that results in values of
Exponent which lie outside the
INT_MIN / 2,
INT_MAX / 2), may result in undefined
behavior and/or overflow or underflow. This range of exponent values
is far in excess of the largest built-in floting-point type and should
be adequate for all intents and purposes.
Exponent template parameter is versatile and concise. It is an
intuitive scale to use when considering the full range of positive and
negative exponents a fixed-point type might possess. It also
corresponds to the exponent field of built-in floating-point types.
However, most fixed-point formats can be described more intuitively by the cardinal number of integer and/or fractional digits they contain. Most users will prefer to distinguish fixed-point types using these parameters.
For this reason, two aliases are defined in the style of
These aliases are declared as:
They resolve to a
fixed_point specialization with the given
signedness and number of integer and fractional digits. They may
contain additional integer and fractional digits.
For example, one could define and initialize an 8-bit, unsigned, fixed-point variable with four integer digits and four fractional digits:
or a 32-bit, signed, fixed-point number with two integer digits and 29 fractional digits:
Fixed-point numbers can be explicitly converted to and from built-in arithmetic types.
While effort is made to ensure that significant digits are not lost
during conversion, no effort is made to avoid rounding errors.
Whatever would happen when converting to and from an integer type
largely applies to
fixed_point objects also. For example:
true and is considered a acceptable rounding error.
Any operators that might be applied to integer types can also be applied to fixed-point types. A guiding principle of operator overloads is that they perform as little run-time computation as is practically possible.
With the exception of shift and comparison operators, binary operators can take any combination of:
Where the inputs are not identical fixed-point types, a simple set of promotion-like rules are applied to determine the return type:
The reasoning behind this choice is a combination of predictability and performance. It is explained for each rule as follows:
Shift operator overloads require an integer type as the right-hand parameter and return a type which is adjusted to accommodate the new value without risk of overflow or underflow.
Comparison operators convert the inputs to a common result type
following the rules above before performing a comparison and returning
Because arithmetic operators return a result of equal capacity to their inputs, they carry a risk of overflow. For instance,
causes overflow because because a type with 4 integer bits cannot store a value of 16.
Overflow of any bits in a signed or unsigned fixed-point type is classed as undefined behavior. This is a minor deviation from built-in integer arithmetic where only signed overflow results in undefined behavior.
The other typical cause of lost bits is underflow where, for example,
results in a value of 7. This results in loss of precision but is generally considered acceptable.
However, when all bits are lost due to underflow, the value is said to be flushed and this is classed as undefined behavior.
Errors resulting from overflow and flushes are two of the biggest headaches related to fixed-point arithmetic. Integers suffer the same kinds of errors but are somewhat easier to reason about as they lack fractional digits. Floating-point numbers are largely shielded from these errors by their variable exponent and implicit bit.
Three strategies for avoiding overflow in fixed-point types are presented:
For arithmetic operators, choice 1) is taken because it most closely follows the behavior of integer types. Thus it should cause the least surprise to the fewest users. This makes it far easier to reason about in code where functions are written with a particular type in mind. It also requires the least computation in most cases.
Choices 2) and 3) are more robust to overflow events. However, they represent different trade-offs and neither one is the best fit in all situations. For these reasons, they are presented as named functions.
promote, borrows a term from the language
feature which avoids integer overflow prior to certain operations. It
fixed_point object and returns the same value represented
by a larger
is equivalent to
Complimentary function template,
demote, reverses the process,
returning a value of a smaller type.
The following named function templates can be used as a hassle-free alternative to arithmetic operators in situations where the aim is to avoid overflow.
trunc_functions return the result as a type no larger than the inputs and with an exponent adjusted to avoid overflow;
promote_functions return the result as a type large enough to avoid overflow and underflow;
_squarefunctions are not guaranteed to be available for 64-bit types;
_squarefunctions produce undefined behavior when all input parameters are the most negative number;
_squarefunctions return an unsigned type;
_dividefunctions take heterogeneous
_reciprocalfunctions in no way guard against divide-by-zero errors;
trunc_shift_functions return results of the same type as their first input parameter and
The following example calculates the magnitude of a 3-dimensional vector.
Calling the above function as follows
returns the value, 9.890625.
Because the aim is to provide an alternative to existing arithmetic types which are supported by the standard library, it is conceivable that a future proposal might specialize existing class templates and overload existing functions.
Possible candidates for overloading include the functions defined in
numeric_limits. A new type
is_fixed_point, would also be useful.
fixed_point is intended to provide drop-in replacements to
existing built-ins, it may be preferable to deviate slightly from the
behavior of certain standard functions. For example, overloads of
functions from \
errno in the case of an error
prevents a function from being defined as pure. This highlights a
wider issue surrounding the adoption of the functional approach and
compile-time computation that is beyond the scope of this document.
The reason that
ReprType is restricted to built-in integer types
is that a number of features require the use of a higher - or
lower-capacity type. Supporting alias templates are defined to
fixed_point with the means to invoke integer types of
specific capacity and signedness at compile time.
There is no general purpose way of deducing a higher or
lower-capacity type given a source type in the same manner as
make_unsigned. If there were, this might be
adequate to allow alternative choices for
The bounded::integer library  exemplifies the benefits of keeping track of ranges of values in arithmetic types at compile time.
To a limited extent, the
trunc_ functions defined here also keep
track of - and modify - the limits of values. However, a combination
of techniques is capable of producing superior results.
For instance, consider the following expression:
The type of
make_ufixed<8, 0> but its value does not
exceed 81. Hence, an integer bit has been wasted. It may be possible
to track more accurate limits in the same manner as the
bounded::integer library in order to improve the precision of types
trunc_ functions. For this reason, the exact value of
the exponents of these return types is not given.
The behavior of the types specialized from
one sub-set of all potentially desirable behaviors. Alternative
One way to extend
fixed_point to cover these alternatives would be
to add non-type template parameters containing bit flags or enumerated
types. The default set of values would reflect
fixed_point as it
Many examples of fixed-point support in C and C++ exist. While almost all of them aim for low run-time cost and expressive alternatives to raw integer manipulation, they vary greatly in detail and in terms of their interface.
One especially interesting dichotomy is between solutions which offer a discrete selection of fixed-point types and libraries which contain a continuous range of exponents through type parameterization.
One example of the former is found in proposal N1169 , the intent of which is to expose features found in certain embedded hardware. It introduces a succinct set of language-level fixed-point types and impose constraints on the number of integer or fractional digits each can possess.
As with all examples of discrete-type fixed-point support, the limited choice of exponents is a considerable restriction on the versatility and expressiveness of the API.
Nevertheless, it may be possible to harness performance gains provided by N1169 fixed-point types through explicit template specialization. This is likely to be a valuable proposition to potential users of the library who find themselves targeting platforms which support fixed-point arithmetic at the hardware level.
There are many other C++ libraries available which fall into the latter category of continuous-range fixed-point arithmetic   . In particular, an existing library proposal, N3352 , aims to achieve very similar goals through similar means and warrants closer comparison than N1169.
N3352 introduces four class templates covering the quadrant of signed versus unsigned and fractional versus integer numeric types. It is intended to replace built-in types in a wide variety of situations and accordingly, is highly compile-time configurable in terms of how rounding and overflow are handled. Parameters to these four class templates include the storage in bits and - for fractional types - the resolution.
fixed_point class template could probably - with a few caveats -
be generated using the two fractional types,
negatable, replacing the
ReprType parameter with the integer bit
fastest for the rounding mode and
undefined as the overflow mode.
However, fixed_point more closely and concisely caters to the needs of users who already use integer types and simply desire a more concise, less error-prone form. It more closely follows the four design aims of the library and - it can be argued - more closely follows the spirit of the standard in its pursuit of zero-cost abstraction.
Some aspects of the design of the N3352 API which back up these conclusion are that:
trunc_function templates and are potentially more costly at run-time;
The added versatility that the N3352 API provides regarding rounding and overflow handling are of relatively low priority to users who already bear the scars of battles with raw integer types. Nevertheless, providing them as options to be turned on or off at compile time is an ideal way to leave the choice in the hands of the user.
Many high-performance applications - in which fixed-point is of potential value - favor run-time checks during development which are subsequently deactivated in production builds. The N3352 interface is highly conducive to this style of development. It is an aim of the fixed_point design to be similarly extensible in future revisions.
Subgroup: Guy Davidson, Michael Wong
Contributors: Ed Ainsley, Billy Baker, Lance Dyson, Marco Foco, ClÃ©ment GrÃ©goire, Nicolas Guillemot, Matt Kinzelman, JoÃ«l Lamotte, Sean Middleditch, Patrice Roy, Peter Schregle, Ryhor Spivak
An in-development implementation of the fixed_point class template and
its essential supporting functions and types is available
. It includes a
utility header containing such things as math and trigonometric
functions and a partial
numeric_limits specialization. Compile-time
and run-time tests are included as well as benchmarking support. It is
the source of examples and measurements cited here.
Despite a focus on usable interface and direct translation from integer-based fixed-point operations, there is an overwhelming expectation that the source code result in minimal instructions and clock cycles. A few preliminary numbers are presented to give a very early idea of how the API might perform.
Figures were taken from a single CPU, OS and compiler, namely:
Fixed inputs were provided to each function, meaning that branch prediction rarely fails. Results may also not represent the full range of inputs.
Where applicable various combinations of integer, floating-point and fixed-point types were tested with the following identifiers:
int64_tbuilt-in integer types;
long doublebuilt-in floating-point types;
Plus, minus, multiplication and division were tested in isolation using a number of different numeric types with the following results:
add(long double) 3.46011
sub(long double) 3.55474
mul(long double) 102.446
div(long double) 8.304
Among the slowest types are
long double. It is likely that they are
emulated in software. The next slowest operations are fixed-point
multiply and divide operations - especially with 64-bit types. This is
because values need to be promoted temporarily to double-width types.
This is a known fixed-point technique which inevitably experiences
slowdown where a 128-bit type is required on a 64-bit system.
Here is a section of the disassembly of the s15:16 multiply call:
The two 32-bit numbers are multiplied together and the result shifted
down - much as it would if raw
int values were used. The efficiency
of this operation varies with the exponent. An exponent of zero should
mean no shift at all.
sqrt implementation has not yet been tested with
fixed_point. (The naive implementation takes over 300ns.) For this
reason, a magnitude-squared function is measured, combining multiply
and add operations:
Only real number formats are tested:
long double 4.5056
Again, the size of the type seems to have the largest impact.
A similar operation includes a comparison and branch:
long double 6.4
Again, fixed-point and native performance are comparable.