1. Abstract
This proposal allows implementations to define extended floating-point types in addition to the three standard floating-point types. It defines the rules for how the extended floating-point types interact with each other and with other types without changing the behavior of the existing standard floating-point types. It specifies the rules for type conversions, arithmetic conversions, promotions, and narrowing conversions. It specifies the necessary standard library support for the extended floating-point types.
The companion paper, [P1468], defines some standard names for commonly used floating-point formats. The end goal of these two papers is to enable the use of newer floating-point types, such as IEEE 16-bit, in standard-conforming code.
2. Revision history
2.1. R0 -> R1 (pre-Cologne)
Applied guidance from SG6 in Kona 2019:

Make the floating-point conversion rank not ordered between types with overlapping (but not subsetted) ranges of finite values. This makes the ranking a partial order.

Narrowing conversions are now based on floating-point conversion rank instead of ranges of finite values, which preserves the current narrowing conversion relations between standard floating-point types; it also interacts favorably with the rank being a partial ordering.

Operations that deal with floating-point types whose conversion ranks are unordered are now ill-formed.

The relevant parts of the guidance have been applied to the library wording section as well.
Afterwards, applied suggestions from EWGI in Kona 2019 (this modifies some of the points above):

Apply the suggestion to make types where one has a wider range of finite values, but a lower precision than the other, unordered in their conversion rank, and therefore make operations that mix them ill-formed. The motivating example was IEEE-754 binary16 and bfloat16; see Floating-point conversion rank for more details. This change also caused this paper to drop the term "range of finite values", since the modified semantics are better expressed in terms of sets of values of the types.

Add a change to narrowing conversions, to only allow exact conversions to happen.

Explicitly list parts of the language that are not changed by this paper; provide a more detailed analysis of the standard library impact.
2.2. R1 -> R2 (pre-Belfast)
Changes based on feedback in Cologne from SG6, LEWGI, and EWGI. Further changes came from further development of the paper by the authors, especially the overload resolution section.

Revised floating-point promotion rules. Removed all promotions other than float to double. Added wording for promoting values passed to varargs functions.

Added the section on implicit conversions.

Added the section on overload resolution.

Added the sections on feature test macros.

Added the sections about the possibility of new library traits.

Changed the wording for the abs function in the <cmath> section.

Added constraints to the I/O streams overloads for complex to only support standard floating-point types.

Added the section about possible changes to <atomic>.
2.3. R2 -> R3 (pre-Prague)
Changes based on feedback in Belfast from EWG.

Change the overload resolution rules, removing the rule that prefers one standard conversion over another based on conversion rank. Replace it with a rule that prefers one standard conversion over another only when the two types have the same representation.

As a result of the overload resolution change, change floating-point promotion so that any type smaller than double promotes to double.

Allow implicit conversions between pointer types that point to floating-point types with the same representation.
3. Motivation
16-bit floating-point support is becoming more widely available in both hardware (ARM CPUs and NVIDIA GPUs) and software (OpenGL, CUDA, and LLVM IR). Programmers wanting to take advantage of 16-bit floating-point support have been stymied by the lack of built-in compiler support for the type. A common workaround is to define a class type with all of the conversion operators and overloaded arithmetic operators to make it behave as much as possible like a built-in type. But that approach is cumbersome and incomplete, requiring inline assembly or other compiler-specific magic to generate efficient code.
The problem of efficiently using newer floating-point types that haven’t traditionally been supported can’t be solved through user-defined libraries. A possible solution of an implementation changing float to be a 16-bit type would be unpopular because users want support for newer floating-point types in addition to the standard types, and because users have come to expect float and double to be 32- and 64-bit types and have lots of existing code written with that assumption.
This problem is worth solving, and there is no viable solution under the current standard. So changing the core language in an extensible and backward-compatible way is appropriate. Providing a standard way for implementations to support 16-bit floating-point types will result in better code, more portable code, and wider use of those types.
This paper changes the language so that implementations can support 16-bit and other non-standard floating-point types. [P1468] gives well-known names to 16-bit and other commonly used floating-point types.
The motivation for the current approach of extended floating-point types comes from discussion of the previous paper [P0192]. That proposal’s single new standard type of short float was considered insufficient, preventing the use of both IEEE-754 16-bit and bfloat16 in the same application. When that proposal was rejected, the current, more expansive, proposal was developed. It is not feasible to predict which floating-point types, or even how many different types, will be used in the future, so this proposal allows for as many types as the implementation sees fit.
The language rules in this paper and the type aliases in [P1468] are designed to work together to simplify the safe adoption of the new floating-point types into existing applications. Programmers should be able to start using the 16-bit types in one part of the application without having to change other parts. When float and double are IEEE-conformant types, it should be possible to mix the standard types with their fixed-layout aliases without problems. This proposal would be a failure if code using the IEEE 64-bit type alias had to be kept mostly separate from code using double.
4. Proposal summary
In a nutshell:

Introduce extended floating-point types.

Define floating-point conversion rank, which governs how floating-point types interact with each other.

Adjust the rules for promotion, standard conversions, usual arithmetic conversions, narrowing conversions, and overload resolution to make use of conversion rank.

Add function overloads and template specializations for extended floating-point types to <cmath>, <charconv>, <format>, <complex>, and <atomic>.
5. Core language changes
5.1. Things that aren’t changing
It is currently implementation-defined whether or not the floating-point types support infinity and NaN. That is not changing. That feature will still be implementation-defined, even for extended floating-point types.
The radix of the exponent of each floating-point type is currently implementation-defined. That is not changing. This paper will make it easier for the radix of extended floating-point types to be different from the radix of the standard types.
5.2. Extended floating-point types
In addition to the three standard floating-point types, float, double, and long double, implementations may define any number of extended floating-point types, similar to how implementations may define extended integer types.
5.2.1. Reasoning
It is not possible to accurately predict, years into the future, which floating-point types will have hardware support. The standard needs to provide an extensible solution that can adapt to changing hardware without having to modify the standard.
5.2.2. Wording
Modify 6.7.1 "Fundamental types" [basic.fundamental] paragraph 12:
There are three standard floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. There may also be implementation-defined extended floating-point types. The standard and extended floating-point types are collectively called floating-point types. The value representation of floating-point types is implementation-defined. [...]
5.3. Conversion rank
Define floating-point conversion rank to mimic in some ways the existing integer conversion rank. Floating-point conversion rank is defined in terms of the sets of values that the types can represent. If the set of values of type T is a strict superset of the set of values of type U, then T has a higher conversion rank than U. If two types have the exact same sets of values, they still have different conversion ranks; see the wording below for the exact rules. If the sets of values of two types are neither a subset nor a superset of each other, then the conversion ranks of the two types are unordered. Floating-point conversion rank forms a partial order, not a total order; this is the biggest difference from integer conversion rank.
5.3.1. Reasoning
Earlier versions of this proposal used the range of finite values to define conversion rank, and had the conversion rank be a total ordering. Discussions in SG6 in Kona pointed out that that definition resulted in undesirable interactions between IEEE binary16 with 5-bit exponent and 10-bit mantissa, and bfloat16 with 8-bit exponent and 7-bit mantissa. bfloat16 has a much larger finite range, so it would have a higher conversion rank under the old rules. Mixing binary16 and bfloat16 in an arithmetic operation would result in the binary16 value being converted to bfloat16 despite the loss of three bits of precision. This implicit loss of precision was worrisome, so the definition of conversion rank was changed so that the usual arithmetic conversions between two floating-point values always preserve the value exactly.
For the purposes of conversion rank, infinity and NaN are treated just like any other values. If type T supports infinity and type U does not, then U can never have a higher conversion rank than T, even if U has a bigger range and a longer mantissa.
5.3.2. Wording
Change the title of section 6.7.4 [conv.rank] from "Integer conversion rank" to "Conversion ranks", but leave the stable name unchanged. Insert a new paragraph at the end of the subclause:
Every floating-point type has a floating-point conversion rank defined as follows:
The rank of a floating-point type T is greater than the rank of any floating-point type whose set of values is a proper subset of the set of values of T.
The rank of long double is greater than the rank of double, which is greater than the rank of float.
The rank of any standard floating-point type is greater than the rank of any extended floating-point type with the same set of values.
The rank of any extended floating-point type relative to another extended floating-point type with the same set of values is implementation-defined, but still subject to the other rules for determining the floating-point conversion rank.
For all floating-point types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.
[ Note: The conversion ranks of extended floating-point types T1 and T2 will be unordered if the set of values of T1 is neither a subset nor a superset of the set of values of T2. This can happen when one type has both a larger range and a lower precision than the other. -- end note ]
[ Note: The floating-point conversion rank is used in the definition of the usual arithmetic conversions ([expr.arith.conv]). -- end note ]
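The subset-based ordering can be sketched in ordinary C++14. This is only an illustration, not proposed wording: the Format struct, the subset and compare functions, and the Order enum are hypothetical names, and the model assumes radix-2 formats that all support infinity and NaN, so that a set of values is determined by the significand width and the exponent range.

```cpp
// Hypothetical model of a binary floating-point format (not from the paper).
// Assumption: radix 2, with subnormals, infinities, and NaNs in every format.
struct Format {
    int precision;     // significand bits, including the implicit leading bit
    int max_exponent;  // largest unbiased exponent of a finite value
    int min_exponent;  // smallest unbiased exponent of a normal value
};

constexpr Format binary16{11, 15, -14};
constexpr Format bfloat16{8, 127, -126};
constexpr Format binary32{24, 127, -126};
constexpr Format binary64{53, 1023, -1022};

enum class Order { Less, Equal, Greater, Unordered };

// True if every value of `a` is also a value of `b`: `b` needs at least as
// many significand bits, at least as large a maximum exponent, and a
// smallest subnormal that is no larger than that of `a`.
constexpr bool subset(Format a, Format b) {
    return a.precision <= b.precision
        && a.max_exponent <= b.max_exponent
        && a.min_exponent - a.precision >= b.min_exponent - b.precision;
}

// Compares two formats by their sets of values. Order::Equal means "same set
// of values"; the proposal still assigns such types different ranks through
// separate tie-break rules, which this sketch does not model.
constexpr Order compare(Format a, Format b) {
    const bool a_in_b = subset(a, b);
    const bool b_in_a = subset(b, a);
    if (a_in_b && b_in_a) return Order::Equal;
    if (a_in_b) return Order::Less;
    if (b_in_a) return Order::Greater;
    return Order::Unordered;
}

static_assert(compare(binary16, binary32) == Order::Less, "binary16 is a subset of binary32");
static_assert(compare(bfloat16, binary32) == Order::Less, "bfloat16 is a subset of binary32");
static_assert(compare(binary16, bfloat16) == Order::Unordered, "neither is a subset of the other");
```

In this model binary16 and bfloat16 come out unordered because binary16 has more precision while bfloat16 has a larger exponent range, which is the case the second note describes.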
5.4. Promotion
All floating-point types with a conversion rank that is less than the rank of double promote to double. (This automatically covers arguments passed to the ellipsis part of a varargs function.)
5.4.1. Reasoning
This most closely matches the integer promotion rules, though the floating-point rules are simpler due to the lack of signed/unsigned and enumeration types.
5.4.2. Wording
Change section 7.3.7 "Floating-point promotion" [conv.fpprom] as follows, replacing "type float" with the rank-based condition:
A prvalue of a floating-point type whose floating-point conversion rank ([conv.rank]) is less than the rank of double can be converted to a prvalue of type double. The value is unchanged.
This conversion is called floating-point promotion.
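The varargs consequence can already be observed today with float, the one type that promotes under the current rules. The sketch below uses only standard C++; sum_as_double and check_promotion are hypothetical helper names. Under the proposed wording, an argument of an extended type ranked below double would presumably reach the callee as a double in the same way, but that cannot be shown portably.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: reads `count` variadic arguments as double and
// returns their sum.
double sum_as_double(int count, ...) {
    va_list args;
    va_start(args, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i) {
        // float arguments were promoted to double at the call site,
        // so they must be read back as double.
        total += va_arg(args, double);
    }
    va_end(args);
    return total;
}

// Both arguments are float at the call site and arrive as double.
inline void check_promotion() {
    assert(sum_as_double(2, 1.5f, 2.25f) == 3.75);
}
```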
5.5. Implicit conversions
The standard currently allows implicit conversions between any arithmetic types, even if the conversion could result in a loss of information. This can’t be changed for any existing arithmetic types, but it is possible to choose a different behavior for the new extended floating-point types. A reasonable rule would be to allow implicit conversions between floating-point types only when converting to a type with a higher conversion rank (or when converting between two standard floating-point types, for backward compatibility).
If implicit conversions are always allowed, that most closely matches existing behavior and will likely lead to fewer surprises. If potentially lossy conversions are not implicit, that will lead to safer code. Since explicit conversions between all floating-point types would still be allowed, potentially lossy conversions would be more verbose rather than forbidden.
This issue was discussed in EWG in Belfast. There were strong opinions on both sides of the issue. The poll that was taken did not show consensus in either direction. The authors of the paper are undecided on which choice is best. So this is still an open issue.
This issue is mostly independent from the rest of the paper, so it could be decided either way without invalidating the rest of the proposal. The only area that would be affected would be overload resolution, since some standard conversions would no longer be standard conversions if implicit conversions were restricted.
Should implicit conversions be allowed from larger floating-point types to smaller floating-point types?
5.5.1. Example
Assuming that extended floating-point conversions are restricted as proposed:
double f64 = 1.0;
float f32 = 2.0;
__fp16 f16 = 3.0;
f64 = f32; // okay
f32 = f64; // okay, standard types for backward compatibility
f64 = f16; // okay
f16 = f64; // error, implicit conversion not allowed
f16 = static_cast<__fp16>(f64); // okay, explicit cast
5.5.2. Wording
If it is decided that implicit conversions are always allowed, then no wording changes are necessary. 7.3.9 [conv.double] already does the right thing.
If it is decided that implicit conversions should be restricted, then the following wording changes are necessary:
Modify section 7.3.9 "Floating-point conversions" [conv.double] as follows:
A prvalue of floating-point type can be converted to a prvalue of another floating-point type with a higher conversion rank or with the same set of values, or a prvalue of standard floating-point type can be converted to a prvalue of another standard floating-point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
The conversions allowed as floating-point promotions are excluded from the set of floating-point conversions.
In section 7.6.1.8 "Static cast" [expr.static.cast], add a new paragraph after paragraph 10 ("A value of integral or enumeration type can [...]"):
A value of floating-point type can be explicitly converted to any other floating-point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
Note: A static_cast from a higher floating-point conversion rank to a lower conversion rank is already covered by [expr.static.cast] p7, which talks about inverses of standard conversions. The new paragraph is necessary to allow explicit conversions between types with unordered conversion ranks. The wording about what to do with the value is borrowed from the floating-point conversions section [conv.double].
5.6. Usual arithmetic conversions
The proposed usual arithmetic conversions for floating-point types are based on the floating-point conversion rank, similar to integer arithmetic conversions. But because floating-point conversion ranks form a partial ordering, there may be some expressions where neither operand will be converted to the other’s type. It is proposed that these situations are ill-formed.
5.6.1. Example
In this implementation, let float be IEEE binary32, __fp16 be IEEE binary16, and __bfloat be 16-bit bfloat.
float f32 = 1.0;
__fp16 f16 = 2.0;
__bfloat b16 = 3.0;
f32 + f16; // okay, f16 converted to float, result type is float
f32 + b16; // okay, b16 converted to float, result type is float
f16 + b16; // error, neither type can convert to the other via arithmetic conversions
5.6.2. Wording
Modify section 7.4 "Usual arithmetic conversions" [expr.arith.conv] as follows:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of scoped enumeration type ([dcl.enum]), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
Removed: If either operand is of type long double, the other shall be converted to long double. Otherwise, if either operand is double, the other shall be converted to double. Otherwise, if either operand is float, the other shall be converted to float.
Added: Otherwise, if either operand has a floating-point type, the following rules shall be applied:
 If both operands have the same type, no further conversion is needed.
 Otherwise, if one of the operands has a type that is not a floating-point type, that operand shall be converted to the type of the operand with the floating-point type.
 Otherwise, if the floating-point conversion ranks ([conv.rank]) of the types of the operands are ordered, then the operand with the type of the lower floating-point conversion rank shall be converted to the type of the other operand.
 Otherwise, the expression is ill-formed.
Otherwise, the integral promotions ([conv.prom]) shall be performed on both operands. Then the following rules shall be applied to the promoted operands:
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
If one operand is of enumeration type and the other operand is of a different enumeration type or a floating-point type, this behavior is deprecated (D.1).
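The floating-point branch of these rules can be sketched as a small compile-time model in C++14. Everything here is hypothetical and for illustration only: the Format struct, subset, and common_format are not part of the proposal, formats are assumed to be radix 2 with infinity and NaN, and the tie-break rules for distinct types with the same set of values are deliberately left out.

```cpp
// Hypothetical model of a binary floating-point format (not from the paper).
struct Format {
    int precision;     // significand bits, including the implicit leading bit
    int max_exponent;  // largest unbiased exponent of a finite value
    int min_exponent;  // smallest unbiased exponent of a normal value
};

constexpr Format fp16{11, 15, -14};    // IEEE binary16
constexpr Format bf16{8, 127, -126};   // bfloat16
constexpr Format fp32{24, 127, -126};  // IEEE binary32

// True if every value of `a` is also a value of `b`.
constexpr bool subset(const Format& a, const Format& b) {
    return a.precision <= b.precision
        && a.max_exponent <= b.max_exponent
        && a.min_exponent - a.precision >= b.min_exponent - b.precision;
}

// Returns the format both operands would be converted to, or nullptr when
// the operation would be ill-formed because the ranks are unordered.
// Distinct types with the same set of values are ordered by tie-break rules
// in the proposal; this sketch does not model them and returns nullptr.
constexpr const Format* common_format(const Format& a, const Format& b) {
    if (&a == &b) return &a;            // same type: no conversion needed
    const bool a_in_b = subset(a, b);
    const bool b_in_a = subset(b, a);
    if (a_in_b && !b_in_a) return &b;   // a has lower rank: convert a to b
    if (b_in_a && !a_in_b) return &a;   // b has lower rank: convert b to a
    return nullptr;                     // unordered ranks
}

static_assert(common_format(fp32, fp16) == &fp32, "binary16 converts to binary32");
static_assert(common_format(fp32, bf16) == &fp32, "bfloat16 converts to binary32");
static_assert(common_format(fp16, bf16) == nullptr, "binary16 + bfloat16 has no common type");
```

The three static_assert lines mirror the three expressions in the example of section 5.6.1.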
5.7. Narrowing conversions
A narrowing conversion is a conversion from a type with a higher floating-point conversion rank to a type with a lower conversion rank, or a conversion between two types with unordered conversion rank.
5.7.1. Same representation
When two different floating-point types have the same representation, one of the types has a higher conversion rank than the other. This means that a conversion between the two types will be a narrowing conversion in one of the directions even though the value will be preserved. For example, on some implementations, double and long double have the same representation, but long double always has a higher conversion rank than double, so a conversion from long double to double is considered a narrowing conversion.
An earlier version of this paper defined narrowing conversions in terms of sets of representable values, not in terms of conversion rank. With that definition, conversions between types with the same representation would never be a narrowing conversion. SG6 in Kona preferred using conversion rank over sets of values, so the proposal was changed to the current definition. One argument against the old definition was that it changed the behavior for standard floating-point types, as in the example of double and long double above.
It would be possible to have different rules for standard floating-point types and extended floating-point types, but the authors feel it is best to maintain consistency between standard and extended types, and to not change the behavior of standard types.
5.7.2. Constant values
This proposal preserves the existing wording in [dcl.init.list] p7.2, "except where the source is a constant expression and the actual value after conversion is within the range of values that can be represented (even if it cannot be represented exactly)." A reasonable argument could be made that this constant value exception should not apply to extended floating-point types. But the authors are not in favor of that change. It would introduce an inconsistency between standard and extended types. It would cause an initialization such as __fp16 x{0.1}; to be a narrowing conversion because 0.1 cannot be represented exactly in binary floating-point representations (assuming that __fp16 is the name of an extended floating-point type with a conversion rank lower than double).
5.7.3. Wording
Modify the definition of narrowing conversions in 9.3.4 "List-initialization" [dcl.init.list] paragraph 7 item 2, replacing "from long double to double or float, or from double to float" with the rank-based wording:
from a floating-point type T to another floating-point type whose floating-point conversion rank is not greater than that of T, except where the source is a constant expression and the actual value after conversion is within the range of values that can be represented (even if it cannot be represented exactly), or
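For the standard types, the rank-based definition is meant to give the same answers as the current rule, and those answers can be checked with a detection idiom in C++17. The trait name list_init_ok is hypothetical; it reports whether To{from} is well-formed for a non-constant value of type From, so the constant-expression exception does not apply.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical trait: true if `To{std::declval<From>()}` is well-formed,
// that is, if list-initialization from a non-constant From does not narrow.
template <class From, class To, class = void>
struct list_init_ok : std::false_type {};

template <class From, class To>
struct list_init_ok<From, To,
                    std::void_t<decltype(To{std::declval<From>()})>>
    : std::true_type {};

// Lower rank to higher rank: not narrowing.
static_assert(list_init_ok<float, double>::value, "float to double");
static_assert(list_init_ok<double, long double>::value, "double to long double");

// Higher rank to lower rank: narrowing, even on implementations where
// double and long double share a representation.
static_assert(!list_init_ok<double, float>::value, "double to float");
static_assert(!list_init_ok<long double, double>::value, "long double to double");
```

The last assertion holds regardless of whether long double is wider than double, which is the behavior section 5.7.1 says must be preserved.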
5.8. Overload resolution
When comparing standard conversion sequences that involve floating-point conversions, prefer conversions between types that have the same representation.
5.8.1. Reasoning
The extended floating-point types should behave as much as possible as other arithmetic types, with one exception: overload resolution should prefer cases where the argument and the parameter have the same representation, even when they are different types.
Overload resolution of floating-point types behaves mostly like that of the integer types, with anything smaller than double promoted to double. The difference comes when choosing among standard conversions. Unlike the integer types, where there is no preference among standard conversions, a conversion between two floating-point types with the same representation is preferred over a conversion between types with different representations.
These rules ease the adoption of the fixed-layout type aliases defined in [P1468] as described at the end of the motivation.
5.8.2. Examples
These examples assume that float and double are IEEE 32- and 64-bit types, that long double is X87 80-bit, and that __fp16, __fp32, __fp64, __fp128, and __bfloat are all extended floating-point types (distinct from float and double) representing IEEE 16-bit, 32-bit, 64-bit, and 128-bit types and bfloat16.
Given that function f is overloaded on all the standard floating-point types:
void f(float);
void f(double);
void f(long double);
Calling f with an argument of type float, double, or long double will of course choose the overload with the exact match. Calling f with an argument of type __fp16, __fp32, __fp64, or __bfloat will choose f(double) because all of those types promote to double. Calling f with an argument of type __fp128 will be ambiguous because there is no exact match or promotion and none of the standard conversions is preferred over the others.
Given that function g is overloaded on the 32-bit and 64-bit extended types:
void g(__fp32);
void g(__fp64);
Calling g with __fp32 or __fp64 will choose the overload with the exact match. Calling g with float will call g(__fp32) because float and __fp32 have the same representation. For similar reasons, calling g with double will call g(__fp64). Calling g with an argument of any other type will be ambiguous because the standard conversion to neither parameter is preferred over the other.
5.8.3. Wording
In 12.3.3.2 "Ranking implicit conversion sequences" [over.ics.rank] paragraph 4, add a new bullet between (4.2) and (4.3), renumbering the existing (4.3) to (4.4):
(4.2) A conversion that promotes an enumeration whose underlying type is fixed to its underlying type is better than one that promotes to the promoted underlying type, if the two are different.
(4.3) A conversion from floating-point type FP1 to floating-point type FP2 is better than a conversion from FP1 to floating-point type FP3 if
(4.3.1) FP1 and FP2 have the same set of values, and
(4.3.2) FP1 or FP2 is an extended floating-point type, and
(4.3.3) FP3 has a different set of values from FP1 or the floating-point conversion rank ([conv.rank]) of FP3 is not less than the rank of FP2.
(4.4) If class B is derived directly or indirectly from class A, conversion of B* to A* is better than conversion of B* to void*, and conversion of A* to void* is better than conversion of B* to void*.
Note: The important parts of the proposed wording are (4.3.1) and the first half of (4.3.3). (4.3.2) and the second half of (4.3.3) exist to give reasonable behavior when at least two of the standard floating-point types have the same representation (which is true for double and long double on many implementations).
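Bullet (4.3) can be exercised with a small compile-time model in C++14. This is a sketch under stated assumptions, not an implementation of overload resolution: FpType, same_values, rank_less, and better are hypothetical names, the width field stands in for a set of values (a larger width is assumed to be a proper superset of a smaller one, which holds for the IEEE 16/32/64-bit and X87 80-bit formats), and two extended types with the same set of values are treated as having equal rank.

```cpp
// Hypothetical model of a floating-point type (not from the paper).
struct FpType {
    int width;         // stands in for the set of values of the type
    int standard_pos;  // 0 = extended; 1, 2, 3 = float, double, long double
};

constexpr FpType std_float{32, 1};
constexpr FpType std_double{64, 2};
constexpr FpType std_long_double{80, 3};  // assumed X87 80-bit
constexpr FpType ext_fp16{16, 0};
constexpr FpType ext_fp32{32, 0};
constexpr FpType ext_fp64{64, 0};

constexpr bool same_values(FpType a, FpType b) { return a.width == b.width; }
constexpr bool is_extended(FpType t) { return t.standard_pos == 0; }

// Rank comparison for this model: proper subsets rank lower; with the same
// set of values, extended ranks below standard and float < double < long double.
constexpr bool rank_less(FpType a, FpType b) {
    return a.width != b.width ? a.width < b.width
                              : a.standard_pos < b.standard_pos;
}

// True if converting fp1 -> fp2 is better than converting fp1 -> fp3
// according to (4.3.1), (4.3.2), and (4.3.3).
constexpr bool better(FpType fp1, FpType fp2, FpType fp3) {
    return same_values(fp1, fp2)
        && (is_extended(fp1) || is_extended(fp2))
        && (!same_values(fp3, fp1) || !rank_less(fp3, fp2));
}

// g(__fp32) and g(__fp64) called with float: g(__fp32) wins.
static_assert(better(std_float, ext_fp32, ext_fp64), "float prefers __fp32");
static_assert(!better(std_float, ext_fp64, ext_fp32), "and not __fp64");

// Called with double: g(__fp64) wins.
static_assert(better(std_double, ext_fp64, ext_fp32), "double prefers __fp64");

// Called with __fp16: neither conversion is better, so the call is ambiguous.
static_assert(!better(ext_fp16, ext_fp32, ext_fp64)
           && !better(ext_fp16, ext_fp64, ext_fp32), "__fp16 is ambiguous");
```

These assertions reproduce the outcomes described for function g in section 5.8.2; promotions and exact matches are outside what this sketch models.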
5.8.4. Alternate proposal
This paper contained a different set of rules for overload resolution in R2. That proposal had some opposition when presented in Belfast (though no poll was taken about it), so the overload rules were revised to what is listed immediately above. But the authors feel that the older rules have some advantages. So the older rules are listed here in case anyone can think of a way to combine the two into a new proposal that has the advantages of both.
When comparing conversion sequences that involve floating-point conversions, prefer conversions that are value-preserving, and prefer conversions to lower conversion ranks over conversions to higher conversion ranks.
This has the advantage that, when code overloads a function on some of the floating-point types, then calls to that function will be well-formed as long as the argument is of a floating-point type that can be safely converted to at least one of the possible parameter types.
For example, let float be IEEE 32-bit, double be IEEE 64-bit, and long double be X87 80-bit. And let __fp16, __fp32, and __fp64 be extended floating-point types that represent IEEE 16-bit, 32-bit, and 64-bit respectively. Then given a function overloaded on the 32-bit and 64-bit extended floating-point types:
void f(__fp32);
void f(__fp64);
The following function calls should be well-formed:
f((__fp16)1.0); // calls f(__fp32)
f((__fp32)2.0); // calls f(__fp32)
f((__fp64)3.0); // calls f(__fp64)
f((float)4.0); // calls f(__fp32)
f((double)5.0); // calls f(__fp64)
But a call with a long double argument, such as f((long double)6.0), would be an ambiguous function call.
The disadvantage of this proposal (and the reason for the opposition to it in Belfast) is that adding a new overload of an existing function can change the function that is called without the new overload being an exact match for the argument type.
In 12.3.3.2 "Ranking implicit conversion sequences" [over.ics.rank] paragraph 4, add a new bullet between (4.2) and (4.3), renumbering the existing (4.3) to (4.4):
(4.2) A conversion that promotes an enumeration whose underlying type is fixed to its underlying type is better than one that promotes to the promoted underlying type, if the two are different.
(4.3) A conversion from floating-point type F1 to floating-point type F2 is better than a conversion from F1 to floating-point type F3 if the set of values of F1 is a subset of the set of values of F2 and F3 has greater floating-point conversion rank ([conv.rank]) than F2.
(4.4) If class B is derived directly or indirectly from class A, conversion of B* to A* is better than conversion of B* to void*, and conversion of A* to void* is better than conversion of B* to void*.
5.9. Pointer conversions
Pointers to two different floating-point types can be freely and implicitly converted between each other as long as the two floating-point types have the same representation.
5.9.1. Reasoning
These pointer conversions will ease the transition to the fixed-layout aliases. There is lots of existing floating-point code that uses pointer-to-float or pointer-to-double as function parameters. When compilers implement the IEEE 64-bit type alias (under whatever name is chosen for it in [P1468]), users on systems where double is IEEE 64-bit can change their parameters and variables from double to that alias incrementally. With the pointer conversions and overload resolution rules above, double and the alias will essentially behave as if they were the same type even though they are different types. Users do not have to change their code from double to the alias all at once and don’t have to coordinate the change with third-party library vendors. (Changing code to use the alias instead of double is a good thing for many users because it more clearly communicates the author’s intent.)
If the user is on a system where double is not IEEE 64-bit (or later ports to such a system), then a pointer to double will not implicitly convert to a pointer to the alias. In that environment the switch from double to the alias has to be well-coordinated and can’t be done piecemeal. The compiler will help with that by reporting a compilation error when such implicit pointer conversions are attempted.
If these pointer conversions are not implicit, then a user switching code from double to the alias would likely have to add explicit pointer casts to the code in some places. In addition to being more work, this leaves the code more fragile and error-prone, because there will be run-time failures rather than compilation errors if the code is later ported to a system where double and the alias do not have the same representation.
5.9.2. Wording
Add a new paragraph to the end of section 7.3.11 "Pointer conversions" [conv.ptr]:
A prvalue of type "pointer to cv F1", where F1 is a floating-point type, can be converted to a prvalue of type "pointer to cv F2", where F2 is a different floating-point type with the same set of values as F1. The pointer value is unchanged by this conversion.
5.10. Feature test macro
Should there be a feature test macro to indicate that the implementation supports at least one extended floating-point type?
Implementations could support extended floating-point types without supporting any of the aliases defined in [P1468]. So it might be useful to have a feature test macro that indicates support for extended floating-point types listed in 15.10 [cpp.predefined]. But it would likely have to be one of the conditionally-defined macros, and not listed in Table 17, since a conforming compiler might choose to not define any extended floating-point types. If the macro is defined, it would not indicate which extended floating-point types are supported, only that there exists at least one extended floating-point type in the implementation.
6. Library changes
Making extended floating-point types easy to use does not require introducing any new names to the standard library. But it does require adding new overloads or new template specializations in several places. (The companion paper, [P1468], does add new names related to floating-point types to the standard library. But those names are not necessary to make extended floating-point types useful.)
To handle I/O of extended floating-point types, changes are proposed to <charconv> and <format>, but not to I/O streams or the printf and scanf families of functions.
Implementations will have to change std::numeric_limits and std::is_floating_point to give correct answers for extended floating-point types. The existing wording in the standard already covers that (by referring to all floating-point types without listing them explicitly), so no wording changes are needed.
Most of the standard functions that operate on floating-point types need wording changes to add overloads or template specializations for the extended floating-point types. These classes and functions are in <cmath>, <complex>, and <atomic>.
No changes are proposed to the following parts of the standard library:

<cfloat>: The header <cfloat> provides macros describing some of the properties of the standard floating-point types. The use of macros does not extend very well to extended floating-point types with implementation-specific names. No changes are proposed to <cfloat>; users should use std::numeric_limits instead to query the properties of extended floating-point types.
The printf and scanf families of functions: There is no practical way to add format specifiers for implementation-specific types with implementation-specific names.
I/O streams: To support extended floating-point types, new virtual functions would need to be added to num_get and num_put, which would be an ABI break.
The strtod and stod families of functions: With different names for each floating-point type (which for strtod was inherited from C), that scheme doesn't work well for extended floating-point types.
The std::to_string family of functions: They are defined in terms of snprintf, which will not support extended floating-point types.
<random>: [rand.req] states that certain template arguments have to be float, double, or long double. The wording could be changed to allow any floating-point type, but <random> does not support extended integral types, so we are not proposing that it support extended floating-point types either.
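The recommendation to query std::numeric_limits rather than the <cfloat> macros can be sketched as follows. The helper name mantissa_digits is an assumption made for this example; only the standard floating-point types are exercised, since extended types have implementation-specific names.

```cpp
#include <cfloat>
#include <limits>

// Generic query: works for any type with a numeric_limits specialization,
// which is where an implementation would describe its extended types.
template <class F>
constexpr int mantissa_digits() {
    static_assert(std::numeric_limits<F>::is_specialized,
                  "F needs a numeric_limits specialization");
    return std::numeric_limits<F>::digits;
}

// The <cfloat> macros need one distinct name per type and cover only the
// three standard floating-point types.
static_assert(mantissa_digits<float>() == FLT_MANT_DIG, "");
static_assert(mantissa_digits<double>() == DBL_MANT_DIG, "");
static_assert(mantissa_digits<long double>() == LDBL_MANT_DIG, "");
```

A function template written this way needs no per-type macro names, which is why it extends to types the <cfloat> macros cannot describe.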
6.1. Possible new names
While no new names need to be added to the standard library for extended floating-point types to be useful, there are some new things that could be useful. The authors are undecided if these are useful enough to be worth adding, and would appreciate LEWG feedback on the matter.
6.1.1. Standard/extended floating-point traits
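std::is_floating_point is true for both standard and extended floating-point types. Should the standard also provide traits that are true only for standard floating-point types and/or only for extended floating-point types? Will users need to distinguish between standard and extended types often enough that composing the distinction from existing facilities becomes too unwieldy?
Should new type traits that identify only standard and/or only extended floating-point types be introduced?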
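To make the question concrete, here is a minimal sketch of what such traits could look like if written by hand today. The names is_standard_floating_point and is_extended_floating_point are hypothetical and are not proposed wording.

```cpp
#include <type_traits>

// Hypothetical trait: true only for float, double, and long double
// (cv-qualifiers ignored).
template <class T>
struct is_standard_floating_point
    : std::bool_constant<
          std::is_same_v<std::remove_cv_t<T>, float> ||
          std::is_same_v<std::remove_cv_t<T>, double> ||
          std::is_same_v<std::remove_cv_t<T>, long double>> {};

// Hypothetical trait: a floating-point type that is not one of the three
// standard ones.
template <class T>
struct is_extended_floating_point
    : std::bool_constant<std::is_floating_point_v<T> &&
                         !is_standard_floating_point<T>::value> {};
```

Without dedicated traits, every user who needs the distinction has to repeat a definition along these lines.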
6.1.2. Conversion rank trait
Should there be a type trait that reports whether or not one floating-point type has a higher conversion rank than another? This could be useful when writing function templates to figure out which conversions between different floating-point types are safe. See the constructors for complex as an example of where this trait would be useful.
Should a new type trait be introduced that can be used to query the floating-point conversion rank relationship?
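One possible shape for such a trait is sketched below, restricted to the three standard floating-point types. The names fp_rank and has_higher_rank_v are hypothetical. An integer table like this one can only model a total order; a real trait would also have to report unordered ranks, because the conversion rank is a partial order.

```cpp
#include <type_traits>

// Hypothetical rank table, filled in for the standard types only.
template <class T> struct fp_rank;  // intentionally undefined for other types
template <> struct fp_rank<float>       : std::integral_constant<int, 0> {};
template <> struct fp_rank<double>      : std::integral_constant<int, 1> {};
template <> struct fp_rank<long double> : std::integral_constant<int, 2> {};

// True when A has a strictly higher conversion rank than B.
template <class A, class B>
inline constexpr bool has_higher_rank_v = (fp_rank<A>::value > fp_rank<B>::value);

// Example use: accept a conversion implicitly only if it does not lower the rank.
template <class From, class To>
inline constexpr bool is_rank_preserving_v = !has_higher_rank_v<From, To>;
```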
6.2. <charconv>
Add overloads for all extended floating-point types for the functions to_chars and from_chars.
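The benefit of an overload set that covers every floating-point type shows up in generic code. The sketch below uses only the existing std::to_chars overloads for the standard types; the helper name shortest_text and the buffer size are assumptions made for the example. With the proposed overloads, the same template would be expected to work unchanged for extended floating-point types.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Formats `value` using the shortest representation that to_chars produces.
// Works for any type F that has a to_chars(char*, char*, F) overload.
template <class F>
std::string shortest_text(F value) {
    char buf[128];  // assumed to be large enough for the shortest form
    const std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, value);
    assert(res.ec == std::errc{});  // would fail only if the buffer were too small
    return std::string(buf, res.ptr);
}
```

For example, shortest_text(0.5) yields "0.5" and shortest_text(3.0f) yields "3".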
6.2.1. Wording
Add a new paragraph to the beginning of 20.19.1 "Header <charconv> synopsis" [charconv.syn], before the start of the synopsis:
When a function has a parameter of type integral, the implementation provides overloads for all signed and unsigned integer types and char as the parameter type. When a function has a parameter of type floating-point, the implementation provides overloads for all floating-point types as the parameter type.
Change the header synopsis in [charconv.syn] as follows:
to_chars_result to_chars(char* first, char* last, [-see below-] {+integral+} value, int base = 10);
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value);
[-to_chars_result to_chars(char* first, char* last, double value);-]
[-to_chars_result to_chars(char* first, char* last, long double value);-]
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value, chars_format fmt);
[-to_chars_result to_chars(char* first, char* last, double value, chars_format fmt);-]
[-to_chars_result to_chars(char* first, char* last, long double value, chars_format fmt);-]
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value, chars_format fmt, int precision);
[-to_chars_result to_chars(char* first, char* last, double value, chars_format fmt, int precision);-]
[-to_chars_result to_chars(char* first, char* last, long double value, chars_format fmt, int precision);-]
// ...
from_chars_result from_chars(const char* first, const char* last, [-see below-] {+integral+}& value, int base = 10);
from_chars_result from_chars(const char* first, const char* last, [-float-] {+floating-point+}& value, chars_format fmt = chars_format::general);
[-from_chars_result from_chars(const char* first, const char* last, double& value, chars_format fmt = chars_format::general);-]
[-from_chars_result from_chars(const char* first, const char* last, long double& value, chars_format fmt = chars_format::general);-]
In 20.19.2 "Primitive numeric output conversion" [charconv.to.chars], leave the first three paragraphs unchanged, but modify the rest of the section as follows:
to_chars_result to_chars(char* first, char* last, [-see below-] {+integral+} value, int base = 10);
[-Requires-] {+Expects+}: base has a value between 2 and 36 (inclusive).
Effects: The value of value is converted to a string of digits in the given base (with no redundant leading zeroes). Digits in the range 10..35 (inclusive) are represented as lowercase characters a..z. If value is less than zero, the representation starts with '-'.
Throws: Nothing.
[-Remarks:-] {+[ Note:+} The implementation [-shall provide-] {+provides+} overloads for all signed and unsigned integer types and char as the type of the parameter value. {+- end note ]+}
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value);
[-to_chars_result to_chars(char* first, char* last, double value);-]
[-to_chars_result to_chars(char* first, char* last, long double value);-]
Effects: value is converted to a string in the style of printf in the "C" locale. The conversion specifier is f or e, chosen according to the requirement for a shortest representation (see above); a tie is resolved in favor of f.
Throws: Nothing.
{+[ Note: The implementation provides overloads for all floating-point types as the type of the parameter value. - end note ]+}
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value, chars_format fmt);
[-to_chars_result to_chars(char* first, char* last, double value, chars_format fmt);-]
[-to_chars_result to_chars(char* first, char* last, long double value, chars_format fmt);-]
[-Requires-] {+Expects+}: fmt has the value of one of the enumerators of chars_format.
Effects: value is converted to a string in the style of printf in the "C" locale.
Throws: Nothing.
{+[ Note: The implementation provides overloads for all floating-point types as the type of the parameter value. - end note ]+}
to_chars_result to_chars(char* first, char* last, [-float-] {+floating-point+} value, chars_format fmt, int precision);
[-to_chars_result to_chars(char* first, char* last, double value, chars_format fmt, int precision);-]
[-to_chars_result to_chars(char* first, char* last, long double value, chars_format fmt, int precision);-]
[-Requires-] {+Expects+}: fmt has the value of one of the enumerators of chars_format.
Effects: value is converted to a string in the style of printf in the "C" locale with the given precision.
Throws: Nothing.
{+[ Note: The implementation provides overloads for all floating-point types as the type of the parameter value. - end note ]+}
See also: ISO C 7.21.6.1
Modify 20.19.3 "Primitive numeric input conversion" [charconv.from.chars] as follows:
All functions named from_chars analyze the string [first, last) for a pattern, where [first, last) is required to be a valid range. If no characters match the pattern, value is unmodified, the member ptr of the return value is first and the member ec is equal to errc::invalid_argument. [ Note: If the pattern allows for an optional sign, but the string has no digit characters following the sign, no characters match the pattern. — end note ] Otherwise, the characters matching the pattern are interpreted as a representation of a value of the type of value. The member ptr of the return value points to the first character not matching the pattern, or has the value last if all characters match. If the parsed value is not in the range representable by the type of value, value is unmodified and the member ec of the return value is equal to errc::result_out_of_range. Otherwise, value is set to the parsed value, after rounding according to round_to_nearest, and the member ec is value-initialized.
from_chars_result from_chars(const char* first, const char* last, [-see below-] {+integral+}& value, int base = 10);
[-Requires-] {+Expects+}: base has a value between 2 and 36 (inclusive).
Effects: The pattern is the expected form of the subject sequence in the "C" locale for the given nonzero base, as described for strtol, except that no "0x" or "0X" prefix shall appear if the value of base is 16, and except that '-' is the only sign that may appear, and only if value has a signed type.
Throws: Nothing.
[-Remarks:-] {+[ Note:+} The implementation [-shall provide-] {+provides+} overloads for all signed and unsigned integer types and char as the referenced type of the parameter value. {+- end note ]+}
from_chars_result from_chars(const char* first, const char* last, [-float-] {+floating-point+}& value, chars_format fmt = chars_format::general);
[-from_chars_result from_chars(const char* first, const char* last, double& value, chars_format fmt = chars_format::general);-]
[-from_chars_result from_chars(const char* first, const char* last, long double& value, chars_format fmt = chars_format::general);-]
[-Requires-] {+Expects+}: fmt has the value of one of the enumerators of chars_format.
Effects: The pattern is the expected form of the subject sequence in the "C" locale, as described for strtod, except that
the sign '+' may only appear in the exponent part;
if fmt has chars_format::scientific set but not chars_format::fixed, the otherwise optional exponent part shall appear;
if fmt has chars_format::fixed set but not chars_format::scientific, the optional exponent part shall not appear; and
if fmt is chars_format::hex, the prefix "0x" or "0X" is assumed. [ Example: The string 0x123 is parsed to have the value 0 with remaining characters x123. - end example ]
In any case, the resulting value is one of at most two floating-point values closest to the value of the string matching the pattern.
Throws: Nothing.
{+[ Note: The implementation provides overloads for all floating-point types as the referenced type of the parameter value. - end note ]+}
See also: ISO C 7.22.1.3, 7.22.1.4
6.3. <format>
Change <format> to support extended floating-point types.
6.3.1. Wording
... to be determined ...
6.4. <cmath>
Add overloads for extended floating-point types to the functions in <cmath>. It is expected that this will be the most used part of the library changes.
6.4.1. Wording
Modify 26.8.1 "Header <cmath> synopsis" [cmath.syn] paragraph 2 as follows:
For each set of overloaded functions within <cmath>, with the exception of abs, there shall be additional overloads sufficient to ensure:
[-1. If any argument of arithmetic type corresponding to a double parameter has type long double, then all arguments of arithmetic type (6.7.1) corresponding to double parameters are effectively cast to long double.-]
[-2. Otherwise, if any argument of arithmetic type corresponding to a double parameter has type double or an integer type, then all arguments of arithmetic type corresponding to double parameters are effectively cast to double.-]
[-3. Otherwise, all arguments of arithmetic type corresponding to double parameters have type float.-]
{+1. If any argument corresponding to a double parameter has floating-point type, then all arguments of arithmetic type ([basic.fundamental]) corresponding to double parameters are effectively cast to the floating-point type with the highest floating-point conversion rank ([conv.rank]) among the types of such floating-point arguments. If two such floating-point arguments have types whose conversion rank is unordered, the program is ill-formed.+}
{+2. Otherwise, all arguments of arithmetic type corresponding to double parameters are effectively cast to double.+}
[ Note: abs is exempted from these rules in order to stay compatible with C. - end note ]
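The "effectively cast" rule can be observed through the result types of the existing overloads. The sketch below uses std::pow with standard types only, and is limited to calls whose arguments are all floating-point or all integer; for those calls the existing rules and the rules proposed above select the same type. The alias pow_result_t is a name introduced for this example.

```cpp
#include <cmath>
#include <type_traits>
#include <utility>

// Result type of std::pow for a given pair of argument types.
template <class A, class B>
using pow_result_t = decltype(std::pow(std::declval<A>(), std::declval<B>()));

// All-floating-point arguments: the type with the highest rank wins.
static_assert(std::is_same_v<pow_result_t<float, float>, float>, "");
static_assert(std::is_same_v<pow_result_t<float, double>, double>, "");
static_assert(std::is_same_v<pow_result_t<double, long double>, long double>, "");

// No floating-point argument at all: everything is cast to double.
static_assert(std::is_same_v<pow_result_t<int, int>, double>, "");
```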
Modify section 26.8.2 "Absolute values" [c.math.abs] as follows:
[ Note: The headers <cstdlib> and <cmath> declare the functions described in this subclause. — end note ]
int abs(int j);
long int abs(long int j);
long long int abs(long long int j);
[-float abs(float j);-]
[-double abs(double j);-]
[-long double abs(long double j);-]
Effects: The abs functions {+that take integer arguments+} have the semantics specified in the C standard library for the functions abs, labs, {+and+} llabs[-, fabsf, fabs, and fabsl-].
Remarks: If abs() is called with an argument of type X for which is_unsigned_v<X> is true and if X cannot be converted to int by integral promotion, the program is ill-formed. [ Note: Arguments that can be promoted to int are permitted for compatibility with C. — end note ]
{+floating-point abs(floating-point x);+}
{+Returns: The absolute value of x.+}
{+Remarks: The implementation provides overloads for all floating-point types as the type of parameter x, with the same floating-point type as the return type.+}
See also: ISO C 7.12.7.2, 7.22.6.1
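The property that abs returns the same type as its argument, rather than following the promotion rules used by the rest of <cmath>, can be checked for the standard types as below. The alias abs_result_t is a name introduced for this example; under the proposed wording the same property would be expected to hold for extended floating-point types.

```cpp
#include <cmath>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Result type of std::abs for a given argument type.
template <class T>
using abs_result_t = decltype(std::abs(std::declval<T>()));

// Each overload returns its own parameter type; nothing is promoted to double.
static_assert(std::is_same_v<abs_result_t<int>, int>, "");
static_assert(std::is_same_v<abs_result_t<long>, long>, "");
static_assert(std::is_same_v<abs_result_t<float>, float>, "");
static_assert(std::is_same_v<abs_result_t<long double>, long double>, "");
```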
6.5. <complex>
Make complex<T> be well-defined when T is an extended floating-point type. The explicit specializations of complex are removed. The only difference between the explicit specializations was the explicitness of the constructors that take a complex number of a different type. This behavior is incorporated into the main template through a conditionally explicit constructor.
6.5.1. Wording
Modify 26.4 "Complex numbers" [complex.numbers] paragraph 2 as follows:
The effect of instantiating the template complex for any type [-other than float, double, or long double-] {+that is not a floating-point type+} is unspecified. The specializations [-complex<float>, complex<double>, and complex<long double>-] {+of complex for floating-point types+} are literal types ([basic.types]).
Delete the explicit specializations from 26.4.1 "Header <complex> synopsis" [complex.syn]:
namespace std {
  // 26.4.2, class template complex
  template<class T> class complex;
  [-// 26.4.3, specializations-]
  [-template<> class complex<float>;-]
  [-template<> class complex<double>;-]
  [-template<> class complex<long double>;-]
  // ...
In 26.4.2 "Class template complex" [complex], modify the synopsis of the constructors as follows:
constexpr complex(const T& re = T(), const T& im = T());
constexpr complex(const complex&) {+= default+};
template<class X> constexpr {+explicit(see below)+} complex(const complex<X>&);
Remove section 26.4.3 "Specializations" [complex.special] in its entirety.
In 26.4.4 "Member functions" [complex.members], add the following after paragraph 2:
template<class X> constexpr explicit(see below) complex(const complex<X>& other);
Ensures: real() == other.real() && imag() == other.imag().
Remarks: The expression inside explicit evaluates to false if and only if the floating-point conversion rank of T is greater than the floating-point conversion rank of X.
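The behavior that the conditionally explicit constructor is meant to preserve can be observed on the standard types today. The sketch below checks only float and double and is an illustration rather than proposed wording; the aliases cf and cd are introduced for the example.

```cpp
#include <complex>
#include <type_traits>

using cf = std::complex<float>;
using cd = std::complex<double>;

// Widening (float -> double): the converting constructor is implicit.
static_assert(std::is_convertible_v<cf, cd>, "");

// Narrowing (double -> float): construction is allowed, but only explicitly.
static_assert(std::is_constructible_v<cf, cd>, "");
static_assert(!std::is_convertible_v<cd, cf>, "");
```

With the constructor defined in the primary template, the same distinction would be driven by floating-point conversion rank instead of by three hand-written specializations.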
In 26.4.6 "Nonmember operations" [complex.ops], change the streaming operators as follows:
template<class T, class charT, class traits> basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>& is, complex<T>& x);
{+Constraints: T is a standard floating-point type.+}
[-Requires-] {+Expects+}: The input values [-shall be-] {+are+} convertible to T.
Effects: Extracts a complex number x of the form: u, (u), or (u,v), where u is the real part and v is the imaginary part (29.7.4.2).
If bad input is encountered, calls is.setstate(ios_base::failbit) (which may throw ios::failure (29.5.5.4)).
Returns: is.
Remarks: This extraction is performed as a series of simpler extractions. Therefore, the skipping of whitespace is specified to be the same for each of the simpler extractions.
template<class T, class charT, class traits> basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& o, const complex<T>& x);
{+Constraints: T is a standard floating-point type.+}
Effects: Inserts the complex number x ...
Modify 26.4.9 "Additional overloads" [cmplx.over] paragraphs 2 and 3 as follows:
The additional overloads shall be sufficient to ensure:
[-If the argument has type long double, then it is effectively cast to complex<long double>.-]
[-Otherwise, if the argument has type double or an integer type, then it is effectively cast to complex<double>.-]
[-Otherwise, if the argument has type float, then it is effectively cast to complex<float>.-]
{+If the argument has a floating-point type T, then it is effectively cast to complex<T>.+}
{+Otherwise, if the argument has integer type, then it is effectively cast to complex<double>.+}
Function template pow shall have additional overloads sufficient to ensure, for a call with at least one argument of type complex<T>:
[-If either argument has type complex<long double> or type long double, then both arguments are effectively cast to complex<long double>.-]
[-Otherwise, if either argument has type complex<double>, double, or an integer type, then both arguments are effectively cast to complex<double>.-]
[-Otherwise, if either argument has type complex<float> or float, then both arguments are effectively cast to complex<float>.-]
{+If one argument is of type T1 or complex<T1> and the other argument is of type T2 or complex<T2> where T1 and T2 are both floating-point types:+}
{+  If the floating-point conversion ranks ([conv.rank]) of T1 and T2 are different and unordered, the program is ill-formed.+}
{+  Otherwise, if T1 has greater floating-point conversion rank than T2, then both arguments are effectively cast to complex<T1>.+}
{+  Otherwise, both arguments are effectively cast to complex<T2>.+}
{+Otherwise, if the other argument has integer type, it is effectively cast to complex<T>.+}
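For the standard types, the effect of the pow rules on mixed floating-point arguments can be checked through the result type, as sketched below. Only float and double are used, and integer arguments are deliberately left out; the alias cpow_result_t is a name introduced for this example.

```cpp
#include <complex>
#include <type_traits>
#include <utility>

// Result type of std::pow for a given pair of argument types.
template <class A, class B>
using cpow_result_t = decltype(std::pow(std::declval<A>(), std::declval<B>()));

// Same underlying type on both sides: no conversion is needed.
static_assert(std::is_same_v<cpow_result_t<std::complex<float>, float>,
                             std::complex<float>>, "");

// Mixed float/double: both arguments are treated as complex<double>,
// the type with the greater conversion rank.
static_assert(std::is_same_v<cpow_result_t<std::complex<float>, double>,
                             std::complex<double>>, "");
static_assert(std::is_same_v<cpow_result_t<std::complex<float>, std::complex<double>>,
                             std::complex<double>>, "");
```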
Note: No literal suffixes are defined for complex numbers of extended floating-point types. Subclause [complex.literals] is unchanged.
6.6. <atomic>
Change the wording so that the specializations of atomic for floating-point types apply to all floating-point types, not just the standard floating-point types listed.
The specializations of atomic for integral types are not required to include specializations for all extended integral types, only for the extended types that are used in <cstdint>. It would be reasonable for this proposal to adopt a similar approach. If we take that approach, there are no wording changes to <atomic> in this paper. Instead, there would be some changes to <atomic> as part of [P1468], requiring specializations only for the floating-point aliases that name extended floating-point types.
Should atomic have specializations for all floating-point types, or only for extended floating-point types with well-known aliases (see [P1468])?
6.6.1. Wording
Modify 31.8.3 "Specializations for floating-point types" [atomics.types.float] paragraph 1 as follows:
There are specializations of the atomic class template for [-the-] {+all+} floating-point types[- float, double, and long double-]. For each such type floating-point, the specialization atomic<floating-point> provides additional atomic operations appropriate to floating-point types.
6.7. Feature test macro
No feature test macro is being proposed for the library changes in this paper. The library changes would be covered by the core language feature test macro, if there is one.