Version 1.2

Defect | Summary | Date | Status |
---|---|---|---|
DR 1 | P1: Typos | 04/2017 | Closed |
DR 2 | P1: Functions that round result to narrower type don't always | 04/2017 | Closed |
DR 3 | P1: feature macros and header file inclusions | 04/2017 | Closed |
DR 4 | P3: Error in function name | 04/2017 | Closed |
DR 5 | P1: Is return of same type convertFormat or copy? | 04/2017 | Review |
DR 6 | P1: fetestexceptflag and exceptions passed to fegetexceptflag | 04/2017 | Review |
DR 7 | P1: Editorial changes | 04/2017 | Review |
DR 8 | P2: Editorial clarification about number digits in the coefficient | 04/2017 | Review |
DR 9 | P2,P3: Missing specification for usual arithmetic conversions, tgmath | 04/2017 | Open |
DR 10 | P1: wrong type for fesetmode parameter | 04/2017 | Review |
DR 11 | P2: a-style formatting not IEC 60559 conformant | 04/2017 | Open |
DR 12 | P1: Zero payloads and set payload function | 04/2017 | Open |
DR 13 | P3: Type-generic macros for functions that round result to narrower type | 04/2017 | Open |
DR 14 | P2: Effect of %a vs %A conversion specifiers | 04/2017 | Open |

DR 4 Prev <— Closed —> Next DR 2, or summary at top

**Submitter:** Jim Thomas et al.

**Submission Date:** 2016-03-19

**Source:** WG14

**Reference Document:** N2029

**Subject:** Part 1: Typos

**Summary**

- Page 18: In C 7.6.1a#4, the last sentence, “functon” should be “function”.
- Page 48: In C 7.6.2.4a#3, “The **fetestexcept** function returns ...” should be “The **fetestexceptflag** function returns ...”.

**Suggested Technical Corrigendum**

- Page 18: In C 7.6.1a, paragraph 4, the last sentence, change “functon” to “function”
- Page 48: In C 7.6.2.4a#3, change “**fetestexcept**” to “**fetestexceptflag**”.

Apr 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

- Page 18: In C 7.6.1a, paragraph 4, the last sentence, change “functon” to “function”
- Page 48: In C 7.6.2.4a#3, change “**fetestexcept**” to “**fetestexceptflag**”.

DR 1 Prev <— Closed —> Next DR 3, or summary at top

**Submitter:** Jim Thomas et al.

**Submission Date:** 2016-03-19

**Source:** WG14

**Reference Document:** N2029

**Subject:** Part 1: Functions that round result to narrower type don't always

**Summary**

The current way of referencing these functions reflects the usual situation, and is perhaps a helpful way of thinking about them generally. With a note about the uncharacteristic cases, it seems unlikely to cause significant confusion. Also, changing all the references to these functions would be a large editorial undertaking, spanning multiple parts of the TS. Confusion could easily arise from having an inconsistent set of documents.

**Suggested Technical Corrigendum**

[1] The functions in this subclause round their results to a type typically narrower than the parameter types.

Page 40: After the change to C ending with “7.12.13a.6 Square root rounded to narrower type ... [3] These functions return the square root of x, rounded to the type of the function.”, insert the following:

In 7.12.13a #1, attach a footnote to the wording:

typically narrower

where the footnote is:

*) In some cases the destination type might not be narrower than the parameter types. For example, **double** might not be narrower than **long double**.

Apr 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

Page 38: After the C 7.12.13a subclause heading, insert the following paragraph:

[1] The functions in this subclause round their results to a type typically narrower than the parameter types.

Page 40: After the change to C ending with “7.12.13a.6 Square root rounded to narrower type ... [3] These functions return the square root of x, rounded to the type of the function.”, insert the following:

In 7.12.13a #1, attach a footnote to the wording:

typically narrower

where the footnote is:

*) In some cases the destination type might not be narrower than the parameter types. For example, **double** might not be narrower than **long double**.
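The footnote's case can be checked from `<float.h>` alone. The following is a minimal sketch, not part of the TS; the function name `double_is_narrower_than_long_double` is hypothetical. It reports whether **double** really is narrower than **long double** on the current implementation, which is the situation that matters for a function such as **dfmal** that takes **long double** arguments and returns **double**.

```c
#include <float.h>

/* Hypothetical helper: returns 1 if double has strictly less precision or
   exponent range than long double here, and 0 if the two are equally wide
   (the uncharacteristic case described by the footnote). */
int double_is_narrower_than_long_double(void)
{
    return (DBL_MANT_DIG < LDBL_MANT_DIG) || (DBL_MAX_EXP < LDBL_MAX_EXP);
}
```

On implementations where **long double** has the same format as **double**, the helper returns 0 and "rounded to narrower type" is a misnomer for the **long double** variants.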

DR 2 Prev <— Closed —> Next DR 4, or summary at top

**Submitter:** Jim Thomas et al.

**Submission Date:** 2016-03-19

**Source:** WG14

**Reference Document:** N2029

**Subject:** Part 1: feature macros and header file inclusions

**Summary**

So for

```
#include <math.h>
#define __STDC_WANT_IEC_60559_BFP_EXT__
#include <tgmath.h>
float f(float x) { return nextup(x); }
```

the

```
#include <limits.h>
#define __STDC_WANT_IEC_60559_BFP_EXT__
#include <math.h>
...
```

the

The suggested corrigendum below specifies that the same set of **WANT** macros must be defined at the points in the code where the relevant headers are first included. This results in fewer combinations of interfaces and provides one set of interfaces that is consistent and complete with respect to a given set of **WANT** macros.

**Suggested Technical Corrigendum**

After 7.1.2#4, insert:

[4a] Some standard headers define or declare identifiers contingent on whether certain macros whose names begin with `__STDC_WANT_IEC_60559_` and end with `_EXT__` are defined (by the user) at the point in the code where the header is first included. Within a preprocessing translation unit, the same set of such macros shall be defined for the first inclusion of all such headers.

Apr 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

Page 5: At the end of 5.3, insert:

After 7.1.2#4, insert:

[4a] Some standard headers define or declare identifiers contingent on whether certain macros whose names begin with `__STDC_WANT_IEC_60559_` and end with `_EXT__` are defined (by the user) at the point in the code where the header is first included. Within a preprocessing translation unit, the same set of such macros shall be defined for the first inclusion of all such headers.
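A translation unit that satisfies this rule can be sketched as follows: the **WANT** macro is defined once, before the first inclusion of any affected header, so every header sees the same set. The function name `int_width_if_visible` is hypothetical, and the sketch assumes that `INT_WIDTH` is one of the identifiers `<limits.h>` provides when the macro is defined; an implementation without that support simply takes the fallback branch.

```c
/* Define the WANT macro before the first inclusion of every affected header. */
#define __STDC_WANT_IEC_60559_BFP_EXT__
#include <limits.h>
#include <math.h>

/* Hypothetical probe: returns INT_WIDTH when the extension's <limits.h>
   macros are visible in this translation unit, and -1 otherwise. */
int int_width_if_visible(void)
{
#ifdef INT_WIDTH
    return INT_WIDTH;
#else
    return -1;
#endif
}
```

Moving the `#define` below `<limits.h>` but above `<math.h>` would reproduce the mixed situation the DR describes, where the two headers are included under different sets of **WANT** macros.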

DR 3 Prev <— Closed —> Next DR 1, or summary at top

**Submitter:** Jim Thomas et al.

**Submission Date:** 2016-03-19

**Source:** WG14

**Reference Document:** N2029

**Subject:** Part 3: Error in function name

**Summary**

**Suggested Technical Corrigendum**

Apr 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

Page 32: In 12.3, change “**scoshdNx**” to “**coshdNx**”.

DR 10 Prev <— Review —> Next DR 6, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 1: Is return of same type convertFormat or copy?

**Summary**

This is about
the issue raised by Joseph Myers in email SC22WG14.14280:

TS 18661-1 says "Whether C assignment (6.5.16) (and conversion as if
by assignment) to the same format is an IEC 60559 convertFormat
or copy operation is implementation-defined, even if **<fenv.h>** defines the
macro **FE_SNANS_ALWAYS_SIGNAL** (F.2.1).".

Does this
apply to function return, where the return type of the function is the same as
the type of the expression passed to the return statement and no wider
evaluation format is in use - that is, may this act as either convertFormat or copy? C11 F.6 clearly envisages that
such a return statement may do a conversion to the same type in the case of
wider evaluation formats. But 6.8.6.4#3 only refers to conversions
"If the expression has a type different from the return type of the
function in which it appears".

The specification, from F.3#3, quoted
above is incomplete in that it doesn’t cover function returns, which are not
assignments or conversions as if by assignment. As currently written, C11 +
TS18661-1 might be read to exclude the possibility of using convertFormat
in this case. A statement should be added to say that the implementation has
the option to apply convertFormat to the return value.
The change does not break existing implementations.

The effect of convertFormat
would be that signaling NaNs would signal and noncanonical representations would be canonicalized.
It is extremely unlikely that a program would depend on convertFormat
not being used.
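The difference between the two behaviors can be observed on the bits of a returned signaling NaN. The following is a rough sketch, not taken from the DR: it assumes **double** is IEC 60559 binary64 with the common encoding in which bit 51 set marks a quiet NaN, and the names `make_snan` and `snan_survives_return` are hypothetical. The initialization of `out` is itself a conversion as if by assignment, so the sketch sees only the combined effect of the return and that initialization.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes binary64 double");

/* Hypothetical helper: builds a signaling NaN and returns it. The return
   expression already has the return type, which is the case the DR is about. */
static double make_snan(void)
{
    const uint64_t bits = UINT64_C(0x7FF0000000000001);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Returns 1 if the NaN is still signaling after the return (copy behavior),
   0 if it came back quiet (convertFormat behavior), -1 if it is not a NaN. */
int snan_survives_return(void)
{
    const uint64_t exp_mask  = UINT64_C(0x7FF0000000000000);
    const uint64_t frac_mask = UINT64_C(0x000FFFFFFFFFFFFF);
    const uint64_t quiet_bit = UINT64_C(0x0008000000000000);
    double out = make_snan();
    uint64_t bits;

    memcpy(&bits, &out, sizeof bits);
    if ((bits & exp_mask) != exp_mask || (bits & frac_mask) == 0)
        return -1;
    return (bits & quiet_bit) == 0;
}
```

Either result is acceptable under the proposed wording, so a portable program should not rely on one or the other.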

**Suggested Technical Corrigendum**

In Clause 8, to the text for C F.3#3:

[3] Whether C
assignment (6.5.16) (and conversion as if by assignment) to the same format is
an IEC 60559 convertFormat or copy operation is
implementation-defined, even if **<fenv.h>** defines the macro **FE_SNANS_ALWAYS_SIGNAL** (F.2.1).

append the sentence:

“If the return
expression of a **return** statement is evaluated to the
floating-point format of the return type, it is implementation-defined whether
a convertFormat operation is applied to the result of
the return expression.”

At the end of Clause 8, add:

In F.3#3,
attach a footnote to the wording:

Whether C
assignment (6.5.16) (and conversion as if by assignment) to the same format is
an IEC 60559 convertFormat or copy operation

where the
footnote is:

*) Where the
source and destination formats are the same, convertFormat
operations differ from copy operations in that convertFormat
operations raise the “invalid” floating-point exception on signaling NaN inputs and do not propagate non-canonical encodings.

Oct 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

In Clause 8, to the text for C F.3#3:

[3] Whether C
assignment (6.5.16) (and conversion as if by assignment) to the same format is
an IEC 60559 convertFormat or copy operation is
implementation-defined, even if **<fenv.h>** defines the macro **FE_SNANS_ALWAYS_SIGNAL** (F.2.1).

append the sentence:

“If the return
expression of a **return** statement is evaluated to the
floating-point format of the return type, it is implementation-defined whether
a convertFormat operation is applied to the result of
the return expression.”

At the end of Clause 8, add:

In F.3#3,
attach a footnote to the wording:

Whether C
assignment (6.5.16) (and conversion as if by assignment) to the same format is
an IEC 60559 convertFormat or copy operation

where the
footnote is:

*) Where the
source and destination formats are the same, convertFormat
operations differ from copy operations in that convertFormat
operations raise the “invalid” floating-point exception on signaling NaN inputs and do not propagate non-canonical encodings.

DR 5 Prev <— Review —> Next DR 7, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 1: **fetestexceptflag** and exceptions passed to **fegetexceptflag**

**Summary**

This is about the issue raised by Joseph
Myers in email SC22WG14.14328:

TS 18661-1
says, for **fetestexceptflag**, "The value of ***flagp** shall have been set by a previous call to **fegetexceptflag**.".

This
contrasts with the C11 wording for **fesetexceptflag**, "The value of ***flagp** shall have been set by a previous call to **fegetexceptflag** whose second argument represented at least those floating-point
exceptions represented by the argument **excepts**.". So what happens if more exceptions are specified in the
call to **fetestexceptflag** than were specified in the call to **fegetexceptflag**? Then **fegetexceptflag** may or may not have stored any meaningful representation of the state of
the extra exceptions being tested.

I think **fetestexceptflag** should have
the same wording for this issue as **fesetexceptflag**: "whose second argument represented at least those floating-point
exceptions represented by the argument **excepts**".

**fesetexceptflag** sets global state, typically a hardware register, whereas **fetestexceptflag** just reads a
variable. It seems more important to avoid spurious data in the former.

Still, there’s no utility in testing
spurious flag settings, and placing the same restrictions on **fetestexceptflag** as on **fesetexceptflag** might be
less error prone.
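The proposed precondition amounts to a subset test on the two exception masks. The following is a minimal sketch using only the macros from `<fenv.h>`; the helper name `excepts_covered` is hypothetical and is not part of C11 or the TS.

```c
#include <fenv.h>

/* Hypothetical helper: returns nonzero when every exception in `tested`
   (the excepts argument of fetestexceptflag or fesetexceptflag) was also
   requested in `saved` (the second argument of the earlier
   fegetexceptflag call). */
int excepts_covered(int saved, int tested)
{
    return (tested & ~saved) == 0;
}
```

For example, flags saved with `FE_ALL_EXCEPT` may later be tested for `FE_INEXACT`, whereas flags saved with only `FE_INEXACT` would not satisfy the precondition for a test of `FE_INEXACT | FE_OVERFLOW`.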

**Suggested Technical Corrigendum**

In 15.2, in the new text for C 7.6.2.4a#2,
change:

The value of ***flagp** shall have
been set by a previous call to **fegetexceptflag**.

to:

The value of ***flagp** shall have been set by a previous
call to **fegetexceptflag** whose second argument represented at least those floating-point
exceptions represented by the argument **excepts**.

Oct 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

In 15.2, in the new text for C 7.6.2.4a#2,
change:

The value of ***flagp** shall have
been set by a previous call to **fegetexceptflag**.

to:

The value of ***flagp** shall have been set by a previous
call to **fegetexceptflag** whose second argument represented at least those floating-point
exceptions represented by the argument **excepts**.

DR 6 Prev <— Review —> Next DR 8, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 1: Editorial changes

**Summary**

In CFP email, Fred Tydeman
noted:

Searching for
"infinite precision" in part 1, most of them have "(as if)
to" before it. Except, **ffma**, **ffmal**, **dfmal** which is missing the "(as
if)".

Right. In particular, all the functions
that round result to narrower type have “(as if)”, except for the **fma** family.

**Suggested Technical Corrigendum**

In 14.5, in the new text for C
7.12.13a.5#2, insert “(as if)” before “to infinite precision”.

Oct 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

In 14.5, in the new text for C
7.12.13a.5#2, insert “(as if)” before “to infinite precision”.

DR 7 Prev <— Review —> Next DR 10, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 2: Editorial clarification about number digits in the coefficient

**Summary**

In
12.5, n is defined to be “the number of digits in the coefficient *c*”, where the decimal floating-point
argument is represented by the triple (*s*,
*c*, *q*). The intention is that *n*
is the number of digits in the coefficient of the particular argument, i.e.,
the number of significant digits, not the maximum number of digits in the
coefficient for the type. This might be misread, particularly since 5.2.4.2.2a says

— number of digits in the coefficient

**DEC32_MANT_DIG 7**

**DEC64_MANT_DIG 16**

**DEC128_MANT_DIG 34**

This part of 5.2.4.2.2a is in the context of
characterizing the type, so clearly refers to the type and not any particular
representation.
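The intended reading of *n* can be illustrated by counting the digits of one particular coefficient. The following is a minimal sketch; the function name `coefficient_digits` is hypothetical, the coefficient is passed as an ordinary unsigned integer (which cannot hold a full 34-digit **_Decimal128** coefficient), and counting a zero coefficient as one digit is an assumption rather than something the DR states.

```c
/* Hypothetical helper: number of significant decimal digits in the
   coefficient c of a particular representation (s, c, q). */
unsigned coefficient_digits(unsigned long long c)
{
    unsigned n = 1;   /* assumption: a zero coefficient counts as one digit */

    while (c >= 10) {
        c /= 10;
        ++n;
    }
    return n;
}
```

A **_Decimal32** value represented as (+1, 1500, -2) therefore has *n* = 4, even though **DEC32_MANT_DIG** is 7 for the type.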

**Suggested Technical Corrigendum**

In 12.5, change:

where *n* is the number of digits in the
coefficient *c*

to:

where
*n* is the number of significant digits
in the coefficient *c*

Oct 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

In 12.5, change:

where *n* is the number of digits in the
coefficient *c*

to:

where
*n* is the number of significant digits
in the coefficient *c*

DR 14 Prev <— Open —> Next DR 11, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 2,3: Missing specification for usual arithmetic conversions, tgmath

**Summary**

This is about
the issue raised by Joseph Myers in email SC22WG14.14282:

C11 specifies that the usual arithmetic conversions on the pair of types
(**long double**, **double**)
produces a result of type **long double**.

Suppose **long double** and **double** have the same set of values.
TS 18661-3 rewrites the rules for usual arithmetic conversions so that
the case "if both operands are floating types and the sets of values of
their corresponding real types are equivalent" prefers interchange types
to standard types to extended types. But this leaves the case of (**long double**, **double**) unspecified as to which type is
chosen, unlike in C11, as those are both standard types.

I think this
is a defect in TS 18661-3, and it should say that if both are standard types
with the same set of values then **long double** is preferred to **double** which is
preferred to **float**, as in C11.
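The C11 preference order among the standard floating types can be checked at compile time with `_Generic`. The following is a minimal sketch covering only the three standard types; the enumeration, macro, and function names are hypothetical.

```c
enum real_kind { KIND_FLOAT, KIND_DOUBLE, KIND_LONG_DOUBLE };

/* Maps the type of an expression to a tag; the operand is not evaluated. */
#define KIND_OF(e) _Generic((e),          \
    float: KIND_FLOAT,                    \
    double: KIND_DOUBLE,                  \
    long double: KIND_LONG_DOUBLE)

/* (long double, double): C11 yields long double, even when both types
   have the same set of values. */
enum real_kind kind_of_long_double_plus_double(void)
{
    return KIND_OF(1.0L + 2.0);
}

/* (double, float): C11 yields double. */
enum real_kind kind_of_double_plus_float(void)
{
    return KIND_OF(2.0 + 1.0f);
}
```

The defect is that the rewritten rules in TS 18661-3 no longer guarantee the first result when **long double** and **double** have equivalent sets of values.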

A similar
issue could arise if two of the extended types have equivalent sets of values.
I'm not aware of anything to prohibit that, although it seems less likely
in practice. I think the natural fix would be to say that **_Float128x** is preferred to **_Float64x** which is
preferred to **_Float32x**.

I think such
an issue would also arise for **<tgmath.h>** (if **_Float64x** and **_Float128x** have the same set of values, the choice doesn't seem to be specified).
It also seems possible for the **<tgmath.h>** rules for purely floating-point arguments to produce a different result
from the usual arithmetic conversions (consider the case where **_Float32x** is wider than **long double**, and **<tgmath.h>** chooses **long double**), and since rules that are the same in most cases but subtly different
in obscure cases tend to be confusing, I wonder if it might be better to
specify much simpler rules for **<tgmath.h>**: take the type resulting from the usual arithmetic conversions[*], where
integer arguments are replaced by **_Decimal64** if there are any decimal arguments and **double** otherwise. (That's different from the present rules for e.g. (**_Float32x**, **int**), but it's a
lot simpler, and seems unlikely in practice to choose a type with a different
set of values from the present choice.)

[*]
Meaningful for more than two arguments as long as the usual arithmetic
conversions are commutative and associative as an operation on pairs of types.

Though substantive, the suggested change
to the usual arithmetic conversions is consistent with the intention in TS
18661-3 to specify all the cases (except where neither format is a subset of
the other and the formats are not the same). The missing cases were an
oversight. The suggested preferences of **long double** over **double** over **float** and **_Float128x** over **_Float64x** over **_Float32x** are the obvious choices.

Joseph Myers notes that the **<tgmath.h>**
specification is incomplete in the same way as the usual arithmetic conversions.
He argues for simplifying the specification by referring to the usual
arithmetic conversions specification, rather than mostly repeating it, as the
current specification does. The suggested Technical Corrigendum below follows
this new approach. Though a substantive change to TS 18661-3, the effects on
implementations and users are expected to be minimal – worth the
simplification.

The suggested Technical Corrigendum
below also restores footnote number 62, which is lost in the current TS
18661-3.

**Suggested Technical Corrigendum**

In clause 8, change the replacement text
for 6.3.1.8#1:

If one
operand has decimal floating type, the other operand shall not have standard
floating type, binary floating type, complex type, or imaginary type.

If both
operands have floating types and neither of the sets of values of their
corresponding real types is a subset of (or equivalent to) the other, the
behavior is undefined.

Otherwise, if
both operands are floating types and the sets of values of their corresponding
real types are equivalent, then the following rules are applied:

If both
operands have the same corresponding real type, no further conversion is
needed.

Otherwise, if
the corresponding real type of either operand is an interchange floating type,
the other operand is converted, without change of type domain, to a type
whose corresponding real type is that same interchange floating type.

Otherwise, if
the corresponding real type of either operand is a standard floating type, the
other operand is converted, without change of type domain, to a type
whose corresponding real type is that same standard floating type.

Otherwise, if
both operands have floating types, the operand, whose set of values of its
corresponding real type is a (proper) subset of the set of values of the
corresponding real type of the other operand, is converted, without change of
type domain, to a type with the corresponding real type of that other operand.

Otherwise, if
one operand has a floating type, the other operand is converted to the
corresponding real type of the operand of floating type.

Otherwise,
the integer promotions are performed on both operands. Then the following rules
are applied to the promoted operands:

. . .

to:

If one
operand has decimal floating type, the other operand shall not have standard
floating type, binary floating type, complex type, or imaginary type.

If both
operands have floating types and neither of the sets of values of their
corresponding real types is a subset of (or equivalent to) the other, the
behavior is undefined.

If both
operands have the same corresponding real type, no further conversion is
needed.

Otherwise, if
both operands are floating types and the sets of values of their corresponding
real types are equivalent, then the following rules are applied:

If the
corresponding real type of either operand is an interchange floating type, the
other operand is converted, without change of type domain, to a type
whose corresponding real type is that same interchange floating type.

Otherwise, if
the corresponding real type of either operand is **long double**, the other operand is converted, without change of
type domain, to a type whose corresponding real type is **long double**.

Otherwise, if
the corresponding real type of either operand is **double**, the other operand is converted, without change of
type domain, to a type whose corresponding real type is **double**.

(All cases
where **float** might have the same format as
another type are covered above.)

Otherwise, if
the corresponding real type of either operand is **_Float128x** or **_Decimal128x**, the other operand is
converted, without change of type domain, to a type whose corresponding
real type is **_Float128x** or **_Decimal128x**, respectively.

Otherwise, if
the corresponding real type of either operand is **_Float64x** or **_Decimal64x**, the other operand is
converted, without change of type domain, to a type whose corresponding
real type is **_Float64x** or **_Decimal64x**, respectively.

Otherwise, if
both operands have floating types, the operand, whose set of values of its
corresponding real type is a (proper) subset of the set of values of the
corresponding real type of the other operand, is converted, without change of
type domain62), to a type with the corresponding real type of that other
operand.

Otherwise, if
one operand has a floating type, the other operand is converted to the
corresponding real type of the operand of floating type.

Otherwise,
the integer promotions are performed on both operands. Then the following rules
are applied to the promoted operands:

. . .

In clause 15, replace:

In 7.25#3c, replace the bullets:

… bullets …

with:

— If two arguments have floating types and neither of the
sets of values of their corresponding real types is a subset of (or equivalent
to) the other, the behavior is undefined.

— If any arguments for generic parameters have type **_Decimal M** where

— Otherwise, if any argument for generic parameters is of
integer type and another argument for generic parameters has type **_Decimal32**, the type determined is **_Decimal64**.

— Otherwise, if any argument for generic parameters has type **_Decimal32**, the type determined is **_Decimal32**.

— Otherwise, if the corresponding real type of any argument
for generic parameters has type **long double**, **_Float M** where

— Otherwise, if the corresponding real type of any argument
for generic parameters has type **double**, **_Float64**, or **_Float32x**, the type determined is the
widest of the corresponding real types of these arguments. If **_Float64** and either **double** or **_Float32x** are both widest
corresponding real types (with equivalent sets of values) of these arguments,
the type determined is **_Float64**. Otherwise, if **double** and **_Float32x** are both widest
corresponding real types (with equivalent sets of values) of these arguments,
the type determined is **double**.

— Otherwise, if any argument for generic parameters is of
integer type, the type determined is **double**.

— Otherwise, if the corresponding real type of any argument
for generic parameters has type **_Float32**, the type determined is **_Float32**.

— Otherwise, the type determined is **float**.

In the
second bullet 7.25#3c, attach a footnote to the wording:

the type
determined is the widest

where the
footnote is:

*) The
term widest here refers to a type whose set of values is a superset of (or
equivalent to) the sets of values of the other types.

with:

In 7.25#3c, replace the first sentence and bullets:

[3c] Except for the macros for functions that round result to a narrower type
(7.12.13a), use of a
type-generic macro invokes a function whose generic parameters have the
corresponding real type determined by the corresponding real types of the
arguments as follows:

— First, if
any argument for generic parameters has type **_Decimal128**, the type determined is **_Decimal128**.

— Otherwise,
if any argument for generic parameters has type **_Decimal64**, or if any argument for generic parameters is of integer type and
another argument for generic parameters has type **_Decimal32**, the type determined is **_Decimal64**.

— Otherwise,
if any argument for generic parameters has type **_Decimal32**, the type determined is **_Decimal32**.

— Otherwise,
if the corresponding real type of any argument for generic parameters is **long double**, the type determined is **long double**.

— Otherwise,
if the corresponding real type of any argument for generic parameters is **double** or is of integer type, the type determined is **double**.

— Otherwise,
if any argument for generic parameters is of integer type, the type determined
is **double**.

— Otherwise,
the type determined is **float**.

with:

[3c] Except
for the macros for functions that round result to a narrower type (7.12.13a), use of a type-generic macro invokes a function
whose generic parameters have the corresponding real
type determined by the types of the arguments for the generic
parameters as follows:

— Arguments
of integer type are regarded as having type **_Decimal64** if any argument has decimal floating type,
and as having type **double** otherwise.

— If
the function has exactly one generic parameter, the type determined is
the corresponding real type of the argument for the generic
parameter.

— If
the function has exactly two generic parameters, the type determined is
the corresponding real type determined by the usual arithmetic
conversions (6.3.1.8) applied to the arguments for the
generic parameters.

— If
the function has more than two generic parameters, the type determined is
the corresponding real type determined by repeatedly applying the usual
arithmetic conversions, first to the first two arguments for generic
parameters, then to that result type and the next argument for a generic
parameter, and so forth until the usual arithmetic conversions have been
applied to the last argument for a generic parameter.
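The simplified rule can be modeled with `_Generic` for the standard binary types. The following is a rough sketch, not the `<tgmath.h>` implementation: it ignores decimal, interchange, and extended types, so integer arguments are always regarded as **double**, and all macro and enumerator names are hypothetical. Because `+` associates left to right, the three-argument form applies the usual arithmetic conversions first to the first two arguments and then to that result and the third.

```c
enum tg_kind { TG_FLOAT, TG_DOUBLE, TG_LONG_DOUBLE };

/* Integer arguments are regarded as double; floating arguments keep their type. */
#define TG_AS_REAL(x) _Generic((x),       \
    float: (x),                           \
    double: (x),                          \
    long double: (x),                     \
    default: (double)(x))

/* Maps the type of an expression to a tag; the operand is not evaluated. */
#define TG_KIND(e) _Generic((e),          \
    float: TG_FLOAT,                      \
    double: TG_DOUBLE,                    \
    long double: TG_LONG_DOUBLE)

/* Type determined for two and three generic parameters. */
#define TG_KIND2(a, b)    TG_KIND(TG_AS_REAL(a) + TG_AS_REAL(b))
#define TG_KIND3(a, b, c) TG_KIND(TG_AS_REAL(a) + TG_AS_REAL(b) + TG_AS_REAL(c))
```

Under this model `TG_KIND2(2.0f, 3)` is `TG_DOUBLE`, matching the existing C11 rule that an integer argument selects the **double** function.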

Oct 2016 meeting

**Committee Discussion**

Apr 2017 meeting

**Committee Discussion**

The committee accepts the proposed modification as reflected below. The TC in DR 501 includes two changes to TS 18661-3, one for the usual arithmetic conversions, the other for type-generic math. The first change fills in missing conversions for new types in TS 18661-3. The second change simplifies type-generic math by referencing the usual arithmetic conversions, and thereby also fills in missing type-generic math rules for arguments of the new types.

This is a proposal for an alternative change to type-generic math. The original change was proposed for TS 18661-3, where the new types were introduced. However, the change can be made in TS 18661-2, where it is easier to understand and leads to a simplification in TS 18661-3.

**Proposed Technical Corrigendum**

In TS 18661-3

In clause 8, change the replacement text
for 6.3.1.8#1:

If one
operand has decimal floating type, the other operand shall not have standard
floating type, binary floating type, complex type, or imaginary type.

If both
operands have floating types and neither of the sets of values of their
corresponding real types is a subset of (or equivalent to) the other, the
behavior is undefined.

Otherwise, if
both operands are floating types and the sets of values of their corresponding
real types are equivalent, then the following rules are applied:

If both
operands have the same corresponding real type, no further conversion is
needed.

Otherwise, if
the corresponding real type of either operand is an interchange floating type,
the other operand is converted, without change of type domain, to a type
whose corresponding real type is that same interchange floating type.

Otherwise, if
the corresponding real type of either operand is a standard floating type, the
other operand is converted, without change of type domain, to a type
whose corresponding real type is that same standard floating type.

Otherwise, if
both operands have floating types, the operand, whose set of values of its
corresponding real type is a (proper) subset of the set of values of the
corresponding real type of the other operand, is converted, without change of
type domain, to a type with the corresponding real type of that other operand.

Otherwise, if
one operand has a floating type, the other operand is converted to the
corresponding real type of the operand of floating type.

Otherwise,
the integer promotions are performed on both operands. Then the following rules
are applied to the promoted operands:

. . .

to:

If both
operands have the same corresponding real type, no further conversion is
needed.

If the
corresponding real type of either operand is an interchange floating type, the
other operand is converted, without change of type domain, to a type
whose corresponding real type is that same interchange floating type.

Otherwise, if
the corresponding real type of either operand is **long double**, the other operand is converted, without change of
type domain, to a type whose corresponding real type is **long double**.

Otherwise, if
the corresponding real type of either operand is **double**, the other operand is converted, without change of
type domain, to a type whose corresponding real type is **double**.

(All cases
where **float** might have the same format as
another type are covered above.)

Otherwise, if
the corresponding real type of either operand is **_Float128x** or **_Decimal128x**, the other operand is
converted, without change of type domain, to a type whose corresponding
real type is **_Float128x** or **_Decimal128x**, respectively.

Otherwise, if
the corresponding real type of either operand is **_Float64x** or **_Decimal64x**, the other operand is
converted, without change of type domain, to a type whose corresponding
real type is **_Float64x** or **_Decimal64x**, respectively.

Otherwise, if
both operands have floating types, the operand, whose set of values of its
corresponding real type is a (proper) subset of the set of values of the
corresponding real type of the other operand, is converted, without change of
type domain62), to a type with the corresponding real type of that other
operand.

. . .

In TS 18661-2

In 12.9, change the introduced [3c] from:

[3c] Except for the macros for functions that round result to a narrower type
(7.12.13a), use of a
type-generic macro invokes a function whose generic parameters have the
corresponding real type determined by the corresponding real types of the
arguments as follows:

— First, if any argument for
generic parameters has type **_Decimal128**, the type determined is **_Decimal128**.

— Otherwise, if any argument for
generic parameters has type **_Decimal64**, or if any argument for generic
parameters is of integer type and another argument for generic parameters has
type **_Decimal32**, the type determined is **_Decimal64**.

— Otherwise, if any argument for
generic parameters has type **_Decimal32**, the type determined is **_Decimal32**.

— Otherwise, if the corresponding
real type of any argument for generic parameters is **long double**, the type determined is **long double**.

— Otherwise, if the corresponding
real type of any argument for generic parameters is **double** or is of integer type, the type determined
is **double**.

— Otherwise, if any argument for
generic parameters is of integer type, the type determined is **double**.

— Otherwise, the type determined is
**float**.

to:

[3c] Except for the macros for functions
that round result to a narrower type (7.12.13a), use of a
type-generic macro invokes a function whose generic parameters have
the corresponding real type determined by the types of the arguments for
the generic parameters as follows:

— Arguments of
integer type are regarded as having type **_Decimal64** if any
argument has decimal floating type, and as having type **double** otherwise.

— If the
function has exactly one generic parameter, the type determined
is the corresponding real type of the argument for
the generic parameter.

— If the
function has exactly two generic parameters, the type determined
is the corresponding real type determined by the
usual arithmetic conversions (6.3.1.8) applied to the arguments for
the generic parameters.

— If the
function has more than two generic parameters, the type determined
is the corresponding real type determined by repeatedly applying
the usual arithmetic conversions, first to the first two arguments
for generic parameters, then to that result type and the next
argument for a generic parameter, and so forth until the usual arithmetic
conversions have been applied to the last argument for a generic
parameter.
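As an illustration only (not part of the corrigendum text), the revised [3c] procedure can be sketched as a fold of the usual arithmetic conversions over the arguments. The sketch below is a simplified model that assumes only integer types, the standard floating types, and the TS 18661-2 decimal floating types; the names `type_kind`, `usual_arith`, and `tg_determined_type` are hypothetical and do not appear in the TS.

```c
#include <stddef.h>

/* Simplified model of argument types. Within each radix the enumerators
   are ordered so that a larger value means a wider type. */
typedef enum {
    T_INT,
    T_FLOAT, T_DOUBLE, T_LDOUBLE,
    T_DEC32, T_DEC64, T_DEC128,
    T_INVALID   /* result when binary and decimal operands are mixed */
} type_kind;

static int is_decimal(type_kind t) { return t >= T_DEC32 && t <= T_DEC128; }

/* Usual arithmetic conversions, restricted to the floating types above:
   the result is the wider of two types of the same radix. */
static type_kind usual_arith(type_kind a, type_kind b)
{
    if (a == T_INVALID || b == T_INVALID) return T_INVALID;
    if (is_decimal(a) != is_decimal(b)) return T_INVALID;
    return a > b ? a : b;
}

/* Type determined for a type-generic macro with n generic arguments. */
type_kind tg_determined_type(const type_kind *args, size_t n)
{
    int any_decimal = 0;
    for (size_t i = 0; i < n; i++)
        if (is_decimal(args[i])) any_decimal = 1;

    type_kind result = T_INVALID;
    for (size_t i = 0; i < n; i++) {
        type_kind t = args[i];
        /* Integers count as _Decimal64 if any argument is decimal,
           and as double otherwise. */
        if (t == T_INT) t = any_decimal ? T_DEC64 : T_DOUBLE;
        result = (i == 0) ? t : usual_arith(result, t);
    }
    return result;
}
```

Under this model an integer argument paired with a **_Decimal32** argument determines **_Decimal64**, which agrees with the corresponding case of the text being replaced.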


DR 8 Prev <— Review —> Next DR 5, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 1: wrong type for **fesetmode** parameter

**Summary**

This is about the issue raised by Joseph
Myers in email SC22WG14.14358:

TS 18661-1
gives the declaration of **fesetmode** as:

**int fesetmode(const fenv_t *modep);**

The argument should be of type **const femode_t ***, not **const fenv_t ***.

--

This was an editorial cut-and-paste error. The Description says the argument **modep** shall point to an object set by a call to **fegetmode**, which sets objects of type **femode_t**. It’s unlikely the function would be implemented with the erroneous type.

**Suggested Technical Corrigendum**

In 15.3, in the new text for C 7.6.3.1a#1, change:

**int fesetmode(const fenv_t *modep);**

to:

**int fesetmode(const femode_t *modep);**

Oct 2016 meeting

**Committee Discussion**

**Proposed Technical Corrigendum**

In 15.3, in the new text for C 7.6.3.1a#1, change:

**int fesetmode(const fenv_t *modep);**

to:

**int fesetmode(const femode_t *modep);**


DR 9 Prev <— Open —> Next DR 12, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2016-09-10

**Source:** WG14

**Reference Document:** N2077

**Subject:** Part 2: a-style formatting not IEC 60559 conformant

**Summary**

The **a**-style
formatting specified in subclause 12.5 of TS 18661-2
is not an IEC 60559 conversion for cases where the formatting precision is less
than the length of the coefficient of the input. The specification entails an
intermediate rounding to the floating type of the input, which might overflow resulting
in a character sequence representation of infinity. IEC 60559 conversions to
character sequences do not overflow, unless the language over-restricts the
exponent range for character sequence output, which C does not.

Another undesirable aspect of the
current specification is that in certain cases it produces results with more
precision than given by a width modifier.

Here are some examples, showing the
result of the intermediate conversion, with different behaviors for the current
spec (“old”) and the spec in the suggested Technical Corrigendum below (“new”):

For **_Decimal32** input *x* with representation (1, 9512345, 90) and specifier ...

**%.3Ha**

old: *x* -> (1, 9510000, 90) -> **9.510000e96**

new: *x* -> (1, 951, 94) -> **9.51e96**

**%.2Ha**

old: *x* -> (1, 9500000, 90) -> **9.500000e96**

new: *x* -> (1, 95, 95) -> **9.5e96**

**%.1Ha**

old: *x* -> Inf -> **inf**

new: *x* -> (1, 1, 97) -> **1e97**

Here’s another example:

For **_Decimal32** input *x* with representation (1, 9512345, 86) and specifier ...

**%.2Ha**

old: *x* -> (1, 950, 90) -> **9.50e92**

new: *x* -> (1, 95, 91) -> **9.5e92**

The examples use a to-nearest rounding.

As the examples illustrate, the problematic
cases for the current “old” spec occur because of the exponent range limitation
of the format used for the intermediate conversion.

The suggested Technical Corrigendum
below specifies formatting that is IEC 60559 conformant and which honors a
width modifier. It does not change the numerical value of the result, except in
overflow cases.
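The intermediate results in the examples above can be reproduced with a small integer model of the two rules. This is a sketch under stated assumptions, not text from the TS: values are held in the form (*s*, *c*, *q*), rounding is to nearest with ties to even, the precision is at least 1, underflow is ignored, and the names `dec_t`, `round_new`, and `round_old` are hypothetical. For **_Decimal32** the sketch would be called with *p* = 7 and a maximum quantum exponent of 90.

```c
#include <stdint.h>

/* A decimal value in the form (s, c, q): s is 1 or -1, c is the integer
   coefficient, q is the quantum exponent. is_inf marks an overflow. */
typedef struct { int s; uint64_t c; int q; int is_inf; } dec_t;

static int num_digits(uint64_t c)
{
    int n = 1;
    while (c >= 10) { c /= 10; n++; }
    return n;
}

static uint64_t pow10u(int k)
{
    uint64_t r = 1;
    while (k-- > 0) r *= 10;
    return r;
}

/* "New" rule: round to P significant digits with unbounded exponent range,
   keeping a P-digit integer coefficient. */
dec_t round_new(dec_t x, int P)
{
    int n = num_digits(x.c);
    if (n <= P) return x;              /* intermediate result is the input */
    uint64_t unit = pow10u(n - P);
    uint64_t c = x.c / unit, rem = x.c % unit;
    if (rem * 2 > unit || (rem * 2 == unit && (c & 1))) c++;
    x.q += n - P;
    if (c == pow10u(P)) { c /= 10; x.q++; }   /* carry, e.g. 99 -> 100 */
    x.c = c;
    return x;
}

/* "Old" rule: the rounded result must fit the type, whose coefficient has
   at most p digits and whose quantum exponent is at most qmax. The
   coefficient is lengthened only as far as needed; if that is not enough,
   the intermediate rounding overflows. */
dec_t round_old(dec_t x, int P, int p, int qmax)
{
    dec_t r = round_new(x, P);
    while (r.q > qmax && num_digits(r.c) < p) { r.c *= 10; r.q--; }
    if (r.q > qmax) r.is_inf = 1;
    return r;
}
```

With input (1, 9512345, 90) and precision 1, `round_new` yields (1, 1, 97), while `round_old` cannot bring the quantum exponent down to 90 within 7 digits and reports overflow, matching the **%.1Ha** case above.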

**Suggested Technical Corrigendum**

In 12.5, in the addition to 7.21.6.1#8 and 7.29.2.1#8,
under **a**,**A**
conversion specifiers, change:

If the precision is present (in the conversion
specification) and is zero or at least as large as the precision *p* (5.2.4.2.2) of the decimal floating
type, the conversion is as if the precision were missing. If the precision is
present (and nonzero) and less than the precision *p* of the decimal floating type, the conversion first obtains an
intermediate result by rounding the input in the type, according to the current
rounding direction for decimal floating-point operations, to the number of
digits specified by the precision, then converts the intermediate result as if
the precision were missing. The length of the coefficient of the intermediate
result is the smallest number, at least as large as the formatting precision,
for which the quantum exponent is within the quantum exponent range of the type
(see 5.2.4.2.2a). The intermediate rounding may overflow.

to:

If the
precision *P* is present (in the
conversion specification) and is zero or at least as large as the precision *p* (5.2.4.2.2) of the decimal floating
type, the conversion is as if the precision were missing. If the precision
*P* is present (and nonzero) and less
than the precision *p* of the decimal
floating type, the conversion first obtains an intermediate result as follows,
where *n* is the number of significant
digits in the coefficient:

If *n* <= *P*, set the intermediate result to the input.

If *n* > *P*, round the input value, according to the current rounding
direction for decimal floating-point operations, to *P* decimal digits, with unbounded exponent range, representing the
result with a *P*-digit integer
coefficient when in the form (*s*, *c*, *q*).

Convert the
intermediate result in the manner described above for the case where the
precision is missing.

In 12.5, in the addition to 7.21.6.1#8 and 7.29.2.1#8, in
EXAMPLE 3, change the results:

9.54321e+93

9.5432e+93

9.543e+93

9.540e+93

9.500e+93

1.0000e+94

inf

to:

9.54321e+93

9.5432e+93

9.543e+93

9.54e+93

9.5e+93

1e+94

1e+97

Oct 2016 meeting

**Committee Discussion**

Apr 2017 meeting

**Committee Discussion**

However, the committee is concerned that the resulting `%a` behavior differs from that for binary floating point, and more review is needed. In particular, there were concerns that, for the decimal floating types, a precision given with the `%a` conversion specifier now specifies the total number of significant digits, not the number of digits after the decimal point as it does for other types.

**Proposed Technical Corrigendum**

In 12.5, in the addition to 7.21.6.1#8 and 7.29.2.1#8,
under **a**,**A**
conversion specifiers, change:

If the precision is present (in the conversion
specification) and is zero or at least as large as the precision *p* (5.2.4.2.2) of the decimal floating
type, the conversion is as if the precision were missing. If the precision is
present (and nonzero) and less than the precision *p* of the decimal floating type, the conversion first obtains an
intermediate result by rounding the input in the type, according to the current
rounding direction for decimal floating-point operations, to the number of
digits specified by the precision, then converts the intermediate result as if
the precision were missing. The length of the coefficient of the intermediate
result is the smallest number, at least as large as the formatting precision,
for which the quantum exponent is within the quantum exponent range of the type
(see 5.2.4.2.2a). The intermediate rounding may overflow.

to:

If the
precision *P* is present (in the
conversion specification) and is zero or at least as large as the precision *p* (5.2.4.2.2) of the decimal floating
type, the conversion is as if the precision were missing. If the precision
*P* is present (and nonzero) and less
than the precision *p* of the decimal
floating type, the conversion first obtains an intermediate result as follows,
where *n* is the number of significant
digits in the coefficient:

If *n* <= *P*, set the intermediate result to the input.

If *n* > *P*, round the input value, according to the current rounding
direction for decimal floating-point operations, to *P* decimal digits, with unbounded exponent range, representing the
result with a *P*-digit integer
coefficient when in the form (*s*, *c*, *q*).

Convert the
intermediate result in the manner described above for the case where the
precision is missing.

In 12.5, in the addition to 7.21.6.1#8 and 7.29.2.1#8, in
EXAMPLE 3, change the results:

9.54321e+93

9.5432e+93

9.543e+93

9.540e+93

9.500e+93

1.0000e+94

inf

to:

9.54321e+93

9.5432e+93

9.543e+93

9.54e+93

9.5e+93

1e+94

1e+97

Add, as a new EXAMPLE,

    #include <stdio.h>

    int main(void)
    {
        _Decimal32 x = 9512345e90df;
        _Decimal32 x2 = 9512345e86df;

        printf("%.3Ha\n", x);  // New expected output: 9.51e96
        printf("%.2Ha\n", x);  // New expected output: 9.5e96
        printf("%.1Ha\n", x);  // New expected output: 1e97
        printf("%.2Ha\n", x2); // New expected output: 9.5e92
        return 0;
    }


DR 11 Prev <— Open —> Next DR 13, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2017-03-04

**Source:** WG14

**Reference Document:** N2125

**Subject:** P1: Zero payloads and `set payload` function

**Summary**

This is about an issue raised
by Joseph Myers in SC22WG14.14450:

The
specification for **setpayload** (and likewise **setpayloadsig**)
says "If **pl** is not a positive floating-point integer representing
a valid payload, ***res** is set to positive zero."

Does
"positive" as applied to "floating-point integer" here mean
"with sign bit 0" (the list of definitions in IEEE 754 doesn't
include "positive")? In the preferred encodings for binary
interchange formats, 0 is a valid payload for quiet NaNs.
So should +0.0 as an argument to **setpayload**
result in a quiet NaN with payload 0, while -0.0
results in ***res** being set to +0.0 because -0.0 isn't positive (and
for **setpayloadsig**, both result in ***res** set to +0.0 because a payload for a signaling NaN has to be nonzero to avoid all mantissa bits being
zero)?

A “positive floating-point integer” is a positive
integer in the floating-point format, hence it is
greater than zero. So, the current specification for **setpayload** and **setpayloadsig** is flawed in that it doesn’t
allow setting the payload to zero.

A more basic problem is that TS 18661-1 assumes IEC
60559 interprets payloads as integers. This is true for decimal formats. IEC
60559 says:

The
payload corresponds to the significand of finite
numbers, interpreted as an integer with a maximum value of 10^(3×J)−1, …

The significand c
interpreted as an integer is assumed throughout to be non-negative, while the *s* field in (*s*, *q*, *c*) provides the sign. For decimal,
interpreting the bits in the encodings allows the two encoding schemes to have
the same payloads and the payloads to fit conceptually with their encoding
schemes.

However, for binary formats, IEC 60559 says:

For
binary formats, the payload is encoded in the *p*−2 least significant bits of the trailing significand field.

Nowhere does it actually interpret the payload for
binary formats as an integer.

However, the payload for binary formats has a natural
interpretation as an unsigned integer, so it is reasonable for TS 18661-1 to
interpret payloads (for binary and decimal formats) as such.

The suggested Technical Corrigendum below addresses
these problems.
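To make the zero-payload case concrete, the following sketch models the payload functions for a binary format. It is an illustration under stated assumptions, not the TS functions themselves: it assumes **double** is IEC 60559 binary64 (*p* = 53) with the payload in the low *p*−2 = 51 bits of the trailing significand and the next bit marking a quiet NaN. The names `model_setpayload` and `model_getpayload` are hypothetical, and rejecting −0.0 is a choice made by this sketch.

```c
#include <stdint.h>
#include <string.h>

#define PAYLOAD_BITS   51
#define QUIET_NAN_BITS UINT64_C(0x7FF8000000000000)
#define PAYLOAD_MASK   ((UINT64_C(1) << PAYLOAD_BITS) - 1)

/* Stores a quiet NaN with payload pl in *res and returns 0, or stores
   positive zero and returns nonzero if pl is not a valid payload.
   Zero is treated as a valid payload. */
int model_setpayload(double *res, double pl)
{
    uint64_t in;
    memcpy(&in, &pl, sizeof in);
    int negative = (int)(in >> 63);    /* set for -0.0 as well */

    /* Valid payloads here are the integers 0 .. 2^51 - 1. The range test
       also rejects NaN, because comparisons with NaN are false. */
    if (negative || !(pl < 0x1p51) || (double)(uint64_t)pl != pl) {
        *res = 0.0;
        return 1;
    }
    uint64_t bits = QUIET_NAN_BITS | (uint64_t)pl;
    memcpy(res, &bits, sizeof bits);
    return 0;
}

/* Reads the payload field back, interpreted as an unsigned integer. */
double model_getpayload(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (double)(bits & PAYLOAD_MASK);
}
```

Calling `model_setpayload(&r, 0.0)` leaves a quiet NaN with payload 0 in `r`, which is the result the wording with "positive" ruled out.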

**Suggested Technical Corrigendum**

In 14.10, replace the first
sentence:

IEC 60559 defines the payload of a NaN to be a certain part of the NaN’s
significand interpreted as an integer.

with:

IEC 60559 defines the payload of a NaN to be a certain part of the NaN’s
significand. The payload can be interpreted as an
unsigned integer.

In 14.10, in the new C subclause F.10.13, replace:

IEC 60559 defines the *payload* of a quiet or signaling NaN as an integer value encoded in the significand.

with:

IEC 60559 defines the *payload* of a quiet or signaling NaN as information encoded in part of the NaN significand. The payload can
be interpreted as an unsigned integer.

In 14.10, in the new C subclauses F.10.13.2#2 and F.10.13.3#2, change:

If **pl** is not a positive
floating-point integer representing a valid payload, ***res** is set to positive
zero.

to:

If **pl** is not a floating-point
integer representing a valid payload, ***res** is set to positive zero.

Apr 2017 meeting

**Committee Discussion**

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum.

**Proposed Technical Corrigendum**

In 14.10, replace the first
sentence:

IEC 60559 defines the payload of a NaN to be a certain part of the NaN’s
significand interpreted as an integer.

with:

IEC 60559 defines the payload of a NaN to be a certain part of the NaN’s
significand. The payload can be interpreted as an
unsigned integer.

In 14.10, in the new C subclause F.10.13, replace:

IEC 60559 defines the *payload* of a quiet or signaling NaN as an integer value encoded in the significand.

with:

IEC 60559 defines the *payload* of a quiet or signaling NaN as information encoded in part of the NaN significand. The payload can
be interpreted as an unsigned integer.

In 14.10, in the new C subclauses F.10.13.2#2 and F.10.13.3#2, change:

If **pl** is not a positive
floating-point integer representing a valid payload, ***res** is set to positive
zero.

to:

If **pl** is not a floating-point
integer representing a valid payload, ***res** is set to positive zero.


DR 12 Prev <— Open —> Next DR 14, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2017-03-04

**Source:** WG14

**Reference Document:** N2125

**Subject:** P3: Type-generic macros for functions that round result to narrower type

**Summary**

This is about an issue raised by Joseph Myers in SC22WG14.14561:

TS 18661-1 and -2 define type-generic macros for the functions that round result to a narrower type. In part 1 these are, for example, fadd and dadd for addition; in part 2, for example, d32add and d64add.

Part 3 does not seem to make any changes or additions to those macros, and consequences of that seem nonobvious. It defines new functions for the new types: fMaddfN, fMaddfNx, fMxaddfN, fMxaddfNx (where M < N, or M <= N in the fMaddfNx case), and likewise for decimal types. But the type-generic macros remain as defined in 7.25#6a after the changes from parts 1 and 2 are applied (part 3 does not contain the string "6a").

That is, it's valid to pass the _FloatN and _FloatNx types to the fadd and dadd macros, and valid to pass the new _DecimalN and _DecimalNx types from part 3 to the d32add and d64add types.

(a) 7.25#6a says "If the macro prefix is d32 or d64, use of an argument of standard floating type results in undefined behavior.". Other places get amended in part 3 to say "floating type of radix 2" in addition to "standard floating type". But it appears it fails to make it undefined to pass _FloatN or _FloatNx arguments to d32add, d64add etc. type-generic macros - although clearly it should be undefined.

(b) Passing _Decimal128 to d32add would result in the d32addd128 function being called, as expected. But say you pass a _Decimal128x argument. A function d32addd128x exists but the specification would seem to result in d32addd64 being called, which seems unintuitive. Similar issues apply with _FloatN and _FloatNx types - calling fadd on them would always call the fadd function not faddl. (But in that case there *are* no functions defined that take _FloatN / _FloatNx arguments and return float or double. So the right thing to do is less obvious.)

The following addresses these issues by filling in the missing specification in part 3.
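Issue (b) can be seen by writing out the selection rules of the current 7.25#6a as a function. The sketch below is a hypothetical model (the names `arg_kind` and `old_suffix` are not from the TS): because only **long double** and **_Decimal128** arguments are mentioned by those rules, every other argument type, including the Part 3 types, is folded into a single `A_OTHER` case.

```c
#include <string.h>

/* Only these two argument types influence the current rules. */
typedef enum { A_OTHER, A_LONG_DOUBLE, A_DECIMAL128 } arg_kind;

/* Suffix of the function invoked by a macro with the given prefix
   ("f", "d", "d32" or "d64") under the current 7.25#6a rules. */
const char *old_suffix(const char *prefix, const arg_kind *args, int n)
{
    int any_d128 = 0, any_ld = 0;
    for (int i = 0; i < n; i++) {
        if (args[i] == A_DECIMAL128)  any_d128 = 1;
        if (args[i] == A_LONG_DOUBLE) any_ld = 1;
    }
    if (any_d128 || strcmp(prefix, "d64") == 0) return "d128";
    if (strcmp(prefix, "d32") == 0)             return "d64";
    if (any_ld || strcmp(prefix, "d") == 0)     return "l";
    return "";   /* the function has the name of the macro, no suffix */
}
```

A **_Decimal128x** argument is `A_OTHER` in this model, so `old_suffix("d32", ...)` returns `"d64"` and **d32add** resolves to d32addd64, as described in (b).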

**Suggested Technical Corrigendum**

In clause 15, after the change
to 7.25#6, add:

Change 7.25#6a
from:

[6a] The
functions that round result to a narrower type have type-generic macros whose
names are obtained by omitting any suffix from the function names. Thus, the
macros with **f** or **d** prefix are:

**fadd fmul ffma**

**dadd dmul dfma**

**fsub fdiv fsqrt**

**dsub ddiv dsqrt**

and the macros with **d32** or **d64** prefix are:

**d32add d32mul d32fma**

**d64add d64mul d64fma**

**d32sub d32div d32sqrt**

**d64sub d64div d64sqrt**

All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is **f** or **d**, use of an argument of decimal floating type results in undefined behavior. If the macro prefix is **d32** or **d64**, use of an argument of standard floating type results in undefined behavior. The function invoked is determined as follows:

— If any argument has type **_Decimal128**, or if the macro prefix is **d64**, the function invoked has the name of the macro, with a **d128** suffix.

— Otherwise, if the macro prefix is **d32**, the function invoked has the name of the macro, with a **d64** suffix.

— Otherwise, if any argument has type **long double**, or if the macro prefix is **d**, the function invoked has the name of the macro, with an **l** suffix.

— Otherwise, the function invoked has the name of the macro (with no suffix).

to:

[6a] The functions that round result to a narrower type have type-generic macros whose names are obtained by omitting any suffix from the function names. Thus, the macros with **f** or **d** prefix are:

**fadd fmul ffma**

**dadd dmul dfma**

**fsub fdiv fsqrt**

**dsub ddiv dsqrt**

and the macros with **f***M*, **f***M***x**, **d***M*, or **d***M***x** prefix are:

**f***M***add** **f***M***xmul** **d***M***fma**

**f***M***sub** **f***M***xdiv** **d***M***sqrt**

**f***M***mul** **f***M***xfma** **d***M***xadd**

**f***M***div** **f***M***xsqrt** **d***M***xsub**

**f***M***fma** **d***M***add** **d***M***xmul**

**f***M***sqrt** **d***M***sub** **d***M***xdiv**

**f***M***xadd** **d***M***mul** **d***M***xfma**

**f***M***xsub** **d***M***div** **d***M***xsqrt**

All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is **f**, **d**, **f***M*, or **f***M***x**, use of an argument of decimal floating type results in undefined behavior. If the macro prefix is **d***M* or **d***M***x**, use of an argument of standard or binary floating type results in undefined behavior. The function invoked is determined as follows:

— Arguments that have integer
type are regarded as having type **_Decimal64** if any
argument has decimal floating type, and as having type **double** otherwise.

— The unsuffixed
name of the function is the name of the macro, and its suffix, if any,
corresponds to the parameter type which may be any type with at least the range
and precision of the argument types.

In clause 15, at the end of the text appended to the table
in 7.25#7, further append:

**f32xadd(d, f32x)** any
**f32xaddf***N* or **f32xaddf***N***x** such that *N* > 32 and the suffix type, **_Float***N* or

Apr 2017 meeting

**Committee Discussion**

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum.

**Proposed Technical Corrigendum**

In clause 15, after the change
to 7.25#6, add:

Change 7.25#6a
from:

[6a] The
functions that round result to a narrower type have type-generic macros whose
names are obtained by omitting any suffix from the function names. Thus, the
macros with **f** or **d** prefix are:

**fadd fmul ffma**

**dadd dmul dfma**

**fsub fdiv fsqrt**

**dsub ddiv dsqrt**

and the macros with **d32** or **d64** prefix are:

**d32add d32mul d32fma**

**d64add d64mul d64fma**

**d32sub d32div d32sqrt**

**d64sub d64div d64sqrt**

All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is **f** or **d**, use of an argument of decimal floating type results in undefined behavior. If the macro prefix is **d32** or **d64**, use of an argument of standard floating type results in undefined behavior. The function invoked is determined as follows:

— If any argument has type **_Decimal128**, or if the macro prefix is **d64**, the function invoked has the name of the macro, with a **d128** suffix.

— Otherwise, if the macro prefix is **d32**, the function invoked has the name of the macro, with a **d64** suffix.

— Otherwise, if any argument has type **long double**, or if the macro prefix is **d**, the function invoked has the name of the macro, with an **l** suffix.

— Otherwise, the function invoked has the name of the macro (with no suffix).

to:

[6a] The functions that round result to a narrower type have type-generic macros whose names are obtained by omitting any suffix from the function names. Thus, the macros with **f** or **d** prefix are:

**fadd fmul ffma**

**dadd dmul dfma**

**fsub fdiv fsqrt**

**dsub ddiv dsqrt**

and the macros with **f***M*, **f***M***x**, **d***M*, or **d***M***x** prefix are:

**f***M***add** **f***M***xmul** **d***M***fma**

**f***M***sub** **f***M***xdiv** **d***M***sqrt**

**f***M***mul** **f***M***xfma** **d***M***xadd**

**f***M***div** **f***M***xsqrt** **d***M***xsub**

**f***M***fma** **d***M***add** **d***M***xmul**

**f***M***sqrt** **d***M***sub** **d***M***xdiv**

**f***M***xadd** **d***M***mul** **d***M***xfma**

**f***M***xsub** **d***M***div** **d***M***xsqrt**

All arguments are generic. If any argument is not real, use of the macro results in undefined behavior. If the macro prefix is **f**, **d**, **f***M*, or **f***M***x**, use of an argument of decimal floating type results in undefined behavior. If the macro prefix is **d***M* or **d***M***x**, use of an argument of standard or binary floating type results in undefined behavior. The function invoked is determined as follows:

— Arguments that have integer
type are regarded as having type **_Decimal64** if any
argument has decimal floating type, and as having type **double** otherwise.

— The unsuffixed
name of the function is the name of the macro, and its suffix, if any,
corresponds to the parameter type which may be any type with at least the range
and precision of the argument types.

In clause 15, at the end of the text appended to the table
in 7.25#7, further append:

**f32xadd(d, f32x)** any
**f32xaddf***N* or **f32xaddf***N***x** such that *N* > 32 and the suffix type, **_Float***N* or


DR 13 Prev <— Open —> Next DR 9, or summary at top

**Submitter:** Jim Thomas

**Submission Date:** 2017-03-04

**Source:** WG14

**Reference Document:** N2125

**Subject:** P2: Effect of `%a` vs `%A` conversion specifiers

**Summary**

The specification in TS 18661-2 for **a**,**A** conversion specifiers for decimal describes the behavior in terms of **f** and **e** formatting. The intention was that the **A** conversion specifier would have the effects of **F** and **E** formatting. The following Technical Corrigendum corrects this oversight, using wording similar to that in C11 for the **g**,**G** conversion specifiers.
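For readers unfamiliar with the distinction being carried over, the difference between style **e** and style **E** is visible with the ordinary C `%e` and `%E` conversions on a **double**. The helper below is a small sketch using only standard `snprintf`; the name `format_e_style` is hypothetical, and the example uses binary floating point rather than the decimal `%Ha` conversions the corrigendum addresses.

```c
#include <stdio.h>

/* Formats v in style e (upper == 0) or style E (upper != 0) with the given
   precision, writing at most n bytes to buf. Returns snprintf's result. */
int format_e_style(char *buf, size_t n, double v, int precision, int upper)
{
    return snprintf(buf, n, upper ? "%.*E" : "%.*e", precision, v);
}
```

For the value 1500.0 with precision 2, the two styles produce `1.50e+03` and `1.50E+03`; the corrected bullets make the **A** specifier select the uppercase style in the same way.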

**Suggested Technical Corrigendum**

In 12.5, in the text added to 7.21.6.1#8 and
7.29.2.1#8, under **a**,**A**
conversion specifiers, replace the bullets:

— if −(*n*+5) ≤ *q* ≤ 0, use style **f** formatting with formatting precision equal to −*q*,

— otherwise,
use style **e** formatting with
…

with:

— if −(*n*+5) ≤ *q* ≤ 0, use style **f** (or style **F** in
the case of an **A** conversion specifier) with formatting precision equal to −*q*,

— otherwise,
use style **e** (or style **E** in the case of an **A**
conversion specifier) with …

Apr 2017 meeting

**Committee Discussion**

The committee agrees that this is a defect and accepts the Suggested Technical Corrigendum.

**Proposed Technical Corrigendum**

In 12.5, in the text added to 7.21.6.1#8 and
7.29.2.1#8, under **a**,**A**
conversion specifiers, replace the bullets:

— if −(*n*+5) ≤ *q* ≤ 0, use style **f** formatting with formatting precision equal to −*q*,

— otherwise,
use style **e** formatting with
…

with:

— if −(*n*+5) ≤ *q* ≤ 0, use style **f** (or style **F** in
the case of an **A** conversion specifier) with formatting precision equal to −*q*,

— otherwise,
use style **e** (or style **E** in the case of an **A**
conversion specifier) with …
