ISO/IEC DTR nnnnn

WG14 N1016

Working Draft 2

Programming languages, their environments and system interfaces - Extension for the programming language C to support decimal floating-point arithmetic

Table of Contents

1 Introduction
1.1 Background
1.2 The Arithmetic Model
1.3 The Encodings

2 General
2.1 Scope
2.2 References

3 Decimal floating types

4 Characteristics of decimal floating types <decfloat.h>

5 Conversions
5.1 Conversions between decimal floating and integer
5.2 Conversions among decimal floating types, and between decimal floating types and generic floating types
5.3 Conversions between decimal floating and complex
5.4 Usual arithmetic conversions
5.5 Default argument promotion

6 Constants

7 Floating-point environment <fenv.h>

8 Arithmetic operations
8.1 Operators
8.2 Functions
8.3 Conversions

9 Library
9.1 Decimal mathematics <math.h>
9.2 New functions
9.2.1 divide_integer functions
9.2.2 remainder_near functions
9.2.3 quantize functions
9.2.4 round_to_integer functions
9.2.5 normalize functions
9.3 Formatted input/output specifiers
9.4 strtod32, strtod64, and strtod128 functions <stdlib.h>
9.5 wcstod32, wcstod64, and wcstod128 functions <wchar.h>
9.6 Type-generic macros <tgmath.h>
 
 

1 Introduction

1.1 Background

Most of today's general purpose computing architectures provide binary floating-point arithmetic in hardware. Binary floating-point is an efficient representation which minimizes memory use and which is simpler to implement than floating-point arithmetic using other bases. It has therefore become the norm for scientific computations, with almost all implementations following the IEEE-754 standard for binary floating-point arithmetic.

However, human computation and communication of numeric values almost always uses decimal arithmetic and decimal notations. Laboratory notes, scientific papers, legal documents, business reports and financial statements all record numeric values in decimal form. When numeric data are given to a program or are displayed to a user, binary to-and-from decimal conversion is required. There are inherent rounding errors involved in such conversions; decimal fractions cannot, in general, be represented exactly by binary floating-point values. These errors often cause usability and efficiency problems, depending on the application.
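The representation problem described above can be observed in a few lines of standard C. The following is a minimal sketch, assuming that double is an IEEE 754 binary64 type; the function name is hypothetical and is used only for illustration.

```c
/* Returns 1 if adding 0.1 ten times yields exactly 1.0, and 0 otherwise.
   The volatile qualifier forces each partial sum to be stored as a
   double, so that a wider evaluation format cannot hide the error. */
int tenth_sums_to_one(void)
{
    volatile double sum = 0.0;
    int i;

    for (i = 0; i < 10; i++)
        sum += 0.1;        /* 0.1 has no exact binary representation */

    return sum == 1.0;
}
```

Under the stated assumption the function returns 0: the accumulated sum differs from 1.0 in the last place, although the same computation carried out in decimal is exact.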

These problems are minor when the application domain accepts, or requires results to have, associated error estimates (as is the case with scientific applications). However, in business and financial applications, computations are either required to be exact (with no rounding errors) unless explicitly rounded, or be supported by detailed analyses that are auditable to be correct. Such applications therefore have to take special care in handling any rounding errors introduced by the computations.

The most efficient way to avoid conversion error is to use decimal arithmetic. Currently, the IBM z-architecture (and its predecessors since System/360) is a widely used system that supports built-in decimal arithmetic. This, however, provides integer arithmetic only, meaning that every number and computation has to have separate scale information preserved and computed in order to maintain the required precision and value range. Such scaling is difficult to code and is error-prone; it affects execution time significantly, and the resulting program is often difficult to maintain and enhance.
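The bookkeeping that manual scaling imposes can be illustrated with a small sketch. The structure and function names below are hypothetical, and overflow of the integer field is not handled.

```c
/* A scaled integer: the value represented is units / 10^scale. */
struct scaled {
    long long units;
    int scale;
};

/* Multiplication multiplies the units and adds the scales. */
struct scaled scaled_mul(struct scaled a, struct scaled b)
{
    struct scaled r;

    r.units = a.units * b.units;
    r.scale = a.scale + b.scale;
    return r;
}

/* Addition first brings both operands to a common scale. */
struct scaled scaled_add(struct scaled a, struct scaled b)
{
    struct scaled r;

    while (a.scale < b.scale) { a.units *= 10; a.scale++; }
    while (b.scale < a.scale) { b.units *= 10; b.scale++; }
    r.units = a.units + b.units;
    r.scale = a.scale;
    return r;
}
```

For example, 19.99 x 1.05 is computed as {1999, 2} x {105, 2}, giving {209895, 4}, that is 20.9895. Every operation has to carry and adjust the scale explicitly, which is the burden that a decimal floating type removes.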

Even though the hardware may not provide decimal arithmetic operations, the support can still be emulated by software. Programming languages used for business applications either have native decimal types (such as PL/I, COBOL, C#, or Visual Basic) or provide decimal arithmetic libraries (such as the BigDecimal class in Java). The arithmetic used, nowadays, is almost invariably decimal floating-point; the COBOL 2002 ISO standard, for example, requires that all standard decimal arithmetic calculations use 32-digit decimal floating-point.

At present, all languages use software for decimal arithmetic. Even the best software packages are slow: they can be 100 times slower than a corresponding hardware implementation, and in some cases slower still. At least one processor manufacturer is therefore adding decimal floating-point in hardware.

Arguably, the C language hits a sweet spot within the wide range of programming languages available today – it strikes an optimal balance between usability and performance. Its simple and expressive syntax makes it easy to program; and its close-to-the-hardware semantics makes it efficient. Despite the advent of newer programming languages, C is still often used together with other languages to code the computationally intensive part of an application. In many cases, entire business applications are written in C/C++. To maintain the vitality of C, the need for decimal arithmetic by the business and financial community cannot be ignored.

The importance of this has been recognized by the IEEE. The IEEE 754 standard is currently being revised, and the major change in that revision is the addition of decimal floating-point formats and arithmetic. These decimal data types are almost as efficient as the binary types, and are especially suitable for hardware implementation; it is possible that they will become the most widely used primitive data types once hardware implementations are available.

Historically there has been a close tie between IEEE-754 and C with respect to floating-point specification. With the revised IEEE-754 nearing the final approval stage, it is now the appropriate time for C to consider adding decimal types and arithmetic to its specification.
 
 
 

1.2 The Arithmetic Model

The proposal of this Technical Report is based on a model of decimal arithmetic (2.2.4) which is a formalization of the decimal system of numeration (Algorism) as further defined and constrained by the relevant standards, IEEE-854, ANSI X3-274, and the proposed revision of IEEE-754. The latter is also known as IEEE-754R.

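There are three components to the model: numbers, which represent the values that operations act on and produce; operations, such as addition and multiplication; and context, which holds the parameters and rules that govern the results of operations.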

The model defines these components in the abstract. It neither defines the way in which operations are expressed (which might vary depending on the computer language or other interface being used), nor does it define the concrete representation (specific layout in storage, or in a processor's register, for example) of numbers or context.

From the perspective of the C language, numbers are represented by data types, operations are defined within expressions, and context is the floating environment specified in fenv.h. This Technical Report specifies how the C language implements these components.
 

Note: A description of the arithmetic model can be found in http://www2.hursley.ibm.com/decimal/decarith.html.
 
 

1.3 The Encodings

Based on the arithmetic model, encodings have been proposed to support the general purpose floating-point decimal arithmetic described in the Decimal Arithmetic Specification. The encodings are the product of discussions by a subcommittee of the IEEE committee IEEE-754R which is currently revising the IEEE 754-1985 and IEEE 854-1987 standards.

Note: A description of the encodings can be found in http://www2.hursley.ibm.com/decimal/decbits.html.

C99 specifies floating-point arithmetic using a two-layer organization. The first layer provides a specification using an abstract model. The representation of a floating-point number is specified in an abstract form in which the constituent components of the representation are defined (sign, exponent, significand) but not the internals of these components. In particular, the exponent range, the significand size and the base (or radix) are implementation-defined. This gives an implementation the flexibility to take advantage of its underlying hardware architecture. Furthermore, certain behaviors of operations are also implementation-defined, for example in the handling of special numbers and of exceptions.

The reason for this approach is historical. At the time when C was first standardized, there were already various hardware implementations of floating-point arithmetic in common use. Specifying the exact details of a representation would have made most of the implementations existing at the time non-conforming.

C99 provides a binding to IEEE-754 by specifying Annex F and adopting that standard by reference. An implementation that does not conform to IEEE-754 indicates this by not defining the macro __STDC_IEC_559__. This means that not all implementations need to support IEEE-754, and the floating-point arithmetic need not be binary.
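A program can test for the Annex F binding at translation time. The following is a minimal sketch; the function name is hypothetical.

```c
/* Returns 1 when the implementation defines __STDC_IEC_559__, that is,
   when it claims conformance to C99 Annex F, and 0 otherwise. */
int has_iec_559_binding(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```

Which value is returned depends on the implementation and, for some compilers, on the options in effect.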

This Technical Report specifies decimal floating-point arithmetic according to IEEE-754R, with the constituent components of the representation defined. This is more stringent than the existing C99 approach for the floating types. Since it is expected that all decimal floating-point hardware implementations will conform to the revised IEEE 754, binding to this standard directly benefits both implementers and programmers.
 

2 General

2.1 Scope

This Technical Report specifies an extension to the programming language C, specified by the international standard ISO/IEC 9899:1999. The extension provides support for decimal floating-point arithmetic that is consistent with the specification in IEEE-754R.

This Technical Report does not specify binary floating-point arithmetic.
 

2.2 References

The following standards contain provisions which, through reference in this text, constitute provisions of this Technical Report. For dated references, subsequent amendments to, or revisions of, any of these publications do not apply. However, parties to agreements based on this Technical Report are encouraged to investigate the possibility of applying the most recent editions of the normative documents indicated below. For undated references, the latest edition of the normative document referred to applies. Members of IEC and ISO maintain registers of currently valid International Standards.

2.2.1 ISO/IEC 9899:1999, Information technology - Programming languages, their environments and system software interfaces - Programming Language C.

2.2.1.1 ISO/IEC 9899:1999, Technical Corrigendum 1 to Programming Language C.

2.2.2 ANSI/IEEE 754-1985 - IEEE Standard for Binary Floating-Point Arithmetic. The Institute of Electrical and Electronics Engineers, Inc., New York, 1985.

2.2.2.1 The IEEE 754 revision working group is currently revising the specification for floating-point arithmetic:

ANSI/IEEE 754R - IEEE Standard for Floating-Point Arithmetic. The Institute of Electrical and Electronics Engineers, Inc. Draft.
 

2.2.3 ANSI/IEEE 854-1987 - IEEE Standard for Radix-Independent Floating-Point Arithmetic. The Institute of Electrical and Electronics Engineers, Inc., New York, 1987.

2.2.4 A Decimal Floating-Point Specification, Schwarz, Cowlishaw, Smith, and Webb, in the Proceedings of the 15th IEEE Symposium on Computer Arithmetic (Arith 15), IEEE, June 2001.
 

Note: Reference materials relating to IEEE-754R can be found in http://grouper.ieee.org/groups/754/ and http://www.validlab.com/754R/.
 
 

3 Decimal floating types

This Technical Report introduces three decimal floating types, designated as _Decimal32, _Decimal64 and _Decimal128. The set of values of type _Decimal32 is a subset of the set of values of the type _Decimal64; the set of values of the type _Decimal64 is a subset of the set of values of the type _Decimal128. Support for _Decimal128 is optional.

A single token is used as a type name to make it easy for C++ to implement the types as classes.

Within the type hierarchy, decimal floating types are base types, real types and arithmetic types.

The types float, double and long double are also called generic floating types for the purpose of this Technical Report.

Note: C does not specify a radix for float, double and long double. An implementation can choose the representation of float, double and long double to be the same as the decimal floating types. In any case, the decimal floating types are distinct from float, double and long double regardless of the representation.

Note: This Technical Report does not define decimal complex types. The three complex types remain float _Complex, double _Complex and long double _Complex.

The following are suggested changes to C99:

Change the first sentence of 6.2.5#10.

[10] There are three generic floating types, designated as float, double and long double.

Add the following paragraphs after 6.2.5#10.

[10a] There are three decimal floating types, designated as _Decimal32, _Decimal64 and _Decimal128. The set of values of the type _Decimal32 is a subset of the set of values of the type _Decimal64; the set of values of the type _Decimal64 is a subset of the set of values of the type _Decimal128. Support for _Decimal128 is optional. Decimal floating types are real floating types.

[10b] The generic floating types and decimal floating types are real floating types.

Add the following to 6.7.2 Type specifiers:

type-specifier:

_Decimal32
_Decimal64
_Decimal128

 

4 Characteristics of decimal floating types <decfloat.h>

The header <float.h> defines characteristics of non-decimal floating types. The contents remain unchanged by this Technical Report.

The characteristics of decimal floating types are defined in terms of a model specifying general decimal arithmetic (refer to 1.2). The encodings are specified in IEEE-754R (refer to 1.3).

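The three decimal encoding formats defined in IEEE-754R correspond to the three decimal floating types as follows: _Decimal32 is a decimal32 number, _Decimal64 is a decimal64 number, and _Decimal128 is a decimal128 number.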

The finite numbers are defined by a sign, an exponent (which is a power of ten), and a decimal integer coefficient. The value of a finite number is given by (-1)^sign x coefficient x 10^exponent. Refer to IEEE-754R for details of the format.
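The sign, coefficient and exponent model can be written down directly as a data structure. The following is a minimal sketch, not the IEEE-754R encoding: the structure and function names are hypothetical, and the comparison assumes that aligning the coefficients does not overflow.

```c
/* A finite decimal number: value = (-1)^sign x coefficient x 10^exponent. */
struct dec_finite {
    int sign;                        /* 0 for positive, 1 for negative */
    unsigned long long coefficient;
    int exponent;
};

/* Returns 1 if a and b denote the same numerical value. */
int dec_same_value(struct dec_finite a, struct dec_finite b)
{
    /* Align both operands to the smaller exponent. */
    while (a.exponent > b.exponent) { a.coefficient *= 10; a.exponent--; }
    while (b.exponent > a.exponent) { b.coefficient *= 10; b.exponent--; }

    if (a.coefficient == 0 && b.coefficient == 0)
        return 1;                    /* +0 and -0 are equal in value */
    return a.sign == b.sign && a.coefficient == b.coefficient;
}

/* Returns 1 only if sign, coefficient and exponent all match. */
int dec_same_representation(struct dec_finite a, struct dec_finite b)
{
    return a.sign == b.sign
        && a.coefficient == b.coefficient
        && a.exponent == b.exponent;
}
```

For example, {0, 120, -2} and {0, 12, -1} both have the value 1.2 but are distinct representations: the first records 1.20 and the second 1.2.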

These formats are characterized by the length of the coefficient, and the maximum and minimum exponent. The table below shows these characteristics by format:
 
 

Format                         decimal32   decimal64   decimal128
Coefficient length in digits   7           16          34
Maximum Exponent (Emax)        96          384         6144
Minimum Exponent (Emin)        -95         -383        -6143

 

The new header <decfloat.h> defines several macros that expand to various limits and parameters of the decimal floating types. These macros have names and meanings similar to those of the corresponding macros in <float.h>.

Suggested change to C99.

Add the following after 5.2.4.2.2:

5.2.4.2.2a Characteristics of decimal floating types <decfloat.h>

[1] The characteristics of decimal floating types are defined in terms of the format described in IEEE-754R. The finite numbers are defined by a sign, an exponent (which is a power of ten), and a decimal integer coefficient. The value of a finite number is given by (-1)^sign x coefficient x 10^exponent. The macros defined in <decfloat.h> provide the characteristics of these representations, which are defined in the Decimal Arithmetic Encoding. The prefixes DEC32_, DEC64_, and DEC128_ are used to denote the types _Decimal32, _Decimal64, and _Decimal128 respectively.

[2] Except for assignment and casts, the values of operations with decimal floating operands and values subject to the usual arithmetic conversions and of decimal floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of DEC_EVAL_METHOD:

-1 indeterminable;
 0 evaluate all operations and constants just to the range and precision of the type;
 1 evaluate operations and constants of type _Decimal32 and _Decimal64 to the range and precision of the _Decimal64 type, evaluate _Decimal128 operations and constants to the range and precision of the _Decimal128 type;
 2 evaluate all operations and constants to the range and precision of the _Decimal128 type.
All other negative values for DEC_EVAL_METHOD characterize implementation-defined behavior.

[3] The values given in the following list shall be replaced by constant expressions suitable for use in #if preprocessing directives:

5 Conversions

5.1 Conversions between decimal floating and integer

For conversions between real floating and integer types, C99 6.3.1.4 leaves the behavior undefined if the conversion result cannot be represented. (Annex F.4 tightens the behavior.) To help in writing portable code, this Technical Report provides defined behavior whenever possible. Furthermore, it is useful to allow program execution to continue without interruption unless the program needs to check the condition.

When the new type is a decimal floating type, there are three choices for the result: the most positive/negative number representable, positive/negative infinity, and quiet NaN. The first provides no indication to the program that something exceptional has happened. The second provides an indication, but other operations that produce infinity also raise signals, so a signal would need to be raised here for consistency; in the interest of performance, however, interrupting the program is not desirable. The third allows the program to continue while providing a way for the implementation to encode the condition. This is slightly better than the second choice.

When the new type is an unsigned integral type, the values that create problems are those less than 0 and those greater than Utype_MAX. There is no overflow/underflow processing for unsigned arithmetic. A possible choice for the result would be Utype_MAX. Also, common existing implementations do not raise signals for signed integer arithmetic. When the new type is a signed integral type, the values that create problems are those less than type_MIN and those greater than type_MAX. The result here could be type_MIN or type_MAX depending on whether the original value is negative or positive.

To make the behavior consistent among all real floating types, the suggested changes below apply to all real floating types, not just decimal floating types.
 

Suggested change to C99.

Change the last sentence of 6.3.1.4 paragraph 1 to:

[1] ... If the value of the integral part cannot be represented by the integer type, the result is the largest representable number if the type is unsigned, and the most negative or positive number according to the sign of the floating point number if the type is signed.

Change the last sentence of 6.3.1.4 paragraph 2 to:

[2] ... If the value being converted is outside the range of values that can be represented, the result is quiet NaN.
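The suggested rule for conversion from a real floating type to an integer type can be emulated in standard C. The following is a minimal sketch for double, int and unsigned int; the function names are hypothetical. It assumes that int and unsigned int are narrow enough for their limits to be exactly representable in double, and the result for NaN is an assumption, since the rule above does not address it.

```c
#include <limits.h>

/* Converts x to int, saturating instead of invoking undefined behavior. */
int sat_to_int(double x)
{
    if (x != x)
        return 0;                 /* NaN: assumption, not covered by the rule */
    if (x >= (double)INT_MAX + 1.0)
        return INT_MAX;           /* integral part too large */
    if (x <= (double)INT_MIN - 1.0)
        return INT_MIN;           /* integral part too small */
    return (int)x;                /* in range: truncate toward zero */
}

/* Converts x to unsigned int; any unrepresentable value gives UINT_MAX. */
unsigned int sat_to_uint(double x)
{
    if (x != x)
        return 0;                 /* NaN: assumption, not covered by the rule */
    if (x <= -1.0 || x >= (double)UINT_MAX + 1.0)
        return UINT_MAX;
    return (unsigned int)x;       /* values in (-1, 0) truncate to 0 */
}
```

Note that a negative value such as -5.0 yields UINT_MAX under the suggested wording, because the result for an unsigned type is the largest representable number whenever the integral part cannot be represented.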
 
 

5.2 Conversions among decimal floating types, and between decimal floating types and generic floating types

The specification is similar to the existing ones for float, double and long double, except that when the result cannot be represented exactly, the behavior is tightened to become correctly rounded.
 

The following are suggested changes to C99:

Add after 6.3.1.5#2.

[3] When a _Decimal32 is promoted to _Decimal64 or _Decimal128, or a _Decimal64 is promoted to _Decimal128, its value is unchanged.

[4] When a _Decimal64 is demoted to _Decimal32, a _Decimal128 is demoted to _Decimal64 or _Decimal32, or conversion is performed among decimal and generic floating types other than the above, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is correctly rounded. If the value being converted is outside the range of values that can be represented, the result is dependent on the rounding mode. If the rounding mode is:

near, the absolute value of the result is one of HUGE_VAL, HUGE_VALF, HUGE_VALL, HUGE_VAL_D32, HUGE_VAL_D64 or HUGE_VAL_D128 depending on the result type, and the sign is the same as that of the value being converted.

zero, the value is the most positive number representable if the value being converted is positive, and the most negative number representable otherwise.

positive infinity, the value is the same as for zero if the value being converted is negative, and the same as for near otherwise.

negative infinity, the value is the same as for near if the value being converted is negative, and the same as for zero otherwise.
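The dependence of the out-of-range result on the rounding mode can be summarized in a small decision function. The following is a minimal sketch for a result of type double, using HUGE_VAL and DBL_MAX in place of the decimal macros; the enumeration and function names are hypothetical.

```c
#include <float.h>
#include <math.h>

enum round_mode { ROUND_NEAR, ROUND_ZERO, ROUND_POS_INF, ROUND_NEG_INF };

/* Result of converting an out-of-range value to double.
   negative is nonzero when the value being converted is negative. */
double out_of_range_result(int negative, enum round_mode mode)
{
    int to_huge;   /* nonzero: the result has magnitude HUGE_VAL */

    switch (mode) {
    case ROUND_NEAR:    to_huge = 1;         break;
    case ROUND_ZERO:    to_huge = 0;         break;
    case ROUND_POS_INF: to_huge = !negative; break;
    default:            to_huge = negative;  break;   /* ROUND_NEG_INF */
    }

    if (to_huge)
        return negative ? -HUGE_VAL : HUGE_VAL;
    return negative ? -DBL_MAX : DBL_MAX;
}
```

For example, a large positive value converted while rounding toward negative infinity gives DBL_MAX, whereas a large negative value in the same mode gives -HUGE_VAL.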
 
 

5.3 Conversions between decimal floating and complex

When a value of decimal floating type is converted to a complex type, the real part of the complex result value is determined by the rules of conversion in 5.2 and the imaginary part of the complex result value is zero.

This is covered by C99 6.3.1.7.

5.4 Usual arithmetic conversions

In a business application that is written using decimal arithmetic, mixed operations between decimal and other real types might not occur frequently. Situations where this might occur are when interfacing with other languages, calling an existing library written in binary floating-point arithmetic, or accessing existing data. The programmer may want to use an explicit cast to control the behavior in such cases to make the code maximally portable. One way to handle usual arithmetic conversion is therefore to disallow mixed operations. The disadvantage of this approach is usability - for example, it could be tedious to add explicit casts in assignments and in function calls when the compiler can correctly handle such situations. A variation of this is to allow it only in simple assignments and in argument passing.

One major difficulty of allowing mixed operation is in the determination of the common type. C99 does not specify exactly the range and precision of the generic real types. The pecking order between them and the decimal types is therefore unspecified. Given two (or more) mixed type operands, there is no simple rule to define a common type that would guarantee portability in general.

For example, we can define the common type to be the one with greater range (the suggested change below). But since a double type may have a different range under different implementations, a program cannot assume the resulting type of an addition, say, involving both _Decimal64 and double. This imposes limitations on how to write portable programs.

If the generic real type is a type defined in IEEE-754R, and if we use the greater-range rule, the common type is easily determined. When mixing decimal and binary types of the same size, the decimal type is the common type. When mixing types of different sizes, the common type is the one with the larger size. The suggested change below uses this approach but does not assume that the generic real type follows IEEE-754R. This guarantees consistent behavior among implementations that use IEEE-754 for their binary floating-point arithmetic, and at the same time provides reasonable behavior for those that do not. Annex C presents an alternate suggestion that disallows mixed operands.
 

Following are suggested changes to C99.

Insert the following to 6.3.1.8#1, after "This pattern is called the usual arithmetic conversions:"

6.3.1.8[1]

... This pattern is called the usual arithmetic conversions:

If one operand is a decimal floating type and there are no complex types in the operands:

If one operand is a decimal floating type and the other is a generic floating type, the one with a smaller value range is converted to the other.

Otherwise, if either operand is _Decimal128, the other operand is converted to _Decimal128.

Otherwise, if either operand is _Decimal64, the other operand is converted to _Decimal64.

Otherwise, if either operand is _Decimal32, the other operand is converted to _Decimal32.


If one operand is a decimal floating type and the other is a complex type, the decimal floating type is converted to the first type in the following list that can represent the value range: float, double, long double. It is converted to long double if no type in the list can represent its value range. In either case, the complex type is converted to a type whose corresponding real type is this converted type. The usual arithmetic conversions are then applied to the converted operands.

During any of the above conversions, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is correctly rounded. If the value being converted is outside the range of values that can be represented, the result is dependent on the rounding mode. If the rounding mode is:

near, the absolute value of the result is one of HUGE_VAL, HUGE_VALF, HUGE_VALL, HUGE_VAL_D32, HUGE_VAL_D64 or HUGE_VAL_D128 depending on the result type, and the sign is the same as that of the value being converted.

zero, the value is the most positive number representable if the value being converted is positive, and the most negative number representable otherwise.

positive infinity, the value is the same as for zero if the value being converted is negative, and the same as for near otherwise.

negative infinity, the value is the same as for near if the value being converted is negative, and the same as for zero otherwise.


If there is no decimal floating type in the operands:
 

First, if the corresponding real type of either operand is long double, the other operand is converted, ... <the rest of 6.3.1.8#1 remains the same>
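The greater-range rule for real (non-complex) operands can be modeled as a function over type tags. The following is a minimal sketch under several assumptions: the enumeration and function names are hypothetical, the value range is measured by the largest decimal exponent (the <float.h> *_MAX_10_EXP macros for the generic types and Emax from section 4 for the decimal types), and a tie in range is resolved in favor of the decimal type, which the suggested wording does not specify.

```c
#include <float.h>

/* Each group is listed in order of increasing rank. */
enum real_type { T_FLOAT, T_DOUBLE, T_LONG_DOUBLE, T_DEC32, T_DEC64, T_DEC128 };

static int is_decimal(enum real_type t)
{
    return t >= T_DEC32;
}

/* Largest power of ten in range, used as the measure of value range. */
static int max_10_exp(enum real_type t)
{
    switch (t) {
    case T_FLOAT:       return FLT_MAX_10_EXP;
    case T_DOUBLE:      return DBL_MAX_10_EXP;
    case T_LONG_DOUBLE: return LDBL_MAX_10_EXP;
    case T_DEC32:       return 96;
    case T_DEC64:       return 384;
    default:            return 6144;          /* T_DEC128 */
    }
}

/* Common type of two real operands under the greater-range rule. */
enum real_type common_real_type(enum real_type a, enum real_type b)
{
    if (is_decimal(a) != is_decimal(b)) {
        enum real_type dec = is_decimal(a) ? a : b;
        enum real_type gen = is_decimal(a) ? b : a;
        /* Mixed operands: the smaller range is converted to the larger. */
        return max_10_exp(gen) > max_10_exp(dec) ? gen : dec;
    }
    return a > b ? a : b;   /* same kind: the higher-ranked type wins */
}
```

On an implementation whose double is IEEE 754 binary64, _Decimal64 mixed with double gives _Decimal64, while _Decimal32 mixed with double gives double. On an implementation with a different double the answers may differ, which is the portability concern raised in this section.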

5.5 Default argument promotion

There is no default argument promotion for the decimal floating types.
 

6 Constants

New suffixes are added to denote decimal floating-point constants: DF for _Decimal32, DD for _Decimal64, and DL for _Decimal128. It is a constraint violation to use df, dd, dl, DF, DD or DL in a hexadecimal-floating-constant.
 

Suggested change to C99.

Add the following to 6.4.4.2 floating-suffix.
 

floating-suffix: one of
f l F L df dd dl DF DD DL


Add the following paragraph after 6.4.4.2#2:

6.4.4.2
...
[2a]
Constraints
The suffixes df, dd, dl, DF, DD and DL shall not be used in a hexadecimal-floating-constant.

Add the following paragraph after 6.4.4.2#4:

6.4.4.2
...
[4a] If a floating constant is suffixed by df or DF, it has type _Decimal32. If suffixed by dd or DD, it has type _Decimal64. If suffixed by dl or DL, it has type _Decimal128.
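The mapping from suffix to type in [4a] can be expressed as a lookup, as a translator might do when classifying a floating constant. The following is a minimal sketch; the function name is hypothetical, and it treats only the six spellings listed in the floating-suffix grammar as decimal suffixes.

```c
#include <string.h>

/* Maps a decimal floating-suffix to the name of the type it selects.
   Returns NULL for any other suffix, including mixed case such as "dF",
   which does not appear in the floating-suffix list. */
const char *decimal_suffix_type(const char *suffix)
{
    if (strcmp(suffix, "df") == 0 || strcmp(suffix, "DF") == 0)
        return "_Decimal32";
    if (strcmp(suffix, "dd") == 0 || strcmp(suffix, "DD") == 0)
        return "_Decimal64";
    if (strcmp(suffix, "dl") == 0 || strcmp(suffix, "DL") == 0)
        return "_Decimal128";
    return NULL;
}
```

With these suffixes a constant such as 19.99DD would have type _Decimal64, so its value is held exactly rather than as the nearest binary approximation.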
 
 

7 Floating-point environment <fenv.h>

The floating-point environment specified in C99 7.6 applies to decimal floating types. This is to implement the context defined in IEEE 754R. The existing C99 specification gives an implementation flexibility over which parts of the environment are accessible to programs. Decimal floating-point arithmetic specifies a more stringent requirement: all the rounding modes and flags are supported.

Suggested change to C99.

Add the following after 7.6#7:

7.6
...
[7a] Each of the macros
 

FE_DEC_ROUND_DOWN
FE_DEC_ROUND_HALF_UP
FE_DEC_ROUND_HALF_EVEN
FE_DEC_ROUND_CEILING
FE_DEC_ROUND_FLOOR


are defined and used by the fegetround and fesetround functions for getting and setting the rounding mode of decimal floating-point operations.

[7b] Each of the macros

FE_DEC_ROUND_HALF_DOWN
FE_DEC_ROUND_UP


are defined and used by the fegetround and fesetround functions if and only if the implementation supports the optional rounding modes round-half-down and round-up.

Add the following paragraph after 7.6#5.

7.6
...
[5a] Each of the macros

FE_DEC_DIVISION_BY_ZERO
FE_DEC_INEXACT
FE_DEC_INVALID_OPERATION
FE_DEC_OVERFLOW
FE_DEC_UNDERFLOW


are defined and used by functions defined in C99 7.6.2.
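The effect of the seven rounding modes named by the FE_DEC_ROUND_ macros can be illustrated on an integer coefficient from which trailing digits are dropped. The following is a minimal sketch: the enumeration and function names are hypothetical, and the meaning given to each mode is an assumption based on the macro names and on the decimal arithmetic model referenced in 1.2, not wording taken from this Technical Report.

```c
enum dec_round {
    ROUND_DOWN,        /* toward zero */
    ROUND_HALF_UP,     /* to nearest, ties away from zero */
    ROUND_HALF_EVEN,   /* to nearest, ties to even */
    ROUND_CEILING,     /* toward positive infinity */
    ROUND_FLOOR,       /* toward negative infinity */
    ROUND_HALF_DOWN,   /* to nearest, ties toward zero */
    ROUND_UP           /* away from zero */
};

/* Drops the last `digits` decimal digits of value, rounding per mode. */
long long round_drop_digits(long long value, int digits, enum dec_round mode)
{
    long long pow10 = 1, q, r, twice;
    int i, neg, bump = 0;          /* bump: move one unit away from zero */

    for (i = 0; i < digits; i++)
        pow10 *= 10;
    q = value / pow10;             /* C99 division truncates toward zero */
    r = value % pow10;             /* remainder has the sign of value */
    if (r == 0)
        return q;                  /* exact: no rounding needed */

    neg = value < 0;
    twice = 2 * (neg ? -r : r);    /* twice the discarded magnitude */

    switch (mode) {
    case ROUND_DOWN:      bump = 0;              break;
    case ROUND_UP:        bump = 1;              break;
    case ROUND_CEILING:   bump = !neg;           break;
    case ROUND_FLOOR:     bump = neg;            break;
    case ROUND_HALF_UP:   bump = twice >= pow10; break;
    case ROUND_HALF_DOWN: bump = twice > pow10;  break;
    case ROUND_HALF_EVEN:
        bump = twice > pow10 || (twice == pow10 && q % 2 != 0);
        break;
    }
    if (bump)
        q += neg ? -1 : 1;
    return q;
}
```

For example, dropping one digit from 25 gives 2 under half-even and half-down but 3 under half-up, and dropping one digit from -21 gives -3 under floor but -2 under ceiling.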
 
 

8 Arithmetic Operations

 

8.1 Operators

The operators Add (C99 6.5.6), Subtract (C99 6.5.6), Multiply (C99 6.5.5), Divide (C99 6.5.5), Relational operators (C99 6.5.8), Equality operators (C99 6.5.9), and Unary Arithmetic operators (C99 6.5.3.3) when applied to decimal floating type operands shall follow the semantics as defined in IEEE 754R.
 

8.2 Functions


Square root, min, max, fused multiply-add and remainder are implemented as library functions. Refer to section 9 below.
 

8.3 Conversions


Conversions between different formats and to integer formats are covered under section 5.
 
 

9 Library

9.1 Decimal mathematics <math.h>

The elementary functions specified in the mathematics library are extended to handle decimal floating-point types. These include the functions specified in 7.12.4, 7.12.5, 7.12.6, 7.12.7, 7.12.8, 7.12.9, 7.12.10, 7.12.11, 7.12.12, and 7.12.13. The macros DEC32_HUGE, DEC64_HUGE, DEC128_HUGE and DEC_NAN are defined to help in using these functions. With the exception of sqrt, max, and min, the accuracy of the decimal floating-point results is implementation-defined. The implementation may state that the accuracy is unknown. All classification macros specified in C99 7.12.3 are also extended to handle decimal floating-point types. The same applies to all comparison macros specified in 7.12.14.

The names of the functions are derived by adding the suffixes d32, d64 and d128 to the double version of the function name.
 

Suggested change to C99:
 

Add at the end of 7.12 paragraph 3 the following macros.

7.12

[3]  ...

DEC32_HUGE
DEC64_HUGE
DEC128_HUGE


expand to constant expressions of type _Decimal32, _Decimal64 and _Decimal128, respectively, representing infinity.
 

Add at the end of 7.12 paragraph 5 the following macro.

7.12

[5]  ...

DEC_NAN


expands to a constant expression of type _Decimal32 representing a quiet NaN.
 

9.2 New functions

The following are new functions added to math.h.
 

9.2.1 divide_integer functions

Suggested addition to C99:
 
7.12.10.4 The divide integer functions

Synopsis

#include <math.h>
_Decimal32  divide_integerd32 (_Decimal32  x, _Decimal32 y);
_Decimal64  divide_integerd64 (_Decimal64  x, _Decimal64 y);
_Decimal128 divide_integerd128(_Decimal128 x, _Decimal128 y);
Description

The divide_integer functions perform the divide-integer operation as defined in IEEE 754R.


 

9.2.2 remainder_near functions


Suggested addition to C99:
 

7.12.10.5 The remainder near functions

Synopsis

#include <math.h>
_Decimal32  remainder_neard32 (_Decimal32 x, _Decimal32 y);
_Decimal64  remainder_neard64 (_Decimal64 x, _Decimal64 y);
_Decimal128 remainder_neard128(_Decimal128 x, _Decimal128 y);


Description

The remainder_near functions perform the remainder-near operation as defined in IEEE 754R.
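This Technical Report defers the definition of both operations to IEEE 754R. As an illustration only, the following sketch applies them to integers. It assumes that divide-integer yields the integer part of the quotient, and that remainder-near yields x - y x n, where n is the integer nearest to x / y with ties going to the even integer, as in the decimal arithmetic model referenced in 1.2. The function names are hypothetical, and y is assumed to be nonzero.

```c
/* Integer part of x / y; C99 integer division truncates toward zero. */
long long divide_integer_ll(long long x, long long y)
{
    return x / y;
}

/* x - y*n, where n is the integer nearest to x/y (ties to even). */
long long remainder_near_ll(long long x, long long y)
{
    long long q = x / y;             /* truncated quotient */
    long long r = x % y;             /* remainder, with the sign of x */
    long long ar = r < 0 ? -r : r;
    long long ay = y < 0 ? -y : y;

    /* If the truncated quotient is not the nearest integer, n lies one
       step further from zero, which shifts the remainder by y. */
    if (2 * ar > ay || (2 * ar == ay && q % 2 != 0)) {
        if ((r < 0) == (y < 0))
            r -= y;
        else
            r += y;
    }
    return r;
}
```

For example, with x = 10 and y = 6 the divide-integer result is 1, while remainder-near gives -2 because the nearest integer to 10/6 is 2. This differs from the % operator, which gives 4.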

9.2.3 quantize functions

Suggested addition to C99:
 
7.12.11.5 The quantize functions

Synopsis

#include <math.h>
_Decimal32  quantized32 (_Decimal32 x,  _Decimal32 y);
_Decimal64  quantized64 (_Decimal64 x,  _Decimal64 y);
_Decimal128 quantized128(_Decimal128 x, _Decimal128 y);

_Bool  check_quantum32  (_Decimal32 x,  _Decimal32 y);
_Bool  check_quantum64  (_Decimal64 x,  _Decimal64 y);
_Bool  check_quantum128 (_Decimal128 x, _Decimal128 y);


Description

The quantize functions perform the quantize operation as defined in IEEE 754R.
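The quantize operation itself is defined in IEEE 754R. As an illustration, the following sketch works on a coefficient and exponent pair. It assumes that quantize returns the value of x expressed with the exponent of y, rounding the coefficient when digits must be dropped; round-half-even is used here as an assumed rounding mode. The names are hypothetical, and coefficient overflow is not handled.

```c
/* A decimal value: coeff x 10^exp, with the sign carried by coeff. */
struct dec_number {
    long long coeff;
    int exp;
};

/* Re-expresses x with exponent target_exp (the exponent of y). */
struct dec_number quantize_sketch(struct dec_number x, int target_exp)
{
    long long pow10 = 1, q, r, twice;

    /* A smaller target exponent needs more digits: pad with zeros. */
    while (x.exp > target_exp) { x.coeff *= 10; x.exp--; }

    /* A larger target exponent needs fewer digits: drop and round. */
    while (x.exp < target_exp) { pow10 *= 10; x.exp++; }
    if (pow10 > 1) {
        q = x.coeff / pow10;
        r = x.coeff % pow10;
        twice = 2 * (r < 0 ? -r : r);
        if (twice > pow10 || (twice == pow10 && q % 2 != 0))
            q += x.coeff < 0 ? -1 : 1;    /* round half to even */
        x.coeff = q;
    }
    return x;
}
```

For example, quantizing 2.17 (that is, {217, -2}) to the exponent -1 gives {22, -1}, and quantizing 2 to the exponent -2 gives {200, -2}, which records the value as 2.00.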

9.2.4 round_to_integer functions


Suggested addition to C99:
 

7.12.11.6 The round to integer functions

Synopsis
 

#include <math.h>
_Decimal32  round_to_integerd32 (_Decimal32 x,  _Decimal32 y);
_Decimal64  round_to_integerd64 (_Decimal64 x,  _Decimal64 y);
_Decimal128 round_to_integerd128(_Decimal128 x, _Decimal128 y);


Description

The round_to_integer functions perform the round-to-integer operation as defined in IEEE 754R.

9.2.5 normalize functions

Suggested addition to C99:
 
7.12.15 The normalize functions

Synopsis

#include <math.h>
_Decimal32  normalized32  (_Decimal32 x);
_Decimal64  normalized64  (_Decimal64 x);
_Decimal128 normalized128 (_Decimal128 x);


Description

The normalize functions perform the normalize operation as defined in IEEE 754R.
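As with the other new functions, the normalize operation is defined in IEEE 754R. The following sketch assumes that normalize reduces a number to its simplest form by removing trailing zeros from the coefficient and adjusting the exponent, and that a zero result has exponent 0, as in the decimal arithmetic model referenced in 1.2. The names are hypothetical.

```c
/* A decimal value: coeff x 10^exp, with the sign carried by coeff. */
struct dec_value {
    long long coeff;
    int exp;
};

/* Removes trailing zeros from the coefficient without changing the value. */
struct dec_value normalize_sketch(struct dec_value x)
{
    if (x.coeff == 0) {
        x.exp = 0;                 /* assumed canonical form of zero */
        return x;
    }
    while (x.coeff % 10 == 0) {
        x.coeff /= 10;
        x.exp++;
    }
    return x;
}
```

For example, 12.00 (that is, {1200, -2}) normalizes to {12, 0}, and -50 x 10^1 normalizes to {-5, 2}.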


 

9.3 Formatted input/output specifiers

The modifier D can be appended to f, F, e, E, g, and G to form output specifiers that indicate the argument is _Decimal64. The modifier LD can be appended to f, F, e, E, g, and G to form output specifiers that indicate the argument is _Decimal128.

Similarly, the modifiers D and LD can be appended to f, F, e, E, g, and G to form input specifiers that indicate the argument is a pointer to _Decimal64 or _Decimal128 respectively. In addition, the modifier HD can be appended to f, F, e, E, g, and G to form input specifiers that indicate the argument is a pointer to _Decimal32.
 

9.4 strtod32, strtod64, and strtod128 functions <stdlib.h>

These functions have specifications similar to those of strtod, strtof, and strtold as defined in C99 7.20.1.3; refer to annex A for the suggested description text. These functions are declared in <stdlib.h> with the following synopsis.

Synopsis

#include <stdlib.h>

_Decimal32  strtod32 (const char * restrict nptr, char ** restrict endptr);
_Decimal64  strtod64 (const char * restrict nptr, char ** restrict endptr);
_Decimal128 strtod128(const char * restrict nptr, char ** restrict endptr);
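
Because the proposed functions share the calling convention of strtod, the intended use of nptr and endptr can be sketched with the existing C99 function as a stand-in. The names parse_result and parse_number are hypothetical. The sketch assumes that strtod64 would be called in the same way, with _Decimal64 in place of double.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical result record: converted value and characters consumed. */
typedef struct {
    double value;     /* stand-in for _Decimal64 */
    size_t consumed;  /* 0 means no conversion was performed */
} parse_result;

/* Convert the initial portion of s, using strtod as a stand-in for the
   proposed strtod64. endptr reports where the subject sequence ended. */
static parse_result parse_number(const char *s)
{
    char *end;
    parse_result r;

    assert(s != NULL);
    r.value = strtod(s, &end);
    r.consumed = (size_t)(end - s);   /* includes leading white space */
    return r;
}
```

For the input "  12.50 USD" the consumed count covers the two leading spaces and the five characters of the number; for "abc" no conversion is performed, endptr equals nptr, and the count is zero.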


 

9.5 wcstod32, wcstod64, and wcstod128 functions <wchar.h>

These functions have specifications similar to those of wcstod, wcstof, and wcstold as defined in C99 7.24.4.1.1; refer to annex B for the suggested description text. They are declared in <wchar.h> with the following synopsis.

Synopsis

#include <wchar.h>

_Decimal32  wcstod32 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal64  wcstod64 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal128 wcstod128(const wchar_t * restrict nptr, wchar_t ** restrict endptr);

9.6 Type-generic macros <tgmath.h>

All new functions added to math.h are subject to the same requirement as specified in C99 7.22 to provide support for type-generic macro expansion. When one of the arguments has a decimal floating type, use of the type-generic macro invokes a function whose parameters have the types determined as follows:
If there is more than one argument, the usual arithmetic conversions are applied so that both arguments have compatible types. Then,

 

Annex A


Below is the suggested text for strtod32, strtod64, and strtod128, copied from C99 7.20.1.3 with editing. Editing is indicated by strikethrough (delete) and underline (change, new). Refer also to the handling of Signalling NaNs suggested by WG14 paper N1011.
 

7.20.1.5 The strtod32, strtod64, and strtod128 functions

Synopsis

[#1]

    #include <stdlib.h>
    _Decimal32  strtod32 (const char * restrict nptr, char ** restrict endptr);
    _Decimal64  strtod64 (const char * restrict nptr, char ** restrict endptr);
    _Decimal128 strtod128(const char * restrict nptr, char ** restrict endptr);

Description

[#2] The strtod32, strtod64, and strtod128 functions convert the initial portion of the string pointed to by nptr to _Decimal32, _Decimal64, and _Decimal128 representation, respectively. First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling a floating-point constant or representing an infinity or NaN; and a final string of one or more unrecognized characters, including the terminating null character of the input string. Then, they attempt to convert the subject sequence to a floating-point number, and return the result.

[#3] The expected form of the subject sequence is an optional plus or minus sign, then one of the following:

                      digit
                      non-digit
                      n-char-sequence digit
                      n-char-sequence non-digit
 

The length of the n-char-sequence shall be shorter than D32_COEFF_DIG, D64_COEFF_DIG or D128_COEFF_DIG respectively depending on the return type. The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is not of the expected form.

[#4] If the subject sequence has the expected form for a floating-point number, the sequence of characters starting with the first digit or the decimal-point character (whichever occurs first) is interpreted as a floating constant according to the rules of 6.4.4.2, except that it is not a hexadecimal floating number, that the decimal-point character is used in place of a period, and that if neither an exponent part nor a decimal-point character appears in a decimal floating point number, or if a binary exponent part does not appear in a hexadecimal floating point number, an exponent part of the appropriate type with value zero is assumed to follow the last digit in the string. If the subject sequence begins with a minus sign, the sequence is interpreted as negated. note1)  A character sequence INF or INFINITY is interpreted as an infinity, if representable in the return type, else like a floating constant that is too large for the range of the return type. A character sequence NAN or NAN(n-char-sequence-opt), or SNAN or SNAN(n-char-sequence-opt), is interpreted as a quiet NaN or signalling NaN respectively; the meaning of the n-char sequences is implementation-defined. note2)  A pointer to the final string is stored in the object pointed to by endptr, provided that endptr is not a null pointer.

[#5] The value resulting from the conversion is correctly rounded.

[#6]  In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.

[#7] If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided  that endptr is not a null pointer.
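
The decomposition described in the paragraphs above can be sketched in portable C as a classifier that skips white space and an optional sign and then reports which kind of subject sequence follows. The names subj_kind, has_prefix_nocase, and classify_subject are hypothetical. The sketch assumes the "C" locale (a period as the decimal-point character) and assumes that INF, NAN, and SNAN are matched without regard to case, as C99 does for INF and NAN.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification of the subject sequence. */
typedef enum {
    SUBJ_NONE,    /* empty or not of the expected form */
    SUBJ_NUMBER,  /* decimal floating-point digits */
    SUBJ_INF,     /* INF or INFINITY */
    SUBJ_QNAN,    /* NAN or NAN(n-char-sequence) */
    SUBJ_SNAN     /* SNAN or SNAN(n-char-sequence) */
} subj_kind;

/* Return 1 if s starts with upper_prefix, ignoring the case of s. */
static int has_prefix_nocase(const char *s, const char *upper_prefix)
{
    while (*upper_prefix != '\0') {
        if (toupper((unsigned char)*s) != *upper_prefix)
            return 0;   /* also stops at the end of s */
        s++;
        upper_prefix++;
    }
    return 1;
}

/* Classify the subject sequence at the start of s. */
static subj_kind classify_subject(const char *s)
{
    assert(s != NULL);

    while (isspace((unsigned char)*s))   /* initial white space */
        s++;
    if (*s == '+' || *s == '-')          /* optional sign */
        s++;

    if (has_prefix_nocase(s, "INF"))     /* INFINITY also begins with INF */
        return SUBJ_INF;
    if (has_prefix_nocase(s, "SNAN"))
        return SUBJ_SNAN;
    if (has_prefix_nocase(s, "NAN"))
        return SUBJ_QNAN;
    if (isdigit((unsigned char)*s))
        return SUBJ_NUMBER;
    if (*s == '.' && isdigit((unsigned char)s[1]))
        return SUBJ_NUMBER;

    return SUBJ_NONE;                    /* no conversion would be performed */
}
```

A result of SUBJ_NONE corresponds to the case in which no conversion is performed and the value of nptr is stored through endptr. The sketch only classifies the start of the sequence; it does not determine where the subject sequence ends.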

Recommended practice

[#8] If the subject sequence has the hexadecimal form, FLT_RADIX is not a power of 2, and the result is not exactly representable, the result should be one of the two numbers in the appropriate internal format that are adjacent to the hexadecimal floating source value, with the extra stipulation that the error should have a correct sign for the current rounding direction.

[#9] If the subject sequence has the decimal form and at most DEC128_COEFF_DIG (defined in <decfloat.h>) significant digits, the result should be correctly rounded. If the subject sequence D has the decimal form and more than DEC128_COEFF_DIG significant digits, consider the two bounding, adjacent decimal strings L and U, both having DEC128_COEFF_DIG significant digits, such that the values of L, D, and U satisfy L <= D <= U. The result should be one of the (equal or adjacent) values that would be obtained by correctly rounding L and U according to the current rounding direction, with the extra stipulation that the error with respect to D should have a correct sign for the current rounding direction. 252)

Returns

[#10] The functions return the converted value, if any. If no conversion could be performed, zero is returned. If the correct value is outside the range of representable values, plus or minus HUGE_VAL_D64, HUGE_VAL_D32, or HUGE_VAL_D128 is returned (according to the return type and sign of the value), and the value of the macro ERANGE is stored in errno. If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined.
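
The error reporting described in the Returns paragraph follows the existing strtod model, so a caller's range check can be sketched with strtod and HUGE_VAL as stand-ins. The names conv_status, conv_result, and convert_checked are hypothetical. The sketch assumes that a caller of strtod64 would test against HUGE_VAL_D64 in the same way.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical outcome of a checked conversion. */
typedef enum {
    CONV_OK,          /* value converted and in range */
    CONV_NONE,        /* no conversion could be performed */
    CONV_OVERFLOW,    /* plus or minus HUGE_VAL returned, errno == ERANGE */
    CONV_RANGE_OTHER  /* errno == ERANGE for another reason, e.g. underflow */
} conv_status;

typedef struct {
    conv_status status;
    double value;     /* stand-in for _Decimal64 */
} conv_result;

/* Convert s with strtod (stand-in for the proposed strtod64) and
   report range errors through the status field. */
static conv_result convert_checked(const char *s)
{
    conv_result r;
    char *end;

    assert(s != NULL);
    errno = 0;                       /* must be cleared before the call */
    r.value = strtod(s, &end);

    if (end == s)
        r.status = CONV_NONE;        /* zero is returned in this case */
    else if (errno == ERANGE)
        r.status = (r.value == HUGE_VAL || r.value == -HUGE_VAL)
                       ? CONV_OVERFLOW
                       : CONV_RANGE_OTHER;
    else
        r.status = CONV_OK;
    return r;
}
```

Since it is implementation-defined whether errno acquires the value ERANGE on underflow, a caller cannot rely on CONV_RANGE_OTHER being reported for very small inputs.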
 

252 DECIMAL_DIG, defined in <float.h>, should be sufficiently large that L and U will usually round to the same internal floating value, but if not will round to adjacent values.

note1 It is unspecified whether a minus-signed sequence is converted to a negative number directly or by negating the value resulting from converting the corresponding unsigned sequence (see F.5); the two methods may yield different results if rounding is toward positive or negative infinity. In either case, the functions honor the sign of zero if floating-point arithmetic supports signed zeros. F.5 shall be followed.

note2 An implementation may use the n-char sequence to determine extra information to be represented in the NaN's significand. No signal is raised at the point of returning the signalling NaN.
 

Annex B

The suggested text for wcstod32, wcstod64, and wcstod128 is similar to that in annex A, and is based on the text in C99 7.24.4.1.1.

7.24.4.1.3 The wcstod32, wcstod64, and wcstod128 functions

Synopsis

[#1]

    #include <wchar.h>
    _Decimal32  wcstod32 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
    _Decimal64  wcstod64 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
    _Decimal128 wcstod128(const wchar_t * restrict nptr, wchar_t ** restrict endptr);

Description

Similar to 7.20.1.5 in annex A, replacing references to character with wide character where appropriate.
 
 

Annex C

The following alternate suggestion disallows mixing decimal floating-point operands with generic floating-point operands in the usual arithmetic conversions. Assignments and function argument passing are allowed; they are already covered in 6.5.16.1#2 and 6.5.2.2#7.

Insert the following into 6.3.1.8#1, after "This pattern is called the usual arithmetic conversions:"

6.3.1.8[1]

... This pattern is called the usual arithmetic conversions:

If one operand has a decimal floating type, no other operand shall have a generic floating type or a complex type:

First, if either operand is _Decimal128, the other operand is converted to _Decimal128.

Otherwise, if either operand is _Decimal64, the other operand is converted to _Decimal64.

Otherwise, if either operand is _Decimal32, the other operand is converted to _Decimal32.


If no operand has a decimal floating type:
 

First, if the corresponding real type of either operand is long double, the other operand is converted, ... <the rest of 6.3.1.8#1 remains the same>
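
The selection rule suggested in this annex can be sketched as a small decision function over type categories. The names type_kind, is_decimal, is_generic_float, and annex_c_common_type are hypothetical. The sketch is a simplified model: complex types are omitted, T_INT stands for any integer type after the integer promotions, and T_INVALID marks a combination that the suggested text does not allow.

```c
#include <assert.h>

/* Hypothetical type categories for the model. */
typedef enum {
    T_INT,          /* any integer type (simplified) */
    T_FLOAT,
    T_DOUBLE,
    T_LONG_DOUBLE,
    T_DEC32,
    T_DEC64,
    T_DEC128,
    T_INVALID       /* combination not permitted */
} type_kind;

static int is_decimal(type_kind t)
{
    return t == T_DEC32 || t == T_DEC64 || t == T_DEC128;
}

static int is_generic_float(type_kind t)
{
    return t == T_FLOAT || t == T_DOUBLE || t == T_LONG_DOUBLE;
}

/* Common type of two operands under the alternate suggestion. */
static type_kind annex_c_common_type(type_kind a, type_kind b)
{
    assert(a != T_INVALID && b != T_INVALID);

    if (is_decimal(a) || is_decimal(b)) {
        /* Mixing decimal and generic floating operands is disallowed. */
        if (is_generic_float(a) || is_generic_float(b))
            return T_INVALID;
        if (a == T_DEC128 || b == T_DEC128)
            return T_DEC128;
        if (a == T_DEC64 || b == T_DEC64)
            return T_DEC64;
        return T_DEC32;
    }

    /* No decimal operand: the existing rules of 6.3.1.8, simplified. */
    if (a == T_LONG_DOUBLE || b == T_LONG_DOUBLE)
        return T_LONG_DOUBLE;
    if (a == T_DOUBLE || b == T_DOUBLE)
        return T_DOUBLE;
    if (a == T_FLOAT || b == T_FLOAT)
        return T_FLOAT;
    return T_INT;
}
```

Under this model, an integer operand combined with a _Decimal32 operand is converted to _Decimal32, while a double operand combined with any decimal floating operand is rejected.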