Document: N1108
Date: 08-Mar-05

The type and representation of unsuffixed floating constants


The decimal floating types proposed in N1077 require new suffixes for decimal floating constants. Usability would improve if unsuffixed floating constants could be used with these types. The same usability issue exists for the fixed point types specified in TR 18037. This paper proposes a solution.

The issue can be illustrated by the following example (_Decimal64 is a decimal floating type proposed in N1077):

_Decimal64 rate = 0.1;

0.1 has type double. In an implementation that uses a binary representation for the floating types, and where FLT_EVAL_METHOD is not -1, the internal representation of 0.1 cannot be exact. The variable 'rate' will get a value slightly different from 0.1, which defeats the purpose of decimal floating types. On the other hand, requiring programmers to write:

_Decimal64 rate = 0.1dd;

is inconvenient. (Note: dd is the decimal floating point suffix proposed by N1077.)
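The inexactness of 0.1 in a binary double can be observed with a short sketch. It assumes an implementation where double is IEEE 754 binary64; the function names sum_of_tenths and sum_of_tenths_scaled are illustrative and do not come from N1077.

```c
#include <assert.h>

/* Adds the double constant 0.1 to itself n times. The volatile
   accumulator forces every partial sum to be rounded to double. */
double sum_of_tenths(int n)
{
    volatile double sum = 0.0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum += 0.1;
    return sum;
}

/* The same computation carried out in integer tenths, which is exact. */
long sum_of_tenths_scaled(int n)
{
    long tenths = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        tenths += 1;
    return tenths;
}
```

Under the stated assumption, sum_of_tenths(10) compares unequal to 1.0, while sum_of_tenths_scaled(10) is exactly 10. A decimal floating type is meant to give the exact behaviour of the second function without manual scaling.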

Translation time data type

The main idea is to introduce a translation time data type (TTDT) which the translator uses as the type for unsuffixed floating constants. A floating constant is kept in this type and representation until an operation requires it to be converted to an actual type. The value of the constant remains exact as long as possible during the translation process. The idea can be summarized as follows:

1/ The implementation is allowed to use a type different from double and long double as the type of unsuffixed floating constants. This is an implementation defined type. The intention is that this type can represent the constant exactly if the number of decimal digits is within an implementation specified limit. For an implementation that supports decimal floating point, a possible choice is the widest decimal floating type.

2/ The range and precision of this type are implementation defined but are fixed throughout the program.

3/ TTDT is an arithmetic type. All arithmetic operations are defined for this type.

4/ The usual arithmetic conversions are extended to handle mixed operations between TTDT and other types. Roughly speaking, if an operation involves both a TTDT and an actual type, the TTDT is converted to an actual type before the operation. This way, no "top-down" parsing context information is required to process unsuffixed floating constants. Technically speaking, the determination of the constant's type is not deferred. For example:

double  f;
f =  0.1;

Suppose the implementation uses _Decimal128 (a decimal floating type defined in N1077) as the TTDT. 0.1 is represented exactly after the constant is scanned. It is then converted to double in the assignment operator.

f = 0.1 * 0.3;

Here, both 0.1 and 0.3 are represented in TTDT. If the compiler evaluates the expression at translation time, the evaluation is done in TTDT and the result has type TTDT. This result is then converted to double before the assignment. If the compiler instead generates code to evaluate the expression at execution time, both 0.1 and 0.3 are converted to double before the multiplication. The result of the former can differ from that of the latter and is the more precise of the two.
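The difference between the two evaluation orders can be sketched in standard C by modelling the TTDT with a small exact decimal value. Everything here is an assumption made for illustration: dec_t and its functions are hypothetical stand-ins for a TTDT, double is taken to be IEEE 754 binary64, and FLT_EVAL_METHOD is taken to be 0.

```c
#include <assert.h>

/* Hypothetical stand-in for a TTDT: the exact value coef / 10^scale. */
typedef struct {
    long long coef;
    int scale;
} dec_t;

/* Exact decimal multiplication (no overflow checking in this sketch). */
dec_t dec_mul(dec_t a, dec_t b)
{
    dec_t r = { a.coef * b.coef, a.scale + b.scale };
    return r;
}

/* Converts to double with a single rounding. For small coefficients and
   scales up to 22, both operands of the division are exact in binary64,
   so the division is the only inexact step. */
double dec_to_double(dec_t d)
{
    double pow10 = 1.0;
    assert(d.scale >= 0 && d.scale <= 22);
    for (int i = 0; i < d.scale; i++)
        pow10 *= 10.0;
    return (double)d.coef / pow10;
}

/* "Translation time" evaluation: multiply exactly, then convert once. */
double product_folded(dec_t a, dec_t b)
{
    return dec_to_double(dec_mul(a, b));
}

/* "Execution time" evaluation: convert each operand, then multiply.
   The volatile objects force rounding to double at every step. */
double product_at_runtime(dec_t a, dec_t b)
{
    volatile double x = dec_to_double(a);
    volatile double y = dec_to_double(b);
    volatile double r = x * y;
    return r;
}
```

With the operands 0.1 and 0.3 from the example, written as {1, 1} and {3, 1}, product_folded returns the double nearest to 0.03. The operands 0.1 and 3, written as {1, 1} and {3, 0}, show the two orders disagreeing under the stated assumptions: product_folded gives the double nearest to 0.3, while product_at_runtime gives a slightly larger value.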

float g = 0.3f;
f = 0.1 * g;

When one operand is a TTDT and the other is one of float/double/long double, the TTDT is converted to double, with an internal representation following the specification of FLT_EVAL_METHOD for constants of type double. The usual arithmetic conversions are then applied to the resulting operands.
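Because the TTDT operand is first converted to double, the expression 0.1 * g keeps the type it already has in C99. The following sketch inspects that type and the evaluation format selected by FLT_EVAL_METHOD; the function names are illustrative assumptions, and only FLT_EVAL_METHOD itself comes from <float.h>.

```c
#include <assert.h>
#include <float.h>   /* FLT_EVAL_METHOD */
#include <stddef.h>  /* size_t */

/* Size of the type C99 gives to "0.1 * g" when g is a float. */
size_t mixed_product_size(void)
{
    float g = 0.3f;
    /* g is converted to double; sizeof does not evaluate its operand. */
    assert(sizeof(0.1 * g) >= sizeof g);
    return sizeof(0.1 * g);
}

/* Size in bytes of the format used to evaluate double constants and
   operations, or 0 when it cannot be determined from FLT_EVAL_METHOD. */
size_t double_evaluation_size(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
    case 1:
        return sizeof(double);
    case 2:
        return sizeof(long double);
    default:
        return 0;   /* -1 (indeterminable) or another implementation-defined value */
    }
}
```

mixed_product_size() equals sizeof(double) regardless of FLT_EVAL_METHOD, since the evaluation method affects range and precision but not the type of the expression.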

_Decimal32 h = 0.1;

If one operand is a TTDT and the other a decimal floating type, the TTDT is converted to _Decimal64 with an internal representation specified by DEC_EVAL_METHOD (as specified in N1077). The usual arithmetic conversions are then applied.

If one operand is a TTDT and the other a fixed point type, the TTDT is converted to the fixed point type. If the implementation supports fixed point types, it is recommended practice that the implementation choose a representation for TTDT that can represent floating and fixed point constants exactly, subject to a predefined limit on the number of decimal digits.

Suggested changes to C99

Below are suggested changes to C99 to capture the above idea. Decimal floating types and fixed point types are not considered in these changes. The intention is that the changes can be applied independently of the decimal floating type TR, but when an implementation supports decimal floating types, it uses the widest decimal floating type as the TTDT.

In 6.2.5 after paragraph 28, add a paragraph:

[28a] There is an implementation defined data type called the translation time data type, or TTDT. TTDT is an arithmetic type and is used as the type for unsuffixed floating constants. There is no type specifier for TTDT.

Replace paragraph 4 with the following:

[4] An unsuffixed floating constant has type TTDT. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it has type long double.

Add the following paragraphs after Translation Time Data Type

When a TTDT is converted to double, it is converted to the internal representation specified by FLT_EVAL_METHOD.

Recommended practice

The conversion of TTDT to double should match the execution-time conversion of character strings by library functions, such as strtod, given matching inputs suitable for both conversions, the same format and default execution-time rounding.
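The recommended practice above can be checked on a given implementation with a sketch like the following. MATCHES_STRTOD, constants_match_strtod and require_strtod_agreement are hypothetical names introduced here; the sketch assumes the default rounding mode and that strtod is correctly rounded.

```c
#include <assert.h>
#include <stdlib.h>   /* strtod */

/* Nonzero if the constant c, converted to double by the translator,
   equals strtod applied to the spelling of c at execution time. */
#define MATCHES_STRTOD(c) ((double)(c) == strtod(#c, NULL))

/* Checks a few sample constants; returns 1 if all of them agree. */
int constants_match_strtod(void)
{
    return MATCHES_STRTOD(0.1)
        && MATCHES_STRTOD(0.3)
        && MATCHES_STRTOD(123.456)
        && MATCHES_STRTOD(1e-7);
}

/* Convenience wrapper that aborts when the two conversions disagree. */
void require_strtod_agreement(void)
{
    assert(constants_match_strtod());
}
```

The macro stringizes its argument, so the same spelling is fed to both conversions. The cast to double discards any extra precision the constant may carry when FLT_EVAL_METHOD is not 0.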

Before the usual arithmetic conversions are carried out, if one operand is TTDT and the other is not, and is not a decimal floating type, the TTDT operand is converted to double. Otherwise, the behavior is implementation defined.

Considerations for Decimal Floating Type support

If the implementation supports decimal floating types, the recommended practice is to use the widest decimal floating type as the TTDT. In this case, when one operand is TTDT and the other is a decimal floating type, both operands have decimal floating type, and the usual arithmetic conversions specified by the Decimal Floating Type TR handle the operation. The last sentence of the suggested text above can then be removed.

Considerations for C++

It is our desire to provide consistency between C and C++. A valid C program using decimal floating types should also be translated correctly as a C++ program. But this is a difficult problem when decimal floating constants are involved. This paper has so far focused only on C issues. Let us briefly consider possible solutions for C++.

TTDT is a translation time entity. It is removed and replaced by an actual type by the time translation is completed. This is why there is no type specifier for it. A program cannot explicitly declare a variable with this type. In C++, the language is designed to allow extensions by defining new data types using classes. So if TTDT is a data type for handling floating constants, even if it is used only during translation time, it should be defined using a class. This class will need to play a special role within the language, though. A possible specification for it could be as follows.

An unsuffixed floating constant has type TTDT. There is a special class named ttdt. It provides a constructor that accepts a char* parameter. It also provides all arithmetic operators. If this class is not declared before a TTDT is needed, the type for TTDT is double. If this class is declared, the type for TTDT is this class.

Let s be a token representing a floating constant. In a context where a TTDT is required, s is converted to TTDT by calling the constructor with "s" (i.e. the string formed by enclosing s within string quotes) as the argument.


_Decimal64 x = 0.1;

If the class ttdt has been declared, the above is parsed as:

_Decimal64 x = ttdt("0.1");

However, the class ttdt is no longer a translation time entity.