Digit Separators

ISO/IEC JTC1 SC22 WG21 N2281 = 07-0141 - 2007-05-02

Lawrence Crowl

Problem

Numeric literals of more than a few digits are hard to read. For example, compare 237498123 with 237499123 for equality and 237499123 with 20249472 for relative magnitude.

Solution

We propose to add the underscore character as a digit separator in numeric literals. The previous examples become clearer: 237_498_123 versus 237_499_123, and 237_499_123 versus 20_249_472.
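To make the proposed spelling concrete, the following minimal sketch renders an integer value with underscores between groups of three digits. The function name group_digits is hypothetical and not part of the proposal, and grouping by threes is only an assumption taken from the examples above; the proposal itself does not mandate any group size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the proposal): renders a value with
// '_' between groups of three digits, counting from the right.
std::string group_digits(unsigned long long value) {
    const std::string digits = std::to_string(value);
    const std::size_t n = digits.size();
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        // Place a separator before any digit whose distance from the end
        // is a multiple of three, but never before the first digit.
        if (i != 0 && (n - i) % 3 == 0) {
            out += '_';
        }
        out += digits[i];
    }
    return out;
}
```

For instance, group_digits(237498123) yields "237_498_123" and group_digits(20249472) yields "20_249_472", matching the spellings used in the text.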

Alternate Solutions

Bjarne Stroustrup has suggested using a space as a separator. While this approach is consistent with some presentation styles, it would likely make editing tools that grab "words" less reliable. Furthermore, the preprocessor syntax (section 2.9 below) would need to change, with potentially unforeseen risks.

Unaddressed Issues

This proposal does not address binary literals or hexadecimal floating-point literals. We believe those should be addressed in a separate paper.

Implementation

This approach is already in use in the Ada programming language, whose numeric literals permit an underscore between digits (for example, 1_000_000).

Changes to the C++ Standard

The changes to the standard are minimal and introduce no incompatibilities for correct code. The lack of incompatibilities arises because there is no place in the grammar where a numeric literal is followed by an identifier.

2.9 Preprocessing numbers [lex.ppnumber]

To the grammar, no changes are necessary. The preprocessing number tokens already admit an underscore via the non-terminal nondigit.

pp-number:
    digit
    . digit
    pp-number digit
    pp-number nondigit
    pp-number e sign
    pp-number E sign
    pp-number .
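The claim that pp-number already admits underscores can be checked with a small recognizer for the grammar above. This is a sketch only: the name is_pp_number and its helpers are hypothetical, universal-character-names are ignored, and the function tests whether a whole string is a pp-number rather than performing longest-match lexing as a real preprocessor would.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers mirroring the non-terminals digit and nondigit.
static bool ppn_is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
static bool ppn_is_nondigit(char c) {
    return c == '_' || std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Returns true if the whole of `text` matches the pp-number grammar.
bool is_pp_number(const std::string& text) {
    std::size_t i = 0;
    // Base productions: "digit" or ". digit".
    if (i < text.size() && text[i] == '.') {
        ++i;
    }
    if (i >= text.size() || !ppn_is_digit(text[i])) {
        return false;
    }
    ++i;
    // Every further character must extend the pp-number by one production.
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '+' || c == '-') {
            // "pp-number e sign" / "pp-number E sign": a sign is only
            // allowed directly after an exponent letter.
            if (text[i - 1] != 'e' && text[i - 1] != 'E') {
                return false;
            }
        } else if (!ppn_is_digit(c) && !ppn_is_nondigit(c) && c != '.') {
            return false;
        }
    }
    return true;
}
```

Under this sketch, is_pp_number("237_498_123") holds because each underscore is consumed by the "pp-number nondigit" production, whereas "_237" is rejected because a pp-number must begin with a digit or with a period followed by a digit.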

2.10 Identifiers [lex.name]

To the grammar, no changes are necessary. This section defines non-terminals used elsewhere.

nondigit: one of
    a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z _
digit: one of
    0 1 2 3 4 5 6 7 8 9

2.13.1 Integer literals [lex.icon]

To the grammar, edit as follows, permitting underscores between digits. (Note some renderings of HTML will overstrike the underscore with the markup for inserted text. Look for "more than usual" spacing in that case.)

integer-literal:
    decimal-literal integer-suffix opt
    octal-literal integer-suffix opt
    hexadecimal-literal integer-suffix opt
decimal-literal:
    nonzero-digit
    decimal-literal digit
    decimal-literal _ digit
octal-literal:
    0
    octal-literal octal-digit
    octal-literal _ octal-digit
hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal hexadecimal-digit
    hexadecimal-literal _ hexadecimal-digit
nonzero-digit: one of
    1 2 3 4 5 6 7 8 9
octal-digit: one of
    0 1 2 3 4 5 6 7
hexadecimal-digit: one of
    0 1 2 3 4 5 6 7 8 9
    a b c d e f
    A B C D E F
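The effect of the revised productions can be illustrated with a small parser that accepts exactly the decimal, octal, and hexadecimal forms above and computes their value. Because existing compilers do not accept the proposed spelling, the sketch takes the literal as a string. The names parse_separated_integer and sep_digit_value are hypothetical, the integer-suffix is not handled, and overflow simply wraps around.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of `c` as a digit in `base`, or -1 if it is
// not a digit of that base.
static int sep_digit_value(char c, int base) {
    int v = -1;
    if (c >= '0' && c <= '9') v = c - '0';
    else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
    return (v >= 0 && v < base) ? v : -1;
}

// Parses `text` according to the revised grammar (without integer-suffix).
// On success stores the value in `value` and returns true; on failure
// leaves `value` untouched and returns false.
bool parse_separated_integer(const std::string& text,
                             unsigned long long& value) {
    if (text.empty()) return false;
    int base = 10;
    unsigned long long acc = 0;
    std::size_t i = 0;
    if (text[0] == '0') {
        if (text.size() > 1 && (text[1] == 'x' || text[1] == 'X')) {
            // hexadecimal-literal: the prefix must be followed directly
            // by a hexadecimal-digit, not by an underscore.
            base = 16;
            if (text.size() < 3 || sep_digit_value(text[2], 16) < 0) {
                return false;
            }
            acc = static_cast<unsigned long long>(sep_digit_value(text[2], 16));
            i = 3;
        } else {
            // octal-literal: starts with the single digit 0.
            base = 8;
            i = 1;
        }
    } else {
        // decimal-literal: starts with a nonzero-digit.
        const int first = sep_digit_value(text[0], 10);
        if (first < 0) return false;
        acc = static_cast<unsigned long long>(first);
        i = 1;
    }
    while (i < text.size()) {
        // An underscore is allowed only when a digit follows it.
        if (text[i] == '_') {
            ++i;
            if (i == text.size()) return false;
        }
        const int v = sep_digit_value(text[i], base);
        if (v < 0) return false;
        acc = acc * static_cast<unsigned long long>(base)
            + static_cast<unsigned long long>(v);
        ++i;
    }
    value = acc;
    return true;
}
```

With this sketch, "237_498_123" and "0x7FFF_FFFF" are accepted, while "1__0", "1_", and "0x_1" are rejected, since each underscore in the grammar must sit between two digits of the literal.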

2.13.3 Floating literals [lex.fcon]

To the grammar, edit as follows, permitting underscores between digits. (Note some renderings of HTML will overstrike the underscore with the markup for inserted text. Look for "more than usual" spacing in that case.)

floating-literal:
    fractional-constant exponent-part opt floating-suffix opt
    digit-sequence exponent-part floating-suffix opt
fractional-constant:
    digit-sequence opt . digit-sequence
    digit-sequence .
exponent-part:
    e sign opt digit-sequence
    E sign opt digit-sequence
sign: one of
    + -
digit-sequence:
    digit
    digit-sequence digit
    digit-sequence _ digit
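Since the only change is to digit-sequence, a floating literal may carry separators in its integer part, fraction, and exponent alike. The following sketch validates a string against the grammar above and, if it matches, strips the underscores and converts the rest with std::strtod. The names scan_digit_sequence and parse_separated_floating are hypothetical, the floating-suffix is not handled, and the conversion assumes the default "C" locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper for the revised digit-sequence: one digit, then any
// number of "digit" or "_ digit" continuations. Advances `i` past the
// sequence; returns false (leaving `i` unchanged) if no digit is present.
static bool scan_digit_sequence(const std::string& t, std::size_t& i) {
    auto is_dec = [](char c) { return c >= '0' && c <= '9'; };
    if (i >= t.size() || !is_dec(t[i])) return false;
    ++i;
    for (;;) {
        if (i < t.size() && is_dec(t[i])) {
            ++i;
        } else if (i + 1 < t.size() && t[i] == '_' && is_dec(t[i + 1])) {
            i += 2;
        } else {
            return true;
        }
    }
}

// Parses `text` as a floating-literal without floating-suffix. On success
// stores the value in `value` and returns true.
bool parse_separated_floating(const std::string& text, double& value) {
    std::size_t i = 0;
    const bool int_part = scan_digit_sequence(text, i);
    bool dot = false;
    bool frac_part = false;
    if (i < text.size() && text[i] == '.') {
        dot = true;
        ++i;
        frac_part = scan_digit_sequence(text, i);
    }
    // fractional-constant needs digits on at least one side of the period;
    // without a period, a leading digit-sequence is required.
    if (!int_part && !frac_part) return false;
    bool exponent = false;
    if (i < text.size() && (text[i] == 'e' || text[i] == 'E')) {
        ++i;
        if (i < text.size() && (text[i] == '+' || text[i] == '-')) ++i;
        if (!scan_digit_sequence(text, i)) return false;
        exponent = true;
    }
    // A bare digit-sequence is an integer literal, not a floating one.
    if (!dot && !exponent) return false;
    if (i != text.size()) return false;
    // Separators carry no value: drop them before converting.
    std::string plain;
    for (char c : text) {
        if (c != '_') plain += c;
    }
    value = std::strtod(plain.c_str(), nullptr);
    return true;
}
```

Under these assumptions, "1_0e+0_2" is accepted with value 1000, while "1_.5" and "._5" are rejected because an underscore there is not between two digits of the same digit-sequence.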