Doc. no. | P0085R3 |
Audience: | EWG, LEWG |
Date: | 01-07-2025 |
Project: | ISO JTC1/SC22/WG21: Programming Language C++, evolution group |
Reply to: | Jolly Chen <Jolly.Chen@cern.ch>, Axel Naumann <Axel.Naumann@cern.ch>, Michael Jonker <Michael.Jonker@cern.ch> |
Proposal to add 0o and 0O as an alternative (and preferred) sequence to introduce octal-literals and deprecate use of the old prefix 0 for non-zero octal-literals.
The syntax rule to interpret integer literals starting with a zero as octal-literals might be called a 'historical mistake'. It can be easily misunderstood by novice programmers and can lead to surprising errors.
To allow future generations (of developers if not compilers) to correct this feature, we propose to add the character sequences 0o and 0O as the preferred way to introduce an octal-literal. The prefix 0o follows the model set by the prefix 0x to introduce a hex-literal and (since C++14) 0b to introduce a binary-literal.
Additionally, we propose to deprecate the use of octal integer literals with the 0 prefix, apart from 0 itself. This is similar to C, which declares the use of non-zero octal integer literals without the prefix 0o or 0O an obsolescent feature (see 6.11.5 in the C2y draft). This opens the door for compilers to warn about the use of the deprecated 0 prefix and, eventually (on the scale of a few releases), potentially to interpret numbers with leading zeros as decimal numbers.
From http://en.wikipedia.org/wiki/Octal: "Newer languages have been abandoning the prefix 0, as decimal numbers are often represented with leading zeroes. The prefix q was introduced to avoid the prefix o being mistaken for a zero, while the prefix 0o was introduced to avoid starting a numerical literal with an alphabetic character (like o or q), since these might cause the literal to be confused with a variable name. The prefix 0o also follows the model set by the prefix 0x used for hexadecimal literals in the C language; it is supported by Haskell,[19] OCaml,[20] Python as of version 3.0,[21] Raku,[22] Ruby,[23] Tcl as of version 9,[24] PHP as of version 8.1,[25] Rust[26] and ECMAScript as of ECMAScript 6[27] (the prefix 0 originally stood for base 8 in JavaScript but could cause confusion,[28] therefore it has been discouraged in ECMAScript 3 and dropped in ECMAScript 5[29])."
This proposal now reflects a recent corresponding change in C (N3353) that was triggered by an earlier version of this proposal.
It was observed that the specifications of std::format and of streams also need changes to support the new octal-literal prefix for input and output. This proposal does not address these changes and leaves them to a future proposal.
Leading zeros are often used as padding to align numbers nicely. This leads to surprising results when the programmer expects the number to be interpreted as a decimal. For example:
Table of numbers:

std::array<std::array<int, 2>, 2> table = {{ { 100, 042 }, { 107, 000 } }};

In this case, the programmer intended to write the decimal numbers 100, 42, 107, and 0. However, 042 and 000 are interpreted as octal-literals. While the octal-literal 000 is equivalent to the intended decimal number 0, the octal-literal 042 is equivalent to the decimal number 34, causing bugs that are hard to find for a programmer or reader of the code who is unaware of the octal rule.
Examples of issues caused by zero-padded version numbers:
Following this proposal, these literals all specify the same number (decimal 42):
Literal | Before | After |
---|---|---|
Hex | 0x2A | 0x2A |
Binary | 0b00101010 | 0b00101010 |
Octal | 052 | 0o52 |
The old octal literal 052 will remain valid but deprecated.
This proposal introduces 0o and 0O as new prefixes for octal-literals. Under the current standard, an integer literal starting with 0o or 0O is ill-formed. Consequently, the proposed prefixes 0o and 0O will not break existing code.
Additionally, this proposal deprecates the existing error-prone syntax rule for non-zero integer literals starting with a zero, following the example of C (N3353). This affects existing, traditional octal-literals (i.e., those with a leading 0), for instance when defining POSIX file permissions (sometimes padded with multiple leading zeros), as seen in popular repositories (more than 500 stars) such as ROOT, Qt Creator, node-android, and KDiff3.
We have received concerns that this deprecation leads to warnings for octal-literals that have the same value whether interpreted as octal or decimal, such as 01, 02, 03, 04, 05, 06, and 07. To keep the door open for a future proposal that introduces 01 through 09 as decimal literals (typical use case: 1970y/January/01d), implementations might choose not to diagnose 01 through 07.
We propose the feature test macro name __cpp_0o_octals for this feature.
To match N3353, make the following edits (relative to N5008), with insertions and removals highlighted:
In [lex.icon], replace the production

octal-literal:
    0
    octal-literal 'opt octal-digit

with

octal-literal:
    prefixed-octal-literal
    unprefixed-octal-literal

unprefixed-octal-literal:
    0
    0 'opt octal-digit-sequence

prefixed-octal-literal:
    octal-prefix octal-digit-sequence
Before hexadecimal-prefix, insert:

octal-prefix: one of
    0o 0O

Before hexadecimal-digit-sequence, insert:

octal-digit-sequence:
    octal-digit
    octal-digit-sequence 'opt octal-digit
Under paragraph 2, replace the text with:

2 The hexadecimal-digits a through f and A through F have decimal values ten through fifteen. [Example 1: The number twelve can be written 12, 014, 0o14, 0XC, or 0b1100. The integer-literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0o0'004'000'000 all have the same value. --end example]
After paragraph 4, add the paragraph:

An unprefixed-octal-literal ([lex.icon], [gram.lex]) of the form 0 'opt octal-digit-sequence is deprecated ([depr.oct]).
In [fs.enum.perms], respell the octal values in Table 149 with the 0o prefix:

Table 149 - Enum class perms [tab:fs.enum.perms]
Name | Value | POSIX macro | Definition or notes |
---|---|---|---|
none | 0o0 | | There are no permissions set for the file. |
owner_read | 0o400 | S_IRUSR | Read permission, owner |
owner_write | 0o200 | S_IWUSR | Write permission, owner |
owner_exec | 0o100 | S_IXUSR | Execute/search permission, owner |
owner_all | 0o700 | S_IRWXU | Read, write, execute/search by owner; owner_read \| owner_write \| owner_exec |
group_read | 0o40 | S_IRGRP | Read permission, group |
group_write | 0o20 | S_IWGRP | Write permission, group |
group_exec | 0o10 | S_IXGRP | Execute/search permission, group |
group_all | 0o70 | S_IRWXG | Read, write, execute/search by group; group_read \| group_write \| group_exec |
others_read | 0o4 | S_IROTH | Read permission, others |
others_write | 0o2 | S_IWOTH | Write permission, others |
others_exec | 0o1 | S_IXOTH | Execute/search permission, others |
others_all | 0o7 | S_IRWXO | Read, write, execute/search by others; others_read \| others_write \| others_exec |
all | 0o777 | | owner_all \| group_all \| others_all |
set_uid | 0o4000 | S_ISUID | Set-user-ID on execution |
set_gid | 0o2000 | S_ISGID | Set-group-ID on execution |
sticky_bit | 0o1000 | S_ISVTX | Operating system dependent. |
mask | 0o7777 | | all \| set_uid \| set_gid \| sticky_bit |
unknown | 0xFFFF | | The permissions are not known, such as when a file_status object is created without specifying the permissions |
Add a new subclause [depr.oct] to Annex D:

1 An unprefixed-octal-literal ([lex.icon], [gram.lex]) of the form 0 'opt octal-digit-sequence is deprecated.
[Note 1: Unprefixed octal literals other than the literal 0 are deprecated because they are often confused with decimal literals. --end note]
[Example 1:
int zero = 0;               // OK
int more_zeroes = 000;      // deprecated
int unprefixed_octal = 042; // deprecated
int prefixed_octal = 0o42;  // OK
--end example]
Thanks to Erich Keane and Thomas Köppe for reviewing the draft. The document style was borrowed from N4340.