ISO/IEC JTC1 SC22 WG21 N2170 = 07-0030 - 2007-02-02
The current standard prohibits using universal character names to specify many characters, in particular the control characters and the basic source characters.
the standard permits specifying the printable ASCII characters
While the prohibition against basic source characters is generally not a significant problem, the prohibition against control characters within character and string literals causes programmers to fall back upon traditional escape sequences, which makes the code more platform-dependent.
For example, the high control characters of Unicode (80-9F) have code points with different meanings in Windows-1252. In UTF-8, those points also have a different representation.
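To make the difference in representation concrete, the following is a minimal sketch of a UTF-8 encoder. The function name `encode_utf8` is hypothetical and not part of this paper or of the standard library; the sketch assumes its argument is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-8.
// Assumes cp is a valid scalar value (below 0x110000, not a surrogate).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // one byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // three bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // four bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

void check_encoding() {
    // U+0085 (NEXT LINE) occupies two bytes in UTF-8, not the single byte 0x85.
    assert(encode_utf8(0x85) == std::string("\xC2\x85"));
    // Windows-1252 uses byte 0x85 for U+2026 (HORIZONTAL ELLIPSIS) instead.
    assert(encode_utf8(0x2026) == std::string("\xE2\x80\xA6"));
}
```

A literal written with the escape `"\x85"` therefore denotes different text depending on the execution character set, whereas a universal character name would name the same character everywhere.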
"\u0085" would be
The current C++ standard permits specification of universal character names with values in the range D800 through DFFF inclusive. These values do not identify characters, but rather identify halves of surrogate pairs. The C 1999 standard prohibits specification of these values.
This problem is core issue number 558, and this paper proposes a solution to that issue.
The only potential need for values within this range is processing of strings. In those rare cases, use of direct numeric constants (e.g. 0xD83F) will suffice.
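As an illustration of string processing with direct numeric constants, the sketch below handles surrogate values without any universal character name. All function names here are hypothetical, and the code assumes a compiler that provides `char16_t` and `char32_t`.

```cpp
#include <cassert>
#include <cstddef>

// True when the value lies in the surrogate range D800 through DFFF.
constexpr bool is_surrogate(char32_t v) {
    return v >= 0xD800 && v <= 0xDFFF;
}

// Combine a high (D800-DBFF) and low (DC00-DFFF) surrogate into a code point.
// Assumes the caller has already checked that both halves are in range.
constexpr char32_t combine_surrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}

// Count the code units in a UTF-16 buffer that are surrogate halves.
std::size_t count_surrogate_units(const char16_t* s, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (is_surrogate(s[i])) ++count;
    }
    return count;
}

void check_surrogates() {
    // Numeric constants stand in for the surrogate halves.
    const char16_t text[] = { 0x0041, 0xD83D, 0xDE00, 0x0042 };
    assert(count_surrogate_units(text, 4) == 2);
    assert(combine_surrogates(0xD83D, 0xDE00) == 0x1F600);
    assert(is_surrogate(0xD83F));
}
```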
We propose to lift the prohibitions on control and basic source universal character names within character and string literals. We propose to add prohibitions against surrogate values in all universal character names.
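The effect of the proposal can be sketched as a pair of predicates over the hexadecimal value of a universal character name. This is one reading of the paragraph above, not the proposed wording itself; the function names are hypothetical, and the basic source character test assumes the 96-character set, which omits `$`, `@`, and the grave accent.

```cpp
#include <cassert>

// Control characters: below 0x20, or in the range 0x7F-0x9F (inclusive).
constexpr bool is_control_value(unsigned long v) {
    return v < 0x20 || (v >= 0x7F && v <= 0x9F);
}

// Basic source characters: HT, NL, VT, FF, and printable ASCII
// except '$' (0x24), '@' (0x40), and '`' (0x60).
constexpr bool is_basic_source_value(unsigned long v) {
    return (v >= 0x09 && v <= 0x0C)
        || (v >= 0x20 && v <= 0x7E && v != 0x24 && v != 0x40 && v != 0x60);
}

// Surrogate values: D800 through DFFF inclusive.
constexpr bool in_surrogate_range(unsigned long v) {
    return v >= 0xD800 && v <= 0xDFFF;
}

// Current rule: control and basic source values are ill-formed everywhere.
constexpr bool ill_formed_current(unsigned long v) {
    return is_control_value(v) || is_basic_source_value(v);
}

// Sketch of the proposed rule: surrogates are ill-formed everywhere;
// control and basic source values are ill-formed only outside literals.
constexpr bool ill_formed_proposed(unsigned long v, bool in_literal) {
    return in_surrogate_range(v)
        || (!in_literal && (is_control_value(v) || is_basic_source_value(v)));
}

void check_rules() {
    assert(ill_formed_current(0x85));          // prohibited today
    assert(!ill_formed_proposed(0x85, true));  // permitted inside a literal
    assert(ill_formed_proposed(0x85, false));  // still prohibited elsewhere
    assert(ill_formed_proposed(0xD800, true)); // surrogates prohibited everywhere
}
```

Under this sketch, `ill_formed_current(0xD800)` is false while `ill_formed_proposed(0xD800, true)` is true, which mirrors the two halves of the proposal.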
The existing wording in the phases of translation (2.1) and the existing grammar for character (2.13.2) and string (2.13.4) literals prevent problems parsing literals, because interpretation of the universal character names occurs after tokenization. Because the prohibitions remain outside of string literals, the existing parse is not affected.
In paragraph 2, edit
The universal-character-name construct provides a way to name other characters.
- hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
The character designated by the universal-character-name \UNNNNNNNN is that character whose character short name in ISO/IEC 10646 is NNNNNNNN; the character designated by the universal-character-name \uNNNN is that character whose character short name in ISO/IEC 10646 is 0000NNNN. If the hexadecimal value for a universal character name is less than 0x20, or in the range 0x7F-0x9F (inclusive), or if the universal character name designates a character in the basic source character set, then the program is ill-formed.