ISO/IEC JTC1 SC22 WG21
N4267 / EWG 119
Richard Smith
richard@metafoo.co.uk
2014-11-05

Adding u8 character literals

Wording

Change in 2.14.3 (lex.ccon):

character-literal:
        ' c-char-sequence '
        u' c-char-sequence '
        U' c-char-sequence '
        L' c-char-sequence '
        encoding-prefix(opt) ' c-char-sequence '
encoding-prefix: one of
        u8 u U L
[…]

Change in 2.14.3 (lex.ccon) paragraph 1 and split it into two paragraphs:

A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by u8, u, U, or L, as in u8'w', u'y', U'z', or L'x', respectively.

A character literal that does not begin with u8, u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.
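As an illustrative sketch (not part of the proposed wording), the following C++ checks the type and value of an ordinary character literal at compile time. The helper name ordinary_value is hypothetical, and the value 0x78 assumes an ASCII-compatible execution character set.

```cpp
#include <type_traits>

// An ordinary character literal containing a single c-char has type char.
static_assert(std::is_same<decltype('x'), char>::value,
              "'x' has type char");

// Hypothetical helper for illustration: the numerical value of the encoding
// of 'x' in the execution character set.
constexpr int ordinary_value() { return 'x'; }

// Assumption: the execution character set is ASCII-compatible, so 'x' is 0x78.
static_assert(ordinary_value() == 0x78, "'x' encodes as 0x78");

// A multicharacter literal such as 'ab' is conditionally-supported, has type
// int, and has an implementation-defined value, so it is not checked here.
```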

Drafting note: the term "narrow-character literal" was not used anywhere else in the standard and, confusingly, sometimes referred to literals of non-narrow-character type.

Change in 2.14.3 (lex.ccon) paragraph 2 and split it into four paragraphs:

A character literal that begins with u8, such as u8'w', is a character literal of type char, known as a UTF-8 character literal. The value of a UTF-8 character literal is equal to its ISO 10646 code point value, provided that the code point value is representable with a single UTF-8 code unit (that is, provided it is in the C0 Controls and Basic Latin Unicode block). If the value is not representable with a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing multiple c-chars is ill-formed.
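A minimal sketch of how the UTF-8 character literal rules might be exercised, assuming a compiler that implements this wording (C++17 or later). The alias utf8_literal_type and the constant latin_w are hypothetical names. Note that C++20 later changed the type of these literals to char8_t, which the sketch detects through the __cpp_char8_t feature-test macro.

```cpp
#include <type_traits>

// Type of a UTF-8 character literal: char under this wording; char8_t in
// C++20 and later, where the __cpp_char8_t macro is defined.
#ifdef __cpp_char8_t
using utf8_literal_type = char8_t;
#else
using utf8_literal_type = char;
#endif

static_assert(std::is_same<decltype(u8'w'), utf8_literal_type>::value,
              "type of a UTF-8 character literal");

// The value is the ISO 10646 code point value: U+0077 for 'w'.
constexpr utf8_literal_type latin_w = u8'w';
static_assert(latin_w == 0x77, "u8'w' has value 0x77");

// Ill-formed under this wording, so left commented out:
//   u8'\u00E9'  (U+00E9 needs two UTF-8 code units)
//   u8'ab'      (multiple c-chars)
```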

A character literal that begins with the letter u, such as u'y', is a character literal of type char16_t. The value of a char16_t literal containing a single c-char is equal to its ISO 10646 code point value, provided that the code point is representable with a single 16-bit code unit. (That is, provided it is a basic multi-lingual plane code point.) If the value is not representable within 16 bits, the program is ill-formed. A char16_t literal containing multiple c-chars is ill-formed.
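The char16_t rules can be sketched the same way. The constant names below are hypothetical; the code points are chosen only as examples of values inside the basic multilingual plane.

```cpp
#include <type_traits>

// A literal prefixed by u has type char16_t.
static_assert(std::is_same<decltype(u'y'), char16_t>::value,
              "u'y' has type char16_t");

// The value is the ISO 10646 code point value.
constexpr char16_t latin_y = u'y';        // U+0079
constexpr char16_t euro    = u'\u20AC';   // U+20AC, inside the BMP

static_assert(latin_y == 0x0079, "u'y' has value 0x79");
static_assert(euro == 0x20AC, "u'\\u20AC' has value 0x20AC");

// Ill-formed under this wording, so left commented out:
//   u'\U0001F600'  (not representable in a single 16-bit code unit)
//   u'ab'          (multiple c-chars)
```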

A character literal that begins with the letter U, such as U'z', is a character literal of type char32_t. The value of a char32_t literal containing a single c-char is equal to its ISO 10646 code point value. A char32_t literal containing multiple c-chars is ill-formed.
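For char32_t literals, a comparable sketch follows. The constant names are hypothetical, and U+1F600 is used only as an example of a code point outside the basic multilingual plane.

```cpp
#include <type_traits>

// A literal prefixed by U has type char32_t.
static_assert(std::is_same<decltype(U'z'), char32_t>::value,
              "U'z' has type char32_t");

// The value is the ISO 10646 code point value, including code points
// beyond the basic multilingual plane.
constexpr char32_t latin_z  = U'z';            // U+007A
constexpr char32_t grinning = U'\U0001F600';   // U+1F600

static_assert(latin_z == 0x7Au, "U'z' has value 0x7A");
static_assert(grinning == 0x1F600u, "U'\\U0001F600' has value 0x1F600");

// Ill-formed under this wording, so left commented out:
//   U'ab'  (multiple c-chars)
```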

A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character literal has type wchar_t. [Footnote: …] The value of a wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined. [ Note: The type wchar_t is able to represent all members of the execution wide-character set (see 3.9.1). ]. The value of a wide-character literal containing multiple c-chars is implementation-defined.
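A short sketch for wide-character literals is given below. Because the value depends on the execution wide-character set, only the type is checked unconditionally; the value check assumes an implementation whose wide-character set encodes 'x' as 0x78. The constant name wide_x is hypothetical.

```cpp
#include <type_traits>

// A wide-character literal has type wchar_t.
static_assert(std::is_same<decltype(L'x'), wchar_t>::value,
              "L'x' has type wchar_t");

// Assumption: the execution wide-character set encodes 'x' as 0x78, as it
// does on implementations that use a Unicode-based wchar_t.
constexpr wchar_t wide_x = L'x';
static_assert(wide_x == 0x78, "L'x' has value 0x78");

// L'ab' (multiple c-chars) has an implementation-defined value, so it is not
// checked here.
```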

Change in 2.14.3 (lex.ccon) paragraph 4:

[…] The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for literals with no prefix) or wchar_t (for literals prefixed by 'L'). [ Note: If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed. ]

Change in 2.14.5 (lex.string):

[…]
encoding-prefix:
        u8
        u
        U
        L
[…]