N 2701: @ and $ in source and execution character set

Submitter: Philipp Klaus Krause
Submission Date: 2021-03-28

Summary:

@ and $ in source and execution character set

This adds @ and $ to the source and execution character set.

Justification:

The source and execution character sets are currently only required to contain the basic character set, which consists of the characters of the invariable subset of EBCDIC and ASCII, plus other characters that are used in C syntax (even if their position differs between EBCDIC code pages). The character @ is present both in ASCII and in many EBCDIC code pages. It has different positions in different EBCDIC code pages, and it even moved position in ASCII in the 1965 and 1967 updates of ASCII. Today, @ is commonly used in C source code, in particular in email addresses in both comments and string literals. In practice, C users expect this to work. With @ present in both ASCII and EBCDIC, this should be easy for implementations to support (and, as witnessed by the widespread use of @ in C source, current implementations already support it). While there are EBCDIC code pages where @ is missing, such as 322, those code pages are missing many other characters from the current basic source character set anyway.

$, while not as widely used as @, is somewhat similar. It, too, is present in both ASCII and EBCDIC, and has different positions in different EBCDIC code pages. ` is used less commonly in comments than $, but more often in string literals (since it is used in Markdown syntax), and is also present in both ASCII and EBCDIC.
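The kind of use described above can be sketched as follows. This is only an illustration: the address, the function names contact_line and count_char, and the text of the string are made up for this example, and the code relies on the implementation accepting @, $ and ` in comments and string literals, which the standard does not currently guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Maintainer: jane.doe@example.org (hypothetical address; @ used in a comment) */

/* Returns a string literal that contains @, $ and `. */
const char *contact_line(void)
{
    return "Mail jane.doe@example.org; price: $5; run `make`";
}

/* Counts how often the character c occurs in the string s. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* Checks that each of the three characters survived translation. */
void check_contact_line(void)
{
    assert(count_char(contact_line(), '@') == 1);
    assert(count_char(contact_line(), '$') == 1);
    assert(count_char(contact_line(), '`') == 2);
}
```

On implementations that already support these characters, calling check_contact_line() should return without triggering an assertion.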

By requiring @ and $ in the source and execution character set, we reach the goal of making them usable in comments and string literals. By not adding them to the basic source character set, we protect the freedom of implementations to allow or disallow them in identifiers, and avoid inconsistency or incompatibility regarding the use of universal character names (currently the use of universal character names for characters in the basic source character set is not allowed, so adding characters to the basic source character set without lifting that restriction could break existing code).
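To illustrate the concern about universal character names: as the restriction is worded in C17 (6.4.3), universal character names below 00A0 are disallowed except for 0024 ($), 0040 (@) and 0060 (`), so code such as the following is valid today. The sketch below uses made-up names (ucn_spelling, direct_spelling, ucn_matches_direct) and is meant as an example of existing code that could break, not as a recommended style.

```c
#include <assert.h>
#include <string.h>

/* \u0040, \u0024 and \u0060 are the universal character names of @, $ and `. */
static const char ucn_spelling[] = "\u0040\u0024\u0060";

/* The same three characters written directly. */
static const char direct_spelling[] = "@$`";

/* Returns 1 if both spellings produce the same execution characters, else 0. */
int ucn_matches_direct(void)
{
    return strcmp(ucn_spelling, direct_spelling) == 0;
}

/* Aborts if the two spellings differ. */
void check_ucn_spellings(void)
{
    assert(ucn_matches_direct());
}
```

If @, $ and ` were moved into the basic source character set while the restriction stayed as it is, the first array initializer would presumably become a constraint violation.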

Do we want to add @ and $ to the source and execution character set as a single byte each?

Proposed change: In N2596, §5.2.1.2: Replace "The basic character set shall be present and each character shall be encoded as a single byte." by "The basic character set, @ and $ shall be present and each character shall be encoded as a single byte.".
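What the single-byte requirement would guarantee for the execution character set can be sketched in C11 or later as below. The function name encoded_length and the diagnostic messages are made up for this example; on an implementation that encodes either character as more than one byte, the static assertions would fail at translation time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A one-character string literal occupies the bytes of that character
   plus one byte for the terminating null character. */
static_assert(sizeof("@") == 2, "@ is not encoded as a single byte");
static_assert(sizeof("$") == 2, "$ is not encoded as a single byte");

/* Returns the number of bytes before the terminating null character of s. */
size_t encoded_length(const char *s)
{
    return strlen(s);
}
```

With the proposed wording, encoded_length("@") and encoded_length("$") would be required to return 1 on every conforming implementation.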

Do we want to add @ and $ to the source and execution character set without requiring them to be single bytes?

Proposed change: In N2596, §5.2.1.2: Replace "The presence, meaning, and representation of any additional members is locale-specific." by "The meaning and representation of any additional members is locale-specific. The characters @ and $ shall be present. The presence of any additional members is locale-specific.".

Do we also want to add ` in the same way as @ and $?

Proposed change: In N2596, §5.2.1.2, as modified by the previous questions, replace "@ and $" by "@, $ and `".