Response to PC-US0002
May 29, 1998
At the Colorado meeting, I presented a verbal response to PC-US0002
that restated the committee's past position on the question. The
committee informally approved the answer, and as a reward, I was
drafted to write the formal response.
(The public comment asks why UCNs may not be used to represent
characters from the required source character set.)
UCNs are not permitted to designate characters from the basic source
character set in order to permit fast compilation times for C
programs. For some real-world programs, compilers spend a significant
amount of time merely scanning for the characters that end a quoted
string, or end a comment, or end some other token. Although it is
trivial for such loops in a compiler to recognize UCNs, doing so
can add a surprising amount of overhead.
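As an illustration only, the kind of inner loop in question might look like the sketch below. The function name scan_string_end is hypothetical, and the sketch assumes that line splicing has already been done and that the buffer is NUL-terminated. Because no UCN can designate the quote, the backslash, or the newline, the loop compares raw characters and never has to decode a \u or \U sequence.

```c
#include <stddef.h>

/* Hypothetical helper (not from any real compiler): p points just past the
 * opening quote of a string literal. Returns a pointer to the closing
 * quote, or NULL if the line or the buffer ends first. */
const char *scan_string_end(const char *p)
{
    for (;;) {
        char c = *p;
        if (c == '"')
            return p;              /* closing quote found */
        if (c == '\0' || c == '\n')
            return NULL;           /* unterminated literal */
        if (c == '\\' && p[1] != '\0')
            p += 2;                /* skip backslash and the next character */
        else
            p += 1;                /* ordinary character, UCN digits included */
    }
}
```

If UCNs could spell a quote or a newline, each iteration would also have to test for \u or \U and decode the hex digits before deciding whether the token had ended.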
A UCN is constrained not to specify a character short identifier in
the range 0000 through 0020 or 007F through 009F inclusive for the
same reason: this avoids allowing a UCN to designate the newline
character. Since different implementations use different control
characters or sequences of control characters to represent newline,
UCNs are prohibited from representing any control character.
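The two ranges quoted above reduce to a pair of comparisons. The following is a minimal sketch; the name ucn_in_control_range is an assumption made for illustration and does not come from the draft.

```c
#include <stdbool.h>

/* Hypothetical predicate: true if the short identifier named by a UCN lies
 * in 0000 through 0020 or 007F through 009F, inclusive. */
bool ucn_in_control_range(unsigned long value)
{
    return value <= 0x0020UL ||
           (value >= 0x007FUL && value <= 0x009FUL);
}
```

For example, ucn_in_control_range(0x000A) is true (line feed), while ucn_in_control_range(0x00E9) is false.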
UCNs are part of the syntax of C (and C++) just as the syntax of the
"for" loop is part of C. UCNs are a representation of C program
source; they are not part of a character set standard and have no
special meaning to display devices or printers. A text editor (or
any other program) that generates UCNs is generating C source, and it
is no burden for such a program to generate UCNs only for the
characters that can be designated by UCNs.
The constraint that a UCN not represent a basic source character does
not prohibit an implementation from translating all of a program's
source text into wide characters during translation phase one. Such a
compiler need only range check any UCN that it translates into a wide
character. Since the constraint on UCNs is independent of the context
in which the UCN appears, this is very easy to do. Since such a check
would only need to be made when the implementation has already
recognized a UCN, it adds little overhead to the translation.
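A sketch of such a check follows. All names here (basic_graphic, ucn_value_allowed, translate_ucn) are hypothetical, and the code assumes the compiler runs on a host whose character codes for the basic source characters match their ISO/IEC 10646 values, as ASCII does. The function is called only after the translator has already seen \u or \U, so ordinary characters never pay for the check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Graphic members of the basic source character set (assumed ASCII-coded).
 * Space and the control characters fall in 0000-0020, tested separately. */
static const char basic_graphic[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Context-independent range check on the value named by a UCN. */
static bool ucn_value_allowed(unsigned long v)
{
    if (v <= 0x0020UL || (v >= 0x007FUL && v <= 0x009FUL))
        return false;                       /* control ranges */
    if (v < 0x80UL && strchr(basic_graphic, (int)v) != NULL)
        return false;                       /* basic source character */
    return true;
}

/* Convert the ndigits hex digits at p (4 after \u, 8 after \U) to a wide
 * character value. Returns false for a malformed or disallowed UCN. */
bool translate_ucn(const char *p, size_t ndigits, unsigned long *out)
{
    unsigned long v = 0;
    for (size_t i = 0; i < ndigits; i++) {
        char c = p[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A' + 10);
        else return false;             /* not a hex digit */
        v = v * 16 + digit;
    }
    if (!ucn_value_allowed(v))
        return false;
    *out = v;
    return true;
}
```

Under these assumptions, translate_ucn("00E9", 4, &w) succeeds and stores 0xE9 in w, while translate_ucn("0041", 4, &w) fails because 0041 is the letter A, a basic source character.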