N832
J11/98-031

Response to PC-US0002
Randy Meyers
May 29, 1998

INTRODUCTION

At the Colorado meeting, I presented a verbal response to PC-US0002 that restated the committee's past position on the question. The committee informally approved the answer, and as a reward, I was drafted to write the formal response. Thanks, guys.

(The public comment asks why UCNs may not be used to represent characters from the required source character set.)

RESPONSE

UCNs are not permitted to designate characters from the basic source character set in order to permit fast compilation times for C programs. For some real-world programs, compilers spend a significant amount of time merely scanning for the characters that end a quoted string, or end a comment, or end some other token. Although it is trivial for such loops in a compiler to recognize UCNs, doing so can result in a surprising amount of overhead.

A UCN is constrained not to specify a character whose short identifier is in the range 0000 through 0020 or 007F through 009F inclusive for the same reason: this avoids allowing a UCN to designate the newline character. Since different implementations use different control characters or sequences of control characters to represent newline, UCNs are prohibited from representing any control character.

UCNs are part of the syntax of C (and C++) just as the syntax of the "for" loop is part of C. UCNs are a representation of C program source; they are not part of a character set standard and have no special meaning to display devices or printers. A text editor (or any other program) that generates UCNs is generating C source, and it is no burden for such a program to generate UCNs only for the characters that can be designated by UCNs.

The constraint that a UCN not represent a basic source character does not prohibit an implementation from translating all of a program's source text into wide characters during translation phase one.
Such a compiler need only range-check any UCN that it translates into a wide character. Since the constraint on UCNs is independent of the context in which the UCN appears, this is very easy to do. Since such a check needs to be made only when the implementation has already recognized a UCN, it adds little overhead to the translation.
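
As an illustration only, the two points above can be sketched in C. Both function names are hypothetical and are not taken from any implementation. The sketch assumes that UCN values are ISO 10646 short identifiers, that every value from 0021 through 007E other than 0024 ($), 0040 (@), and 0060 (`) is a member of the basic source character set, and that line splicing has already been performed. The first function is the context-independent range check. The second is a loop that finds the end of a string literal without ever decoding a UCN, which is safe only because no UCN can designate the quote, the backslash, or the newline.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if a UCN may designate `code` under the
   constraints described in this response (ISO 10646 values assumed). */
static int ucn_value_permitted(unsigned long code)
{
    if (code <= 0x20UL)
        return 0;                       /* 0000 through 0020 */
    if (code >= 0x7FUL && code <= 0x9FUL)
        return 0;                       /* 007F through 009F */
    if (code < 0x7FUL)                  /* 0021 through 007E: only $, @, ` */
        return code == 0x24UL || code == 0x40UL || code == 0x60UL;
    return 1;                           /* 00A0 and above */
}

/* Hypothetical scanner: `p` points just past an opening '"'. Returns a
   pointer to the closing '"', or NULL if a newline or the end of the
   buffer is reached first. A backslash skips the following character;
   the hex digits of a UCN are treated as ordinary characters. */
static const char *find_string_end(const char *p)
{
    for (;;) {
        switch (*p) {
        case '"':
            return p;
        case '\0':
        case '\n':
            return NULL;
        case '\\':
            if (p[1] == '\0' || p[1] == '\n')
                return NULL;
            p += 2;                     /* skip the escaped character */
            break;
        default:
            p++;
        }
    }
}
```

In this sketch, ucn_value_permitted would be called only after the translator has already recognized a UCN and computed its value, so the scanning loop itself carries no extra cost.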