ISO/IEC JTC 1/SC22/Java SG N 3-7

DATE: 1998-10-13
REPLACES: N/A
DOC TYPE: Plain DOS Text
TITLE: Support of entire repertoire of ISO/IEC 10646
SOURCE: Akio Kido (Project editor of TR 10176)
PROJECT: N/A
STATUS: This document is circulated to National Bodies of JTC 1/SC22/Java SG for review and consideration at the October 1998 SC22/Java SG meeting in Tokyo.
ACTION ID: FYI
DUE DATE:
DISTRIBUTION: P and L Members
MEDIUM:
DISKETTE NO.:
NO. OF PAGES: 1

Text of contribution:

Whereas ISO/IEC JTC 1/SC2 is now standardizing ISO/IEC 10646-2 and specifying additional planes of ISO/IEC 10646 that cannot be accessed by the UCS2 encoding scheme, and whereas the recently approved second edition of ISO/IEC TR 10176, "Guidelines for preparation of programming language standards", recommends that every programming language standard ensure that at least every character specified by ISO/IEC 10646 can be a value of the character data type, the value space of the string object of the Java standard should be the entire repertoire of ISO/IEC 10646, including ISO/IEC 10646-2.

To meet the above requirement, the possible alternatives are:

- Do not specify the encoding scheme and size of the character data type in the Java language standard; make them processor defined. The mapping between the character data type in the Java language and Java ByteCode should be done in a processor-defined manner. In other words, an implementation that supports the full repertoire of ISO/IEC 10646 should have a character data type whose size is greater than or equal to 31 bits, and should handle character data character by character, not 16 bits at a time. An operation may be translated into multiple Java ByteCode statements where the character data is stored in the UTF-16 encoding.

- Amend the current specification of Java 1.1 and make the size and encoding of the character data type those of UCS4 in all Java standards, including the Java language, Java ByteCode, and the Java API.

- Keep the current specification of the character data type as it is, i.e. UCS2, and
  o allow UTF-16 to be stored in the string object;
  o provide new methods to detect character boundaries in the string object;
  o provide new character manipulation methods that handle character data in the string object character by character, not by 16-bit unit;
  o provide a new "CHARACTER" data type and "STRING" class whose value space is the entire repertoire of ISO/IEC 10646 and whose encoding is UCS4;
  o provide a set of functions to convert a string object whose encoding is UTF-16 to the new "STRING" object whose encoding is UCS4, and vice versa.
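
As an illustration only, the boundary detection and the UTF-16/UCS4 conversion named in the last alternative might be sketched as follows. The class name Utf16Sketch and all of its method names are hypothetical and are not part of any Java API or of this contribution. An int array stands in for the proposed UCS4 "STRING" class, and unpaired surrogates are passed through unchanged, which is an assumption rather than a recommendation.

```java
// Hypothetical sketch: none of these names exist in the Java API.
final class Utf16Sketch {
    // High (leading) surrogates occupy D800..DBFF.
    static boolean isHighSurrogate(char c) { return c >= 0xD800 && c <= 0xDBFF; }

    // Low (trailing) surrogates occupy DC00..DFFF.
    static boolean isLowSurrogate(char c) { return c >= 0xDC00 && c <= 0xDFFF; }

    // True if a well-formed surrogate pair starts at index i.
    static boolean isPairAt(String s, int i) {
        return i + 1 < s.length()
                && isHighSurrogate(s.charAt(i))
                && isLowSurrogate(s.charAt(i + 1));
    }

    // Counts characters, not 16-bit units.
    static int characterCount(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isPairAt(s, i)) {
                i++; // skip the low surrogate of the pair
            }
            count++;
        }
        return count;
    }

    // Converts a UTF-16 string to one 32-bit value per character.
    static int[] toUcs4(String s) {
        int[] out = new int[characterCount(s)];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isPairAt(s, i)) {
                char low = s.charAt(++i);
                out[n++] = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
            } else {
                out[n++] = c; // BMP character or unpaired surrogate, copied as-is
            }
        }
        return out;
    }

    // Converts 32-bit values back to UTF-16.
    // UTF-16 can only represent values up to 0x10FFFF.
    static String toUtf16(int[] ucs4) {
        StringBuilder sb = new StringBuilder();
        for (int cp : ucs4) {
            if (cp < 0 || cp > 0x10FFFF) {
                throw new IllegalArgumentException("not representable in UTF-16: " + cp);
            }
            if (cp >= 0x10000) {
                int v = cp - 0x10000;
                sb.append((char) (0xD800 + (v >> 10)));
                sb.append((char) (0xDC00 + (v & 0x3FF)));
            } else {
                sb.append((char) cp);
            }
        }
        return sb.toString();
    }
}
```

With these helpers, a string holding one character outside the first plane between two ordinary letters occupies four 16-bit units but counts as three characters, and converting it to the 32-bit form and back yields the original string.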