Date: Fri, 14 Nov 1997 18:33:22 -0500 (EST)
From: "william c. rinehuls"
To: sc22docs@dkuug.dk
Subject: SC22 N2612 - Vote Summary on CD 14652 - Cultural Conventions Specifications

________________________ beginning of title page _____________________

ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)

ISO/IEC JTC 1/SC22 N2612

TITLE: Summary of Voting on Concurrent CD Registration and CD Approval for CD 14652 - Information technology - Specifications for Cultural Conventions

DATE ASSIGNED: 1997-11-14

SOURCE: Secretariat, ISO/IEC JTC 1/SC22

BACKWARD POINTER: N/A

DOCUMENT TYPE: Summary of Voting

PROJECT NUMBER: JTC 1.22.30.02.03

STATUS: CD 14652 has been registered. WG20 is requested to prepare a Disposition of Comments Report and a recommendation on the further processing of the CD.

ACTION IDENTIFIER: FYI to SC22 Member Bodies; ACT to WG20

DUE DATE: N/A

DISTRIBUTION: Text

CROSS REFERENCE: SC22 N2504

DISTRIBUTION FORM: Def

Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Telephone: +1 (703) 912-9680
Fax: +1 (703) 912-2973
email: rinehuls@access.digex.net

______________ end of title page; beginning of overall summary _________

SUMMARY OF VOTING ON

Letter Ballot Reference No: SC22 N2504
Circulated by: JTC 1/SC22
Circulation Date: 07-22-1997
Closing Date: 11-07-1997

SUBJECT: Concurrent CD Registration and CD Approval for CD 14652 - Information technology - Specifications for Cultural Conventions

--------------------------------------------------------------------

The following responses have been received on the subject of CD registration:

"P" Members supporting registration without comment:  9
"P" Members supporting registration with comment:     0
"P" Members not supporting registration:              3
"P" Members abstaining:                               4
"P" Members not voting:                               7
"O" Members supporting registration without comment:  1

The following responses have been received on the subject of CD approval:

"P" Members supporting approval without comment:      7
"P" Members supporting approval with comment:         2
"P" Members not supporting approval:                  3
"P" Members abstaining:                               4
"P" Members not voting:                               7
"O" Members supporting approval without comment:      1

------------------------------------------------------------------------

Secretariat Action: CD 14652 has been registered. WG20 is requested to prepare a Disposition of Comments Report and a recommendation on the further processing of the CD.

The comment accompanying the abstention vote from Austria was: "Lack of expert resources."

The comment accompanying the abstention vote from Germany was: "There is no national WG20 rapporteur."

The comment accompanying the abstention vote from Sweden was: "Expert resources not available."
_________ end of overall summary; beginning of registration summary ___

ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY
Registration Ballot

PROJECT NO: JTC 1.22.30.02.03
SUBJECT: Concurrent CD Registration and CD Approval for CD 14652 - Information technology - Specifications for Cultural Conventions
Reference Document No: N2504
Ballot Document No: N2504
Circulation Date: 07-22-1997
Closing Date: 11-07-1997
Circulated To: SC22 P, O, L
Circulated By: Secretariat

SUMMARY OF VOTING AND COMMENTS RECEIVED

                    Approve  Disapprove  Abstain  Comments  Not Voting
'P' Members
Australia           (X)      ( )         ( )      ( )       ( )
Austria             ( )      ( )         (X)      (X)       ( )
Belgium             (X)      ( )         ( )      ( )       ( )
Brazil              ( )      ( )         ( )      ( )       (X)
Canada              (X)      ( )         ( )      ( )       ( )
China               ( )      ( )         (X)      ( )       ( )
Czech Republic      (X)      ( )         ( )      ( )       ( )
Denmark             (X)      ( )         ( )      ( )       ( )
Egypt               ( )      ( )         ( )      ( )       (X)
Finland             (X)      ( )         ( )      ( )       ( )
France              (X)      ( )         ( )      ( )       ( )
Germany             ( )      ( )         (X)      (X)       ( )
Ireland             ( )      ( )         ( )      ( )       (X)
Japan               ( )      (X)         ( )      (X)       ( )
Netherlands         ( )      (X)         ( )      (X)       ( )
Norway              (X)      ( )         ( )      ( )       ( )
Romania             ( )      ( )         ( )      ( )       (X)
Russian Federation  ( )      ( )         ( )      ( )       (X)
Slovenia            ( )      ( )         ( )      ( )       (X)
Sweden              ( )      ( )         (X)      (X)       ( )
UK                  ( )      ( )         ( )      ( )       (X)
Ukraine             (X)      ( )         ( )      ( )       ( )
USA                 ( )      (X)         ( )      (X)       ( )

'O' Members Voting
Korea Republic      (X)      ( )         ( )      ( )       ( )
Portugal            ( )      ( )         (X)      ( )       ( )

__________ end of registration summary; beginning of approval summary __

ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY
Approval Ballot

PROJECT NO: JTC 1.22.30.02.03
SUBJECT: Concurrent CD Registration and CD Approval for CD 14652 - Information technology - Specifications for Cultural Conventions
Reference Document No: N2504
Ballot Document No: N2504
Circulation Date: 07-22-1997
Closing Date: 11-07-1997
Circulated To: SC22 P, O, L
Circulated By: Secretariat

SUMMARY OF VOTING AND COMMENTS RECEIVED

                    Approve  Disapprove  Abstain  Comments  Not Voting
'P' Members
Australia           (X)      ( )         ( )      ( )       ( )
Austria             ( )      ( )         (X)      (X)       ( )
Belgium             (X)      ( )         ( )      ( )       ( )
Brazil              ( )      ( )         ( )      ( )       (X)
Canada              (X)      ( )         ( )      (X)       ( )
China               ( )      ( )         (X)      ( )       ( )
Czech Republic      (X)      ( )         ( )      ( )       ( )
Denmark             (X)      ( )         ( )      (X)       ( )
Egypt               ( )      ( )         ( )      ( )       (X)
Finland             (X)      ( )         ( )      ( )       ( )
France              (X)      ( )         ( )      ( )       ( )
Germany             ( )      ( )         (X)      (X)       ( )
Ireland             ( )      ( )         ( )      ( )       (X)
Japan               ( )      (X)         ( )      (X)       ( )
Netherlands         ( )      (X)         ( )      (X)       ( )
Norway              (X)      ( )         ( )      ( )       ( )
Romania             ( )      ( )         ( )      ( )       (X)
Russian Federation  ( )      ( )         ( )      ( )       (X)
Slovenia            ( )      ( )         ( )      ( )       (X)
Sweden              ( )      ( )         (X)      (X)       ( )
UK                  ( )      ( )         ( )      ( )       (X)
Ukraine             (X)      ( )         ( )      ( )       ( )
USA                 ( )      (X)         ( )      (X)       ( )

'O' Members Voting
Korea Republic      (X)      ( )         ( )      ( )       ( )
Portugal            ( )      ( )         (X)      ( )       ( )

________ end of approval summary ______________________________________

___________ beginning of comments accompanying Canada affirmative vote__

Canadian Comments on ISO/IEC 14652 WD:

1. There is no rationale as to why this standard is required. There is rationale in Annex B for the FDCC-set and for the various LC_* categories but none for this standard. It would be very helpful to add such rationale to understand why this standard is necessary and what problems it solves.

Specific Comments:

2. section 3.1.7
- definition of charmap needs to be changed to ..." a definition of a mapping between symbolic character names and the encoding for a coded character set"
- The definition for FDCC states that the term replaces the POSIX term 'locale', as the new entity is a superset of the locale as it is currently used. One may debate the point, but as a superset it fails to deal with basic issues of multiple concurrent support of differing formats (have 2 or more local currency formats as needed in Europe) and calendaring other than Gregorian. I would have expected more support from a new standard.

3. section 4 - FDCC-set. The paragraph beginning "..Other category names...". This is an unnecessary restriction and one that will cause problems with existing implementations. POSIX had no such restriction and as a result we have implementations that have introduced categories such as LC_TIMEZONE or LC_TOD etc.
We could say that the six categories are mandatory in the FDCC-set.
- The proposal states "In the event that some of the information for an FDCC-set category, as specified in this standard, is missing from the FDCC-set source definition, the behavior of that CATEGORY, if it is referenced, is unspecified." This is too restrictive, in that the complete category is 'wasted'. Perhaps a word of clarification is required. Does the proposed standard really want to have the complete category ignored? If so, there is a requirement on the object creation mechanism to issue a failure message during the compilation step of the 'FDCC-set object'.

4. section 4.1.0.5 - the third paragraph ("The items (2), ....") should be moved before item (2).

5. section 4.1.1. The portable character set should be mentioned because the next sub-section assumes some defaults that are characters in this set (which is also always a part of any charmap?).

6. section 4.1.1.1.
- are these the only keywords allowed? Other keywords are allowed in POSIX. Statement should indicate that this list does not preclude others but that these are the minimum that are supported.
- under "upper" etc. it states that "..if this keyword is not specified, the uppercase letters A through Z, shall automatically belong to this class...". This is fine when the keyword is absent. But in the opposite case this means that one can specify a whole range of characters under "upper" and exclude the A to Z set. This is not what you want. A statement should be made here that indicates that either:
  - one must include the portable character set when specifying characters in "upper", or
  - that these are automatically included if one does not include them in the specification for "upper"
  Of course, this applies to other keywords in this section as well.
- under "graph", change "printable" to "graphical"
- the keyword 'digit' ONLY allows the use of digits 0 through 9, but does not state whether they can be values in any language.
- table 1:
  - the intersections of (upper, upper), (lower, lower), etc., should be indicated as N/A (not applicable).
  - upper (row) should not be permitted in lower (column) and vice versa

7. section 4.1.1.2:
- No information is provided on the API that may be able to use such information. The term transformation is used as a synonym for transliteration. Transliteration should be used and the term transform should not be used, to avoid confusion with other functions performing string transforms (UTF-n, layout transforms....)
- add new sub-section numbers (4.1.1.2.x) for:
  - transform_start keyword
  - transform_end keyword
  - include keyword
  - default_missing keyword
- suggest a sub-section for the example. Also, the text at the end of the example should be preceded with "...in the example above.." or words to that effect.

8. Add section 4.1.1.3 for the "i18n" LC_CTYPE.

9. The LC_CTYPE that is shown should be a model; it does not follow the order of the keywords shown in 4.1.1.1. - it should.
- Also, if one looks under "toupper" in 4.1.1.1, it states that "...only characters specified for the keywords lower and upper shall be specified". In this definition of LC_CTYPE, "toupper" is defined. Unfortunately, the keywords "lower" and "upper" are NOT specified!!
- The LC_CTYPE is incomplete because all the Uxxxx characters are not shown. Ideally, this LC_CTYPE should be complete. Failing that, the incompleteness should be addressed and acknowledged.

10. section 4.1.2:
- item (8): needs to be reworded to be clear but at a minimum replace "from behind" with "backwards"
- "...The following keywords ..": states that the keywords are described in detail later. This is not wholly true because the first two keywords are not detailed later.
- coll_weight_max: it is stated that the minimum value is 7 and that this is also the default. This is not the case as per the example for LC_COLLATE. There this value is 4.

11. section 4.1.2.4: third paragraph, third sentence - "The first operand .... this ."
Expand the sentence to end with "or another "order_start" keyword is encountered".

12. section 4.1.2.5: this really does not belong in the explanation of keywords and as such should really appear after 4.1.2.12. Further in the example, and need to be explained.

13. remove sub-section heading 4.1.2.11 because it is not needed.

14. section 4.1.2.13: assumption here is that there is wide demand for this function. Most folks do not deal with locale construction so the benefit of a 'shorthand' way of changing locale source will be lost for the masses. All of this capability does not provide dynamic run time overrides, only deals with the current static model of previously defined source files. It also presumes the use of rather large all encompassing locales.

15. sub-section 4.1.2.13.6: this example shows two LC_COLLATE statements. Is this correct? Which one takes precedence?

16. section 4.1.3: the i18n LC_MONETARY category shown:
- why is the "mon_decimal_point" shown as ? This is not culturally neutral, nor is it the only internationally accepted value.
- the "-1" value for int_frac_digits through to n_sign_posn is incorrect in that the value "-1" is not described in the keyword text that precedes this definition.
- why are there no entries for int_p_cs-precedes etc. when these are identified as keywords. In the description of the keywords, there is no indication as to what will happen when these keywords are omitted.
- how are occurrences of multiple currencies, such as EURO and the local country currency, proposed to be handled?

17. section 4.1.4: the i18n LC_NUMERIC category shown:
- why is the "mon_decimal_point" shown as ? This is not culturally neutral, nor is it the only internationally accepted value.

18. section 4.1.5: this is the first time that the word "mandatory" has been used for keywords. Does this mean that all other keywords in the other categories are optional?
- abmon and mon keywords: the current restriction of twelve months is not correct; it does not allow 13 Hebrew months to be shown.
- what is the effect when the "am_pm" keyword is an empty string? in general, what is the effect of empty keywords?
- optional keyword support for 'era' and alternate digits is rather short sighted in that these are 'mandatory' for far-east support.

19. section 4.1.5.1: table 2:
- %m - change from (01-12) to (01-13)
- if the timezone information is application defined, per note at the end of the table, then %Z should really be removed. The better suggestion is that the category be expanded to handle timezone and not leave it up to the application.

20. section 4.1.5.2:
- %Of should be changed as per the new %f in section 4.1.5.1

21. section 6: repertoiremap is incomplete (misses, for example, the section between U06AF and U1e00).

___________ end of Canada Comments ___________________________________

__ beginning of Denmark Comments Accompanying Affirmative Vote ______

Hereby the Danish Standards vote on SC22 N2504 - CD 14652

1. The vote for CD registration is "Yes"

2. The vote on the CD ballot is "Yes" with comments.

2.1 There is a need for support of an alternate currency, such as the EURO. We propose keywords such as the current for international currency and domestic currency, but with a "2" added to each of the keywords, and a "currency_rate" keyword for the fixed currency rate.

2.2 There is a need for equivalencing of weights in the LC_COLLATE specification, eg by a "weight_equivalence" keyword. This is to accommodate different weight naming schemes.

2.3 Some extra keywords in the LC_MESSAGES category should be added, such as "yesstr" "nostr" and "cancelstr"

2.4 Support for ISO 2022 for extended charmap specifications is needed.
_______________ end of Denmark Comments ____________________________

_____ beginning of Japan comments accompanying negative vote _______

Japan disapproves document SC22 N2504 (CD 14652) to be registered as Committee Draft (CD).

Comments

1. The scope of this project is to specify a specification method of cultural conventions covering more than what POSIX supports. The draft CD covers nothing more than POSIX. Therefore, the document N2504 does not satisfy the project objective. At the project subdivision ballot, Japan asked about the difference from the POSIX locale definition method. The disposition for the comment is described in SC22 WG20 N269 (Disposition of comments received on WG20 proposal to subdivide project JTC1.22.30.01.01 to include a project on: Cultural convention specification SC22 N1574). The disposition of comment commits that this project covers more than what current POSIX does. If there is no intention to add any more cultural conventions (more than POSIX) at the first publication, this project should be canceled.

2. The SC22 WG20 N269 indicates the candidates of the "extended cultural conventions" as follows: Data input, Multi-lingual synchronization, Measuring system, Paper size and Postal address. Out of these candidates, Japan recommends to add at least Paper size, Measurement system and Postal address. In addition to that, Japan recommends to consider adding "colour specification including colour systems and name of color" and "name of person".

3. When a cultural convention specification method specifies more than POSIX does, to make an FDCC-set compatible with POSIX, it is necessary to provide an FDCC-set specification method (a method to specify which FDCCs are included in the specified FDCC-set). Add a clause on "specification method of FDCC-set".

4. It is anticipated that more cultural conventions will be added to this standard in future. There is a need to have a guide line to specify the new cultural convention specification methods. Add a clause on "the guide line".

5. There are many technical and editorial comments on the document N2504. Those comments are a part of the CD ballot.

6. Confirm whether the definitions of FDCC and FDCC-set are compatible with TR 11017. There is a very high possibility that they are different from each other. If they are, then align the terminology with TR 11017. (This may resolve most of the above comments)

------end of registration ballot comments

-----CD BALLOT COMMENTS (Japan)-----

Japan disapproves the document SC22 N2504 (CD 14652) as Committee Draft (CD) with following comments:

J-1) General:
The CD text is only a minor enhancement of a POSIX locale specification method and does not include any new categories which are declared to be investigated in SC22 WG20 N 269 -- disposition of the comments to NWI ballots. This project should be abandoned if it would include no new categories not included in a POSIX locale specification method. The extension of collation method should be moved to ISO/IEC 14651 in that case.
note: this is the same comment as the CD registration ballot. See the registration ballot comment for detail.

J-2) p.2, FOREWORD:
The paragraph
    The Standard uses text from ISO/IEC 9945-2:1993 "Information Technology - Portable Operating System Interface (POSIX) Part 2: Shell and Utilities". The major differences from this text is listed in annex A.
should be removed.

J-3) p.4, 1.Scope
The sentence
    The specification is compatible with POSIX locale specifications (10), and a locale conformant to POSIX specifications will also be conformant to the specifications in this Standard, while the reverse condition will not hold.
should be changed to
    The specification is upward compatible with POSIX locale specifications(10) -- a locale conformant to POSIX specifications will also be conformant to the specifications in this Standard, while the reverse condition will not hold.

J-4) p.4, 2. Normative references:
The following references
    (1) ISO 639 Code for the representation of names of languages
    (2) ISO 646 Information technology - ISO 7-bit coded character set for information interchange
    (3) ISO/IEC 2022 Information technology - Character code structure and extension techniques
    (4) ISO 3166 Code for the representation of names of countries
    (7) ISO/IEC 8824 Information technology - Open Systems Interconnection - Specification of Abstract Syntax Notation One (ASN.1)
    (8) ISO/IEC 8825 Information technology - Open System Interconnection - Specification of Basic Encoding Rules for Abstract Syntax Notation One (ASN.1)
    (9) ISO/IEC 9899 Information technology - Programming Language C.
should be removed because those standards are not referenced or referenced only in informative part (ISO 646).

J-5) p.6, 3.1.12 collation:
The text
    These rules identify a collation sequence between the collating elements, and such additional rules that can be used to order strings consisting of multiple collating elements.
should be removed because it is too detailed as a definition and it is vague -- there is no explanation for what rule is additional.

J-6) p.6, 3.1.17 affirmative responses:
The definition should be removed because the term is understandable without definition. If it remains, the definition should be changed from:
    An input string that matches one of the responses acceptable to the LC_MESSAGES category keyword "yesexpr", matching an extended regular expression in the current FDCC-set.
to:
    A string conforming to the definition of LC_MESSAGES category keyword "yesexpr".
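
To make the wording in J-6 concrete, the following is a minimal sketch of how an input string could be tested against a "yesexpr" value. The patterns "^[yY]" and "^[nN]" and the function name classify_response are assumptions chosen for illustration, not values taken from CD 14652, and Python's re module only approximates POSIX extended regular expressions.

```python
import re

# Hypothetical "yesexpr" / "noexpr" values; a real FDCC-set supplies its own.
YESEXPR = r"^[yY]"
NOEXPR = r"^[nN]"


def classify_response(text, yesexpr=YESEXPR, noexpr=NOEXPR):
    """Return 'affirmative', 'negative' or 'unknown' for an input string."""
    if re.search(yesexpr, text):
        return "affirmative"
    if re.search(noexpr, text):
        return "negative"
    return "unknown"
```

Under either wording of the definition, a string such as "yes" would be an affirmative response here only because it matches the assumed expression.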
J-7) p.6, 3.1.18 negative response:
(the same comment as 3.1.17 affirmative)

J-8) p.7, 3.2.1 Format of syntax descriptions:
The text
    The format of each parameter is given by an escape sequence as follows:
        %s specifies a string
        %d specifies an decimal integer
        %c specifies a character
        %o specifies an octal integer
        %x specifies a hexadecimal integer
        %% specifies a single %
        \n specifies an end-of-line
    All other characters in the format string represent themselves.
should be changed to
    The format of each parameter is given by an escape sequence as follows:
        %s specifies a string
        %d specifies an decimal integer
        %c specifies a character
        %o specifies an octal integer
        %x specifies a hexadecimal integer
    All other characters in the format string except
        %% specifies a single %
        \n specifies an end-of-line
    represent themselves.

J-9) p.7, 3.2.3 Ellipses:
The definitions here are not consistent with their expression in 5.1 Character set description file (pp.45-46). The text here should be changed so as to match with POSIX and the explanation in 5.2 should be removed.

J-10) p.8, 4. FDCC-set:
In the sentence
    This standard defines a normative FDCC-set named "i18n" with values for each of the above categories.
the word "normative" is redundant. It should be removed.

J-11) p.9, 4.1 FDCC-set Definition, para."The category body ...":
The restriction
    Each keyword within a FDCC-set shall have a unique name (i.e., two categories cannot have a commonly-named keyword);
should be removed because it loads a heavy burden on designing each category -- even in this draft, the keyword "copy" is defined in more than two categories.

J-12) p.9, 4.1 FDCC-set Definition:
The subclauses 4.1.0.1 - 4.1.0.5 are ill-structured because they lack a direct superior subclause 4.1.0. The content of 4.1.0.5 should be moved before 4.1.0.1 without being put into a subclause and a new subclause title "4.1.0 Pre-category lines" should be introduced before 4.1.0.1.
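
As an illustration of the escape sequences listed in J-8, the sketch below turns a syntax description such as "order_start %s;%s\n" into a regular expression and applies it to an input line. The function name format_to_regex, the choice of character classes for each conversion (for example, treating %s as a run of non-blank characters) and the sample strings are all assumptions made for this example; the CD does not define such a procedure.

```python
import re

# Assumed regular-expression fragment for each conversion in subclause 3.2.1.
CONVERSIONS = {
    "s": r"(\S+?)",           # a string (assumed: no embedded white space)
    "d": r"([0-9]+)",         # a decimal integer
    "c": r"(.)",              # a single character
    "o": r"([0-7]+)",         # an octal integer
    "x": r"([0-9A-Fa-f]+)",   # a hexadecimal integer
}


def format_to_regex(fmt):
    """Compile a syntax-description format string into a regex."""
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        nxt = fmt[i + 1] if i + 1 < len(fmt) else ""
        if ch == "%" and nxt == "%":
            parts.append("%")              # %% stands for a single %
            i += 2
        elif ch == "%" and nxt in CONVERSIONS:
            parts.append(CONVERSIONS[nxt])
            i += 2
        elif ch == "%" and nxt:
            raise ValueError("unknown conversion: %" + nxt)
        elif ch == "\\" and nxt == "n":
            parts.append(r"\n")            # \n stands for an end-of-line
            i += 2
        else:
            parts.append(re.escape(ch))    # every other character is literal
            i += 1
    return re.compile("".join(parts))
```

A possible use: format_to_regex(r"order_start %s;%s\n").fullmatch("order_start forward;backward\n") yields a match whose two groups hold the two directives.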
J-13) p.9-10, 4.1.0.3 repertoiremap:
Make clear how many repertoiremap specifications are allowed in a FDCC-set.

J-14) p.10, 4.1.0.4 charmap:
The sentence
    For the actual use of a FDCC-set, at most one charmap may be in use, and this may be different from any charmap specified with the "charmap" line.
needs more explanation.

J-15) p.11, 4.1.0.5 Character representation:
Add a new rule for UCS-notation, and , which looks like symbolic names but not defined in a charmap file.

J-16) p.10, 4.1.0.5 Character representation:
The text
    Individual characters, characters in strings, and collating elements shall be represented using symbolic names, as defined below. In addition, characters can be represented using the characters themselves, or as octal, hexadecimal, or decimal constants. When nonsymbolic notation is used, the resultant FDCC-set definitions need not be portable between systems. The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character. The following rules apply to character representation:
    (1) ...
is confusing. It should be changed to
    Individual characters, characters in strings, and collating elements shall be represented using symbolic names, UCS notation or characters themselves, or as octal, hexadecimal, or decimal constants as defined below. When constant notation is used, the resultant FDCC-set definitions need not be portable between systems.
    (0) The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character.
    (1) ...

J-17) p.11, 4.1.0.5 Character representation, (1):
The sentence
    The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in a charmap file to be used, and shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file.
should be changed to
    The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in charmap files or repertoire map files to be used, and shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file or a value associated to UCS in repertoire map files.

J-18) p.11, 4.1.0.5 Character representation, (3)-(5):
It is confusing to include concatenated constants in each example without any definition. The concatenated constants should be removed from the examples of (3)-(5) and the explanation for concatenated constants should be formed as a new rule as follows:
    (6) Multibyte characters can be represented by concatenated constants specified in byte order with the last constant specifying the least significant byte of the character. Concatenated constants can include a mix of the above character representations.

J-19) p.11, 4.1.0.5 Character representation, end:
The "Editor's note" here makes no sense. It should be removed.

J-20) p.12, 4.1.1.1 Basic keywords
The specification of digit
    digit Define the characters to be classified as numeric digit. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 shall be specified, and in ascending sequence by numerical value. If this keyword is not specified, the digits 0 through 9, shall automatically belong to this class, with application-defined character value.
is ambiguous. Make clear the relation between the digit 0-9 and in table 3 portable character

J-21) p.14-16, 4.1.1.2 Character string transformation:
The concept of character string transformation is not mature yet. It is of no use to specify only one transformation without any specific meaning. It should be removed.

J-22) p.16, 4.1.1.2 Character string transformation:
The "i18n" FDCC-set is not a matter of 4.1.1.2 Character string transformation. A new subclause title "4.1.1.3 i18n-LC_CTYPE" should be added just before the line beginning with "The "i18n" FDCC-set for the LC_CTYPE ...".
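
The rule (6) proposed in J-18 can be sketched as follows. This is only an illustration under stated assumptions: the escape character is taken to be "/", and the constant forms "/d" plus decimal digits, "/x" plus two hexadecimal digits and "/" plus octal digits follow the POSIX localedef convention rather than any text quoted in this ballot; the name parse_concatenated is hypothetical.

```python
import re

ESCAPE = "/"  # assumed escape character

# One constant: /dNNN (decimal), /xHH (hexadecimal) or /OOO (octal).
_CONSTANT = re.compile(
    re.escape(ESCAPE) + r"(?:d([0-9]{2,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))"
)


def parse_concatenated(text):
    """Convert concatenated constants into the bytes of one character.

    Constants are read in byte order, so the last constant supplies the
    least significant byte, as in the proposed rule (6).
    """
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = _CONSTANT.match(text, pos)
        if m is None:
            raise ValueError("not a constant at offset %d" % pos)
        dec, hexa, octal = m.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 0xFF:
            raise ValueError("constant does not fit in one byte")
        out.append(value)
        pos = m.end()
    return bytes(out)
```

Because representations may be mixed, "/x81/xA1" and "/d129/241" describe the same two-byte value in this sketch.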
J-23) p.16, "i18n" FDCC-set
IPA characters should be removed from "toupper" and "alpha"
Presentation form characters should be removed from "alpha"

J-24) p.23, 4.1.2 LC_COLLATE:
The capabilities text
    (9) Easy reordering of characters. The "i18n" FDCC-set has a collation specification that with just a few modifications can be culturally correct for a specific culture. Here the "reorder-after" keyword gives a convenient way to modify a FDCC-set.
    (10) Easy reordering of scripts. The "i18n" FDCC-set gives an ordering of the scripts that may not be culturally acceptable in certain cultures. The keyword "reorder-script-after" gives a convenient way to modify the order of scripts in a FDCC-set.
should be changed to
    (9) Easy reordering of characters. ISO/IEC 14651 has a template for collation specification that with just a few modifications can be culturally correct for a specific culture. Here the "reorder-after" keyword gives a convenient way to modify a FDCC-set template.
    (10) Easy reordering of scripts. The template in ISO/IEC 14651 gives an ordering of the scripts that may not be culturally acceptable in certain cultures. The keyword "reorder-script-after" gives a convenient way to modify the order of scripts in a FDCC-set template.

J-25) p.24, 4.1.2 LC_COLLATE:
Add summaries of toggling keywords -- "define", "ifdef" etc. -- just before 4.2.2.1.

J-26) p.25, 4.1.2.4 "order_start" keyword:
The text here becomes very confusing by mixing and . Text should be changed based on two syntax forms
    "order_start %s;%s;...;%s\n", , ...
and
    "order_start %s;%s;...;%s\n", , , ...

J-27) p.25, 4.1.2.4, "order_start" keywords, directives "forward" and "backward":
Give a definition of "substring" and add a sentence
    The direction of scanning substrings is towards the logical end of the string.
to the explanation of the directives "forward" and "backward".

note: related discussion in the past.

> > Accepted in principle. The "forward" directive has wordings of
> > scanning towards the logical end, while the "backward" directive
> > scans towards the beginning of the string. Or was something else
> > meant, such as scanning for collating elements?
>
> Let's consider an example string "ABC123GHI" where A..Z are "backward"
> and 1..9 are "forward". In this case, three substrings "ABC", "123", and
> "GHI" produce the keys "CBA", "123", and "IHG" respectively from the
> current specification but there is no explanation how to combine those
> subkeys.
>
> The sentence
>     The direction of scanning substrings is towards the logical
>     end of the string.
> assures that those subkeys are combined in "forward" manner resulting in
> "CBA123IHG".

There is no prescribed resulting string as per the specifications; this is an implementation detail.

------end of related discussion----

J-28) p.32, 4.1.2.13.3 "elif" keyword:
The definition is incomplete -- the effect of preceding block is not considered.

J-29) p.33, 4.1.2.14 "i18n" LC_COLLATE category:
The sentence
    The "i18n" FDCC-set LC_COLLATE category is defined in ISO/IEC 14651 (12).
should be changed to
    There is no "i18n" FDCC-set LC_COLLATE category. Instead of the default ordering, the common template for tailoring defined in ISO/IEC 14651 (12) should be used.

note: related discussion in the past.

> > Rejected. The "i18n" FDCC-set will be using the data of IS 14651.
> > When referring to the "i18n" FDCC-set sorting needs to be defined.
>
> At the Quebec meeting, we agreed not to define a "default" ordering
> in IS 14651.

Yes, but that does not influence 14652 in the way you indicate. One can always modify the i18n fdcc-set for a given culture.
------end of related discussion-----

J-30) p.36-37, 4.1.3 LC_MONETARY and 4.1.4 LC_NUMERIC:
The default values for "mon_decimal_point" and "decimal_point" should be changed from ( = ',') to ( = '.') according to the standard
    ISO 6093:1985 Information processing - Representation of numerical values in character strings for information interchange
which should be added into 3. Normative references.

J-31) p.37, 4.1.5 LC_TIME
Following old comments are still open:
- Need to have a specification method of date and time convention for lunar calendars in which the year is not 365 days.
- Need to have a specification method of date and time convention in which the week is not seven days. (historical Buddhist calendar)

J-31) p.40, 4.1.5.2 Modified Field Descriptors:
The value
    d_t_fmt "<%><%><%>" -- 2 1997-10-07 10:00:01
should be changed to
    d_t_fmt "<%><(><%><)><%>" -- 1997-10-07(2) 10:00:01

J-32) p.41, 5. CHARMAP:
The sentence
    Conforming charmaps shall support the portable character set specified in Table 3
is ambiguous. It should be changed to
    Each charmap shall support the portable character set specified in Table 3
or
    A set of charmaps for a FDCC-set shall support the portable character set specified in Table 3

J-33) p.39, Table 2:
The discussion did agree as follows:

> > > 2. The LC_TIME %f format should return "1" for the first day of the
> > > week, etc, and "7" for the 7th day of the week. Returning a string
> > > with a "0" for the first day of the week is misleading, and this is
> > > not used for indexing in arrays, but for display in strings.
> >
> > Accepted.
> > The newly introduced escape sequence
> >     %f Weekday as a decimal number (0(Monday)-6).
> > modified to
> >     %f Weekday as a decimal number (the first day 1 - the last day 7).

However, this document is:
> %f Weekday as a decimal number (1(Monday) - 7)
Is this agreeable with POSIX?

-------end of the comment on the past discussion-------

J-34) p.46 5.1 Character set description File, para. "The encoding part...":
If this paragraph remains (we have requested to remove the explanation of character representation in 5.1 including this paragraph already), the sentence
    In a portable charmap file, each constant shall represent an 8 bit byte
should be removed because the concept of "a portable character set file" is not defined in this draft ---- only a portable character set is defined.

J-35) p48-75, "i18nrep" repertoire
It is not necessary to define such a confusing symbol name set in an international standard.

J-36) Conclude an open discussion below.

--------Start of open discussion B-------

>> 4. What does "byte" mean in this standard? Since this standard
>> does not require any "processor", an individually addressable
>> unit of storage does mean nothing.
>
>The standard is meant for processing in IT environments,
>so there is always a processor behind it somewhere.

Since the document you are writing is "Standard", please do not hide anything behind the preparations of the standard text. If a "processor" exists, you should describe the processor and conformance of the "conforming implementation of the processor" in the conformance section. Also, you need to change the scope and title of the standard. It is not a "specification method", but a "language".

Also, please do not specify something that is helpful for something as a requirement. You should specify what are mandatory requirements for conformance, and what are allowable extensions of the standard. Please use "may" for what are allowable extensions.

I suppose the conformance clause of your standard could be POSIX or C like things, if you would like to specify the syntax of a "cultural convention language and its compiler". Please note that I do not say "I agree with the scope change", but just say please write the standard text appropriately for the scope. Otherwise, reviewers may be confused.

> I am not sure we need to describe a processor for this,
> but we can discuss it at the next WG20 meeting.
I believe that you are an expert in the C language, right? Please carefully read the conformance section of the C language standard. The C standard specifies 2 different kinds of conformance. One is a conforming "processor", and the other is a conforming "application". As you may know well, an "Implementation" of the C language standard is a language "processor" of the C language, I mean a compiler. The C language standard then specifies how the conforming processor shall behave. In addition to that, the C language standard specifies how a conforming C application shall be written. That is application conformance.

My point is what is the purpose of your standard. If you would like to specify how a cultural convention set shall be specified, I mean application conformance, the phenomenon that conforms to your standard is just a description of cultural conventions, maybe written on paper. Then you do not need to care about a "processor". But if you would like to specify limitations and/or parameters for a cultural convention set description file "processing" system, then the preparation of your standard will look like a language standard, and you should specify both implementation conformance and application conformance.

O.K., you should be familiar with POSIX. The main portion of your standard comes from the POSIX.2 localedef utility. POSIX also specifies implementation conformance of the localedef utility itself, and application conformance for the localedef file. The limitation on maximum bytes is for the localedef utility.

> > Also, please do not specify something that is helpful for something
> > as requirement. You should specify what are mandatory requirements
> > for conformance, and what are allowable extensions of the standard.
> > Please use "may" for what are allowable extensions.
>
> The thing in question was inherited from POSIX.
> We need at least to maintain it for POSIX compatibility.

Do not be afraid. You can simply say that implementation may extend its syntax and specify something something.
Then a POSIX conforming localedef file becomes conforming to your standard. Please note that the objective of POSIX compatibility is to make a POSIX conforming localedef file conform to your standard, not to make files conforming to your standard into POSIX conforming localedef files.

> I am not great expert in writing conformance clauses, but I
> hope I am learning. I at this time do not see the big difference
> between a programming language and a specification method.
> The specification method is to be interpreted by some IT system, just
> like a programming language.

Keld, please please do not say "I'm not expert in writing some part of standard". If you say so, I need to say you should not be project editor. A project editor needs to have enough capability of writing standard text, even if he is not an expert on the subject technology area. It is a credibility problem for our WG20. Thus, I need to come back from open discussion on the sc22wg20 mailing list to personal mail.

Anyway, you should try. Without having your new draft, we can not discuss it in the next WG20 meeting. Then we can not send new text to SC22 for CD ballot. In order to send new text to SC22 immediately after the next WG20 meeting, all of the issues should be resolved in the next WG20 meeting, and a revised version needs to be prepared in the meeting.
-------end of open discussion B-------

-- Minor editorial --

J-36) p.9, 4.1.0.2 escape_char:
The sentence
  All examples this standard uses "/" as the escape character, except where otherwise noted.
should be changed to
  All examples in this standard uses "/" as the escape character, except where otherwise noted.

J-37) p.11, 4.1.1 LC_CTYPE:
"in clause 3.2.5" should be changed to "subclause 3.2.3".

J-38) p.13, Table 1:
The line "In Can also belong to" should be removed.

J-39) p24, 4.2.2.1 "script" keyword:
"4.2.2.1" should be changed to "4.1.2.1".
J-40) p24, 4.2.2.3 "collating-symbol" keyword:
"4.2.2.3" should be changed to "4.1.2.3"

_______________ end of Japan comments ___________________________________

______ beginning of Netherlands comments accompanying negative vote ___

The NNI votes NO on CD 14652 in SC22N2504. These no votes pertain to both the registration vote and the document vote.

The NNI will vote yes on the CD registration when the comments under -1- and -2- have been properly resolved. The NNI will vote yes on the CD when the comments under -1-, -2- and -3- have been properly resolved.

The NNI has the following comments:

-1- Market relevance

The way this specification has been phrased effectively limits the use of this specification to POSIX/Unix and C platforms. This market is rather small; much larger markets and existing notations for a Cultural Conventions Specification seem to have been ignored. Even in this small market this document addresses only a minor part of 9945-2 and is understood to provide a small improvement to that specification. It is unclear whether the POSIX market will accept such slight improvements to this 9945-2 standard in a separate document.

The NNI is of the opinion that WG20 has, and should have, a much broader scope than the POSIX platform and considers such a specification of limited applicability unacceptable.

The NNI suggests the following course of actions:

(a) WG20 is requested to develop a Platform and Language Independent Specification (PLIS) for a Cultural Conventions Specification (CCS-PLIS). This CCS-PLIS describes the functionality needed for a CCS without any reference to windows, files, programming languages and other implementation issues.

(b) WG20 is requested to provide implementations of this CCS-PLIS for major platforms, amongst which the Wintel, the Macintosh, the POSIX and mainframe platforms.

(c) The CCS-PLIS for POSIX is to be developed in cooperation with WG15.
It should be noted that the CCS-PLIS should be defined in such a way that for each of the platforms mentioned above conformance clauses with respect to the CCS-PLIS can be specified.

-2- Relation to Framework document

The relation between this document and the Cultural Dependent Items formulated in the Framework Document is unclear. The Framework Document mentions that WG20 will deliver, amongst others, specifications for the following cultural dependent items: hyphenation of words, word representations of numbers, writing directions, voice messages and postal addressing formatting. The NNI had expected that the now presented 14652 document would contain such specifications.

The NNI requests the following information from WG20:
- will these additional items be added to 14652 in the (near) future?
- if so, what will be the life expectancy of the current 14652 document?

-3- Technical comments

(a) The lexical and syntactic structure of the files has been specified incompletely. The document cannot be understood without knowledge of 9945-2. The document mixes lexical/syntactical structure and semantics of the specification. It is requested that a complete syntactical definition is given using EBNF (ISO 14977, or a variant thereof) and that a clear separation between lexical structure, syntactical structure and semantics will be maintained in the document.

(b) The definitions as given in section 3 are unclear, incomplete in some cases, over-complete in other cases and self-contradictory in a few cases. It is requested that this section is redeveloped, preferably in an axiomatic style. The document itself contains much terminology that has not been defined in section 3.

(c) The document seems to mix-up the concepts of `value' and `constant'.

(d) The numbering system used is highly inconsistent:
    There are two sections 4.1.2.13.5 and two sections 4.1.2.13.6
    After 4.1 follows 4.1.0.1
WG20 is requested to debug their documents before presenting them to the NBs.
(e) The relationship between this document and CD 14651 is unclear: is it for instance possible that a system conforms to 14651 and not to 14652? The relationship needs to be explained.

__________________ end of Netherlands comments __________________________

_____ beginning of USA comments accompanying negative vote ____________

The US National Body votes to Disapprove the CD Registration and the CD Ballot for ISO/IEC CD 14652. See comments below:

General Comments

Re 4.1.1 LC_CTYPE

While the presence of the LC_CTYPE specification in CD 14652 is understandable, given the fact that CD 14652 is derivative from ISO/IEC 9945-2 (itself derivative from the XPG-4 specification of locale), it is inappropriate to extend the LC_CTYPE mechanism for dealing with character properties to cover the repertoire of ISO/IEC 10646.

ISO/IEC 10646 specifies the *Universal Character Set*, and in the context of the Universal Character Set, character properties of the type that LC_CTYPE is concerned with are best treated as inherent to the characters. It would be correct to enumerate these properties in a standard (perhaps even in 14652, if not 10646 itself), but it is incorrect to imply, through the general FDCC-set syntax spelled out in 14652, that it is o.k. to redefine any of these properties in an FDCC-set definition, the same way that LC_MONETARY or LC_NUMERIC entries can be tailored for local cultural conventions.

Character properties are *not* subject to local cultural conventions. It is *not* acceptable to redefine GREEK SMALL LETTER TAU to be uppercase, or to define CIRCLED DIGIT SIX to be punctuation, for example. Such definitions do not belong in specifications for *cultural conventions*, or if character properties must be defined there, they should at least be clearly earmarked as different from all other categories of an FDCC-set.

The one obvious exception to this generality is case-mapping.
Case-mapping relations do vary by language (with well-known examples for Turkish, French, and German). The specification of the LC_CTYPE "properties" and should be clearly marked as exceptional in this way. CD 14652 should give the default case-mapping values for the "i18n" FDCC-set, as shown, and then specify that these particular values should be redefined or overridden to obtain correct cultural specification for case-mapping for Turkish, for French, or whatever.

******
Re 3.1.6 FDCC-set

The introduction of this new term seems unnecessary. The concepts presented in CD 14652 are so closely modeled on the XPG-4 notion of "locale" (except for the attempt to extend the character set coverage to 10646 and expand the concept of LC_COLLATE), that the new term obscures rather than clarifies what 14652 is about. Retention of the term "locale" or perhaps an adjectivally modified version of the term "locale" ("extended locale" ?) would be preferable.

******
Re 3.2.3 Ellipses

The introduction of distinctions between two-dot, three-dot, and four-dot ellipses seems overly complex and subject to error in use. Furthermore, the explanations, both on pages 8 and 41ff are confusing. If such distinctions between range notations must be maintained, they should be better described, with clearer examples.

Also, it is generally better practice to simply have a single range notation for a formal syntax, while maintaining clear syntactic differentiation of the elements which can form the items at each end of a range. So if the FDCC-set syntax must distinguish a range of symbols, a range of decimal values, a range of octal values, a range of hexadecimal values, and so on, the notation for "symbol", "decimal value", "octal value", "hexadecimal value", and so on should be unique and mutually exclusive, so that interpretation of the type of range does not depend on the number of dots.
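
As an illustration only (this is not part of the US comment or of CD 14652), the single-range-notation idea can be sketched in Python. The endpoint spellings used here ("<U0041>" for a symbol, "/x41" for hexadecimal, "/d65" for decimal, "/o101" for octal) and the names parse_endpoint and parse_range are assumptions made for the sketch. The point it shows is that one ".." operator suffices when each kind of endpoint has its own mutually exclusive spelling, so the kind of range is read from the endpoints rather than from the number of dots.

```python
import re

# Hypothetical endpoint notations (assumptions for illustration, not taken
# from CD 14652). Each kind has a distinct prefix, so no text can match two.
ENDPOINT_KINDS = [
    ("symbol", re.compile(r"<U([0-9A-F]{4})>"), 16),
    ("hex", re.compile(r"/x([0-9A-Fa-f]+)"), 16),
    ("decimal", re.compile(r"/d([0-9]+)"), 10),
    ("octal", re.compile(r"/o([0-7]+)"), 8),
]


def parse_endpoint(text):
    """Return (kind, value) for one range endpoint, or raise ValueError."""
    for kind, pattern, base in ENDPOINT_KINDS:
        match = pattern.fullmatch(text)
        if match:
            return kind, int(match.group(1), base)
    raise ValueError(f"unrecognised endpoint: {text!r}")


def parse_range(text):
    """Parse 'first..last' using a single two-dot range operator."""
    first, separator, last = text.partition("..")
    if not separator:
        raise ValueError(f"not a range: {text!r}")
    first_kind, low = parse_endpoint(first)
    last_kind, high = parse_endpoint(last)
    # The range type comes from the endpoints, never from the dot count.
    if first_kind != last_kind:
        raise ValueError(f"mixed endpoint kinds in {text!r}")
    if low > high:
        raise ValueError(f"descending range: {text!r}")
    return first_kind, low, high
```

With this sketch, parse_range("<U0041>..<U005A>") yields a symbol range and parse_range("/d65../d90") a decimal range, while a three-dot form or a range mixing a symbol with a decimal value is rejected with ValueError.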
******
Re 4.1.2 LC_COLLATE

The syntax introduced for tailoring a collation sequence definition for cultural conventions is overly complex. It is very tightly coupled to the specific way in which a collation is defined in CD 14651, which itself is in question. A much simpler syntax has been promulgated by the Java developers to accomplish the same task, and it would be desirable to examine the alternatives before standardizing an LC_COLLATE syntax of unnecessary complexity.

Unlike most of the rest of the categories involved in an FDCC-set definition, which merely specify lists of things, the LC_COLLATE syntax introduces notions of scope, reordering, and a macro control language. Granted that reordering rules are needed for defining collations, it is unclear that all of the rest of the syntax is.

Re B.1.2 LC_COLLATE Rationale

This states "The syntax for the LC_COLLATE category source is the result of a cooperative effort between representatives for many countries and organizations working with international issues, such as UniForum, X/Open, and ISO,..." We believe that this intentionally overstates the degree of cooperative effort involved and omits the fact that there is a serious lack of consensus in the international community, both about how to define the international string ordering and how to specify a syntax for tailoring it. Major implementors of international string ordering based on 10646 disagree with the approach taken in these drafts, and the standard should not paper over those differences with misleading implications that everyone agrees about how to do it.

p. 74. In the rationale for LC_COLLATE, there is an estimation made that the standard covers the requirements for European languages, and that it will extend well to cover Cyrillic and Middle Eastern scripts (see below for editorial comment), and for the level 3 collation required for Chinese and Japanese.
However, the standard will fail for dealing with scripts (such as Thai and Lao) that require *reordering* of characters within a string before calculating weights. That fact should be noted.

Furthermore, the standard deliberately ignores the role of combining marks in collation. Implementation of 10646 with combining marks is not well-guided by this standard. It is quite unclear how to modify an LC_COLLATE definition to take combining marks into account. If combining marks are out-of-scope for CD 14652, this should be clearly stated and be consistently carried through. If they are not out-of-scope, then the tailoring syntax for LC_COLLATE should either account for them, or CD 14652 should state clearly what the alternative approaches involving tailoring of CHARMAP or REPERTOIREMAP could be, and how they would be implemented, *with specific examples*.

================================================================

Specific Technical Comments

pp. 12 & 26: and

It is unclear from either the definitions of and on page 12, or from the specification of the "i18n" FDCC-set for LC_CTYPE why certain space characters from the 10646 repertoire are not listed:

  U+00A0 NO-BREAK SPACE
  U+2007 FIGURE SPACE
  U+FEFF ZERO WIDTH NO-BREAK SPACE

If having a property precludes a character from being included in the or types, that should be spelled out in the definition of those categories.

*********
pp. 16-20: Bugs in the and tables

In the toupper table, the entry (,) is incorrect and should be removed.
In the toupper table, (,) should be added.
In the toupper table, (,) should be added.
In the tolower table, the entry (,) has the items reversed. It should read (,).
In the tolower table, (,) should be added.

**********
pp. 20-21: specification

The list of characters for 10646 differs significantly from that implemented for the Alphabetic category for Java.
Insistence on maintaining a distinction, based on principled or unprincipled arguments about the alphabetic status of this or that character, will lead to implementation confusion between the Java community and those who implement based on locales derived from the "i18n" FDCC-set. Given the importance of Java, and the fact that it has already provided a widespread, commercially significant answer to the question of which 10646 characters are alphabetic, the category in CD 14652 (if included here at all; see general comments above) should be harmonized with the Java values.

A major defect in the list is the omission of combining characters from many scripts which clearly have the alphabetic property (e.g. the combining vowel matras from Indic scripts). Such omissions would result in nonsensical specifications of alphabetic spans in such scripts, if taken seriously.

To simplify correction of the CD 14652 text for the property, here is the suggested list, as implemented in Java (divided into Alphabetic and Ideographic). (Not all unassigned subranges within these ranges are separately called out, to make this list shorter.)

#Alphabetic
0041..005A LATIN CAPITAL LETTER A.. LATIN CAPITAL LETTER Z
0061..007A LATIN SMALL LETTER A.. LATIN SMALL LETTER Z
00AA FEMININE ORDINAL INDICATOR
00B5 MICRO SIGN
00BA MASCULINE ORDINAL INDICATOR
00C0..00D6 LATIN CAPITAL LETTER A WITH GRAVE.. LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6 LATIN CAPITAL LETTER O WITH STROKE.. LATIN SMALL LETTER O WITH DIAERESIS
00F8..02B8 LATIN SMALL LETTER O WITH STROKE.. MODIFIER LETTER SMALL Y
02BB..02C1 MODIFIER LETTER TURNED COMMA.. MODIFIER LETTER REVERSED GLOTTAL STOP
02E0..02E4 MODIFIER LETTER SMALL GAMMA.. MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
037A GREEK YPOGEGRAMMENI
0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
0388..0481 GREEK CAPITAL LETTER EPSILON WITH TONOS.. CYRILLIC SMALL LETTER KOPPA
0490..0559 CYRILLIC CAPITAL LETTER GHE WITH UPTURN..
ARMENIAN MODIFIER LETTER LEFT HALF RING
0561..0587 ARMENIAN SMALL LETTER AYB.. ARMENIAN SMALL LIGATURE ECH YIWN
05D0..05F2 HEBREW LETTER ALEF.. HEBREW LIGATURE YIDDISH DOUBLE YOD
0621..063A ARABIC LETTER HAMZA.. ARABIC LETTER GHAIN
0641..0652 ARABIC LETTER FEH.. ARABIC SUKUN
0670..06D3 ARABIC LETTER SUPERSCRIPT ALEF.. ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
06D5..06DC ARABIC LETTER AE.. ARABIC SMALL HIGH SEEN
06E1..06E8 ARABIC SMALL HIGH DOTLESS HEAD OF KHAH.. ARABIC SMALL HIGH NOON
06ED ARABIC SMALL LOW MEEM
0901..0939 DEVANAGARI SIGN CANDRABINDU.. DEVANAGARI LETTER HA
093D..094C DEVANAGARI SIGN AVAGRAHA.. DEVANAGARI VOWEL SIGN AU
0958..0963 DEVANAGARI LETTER QA.. DEVANAGARI VOWEL SIGN VOCALIC LL
0981..09B9 BENGALI SIGN CANDRABINDU.. BENGALI LETTER HA
09BE..09CC BENGALI VOWEL SIGN AA.. BENGALI VOWEL SIGN AU
09D7..09E3 BENGALI AU LENGTH MARK.. BENGALI VOWEL SIGN VOCALIC LL
09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL
09F1 BENGALI LETTER RA WITH LOWER DIAGONAL
0A02..0A39 GURMUKHI SIGN BINDI.. GURMUKHI LETTER HA
0A3E..0A4C GURMUKHI VOWEL SIGN AA.. GURMUKHI VOWEL SIGN AU
0A59..0A5E GURMUKHI LETTER KHHA.. GURMUKHI LETTER FA
0A70..0AB9 GURMUKHI TIPPI.. GUJARATI LETTER HA
0ABD..0ACC GUJARATI SIGN AVAGRAHA.. GUJARATI VOWEL SIGN AU
0AE0 GUJARATI LETTER VOCALIC RR
0B01..0B39 ORIYA SIGN CANDRABINDU.. ORIYA LETTER HA
0B3D..0B4C ORIYA SIGN AVAGRAHA.. ORIYA VOWEL SIGN AU
0B56..0B61 ORIYA AI LENGTH MARK.. ORIYA LETTER VOCALIC LL
0B82..0BCC TAMIL SIGN ANUSVARA.. TAMIL VOWEL SIGN AU
0BD7 TAMIL AU LENGTH MARK
0C01..0C4C TELUGU SIGN CANDRABINDU.. TELUGU VOWEL SIGN AU
0C55..0C61 TELUGU LENGTH MARK.. TELUGU LETTER VOCALIC LL
0C82..0CCC KANNADA SIGN ANUSVARA.. KANNADA VOWEL SIGN AU
0CD5..0CE1 KANNADA LENGTH MARK.. KANNADA LETTER VOCALIC LL
0D02..0D4C MALAYALAM SIGN ANUSVARA.. MALAYALAM VOWEL SIGN AU
0D57..0D61 MALAYALAM AU LENGTH MARK.. MALAYALAM LETTER VOCALIC LL
0E01..0E2E THAI CHARACTER KO KAI.. THAI CHARACTER HO NOKHUK
0E30..0E3A THAI CHARACTER SARA A.. THAI CHARACTER PHINTHU
0E40..0E45 THAI CHARACTER SARA E.. THAI CHARACTER LAKKHANGYAO
0E47 THAI CHARACTER MAITAIKHU
0E4D THAI CHARACTER NIKHAHIT
0E81..0EAE LAO LETTER KO.. LAO LETTER HO TAM
0EB0..0EC4 LAO VOWEL SIGN A.. LAO VOWEL SIGN AI
0ECD LAO NIGGAHITA
0EDC LAO HO NO
0EDD LAO HO MO
0F40..0F81 TIBETAN LETTER KA.. TIBETAN VOWEL SIGN REVERSED II
0F90..10F6 TIBETAN SUBJOINED LETTER KA.. GEORGIAN LETTER FI
1100..1FBC HANGUL CHOSEONG KIYEOK.. GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FBE GREEK PROSGEGRAMMENI
1FC2..1FCC GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI.. GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FD0..1FDB GREEK SMALL LETTER IOTA WITH VRACHY.. GREEK CAPITAL LETTER IOTA WITH OXIA
1FE0..1FEC GREEK SMALL LETTER UPSILON WITH VRACHY.. GREEK CAPITAL LETTER RHO WITH DASIA
1FF2..1FFC GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI.. GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
207F SUPERSCRIPT LATIN SMALL LETTER N
2102 DOUBLE-STRUCK CAPITAL C
2107 EULER CONSTANT
210A..2113 SCRIPT SMALL G.. SCRIPT SMALL L
2115 DOUBLE-STRUCK CAPITAL N
2118..211D SCRIPT CAPITAL P.. DOUBLE-STRUCK CAPITAL R
2124 DOUBLE-STRUCK CAPITAL Z
2126 OHM SIGN
2128 BLACK-LETTER CAPITAL Z
212A..212D KELVIN SIGN.. BLACK-LETTER CAPITAL C
212F..2131 SCRIPT SMALL E.. SCRIPT CAPITAL F
2133..2138 SCRIPT CAPITAL M.. DALET SYMBOL
2160..2182 ROMAN NUMERAL ONE.. ROMAN NUMERAL TEN THOUSAND
3041..3094 HIRAGANA LETTER SMALL A.. HIRAGANA LETTER VU
30A1..30FA KATAKANA LETTER SMALL A.. KATAKANA LETTER VO
3105..318E BOPOMOFO LETTER B.. HANGUL LETTER ARAEAE
AC00..D7A3 ..
FB00..FB17 LATIN SMALL LIGATURE FF.. ARMENIAN SMALL LIGATURE MEN XEH
FB1F..FB28 HEBREW LIGATURE YIDDISH YOD YOD PATAH.. HEBREW LETTER WIDE TAV
FB2A..FD3D HEBREW LETTER SHIN WITH SHIN DOT.. ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM
FD50..FDFB ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM.. ARABIC LIGATURE JALLAJALALOUHOU
FE70..FEFC ARABIC FATHATAN ISOLATED FORM..
ARABIC LIGATURE LAM WITH ALEF FINAL FORM
FF21..FF3A FULLWIDTH LATIN CAPITAL LETTER A.. FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A FULLWIDTH LATIN SMALL LETTER A.. FULLWIDTH LATIN SMALL LETTER Z
FF66..FF6F HALFWIDTH KATAKANA LETTER WO.. HALFWIDTH KATAKANA LETTER SMALL TU
FF71..FF9D HALFWIDTH KATAKANA LETTER A.. HALFWIDTH KATAKANA LETTER N
FFA0..FFDC HALFWIDTH HANGUL FILLER.. HALFWIDTH HANGUL LETTER I

#Ideographic
3007 IDEOGRAPHIC NUMBER ZERO
3021..3029 HANGZHOU NUMERAL ONE.. HANGZHOU NUMERAL NINE
4E00..9FA5 ..
F900..FA2D ..

**********
pp. 43-70: "i18nrep" repertoire file

This list is arbitrarily chosen, and the principles for characters in it are unstated. If the repertoire file is not going to correspond to one of the named and numbered subsets of ISO/IEC 10646 (and Subset 300, the BMP, would be the obvious choice), then the choice of characters in the repertoire file *must* be justified in 14652.

On inspection, it is clear that many combining characters from 10646 have been omitted, but this is not done systematically or consistently. For example, combining characters for U+064B ARABIC FATHATAN .. U+0652 ARABIC SUKUN *are* included. But if so, why not GERESH, etc., for Hebrew?

On pp. 68-69, the C0 controls are duplicated in this list. They appeared already (on page 43), with different mnemonics. This calls into question the meaning of the REPERTOIREMAP file. Are duplications of characters allowed, in which case the REPERTOIREMAP file is really a definition of the mnemonics by which characters can be referred to (e.g. and ), or is it intended to be a listing of the characters in a repertoire, in which case no duplications should be allowed?

If the intention is actually to define a repertoire, then the C1 control functions defined on page 69 should be omitted. These are not specified by 10646 at all, and it is dangerous in 14652 to try to override the function of other standards which specify the usage of C1 controls.
If the intention is, rather, to just define a bunch of short mnemonics, then most of this entire listing is useless and should be omitted. Introducing mnemonics such as for GREEK SMALL LETTER XI and for CYRILLIC SMALL LETTER ZHE and for HEBREW LETTER FINAL KAF is completely confusing. A very small percentage of these mnemonics has seen widespread use in plaintext reference to accented characters. The rest should be completely abandoned in CD 14652 in favor of use of the hexadecimal value as the unique symbolic identifier for a 10646 character (e.g. ).

The pejorative and inaccurate note "(not a real character)" should be dropped from the listing of combining characters on pp. 69-70. Furthermore, it is completely unexplained why most of these are given user-defined character values when they are actually encoded characters in 10646. E.g.

  <"'> NON-SPACING ACUTE ACCENT (not a real character)

must be amended to:

  COMBINING ACUTE ACCENT

with the correct 10646 encoding and character name.

Additional technical comments

This proposal is not ready for prime time. We must coordinate this standardization effort with Java and Win32 internationalization. We need to treat XPG4 as one of the contributing standards, not as the standard being extended.

A1. The mapping of Unicode character types to POSIX LC_CTYPE attributes should be specified, but doing this by using the XPG4 LC_CTYPE syntax is not appropriate. These character attributes are in general not culturally specific. The base POSIX character attributes are also missing a large number of attributes needed for parsing a larger character set.

A2. There are a small number of upper/lower case conversions which are locale dependent. Even in locales with such modifications (such as Turkey) it is still necessary to have universal upper/lower functions to be able to deal with matching of names (such as file names) which are processed simultaneously in multiple locales.

A3.
Cultural cases of differing case mapping should be defined as exceptions, rather than building up a complete upper/lower table. The existing POSIX locales have tended to incompleteness in the case mapping tables.

A4. Although the LC_COLLATE syntax is complex, it at least tries to address the problems of doing override collation from a base collation order. This is similar to what Java has done, but in this case the Java syntax is simpler than the 14652 proposal. If we limit the scope of what we expect this locale based sorting to do it is a usable compromise. Those people who need complex sorting including numeric ordering, conversion of numerics to names, and phonetic reordering should expect to use the locale as the basis for information but to significantly pre-process the data. For sorting file names a universal multi-script collation with overrides for various locales is good enough.

A5. The two letter mnemonics used in the i18nrep section are worthless. I think the best solution is to use the "meaningful" names for the basic Latin characters and punctuation, and the Unicode based names for other characters.

================================================================

Editorial Comments

pp. 64 ff. "IDEOGRAPHIC" is consistently misspelled in the character names. If this misspelling has not been caught, then all other character names should be carefully checked against 10646 to ensure that they are exactly correct.

Spelling errors:
p 11, 3rd paragraph "depreciated" --> "deprecated"
p. 42, 2nd paragraph, last line "an" --> "and"
p. 74 "with Slavic or Middle East character sets" should be corrected to "for Cyrillic or Middle Eastern scripts".

__________________ end of SC22 N2612 _________________________________