From rinehuls@access.digex.net Tue May 6 23:03:29 1997
Date: Tue, 6 May 1997 16:59:37 -0400 (EDT)
From: "william c. rinehuls"
Reply-To: "william c. rinehuls"
To: sc22docs@dkuug.dk
Subject: SC22 M2466 - Vote Summary of LB N2364 - CD 14651

____________________beginning of title page ___________________________

ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)

ISO/IEC JTC 1/SC22 N2466

May 1997

TITLE: Summary of Voting on CD Approval for CD 14651 - Information technology - International String Ordering - Method for Comparing Character Strings and Description of a Default Tailorable Ordering

SOURCE: Secretariat, ISO/IEC JTC 1/SC22

WORK ITEM: JTC 1.22.30.02.02

STATUS: N/A

CROSS REFERENCE: SC22 N2364

DOCUMENT TYPE: Summary of Voting

ACTION: To SC22 Member Bodies for information. To WG20 for preparation of a Disposition of Comments Report and a recommendation on the further processing of the CD.

Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Tel: +1 (703) 912-9680
Fax: +1 (703) 912-2973
email: rinehuls@access.digex.net

________________end of title page; beginning of overall summary ________

SUMMARY OF VOTING ON

Letter Ballot Reference No: SC22 N2364
Circulated by: JTC 1/SC22
Circulation Date: 01-20-1997
Closing Date: 04-24-1997

SUBJECT: CD Approval for CD 14651 - Information technology - International String Ordering - Method for Comparing Character Strings and Description of a Default Tailorable Ordering

The following responses have been received on the subject of approval:

"P" Members supporting approval without comment    10
"P" Members supporting approval with comment        1
"P" Members not supporting approval                 4
"P" Members abstaining                              2
"P" Members not voting                              7

"O" Members supporting approval without comment     1
"O" Members not supporting approval                 1
"O" Members abstaining                              1

Secretariat Action:

The comment accompanying the abstention vote from Germany was: "There is no national WG11 (sic) rapporteur."

The comments accompanying the affirmative vote from Denmark; the comments accompanying the abstention vote from the United Kingdom; and the comments accompanying the negative votes from Austria, Israel, Japan, Netherlands, and USA are attached.

WG20 is requested to prepare a Disposition of Comments report and make a recommendation on the further processing of the CD.
_______________end of overall summary; beginning of detail summary ___

ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY

PROJECT NO: JTC 1.22.30.02.02

SUBJECT: CD approval for CD 14651 - Information technology - International String Ordering - Method for Comparing Character Strings and Description of a Default Tailorable Ordering

Reference Document No: N2364
Ballot Document No: N2364
Circulation Date: 01-20-1997
Closing Date: 04-24-1997
Circulated To: SC22 P, O, L
Circulated By: Secretariat

SUMMARY OF VOTING AND COMMENTS RECEIVED

                      Approve  Disapprove  Abstain  Comments  Not Voting

'P' Members
Australia               (X)       ( )        ( )      ( )        ( )
Austria                 ( )       (X)        ( )      (X)        ( )
Belgium                 ( )       ( )        ( )      ( )        (X)
Brazil                  ( )       ( )        ( )      ( )        (X)
Canada                  ( )       ( )        ( )      ( )        (X)
China                   ( )       ( )        ( )      ( )        (X)
Czech Republic          (X)       ( )        ( )      ( )        ( )
Denmark                 (X)       ( )        ( )      (X)        ( )
Egypt                   ( )       ( )        ( )      ( )        (X)
Finland                 (X)       ( )        ( )      ( )        ( )
France                  (X)       ( )        ( )      ( )        ( )
Germany                 ( )       ( )        (X)      (X)        ( )
Ireland                 ( )       ( )        ( )      ( )        (X)
Japan                   ( )       (X)        ( )      (X)        ( )
Netherlands             ( )       (X)        ( )      (X)        ( )
Norway                  (X)       ( )        ( )      ( )        ( )
Romania                 (X)       ( )        ( )      ( )        ( )
Russian Federation      (X)       ( )        ( )      ( )        ( )
Slovenia                (X)       ( )        ( )      ( )        ( )
Sweden                  ( )       ( )        ( )      ( )        (X)
Switzerland             (X)       ( )        ( )      ( )        ( )
UK                      ( )       ( )        (X)      (X)        ( )
Ukraine                 (X)       ( )        ( )      ( )        ( )
USA                     ( )       (X)        ( )      (X)        ( )

'O' Members
Argentina               ( )       ( )        ( )      ( )        ( )
Bulgaria                ( )       ( )        ( )      ( )        ( )
Cuba                    ( )       ( )        ( )      ( )        ( )
Greece                  ( )       ( )        ( )      ( )        ( )
Hungary                 ( )       ( )        ( )      ( )        ( )
Iceland                 ( )       ( )        ( )      ( )        ( )
India                   ( )       ( )        ( )      ( )        ( )
Indonesia               ( )       ( )        ( )      ( )        ( )
Israel                  ( )       (X)        ( )      (X)        ( )
Italy                   ( )       ( )        ( )      ( )        ( )
Korea Republic          (X)       ( )        ( )      ( )        ( )
New Zealand             ( )       ( )        ( )      ( )        ( )
Poland                  ( )       ( )        ( )      ( )        ( )
Portugal                ( )       ( )        (X)      ( )        ( )
Singapore               ( )       ( )        ( )      ( )        ( )
Thailand                ( )       ( )        ( )      ( )        ( )
Turkey                  ( )       ( )        ( )      ( )        ( )
Yugoslavia              ( )       ( )        ( )      ( )        ( )

____end of detailed summary; beginning of Danish Comments Accompanying Affirmative Vote__________________________

>From keld@dkuug.dk Tue Apr 29 15:57:22 1997

Here is the danish ballot on CD 14651:

Title:
Comments on CD 14651 - International String Ordering

Source: Danish Standards Association
Date: 1997-04-29
Reference: SC22 N2364

The Danish ballot is: Yes, with general and technical comments

The comments are directed at the English version of the text, although the same comments apply to the French text.

1. The overall technical content of CD 14652 is sound, and as agreed by the working group, and thus we can accept the document as a CD.

General comments:

2. There is too much emphasis on the "binary sorting string" concept. The concept of just comparing two strings should be catered for throughout the document. In some places the functionality can only be reached by sorting on binary prepared strings. Also there should be ample warnings in a number of places about the binary sorting string concept, as it is culturally dependent, that is, it depends on the sorting specification used to produce the binary representation. Storing data in the precompiled binary string representation should thus be recommended only for monocultural environments, and those are actually environments that we should advise against, having internationalization as our goal.

3. A formal description language, such as ISO 11404 or the IDL of ISO 13788 (PCTE), should be used in the specification of the APIs. The description of the APIs currently lacks a number of specifications, including the types of the parameters and how to bind to programming languages, that are inherent in the 11404 and 13788 specification languages. We are willing to help rewrite the API specifications in light of this comment.

4. We recommend that a thin binding method be used, as demonstrated in other API papers of WG20. We can provide text for this, in conjunction with text to address the problems mentioned in comment 3.

5. The APIs have 3 parameters that should not occur in the API, because all localisation should be done via the locale.
These are the parameters order_accents, order_case and sign_espace of the COMPCAR and CARABIN functions.

6. The LC_COLLATE specification in 14652 format should be readily usable and referenceable, without need for retailoring. The different options, as expressed by the values of the 3 parameters in our comment 5, should be available as different LC_COLLATE specifications, each with a well-defined name.

7. The definitions in section 3 should be numbered and not ordered alphabetically (in either English or French).

8. The definitions are too centered on a precompiled sorting string concept. Terminology should also be applicable to comparisons on the string encoding. Terms that should be usable with plain string comparisons include: equivalence, ordering key, ordering subkey.

9. The technical specifications should be aligned with 14652, especially the hexadecimal symbolic ellipses "..".

10. The names of the APIs should be less French-oriented.

11. The tables should use names established from the POSIX locale work, such as ISO/IEC 9945-2 annex G names or 14652 names from the repertoiremap, especially when not using names.

12. A number of scripts have not been ordered properly, such as hiragana, katakana and Thai.

13. A reversibility function from binary sort strings to character strings seems to be missing.

14. There are some spelling errors, and we suggest that a spell-checker be used for production of further documents.

Technical comments:

15. Page 5, first paragraph: It is not always required to transform, for example, "4" into a number of strings; sometimes it is only necessary to transform it into one string. Thus change "requires" to "may require" and "is hence" to "may thus be".

16. Page 5, last paragraph and following paragraphs: Too much emphasis on the precompiled sorted character string data type. This is not a general type, as noted in our comment 2.

17. Page 8: Add after "Scandinavian" "and several other".
This includes languages like Polish, Finnish, Hungarian, Turkish, and many others.

18. Page 14: "subprogramme" - rather use the word "function". All APIs in this standard are functions. All references to "subprogrammes" should be changed to "functions" in the standard.

19. Page 15, first paragraph: We recommend that only uppercase characters be used in hexadecimal numbers; this is also the specification in CD 14652.

20. Page 15, last paragraph: It seems to be a requirement that a LC_COLLATE specification, like the default, can be tailored on the fly. This is not recommendable, as it would take quite some processing time and thus delay the processing considerably. On-the-fly tailoring should thus not be a requirement.

21. Page 16, 5.1.1, last paragraph: Use the name of the API (COMPCAR) instead of the number "API 1".

22. Page 17, last paragraph: The names of the functions should be used for the binding. Of course the names of the functions may vary for the different programming languages, but the names are more than "only indicative".

23. Page 20: The COMPCAR function seems to lack a result value indicating whether the first string was lexicographically less than, equal to or greater than the second string. We propose the values -1, 0 and 1 for the three possibilities, in line with current C practice. Return values also seem to be missing for the other functions.

24. Page 21: It should not be normatively required that COMPCAR be equivalent to CARABIN and COMPBIN. CARABIN produces output that is not necessary for some uses of COMPCAR.

25. Page 21, last paragraph: It should not be prescribed that binary strings be used for comparisons in the COMPCAR function. Also, the "default" table mentioned here is the global locale, and not the 14651 default. This should be clarified, maybe using "global" instead of "default".

26. Page 22: All parameters should be spelled out, and references to other APIs when defining the parameters should be avoided.

27.
Page 25, second paragraph: The default table cannot be used per se, as it needs tailoring. See our comment 6 on how to solve this.

28. Page 27, first paragraph: This description is very oriented towards the binary sort string. Descriptions that are also valid for the COMPCAR method without binary sort strings should be present. We would request a separate description of how COMPCAR can be implemented, especially pointing out that in many cases only comparison of the first (few) characters is necessary, and that generating binary sort strings is typically not necessary.

29. Page 27, level 1: Some non-letters, for example Kana, may have more than one character at the first level.

30. Page 27, note of 5.3.2.1: Combining accents may have ignore at level 1, and then values at level 2. Should that not lead to full predictability?

31. Page 29, level one: Use the API names instead of "SUBPROGRAMME".

32. Page 29: What is the difference between level 2 and 4? In traditional locale invocation there is not that difference, but some other difference. Maybe level 4 should always be required.

33. Page 31: COLL_WEIGHT_MAX is not a directive of 14652.

34. Page 31: Some scripts are not (yet) in IS 10646, for example the Yi and Canadian syllabic scripts.

35. Page 31: We should ensure that comments are allowed in all the places where they are used here according to 14652, and possibly change 14652 to allow them.

36. Page 41-51: A number of the symbols defined here are also defined later. Example defined on page 46 and page 79. This is not allowed according to 14652 (giving a symbol two weights).

37. Page 111: (4) There needs to be a strong warning that stored binary strings cannot be used internationally for culturally correct sorting, as they are stored in a localized form. Or we should simply advise against it.

38. Page 112: The text seems obsolete, as these concepts have been proven.

39. Page 115: Also list ISO/IEC 9945-2 POSIX shell and utilities, especially annex G, as a source.

40.
Page 118, paragraph 7: There is only a need for 4 levels, not 5.

41. Page 118, paragraph 7: Is it necessary to have an extra level for 10646 conformance level 3? Maybe in some cases, but not generally. When sorting the combining characters per se, there is no need for a further level.

42. Page 119, paragraph 9: We thought this was proven not to be true. Or is this some implementation guideline (which then should be noted as such)?

43. Page 120: Annex I should be explained further, especially how it fits into the internationalization model.

_________________________end of Danish Comments _______________________

____________beginning of UK comments accompanying abstention vote ______

> N 2364 ISO/IEC CD 14651
>
> The UK ABSTAINS on this ballot, due to lack of participation in this area.
> The UK would however like to bring the following issues to the attention of
> SC 22 :
>
> - a tutorial on problems solved is inappropriate for an IS; either the
>   document should be a TR or the tutorial moved to an appendix.
>
> - the statement on page 10 about information being obtainable from
>   Alain LaBonte' is also inappropriate for a formal document.
>
>
> There are also a number of minor points:
>
> - there are a disturbingly high number of elementary typographical
>   errors (e.g. p 18 'starings' (strings); 'compariosn', 'aat'; also mixed
>   languages in chbin1, chbin2 heading). On page 19 there are French
>   quotation marks rather than English ones.
>
> - p 25 there is a reference to section 5.8, which does not exist.
>
> - subprogramme is consistently spelled thus, although `subprogram' is
>   the correct form in both US and UK (don't know about Canada, Australia
>   etc).
______________________________end of UK comments ______________________

_____________beginning of Austrian comments accompanying negative _____

ON (the Austrian NB) votes NO on CD Ballot SC22 N2364 (CD 14651 - Information technology - International String Ordering - Method for Comparing Character Strings and Description of a Default Tailorable Ordering) with the following comments:

(1) It seems doubtful (to say the least) that a reasonable Default Ordering for all -- or even most -- of the languages of the world can be found. Consequently, there is reason to doubt the usefulness of the proposed International Standard.

(2) The "Tutorial" contained in the Introduction should be moved to an informative annex; it should not remain in the main part of the document, which would have to be considered normative.

(3) Even though there is a "Tutorial", the proposed methods do not seem to be well explained. It could at least be expected that one should be able to read and understand the tables in Annex 1 without having to consult other sources. For an example, see page 51, where a rather poor comment, in itself encoded, supposedly explains the structure of the following tables by cryptically stating:

"% ;;;"

The sudden change of typeface on the same page seems equally confusing and unmotivated (except possibly by line length). Also, it seems that a more detailed description of a possible practical implementation could prove helpful.

(4) The "Benchmark" in Annex 2 adds to the general confusion by showing the "sorted" version to be (in excerpt):

"vice-president's"
"offices"
"vice-presidents'"
"offices"

The problem obviously lies in automatic line breaks and can easily be corrected, but it raises the question of whether similar errors have been introduced in areas which are very difficult -- if not impossible -- to check. To mention the most prominent example, some errors in Annex 1 might never be found because this part of the document can hardly be checked exhaustively.
(5) It is rather difficult to determine the necessity of text that is not present. ON therefore does not feel able to decide on Annexes F, G, and H.

(6) The document has obviously been translated from French to English, which would not be a problem if the process had been completed. For a counterexample, see the description of procedures chbin1 and chbin2 on page 18. Also, the name of procedure sign_espace (on page 19) seems to be partially French.

(7) The document does not appear to have been spell-checked. Some examples:

p. 19: "precedenceof" should be "precedence of"
p.109: "deafult" should be "default"
p.114: "standaredized" should be "standardized"

(8) Anticipating the answer that ON experts should actively participate in the process of correction and development of the document in question, ON states that expert resources in this area are too limited at this time. However, this does not imply that any document can be accepted. Sorry.

___________________ end of Austrian comments _______________________

________beginning of Japanese comments accompanying negative _______

Japan disapproves CD 14651 proposed in SC22 N2364.

The CD is not mature enough to proceed to DIS from the viewpoint of completeness as a JTC1 standard, as follows:

- it is not yet tuned precisely enough from a technical viewpoint,
- there is still no consensus on the expected ordering result,
- it depends heavily on ISO/IEC 14652, which is not at CD stage, and
- the style of the document does not meet the JTC1 requirement.

Therefore, because of the high dependency of this CD on ISO/IEC 14652, Japan requests either to wait and synchronize the review and ballot of CD 14651 until CD 14652 is registered, or to change the scope of the standard to "ordering result" only and move the API part to the i18n API project.

Thus, Japan sees absolutely no reason why we need to proceed to DIS now.

Comment detail.

1.
Style (major editorial)

The CD is very different from what the ISO/JTC1 directive requires (and also different from the template provided by ITTF and from many JTC1 standards). For example, there is a very high dependency on font selection (use of bold, slant, point size variation and/or unnecessary typeface mixture is prohibited). The Definitions clause needs a sub-clause for each term, and the annexes need two groups -- one normative and another informative. Review and rewrite all text according to the ISO/JTC1 directive and the template supplied by ITTF.

2. Relation with ISO/IEC 14652. (General process)

The syntax and semantics of Annex 1 are not defined in this draft and depend on ISO/IEC 14652, which is not available yet. Synchronize the project with ISO/IEC 14652 development -- wait for a decision at least until CD 14652 is available, or, if that is not accepted, move the related part of ISO/IEC 14652 into this CD.

3. Tutorial (major editorial)

A heavy tutorial clause at the beginning is not appropriate; move it to an appropriate place and rewrite it to fit the new place. In addition, there is much "information only" text in the main clauses (such as clause 5.3). Remove it from the main (and mostly normative) part of the standard, and place it (if really necessary) in appropriate related place(s).

4. Scope (major technical)

Describe what this standard defines clearly and in a straightforward way. For example, change the word "a method" to a much clearer, specific word (which is API). Once the above change is made, it may affect the title of the standard. Also, the phrase "Default Tailorable Ordering" does not have a logical meaning. One possibility for the new title would be "API with default order for International string ordering". The last part of the 2nd bullet (on an order which is culturally---of that script) should be removed because "order which is acceptable culturally" is not in the scope of this standard.
This part should be re-written as something like "The default order aims at easy understanding by the non-casual user of the script; cultural correctness/acceptance is not a purpose of the default order. Correctness/acceptance by the casual (or native) user is to be provided by tailoring by the user or as a country profile".

Rationale: The above has been an agreement on the project scope from the beginning. There were many discussions of the impracticality of having a single default order which may satisfy all cultures. The conclusion has been that it is not practical to have such an ideal default order, and it was said that "this is why tailoring is needed". Japan, then, did not request cultural correctness for ordering. The same story holds for French: since French ordering is so sophisticated that no outsider understands it easily, it is not practical to use true French order as the international default order; it may cause misunderstanding among people of other cultures. Such sophisticated ordering (such as French) can be satisfactorily supported by tailoring anyway. (See clause 4.2.7 of DTR 11017. This IS is not i18n per 4.2.6 nor 4.2.4. This IS is aiming at 4.2.7.)

5. Definitions (major technical)

5-1. Each definition should have a separate sub-clause number.

5-2. API: The initial text of "for purpose of..... standard" is not necessary.

5-3. equivalence: Too much; reduce it to almost 1/3 by eliminating "informative" text within this definition (for example: the last 4 lines).

5-4. field, first order token, fourth order token level, level, second order token, transformation, third order token: Eliminate "informative" explanations.

5-5. posthandling, prehandling: Those definitions should be moved to the related clause.

5-6. telephone-book-type transformation: This term need not be defined in Definitions because it appears only once, in the Introduction (5th para., Page 5).
Although Japan considers that the paragraph is understandable in itself, we propose to change the first sentence to: More generally, specific requirements exist for a kind of complex transformation -- e.g. phonetic transformation adopted in some telephone-book systems, because the meaning of telephone-book ordering differs from culture to culture, so this wording may confuse the user.

6. Conformance (major technical)

6-1. Conformance clause(s) should come after the scope clause; they should not come after the requirements clause. The current location of the conformance clause makes it difficult to understand each conformance level clearly.

Reason (rationale) why the conformance clause should be clause 2: If the requirement is simple and no leveling is employed, the conformance clause can be in any place in theory. Note that ISO/IEC directive part-3 does not even require a "conformance clause". However, in the case of ISO/IEC CD 14651, the condition is different; it should be clause 2. Since 14651 is a very complicated multilevel standard, the scope clause cannot cover all that a "scope" clause should say. The conformance clause, in particular the clean and clear "levels" descriptions, acts in reality as a sub-scope clause as well as a real conformance description. If it does not come after the "scope" clause, it is almost impossible for the user of the standard to understand "what is defined in this standard and how to read the standard efficiently and accurately".

6-2. The conformance clause should have exact pointer(s) to the conformance requirements (clause and sub-clause numbers). Umbrella conformance for requirements buried within main clauses (as in this CD) should not be used. (The current CD is too unkind to the reader.)

6-3. In the case of leveled conformance, provide a sub-clause to explain what those levels are in a much more straightforward way. (There are too many indirect explanations now.)

6-3-1. Conformance level-1 should be defined as "Generic API only". It should not make some of the parameters an "option".
The option causes incompatibility problems between conforming level-1 APIs. Further, define two options (not parameter options), one for COMPCAR and another for COMPBIN + CARABIN.

6-3-2. Conformance level-2 should be defined and stated as "Generic API and table format".

6-3-3. Conformance level-3: Change prehandling to a normative requirement for string input. Thus prehandling is out of scope of this standard (remove 5.1.2 at least). Then change the description of this conformance level accordingly. By the way, in the current text, a normative clause (5.1.2) refers to an informative annex. This is a prohibited practice.

6-3-4. Conformance level-4: Remove the word "possibility". The result might then be "Add to the API an access method for a specific table".

6-4. Add a concept of conformance for "ordering result only".

6-5. Add a method to specify partial conformance of the ordering result. For example, a method to state "everything but the Japanese repertoire conforms to this default order and the Japanese repertoire is per JIS" would be a real-life use of this standard (as one sub-set of the ordering-result-only conformance).

6-6. Add a method to swap the order of the scripts while the orders within each script still conform to the default order.

6-7. Add a method to state that only selected scripts in comment 6-6 conform to the default order.

6-8. Maintain compatibility with POSIX and C. Providing an independent conformance level may be one of the choices to respond to this comment.

6-9. Remove all "best guess" dependency. Write exactly what is needed. For example, there is no description of what the "default order" is. There is a default table and an API (and conformance levels), so the best guess may be to use the "default table" with the APIs.

7. Requirements (major technical)

7-1. There are many options in one conformance level; those should be further levels of conformance if they are really necessary.

7-2.
The "Toggle" mechanism, which is realized by the parameters "order_accent", "order_case" and "sign_escape", should be removed because:

1) it contradicts the concept of the locale mechanism -- it allows an ordering regardless of the ordering table defined as a locale,

2) the concepts of "case" and "accents" are specific to some scripts and they are not defined in this draft, where these script-dependent concepts have been resolved into universal rules in tables.

Instead of the current "Toggle" mechanism, Japan proposes to reconsider the specification of ordering tables, which will be defined in ISO/IEC 14652, so as to enable variants of the default table to be defined more flexibly -- for example, by introducing some preprocessing elements:

#define ...
#ifdef ...
#include ...
etc.

7-3. table

Specifying the name of an ordering table in COMPCAR and CARABIN as a parameter "table" will put a heavy burden on implementations. At runtime the processes COMPCAR and CARABIN would have to check on every call whether the table has changed from that of the previous call and/or whether the table should be compiled. There are two alternatives for this problem:

1) to remove the parameter "table" from the two processes and define a new process "set_collating_table" which has a parameter "table",

2) to define a new process "open_table" which has an input parameter "table" and returns a pointer to a protected structure derived from that "table", while the parameter "table" in the two processes is changed to "table_pointer".

7-4. "chbin1" and "chbin2" in COMPCAR are not necessary. Furthermore, options within an API specification do not make any sense at all.

7-5. The whole contents of 5.3 should be removed or put into an informative annex, because those contents are to be defined in ISO/IEC 14652 in the current framework.

7-6. Add text for the case where characters are not encoded in ISO/IEC 10646. Some character sets, e.g.
ISO 6937, are not in ISO/IEC 10646, and some do not (yet) have a conversion table (or the same character names) with ISO/IEC 10646.

8. Data table (such as Annex A) (major technical)

8-1. Japan confirms the principle of the default order table as:

- The default order is non-native user friendly (easy to understand, simple rules, few exceptions).
- Cultural correctness for the native user of the script should be achieved by tailoring. APIs and data format should have enough room for the necessary tailoring.
- Therefore, cultural correctness of the default order is not a goal of this standard.

Based on the principle above, the Japanese proposal on Japanese scripts is not correct from the Japanese point of view; however, it is easy for people who are not familiar with Japanese scripts.

8-2. Collation for HIRAGANA and KATAKANA

Japan proposes to add the attached set of collating rules for HIRAGANA and KATAKANA. The order defined in the Attachment is different from the one defined in JIS X 4061, which was published in February 1997.

The main difference is in the handling of a prolonged sound mark . Roughly speaking, JIS X 4061 replaces the prolonged sound mark with the vowel of the most recent letter, while the Attachment neglects the prolonged sound mark at first, in the same way as a hyphen.

The second difference is the handling of the iteration marks , , , . Roughly speaking, JIS X 4061 replaces the iteration marks with the most recent KANA letter, while the Attachment handles the iteration marks as they are.
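The two treatments of the prolonged sound mark described in 8-2 can be illustrated with a minimal Python sketch. It follows only the "roughly speaking" descriptions above and is not an implementation of JIS X 4061 or of the Attachment. The function names `expand_prolonged` and `skip_prolonged_key` are hypothetical, the vowel is derived from the last letter of the Unicode character name, keys are compared by plain code point rather than by any collation table, and the use of the original string as a tie-breaker is an assumption.

```python
import unicodedata

PROLONGED = "\u30fc"  # KATAKANA-HIRAGANA PROLONGED SOUND MARK

# Vowel letters used as replacements (assumption: full-size kana vowels).
KATAKANA_VOWELS = {"A": "\u30a2", "I": "\u30a4", "U": "\u30a6",
                   "E": "\u30a8", "O": "\u30aa"}
HIRAGANA_VOWELS = {"A": "\u3042", "I": "\u3044", "U": "\u3046",
                   "E": "\u3048", "O": "\u304a"}


def vowel_of(ch):
    """Return the kana vowel letter for ch, or None if it has no vowel."""
    name = unicodedata.name(ch, "")
    if name.startswith("KATAKANA LETTER "):
        return KATAKANA_VOWELS.get(name[-1])
    if name.startswith("HIRAGANA LETTER "):
        return HIRAGANA_VOWELS.get(name[-1])
    return None


def expand_prolonged(text):
    """First treatment: replace the mark by the vowel of the previous letter."""
    out = []
    for ch in text:
        vowel = vowel_of(out[-1]) if (ch == PROLONGED and out) else None
        out.append(vowel if vowel else ch)
    return "".join(out)


def skip_prolonged_key(text):
    """Second treatment: ignore the mark at first, keep it as a tie-breaker."""
    return (text.replace(PROLONGED, ""), text)


# KA-TA and KA-<prolonged>-DO, written with escapes to stay in US-ASCII.
words = ["\u30ab\u30bf", "\u30ab\u30fc\u30c9"]
print([ascii(w) for w in sorted(words, key=expand_prolonged)])
print([ascii(w) for w in sorted(words, key=skip_prolonged_key)])
```

With these two sample words the two keys give opposite orders, which is the kind of difference the comment is pointing at. A real implementation would take its weights from the ordering table rather than from code points.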
The reasons for proposing the Attachment are as follows:

1) JIS X 4061 cannot be realized by an LC_COLLATE representation unless some rules using regular expressions, which will put a heavy burden on implementations, are introduced,

2) ordering results of JIS X 4061 are hard to understand for foreigners without knowledge of how letter sequences are pronounced -- it is not cross-culture friendly,

3) ordering results of the Attachment are easy to understand for foreigners without knowledge of the pronunciation of letter sequences, and even in Japan a number of encyclopedias order their items in the same way as the Attachment does -- it is cross-culture friendly.

8-3. Consideration of compatibility characters of ISO/IEC 10646.

Consideration of the compatibility characters is missing. At least the following are needed.

8-3-1. UFF00-FF9F, FFE0-FFE8
Handle those characters the same as the equivalent characters in the A-zone.

8-3-2. F900-FA0D, FA10, FA12, FA15-FA1E, FA20, FA22, FA25, FA26, FA2A-FA2D of ISO/IEC 10646-1
Handle those characters the same as the equivalent characters in the I-zone.

8-4. FA0E, FA0F, FA11, FA13, FA14, FA1F, FA21, FA23, FA24, FA27-FA29 of ISO/IEC 10646-1 and future additions of CJK ideographs (ext-A and B).
Merge them with the I-zone characters according to a defined rule. Provide an informative annex which describes the rule (radical, number of strokes and so on.....)

8-5. Character combination type symbols.
For those characters which are made up of a combination of two or more Japanese characters, such as 3300-336F, handle them as if they were a string of independent characters.

8-6. Symbols of character(s) and symbol(s)
Symbols with character(s) should be handled by one of the following methods.

a) Character(s) and symbol(s) like a "short form" of normal writing, such as 2480, which looks like "( 13 )". Split the symbol as if it were a normal string.

b) Character(s) and symbols that cannot be split into one unambiguous sequence, such as 2470, where the circle can be either before or after character 17.
Handle as if it is a special form of the character(s) part of the symbol.

8-7. Symbols for making combining sequences, such as 20E0.

Follow the rule proposed in 8-6 above; the process might be different from the method for combining sequences.

8-8. Japan expects that many countries have the same kinds of comments as those above. Japan therefore requests that the specifics of the data table be circulated to all JTC1 member countries (not only SC22 P-members) for review and confirmation.

9. Other comments

Japan recognizes many editorial issues, as well as technical issues, which are not covered in these ballot comments; the large number of major technical comments (with possibly more to come) has not left us time to go through all of them. Japan considers minor editorial comments unnecessary in these ballot comments because of the immaturity of CD 14651. In any case, the text will have to be completely rewritten to fully accommodate the technical comments.

------- ATTACHMENT ---------

%level 1 %level 2 %level 3
% for Table 122
% for Table 122
% abbreviations in comments -- UCS Name:
% HIRAkn -- HIRAGANA LETTER
% KATAkn -- KATAKANA LETTER
% hfwd -- HALFWIDTH
% voice -- HIRAGANA-KATAKANA VOICED SOUND MARK
% semi-voice -- HIRAGANA-KATAKANA SEMI-VOICED SOUND MARK
% iter -- ITERATION
% comb -- COMBINING
% prolong -- KATAKANA-HIRAGANA PROLONGED SOUND MARK
%
;;; % HIRAkn SMALL A
;;; % HIRAkn A
;;; % KATAkn SMALL A
;;; % hfwd KATAkn SMALL A
;;; % KATAkn A
;;; % hfwd KATAkn A
%
;;; % HIRAkn SMALL I
;;; % HIRAkn I
;;; % KATAkn SMALL I
;;; % hfwd KATAkn SMALL I
;;; % KATAkn I
;;; % hfwd KATAkn I
%
;;; % HIRAkn SMALL U
;;; % HIRAkn U
;;; % HIRAkn VU
;;; % KATAkn SMALL U
;;; % hfwd KATAkn SMALL U
;;; % KATAkn U
;;; % hfwd KATAkn U
;;; % KATAkn VU
%
;;; % HIRAkn SMALL E
;;; % HIRAkn E
;;; % KATAkn SMALL E
;;; % hfwd KATAkn SMALL E
;;; % KATAkn E
;;; % hfwd KATAkn E
%
;;; % HIRAkn SMALL O
;;; % HIRAkn O
;;; % KATAkn SMALL O
;;; % hfwd KATAkn SMALL O
;;; % KATAkn O
;;; % hfwd KATAkn O
%
;;; % HIRAkn KA
;;; % HIRAkn GA
;;; % KATAkn SMALL KA
;;; % KATAkn KA
;;; % hfwd KATAkn KA
;;; % KATAkn GA
%
;;; % HIRAkn KI
;;; % HIRAkn GI
;;; % KATAkn KI
;;; % hfwd KATAkn KI
;;; % KATAkn GI
%
;;; % HIRAkn KU
;;; % HIRAkn GU
;;; % KATAkn KU
;;; % hfwd KATAkn KU
;;; % KATAkn GU
%
;;; % HIRAkn KE
;;; % HIRAkn GE
;;; % KATAkn SMALL KE
;;; % KATAkn KE
;;; % hfwd KATAkn KE
;;; % KATAkn SMALL GE
%
;;; % HIRAkn KO
;;; % HIRAkn GO
;;; % KATAkn KO
;;; % hfwd KATAkn KO
;;; % KATAkn GO
%
;;; % HIRAkn SA
;;; % HIRAkn ZA
;;; % KATAkn SA
;;; % hfwd KATAkn SA
;;; % KATAkn ZA
%
;;; % HIRAkn SI
;;; % HIRAkn ZI
;;; % KATAkn SI
;;; % hfwd KATAkn SI
;;; % KATAkn ZI
%
;;; % HIRAkn SU
;;; % HIRAkn ZU
;;; % KATAkn SU
;;; % hfwd KATAkn SU
;;; % KATAkn ZU
%
;;; % HIRAkn SE
;;; % HIRAkn ZE
;;; % KATAkn SE
;;; % hfwd KATAkn SE
;;; % KATAkn ZE
%
;;; % HIRAkn SO
;;; % HIRAkn ZO
;;; % KATAkn SO
;;; % hfwd KATAkn SO
;;; % KATAkn ZO
%
;;; % HIRAkn TA
;;; % HIRAkn DA
;;; % KATAkn TA
;;; % hfwd KATAkn TA
;;; % KATAkn DA
%
;;; % HIRAkn TI
;;; % HIRAkn DI
;;; % KATAkn TI
;;; % hfwd KATAkn TI
;;; % KATAkn DI
%
;;; % HIRAkn SMALL TU
;;; % HIRAkn TU
;;; % HIRAkn DU
;;; % KATAkn SMALL TU
;;; % hfwd KATAkn SMALL TU
;;; % KATAkn TU
;;; % hfwd KATAkn TU
;;; % KATAkn DU
%
;;; % HIRAkn TE
;;; % HIRAkn DE
;;; % KATAkn TE
;;; % hfwd KATAkn TE
;;; % KATAkn DE
%
;;; % HIRAkn TO
;;; % HIRAkn DO
;;; % KATAkn TO
;;; % hfwd KATAkn TO
;;; % KATAkn DO
%
;;; % HIRAkn NA
;;; % KATAkn NA
;;; % hfwd KATAkn NA
%
;;; % HIRAkn NI
;;; % KATAkn NI
;;; % hfwd KATAkn NI
%
;;; % HIRAkn NU
;;; % KATAkn NU
;;; % hfwd KATAkn NU
%
;;; % HIRAkn NE
;;; % KATAkn NE
;;; % hfwd KATAkn NE
%
;;; % HIRAkn NO
;;; % KATAkn NO
;;; % hfwd KATAkn NO
%
;;; % HIRAkn HA
;;; % HIRAkn BA
;;; % HIRAkn PA
;;; % KATAkn HA
;;; % hfwd KATAkn HA
;;; % KATAkn BA
;;; % KATAkn PA
%
;;; % HIRAkn HI
;;; % HIRAkn BI
;;; % HIRAkn PI
;;; % KATAkn HI
;;; % hfwd KATAkn HI
;;; % KATAkn BI
;;; % KATAkn PI
%
;;; % HIRAkn HU
;;; % HIRAkn BU
;;; % HIRAkn PU
;;; % KATAkn HU
;;; % hfwd KATAkn HU
;;; % KATAkn BU
;;; % KATAkn PU
%
;;; % HIRAkn HE
;;; % HIRAkn BE
;;; % HIRAkn PE
;;; % KATAkn HE
;;; % hfwd KATAkn HE
;;; % KATAkn BE
;;; % KATAkn PE
%
;;; % HIRAkn HO
;;; % HIRAkn BO
;;; % HIRAkn PO
;;; % KATAkn HO
;;; % hfwd KATAkn HO
;;; % KATAkn BO
;;; % KATAkn PO
%
;;; % HIRAkn MA
;;; % KATAkn MA
;;; % hfwd KATAkn MA
%
;;; % HIRAkn MI
;;; % KATAkn MI
;;; % hfwd KATAkn MI
%
;;; % HIRAkn MU
;;; % KATAkn MU
;;; % hfwd KATAkn MU
%
;;; % HIRAkn ME
;;; % KATAkn ME
;;; % hfwd KATAkn ME
%
;;; % HIRAkn MO
;;; % KATAkn MO
;;; % hfwd KATAkn MO
%
;;; % HIRAkn SMALL YA
;;; % HIRAkn YA
;;; % hfwd KATAkn SMALL YA
;;; % KATAkn SMALL YA
;;; % KATAkn YA
;;; % hfwd KATAkn YA
%
;;; % HIRAkn SMALL YU
;;; % HIRAkn YU
;;; % KATAkn SMALL YU
;;; % hfwd KATAkn SMALL YU
;;; % KATAkn YU
;;; % hfwd KATAkn YU
%
;;; % HIRAkn SMALL YO
;;; % HIRAkn YO
;;; % KATAkn SMALL YO
;;; % hfwd KATAkn SMALL YO
;;; % KATAkn YO
;;; % hfwd KATAkn YO
%
;;; % HIRAkn RA
;;; % KATAkn RA
;;; % hfwd KATAkn RA
%
;;; % HIRAkn RI
;;; % KATAkn RI
;;; % hfwd KATAkn RI
%
;;; % HIRAkn RU
;;; % KATAkn RU
;;; % hfwd KATAkn RU
%
;;; % HIRAkn RE
;;; % KATAkn RE
;;; % hfwd KATAkn RE
%
;;; % HIRAkn RO
;;; % KATAkn RO
;;; % hfwd KATAkn RO
%
;;; % HIRAkn SMALL WA
;;; % HIRAkn WA
;;; % KATAkn SMALL WA
;;; % KATAkn WA
;;; % hfwd KATAkn WA
;;; % KATAkn VA
%
;;; % HIRAkn WI
;;; % KATAkn WI
;;; % KATAkn VI
%
;;; % HIRAkn WE
;;; % KATAkn WE
;;; % KATAkn VE
%
;;; % HIRAkn WO
;;; % KATAkn WO
;;; % hfwd KATAkn WO
;;; % KATAkn VO
%
;;; % HIRAkn N
;;; % KATAkn N
;;; % hfwd KATAkn N
---
;;; % comb voice
;;; % comb semi-voice
;;; % voice
;;; % hfwd voice
;;; % semi-voice
;;; % hfwd semi-voice
;;; % HIRAGANA iter MARK
;;; % HIRAGANA VOICED iter MARK
;;; % KATAKANA iter MARK
;;; % KATAKANA VOICED iter MARK
%
;;; % KATAKANA MIDDLE DOT
;;; % hfwd KATAKANA MIDDLE DOT
;;; % prolong
;;; % hfwd prolong
% hfwd HALFWIDTH IDEOGRAPHIC FULL STOP -- to be handled with
% hfwd HALFWIDTH LEFT CORNER BRACKET -- to be handled with
% hfwd RIGHT CORNER BRACKET -- to be handled with
% HALFWIDTH IDEOGRAPHIC COMMA -- to be handled with

________________________end of Japan comments ___________________________

__________beginning of Netherlands comments accompanying negative _____

From: John Bijlsma

JTC1 SC22 N2364, ISO/IEC CD 14651 IT - International String Ordering - Method for Comparing Character Strings and Description of a Default Tailorable Ordering

97-04-24, DISAPPROVAL WITH COMMENT
......................................................

The Netherlands votes negative on CD 14651. To turn our vote into a positive one, modifications shall be made in accordance with our comments. We reserve our final position regarding the CD until we have seen the Final CD.

Technical comments:

1. Remove Annex 1 and all references to an International Default Order.

-- SC22 has no expertise in this field and cannot check for correctness. Most NBs in SC22 are not able to check whether a proposed ordering for an unfamiliar script agrees with actual practice far from home. Those NBs that are familiar with such scripts are not represented in SC22, nor have they been asked for comment.

-- A default order is an instrument of cultural imperialism. In several countries more than one ordering rule is in use without any agreed preference. Calling one of these the "default" imposes extraneous pressure and will interfere with national habits.

-- There is no need for a default. No country always uses all characters from 10646. Countries should not be burdened with unwanted features. A method for supplying ordering information for a given restricted character set to an API should be contained in 14651 itself, without reference to 14652.

2. Remove all references to 14652.

-- Needless complexity should be avoided. An ISO standard should be as independent as possible of other ISO standards.
If ordering information can only be supplied by way of a complete set of cultural conventions, as specified in 14652, this involves an enormous overhead and obliges NBs to also specify non-ordering information which is irrelevant to 14651 but nevertheless required by this CD.

Editorial comments:

The text of this document leaves much to be desired regarding precision of definition, clarity of presentation and conformance to the ISO Directives, Part 3. The NNI cannot give detailed comments here, nor offer replacement text, as doing so would require rewriting more than half of the document, for which we have no resources available. The NNI already gave some directions with its vote on CD registration, but found almost no improvement in this CD.

__________________________end of Netherlands comments ________________

_________beginning of USA comments accompanying negative _____________

The US National Body votes to Disapprove ISO/IEC CD 14651 with the following comments:

These are the U.S. comments for the first CD ballot for ISO/IEC CD 14651, International String Ordering (SC22 N2364). No alternative text is supplied as part of this response because too much of it would have to be written. Here are the concerns:

AF-1 The specification of the sorting algorithm must be made independently of a programming model. Sorting is a process that is used in an enormous variety of circumstances and on widely different systems, including object-oriented systems. Care should be taken in preparing the normative specifications for CD 14651 that they are usable independently of a particular programming model, programming language, or environment. In particular, the descriptions of the sorting operations should be expressed in an abstract form, specifying IN, OUT and RETURN parameters but "without" language binding.
Also, no parameters needed for the sorting operation may be presumed to hide in some semi-opaque state; rather, they should always be specified explicitly in the description of the operation. If it is desired to show how the standard might be implemented in a POSIX environment, that could be the subject of an informative annex. Function bindings for POSIX could assume transparent access to locale data from the POSIX locale model, if that is desired. The annex would specify how the proposed POSIX functions make use of the abstract operations defined in the normative part of the standard, and how their parameters are set either explicitly or implicitly.

RLG 1: The body of the standard includes material which belongs in an informative annex, specifically the "Tutorial on problems solved by this standard."

RLG 2: The order specified for two Cyrillic characters (p. 95-100 of the CD) conflicts with the order in Table 2 of ISO/R9 and other sources (cited below). The characters in question are these two case pairs: CYRILLIC CAPITAL LETTER TSHE/CYRILLIC SMALL LETTER TSHE and CYRILLIC CAPITAL LETTER DZE/CYRILLIC SMALL LETTER DZE.

Cyrillic letter TSHE: In the CD, TSHE follows KA WITH HOOK and precedes EL. In ISO/R9 and other sources, TSHE follows TE and precedes U.

Cyrillic letter DZE: In the CD, DZE follows KOPPA and precedes CHE. In ISO/R9 and other sources, DZE follows ZE and precedes I.

Other differences in the order of Cyrillic characters between the CD and Table 2 of ISO/R9 are either not supported by the other sources or are arbitrary.

RLG 3: The order of scripts on p. 31 differs slightly from the order in ISO/IEC 10646. Specifically:
- Georgian follows Cyrillic; in ISO/IEC 10646, it follows Tibetan (pDAM-6).
- Hebrew follows Arabic; in ISO/IEC 10646, it follows Armenian (and precedes Arabic).
These differences are not explained.

RLG 4: Hangul is positioned between Tibetan and Cherokee (i.e., consistent with the location of Hangul Jamo in ISO/IEC 10646).
There is no explanation as to why this position was chosen, rather than that of Hangul Syllables. Since Korean may be written with a mixture of ideographs and Hangul syllables, the Hangul Syllables position established by pDAM-5, immediately after the CJK Unified Ideographs, might be preferable.

HP 1 The outline of the document does not follow the well-defined and established method already used in other JTC1 standards. For example, the Introduction is too big; the reader gets lost and might decide not to continue reading the document. Usually such information belongs in an informative annex; otherwise it becomes normative.

HP 2 The structure of the document has the "Scope" clause on page 11. This clause should come immediately after a newly written, short Introduction clause. In addition, this clause needs clarification. For example, does it describe the APIs needed by applications to specify character string ordering? It is also not clear what is meant by the phrase "full repertoire of ISO/IEC 10646 (independently of coding)"; the unclear part is the one in parentheses. In addition, the "Scope" clause talks about a specific default ordering, but it is not clear where in the CD it is explained how that ordering was derived or how it is related to the APIs.

HP 3 The "Conformance" clause should immediately follow the "Scope" clause. It should be combined with the "Requirements" clause. It should be rewritten to make it easy to understand how to conform without having to go through the syntax and content complexity of the "Requirements" clause. Conformance is difficult to determine from the document; the document requires a table of precisely which features are required. Moreover, the function levels are, in general, independent of the previous level; there is little reason to force all features of one level before the next higher level is reached. Post handling is informative, and has no place in conformance.
HP 4 In the clause "Tailoring Mechanism", it is not at all clear what an application developer needs to do to override the default ordering that is specified in Annex 1.

HP 5 Maybe it would be better to have this CD become a Technical Report rather than a standard, since it allows users to override the proposed default ordering, and there might be more users overriding the default, with an undefined and nowhere described mechanism, than users of what the CD proposes.

HP 6 The dependency on an unpublished standard, 14652, Cultural Conventions Specification, is too high. Currently, 14652 is still at the CD stage, as mentioned in clause 2, Normative References, of this CD (14651).

In summary, a lot of structural and technical fine tuning is necessary to make this document complete. If such an effort takes too much time, maybe the industry would be better served if the proposal were modified for publication as a TR rather than an ISO standard. This work could later be converted to an ISO publication when CD 14652, Cultural Conventions Specification, is accepted and published as an ISO standard.

TG 1 The organization and nomenclature (e.g. COMPCAR) are unnecessarily obscure. Names should be spelled out completely for clarity.

TG 2 The requirement that the original string be recoverable is unnecessary; many applications, such as databases, will have a sort key as an alternate field in the record. They may only need a level 1 sort for their application. In that case, storing the original string twice, or requiring internal structure that enables reconstruction, is unnecessary and only increases storage to no purpose.

TG 3 Use of NBSP is in practice an unacceptable overload of its primary function. Being able to functionally tailor just space and NBSP is in practice not useful; in general a whole host of similar characters, punctuation and symbols, behave the same way.

TG 4 The algorithm for comparison must be stated in terms of results, NOT a specific mechanism.
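
Illustration of TG 2 and TG 4 (an editorial sketch, not part of the US comments). The Python fragment below assumes a small hypothetical three-level weight table, TOY_WEIGHTS, which is not the Annex 1 table; the names sort_key and compare are likewise invented for this illustration. It shows a comparison specified only by its result (-1, 0 or 1), with the table and the maximum level passed explicitly rather than taken from hidden state, and a level-1-only key that can be stored as an alternate field without keeping the original string.

```python
# Hypothetical toy weight table, for illustration only.
# It is NOT the CD 14651 default table. Each character maps to
# its (level 1, level 2, level 3) weights.
TOY_WEIGHTS = {
    "a": (1, 0, 0),
    "A": (1, 0, 1),
    "b": (2, 0, 0),
    "B": (2, 0, 1),
}


def sort_key(text, table, max_level):
    """Build a key holding the weights for levels 1..max_level.

    All level-1 weights come first, then all level-2 weights, and so
    on, so that plain tuple comparison decides level by level.
    """
    per_char = [table[ch] for ch in text]
    return tuple(
        tuple(weights[level] for weights in per_char)
        for level in range(max_level)
    )


def compare(left, right, table, max_level):
    """Return -1, 0 or 1; every input is an explicit parameter."""
    key_left = sort_key(left, table, max_level)
    key_right = sort_key(right, table, max_level)
    return (key_left > key_right) - (key_left < key_right)
```

With this toy table, compare("ab", "Ab", TOY_WEIGHTS, 1) reports the two strings as equal, while the same call with a maximum level of 3 orders "ab" first. An application that needs only a level 1 sort could store sort_key(text, TOY_WEIGHTS, 1) alone, which is the case described in TG 2.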
TG 5 The format in Annex 1 is unnecessarily complex. It is impossible to assess and recommend this standard when we cannot clearly determine the result of the default sorting order rules in this annex. The format forces the use of a whole separate notation for characters. To correct this, characters must always be referred to by their full 10646 name for clarity, rather than by arbitrary notations such as AYEHS, AIGUT, POINN, QARNP, or many other examples. Script names should always be the 10646 block name.

TG 6 The equivalencies of composed characters vs. composite character sequences (e.g. a + umlaut and a-umlaut) can be stated much more succinctly.

TG 7 The relative ordering of characters cannot be determined from the character lists, since they are not even remotely in the resulting order. To correct this, the ordering of characters within a script must be presented in the resulting order as much as possible. Example:

IGNORE;IGNORE;IGNORE; % NULL
IGNORE;IGNORE;IGNORE; % SYMBOL FOR NULL
IGNORE;IGNORE;IGNORE; % START OF HEADING
IGNORE;IGNORE;IGNORE; % SYMBOL FOR START OF HEADING
IGNORE;IGNORE;IGNORE; % START OF TEXT
IGNORE;IGNORE;IGNORE; % SYMBOL FOR START OF TEXT
IGNORE;IGNORE;IGNORE; % END OF TEXT
IGNORE;IGNORE;IGNORE; % SYMBOL FOR END OF TEXT
...

The fourth column (in this case) determines the final ordering of the characters, which is NOT the order presented. It must be presented as:

IGNORE;IGNORE;IGNORE; % NULL
IGNORE;IGNORE;IGNORE; % START OF HEADING
IGNORE;IGNORE;IGNORE; % START OF TEXT
IGNORE;IGNORE;IGNORE; % END OF TEXT
...
IGNORE;IGNORE;IGNORE; % SYMBOL FOR NULL
IGNORE;IGNORE;IGNORE; % SYMBOL FOR START OF HEADING
IGNORE;IGNORE;IGNORE; % SYMBOL FOR START OF TEXT
IGNORE;IGNORE;IGNORE; % SYMBOL FOR END OF TEXT

TG 8 The Annex also does not make clear that the vast majority of its characters are sorted in character code order. This requires the reader to visually inspect every line to no purpose.
These lines should be replaced by one statement: "Except where otherwise noted, all symbols are sorted as: IGNORE;IGNORE;IGNORE;"

TG 9 Annex 2 List #1 is superfluous. The statement should be that the words in List #2, in any initial order, will result in List #2 when sorted.

______________________ end of USA comments __________________________

_________beginning of Israel comments accompanying negative ________________

THE STANDARDS INSTITUTE OF ISRAEL (SII)

Comments on ISO/IEC CD 14651 (ISO/IEC JTC 1/SC22/WG20 N471en)

The SII votes NO on CD 14651. If items 1, 2 and 3 were to be accepted, our vote would become YES.

1. Hebrew Accents

The Hebrew accents (U0591 to U05AF), Meteg (U05BD) and Upper Dot (U05C4) do not participate in the string ordering process. They relate, in fact, to the whole word rather than to the letter to which they are attached, and are never used in the lexicographic order or in any other ordering of Hebrew texts.

- The Hebrew accents should be removed from the list of collating symbols, page 35, and from page 45.
- On page 56 they should all be defined as:
  - IGNORE; IGNORE; IGNORE; IGNORE;

2. Composite characters and combining characters.

It seems that combining characters do not sort and compare as equivalent to their precomposed encoding. For instance, the two strings "Gu:nther" and "Gu:nther", the first coded with U00FC, the second with U0075 followed by U0308, are equivalent and should not be distinguished, but they are not equivalent in the CD. The particular coding used is an artifact, possibly not under the control of the user, and is normally meaningless.

3. Introduction, page 6, last paragraph: "If two equivalent strings are not absolutely identical, then the tie must be broken."

This sentence is not acceptable. If two strings are equivalent they should be treated as such. An example is Hebrew strings that are equivalent but have different accents.

4. Introduction, page 4 (Editorial):

The introduction begins with a negative statement and continues with a criticism of past practices. The SII suggests that it would be preferable to begin with a positive statement describing what the standard is and what its benefits are.

5. Tutorial, page 7 (Editorial).

The tutorial would be better placed in an informative appendix.

6. Page 35 (Editorial).

The comment should be qubuts (the s is missing).

________________end of Israel comments; end of document SC22 N2466 ____