Title: Draft disposition of comments on CD 14652 - SC22 N2504
Source: Keld Simonsen, project editor
Date: 1997-11-18
Status: draft disposition, second round

Canadian Comments on ISO/IEC 14652 WD:

1. There is no rationale as to why this standard is required. There is rationale in Annex B for the FDCC-set and for the various LC_* categories but none for this standard. It would be very helpful to add such rationale to understand why this standard is necessary and what problems it solves.
Disposition: Accepted. The rationale could be described in a new clause 0: introduction, which describes the benefit of this standard to different types of users, with respect both to the sorting part and the rest, and also in relation to other WG20 standards, along the lines of what has been described earlier on the benefits of WG20 standards.

Specific Comments:

2. section 3.1.7 - definition of charmap needs to be changed to "... a definition of a mapping between symbolic character names and the encoding for a coded character set"
Disposition: accepted.

- The definition for FDCC states that the term replaces the POSIX term 'locale', as the new entity is a superset of the locale as it is currently used. One may debate the point, but as a superset it fails to deal with basic issues of multiple concurrent support of differing formats (having 2 or more local currency formats, as needed in Europe) and calendaring other than Gregorian. I would have expected more support from a new standard.
Disposition: accepted. The multiple currency problem of Europe will be addressed. Canada is invited to provide text for other calendaring at the next ballot.

3. section 4 - FDCC-set. The paragraph beginning "..Other category names...". This is an unnecessary restriction and one that will cause problems with existing implementations. POSIX had no such restriction and as a result we have implementations that have introduced categories such as LC_TIMEZONE or LC_TOD etc.
We could say that the six categories are mandatory in the FDCC-set.
Disposition: Accepted. The restriction will be changed into a recommendation.

- The proposal states "In the event that some of the information for an FDCC-set category, as specified in this standard, is missing from the FDCC-set source definition, the behavior of that CATEGORY, if it is referenced, is unspecified." This is too restrictive, in that the complete category is 'wasted'. Perhaps a word of clarification is required. Does the proposed standard really want to have the complete category ignored? If so, there is a requirement on the object creation mechanism to issue a failure message during the compilation step of the 'FDCC-set object'.
Disposition: Accepted. All keywords will be checked and the meaning of a missing keyword will be specified.

4. section 4.1.0.5 - the third paragraph ("The items (2), ....") should be moved before item (2).
Disposition: Accepted.

5. section 4.1.1. The portable character set should be mentioned because the next sub-section assumes some defaults that are characters in this set (which is also always a part of any charmap?).
Disposition: Accepted. Added text: "Support for the portable character set is required."

6. section 4.1.1.1. - are these the only keywords allowed? Other keywords are allowed in POSIX. The statement should indicate that this list does not preclude others but that these are the minimum that are supported.
Disposition: Accepted in principle. In 4.1 it is said generally that you can have implementation-defined keywords. There is no need to repeat this here.

- under "upper" etc. it states that "..if this keyword is not specified, the uppercase letters A through Z, shall automatically belong to this class...". This is fine when the keyword is absent. But in the opposite case this means that one can specify a whole range of characters under "upper" and exclude the A to Z set. This is not what you want.
A statement should be made here that indicates that either:
  - one must include the portable character set when specifying characters in "upper", or
  - that these are automatically included if one does not include them in the specification for "upper"
Of course, this applies to other keywords in this section as well.
Disposition: Accepted, with the addition that A to Z is "of the portable character set". The characters will be automatically included. This also applies to the keyword "lower".

- under "graph", change "printable" to "graphical"
Disposition: Rejected. The wording is consistent with POSIX-2 wording. It is noted that IDEOGRAPHIC SPACE is not in class "graph".

- the keyword 'digit' ONLY allows the use of digits 0 through 9, but does not state whether they can be values in any language.
Disposition: Accepted. The "digits" keyword will be expanded to take other digits in groups of 10, where digits are in ascending order with corresponding values from zero to nine.

- table 1:
  - the intersections of (upper, upper), (lower, lower), etc., should be indicated as N/A (not applicable).
  Disposition: Accepted in principle. The diagonal will become blank.
  - upper (row) should not be permitted in lower (column) and vice versa
  Disposition: Rejected. Some characters are both upper and lower, such as the "Lj" character (U01C8).

7. section 4.1.1.2:
- No information is provided on the API that may be able to use such information. The term transformation is used as a synonym for transliteration. Transliteration should be used and the term transform should not be used, to avoid confusion with other functions performing string transforms (UTF-n, layout transforms....)
Disposition: Accepted. The term "transliteration" will be used.

- add new sub-section numbers (4.1.1.2.x) for:
  - transform_start keyword
  - transform_end keyword
  - include keyword
  - default_missing keyword
Disposition: accepted.

- suggest a sub-section for the example.
Also, the text at the end of the example should be preceded with "...in the example above.." or words to that effect.
Disposition: accepted.

8. Add section 4.1.1.3 for the "i18n" LC_CTYPE.
Disposition: accepted.

9. The LC_CTYPE that is shown should be a model; it does not follow the order of the keywords shown in 4.1.1.1. - it should.
Disposition: accepted.

- Also, if one looks under "toupper" in 4.1.1.1, it states that "...only characters specified for the keywords lower and upper shall be specified". In this definition of LC_CTYPE, "toupper" is defined. Unfortunately, the keywords "lower" and "upper" are NOT specified!!
Disposition: accepted. Keywords "lower" and "upper" will be specified.

- The LC_CTYPE is incomplete because all the Uxxxx characters are not shown. Ideally, this LC_CTYPE should be complete. Failing that, the incompleteness should be addressed and acknowledged.
Disposition: accepted in principle. All ISO/IEC 10646 characters are classified with this specification, most of them with the "graph" keyword.

10. section 4.1.2:
- item (8): needs to be reworded to be clear but at a minimum replace "from behind" with "backwards"
Disposition: accepted.

- "...The following keywords ..": states that the keywords are described in detail later. This is not wholly true because the first two keywords are not detailed later.
Disposition: accepted. Missing summaries will be added.

- coll_weight_max: it is stated that the minimum value is 7 and that this is also the default. This is not the case as per the example for LC_COLLATE. There this value is 4.
Disposition: accepted in principle. The default will be removed.

11. section 4.1.2.4: third paragraph, third sentence - "The first operand .... this ." Expand the sentence to end with "or another "order_start" keyword is encountered".
Disposition: accepted.

12. section 4.1.2.5: this really does not belong in the explanation of keywords and as such should really appear after 4.1.2.12.
Disposition: accepted.
The collating statement description is moved up to after 4.2.1.1.

Further in the example, and need to be explained.
Disposition: accepted. Explanation will be added.

13. remove sub-section heading 4.1.2.11 because it is not needed.
Disposition: rejected. It is needed structurally to signify that some new statements are introduced that differ from the "reorder-scripts-after" keyword. Clause renumbered to 4.1.2.10.1.

14. section 4.1.2.13: the assumption here is that there is wide demand for this function. Most folks do not deal with locale construction so the benefit of a 'shorthand' way of changing locale source will be lost for the masses. All of this capability does not provide dynamic run time overrides; it only deals with the current static model of previously defined source files. It also presumes the use of rather large all-encompassing locales.
Disposition: rejected. The people that need the toggling keywords are the same that need the other locale keywords; that is, locale writers. Given that most locale writers are expected to use the 14651 standard, which has toggles, the toggling keywords are very much needed.

15. sub-section 4.1.2.13.6: this example shows two LC_COLLATE statements. Is this correct? Which one takes precedence?
Disposition: accepted. The example is meant to describe only one resulting LC_COLLATE specification, which is the latter of the ones shown. The first is a repetition of the standard "i18n" FDCC-set. This will be clarified.

16. section 4.1.3: the i18n LC_MONETARY category shown:
- why is the "mon_decimal_point" shown as ? This is not culturally neutral, nor is it the only internationally accepted value.
Disposition: accepted. The "mon_decimal_point" will be the empty string, as also in POSIX-2.

- the "-1" value for int_frac_digits through to n_sign_posn is incorrect in that the value "-1" is not described in the keyword text that precedes this definition.
Disposition: accepted in principle.
The meaning of the value of "-1" is described in the beginning of clause 4.1.3.

- why are there no entries for int_p_cs-precedes etc. when these are identified as keywords? In the description of the keywords, there is no indication as to what will happen when these keywords are omitted.
Disposition: accepted. Wording will be added to specify that if not specified, the value for the corresponding domestic currency keyword will be taken.

- how are occurrences of multiple currencies, such as EURO and the local country currency, proposed to be handled?
Disposition: accepted. See disposition of Danish comment 2.1.

16. section 4.1.4: the i18n LC_NUMERIC category shown:
- why is the "mon_decimal_point" shown as ? This is not culturally neutral, nor is it the only internationally accepted value.
Disposition: Accepted. The "decimal_point" will be the empty string, and it will be added that no default will be applied.

18. section 4.1.5: this is the first time that the word "mandatory" has been used for keywords. Does this mean that all other keywords in the other categories are optional?
Disposition: accepted. "mandatory" was meant to specify that they shall be recognized, while the optional "era" keywords need not be recognized. The "era" keywords will also become mandatory, and the word "mandatory" will then be removed here.

- abmon and mon keywords: the current restriction of twelve months is not correct; it does not allow 13 Hebrew months to be shown.
Disposition: accepted.

- what is the effect when the "am_pm" keyword is an empty string?
Disposition: If either of the two "am_pm" strings is undefined, then the %p field descriptor yields an unspecified result.

- in general, what is the effect of empty keywords?
Disposition: This is described for each keyword. By default, a missing keyword leaves the category unspecified. See Canadian comment 3.

- optional keyword support for 'era' and alternate digits is rather short-sighted in that these are 'mandatory' for far-east support.
Disposition: accepted. They are made 'mandatory'.

19. section 4.1.5.1: table 2:
- %m - change from (01-12) to (01-13)
Disposition: accepted.

- if the timezone information is application defined, per the note at the end of the table, then %Z should really be removed. The better suggestion is that the category be expanded to handle timezone and not leave it up to the application.
Disposition: accepted. A new keyword, "timezone", is introduced, with the same syntax as the "TZ" environment variable in POSIX, combined with dates indicating validity of the specifications. Several timezones may be specified.

20. section 4.1.5.2:
- %Of should be changed as per the new %f in section 4.1.5.1
Disposition: Accepted. The %Of is added with functionality parallel to %f.

21. section 6: repertoiremap is incomplete (misses, for example, the section between U06AF and U1e00).
Disposition: These characters can be identified via the scheme. A statement is added to the beginning of the repertoire clause that the list is provided to accommodate prior art.

----------

Hereby the Danish Standards vote on SC22 N2504 - CD 14652

1. The vote for CD registration is "Yes"

2. The vote on the CD ballot is "Yes" with comments.

2.1 There is a need for support of an alternate currency, such as the EURO. We propose keywords like the current ones for international currency and domestic currency, but with a "2" added to each of the keywords, and a "currency_rate" keyword for the fixed currency rate.
Disposition: accepted.

2.2 There is a need for equivalencing of weights in the LC_COLLATE specification, e.g. by a "weight_equivalence" keyword. This is to accommodate different weight naming schemes.
Disposition: accepted.

2.3 Some extra keywords in the LC_MESSAGES category should be added, such as "yesstr" "nostr" and "cancelstr"
Disposition: rejected. This is a very incomplete solution in terms of coverage of text data.

2.4 Support for ISO 2022 for extended charmap specifications is needed.
Disposition: accepted.
New keywords in the charmap definition are added:

  include %s %s %s, gset1, gset2, charmap
    includes gset1 of existing charmap as gset2 in new charmap; gset1 and gset2 can be one of "g0,g1,g2,g3,c0,c1".

  escseq %s %s %s, gset1, gset2, sequence
    Defines "gset1" of the current charmap as a "gset2" introduced by "sequence"

  addset %s, charmap
    adds the coded character sets as specified in "charmap" and selected by the escape sequences defined in that "charmap"s "escseq" keywords to the recognised set of character sets of the current charmap.

MB_CHARMAX should include both the escape and the character in question.

-----------

Japan disapproves document SC22 N2504 (CD 14652) to be registered as Committee Draft (CD).

Comments

1. The scope of this project is to specify a specification method for more cultural conventions than POSIX supports. The draft CD covers nothing more than POSIX. Therefore, the document N2504 does not satisfy the project objective.

At the project subdivision ballot, Japan asked about the difference from the POSIX locale definition method. The disposition for the comment is described in SC22 WG20 N269 (Disposition of comments received on WG20 proposal to subdivide project JTC1.22.30.01.01 to include a project on: Cultural convention specification SC22 N1574). The disposition of comment commits that this project covers more than what current POSIX does.

If there is no intention to add any more cultural conventions (more than POSIX) at the first publication, this project should be canceled.
Disposition: accepted. Major new functionality is added in the areas of transcription, sorting, currency handling, paper formats, telephone number formats, and postal address formats. Many minor items have been added, and POSIX is not looking into enhancing their standard significantly, because of lack of expertise, and because this is not their area of expertise.

2.
The SC22 WG20 N269 indicates the candidates for the "extended cultural conventions" as follows: Data input, Multi-lingual synchronization, Measuring system, Paper size and Postal address. Out of these candidates, Japan recommends adding at least Paper size, Measurement system and Postal address. In addition to that, Japan recommends considering the addition of "colour specification including colour systems and name of color" and "name of person".
Disposition: accepted in principle. Categories for paper size, postal formats and telephone formats will be added.

The paper size category "LC_PAPER" will contain information on "width" and "height" measured in millimeters.

The LC_NAME category will contain information on formatting of a name for a postal address with the position of items like "family_name" "given_name" "middle_name" "middle_initial" "given_initial" "title" "profession".

The LC_ADDRESS category will contain information on formatting of a name for a postal address with the position of items like "c-o_address" "firm_name" "department_name" "building_name" "street_name" "house_number" "room_number" "floor_designation" "country_designation" "zip_number" "city" "country". Also "country_abbreviation" will be given.

The LC_TELEPHONE category will have information on "int_prefix" "int_select" and formats for "tel_domestic" and "tel_int".

The LC_MEASUREMENT category will have a simple statement on the local measurement system.

Handling of colours is understood to be too complex at this time.

Directionality will be addressed, along the lines provided by Khaled Sherif.

3. When cultural convention specification methods beyond those of POSIX are specified, then to keep the FDCC-set compatible with POSIX, it is necessary to provide an FDCC-set specification method (a method to specify which FDCCs are included in a specified FDCC-set). Add a clause "specification method of FDCC-set".
Disposition: Accepted.
There will be conformance for each of the categories, and a category to specify which other categories are present and what they conform to.

4. It is anticipated that more cultural conventions will be added to this standard in future. There is a need for a guideline on specifying new cultural convention specification methods. Add a clause "the guideline".
Disposition: accepted. There is already a guideline on how to specify new categories in 4.1; this will be made a rule. Guidance will be added to avoid clashes with future standardized categories.

5. There are many technical and editorial comments on the document N2504. Those comments are a part of the CD ballot.

6. Confirm whether the definitions of FDCC and FDCC-set are compatible with TR 11017. There is a very high possibility that they differ from each other. If so, then align the terminology with TR 11017. (This may resolve most of the above comments.)
Disposition: Accepted. The standard is intended to be aligned with TR 11017. We will say that categories are more or less the same as an FDCC. "FDCC-set elements" will be changed to "FDCC".

------end of registration ballot comments

-----CD BALLOT COMMENTS (Japan)-----

Japan disapproves the document SC22 N2504 (CD 14652) as Committee Draft (CD) with the following comments:

J-1) General: The CD text is only a minor enhancement of a POSIX locale specification method and does not include any new categories which are declared to be investigated in SC22 WG20 N 269 -- disposition of the comments to NWI ballots.
Disposition: Accepted. New functionality is introduced as noted in the response to the Japanese registration comment 2. above.

This project should be abandoned if it would include no new categories not included in a POSIX locale specification method. The extension of the collation method should be moved to ISO/IEC 14651 in that case.

note: this is the same comment as in the CD registration ballot. See the registration ballot comment for detail.
Disposition: see response above.

J-2) p.2, FOREWORD: The paragraph

  The Standard uses text from ISO/IEC 9945-2:1993 "Information Technology - Portable Operating System Interface (POSIX) Part 2: Shell and Utilities". The major differences from this text is listed in annex A.

should be removed.
Disposition: Accepted in principle. JTC 1 directives require a statement in the foreword of the relation to other standards. The standard has the stated relation. JTC 1 directives also prescribe listing of annexes in the foreword. The sentence in question will be rewritten to specify which parts of POSIX are conformant.

J-3) p.4, 1. Scope: The sentence

  The specification is compatible with POSIX locale specifications (10), and a locale conformant to POSIX specifications will also be conformant to the specifications in this Standard, while the reverse condition will not hold.

should be changed to

  The specification is upward compatible with POSIX locale specifications(10) -- a locale conformant to POSIX specifications will also be conformant to the specifications in this Standard, while the reverse condition will not hold.

Disposition: accepted.

J-4) p.4, 2. Normative references: The following references

  (1) ISO 639 Code for the representation of names of languages
  (2) ISO 646 Information technology - ISO 7-bit coded character set for information interchange
  (3) ISO/IEC 2022 Information technology - Character code structure and extension techniques
  (4) ISO 3166 Code for the representation of names of countries
  (7) ISO/IEC 8824 Information technology - Open Systems Interconnection - Specification of Abstract Syntax Notation One (ASN.1)
  (8) ISO/IEC 8825 Information technology - Open System Interconnection - Specification of Basic Encoding Rules for Abstract Syntax Notation One (ASN.1)
  (9) ISO/IEC 9899 Information technology - Programming Language C.

should be removed because those standards are not referenced, or are referenced only in an informative part (ISO 646).
Disposition: accepted.
Moved to bibliography.

J-5) p.6, 3.1.12 collation: The text

  These rules identify a collation sequence between the collating elements, and such additional rules that can be used to order strings consisting of multiple collating elements.

should be removed because it is too detailed as a definition and it is vague -- there is no explanation of which rules are additional.
Disposition: accepted.

J-6) p.6, 3.1.17 affirmative responses: The definition should be removed because the term is understandable without definition. If it remains, the definition should be changed from:

  An input string that matches one of the responses acceptable to the LC_MESSAGES category keyword "yesexpr", matching an extended regular expression in the current FDCC-set.

to:

  A string conforming to the definition of LC_MESSAGES category keyword "yesexpr".

Disposition: accepted, with text changed as proposed.

J-7) p.6, 3.1.18 negative response: (the same comment as 3.1.17 affirmative)
Disposition: accepted, with text changed as proposed.

J-8) p.7, 3.2.1 Format of syntax descriptions: The text

  The format of each parameter is given by an escape sequence as follows:
    %s specifies a string
    %d specifies a decimal integer
    %c specifies a character
    %o specifies an octal integer
    %x specifies a hexadecimal integer
    %% specifies a single %
    \n specifies an end-of-line
  All other characters in the format string represent themselves.

should be changed to

  The format of each parameter is given by an escape sequence as follows:
    %s specifies a string
    %d specifies a decimal integer
    %c specifies a character
    %o specifies an octal integer
    %x specifies a hexadecimal integer
  All other characters in the format string except
    %% specifies a single %
    \n specifies an end-of-line
  represent themselves.

Disposition: accepted.

J-9) p.7, 3.2.3 Ellipses: The definitions here are not consistent with their expression in 5.1 Character set description file (pp.45-46).
The text here should be changed so as to match POSIX, and the explanation in 5.2 should be removed.
Disposition: Rejected. The ellipses have new functionality compared with POSIX, and there is no clause 5.2.

J-10) p.8, 4. FDCC-set: In the sentence

  This standard defines a normative FDCC-set named "i18n" with values for each of the above categories.

the word "normative" is redundant. It should be removed.
Disposition: accepted.

J-11) p.9, 4.1 FDCC-set Definition, para. "The category body ...": The restriction

  Each keyword within a FDCC-set shall have a unique name (i.e., two categories cannot have a commonly-named keyword);

should be removed because it places a heavy burden on the design of each category -- even in this draft, the keyword "copy" is defined in more than two categories.
Disposition: accepted. "FDCC-set" changed to "category".

J-12) p.9, 4.1 FDCC-set Definition: The subclauses 4.1.0.1 - 4.1.0.5 are ill-structured because they lack a direct parent subclause 4.1.0. The content of 4.1.0.5 should be moved before 4.1.0.1 without being put into a subclause, and a new subclause title "4.1.0 Pre-category lines" should be introduced before 4.1.0.1.
Disposition: accepted in principle. The heading of 4.1.0.5 is retained, though.

J-13) p.9-10, 4.1.0.3 repertoiremap: Make clear how many repertoiremap specifications are allowed in a FDCC-set.
Disposition: accepted. At most one repertoiremap per FDCC-set.

J-14) p.10, 4.1.0.4 charmap: The sentence

  For the actual use of a FDCC-set, at most one charmap may be in use, and this may be different from any charmap specified with the "charmap" line.

needs more explanation.
Disposition: accepted. Japan will provide further text on the next ballot, and the editor will provide text on the intention of the keyword.

J-15) p.11, 4.1.0.5 Character representation: Add a new rule for UCS-notation, and , which look like symbolic names but are not defined in a charmap file.
Disposition: accepted.
J-16) p.10, 4.1.0.5 Character representation: The text

  Individual characters, characters in strings, and collating elements shall be represented using symbolic names, as defined below. In addition, characters can be represented using the characters themselves, or as octal, hexadecimal, or decimal constants. When nonsymbolic notation is used, the resultant FDCC-set definitions need not be portable between systems. The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character. The following rules apply to character representation:
  (1) ...

is confusing. It should be changed to

  Individual characters, characters in strings, and collating elements shall be represented using symbolic names, UCS notation or characters themselves, or as octal, hexadecimal, or decimal constants as defined below. When constant notation is used, the resultant FDCC-set definitions need not be portable between systems.
  (0) The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to represent itself it shall be preceded by the escape character.
  (1) ...

Disposition: accepted.

J-17) p.11, 4.1.0.5 Character representation, (1): The sentence

  The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in a charmap file to be used, and shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file.

should be changed to

  The symbolic name, including the angle brackets, shall exactly match a symbolic name defined in charmap files or repertoiremap files to be used, and shall be replaced by a character value determined from the value associated with the symbolic name in the charmap file or a value associated to UCS in repertoire map files.

Disposition: accepted.
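
The lookup order implied by J-15 to J-17 can be sketched as a small resolver. This is an illustration only, not text from the standard: the function name `resolve`, the sample `charmap` dictionary, and the assumed shape of UCS notation (`<U` followed by 4 or 8 hexadecimal digits, suggested by the "Uxxxx" wording in Canadian comment 9) are assumptions made for the example, as is the choice to consult the charmap before falling back to UCS notation.

```python
import re

# Hypothetical charmap: symbolic name (angle brackets included) -> encoded bytes.
charmap = {"<A>": b"\x41", "<a-ring>": b"\xe5"}

# Assumed shape of UCS notation: "<U" + 4 or 8 hexadecimal digits + ">".
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4}|[0-9A-Fa-f]{8})>")

def resolve(name, charmap):
    """Return ("charmap", bytes) or ("ucs", code point) for a symbolic name."""
    if name in charmap:                # exact match, brackets included
        return ("charmap", charmap[name])
    match = UCS_NAME.fullmatch(name)   # looks symbolic, but needs no charmap entry
    if match:
        return ("ucs", int(match.group(1), 16))
    raise KeyError(f"undefined symbolic name: {name}")
```

Under these assumptions, `resolve("<A>", charmap)` is answered from the charmap, `resolve("<U01C8>", charmap)` is answered from the UCS notation alone, and a name that matches neither form is reported as undefined.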
J-18) p.11, 4.1.0.5 Character representation, (3)-(5): It is confusing to include concatenated constants in each of the examples without any definition. The concatenated constants should be removed from the examples of (3)-(5) and the explanation of concatenated constants should be formed as a new rule as follows:

  (6) Multibyte characters can be represented by concatenated constants specified in byte order with the last constant specifying the least significant byte of the character. Concatenated constants can include a mix of the above character representations.

Disposition: accepted.

J-19) p.11, 4.1.0.5 Character representation, end: The "Editor's note" here makes no sense. It should be removed.
Disposition: accepted.

J-20) p.12, 4.1.1.1 Basic keywords: The specification of digit

  digit  Define the characters to be classified as numeric digits. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 shall be specified, and in ascending sequence by numerical value. If this keyword is not specified, the digits 0 through 9 shall automatically belong to this class, with application-defined character values.

is ambiguous. Make clear the relation between the digits 0-9 and in table 3 portable character
Disposition: accepted. Also refer to Canadian comment 6.

J-21) p.14-16, 4.1.1.2 Character string transformation: The concept of character string transformation is not mature yet. It is of no use to specify only one transformation without any specific meaning. It should be removed.
Disposition: Accepted in principle. See response to Canadian comment 7.

J-22) p.16, 4.1.1.2 Character string transformation: The "i18n" FDCC-set is not a matter of 4.1.1.2 Character string transformation. A new subclause title "4.1.1.3 i18n-LC_CTYPE" should be added just before the line beginning with "The "i18n" FDCC-set for the LC_CTYPE ...".
Disposition: accepted. See also Canadian comment 8.

J-23) p.16, "i18n" FDCC-set: IPA characters should be removed from "toupper" and "alpha".
Presentation form characters should be removed from "alpha".
Disposition: rejected. The standard will be aligned with TR 10176.

J-24) p.23, 4.1.2 LC_COLLATE: The capabilities text

  (9) Easy reordering of characters. The "i18n" FDCC-set has a collation specification that with just a few modifications can be culturally correct for a specific culture. Here the "reorder-after" keyword gives a convenient way to modify a FDCC-set.
  (10) Easy reordering of scripts. The "i18n" FDCC-set gives an ordering of the scripts that may not be culturally acceptable in certain cultures. The keyword "reorder-script-after" gives a convenient way to modify the order of scripts in a FDCC-set.

should be changed to

  (9) Easy reordering of characters. ISO/IEC 14651 has a template for collation specification that with just a few modifications can be culturally correct for a specific culture. Here the "reorder-after" keyword gives a convenient way to modify a FDCC-set template.
  (10) Easy reordering of scripts. The template in ISO/IEC 14651 gives an ordering of the scripts that may not be culturally acceptable in certain cultures. The keyword "reorder-script-after" gives a convenient way to modify the order of scripts in a FDCC-set template.

Disposition: accepted.

J-25) p.24, 4.1.2 LC_COLLATE: Add summaries of toggling keywords -- "define", "ifdef" etc. -- just before 4.2.2.1.
Disposition: accepted.

J-26) p.25, 4.1.2.4 "order_start" keyword: The text here becomes very confusing by mixing and . The text should be changed based on two syntax forms

  "order_start %s;%s;...;%s\n", , ...

and

  "order_start %s;%s;...;%s\n", , , ...

Disposition: accepted.

J-27) p.25, 4.1.2.4, "order_start" keywords, directives "forward" and "backward": Give a definition of "substring" and add a sentence

  The direction of scanning substrings is towards the logical end of the string.

to the explanation of the directives "forward" and "backward".

note: related discussion in the past.
> > Accepted in principle.
The "forward" directive has wordings of
> > scanning towards the logical end, while the "backward" directive
> > scans towards the beginning of the string. Or was something else
> > meant, such as scanning for collating elements?
>
> Let's consider an example string "ABC123GHI" where A..Z are "backward"
> and 1..9 are "forward". In this case, three substrings "ABC", "123", and "GHI" produce the keys "CBA", "123", and "IHG" respectively from the current specification, but there is no explanation of how to combine those subkeys.
>
> The sentence
>   The direction of scanning substrings is towards the logical
>   end of the string.
> assures that those subkeys are combined in "forward" manner resulting in
> "CBA123IHG".

There is no prescribed resulting string as per the specifications; this is an implementation detail.
------end of related discussion----
Disposition: accepted.

J-28) p.32, 4.1.2.13.3 "elif" keyword: The definition is incomplete -- the effect of the preceding block is not considered.
Disposition: accepted. It will be added that this keyword will only be evaluated if none of the preceding "elif" keywords have been used.

J-29) p.33, 4.1.2.14 "i18n" LC_COLLATE category: The sentence

  The "i18n" FDCC-set LC_COLLATE category is defined in ISO/IEC 14651 (12).

should be changed to

  There is no "i18n" FDCC-set LC_COLLATE category. Instead of the default ordering, the common template for tailoring defined in ISO/IEC 14651 (12) should be used.

note: related discussion in the past.
> > Rejected. The "i18n" FDCC-set will be using the data of IS 14651.
> > When referring to the "i18n" FDCC-set sorting needs to be defined.
>
> At the Quebec meeting, we agreed not to define a "default" ordering
> in IS 14651.

Yes, but that does not influence 14652 in the way you indicate. One can always modify the i18n FDCC-set for a given culture.
------end of related discussion-----
Disposition: rejected, as a LC_COLLATE category is needed for the i18n FDCC-set to be complete.
A note will be added to say that this will be using the template.

J-30) p.36-37, 4.1.3 LC_MONETARY and IC:4.1.4 LC_NUMERIC: The default values for "mon_decimal_point" and "decimal_point" should be changed from ( = ',') to ( = '.') according to the standard

ISO 6093:1985 Information processing - Representation of numerical values in character strings for information interchange

which should be added into 3. Normative references.

Disposition: See response to Canadian comments 16 and second 16 (17).

J-31) p.37, 4.1.5 LC_TIME

The following old comments are still open:

- Need to have a specification method of date and time conventions for lunar calendars, whose year is not 365 days.
- Need to have a specification method of date and time conventions where the week is not seven days (historical Buddhist calendar).

Disposition: accepted in principle. Japan is invited to provide text. Specification to accommodate calendars with a fixed week length different from 7: one Gregorian date that has the first weekday of the alternate week, specify how many days per week and then the weekday names, and further specifications in normal time/date formats. First day of week can be specified on a calendar format. First working day, specified with ISO 8601 day numbering. Directionality of calendar representation, left-to-right or top-down, is added.

J-31) p.40, 4.1.5.2 Modified Field Descriptors: The value

d_t_fmt "<%><%><%>" -- 2 1997-10-07 10:00:01

should be changed to

d_t_fmt "<%><(><%><)><%>" -- 1997-10-07(2) 10:00:01

Disposition: rejected. The formats have the same information and the original is more aligned with existing practice (weekday first).

J-32) p.41, 5. CHARMAP: The sentence

Conforming charmaps shall support the portable character set specified in Table 3

is ambiguous.
It should be changed to

Each charmap shall support the portable character set specified in Table 3

or

A set of charmaps for a FDCC-set shall support the portable character set specified in Table 3

Disposition: accepted in principle. The charmap employed with a FDCC-set shall support the portable character set specified in table 3.

J-33) p.39, Table 2: The discussion agreed as follows:

> > > 2. The LC_TIME %f format should return "1" for the first day of the
> week, etc, and "7" for the 7th day of the week. Returning a string with a "0" for the first day of the week is misleading, and this is not used for indexing in arrays, but for display in strings.
> >
> > Accepted.
> > The newly introduced escape sequence
> > %f Weekday as a decimal number (0(Monday)-6).
> > modified to
> > %f Weekday as a decimal number (the first day 1 - the last day 7).

However, this document is:

> %f Weekday as a decimal number (1(Monday) - 7)

Is this agreeable with POSIX?

Disposition: accepted. POSIX does not have the %f specifier (yet).

-------end of the comment on the past discussion-------

J-34) p.46 5.1 Character set description File, para. "The encoding part...": If this paragraph remains (we have requested to remove the explanation of character representation in 5.1 including this paragraph already), the sentence

In a portable charmap file, each constant shall represent an 8 bit byte

should be removed because the concept of "a portable character set file" is not defined in this draft; only a portable character set is defined.

Disposition: accepted in principle: The word "portable" will be removed.

J-35) p48-75, "i18nrep" repertoire

It is not necessary to define such a confusing symbol name set in an international standard.

Disposition: the list reflects prior art, and is necessary not to invalidate a lot of existing data. See response to Canadian comment 21.

J-36) Conclude an open discussion below.

--------Start of open discussion B-------

>> 4.
What does "byte" mean in this standard? Since this standard
>> does not require any "processor", an individually addressable
>> unit of storage means nothing.
>
>The standard is meant for processing in IT environments,
>so there is always a processor behind it somewhere.

Since the document you are writing is a "Standard", please do not hide anything behind the preparations of the standard text. If a "processor" exists, you should describe the processor and the conformance of the "conforming implementation of the processor" in the conformance section. Also, you need to change the scope and title of the standard. It is not a "specification method", but a "language".

Also, please do not specify something that is helpful for something as a requirement. You should specify what are mandatory requirements for conformance, and what are allowable extensions of the standard. Please use "may" for what are allowable extensions.

I suppose the conformance clause of your standard could be POSIX or C like things, if you would like to specify the syntax of a "cultural convention language and its compiler".

Please note that I do not say "I agree with the scope change", but just say please write the standard text appropriately for the scope. Otherwise, reviewers may be confused.

> I am not sure we need to describe a processor for this,
> but we can discuss it at the next WG20 meeting.

I believe that you are an expert on the C language, right? Please carefully read the conformance section of the C language. The C standard specifies 2 different kinds of conformance. One is the conforming "processor", and the other is the conforming "application". As you may know well, an "Implementation" of the C language standard is a language "processor" of the C language, I mean a compiler. The C language standard then specifies how the conforming processor shall behave. In addition to that, the C language standard specifies how a conforming C application shall be written. That is application conformance.

My point is what is the purpose of your standard.
If you would like to specify how a cultural convention set shall be specified, I mean application conformance, the phenomenon that conforms to your standard is just a description of cultural conventions, maybe written on paper. Then you do not need to care about a "processor". But, if you would like to specify limitations and/or parameters for a cultural convention set description file "processing" system, then the preparation of your standard will look like a language standard, and you should specify both implementation conformance and application conformance.

O.K. you should be familiar with POSIX. The main portion of your standard comes from the POSIX.2 localedef utility. POSIX also specifies implementation conformance of the localedef utility itself, and application conformance for the localedef file. The limitation of maximum bytes is for the localedef utility.

> > Also, please do not specify something that is helpful for something
> > as requirement. You should specify what are mandatory requirements
> > for conformance, and what are allowable extensions of the standard.
> > Please use "may" for what are allowable extensions.
>
> The thing in question was inherited from POSIX.
> We need at least to maintain it for POSIX compatibility.

Do not be afraid. You can simply say that an implementation may extend its syntax and specify something. Then a POSIX conforming localedef file becomes conforming to your standard. Please note that the objective of POSIX compatibility is to make a POSIX conforming localedef file conform to your standard, not to make files conforming to your standard into POSIX conforming localedef files.

> I am not great expert in writing conformance clauses, but I
> hope I am learning. I at this time do not see the big difference
> between a programming language and a specification method.
> The specification method is to be interpreted by some IT system, just
> like a programming language.

Keld, please please do not say "I'm not expert in writing some part of standard".
If you say so, I need to say you should not be project editor. A project editor needs to have enough capability in writing standard text, even if he is not an expert in the subject technology area. It is a credibility problem for our WG20. Thus, I need to come back from open discussion on the sc22wg20 mailing list to personal mail.

Anyway, you should try. Without having your new draft, we can not discuss it in the next WG20 meeting. Then we can not send new text to SC22 for CD ballot. In order to send new text to SC22 immediately after the next WG20 meeting, all of the issues should be resolved in the next WG20 meeting, and a revised version needs to be prepared in the meeting.

-------end of open discussion B-------

Disposition: Accepted. The concept of "file" will be changed into "text", "byte" in the sense of input in text will be removed, and the "end of line" will be revised so that, if the last graphic character on a line is the escape character, this is the end of line. A "localedef" like utility will be added to 15435.

-- Minor editorial --

J-36) p.9, 4.1.0.2 escape_char: The sentence

All examples this standard uses "/" as the escape character, except where otherwise noted.

should be changed to

All examples in this standard uses "/" as the escape character, except where otherwise noted.

Disposition: accepted.

J-37) p.11, 4.1.1 LC_CTYPE: "in clause 3.2.5" should be changed to "subclause 3.2.3".

Disposition: accepted.

J-38) p.13, Table 1: The line "In Can also belong to" should be removed.

Disposition: accepted.

J-39) p24, 4.2.2.1 "script" keyword: "4.2.2.1" should be changed to "4.1.2.1".

Disposition: accepted (on page 22)

J-40) p24, 4.2.2.3 "collating-symbol" keyword: "4.2.2.3" should be changed to "4.1.2.3"

Disposition: accepted.

=-----------

The NNI votes NO on CD 14652 in SC22N2504. These no votes pertain to both the registration vote and the document vote. The NNI will vote yes on the CD registration when the comments under -1- and -2- have been properly resolved.
The NNI will vote yes on the CD when the comments under -1-, -2- and -3- have been properly resolved.

The NNI has the following comments:

-1- Market relevance

The way this specification has been phrased effectively limits the use of this specification to POSIX/Unix and C platforms. This market is rather small; much larger markets and existing notations for a Cultural Conventions Specification seem to have been ignored. Even in this small market this document addresses only a minor part of 9945-2 and is understood to provide a small improvement to that specification. It is unclear whether the POSIX market will accept such slight improvements to this 9945-2 standard in a separate document.

The NNI is of the opinion that WG20 has, and should have, a much broader scope than the POSIX platform and considers such a specification of limited applicability unacceptable.

The NNI suggests the following course of actions:

(a) WG20 is requested to develop a Platform and Language Independent Specification (PLIS) for a Cultural Conventions Specification (CCS-PLIS). This CCS-PLIS describes the functionality needed for a CCS without any reference to windows, files, programming languages and other implementation issues.

(b) WG20 is requested to provide implementations of this CCS-PLIS for major platforms, amongst which the Wintel, the Macintosh, the POSIX and mainframe platforms.

(c) The CCS-PLIS for POSIX is to be developed in cooperation with WG15.

It should be noted that the CCS-PLIS should be defined in such a way that for each of the platforms mentioned above conformance clauses with respect to the CCS-PLIS can be specified.

Disposition: Accepted in principle. The standard is not limited to POSIX/Unix and C, as it has been expanded to include a number of other issues not dealt with in these systems. With regard to the market targeted, it is the intention that the standard be implemented on all major platforms.
The market requirements are considered to be great as there is a huge demand for internationalization and localization specifications currently, as also witnessed by the request from JTC 1 that all IT standards be addressing internationalization issues. The standard will also include major new features, such as enhanced 10646 support, transliteration, euro support, paper size support and postal and telephone number formatting support, along with a number of smaller enhancements.

With respect to (a), WG20 has investigated the possibilities for a PLIS specification and chosen the current specification technique as the most appropriate. This PLIS is platform independent, windows independent, programming language independent, and coded character set independent, and it builds on proven technology. It uses a text definition that is implementable on virtually all platforms, and thus it meets the requirements of NNI.

(b) It is outside the terms of reference of WG20 to provide implementations, although we are aware that work is underway on major platforms for this standard.

(c) WG20 has been working with liaison from WG15 on the standard.

-2- Relation to Framework document

The relation between this document and the Cultural Dependent Items formulated in the Framework Document is unclear. The Framework Document mentions that WG20 will deliver, amongst others, specifications for the following cultural dependent items: hyphenation of words, word representations of numbers, writing directions, voice messages and postal addressing formatting. The NNI had expected that the now presented 14652 document would contain such specifications.

The NNI requests the following information from WG20:

- will these additional items be added to 14652 in the (near) future?
- if so, what will be the life expectancy of the current 14652 document?

Disposition: accepted. It is planned to add functionality to the standard, prioritized by available expertise and new requirements.
The postal address issue is addressed with the current standard. Items will be added in the future. The standard will be valid until it is withdrawn.

-3- Technical comments

(a) The lexical and syntactic structure of the files has been specified incompletely. The document cannot be understood without knowledge of 9945-2. The document mixes lexical/syntactical structure and semantics of the specification. It is requested that a complete syntactical definition is given using EBNF (ISO 14977, or a variant thereof) and that a clear separation between lexical structure, syntactical structure and semantics will be maintained in the document.

Disposition: accepted in principle. The WG20 has ensured that the document can be understood without knowledge of 9945-2. It has a clear distinction between lexical/syntactical structure and semantic structure. There is a complete syntactical description of each of the components in the standard.

(b) The definitions as given in section 3 are unclear, incomplete in some cases, over-complete in other cases and self-contradictory in a few cases. It is requested that this section is redeveloped, preferably in an axiomatic style. The document itself contains much terminology that has not been defined in section 3.

Disposition: Partially accepted. Changes have been made.

(c) The document seems to mix up the concepts of `value' and `constant'.

Disposition: rejected. We need more information from NNI to address this issue, if there is one.

(d) The numbering system used is highly inconsistent:

There are two sections 4.1.2.13.5 and two sections 4.1.2.13.6
After 4.1 follows 4.1.0.1

WG20 is requested to debug their documents before presenting them to the NBs.

Disposition: Accepted. The above mentioned numbering errors will be corrected.

(e) The relationship between this document and CD 14651 is unclear: is it for instance possible that a system conforms to 14651 and not to 14652? The relationship needs to be explained.

Disposition: accepted.
A system can conform to 14651 but not to the whole of 14652. The conformance clause is being revised. - The US National Body votes to Disapprove the CD Registration and the CD Ballot for ISO/IEC CD 14652. See comments below: General Comments Re 4.1.1 LC_CTYPE While the presence of the LC_CTYPE specification in CD 14652 is understandable, given the fact that CD 14652 is derivative from ISO/IEC 9945-2 (itself derivative from the XPG-4 specification of locale), it is inappropriate to extend the LC_CTYPE mechanism for dealing with character properties to cover the repertoire of ISO/IEC 10646. ISO/IEC 10646 specifies the *Universal Character Set*, and in the context of the Universal Character Set, character properties of the type that LC_CTYPE is concerned with are best treated as inherent to the characters. It would be correct to enumerate these properties in a standard- perhaps even in 14652, if not 10646 itself-but it is incorrect to imply, through the general FDCC-set syntax spelled out in 14652, that it is o.k. to redefine any of these properties in an FDCC-set definition, the same way that LC_MONETARY or LC_NUMERIC entries can be tailored for local cultural conventions. Character properties are *not* subject to local cultural conventions. It is *not* acceptable to redefine GREEK SMALL LETTER TAU to be uppercase, or to define CIRCLED DIGIT SIX to be punctuation, for example. Such definitions do not belong in specifications for *cultural conventions*, or if character properties must be defined there, they should at least be clearly earmarked as different from all other categories of an FDCC-set. The one obvious exception to this generality is case-mapping. Case-mapping relations do vary by language (with well-known examples for Turkish, French, and German). The specification of the LC_CTYPE "properties" and should be clearly marked as exceptional in this way. 
CD 14652 should give the default case-mapping values for the "i18n" FDCC-set, as shown, and then specify that these particular values should be redefined or overridden to obtain correct cultural specification for case-mapping for Turkish, for French, or whatever.

Disposition: rejected. There are items in the LC_CTYPE which are culturally dependent, such as for example the case mapping tables as also mentioned above, and other items may also be culturally dependent, such as whether a CIRCLED DIGIT SIX is a digit or a graphical (special) character. Some cultures may also choose to regard characters of foreign scripts as special characters, and some cultures may regard a character as a basic letter, while others regard the same character as an accented letter or a ligature, although the recommendation is that characters maintain most of their properties throughout all cultures. In general the properties of a character are thus culturally dependent.

******

Re 3.1.6 FDCC-set

The introduction of this new term seems unnecessary. The concepts presented in CD 14652 are so closely modeled on the XPG-4 notion of "locale" (except for the attempt to extend the character set coverage to 10646 and expand the concept of LC_COLLATE), that the new term obscures rather than clarifies what 14652 is about. Retention of the term "locale" or perhaps an adjectivally modified version of the term "locale" ("extended locale" ?) would be preferable.

Disposition: rejected. The term "FDCC-set" is approved terminology via the approved TR 11017.

******

Re 3.2.3 Ellipses

The introduction of distinctions between two-dot, three-dot, and four-dot ellipses seems overly complex and subject to error in use. Furthermore, the explanations, both on pages 8 and 41ff are confusing. If such distinctions between range notations must be maintained, they should be better described, with clearer examples.
Also, it is generally better practice to simply have a single range notation for a formal syntax, while maintaining clear syntactic differentiation of the elements which can form the items at each end of a range. So if the FDCC-set syntax must distinguish a range of symbols, a range of decimal values, a range of octal values, a range of hexadecimal values, and so on, the notation for "symbol", "decimal value", "octal value", "hexadecimal value", and so on should be unique and mutually exclusive, so that interpretation of the type of range does not depend on the number of dots.

Disposition: accepted in principle. The explanations and examples will be revised, but with unchanged syntax and semantics.

******

Re 4.1.2 LC_COLLATE

The syntax introduced for tailoring a collation sequence definition for cultural conventions is overly complex. It is very tightly coupled to the specific way in which a collation is defined in CD 14651, which itself is in question. A much simpler syntax has been promulgated by the Java developers to accomplish the same task, and it would be desirable to examine the alternatives before standardizing an LC_COLLATE syntax of unnecessary complexity.

Unlike most of the rest of the categories involved in an FDCC-set definition, which merely specify lists of things, the LC_COLLATE syntax introduces notions of scope, reordering, and a macro control language. Granted that reordering rules are needed for defining collations, it is unclear that all of the rest of the syntax is.

Disposition: rejected. The mechanisms used are one-line statements and then directives using prior art and tools like the C preprocessor.

Re B.1.2 LC_COLLATE Rationale

This states "The syntax for the LC_COLLATE category source is the result of a cooperative effort between representatives for many countries and organizations working with international issues, such as UniForum, X/Open, and ISO,..."
We believe that this intentionally overstates the degree of cooperative effort involved and omits the fact that there is a serious lack of consensus in the international community, both about how to define the international string ordering and how to specify a syntax for tailoring it. Major implementors of international string ordering based on 10646 disagree with the approach taken in these drafts, and the standard should not paper over those differences with misleading implications that everyone agrees about how to do it.

Disposition: rejected. This text is taken from the POSIX-2 document and it was true at the time of writing 5 years ago, and it is even more true today.

p. 74. In the rationale for LC_COLLATE, there is an estimation made that the standard covers the requirements for European languages, and that it will extend well to cover Cyrillic and Middle Eastern scripts (see below for editorial comment), and for the level 3 collation required for Chinese and Japanese. However, the standard will fail for dealing with scripts (such as Thai and Lao) that require *reordering* of characters within a string before calculating weights. That fact should be noted.

Furthermore, the standard deliberately ignores the role of combining marks in collation. Implementation of 10646 with combining marks is not well-guided by this standard. It is quite unclear how to modify an LC_COLLATE definition to take combining marks into account. If combining marks are out-of-scope for CD 14652, this should be clearly stated and be consistently carried through. If they are not out-of-scope, then the tailoring syntax for LC_COLLATE should either account for them, or CD 14652 should state clearly what the alternative approaches involving tailoring of CHARMAP or REPERTOIREMAP could be, and how they would be implemented, *with specific examples*.

Disposition: rejected. Proof of unsuitability for sorting of Thai and Lao is requested if a note should be given.
The standard addresses specification of string sorting, and the combining characters are addressed in 14651. A note on how combining characters can be handled will be added.

================================================================

Specific Technical Comments

pp. 12 & 26: and

It is unclear from either the definitions of and on page 12, or from the specification of the "i18n" FDCC-set for LC_CTYPE why certain space characters from the 10646 repertoire are not listed:

U+00A0 NO-BREAK SPACE
U+2007 FIGURE SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE

If having a property precludes a character from being included in the or types, that should be spelled out in the definition of those categories.

Disposition: accepted. The NO-BREAK exclusion will be explained, classes and are meant for finding possible break points.

*********

pp. 16-20: Bugs in the and tables

In the toupper table, the entry (,) is incorrect and should be removed.

Disposition: rejected. This is not obvious, and needs further documentation.

In the toupper table, (,) should be added.

Disposition: rejected. This is not obvious, and needs further documentation.

In the toupper table, (,) should be added.

Disposition: rejected. The characters will be considered when they both are fully included in 10646.

In the tolower table, the entry (,) has the items reversed. It should read (,).

Disposition: accepted.

In the tolower table, (,) should be added.

Disposition: rejected. This is not obvious, and needs further documentation.

**********

pp. 20-21: specification

The list of characters for 10646 differs significantly from that implemented for the Alphabetic category for Java. Insistence on maintaining a distinction, based on principled or unprincipled arguments about the alphabetic status of this or that character, will lead to implementation confusion between the Java community and those who implement based on locales derived from the "i18n" FDCC-set.
Given the importance of Java, and the fact that it has already provided a widespread, commercially significant answer to the question of which 10646 characters are alphabetic, the category in CD 14652 (if included here at all-see general comments above) should be harmonized with the Java values.

A major defect in the list is the omission of combining characters from many scripts which clearly have the alphabetic property (e.g. the combining vowel matras from Indic scripts). Such omissions would result in nonsensical specifications of alphabetic spans in such scripts, if taken seriously.

To simplify correction of the CD 14652 text for the property, here is the suggested list, as implemented in Java (divided into Alphabetic and Ideographic). (Not all unassigned subranges within these ranges are separately called out, to make this list shorter.)

#Alphabetic
0041..005A LATIN CAPITAL LETTER A.. LATIN CAPITAL LETTER Z
0061..007A LATIN SMALL LETTER A.. LATIN SMALL LETTER Z
00AA FEMININE ORDINAL INDICATOR
00B5 MICRO SIGN
00BA MASCULINE ORDINAL INDICATOR
00C0..00D6 LATIN CAPITAL LETTER A WITH GRAVE.. LATIN CAPITAL LETTER O WITH DIAERESIS
00D8..00F6 LATIN CAPITAL LETTER O WITH STROKE.. LATIN SMALL LETTER O WITH DIAERESIS
00F8..02B8 LATIN SMALL LETTER O WITH STROKE.. MODIFIER LETTER SMALL Y
02BB..02C1 MODIFIER LETTER TURNED COMMA.. MODIFIER LETTER REVERSED GLOTTAL STOP
02E0..02E4 MODIFIER LETTER SMALL GAMMA.. MODIFIER LETTER SMALL REVERSED GLOTTAL STOP
037A GREEK YPOGEGRAMMENI
0386 GREEK CAPITAL LETTER ALPHA WITH TONOS
0388..0481 GREEK CAPITAL LETTER EPSILON WITH TONOS.. CYRILLIC SMALL LETTER KOPPA
0490..0559 CYRILLIC CAPITAL LETTER GHE WITH UPTURN.. ARMENIAN MODIFIER LETTER LEFT HALF RING
0561..0587 ARMENIAN SMALL LETTER AYB.. ARMENIAN SMALL LIGATURE ECH YIWN
05D0..05F2 HEBREW LETTER ALEF.. HEBREW LIGATURE YIDDISH DOUBLE YOD
0621..063A ARABIC LETTER HAMZA.. ARABIC LETTER GHAIN
0641..0652 ARABIC LETTER FEH.. ARABIC SUKUN
0670..06D3 ARABIC LETTER SUPERSCRIPT ALEF..
ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
06D5..06DC ARABIC LETTER AE.. ARABIC SMALL HIGH SEEN
06E1..06E8 ARABIC SMALL HIGH DOTLESS HEAD OF KHAH.. ARABIC SMALL HIGH NOON
06ED ARABIC SMALL LOW MEEM
0901..0939 DEVANAGARI SIGN CANDRABINDU.. DEVANAGARI LETTER HA
093D..094C DEVANAGARI SIGN AVAGRAHA.. DEVANAGARI VOWEL SIGN AU
0958..0963 DEVANAGARI LETTER QA.. DEVANAGARI VOWEL SIGN VOCALIC LL
0981..09B9 BENGALI SIGN CANDRABINDU.. BENGALI LETTER HA
09BE..09CC BENGALI VOWEL SIGN AA.. BENGALI VOWEL SIGN AU
09D7..09E3 BENGALI AU LENGTH MARK.. BENGALI VOWEL SIGN VOCALIC LL
09F0 BENGALI LETTER RA WITH MIDDLE DIAGONAL
09F1 BENGALI LETTER RA WITH LOWER DIAGONAL
0A02..0A39 GURMUKHI SIGN BINDI.. GURMUKHI LETTER HA
0A3E..0A4C GURMUKHI VOWEL SIGN AA.. GURMUKHI VOWEL SIGN AU
0A59..0A5E GURMUKHI LETTER KHHA.. GURMUKHI LETTER FA
0A70..0AB9 GURMUKHI TIPPI.. GUJARATI LETTER HA
0ABD..0ACC GUJARATI SIGN AVAGRAHA.. GUJARATI VOWEL SIGN AU
0AE0 GUJARATI LETTER VOCALIC RR
0B01..0B39 ORIYA SIGN CANDRABINDU.. ORIYA LETTER HA
0B3D..0B4C ORIYA SIGN AVAGRAHA.. ORIYA VOWEL SIGN AU
0B56..0B61 ORIYA AI LENGTH MARK.. ORIYA LETTER VOCALIC LL
0B82..0BCC TAMIL SIGN ANUSVARA.. TAMIL VOWEL SIGN AU
0BD7 TAMIL AU LENGTH MARK
0C01..0C4C TELUGU SIGN CANDRABINDU.. TELUGU VOWEL SIGN AU
0C55..0C61 TELUGU LENGTH MARK.. TELUGU LETTER VOCALIC LL
0C82..0CCC KANNADA SIGN ANUSVARA.. KANNADA VOWEL SIGN AU
0CD5..0CE1 KANNADA LENGTH MARK.. KANNADA LETTER VOCALIC LL
0D02..0D4C MALAYALAM SIGN ANUSVARA.. MALAYALAM VOWEL SIGN AU
0D57..0D61 MALAYALAM AU LENGTH MARK.. MALAYALAM LETTER VOCALIC LL
0E01..0E2E THAI CHARACTER KO KAI.. THAI CHARACTER HO NOKHUK
0E30..0E3A THAI CHARACTER SARA A.. THAI CHARACTER PHINTHU
0E40..0E45 THAI CHARACTER SARA E.. THAI CHARACTER LAKKHANGYAO
0E47 THAI CHARACTER MAITAIKHU
0E4D THAI CHARACTER NIKHAHIT
0E81..0EAE LAO LETTER KO.. LAO LETTER HO TAM
0EB0..0EC4 LAO VOWEL SIGN A.. LAO VOWEL SIGN AI
0ECD LAO NIGGAHITA
0EDC LAO HO NO
0EDD LAO HO MO
0F40..0F81 TIBETAN LETTER KA..
TIBETAN VOWEL SIGN REVERSED II
0F90..10F6 TIBETAN SUBJOINED LETTER KA.. GEORGIAN LETTER FI
1100..1FBC HANGUL CHOSEONG KIYEOK.. GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
1FBE GREEK PROSGEGRAMMENI
1FC2..1FCC GREEK SMALL LETTER ETA WITH VARIA AND YPOGEGRAMMENI.. GREEK CAPITAL LETTER ETA WITH PROSGEGRAMMENI
1FD0..1FDB GREEK SMALL LETTER IOTA WITH VRACHY.. GREEK CAPITAL LETTER IOTA WITH OXIA
1FE0..1FEC GREEK SMALL LETTER UPSILON WITH VRACHY.. GREEK CAPITAL LETTER RHO WITH DASIA
1FF2..1FFC GREEK SMALL LETTER OMEGA WITH VARIA AND YPOGEGRAMMENI.. GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
207F SUPERSCRIPT LATIN SMALL LETTER N
2102 DOUBLE-STRUCK CAPITAL C
2107 EULER CONSTANT
210A..2113 SCRIPT SMALL G.. SCRIPT SMALL L
2115 DOUBLE-STRUCK CAPITAL N
2118..211D SCRIPT CAPITAL P.. DOUBLE-STRUCK CAPITAL R
2124 DOUBLE-STRUCK CAPITAL Z
2126 OHM SIGN
2128 BLACK-LETTER CAPITAL Z
212A..212D KELVIN SIGN.. BLACK-LETTER CAPITAL C
212F..2131 SCRIPT SMALL E.. SCRIPT CAPITAL F
2133..2138 SCRIPT CAPITAL M.. DALET SYMBOL
2160..2182 ROMAN NUMERAL ONE.. ROMAN NUMERAL TEN THOUSAND
3041..3094 HIRAGANA LETTER SMALL A.. HIRAGANA LETTER VU
30A1..30FA KATAKANA LETTER SMALL A.. KATAKANA LETTER VO
3105..318E BOPOMOFO LETTER B.. HANGUL LETTER ARAEAE
AC00..D7A3 ..
FB00..FB17 LATIN SMALL LIGATURE FF.. ARMENIAN SMALL LIGATURE MEN XEH
FB1F..FB28 HEBREW LIGATURE YIDDISH YOD YOD PATAH.. HEBREW LETTER WIDE TAV
FB2A..FD3D HEBREW LETTER SHIN WITH SHIN DOT.. ARABIC LIGATURE ALEF WITH FATHATAN ISOLATED FORM
FD50..FDFB ARABIC LIGATURE TEH WITH JEEM WITH MEEM INITIAL FORM.. ARABIC LIGATURE JALLAJALALOUHOU
FE70..FEFC ARABIC FATHATAN ISOLATED FORM.. ARABIC LIGATURE LAM WITH ALEF FINAL FORM
FF21..FF3A FULLWIDTH LATIN CAPITAL LETTER A.. FULLWIDTH LATIN CAPITAL LETTER Z
FF41..FF5A FULLWIDTH LATIN SMALL LETTER A.. FULLWIDTH LATIN SMALL LETTER Z
FF66..FF6F HALFWIDTH KATAKANA LETTER WO.. HALFWIDTH KATAKANA LETTER SMALL TU
FF71..FF9D HALFWIDTH KATAKANA LETTER A..
HALFWIDTH KATAKANA LETTER N
FFA0..FFDC HALFWIDTH HANGUL FILLER.. HALFWIDTH HANGUL LETTER I

#Ideographic
3007 IDEOGRAPHIC NUMBER ZERO
3021..3029 HANGZHOU NUMERAL ONE.. HANGZHOU NUMERAL NINE
4E00..9FA5 ..
F900..FA2D ..

Disposition: partly accepted. The 14652 will follow TR 10176 annex A.

**********

pp. 43-70: "i18nrep" repertoire file

This list is arbitrarily chosen, and the principles for characters in it are unstated. If the repertoire file is not going to correspond to one of the named and numbered subsets of ISO/IEC 10646 (and Subset 300, the BMP, would be the obvious choice), then the choice of characters in the repertoire file *must* be justified in 14652. On inspection, it is clear that many combining characters from 10646 have been omitted, but this is not done systematically or consistently. For example, combining characters for U+064B ARABIC FATHATAN .. U+0652 ARABIC SUKUN *are* included. But if so, why not GERESH, etc., for Hebrew?

Disposition: partly accepted. The list of characters corresponds to prior art on the works of POSIX locales, and it is included to facilitate reuse of locale data already in use. There will be an explanation to this effect in the rationale. See response to Canadian comment 21.

On pp. 68-69, the C0 controls are duplicated in this list. They appeared already (on page 43), with different mnemonics. This calls into question the meaning of the REPERTOIREMAP file. Are duplications of characters allowed, in which case the REPERTOIREMAP file is really a definition of the mnemonics by which characters can be referred to (e.g. and ), or is it intended to be a listing of the characters in a repertoire, in which case no duplications should be allowed?

Disposition: the repertoiremap serves both to define the repertoire and to define mnemonics.

If the intention is actually to define a repertoire, then the C1 control functions defined on page 69 should be omitted.
These are not specified by 10646 at all, and it is dangerous in 14652 to try to override the function of other standards which specify the usage of C1 controls.

Disposition: 10646 does contain the ISO 6429 control characters, through its normative inclusion of that standard.

If the intention is, rather, to just define a bunch of short mnemonics, then most of this entire listing is useless and should be omitted. Introducing mnemonics such as for GREEK SMALL LETTER XI and for CYRILLIC SMALL LETTER ZHE and for HEBREW LETTER FINAL KAF is completely confusing. A very small percentage of these mnemonics has seen widespread use in plaintext reference to accented characters. The rest should be completely abandoned in CD 14652 in favor of use of the hexadecimal value as the unique symbolic identifier for a 10646 character (e.g. ).

Disposition: see the explanation above: the list covers prior art, so that existing locale definitions can be used.

The pejorative and inaccurate note "(not a real character)" should be dropped from the listing of combining characters on pp. 69-70. Furthermore, it is completely unexplained why most of these are given user-defined character values when they are actually encoded characters in 10646. E.g. <"'> NON-SPACING ACUTE ACCENT (not a real character) must be amended to: COMBINING ACUTE ACCENT with the correct 10646 encoding and character name.

Disposition: The NON-SPACING ACUTE ACCENT etc. are characters that are placed before the base character, as in ISO/IEC 6937, and not after it, as in 10646. They thus have different semantics and are therefore different characters.

Additional technical comments

This proposal is not ready for prime time. We must coordinate this standardization effort with Java and Win32 internationalization. We need to treat XPG4 as one of the contributing standards, not as the standard being extended.

Disposition: rejected. This specification has been underway for a number of years.

A1.
The mapping of Unicode character types to POSIX LC_TYPE attributes should be specified, but doing this by using the XPG4 LC_TYPE syntax is not appropriate. These character attributes are in general not culturally specific. The base POSIX character attributes are also missing a large number of attributes needed for parsing a larger character set.

A2. There are a small number of upper/lower case conversions which are locale dependent. Even in locales with such modifications (such as Turkey) it is still necessary to have universal upper/lower functions to be able to deal with matching of names (such as file names) which are processed simultaneously in multiple locales.

A3. Cultural cases of differing case mapping should be defined as exceptions, rather than building up a complete upper/lower table. The existing POSIX locales have tended to incompleteness in the case mapping tables.

A4. Although the LC_COLLATE syntax is complex, it at least tries to address the problems of doing override collation from a base collation order. This is similar to what Java has done, but in this case the Java syntax is simpler than the 14652 proposal. If we limit the scope of what we expect this locale-based sorting to do, it is a usable compromise. Those people who need complex sorting, including numeric ordering, conversion of numerics to names, and phonetic reordering, should expect to use the locale as the basis for information but to significantly pre-process the data. For sorting file names, a universal multi-script collation with overrides for various locales is good enough.

A5. The two-letter mnemonics used in the i18nrep section are worthless. I think the best solution is to use the "meaningful" names for the basic Latin characters and punctuation, and the Unicode-based names for other characters.

Disposition: there may be categories from Unicode that could be used in the 14652 standard. The USA is invited to give text on which character classes to include.
The rest of the comments are addressed above.

================================================================

Editorial Comments

pp. 64 ff. "IDEOGRAPHIC" is consistently misspelled in the character names. If this misspelling has not been caught, then all other character names should be carefully checked against 10646 to ensure that they are exactly correct.

Disposition: accepted. The list of names will be checked with ISO/IEC 10646 annex E.

Spelling errors:

p. 11, 3rd paragraph: "depreciated" --> "deprecated"
p. 42, 2nd paragraph, last line: "an" --> "and"
p. 74: "with Slavic or Middle East character sets" should be corrected to "for Cyrillic or Middle Eastern scripts".

Disposition: accepted, if they can be found, as the references do not correspond to WG20 N528 = SC22 N2504.

-------------- end of comments received -----------------------