SC22/WG20 N883R
From: Keld Jørn Simonsen [keld@dkuug.dk]
Sent: Friday, October 05, 2001 12:19 AM
Draft for approval/Keld
Title: Disposition of comments on DTR 14652.
Source: WG20
Date: 2001-10-04
Cross reference: JTC1 N6483
Draft disposition of comments on JTC 1 N 6404, ISO/IEC DTR 14652 -
Functionality for Internationalization - Specification Method for Cultural
Conventions
The DTR 14652 ballot closed with 10 votes approving the draft
and 8 votes disapproving.
Although this ballot received a majority of approval votes, there
were a significant number of negative votes and comments from
national bodies. Therefore, the summary of voting was sent
to SC 22 for resolution of the comments and preparation of a revised DTR text.
SC22 passed a resolution where WG20 was requested to produce
a repacked DTR where the non-controversial and controversial
portions are identified.
This will be done by marking all controversial clauses of the specification,
as noted below.
The following are the dispositions of comments.
> Finland
> DISAPPROVAL OF THE DRAFT FOR REASONS ON THE ATTACHED (Please
> indicate if acceptance of these reasons and appropriate changes
> in the text will change your vote to approval)
>
> In our consultation with other national bodies on this, we have
> become aware of the forthcoming negative US comments that we
> find both relevant and appropriate. We believe that it would
> serve no useful purpose and be utmost inefficient if we were
> to reformulate them by ourselves, especially since we believe
> that it would be difficult if not impossible to try to solve
> the problems inherent in this DTR with pointed, individual
> editing instructions.
Response: The Finnish comments are implicitly responded to via
other responses, as they are defined implicitly via other comments.
The Finnish member body will be asked if they would like to
have these comments or other comments in annex D.
> France
> Please note that the acceptance of these reasons and appropriate changes
> in the text will change your vote to approval.
>
> 1-Audience expectation
>
> The objective of DTR 14652 set forth in line 148 :
>
>
> [..] that are expected to be developed for a number of programming languages
>
> cannot be reached because the DTR 14652 is kept compatible with POSIX:1996,
> as stated in its "Scope" section. POSIX:1996 architecture is not fit for
> a general and modern specification of cultural services, and its next
> revision is not expected to improve on that particular matter.
> Conversely, keeping POSIX compatibility will doubtlessly serve
> POSIX audience better, so we believe the DTR shall insists it
> belong to the POSIX culture. Therefore, we kindly request the
> line 148 to specify its audience is POSIX culture, such as :
>
>
> The descriptions are intended to be coded in text files
> to be used via Application Programming Interfaces,
> that are expected to be developed for a number of
> systems which comply with ISO/IEC 9945.
Response: accepted. Will replace the text.
>
> 2-POSIX 200X compatibility
>
> At the same time, every reasonable step must be taken to
> ensure that the DTR will accommodate POSIX 200x. We kindly
> request this effort is asserted or documented somewhere.
>
Response: Accepted. The work has been aligned with the POSIX:200x work.
POSIX 200x will be added to the normative references, and a statement that
an alignment effort has been made will be added to the scope.
>
> 3-Multiple currency
>
> The currency multiple value is not expected to be used because
> the DTR will be published after the most important dual
> currency period. It is expected that the solutions that
> will be implemented using (hopefully) POSIX until 2002-02-17
> (end of dual currency for the first round of "Euroland"
> countries) will be used in future currency-switching countries,
> in Europe or elsewhere. In short this solution arrives too late,
> and is not proven to be appropriate. We kindly request it is withdrawn.
>
Response: Other member bodies want this functionality.
The LC_MONETARY clause will be marked as controversial.
>
> 4-Iswgraph
>
> We are not sure that the DTR allows for intelligent categories, e.g.
> that one can handle multiple "space" characters, like C/C++ does
> with iswgraph. If not so, we kindly request the DTR to do so,
> in particular (but not only) for multiple space.
>
Response: The TR caters for multiple "space" characters as
C/C++ does, as the information provided via the classes is meant
to also be used by C/C++ APIs. No change.
>
> Germany
> Vote: Disapproval with comments
>
> If the comments are satisfactorily resolved, the vote will change to
> approval for this TR.
>
> For the record it should be stated that Germany will not support the
> transformation of this TR into an international standard even if all of
> these comments are resolved.
>
> Comments:
> General: The draft technical report has now reached a certain stage of
> maturity that might possibly make it useful for guidance in certain
> communities. However, it still contains a considerable number of errors and
> shortcomings, some of a systematic nature, that make it unsuitable for
> acceptance.
>
>
> * There are multiple errors in the membership of LC_CTYPE classes. For
> example, the draft introduces two new classes that are meant to be related
> to the ISO/IEC 10646-1 descriptions of combining characters. However, the
> draft has its own, somewhat peculiar interpretation of combining
> characters: Simply "combining" are, quite properly, "Characters to form
> composite graphic symbols, such as characters listed in ISO/IEC 10646:1993
> annex B.1." (l. 939f). This is, what one would intuitively understand also
> combining3 (i. e. combining characters allowed in a level 3 implementation
> of 10646, i. e. all) to mean. However, combining3 is combining -
> "combining2", i. e. minus those combining characters in a level 2
> implementation. The terminology should be adapted accordingly, e. g.
> combining for all combining characters (with combining3 as an equivalent)
> and combining2 to mean specifically those allowed in a level 2
> implementation - if combining2 is indeed needed at all.
> Most ideographs are not included in the <graph> class (why?). Also, the
> draft includes only the repertoire of 10646 as it was in 1998 and should be
> extended to cover at least 10646-1:2000.
Response: Partly accepted. Some errors have been found in the
tables for the LC_CTYPE classes. They will be corrected as follows:
Include in line 1670: U06D6
Reason: this character is in the combining class in 10646
Exclude in line 1683: U0F88..U0F89
Reason: these characters are not in the combining class in 10646
line 1263 change U20AA to U20AC
Reason: to include the Dong sign and Euro sign in class punct
Also include FFFC in class punct.
For combining classes: The draft follows the tables in ISO/IEC 10646-1:1993,
and the tables of annex B in that standard and the draft are the same.
It seems safest to closely follow the originating standard.
The keywords reflect a widely implemented industry practice.
No change.
For the ideographs: As they are in the "alpha" class, they are automatically
included in the "graph" class. But it is agreed that this would be
better documented if these characters were explicitly included.
Thus for line 1338 add Hangul and CJK: UAC00..UD7A3 U4E00..U9FA5
It was decided to follow closely a specific repertoire of IS 10646,
as also done for IS 14651 and TR 10176. The repertoire can be updated
in a future revision of this TR.
> *The changes to the monetary section that are incompatible with the current
> POSIX.2 standard (ISO/IEC 9945-2) must be removed, in particular all cases
> where it has previously only been allowed to insert one value, but now a
> semicolon-delimited list of values. This is true in particular for the
> definition of multiple national currencies.
Response: Not accepted. LC_MONETARY will be marked as controversial.
A new specification cannot be downward compatible in the sense that
older applications can use the new specifications.
This is also what is stated in the scope: it is not expected that
POSIX will be able to handle all new constructs in this specification.
> a) This breaks implementations that expect the single values defined in
> POSIX.2.
Response: 14652 is an enhancement to POSIX. Applications that use the new
format can also use the older POSIX locales. However, it is
not the intention that old applications can use the new format
without change.
> b) It does not specify which of the currencies should be selected,
> unless the "valid_from" and "valid_to" keywords are meant to be such a
> mechanism. In this case, the mechanism would be highly unsuitable
> especially for the case of the Euro where the currency co-exists over a
> period of time (now virtually over), and the correct currency sign is
> selected on a case by case basis. The Java approach of multiple locales for
> one language and locale, differing only with respect to a certain point -
> currency in this case - is far more flexible and user friendly.
Response: Yes, the "valid_from" and "valid_to" keywords are meant
to be such a mechanism. It caters for the Euro case, which may occur in the
future for other European countries switching to the Euro currency.
The Java approach is not something that is generally available in
SC22 language standards, while the specification here may be used with
more programming languages, even with Java.
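As an illustration of how an application could use validity periods to select among several currencies, here is a minimal Python sketch. The record layout, the function name and the dates are assumptions made for this example (the end date follows the French comment above); it does not reproduce the FDCC-set syntax of the DTR.

```python
from datetime import date

# Hypothetical in-memory form of a multi-currency LC_MONETARY entry.
# The field names mirror the "valid_from" / "valid_to" keywords; the
# dates are illustrative for a first-round Euro country.
CURRENCIES = [
    {"symbol": "FRF", "valid_from": date(1960, 1, 1), "valid_to": date(2002, 2, 17)},
    {"symbol": "EUR", "valid_from": date(1999, 1, 1), "valid_to": date.max},
]


def currencies_valid_on(day, entries=CURRENCIES):
    """Return the symbols of all currencies whose validity period covers day."""
    return [e["symbol"] for e in entries if e["valid_from"] <= day <= e["valid_to"]]
```

During a dual-currency period the function returns more than one symbol, so the application (or the user) still has to pick one case by case, which is the point raised in the German comment.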
> * The fixed, locale-based currency-rate must be removed, as repeatedly
> discussed in the past (ll 2275ff and also ll 6504ff). It is an unsuitable
> mechanism that will not work even for those European countries in the
> Euro-zone.
Response: the fixed currency-rate will work for the Euro-zone.
> * The changes to the LC_TIME section (section 4.7) that are incompatible
> with POSIX.2 must be undone. This includes the issue of "twelve or thirteen
> semicolon-separated" (2574f) months, whereas previously only twelve months
> were allowed. Implementations that expect exactly twelve entries here will
> break.
Response: noted. The LC_TIME section will be marked as controversial.
14652 is upwards compatible with POSIX. It allows for 13 months
because some lunar calendars prescribe this.
Old-fashioned POSIX-compliant locale parsers cannot expect to
be able to parse 14652 FDCC-sets. There are many new features that
old POSIX-compliant parsers would not understand.
One such feature is the hexadecimal ranges.
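The compatibility point can be sketched as follows: a parser written for the extended format accepts twelve or thirteen month names, whereas a strict POSIX.2 parser would accept only twelve. This is a simplified Python illustration; the function name is hypothetical, and plain quoted strings stand in for the symbolic character names used in real FDCC-sets.

```python
def parse_month_list(value):
    """Split a semicolon-separated list of quoted month names.

    Accepts 12 entries (as in POSIX.2) or 13 entries (the extension
    discussed above); anything else is rejected.
    """
    names = [item.strip().strip('"') for item in value.split(";")]
    if len(names) not in (12, 13):
        raise ValueError(f"expected 12 or 13 month names, got {len(names)}")
    return names
```

A parser that hard-codes the check as exactly twelve entries would raise an error on a thirteen-month FDCC-set, which is the breakage the German comment describes.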
> * The value of the timezone keyword (2663ff) in LC_TIME is difficult to
> see for countries that span more than one timezone. The relevant timezone
> is in any case present in the TIMEZONE environment variable.
Response: noted. The LC_TIME section will be marked as controversial.
No change.
> * The usefulness of the LC_XLITERATE category (section 4.9) has repeatedly
> been questioned. As the TR freely admits, it is suitable only to "simple
> transliteration based on substring substitution" (2938). There is often if
> not usually more than one transliteration scheme from a source script to a
> target script even within one culture. To hardcode one of these into a
> locale makes little sense. Therefore, LC_XLITERATE should be removed.
>
Response: noted. The LC_XLITERATE section will be marked as controversial.
The transliteration facilities provided by the specification are implemented
and used in some Linux implementations.
>
> Ireland
>
> Ireland votes NO on DTR 14652. In consultation with members in the
> Irish IT industry, we became aware of the forthcoming negative US
> comments. These comments are extensive and exhaustive, and it seems
> clear that this project, which has been on the books for a very long
> time indeed, still lacks consensus and technical accuracy. We do not
> feel that it would be useful to publish our own litany of what is
> wrong with this standard; rather, in this case, we consider the US
> comments to state the case quite clearly.
Response: These comments will thus be dealt with implicitly via the response
to the US comments.
> It is not clear that this matter ought to be standardized. It seems
> far more appropriate for it to be formulated and published in another
> medium, such as an RFC or a UTR.
Response: Ireland will be contacted for a possible entry in annex D.
>
> Slovenia
> Standards and Metrology Institute of Slovenia (SMIS) as a full
> member of JTC1 would like to vote "against" for the document
> ISO/IEC 14652 with the following technical comments:
>
> GENERAL: There is no consistency with existing practice in the
> technical part of the document. In particular:
Response: The document reflects the most widespread existing
practice on Unix/Linux today, measured by the number of computers
running with this behaviour.
>
> OBJECTION 1
> Section 4.1.4.1 comment_char (lines 652-653, and affecting the FDCC-set definition)
>
> Current text:
> "The comment character defaults to the number_sign "#". All examples
> in this Technical Report uses "%" as the comment character,
> except where otherwise noted."
>
> Problem and Action:
> ISO/IEC 9945-2:1992 (POSIX.2) uses the default comment_char, and for
> consistency with existing practice, this document should as well.
> Change the sentence "All examples..." to "All examples in this
> Technical Report use the default comment character." Also,
> revise the FDCC-set definition.
Response: not accepted. The specification reflects current practice on
Linux or other platforms using the GNU C compiler. Also, it does not
change any behaviour, and it would be tedious and error-prone to change
in the specification. It also reflects use in IS 14651. This use removes
some problems that occurred with some communications protocols, and
some problems with presentation in some countries.
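To show why the choice of comment and escape character does not change behaviour, here is a minimal Python sketch of a reader in which both characters are parameters. The function name and the sample source are assumptions for illustration; the real locale source grammar has more rules than this.

```python
def logical_lines(text, comment_char="#", escape_char="\\"):
    """Yield logical lines: skip comment lines, join continued lines."""
    pending = ""
    for raw in text.splitlines():
        if not pending and raw.startswith(comment_char):
            continue  # a comment line has comment_char in column 1
        if raw.endswith(escape_char):
            pending += raw[:-1]  # continuation: drop the escape, keep reading
            continue
        line = pending + raw
        pending = ""
        if line.strip():
            yield line


# Sample source using "%" and "/" as in the examples of the TR.
SOURCE = '''% a comment in the style of the TR examples
abday "Sun";"Mon";/
"Tue"
'''
```

Calling list(logical_lines(SOURCE, comment_char="%", escape_char="/")) gives the single logical line abday "Sun";"Mon";"Tue". The same reader handles a source written with the defaults "#" and "\" when called without the two keyword arguments.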
>
> OBJECTION 2
> Section 4.1.4.2 escape_char (lines 666-667, and affecting the FDCC-set definition)
>
> Current text:
> "The escape character defaults to backslash "\". All examples in this
> Technical Report uses "/" as the escape character, except where otherwise noted."
>
> Problem and Action:
> ISO/IEC 9945-2:1992 (POSIX.2) uses the default escape_char, and for
> consistency with existing practice, this document should as well.
> Change the sentence "All examples..." to "All examples in this
> Technical Report use the default escape character." Also, revise the FDCC-set
> definition.
Response: Not accepted. See response to previous comment.
>
>
> Sweden
> We find that this Technical Report of type 1 are not up-to-date with
> modern internationalisation techniques. Incremental changes are unlikely
> to result in anything sufficiently up-to-date. We therefore suggest
> that this project be discontinued. A new internationalisation format
> report could be taken up at a later date, should resources and
> sufficient consensus arise.
Response: It is acknowledged that this TR has controversial sections.
The Swedish member body is invited to contribute to annex D
if they wish.
> SE 1. MAJOR:
> There is no character encoding declaration for a FDCC set file
> itself, nor any requirement to use an encoding scheme for the
> universal character set (e.g. UTF-8). Instead there is
> essentially a limitation to POSIX so-called portable
> characters (a subset of ASCII), otherwise the encoding
> is in principle undefined ("implementation defined") and
> that cannot be relied upon. Therefore expressing some of
> the things covered by 14652, like weekday names, are needlessly
> cumbersome, using various kinds of character references.
> Instead such items should be expressed directly as
> the strings that one wishes to have output (or parsed).
Response: not accepted. This follows POSIX practice.
> SE 2. MAJOR, LC_CTYPE:
> Draft 14652 suggests to tie character properties to
> locales (FDCC sets). This will surely lead to
> inconsistencies among locales for property
> assignments for the same characters. Instead character
> properties should be defined on the universal character set (UCS).
> Together with well defined mappings between various character
> encodings and the UCS one can get consistent property assignments.
> In particular some properties may be defined only for a subset
> of the UCS characters in many locales, which works very badly
> together with programming paradigm where all character string
> processing is done on UCS strings, and other encodings are
> handled via conversion (this is the modern approach to
> character processing).
Response: not accepted. It is recommended that the
character properties of the i18n FDCC set be used, which
provides a full set of character attributes for UCS in
a specific version. But the model allows for deviance as it
is known that some deviance is needed in some cases.
It is agreed that it is advisable to do all string handling
in UCS and then convert to the actual encoding, but it
is also known that this may not always be doable, eg in embedded
systems, and the model thus allows for that. No change.
> SE 3. MAJOR, REPERTOIREMAP:
> More than 25 pages (in small print) are devoted to a
> so-called repertoiremap (clause 6), with non-mnemonic
> arcane "names" for characters. This list of names should
> be removed. Instead, for these names, for the few instances
> really needed (like invisible characters), use the code
> point number (in hexadecimal). But for the majority of
> cases use the character itself, as mentioned in the first point above.
Response: Not accepted. Clause 6 will be marked as controversial.
> SE 4. MAJOR:
> Many of the components formats presupposes a C-like API,
> using format strings with % followed by a letter. Not all
> systems may wish to use such format strings. Further,
> the character classes are insufficient for many purposes,
> assuming an "encoding independent" paradigm (which
> is assumed for standard C, but cannot be used for the
> most modern character encodings, i.e. the UCS encoding forms,
> since the UCS has many features not present in most or
> any legacy character encoding.
Response: Not accepted. No changes. No alternates were provided.
It is agreed that the syntax is C-like, but no better
syntax has been proposed, and it is too late to come up with a new
scheme at this stage. The C syntax has been well established.
This is only a TR type 1 and a better syntax may be proposed
for a revision of the TR. This is existing POSIX practice.
> SE 5. MAJOR:
> The locale (FDCC set) layout structure is very much geared
> towards having fixed premade locales. It's not geared
> towards having data in one layer and user preference selections
> in another layer. Instead these layers are mixed up, needlessly
> complicating things for users that may wish to compose their
> own "locale", i.e. formatting preferences.
Response: Not accepted. There are mechanisms for modifying data with user
preferences. It does not seem very complicated for users to
produce their own specifications, or even for programs to aid
users setting up their preferences.
> SE 6. MAJOR:
> LC_COLLATE: The format and semantics are described in 14651.
> Only a reference to that is to be made, no conflicting (as
> is presently given in DTR 14652) specification can be allowed.
Response: Not accepted. The format and semantics are not fully described in
14651; some features are missing there. The syntax
and semantics in 14651 are only described for use with the
Default template, not generally. It was decided for completeness that
LC_COLLATE be described in full in 14652.
>
> SE 7. MAJOR:
> Charmaps: charmaps are not in any way related to 'locales'
> (FDCC sets), and locales should thus never specify any charmap.
> Any character encoding can occur for any locale (compare XML
> and its 'encoding' pseudo attribute). Further the "xliterate"
> 'category' seems to be more related to character mapping
> fallbacks than to real transliteration.
Response: Not accepted. It is agreed that charmaps should be seen as orthogonal
to the rest of the FDCC set description, but it is a required
element to make FDCC sets work in many environments.
It is agreed that 'xliterate' can be used for character mapping,
but it can also be used for many common transliteration purposes.
The LC_XLITERATE clause will be marked as controversial.
>
> United Kingdom
> Technical Comments
>
> The work is still premature and does not represent any industry
> practice. Although some useful possibilities are documented which
> would benefit from further development, many of these should not be
> considered for development within a standard, nor in a technical report
> which can be referred to normatively.
Response: The work at hand is in most parts implemented
in the Linux operating system, as part of the GNU C and
C++ compilers, so it represents widespread industry practice.
It is thus relevant to document this practice. The area is a good
candidate for standardization as an interface is needed to supply
information in the area, and to be able to process the information.
Several clauses will be marked as controversial, and the UK is invited
to contribute to annex D.
> In addition, there has been sustained opposition to this from various
> industry sources who participate in ISO/IEC JTC1/SC22.
Response: True, but there has also been a majority behind issuing the
Technical Report.
> Given the lack of consensus, this item should be withdrawn from the
> ISO/IEC JTC1/SC22/WG20 work programme. It may be useful for this to be
> developed in other fora, e.g. some Linux development groups, but it
> should not be developed further in ISO/IEC JTC1/SC22/WG20.
>
Response: There is consensus, as defined by JTC 1 rules, to approve
this as a TR. JTC 1 and SC22 have instructed WG20 to go forward with
the TR for a second DTR ballot.
> There are also errors in this draft which have not been corrected to
> take account of previous comments in the meetings. The editorial
> comments below list just a few of these.
Response: All comments on the earlier drafts have been addressed
through dispositions of comments that have been approved by WG20,
and the resulting changes have been applied by the editor.
> Editorial comments
>
> There remain errors in new tables in LC_CTYPE classes: these use
> different descriptions and different character groups than those
> defined in ISO/IEC 10646-1:2000.
Response: An effort was made to correct this in DTR 1.
The U.K. is invited to come forward with a list of changes
in this respect.
> In the monetary and time sections, (a) definitions of multiple
> currencies are introduced, which conflict with implementations which
> anticipate only the single values defined in the POSIX.2 standard
> ISO/IEC 9945-2, and (b) different definitions for the number of
> months, and the start day of the week, are introduced.
Response: Not accepted. The specification cannot be backward compatible,
when new issues are handled. The POSIX standard has no way of
understanding more than one currency nor a year with more than 12 months.
The clauses are marked as controversial.
> The LC_XLITERATE section for character transliteration does not include
> the corrections suggested at previous meetings of ISO/IEC
> JTC1/SC22/WG20. The conversions proposed are somewhat idiosyncratic, and
> do not represent any consensus for conversion within ISO/TC46/SC2
> (Conversion of Written Languages) which develops standards on
> transliteration, and other alternative transliteration conventions are
> not catered for.
>
Response: Partially accepted. There have previously been some corrections proposed
wrt transliterations and they have all been accommodated.
The section will be described more like character string substitution
which can be used for culturally dependent issues like transliteration
and fallback. The clause is marked as controversial.
> United States
>
> OBJECTION #1
> Section 4.1.4.1 comment_char (lines 652-653, and affecting the FDCC-set
> definition)
>
> Current text:
> "The comment character defaults to the number_sign "#". All examples in this
> Technical Report uses "%" as the comment character, except where otherwise
> noted."
>
> Problem and Action:
> ISO/IEC 9945-2:1992 (POSIX.2) uses the default comment_char, and for
> consistency with existing practice, this document should as well. Change the
> sentence "All examples..." to "All examples in this Technical Report
> use the default comment character." Also, revise the FDCC-set definition.
Response: not accepted. The document reflects widespread existing
practice, which enhances the portability of the specifications.
See also response to Slovenian comments.
>
> OBJECTION #2
> Section 4.1.4.2 escape_char (lines 666-667, and affecting the FDCC-set
> definition)
>
> Current text:
> "The escape character defaults to backslash "\". All examples in this
> Technical Report uses "/" as the escape character, except where
> otherwise noted."
>
> Problem and Action:
> ISO/IEC 9945-2:1992 (POSIX.2) uses the default escape_char, and for
> consistency with existing practice, this document should as well. Change the
> sentence "All examples..." to "All examples in this Technical Report
> use the default escape character." Also, revise the FDCC-set definition.
Response: not accepted. The document reflects widespread existing
practice, which enhances the portability of the specifications.
See also response to Slovenian comments.
>
> OBJECTION #3
> Section 4.2 LC_IDENTIFICATION (lines 698-777)
>
> Problem:
> The text defines a list of properties for an FDCC-set, and states that
> "All keywords are mandatory unless otherwise noted." (lines 701-702) However,
> at lines 728-729, it states "If information required for any of the
> mandatory keywords above is not available, then the corresponding string
> is an empty string." Further, the i18n LC_IDENTIFICATION section defined
> at lines 748-777 contains empty strings for six `mandatory' keywords.
>
> This is confusing. What the text is trying to say is that certain keywords
> must be present, as opposed to requiring that values be assigned to certain
> keywords. But when most people think of "mandatory", they think of it in
> terms of values, not keywords. Besides, what is the rationale of requiring
> that certain keywords be present, but NOT requiring that they include a
> value? If values are not required, they are not mandatory.
>
> Action:
> Make the following changes.
>
> 1. Change the sentence "All keywords are mandatory..." to "Values must be
> supplied for all keywords, unless otherwise noted."
>
> 2. Add the sentence "This keyword is optional." to the description of
> keywords email, tel, fax, language, and territory.
>
> 3. Remove the sentence at lines 728-729 ("If information required for
> any of the mandatory keywords...").
Response: accepted.
> OBJECTION #4
> Section 4.3 LC_CTYPE (lines 787-788 and 817-821 and affecting
> Section 4.3.2 "i18n" LC_CTYPE category)
>
> Current wording:
> "The double increment hexadecimal symbolic ellipses ("..(2)..") works
> like the hexadecimal symbolic ellipses, but generates only every other
> of the symbolic character names. As an example. <U01AC>..(2)..<U01B2>
> is interpreted as the symbolic character names <U01AC>, <U01AE>, <U01B0>,
> and <U01B2>, in that order."
>
> Problem:
> This type of symbolic ellipses allows an FDCC-set author to save a little
> typing for some scripts if letters for those scripts are arranged in a code set
> in uppercase/lowercase pairs. Using this type of ellipses, the author can
> indicate a start and end point for a range, and pick up every other
> entry.
>
> The problem is that this is extremely confusing, especially considering
> that there already are three other types of ellipses. It will be extremely
> easy for authors to make mistakes, and difficult to implement and maintain
> all these variations. The work saved by adding this type of ellipses is
> overshadowed by the implementation, maintenance, readability, and
> potential for mistakes that it adds.
>
> Action:
> Remove lines 817-821. Remove the reference to double increment
> hexadecimal symbolic ellipses in lines 787-788. Change the entries
> in Section 4.3.2 to eliminate usage of this type of ellipses.
Response: accepted. Text will be removed and related tables will
be modified accordingly.
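For reference, the expansion that the double increment ellipsis performs can be sketched in a few lines of Python. The function name is hypothetical; the example reproduces the one quoted from the draft.

```python
import re

# Symbolic UCS character names of the form <Uxxxx> (4 to 8 hex digits).
_UCS_NAME = re.compile(r"<U([0-9A-F]{4,8})>")


def expand_double_increment(start, end):
    """Expand <Uxxxx>..(2)..<Uyyyy> into every other symbolic name."""
    first = int(_UCS_NAME.fullmatch(start).group(1), 16)
    last = int(_UCS_NAME.fullmatch(end).group(1), 16)
    return [f"<U{cp:04X}>" for cp in range(first, last + 1, 2)]
```

expand_double_increment("<U01AC>", "<U01B2>") yields <U01AC>, <U01AE>, <U01B0> and <U01B2>, in that order. With the ellipsis removed, the tables would list such names explicitly.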
>
> EDITORIAL #5
> Section 4.3.1 Character classification keywords (line 834)
>
> Problem and Action:
> Grammar; change existing text to "...the interpreting system provides
> them if missing and accepts them silently..."
Response: Not accepted. This is wording from 9945, and it is not foreseen that
the wording creates problems.
>
> OBJECTION #6
> Section 4.3.1 Character classification keywords (lines 855-857)
>
> Current wording for digit class:
> "Define the characters to be classified as numeric digits. Digits
> corresponding to the values 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be
> specified in groups of 10 digits,..."
>
> Problem:
> The text was not quite accurate in POSIX.2, and it definitely is not
> accurate here. The first sentence is copied from POSIX.2, but in that
> standard, *only* the portable digits 0-9 could be specified. This proposal
> extends the definition, but only allows decimal digits. The restriction
> should be spelled out.
>
> Action:
> Change the first sentence to "Define the characters to be classified as
> decimal digits."
Response: accepted.
>
> OBJECTION #7
> Section 4.3.1 Character classification keywords (line 867)
>
> Problem and Action:
> Incorrect class name; change "digits" to "digit".
Response: accepted.
>
> OBJECTION #8
> Section 4.3.1 Character classification keywords (lines 869-878)
>
> Current wording for "outdigit" class:
> "Define the characters to be classified as numeric digits for output from an
> application, such as to a printer or a display or a output text file. Digits
> corresponding to the values <0>, <1>, <2>, <3>, <4>, <5>, <6>, <7>, <8>,
> and <9> can be specified, and in ascending order of the values they
> represent. The intended use is for all places where digits are used for
> output, including numeric and monetary formatting, and date and time
> formatting. Only one set of 10 digits may be specified. If this keyword is
> not specified, the digits 0 through 9 of the portable character set
> automatically belong to this class, with application-defined character
> values..."
>
> Problem:
> This keyword as defined is insufficient for its stated use. Assume someone
> wants to define Roman numerals for use in dates. Since only the values 0-9
> can be specified, there is no way to list the Roman numerals X, XI, and XII
> for the 10th-12th months. Or suppose someone wants to write Chinese monetary
> values. There is a single character for "ten", a single character for
> "hundred", and so on. To express 10, you use the "ten" character; to
> express 20, you use the "two" character plus the "ten" character (two 10s).
> The outdigit keyword does not allow for the Chinese "ten" or "hundred"
> (and so on) characters, and so does not fulfill the intended use for
> "all places where digits are used for output, including numeric and
> monetary..."
>
> Action:
> Remove this keyword since it does not satisfy the stated need.
Response: not accepted. Roman numerals are not digits, and neither are
Chinese numerals. Add "decimal" before digits to clarify this.
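The intended use of a single set of ten decimal output digits can be sketched as a plain one-to-one substitution. This Python example is an illustration only; the function name is hypothetical and the Arabic-Indic digits are chosen merely as one possible set of ten.

```python
# One set of ten decimal digits, in ascending order of value.
# Here: ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669), as an example.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))


def apply_outdigit(text, outdigit):
    """Replace the portable digits 0-9 in text by the given output digits."""
    if len(outdigit) != 10:
        raise ValueError("outdigit must contain exactly 10 digits")
    return text.translate(str.maketrans("0123456789", outdigit))
```

Because the mapping is digit for digit, it covers positional decimal notation only. Additive or multiplicative numeral systems such as Roman or Chinese numerals need a different mechanism, which is why they fall outside this keyword.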
>
> OBJECTION #9
> Section 4.3.1 Character classification keywords (lines 902-905)
>
> Current wording in description of "xdigit" class:
> "...If this keyword is not specified, the digits <0> through <9>, the
> uppercase letters "A" through <F>, and the lowercase letters <a> through
> <f>, automatically belong to this class, with application-defined
> character values..."
>
> Problem:
> As written, this is different from the POSIX.2 requirement that the xdigit
> class must contain the portable digits 0-9 and the portable letters A-F
> and a-f. This only says that if the keyword is not specified, these
> portable characters are included, but with this text, a person could
> write an xdigit class that included only Hindi digits and some subset
> of Greek letters, and it would be legal. This is inconsistent with
> POSIX.2, and therefore must be changed.
>
> Action:
> Remove the clause "If this keyword is not specified," from the sentence
> beginning at line 902. The revised sentence will read "The digits <0>
> through <9>..."
>
> Also note that "A" in the sentence should be <A>.
Response: accepted.
>
> OBJECTION #10
> Section 4.3.1 Character classification keywords (lines 929-932)
>
> Current wording in tolower description:
> "...If this keyword is specified,
> the uppercase letters <A> through <Z>, and their corresponding lowercase
> letter, are specified. If this keyword is not specified, the mapping is the
> reverse mapping of the one specified for toupper."
>
> Problem:
> The description is incorrect for what happens when the keyword is
> specified. This is what happens if the keyword is NOT specified.
> However, the sentence (if fixed) still would be unnecessary because
> the second sentence "If this keyword is not specified, the mapping is
> the reverse..." implies that <A> to <Z> will be included.
>
> Action:
> Remove the sentence on lines 929-931 ("If this keyword is specified,...")
Response: not accepted. The sentence states that, when the keyword is
specified, A-Z is always included.
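Note (illustration only, not DTR text): the fallback quoted above, where
tolower is the reverse mapping of toupper, can be sketched as follows. The
sample table covers <a> through <z> only, and the rule for two characters
sharing one uppercase letter is an assumption; the quoted wording does not
address that case.

```python
# Hypothetical illustration: deriving tolower when only toupper is specified.
# TOUPPER maps lowercase -> uppercase for the sample range <a>..<z>.

TOUPPER = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}


def derive_tolower(toupper):
    """Return the reverse mapping of toupper (uppercase -> lowercase)."""
    tolower = {}
    for lower, upper in toupper.items():
        # Assumption: if two characters share an uppercase form, keep the first.
        tolower.setdefault(upper, lower)
    return tolower


TOLOWER = derive_tolower(TOUPPER)
print(TOLOWER["A"], TOLOWER["Z"])
```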
>
> OBJECTION #11
> Section 4.3.1 Character classification keywords (lines 933-946)
>
> (and see also Section 4.3.2 "i18n" LC_CTYPE category [class "combining" and
> class "combining_level3"; lines 1664-1694])
>
> Current wording for "class" class:
> "Define characters to be classified in the class with the name given in the
> first operand, which is a string. This string only contains characters of the
> portable character set that either has the string "LETTER" in its description,
> or is a digit or <hyphen-minus> or <low-line>. The following operands are
> characters. This keyword is optional. The keyword can only be specified
> once per named class. The following two names are recognized:
>
> combining Characters to form composite graphic symbols, such
> as characters listed in ISO/IEC 10646:1993 annex B.1.
> combining_level3 Characters to form composite graphic symbols, that
> may also be represented by other characters, such as
> characters listed in ISO/IEC 10646-1:1993 annex B.2."
>
> And also current wording from the "i18n" FDCC-set definition, lines 1664-1694:
> "% The "combining" class reflects ISO/IEC 10646-1 annex B.1
> % That is, all combining characters (level 2+3).
> class "combining" /
> <U0300>..<U036F>; <U20D0>..<U20FF>; <UFE20>..<UFE2F>;/
> <U0483>..<U0486>;<U0591>..<U05A1>;<U05A3>..<U05B9>;/
> <U05BB>..<U05BD>;<U05BF>;<U05C1>;<U05C2>;<U05C4>;<U064B>..<U0652>;<U0670>;/
> <U06D7>..<U06E4>;<U06E7>;<U06E8>;<U06EA>..<U06ED>;<U0901>..<U0903>;<U093C>;/
> <U093E>..<U094D>;<U0951>..<U0954>;<U0962>;<U0963>;<U0981>..<U0983>;<U09BC>;/
> ...
> <U0F97>;<U0F99>..<U0FAD>;<U0FB1>..<U0FB7>;<U0FB9>;<U302A>..<U302F>;/
> <U3099>;<U309A>;<UFB1E>
> %
> % The "combining_level3" class reflects ISO/IEC 10646-1 annex B.2
> % That is, combining characters of level 3.
> class "combining_level3"; /
> <U0300>..<U036F>;<U20D0>..<U20FF>;<U1100>..<U11FF>;<UFE20>..<UFE2F>;/
> <U0483>..<U0486>;<U0591>..<U05A1>;<U05A3>..<U05AE>;<U05C4>;/
> <U05AF>;<U093C>;<U0953>;<U0954>;<U09BC>;<U09D7>;<U0A3C>;/
> <U0A70>;<U0A71>;<U0ABC>;<U0B3C>;<U0B56>;<U0B57>;<U0BD7>;<U0C55>;<U0C56>;/
> <U0CD5>;<U0CD6>;<U0D57>;<U0F39>;<U302A>..<U302F>;<U3099>;<U309A>"
>
> Problem:
> I've quoted a lot of original text here, because this is a confusing
> problem. I could not understand from the description what the classes
> were supposed to be for, so I looked at the i18n FDCC-set example.
>
> It turns out the description and definition of the two combining
> classes is exactly backward. ISO 10646 defines three levels:
>
> Level 1 -- most restrictive; shall not contain any characters listed
> in Annex B.1
> Level 2 -- less restrictive; shall not contain any characters listed
> in Annex B.2
> Level 3 -- least restrictive; can contain any coded character.
>
> The members listed of the classes in the FDCC-set, however, do not
> match the definitions. What is called combining_level3 is the group
> of characters that canNOT appear in a Level 1 or 2 implementation. What
> is called "combining", and described as being "all combining characters
> (level 2 + 3)", actually is the list of characters that canNOT
> appear in a Level 1 implementation.
>
> Action:
> These classes do not exist in other standards and are so ill-defined that
> it is impossible to say what characters are supposed to be defined in
> which class. Remove lines 933-946 and lines 1664-1694 from the draft.
Response: not accepted. The classes exist in another standard, namely
IS 10646. The draft faithfully reflects the definition in that standard,
as given in its annex B. The LC_CTYPE keywords class, map and width
will be marked as controversial.
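Note (illustration only, not DTR text): membership of a class written in the
list notation quoted above can be checked mechanically. The helper name
expand_class is hypothetical. It is assumed that a trailing "/" continues the
list on the next line, as the quoted excerpt suggests. Only the first quoted
line of the "combining" class is used, because the remainder is elided in the
comment.

```python
import re

# Hypothetical helper: expand a character list such as
# "<U0300>..<U036F>;<U05BF>" into a set of code points.
_ITEM = re.compile(r"<U([0-9A-Fa-f]{4,8})>(?:\.\.<U([0-9A-Fa-f]{4,8})>)?")


def expand_class(spec):
    """Parse semicolon-separated <Uxxxx> items and <Uxxxx>..<Uyyyy> ranges."""
    # Assumption: a trailing "/" is the line-continuation character.
    joined = "".join(line.strip().rstrip("/") for line in spec.splitlines())
    members = set()
    for item in filter(None, (part.strip() for part in joined.split(";"))):
        match = _ITEM.fullmatch(item)
        if match is None:
            raise ValueError("unrecognised item: " + item)
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        members.update(range(first, last + 1))
    return members


# First line of the "combining" class as quoted in the comment.
combining = expand_class("<U0300>..<U036F>; <U20D0>..<U20FF>; <UFE20>..<UFE2F>;/")
print(0x0301 in combining, 0x0041 in combining)
```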
>
> OBJECTION #12
> Section 4.3.1 Character classification keywords (lines 947-955)
>
> Current wording in width description:
> "Define the column width of characters, for example for use of the C
> function wcwidth(). The operands are first a list for characters, possibly
> using various ellipses, and semicolon separated, then a <colon>, and then
> the width of these characters given as an unsigned positive integer. Such
> width-lists separated by <semicolon> may be given for the various widths.
> The default value of width of characters in class "cntrl" and class
> "combining" is 0, else the default value of width is 1. A width for a
> character may be overridden by a WIDTH specification in a charmap. This
> keyword is optional."
>
> Problem:
> This description is very confusing. What does it mean that a "...width
> for a character may be overridden by a WIDTH specification in a charmap"?
> Does that mean if it's one thing in the charmap and another in the
> FDCC-set, the charmap wins? Why should width specifications be in two
> places?
Response: Partially accepted. This is what it means. It seems that the wording is
clear enough, as the text has been understood correctly. The reason for a
mechanism to override a default is that in many cases the default
will suffice, while there are some exceptions to this rule.
It is thus efficient to have a place to specify a default, and
places to specify exceptions. Keyword width will be marked as controversial.
> Also, this class is quite different from other LC_CTYPE classes. For other
> classes, one lists which characters are in that class, or a one-to-one
> mapping between uppercase and lowercase. This is different; you list a
> group of characters, and then define what value their width is. Each
> character in this class can have a different value, as opposed to other
> classes where it simply is a Boolean function -- if you're listed, you're in.
Response: yes, this is a correct observation. The specification is different
because it is a different subject.
> This class is confusingly-defined, and seems out-of-place in the
> Boolean-oriented LC_CTYPE section.
>
> Action:
> Remove lines 947-955.
Response: Not accepted. The specification satisfies a clear objective, and
it is well understood, as also indicated in the comments above.
LC_CTYPE also contains mappings to other characters, such as toupper and
tolower, and with this keyword it contains mappings to integers.
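Note (illustration only, not DTR text): the lookup order described in the
responses to this objection can be sketched as below. A charmap WIDTH is
consulted first, then the LC_CTYPE width keyword, then the default of 0 for
classes "cntrl" and "combining" and 1 otherwise. The function name
resolve_width and the sample data are assumptions.

```python
# Hypothetical sketch of the width lookup order: charmap WIDTH, then the
# LC_CTYPE width keyword, then the class-based default.


def resolve_width(ch, cntrl, combining, lc_ctype_width, charmap_width):
    """Return the column width of ch under the rules quoted from the draft."""
    if ch in charmap_width:          # a charmap WIDTH overrides everything
        return charmap_width[ch]
    if ch in lc_ctype_width:         # explicit width-list in the FDCC-set
        return lc_ctype_width[ch]
    if ch in cntrl or ch in combining:
        return 0                     # default for "cntrl" and "combining"
    return 1                         # default for all other characters


cntrl = {"\t", "\n"}
combining = {"\u0301"}               # COMBINING ACUTE ACCENT
lc_ctype_width = {"\u4e00": 2}       # sample entry, not from the DTR
charmap_width = {}

print([resolve_width(c, cntrl, combining, lc_ctype_width, charmap_width)
       for c in ("a", "\u0301", "\u4e00")])
```

With this ordering the FDCC-set carries the general rule and the charmap
carries only the exceptions that depend on the coded character set.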
>
> OBJECTION #13
> Section 4.3.1 Character classification keywords (lines 956-973)
>
> Problem:
> The map keyword is poorly described. According to Annex A, it is supposed
> to provide the functionality associated with the C library function
> towctrans(), but that's not clear from the text here ("Define the
> mapping of characters." What?).
>
> Action:
> Either remove this keyword, or rewrite the description to make it
> clearer that this is designed to allow mapping of one type of characters
> to another, related type. For example, you might want to map hiragana
> to katakana. Or Hindi digits to portable digits. Etc.
Response: Partially accepted. The specification will be clarified to state
that it maps characters to other characters, as is also described three
lines further ahead. Keyword map will be marked as controversial.
>
> OBJECTION #14
> Section 4.3.1 Character classification keywords (lines 975-1002)
>
> Problem:
> The mapping table of character class combinations duplicates information in
> POSIX.2 without adding any new data about classes included in this
> document.
>
> Action:
> Either remove the table completely, since the information already is
> available in another standard, or update it to include combination information
> about classes added for this document.
Response: not accepted. It is the intention of 14652 to carry all
information pertinent to i18n that was also included in IS 9945.
>
> OBJECTION #15
> Section 4.3.2 "i18n" LC_CTYPE category
>
> Problem:
> The membership of classes is inconsistent and confusing. With a few
> exceptions, it should match the classifications in the Unicode standard,
> where the classes/properties are comparable. Right now, class memberships
> are similar, but not identical to, comparable Unicode classes. For
> example:
>
> * the digit class includes a large group of digits that Unicode
> also identifies as being decimal, but is missing these groups:
>
> Myanmar (U1040..U1049)
> Ethiopic (U1369..U1371)
> Khmer (U17E0..U17E9)
> Mongolian (U1810..U1819)
> Fullwidth (UFF10..UFF19)
>
> Why should these be omitted, when the others are included?
Because they are not in the covered repertoire, except for
the fullwidth digits, which will be added.
> * the space class includes many of those that Unicode identifies
> as being space, but is missing:
>
> U00A0 -- No-Break Space
> U2007 -- Figure Space
> U202F -- Narrow No-Break Space
These spaces are not considered part of the space class, as they
are spaces that are regarded as graphical characters: they are
nonremovable, e.g. they cannot be removed at line breaks, and cannot
be used for word breaks. This is aligned with some existing practice.
> Note that this class also has several control characters, like <tab> and
> <carriage-return>, that Unicode does not consider part of the space class.
> However there is much existing practice on POSIX-based systems for
> including those controls, so it is understandable why they are here.
>
> * the punct class includes some, but not all, characters that Unicode
> identifies as being punctuation. For example:
>
> + it includes U2030..U2046, which are in the Unicode general punctuation
> block, but omits
>
> U2048 -- Question Exclamation Mark
> U2049 -- Exclamation Question Mark
> U204A -- Tironian Sign Et
> U204B -- Reversed Pilcrow Sign
>
> These also are in the general punctuation block.
Yes, but they are not in the covered repertoire.
> + it includes the currency symbols in the range U20A0..U20AA, but
> omits these other currency symbols in the same block:
>
> U20AB -- Dong Sign
> U20AC -- Euro Sign
> U20AD -- Kip Sign
> U20AE -- Tugrik Sign
> U20AF -- Drachma Sign
U20AD, U20AE and U20AF are not part of the repertoire covered.
U20AB and U20AC are in the repertoire, and will be added.
> + unlike Unicode 3.0, it includes most of the "Letterlike Symbols"
> from the range U2100..U213A in the punct class. This includes
> characters like U210B (Script Capital H), U2115 (Double-Struck
> Capital N), etc., but omits those that happen to have the word
> "LETTER" in their name; e.g.,
>
> U210C -- Black-Letter Capital H
> U2111 -- Black-Letter Capital I
>
> This range also omits U2139 (Information Source), and U213A
> (Rotated Capital Q), which are also in this Letterlike Symbols
> block.
>
> It's not clear why any in this range are included in punct, but
> the particular subset of characters listed is even more confusing.
U2139 and U213A are not in the covered repertoire.
> There are many more differences between this i18n FDCC-set and
> Unicode, but the point is that the differences exist. This document
> should use the Unicode values where they exist instead of inventing
> another group of classifications that differ in dozens of small ways.
>
> Action:
> Revise the membership of all classes to match the lists Unicode provides,
> where they exist. HOWEVER, in the few cases where the common practice in
> POSIX systems differs from Unicode (for example, including some control
> characters in the space class), retain that existing practice for
> members of the portable character set.
>
> Note, too, that 14652 defines some classes for which there are no
> matching Unicode properties. Obviously, in these cases, the i18n FDCC-set
> cannot match Unicode.
Partially accepted, with changes as indicated above.
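Note (illustration only, not DTR text): the Unicode side of the comparison
requested in this comment can be looked up with Python's unicodedata module.
The module reflects the Unicode version bundled with the interpreter, not
Unicode 3.0, so the output is indicative only. The Ethiopic range is left out
of the sample because its classification may differ between Unicode versions.
The name general_categories is hypothetical.

```python
import unicodedata

# Code points named in the comment, grouped by the class under discussion.
QUERIED = {
    "digit": [0x1040, 0x17E0, 0x1810, 0xFF10],
    "space": [0x00A0, 0x2007, 0x202F],
    "punct": [0x20AB, 0x20AC, 0x20AD, 0x20AE, 0x20AF],
}


def general_categories(code_points):
    """Map each code point to its Unicode general category (e.g. 'Nd', 'Zs')."""
    return {cp: unicodedata.category(chr(cp)) for cp in code_points}


for name, cps in QUERIED.items():
    cats = general_categories(cps)
    print(name, sorted(set(cats.values())))
```

A general category says nothing about whether a character belongs to the
repertoire covered by the i18n FDCC-set, which is the criterion applied in
the responses above.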
>
> Section 4.4 LC_COLLATE
>
> This is a placeholder for the content of Section 4.4 (LC_COLLATE).
> See TECHNICAL #61, TECHNICAL #62 and TECHNICAL #63 later in this document.
>
>
> OBJECTION #16
> Section 4.5 LC_MONETARY (entire section)
>
> Problem:
> This section includes multiple keywords that were defined in POSIX.2,
> but it changes their definitions in such a way that existing applications
> would be invalid. This is incorrect. The changes allow the rules for
> multiple currencies to be specified in existing keywords, but in POSIX.2,
> only rules for single currencies can be defined.
Yes, this is an enhancement of POSIX.2.
> While the need to handle multiple currencies is real, the method defined
> here is significantly different than what has been done when other
> LC_ categories have had to be extended. When expanding LC_TIME to allow
> for multiple calendars, new keywords were added (era, era_year, etc.),
> rather than simply tacking new entries on to the end of existing keywords.
No, this is not different from the way other categories have been
extended. This is an extension to the local currency symbol, for cases
where there is more than one symbol at one time, or where there are
different symbols at different times.
> Consider the previously existing LC_MONETARY keyword currency_symbol.
> It is defined in POSIX.2 as "The string that shall be used as the local
> currency symbol," while here it is defined as "One or more strings
> separated by semicolons that are used as the local currency symbol." (lines
> 2293-2294). Assume I'm defining French currency and the euro. I might
> have something like this:
>
> currency_symbol "<F>";"<euro>"
>
> However, the description of this category no longer is correct -- these are
> not strings "that are used as the local currency symbol". That implies the
> two strings are synonyms for each other. The reality is that these are
> strings that represent different currencies used for this locale. They
> should not be glommed together in one keyword. It would be more accurate to
> separate these (and all other keywords that in this draft can take multiple
> values) into something like
>
> currency_symbol "<F>";
> alt_currency_symbol "<euro>";
The <F> and <euro> symbols are both local symbols.
The euro is not an alternate currency symbol, but
an additional local currency symbol, which is just as
necessary as the first mentioned.
If the proposed extension scheme were followed, an
indefinite number of keywords would need to be defined,
for which no well-defined API support could be specified.
> As defined in this draft, it is not clear how application programs parse
> or use these values. Existing implementations request *the* currency symbol
> and use it to format values. What would happen to a previously conforming
> application if it requested the (single) currency_symbol value, but an array
> of strings was returned? Lines 6509-6510 of the rationale state:
> "Also the same application call can be made to be valid for countries with
> a single currency and countries with dual currencies." That's only true
> if the application is expecting one *or more* values. Existing applications
> expect exactly one value for most of these keywords.
There are no previously conforming implementations of this specification.
You cannot make the POSIX standards forward compatible when
those standards do not have the capability to handle the problems
at hand. They would give wrong results in any case.
> Now, suppose an application is rewritten to allow for multiple currency
> symbols. Now what? What rules does it use to decide which currency_symbol
> value it should use to format a monetary quantity? If the section were
> designed so that the existing definitions had not changed, but alt_*
> keywords were added when needed, an application could request currency_symbol
> when formatting national currency values, and alt_currency_symbol when
> formatting euros (or another alternate currency).
The euro is not an alternate. It must be displayed together
with the other currency. You have a double-currency in these countries.
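Note (illustration only, not DTR text): the dual-currency case discussed
above can be modelled as keywords that each carry one value per currency, in
the same order. The dictionary layout, the function names, the symbols and
the amounts are all assumptions made for the example, and no API is implied.

```python
# Hypothetical model of LC_MONETARY keywords that carry one value per currency.
# Values are samples for the franc/euro case in the comment, not DTR text.
LC_MONETARY = {
    "currency_symbol": ["F", "EUR"],
    "frac_digits": [2, 2],
}


def currency_entries(category):
    """Pair up the i-th value of every keyword; lists must have equal length."""
    lengths = {len(values) for values in category.values()}
    if len(lengths) != 1:
        raise ValueError("keywords must list the currencies in the same order")
    count = lengths.pop()
    return [{key: values[i] for key, values in category.items()}
            for i in range(count)]


def format_all(amounts, category):
    """Format one amount per currency, in the order the currencies are listed."""
    entries = currency_entries(category)
    return [f"{amount:.{entry['frac_digits']}f} {entry['currency_symbol']}"
            for amount, entry in zip(amounts, entries)]


# The caller supplies both amounts; no conversion is performed here.
print(format_all([100.0, 15.24], LC_MONETARY))
```

The sketch also shows the ordering dependency between keywords: every
keyword has to list its values in the same currency order.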
> Also, *because* this section allows multiple currencies to be specified,
> there is an implied tie between keywords. If currency_symbol includes
> French francs and euros (in that order), frac_digits, ps_cs_precedes,
> etc., must also specify the rules for francs and euros in the SAME order.
> The valid_from keyword attempts to explain this dependency, but the
> wording is very confusing and not restricted to that keyword.
> Moving to other keywords, there are a new set of int_* keywords. Under
> POSIX.2, there were only two such keywords -- int_currency_symbol and
> int_frac_digits. They were for formatting monetary values using the
> international currency strings (e.g., "USD " rather than "$" for the
> U.S. dollar; "DEM " rather than "DM" for the German mark; etc.). Under
> POSIX.2, quantities that used the international currency string and
> those that used the local currency symbol used the same values for
> keywords such as p_cs_precedes, p_sep_by_space, etc. Annex A says these
> have been added to accommodate "differences between local and international
> formats." For example?