SC22/WG20 N775
L2/00-308

Collection of reactions to the WG20 convenor's
"Personal thoughts about the future of WG20"

Part 2, from September 7 through September 13, 2000

 

 

Akio Kido suggested that I collect all reactions to my proposal about the future of WG20 in one document for easy reference.  Due to the interest in this subject, it became a rather lengthy document, so I decided to put a linked index in front of it, which allows you to go straight to the contribution that interests you.  I did not do any formatting. My apologies if the text in HTML does not look as good as it could, but I wanted to maintain the original form of the e-mails the way I received them.

 

The document got too long – I had to split it into parts:

 

Parts                                         SC22/WG20   NCITS/L2 - UTC
Part 1, from August 30 – September 6, 2000    N774        L2/00-307
Part 2, from September 6 on …                 N775        L2/00-308

 

Index with the latest document on top: (Status September 13, 2000)

 

National Body   Name             Date         Content                         Supports N3164
USA             Ken Whistler     2000-09-13   Character properties. 9945-2    Y
USA             Ken Whistler     2000-09-12   WG20 projects                   Y
Norway          Keld Simonsen    2000-09-12   Comments on WG20 projects       N
W3C             Martin Dürst     2000-09-11   I18N in W3C                     Y
Norway          Keld Simonsen    2000-09-07   Answer to Whistler's e-mail     N
Canada          Glen Seeds       2000-09-07   Value of I18N                   N
France          Antoine Leca     2000-09-07   LC_CTYPE in POSIX               ?

 

 

 

Individual contributions on e-mail:

 

 

France, Antoine Leca, September 7, 2000

 

From: Kenneth Whistler [kenw@sybase.com]

>

> 3. Character Properties

>

> The most contentious issue regarding DTR 14652 is the effort to

> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending

> positions effectively reflect a worldview divide among the participants

> regarding character properties:

>

> Position A: Character properties have not traditionally been covered

> by character encoding standards, and have not been viewed as the

> domain of the ISO committee responsible for encoding characters: SC2.

> Instead, character properties are an implementation issue, traditionally

> dealt with in the standards most directly concerned with character

> implementation -- namely the formal language standards -- and are

> dealt with in ISO by the working groups under SC22. In the context

> of 14652, the appropriate place to define character properties is

> LC_CTYPE, where the properties would be usable in a POSIX context as

> part of locale definitions.

 

 

May I point out that POSIX in this area just provides two things:

- a portable way to "formalize" LC_CTYPE (the localedef mechanism), which

  is the very thing that PDTR 14652 is improving; this is covered by

  Ken's previous discussion, as I see things;

- a mandatory implementation of the minimum subset, the "POSIX" locale,

  which it really inherited from Unix V7 ff., but which formally it

  inherits from the C Standard.

 

As such, one may also consider involving WG14. Furthermore, the new revision

of the C standard provides some support for the UCS. If these extensions

are used (and this is a prerequisite for them to be used in a POSIX context),

then it would be a natural extension in the next amendment/revision of the

C Standard to provide mandatory rules for the character properties: for

example, to somehow require iswupper(L'\u0410') to return nonzero;

currently, this is not the case (nothing is required here).

<sidenote>

Furthermore, I had a discussion within the POSIX group some months ago.

As a result, in the "POSIX" locale, iswupper(L'\u0410') is expected to return 0.

</sidenote>
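
The contrast drawn above can be sketched in a few lines. This is only an illustration, written in Python rather than C so that it does not depend on any particular C library's locale data. The function names are hypothetical. The first function models the "upper" class of the minimal "POSIX" locale as the 26 letters A-Z, and the second classifies by the general category "Lu" of the character database that ships with Python.

```python
import unicodedata

# Hypothetical model of the "upper" class in the minimal "POSIX" locale:
# only the 26 basic Latin capital letters belong to it.
ASCII_UPPER = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def is_upper_posix_locale(ch: str) -> bool:
    """Return True only for A-Z, as a minimal locale would."""
    return ch in ASCII_UPPER


def is_upper_ucs(ch: str) -> bool:
    """Return True if the character's general category is 'Lu'."""
    return unicodedata.category(ch) == "Lu"


if __name__ == "__main__":
    # U+0041 LATIN CAPITAL LETTER A and U+0410 CYRILLIC CAPITAL LETTER A
    for ch in ("A", "\u0410"):
        print(f"U+{ord(ch):04X}", is_upper_posix_locale(ch), is_upper_ucs(ch))
```

Under these assumptions the two classifiers agree on U+0041 and disagree on U+0410, which is the situation described in the sidenote.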

 

 

Canada, Glen Seeds, September 7, 2000

 

Sounds like we've started another really interesting thread.

My reactions on postings so far:

 

- I agree with those who say that trying to develop an API in a horizontal

group makes no sense. That should be left to the individual programming

language groups.

 

- I also agree that we should try to avoid invention wherever possible, and

search for existing practice that can be codified. Where invention is

unavoidable, we should try to keep it at as high a level as possible.

 

- I disagree that this topic is outside the proper province of programming

languages.  All such specifications include libraries or similar facilities

that, while not central to the language syntax and semantics, are needed in

order to make the average programmer's life tenable, and to facilitate

common solutions to common problems. This includes things such as I/O and

string handling. i18n is another thing of this type.

 

- I agree that the single most important issue is enabling conformance to

10646. I don't agree that this is the only important issue. Handling of

other cultural conventions in a standard way is also extremely important.

 

- I don't agree that leaving this to individual vendors would be a

reasonable way to address this need. As a user, it costs my company a great

deal to have to work around the differences between vendors in this area,

and having standard solutions that they all conform to would be of

considerable value to us. I have to tell you that in the face of the absence

of this, we are adopting the same approach as was described for Metaphor: we

are forbidding use of the vendors' facilities, and implementing our own. We

are not at all happy about having to do this, and are critical of vendors'

slow adoption of things such as UTF-8 and 14651.

 

- I don't agree that the differences between the different approaches in

different programming languages are in the same category as the problem

above. However, I would like to make a point here that has not been made

yet:

 

The most significant objective that an i18n standardization group could

achieve is a specification for a minimum *set of cultural issues* where

conforming systems should support variability, and a standard way of

*interchanging the encoding rules* for that variability. This is where

international expertise is most needed and most effective. It is also where

existing vendors tend to have the fewest opinions and vested interests that

bog down the standardization process. The Austin Group in particular has

said that they are waiting for direction from ISO before doing any further

work in these areas.

 

(It's also the area where they get themselves into the most trouble on their

own.  A good example of this is the current Austin Group discussion on

"collation order" versus "collation sequence" in regular expressions.)

 

We have already achieved a lot in this area in the form of 10646 and 14651.

It's unfortunate that the next step, 14652, failed to become anything beyond

a TR. As a user, I have a strong interest in seeing this work go forward. I

can't imagine a better place than SC22/WG20. The areas most affected are the

PL's and OS's, but none of them have the expertise to put together a

statement of generic issues and solutions.

 

Having said all that, I agree the lack of new progress shows that a review

is in order. We should ask the other groups, especially those in SC22, what

their concerns are in this area, and what sort of process they would buy

into that would allow us to move forward.

 

   /glen

 

 

Norway, Keld Simonsen, September 7, 2000

 

On Wed, Sep 06, 2000 at 09:43:45AM -0400, Winkler, Arnold F wrote:

>

> From: Kenneth Whistler [kenw@sybase.com]

> Sent: Friday, September 01, 2000 4:40 PM

> Subject: Some technical issues regarding the future of SC22/WG20

>

> ================================================================

>

> Arnold Winkler has recently raised a number of issues regarding the future

> of SC22/WG20 and the standards that it maintains or has under

> development, for consideration at the upcoming SC22 plenary in Nara.

> Chief among the issues he raised is whether WG20 is now at the

> end of its useful life, and whether it should be sunsetted, with

> its various projects redistributed over time to other committees as

> appropriate for maintenance.

 

As I have written in another message, I see WG20 as just now

being able to get to work on real issues. WG20 has been in

the process of taking control over its subject, producing

standards that were the best in SC22 on the subject, but

in essence not much better than say, C, C++, Ada, Fortran, COBOL

or POSIX specifications.

 

There is a long way to go, if we want ISO standards to be leading in

the field, e.g. in the area of APIs. Current widespread APIs on the

market like Microsoft NT APIs or IBM ICU have maybe 3 times as many

APIs as what we have worked on in WG20.

 

It is quite normal that ISO standards do not contain all functionality,

and for WG20 the work item was actually restricted to a specific

quite small set of functionality when the NP was accepted.

 

> 1. Collation

>

> Furthermore, among the active participants in WG2 are the experts

> on collation (with implementation experience) who actually ended

> up authoring much of the content of 14651. Comparable experience is

> not obviously available in the SC22 committees other than WG20.

> Furthermore, because of the current close working relationship

> between WG2 and the Unicode Technical Committee, WG2 is also the

> best place to maintain a standard that should stay in synch with

> the Unicode Collation Algorithm maintained by the UTC, to prevent

> unanticipated "drift" between the two standards.

 

The argument that the sorting expertise is in SC2 is a myth.

I do not encounter sorting experts in SC2 - beyond the ones I already

know in SC22. And some SC22 experts I rarely see in SC2.

 

Furthermore it is important that there be a strong relation to SC22

producers so there not be a "drift" from other SC22 sorting

specifications in this area, such as POSIX or C specs.

 

> 3. Character Properties

>

> The most contentious issue regarding DTR 14652 is the effort to

> extend LC_CTYPE to cover the repertoire of ISO 10646-1. The contending

> positions effectively reflect a worldview divide among the participants

> regarding character properties:

>

> Position A: Character properties have not traditionally been covered

> by character encoding standards, and have not been viewed as the

> domain of the ISO committee responsible for encoding characters: SC2.

> Instead, character properties are an implementation issue, traditionally

> dealt with in the standards most directly concerned with character

> implementation -- namely the formal language standards -- and are

> dealt with in ISO by the working groups under SC22. In the context

> of 14652, the appropriate place to define character properties is

> LC_CTYPE, where the properties would be usable in a POSIX context as

> part of locale definitions.

>

> Position B: Character properties for the *universal* character set --

> namely ISO 10646 (= Unicode) are inherent to *characters*, and should

> *not* be defined in locales. The locale model and LC_CTYPE were an

> attempt to provide a mechanism for dealing with properties of characters

> in alternate encodings, but that model does not scale well for dealing

> with properties for the universal repertoire of 10646. Furthermore,

> it is inappropriate to assert that character properties are defined

> in locales, and are thus subject to locale-specific variation, since

> such a position would lead to inconsistent and inexplicable differences

> in application behavior, depending on locale, in ways that have

> no bearing on the usually understood issues of locale-specific

> formatting differences, etc. Because character properties are closely

> tied to the characters themselves, responsibility for defining them

> should belong with the character encoding committees, rather than

> with the language committees -- and thus in SC2, rather than SC22.

>

> It is clear that among the rather large community of implementers

> of 10646 (= Unicode), Position B has much more widespread support

> than Position A. Position A is, however, a vocally held minority

> opinion among those committed to the extension of the POSIX framework.

 

On the other hand, in the UNIX/POSIX/C circles Position A is much

more widespread.  Position B is voiced very actively by a small

group of about 20 companies in the Unicode consortium.

In terms of machines actually employing the two different positions,

there are about 20 million or more in the UNIX/Linux community using

it in the Position A way, while Position B is only standard on

Windows 2000 which has fewer than 10 million systems installed.

 

However, the difference between Position A and B is in practice

not big. Most agree that attributes are associated to characters,

however there are some culturally dependent character properties,

such as the Turkish mappings between uppercase and lowercase

for the letter "I" and display of native digits.
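
The Turkish case can be illustrated with a small sketch. It is only an illustration in Python, not any standard's mechanism. The table layout, the language tags and the function names are assumptions made for this example. A fixed default mapping is used everywhere, and a short per-language exception table is applied first.

```python
# Language-specific exceptions, applied before the default case mapping.
# The table layout and the language tags are assumptions of this sketch.
LOWER_EXCEPTIONS = {
    # Turkish: I -> dotless i (U+0131), dotted capital I (U+0130) -> i
    "tr": {ord("I"): "\u0131", ord("\u0130"): "i"},
}
UPPER_EXCEPTIONS = {
    # Turkish: i -> dotted capital I (U+0130), dotless i (U+0131) -> I
    "tr": {ord("i"): "\u0130", ord("\u0131"): "I"},
}


def to_lower(text: str, lang: str) -> str:
    """Apply the exceptions for `lang`, then the default lowercase mapping."""
    return text.translate(LOWER_EXCEPTIONS.get(lang, {})).lower()


def to_upper(text: str, lang: str) -> str:
    """Apply the exceptions for `lang`, then the default uppercase mapping."""
    return text.translate(UPPER_EXCEPTIONS.get(lang, {})).upper()


if __name__ == "__main__":
    for lang in ("en", "tr"):
        print(lang, to_lower("I", lang), to_upper("i", lang))
```

In this sketch every character keeps a single default mapping, and the cultural dependence is confined to a two-entry table per direction, which is one way to read the remark that the two positions are not far apart in practice.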

 

> In point of actual fact, the *real* work on standardization of

> 10646 character properties is being done almost entirely

> by the Unicode Technical Committee, which for years now has been

> publishing machine-readable tables of character properties and

> associated technical reports that are in widespread implementation

> in many products. A very few character properties, most notably

> "combining" and "mirroring", are also formally maintained by SC2/WG2 in

> ISO 10646 itself, and those properties are tracked in parallel by

> the UTC.

 

There has also been a lot of work going on in POSIX circles,

with character properties for more than 20,000 characters

already defined in the POSIX.2 standard that was finished

in 1992. It is maybe a sign of how well researched the Unicode

specifications are that this fact is still unnoticed by prominent

Unicode people.

 

> On balance, it would seem far preferable to conclude that within

> JTC1 any responsibility for character properties should belong

> to SC2, rather than SC22. Once again, this is a matter of expertise

> regarding the huge number of characters in 10646. That expertise

> is in SC2, and not in SC22. And the implementation experience

> regarding character properties resides in the UTC, which has a

> firm working relationship with SC2, but no close ties to SC22.

 

Again, the existence of SC2 experts in this area is a myth.

I believe that Unicode has experts, but they are as well connected

to WG20 as to SC2, having C liaison status in both groups.

Furthermore the Unicode technical committee chairman, Arnold Winkler,

is the convener of WG20. No high-ranking Unicode officers have

the same level of office in SC2.

 

SC2 has for a long time said that they were only into the encoding

of characters, not the meaning. I still think this is a reasonable

approach.

 

 

> Regarding LC_CTYPE in particular, the maintenance or extension of

> LC_CTYPE should be remanded to WG15, along with all of DTR 14652,

> but with the following recommendations: Rather than attempting to

> independently extend LC_CTYPE definitions to cover 10646, a mechanism

> should be developed whereby POSIX implementations using LC_CTYPE

> can make use of the more widespread and better researched and

> reviewed character property definitions developed by the UTC, in

> cooperation with SC2/WG2's development of 10646. This should be

> done by *reference*, rather than by enumerating lists of characters

> in SC22 standards or TR's, because of the danger of those lists

> getting out of synch or introducing errors that cause interoperability

> problems. Furthermore, this practice of dealing with character

> properties by reference to UTC and/or SC2 developed standards

> for them, should be recommended to *all* the SC22 committees, as

> the generic way to deal with character properties in formal

> language standards.

 

As said before, POSIX specs are more widespread than Unicode's,

in terms of systems employing them, and it seems like they may be

better researched, as they have included Unicode specifications

in their research, while Unicode still to this date is unaware of

their bigger competitor...

>

> 4. Internationalization API Standard

>

The i18n API project is another WG20 project to take control of

the subject of i18n, to become masters of our own house.

It is admittedly not very advanced, compared to some industry

APIs, this is partly due to SC22's decision to make a restricted

API. It is, however, with more functionalities than most

programming languages standardized in SC22, and aimed to take a

lead for SC22 standardization in this area.

 

> No one in WG20 but the project editor seems to be doing any active

> work to develop the API standard for internationalization, and the

> committee feedback to date has largely been that the quality of

> the drafts is poor. Fundamental questions regarding the nature

> of the API design have not been resolved. Furthermore, there has

> been a lot of hand-waving over the issue of how closely tied the

> proposed API is to the locale extension constructs of DTR 14652.

> The API under development for 15435 is locale-centric, in that

> it requires information in an "FDCC-set" defined a la DTR 14652,

> assuming API behavior will depend on that information, resident

> in some implementation-defined "database".

 

> Modern internationalization libraries have largely eschewed that

> kind of locale-centric design as too constrained, instead breaking up

> the problem of internationalization support into more modular

> designs that separate out different aspects of the problems

> involved.

 

Some modern i18n libraries still use locale-centric behaviour,

including POSIX compatible systems. As POSIX compatible

operating systems are the only major operating systems

gaining significant market shares these days, it cannot be all

that bad.  The i18n system of POSIX furthermore has facilities

so that you can orchestrate your own localization, which is

a virtue of the model. It is only recently that these mechanisms

have been taken up e.g. in Microsoft systems, while POSIX systems

have done this for years. Java also has very similar concepts,

although they may maintain that it is completely different.

Seen from a user's perspective, i18n using the POSIX model works

very well, in my personal experience.

 

The POSIX model is extensible, and is the only ISO standardized

model.

 

> Furthermore, the proposed API standard aspires to platform

> independent design. That, however, inappropriately conflates the

> issue of designing appropriate behavior for internationalization

> with the problem of designing appropriately abstracted API's

> for that behavior on distinct platforms. In actual practice,

> implementers are tending to make use of available libraries that

> surface correct internationalization behavior (such as the

> ICU classes) and then writing whatever wrappers are necessary to

> abstract that behavior into their systems. The days of trying

> to define complex behavior via ISO API standards, to be rolled

> out by language compiler vendors in standard C libraries and such,

> are being overtaken by object-oriented design and software

> component models.

 

Portability across platforms is one of SC22's hallmarks,

and we achieve it well with other standards such as the programming

language standards. The situation described above is

just the kind SC22 is set up to solve.

 

> At this point, WG20's project 15435 should just be abandoned as

> a well-intentioned but obsolete project that has no demonstrated

> need or support for its development.

 

The 15435 standard is primarily set up for other PL standards.

And furthermore, it is already implemented on major platforms

in major compilers (GNU C/C++).

 

> 6. Identifiers

 

WG20 was quite capable of producing the annex on 10176 on identifiers

and quite successful in getting it adopted by the Programming

Languages. WG20 has thus demonstrated its capabilities in

this area and there is no need to move the subject to somebody else.

WG20 even succeeded in getting Unicode to adopt the specifications.

 

> This entire issue, is, by the way, also of intense interest to

> the Database standards arena, where it is of direct relevance

> to the SQL standard, for example. So the SC22 working groups are

> not the only JTC1 groups with an interest in standard,

> interoperable results in this area for 10646 characters.

 

WG20 has liaison to the SQL WG, and furthermore acts as a focal

point for i18n for all of JTC 1, according to JTC 1 decisions.

 

Kind regards

Keld Simonsen

 

 

W3C, Martin Dürst, September 11, 2000

 

From: Martin J. Duerst [duerst@w3.org]

Sent: Monday, September 11, 2000 2:10 AM

To: John Hill, ISO/IEC JTC1 SC22 Chair

Cc: Lisa Rajchel ISO/IEC JTC1 SC22 Secretariat at ANSI

     Arnold Winkler, ISO/IEC JTC1 SC22 WG20 Convener

 

Type of document: Liaison Contribution

Subject: Future directions for WG20

 

For consideration at the Nara meeting of SC22

 

 

W3C herewith supports Arnold Winkler's recent proposal for the

future of SC22/WG20.

 

The experience with the internationalization of a wide range of

specifications at W3C strongly shows the following:

 

- The range of specifications with internationalization needs

   extends far beyond programming languages and includes document

   and data formats and protocols.

 

- Programming languages become more and more diverse, and most

   of a program's internationalization functionality is handled

   as part of libraries (input/output and user interface) where

   diversity is even bigger than in the programming language core.

 

- Internationalization cannot be done in isolation, but needs to

   be done by the committee responsible for the 'base' standard,

   with the participation, contribution, and review from

   internationalization experts. The main common base is the

   universal character set (ISO/IEC 10646).

 

 

With respect to the current work items of SC22/WG20, our input

is as follows:

 

- Sorting/Collation Standard (14651): The standard itself is close

   to completion, and should be completed by SC22/WG20. SC2/WG2 is the

   optimal place for further work on the data needed for the standard.

 

- List of characters for identifiers (Appendix to TR 10176):

   Again SC2/WG2 is the optimal place to extend this work to

   newly encoded characters.

 

- API for Internationalization (15435): Given the large variance

   across programming languages, and the increased importance of

   libraries and user interface components, a general API for

   internationalization is highly inappropriate.

 

- Registry for cultural conventions (ISO/IEC 15897): Good

   documentation on cultural conventions is very helpful for

   implementers of all kinds of information technology. In order

   to be of real value, the registry should:

   - Make the full information available on the World Wide Web.

   - Accept incomplete contributions (e.g. when only part

     of some cultural conventions are known or established).

   - Provide a full revision history for official registrations.

   - Accept contributions not only from the relevant national

     bodies, but also from the general public (and e.g. label

     them as 'not verified').

   - Accept multiple contributions for the same locale

     (and label them appropriately).

   - Besides registered information, provide pointers to related

     information elsewhere, in print or on the WWW.

   Once the registry is set up appropriately, the task of

   WG20 in this area can be considered completed.

 

 

The Type C Liaison between SC22/WG20 and the World Wide Web

Consortium (W3C), in particular the W3C Internationalization

Working Group (SC22 N3073) has been established to coordinate

internationalization issues between these two groups. Completion

of the current SC22/WG20 tasks as proposed by Arnold Winkler

and as discussed above, and transfer of the remaining character-

related responsibilities to SC2/WG2 completely satisfy the

needs of W3C and simplify the interaction between W3C and

ISO/IEC JTC1 in the area of internationalization, because

W3C has already established a liaison with SC2/WG2.

 

 

Yours sincerely,   Martin J. Dürst.

 

 

Norway, Keld Simonsen, September 12, 2000

 

 

Arnold Winkler has recently raised a number of issues regarding the future

of SC22/WG20 and the standards that it maintains or has under

development, for consideration at the upcoming SC22 plenary in Nara.

Chief among the issues he raised is whether WG20 is now at the

end of its useful life, and whether it should be sunsetted, with

its various projects redistributed over time to other committees as

appropriate for maintenance.

 

However, I see WG20 as just now

being able to get to work on real issues. WG20 has been in

the process of taking control over its subject, producing

standards that were the best in SC22 on the subject, but

in essence not much better than say, C, C++, Ada, Fortran, COBOL

or POSIX specifications.

 

There is a long way to go, if we want truly internationalized,

portable applications, and ISO standards to be leading in

the field, here in the area of APIs. Current widespread APIs on the

market like Microsoft NT APIs or IBM ICU have maybe 3 times as many

APIs as what we have worked on in WG20.

 

It is quite normal that ISO standards do not contain all functionality,

and for WG20 the work item was actually restricted to a specific

quite small set of functionality when the NP was accepted.

 

In general, I think the standardization of APIs and formats for data

specifications is best done in SC22, which standardizes

libraries, and also interacts with the many ISO programming languages.

 

Moving WG20 activities into SC2, as Arnold Winkler proposes,

would be an error, IMHO.

APIs are not in the scope of SC2. Neither are sorting or

character attributes. And sorting and character attributes

have for a long time been an SC22 issue, viz. islower(), isupper()

etc. in C and other programming languages.

 

In the following I will give some comments on each of WG20's projects.

 

1. Collation

 

The argument that the sorting expertise is in SC2 is a myth.

The only sorting expert I encounter in SC2 - beyond the ones I already

know in SC22 - is Michael Everson. And a number of SC22 experts who

always come to the WG20 meetings (at least during the last 2 years)

come less regularly to SC2 meetings; this includes Ken Whistler,

Marc Küster, Kent Karlsson, Takata-San, and myself.

 

3. Character Properties

 

One school of thought, represented foremost by Unicode people,

thinks that character properties, such as what is a letter, digit,

and special character, are inherent properties of the character itself

and cannot be changed, while another school thinks that character

properties may be culturally dependent, as per a C/C++/POSIX locale.

In terms of machines actually employing the two different positions,

there are about 20 million or more in the UNIX/Linux community using

it in the locale way, while the Unicode way is only standard on

Windows 2000 which has fewer than 10 million systems installed.

 

However, the difference between the two schools of thought is in practice

not big. Most agree that attributes are associated to characters in a

fixed way, however there are some culturally dependent character

properties, such as the Turkish mappings between uppercase and lowercase

for the letter "I" and display of native digits.

 

On character properties, there has been some work going on

in Unicode, but also work going on in POSIX circles,

with character properties for more than 20,000 characters

already defined in the POSIX.2 standard that was finished

in 1992. It seems that this work has not till this date been

noticed by prominent Unicode people.

 

There is also a myth that the existence of experts in this area is

foremost in SC2.  I believe that Unicode has experts, but they are as

well connected to WG20 as to SC2, having C liaison status in both

groups. Beyond the Unicode people I see very few experts in SC2 on

this matter. On the other hand there are experts in SC22, including

experts in the different language WGs, the POSIX WG, and myself.

That Unicode should be less connected to WG20 than to SC2 is for

me hard to understand, with Unicode having C category liaison in both

places, and furthermore the Unicode technical committee chairman,

Arnold Winkler, being the convener of WG20. No high-ranking Unicode

officers have the same level of office in SC2.

 

SC2 has for a long time said that they were only into the encoding

of characters, not the meaning. I still think this is a reasonable

approach.

 

3. cultural conventions specification standard, TR 14652

 

As said before, POSIX specs are more widespread than Unicode's,

in terms of systems employing them, and it seems like they may be

better researched, as they have included Unicode specifications

in their research, while Unicode still to this date is unaware of

their bigger competitor...

 

4. Internationalization API Standard

 

Some modern i18n libraries use locale-centric behaviour,

including POSIX compatible systems. As POSIX compatible

operating systems are the only major operating systems

gaining significant market shares these days, it cannot be all

that bad.  The i18n system of POSIX furthermore has facilities

so that you can orchestrate your own localization, which is

a virtue of the model. It is only recently that these mechanisms

have been taken up e.g. in Microsoft systems, while POSIX systems

have done this for years. Java also has very similar concepts,

although they may maintain that it is completely different.

Seen from a user's perspective, i18n using the POSIX model works

very well, in my personal experience.

 

The POSIX model is extensible, and is the only ISO standardized

model.

 

Portability across platforms is one of SC22's hallmarks,

and we achieve it well with other standards such as the programming

language standards. Also in the area of i18n SC22 and JTC 1

should strive for application portability.

 

The 15435 standard is primarily set up for other PL standards.

And furthermore, it is already implemented on major platforms

in major compilers (GNU C/C++).

 

6. Identifiers

 

WG20 was quite capable of producing the annex on 10176 on identifiers

and quite successful in getting it adopted by the Programming

Languages. WG20 has thus demonstrated its capabilities in

this area and there is no need to move the subject to somebody else.

WG20 even succeeded in getting Unicode to adopt the specifications.

 

WG20 has liaison to many parties inside and outside of SC22,

including the SQL WG, and furthermore acts as a focal

point for i18n for all of JTC 1, according to JTC 1 decisions.

 

Kind regards

Keld Simonsen

 

 

USA, Ken Whistler, September 12, 2000

 

Keld responded to a number of the concerns I had surfaced on

behalf of the U.S. committee. Here are some countercomments

which may lead into the discussion which is sure to ensue during

the upcoming Malvern meeting of WG20.

 

> > From: Kenneth Whistler [kenw@sybase.com]

> > Sent: Friday, September 01, 2000 4:40 PM

> > Subject: Some technical issues regarding the future of SC22/WG20

> >

> > ================================================================

> >

> > Arnold Winkler has recently raised a number of issues regarding the future

> > of SC22/WG20 and the standards that it maintains or has under

> > development, for consideration at the upcoming SC22 plenary in Nara.

> > Chief among the issues he raised is whether WG20 is now at the

> > end of its useful life, and whether it should be sunsetted, with

> > its various projects redistributed over time to other committees as

> > appropriate for maintenance.

>

> As I have written in another message, I see WG20 as just now

> being able to get to work on real issues. WG20 has been in

> the process of taking control over its subject, producing

> standards that were the best in SC22 on the subject, but

> in essence not much better than say, C, C++, Ada, Fortran, COBOL

> or POSIX specifications.

 

This is, unfortunately, a sad commentary on the quality of the

I18N work coming out of WG20 to date, and I concur with Keld's

assessment!

 

>

> There is a long way to go, if we want ISO standards to be leading in

> the field, e.g. in the area of APIs. Current widespread APIs on the

> market like Microsoft NT APIs or IBM ICU have maybe 3 times as many

> APIs as what we have worked on in WG20.

 

...and much greater sophistication, as well as precision of definition.

And you neglected to mention Java in this list.

 

As for the presupposition here, that ISO standards should be leading

this field, see below. I agree with the essential assessment

that WG20 is *way* behind. But I differ with Keld in that I don't

think there is any feasible way for WG20 to do a decent job of

providing an I18N API standard.

 

>

> It is quite normal that ISO standards do not contain all functionality,

> and for WG20 the work item was actually restricted to a specific

> quite small set of functionality when the NP was accepted.

 

I don't think there was any "specific quite small set of

functionality" defined in the NP. All along, the coverage of

15435 has essentially been precisely what the editor intended

it to be; I see no evidence of principled direction from the

committee that set or constrained the initial scope of the

proposed standard.

 

>

> > 1. Collation

> >

> > Furthermore, among the active participants in WG2 are the experts

> > on collation (with implementation experience) who actually ended

> > up authoring much of the content of 14651. Comparable experience is

> > not obviously available in the SC22 committees other than WG20.

> > Furthermore, because of the current close working relationship

> > between WG2 and the Unicode Technical Committee, WG2 is also the

> > best place to maintain a standard that should stay in synch with

> > the Unicode Collation Algorithm maintained by the UTC, to prevent

> > unanticipated "drift" between the two standards.

>

> The argument that the sorting expertise is in SC2 is a myth.

> I do not encounter sorting experts in SC2 - beyond the ones I already

> know in SC22. And some SC22 experts I rarely see in SC2.

 

Perhaps this is a result of attending more to SC2 committee matters

per se, rather than to WG2 or its liaison relation to the UTC.

 

Here are some examples: 4 experts on Myanmar sorting issues

at WG2 in London; 1 expert on Tibetan sorting at WG2 in London,

and *megabytes* of Tibetan input on a UTC hosted discussion list;

input on Kannada sorting from an expert just last week at the

International Unicode Conference; numerous other Indic inputs

from Jeroen Hellingman and other experts on the Unicode discussion

lists; Chinese input on Yi sorting issues in WG2 in London,

Fukuoka, and Beijing; participation from Arabic and Syriac

experts; Joe Becker; Asmus Freytag; Tex Texin (implemented at

Progress); Gary Richards (implemented at NCR); implementers

from Oracle; the designers and implementers of sorting in Java;

the designers and implementers of sorting in the IBM ICU; and

last, but not least, the designers and implementers of ML sorting

at Microsoft.

 

Would you care to make a corresponding, explicit list of the

SC22 experts in sorting that you rarely see in SC2, and what

their contributions might be to solving issues that must be

faced in extending the 14651 tables to cover such scripts as

Myanmar, Khmer, Mongolian, and Yi?