From: Kenneth Whistler [kenw@sybase.com]
Sent: Thursday, August 08, 2002 4:19 PM
To: frank@farance.com
Cc: kenw@sybase.com; keld@dkuug.dk; tplum@plumhall.com; jb@benito.com; Winkler, Arnold F; nwallace@us.ibm.com; John.Hill@eng.sun.com; rex@RexJaeschke.com; nobuyoshi.mori@sap.com; Don.Schricker@microfocus.com; willemw@ace.nl; asmusf@ix.netcom.com; mark.davis@jtcsv.com
Subject: Re: Agenda for Character set ad-hoc - 26th August

Frank responded:

> My concern is that SC22 have a voice in these "identifier characters".

In principle, that is fine with me. SC22 committees *should*
participate in this discussion.

> I want to make sure that the people choosing the identifier characters
> have an understanding and involvement in programming language standards.

Yes -- an admirable goal. But it should be countered with its equal and
opposite: that people developing the programming language standards
should have an understanding and involvement in the character encoding
standards.

> WG20 seems to be the place where this happens.

No, it is not. One might like to *believe* that is where that happens,
but it is certainly not the case that any significant interactions or
developments have been taking place in that forum regarding this topic.
The significant interactions, in my experience, have been between the
developers of Java, ECMAScript, C#, and XML, and the architects of
character property definitions in the Unicode Technical Committee.

> Sure it is possible to have some Unicode people in individual SC22 WGs,
> but I'd rather see a consistent SC22 perspective on this, i.e.,
> representation of SC22's issues (not necessarily Unicode's table).

Then I would think a reasonable thing to do would be to summarize
SC22's issues and bring them into the Unicode Technical Committee for
discussion. That's what W3C does when there are mutual concerns about
issues such as identifiers.

The way you put this smacks of standards turf defense and wishful
thinking.
It might seem more well-behaved for this concern of SC22 programming
language committees to be "controlled" within an SC22 committee --
namely WG20. But the issue is not controlled by SC22 -- there are other
players out there creating realities, and it behooves the ISO
programming language standardizers to interact with them, so that their
programming language concerns are represented in the discussions.

> I don't believe that Unicode will be able to address the programming
> language standardization aspect of these characters (similar to the
> concerns of XML, ASN.1, SQL, etc..).

I don't think anyone in the UTC expects that to be the forum that would
establish the *particular* identifier syntax rules that apply in C or
C++ or COBOL (or XML, ASN.1, SQL, or anything else). That is a concern
of each relevant standardizer.

What the UTC establishes are consistent, extensible rules for
Identifier_Start and Identifier_Extend properties for all Unicode
characters. Those rules can be adapted and customized, as required for
particular formal syntaxes. But what the formal language committees
should not be doing is pawing through 94,000+ Unicode characters trying
to establish all their properties and sorting them into categories for
Identifier_Start and Identifier_Extend classes. *That* is the expertise
of the UTC, instead.

> It seems fine to me for Unicode to submit a contribution and for SC22
> to have some review (SC32 would like review, too), but to merely point
> to a Unicode table without SC22's review does not serve the purpose of
> SC22's programming languages.

No, you point to the Unicode table(s), and then review the particular
extensions, limitations, or customizations (treatment of "_", "-", "@",
other syntax characters, whatever) that apply to your particular
standards. And you consider case sensitivity. And you consider
interoperability issues for identifiers which may be used across formal
languages.
And you consider stability issues for identifiers across versions of
Unicode and of your language standards.

But don't expect the Unicode Consortium to submit all 21+ primary data
files of the Unicode Character Database (including the 26 megabyte
Unihan.txt) to SC22 for review in that context. The review and updating
of those data files takes place in the *UTC* context, by the open
process that the UTC has established for timely, coordinated releases
of those data files -- now needed and required by many, many
implementations of the Unicode Standard. SC22 participants are welcome
to participate in that review and development, along with everybody
else -- just don't expect to turn it into a process that *lives* in
SC22 committees and works by ISO balloting.

> Regarding the maintainability of this, there's always the possibility
> of using a registry (a tried and true method) that would satisfy many
> concerns. Do you still have objections to use of a registry?

Absolutely. A registry is an absolutely crazy way to try to maintain a
large database of inter-related properties for 94,000+ characters. The
Unicode Technical Committee has a tried and true method for doing this
that is now in practical use by hundreds of major implementations.

Regards,

--Ken
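
The approach described in the message (point to the Unicode-derived
identifier classes, then layer each language's own extensions and
limitations on top) can be sketched roughly as follows. This is an
illustration only and is not part of the original thread. It assumes
Python's standard unicodedata module and approximates the start and
extend classes from General_Category values alone; the real derived
properties published by the UTC carry additional exceptions. All of the
function names and the two language profiles are hypothetical.

```python
import unicodedata

# Approximation only: derive the two classes from General_Category.
# The actual Unicode-derived identifier properties add further exceptions.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch):
    """Base rule: may this character begin an identifier?"""
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_extend(ch):
    """Base rule: may this character continue an identifier?"""
    return unicodedata.category(ch) in EXTEND_CATEGORIES


def make_identifier_checker(extra_start="", extra_extend="", excluded=""):
    """Build a checker for one (hypothetical) language profile.

    extra_start:  characters the language additionally allows at the start
    extra_extend: characters the language additionally allows afterwards
    excluded:     characters the language forbids despite the base rules
    """
    def is_identifier(text):
        if not text:
            return False
        for index, ch in enumerate(text):
            if ch in excluded:
                return False
            if index == 0:
                allowed = is_identifier_start(ch) or ch in extra_start
            else:
                allowed = is_identifier_extend(ch) or ch in extra_extend
            if not allowed:
                return False
        return True
    return is_identifier


# Two hypothetical profiles built on the same base tables.
c_like = make_identifier_checker(extra_start="_")
hyphenated = make_identifier_checker(extra_extend="-", excluded="_")

print(c_like("_count"), c_like("total-cost"))          # True False
print(hyphenated("total-cost"), hyphenated("_count"))  # True False
```

Neither profile enumerates characters itself; both only state how they
differ from the shared base rules. The answers also depend on the
Unicode version bundled with the interpreter (reported by
unicodedata.unidata_version), which is the cross-version stability
concern raised in the message.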