Date: Tue, 18 Nov 1997 16:27:47 -0800
From: mgm@sybase.com (Michael G. McKenna)
Message-Id: <9711190027.AA23221@constantine.sybase.com>
To: rosenne@NetVision.net.il, Harald.T.Alvestrand@uninett.no, manuel.carrasco@emea.eudra.org
Subject: Re: (i18n.390) RE: Transliteration standards: possible impact on internationalization
Cc: Converse@sesame.demon.co.uk, i18n@dkuug.dk, xojig@xopen.co.uk, sc22wg14@dkuug.dk, www-international@w3.org, wgi18n@terena.nl, keld@dkuug.dk

[Mike]

Unfortunately, the ISO 639 language code does not cover regional
differences, for instance between US English and International English.
This may not be much of a problem for the target language, but it can
make a difference when choosing the source language. For Russian, for
instance, is the source language White Russian or contemporary Russian?
And what script is it in? Serbo-Croatian is commonly written in a Latin
script in Croatian areas, but in a Cyrillic script in Serbian areas.

It might look a little ugly, but the X Window System font specifier
strings may be of some use as a starting template.
Perhaps something like:

    t-<sl>-<sd>-<ss>-<tl>-<td>-<ts>

Where:

    sl - source language, using ISO 639
    sd - source dialect, perhaps using ISO 3166 (I know, even this has
         deficiencies)
    ss - source script (we'll need script identifiers)
    tl - target language
    td - target dialect
    ts - target script

Any value can be a default or wild card. So,

    French transliterated into Hebrew  = t-fr-*-*-iw-*-*
    French transliterated into Russian = t-fr-*-*-ru-*-*
    Russian transliterated into Serbo-Croatian
      in a Latin script                = t-ru-*-cy-sh-*-la

where

    cy = Cyrillic
    la = Latin

This may be overkill, but we do need some sort of modifier part for
regional differences.

My $0.02,

Mike

____

> [Carrasco 1]
> > >Transliteration should be coded in RFC 1766 (Mr. Alvestrand ?).
> > >
> > >For example:
> > >
> > >    t-xx
> > >
> > >where
> > >    t  : transliteration
> > >    xx : a 639 language code
> >
> > [Rosenne]
> > A second argument is needed: the language into which the text is
> > transliterated. Obviously, French transliterated to Hebrew is
> > different from French transliterated into Russian.
> >
> > [Carrasco 2]
> >
> > So one needs to code:
> >
> > - t  : transliteration indicator
> > - ss : a 639 language code ; source language (language
> >        transliterated from)
> > - tt : a 639 language code ; target language (language
> >        transliterated into)
> >
> > Examples:
> >     French transliterated into Hebrew  = t-fr-iw
> >     French transliterated into Russian = t-fr-ru
> >
> > Questions:
> > - Any other parameters needed to be coded ?
> > - Does this breaks RFC 1766 ?
> >
> > Regards
> > Tomas
> >
>
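
For what it's worth, the seven-part tag suggested above (t followed by sl, sd, ss, tl, td, ts) could be handled roughly as in the sketch below. This is only an illustration under stated assumptions: the names TranslitTag, parse_tag and matches are hypothetical, no field value may itself contain a hyphen, and treating "*" as "match anything" is just one possible reading of "default or wild card".

```python
from typing import NamedTuple

WILDCARD = "*"  # assumed meaning: "unspecified / matches any value"


class TranslitTag(NamedTuple):
    """Hypothetical holder for the six fields that follow the 't' marker."""
    source_lang: str
    source_dialect: str = WILDCARD
    source_script: str = WILDCARD
    target_lang: str = WILDCARD
    target_dialect: str = WILDCARD
    target_script: str = WILDCARD

    def __str__(self) -> str:
        # Serialise back to the t-sl-sd-ss-tl-td-ts form.
        return "-".join(("t",) + tuple(self))


def parse_tag(text: str) -> TranslitTag:
    """Split a tag into its fields; reject anything without exactly 7 parts."""
    parts = text.split("-")
    if len(parts) != 7 or parts[0] != "t":
        raise ValueError(f"expected t-sl-sd-ss-tl-td-ts, got {text!r}")
    if any(not part for part in parts[1:]):
        raise ValueError(f"empty field in {text!r}")
    return TranslitTag(*parts[1:])


def matches(pattern: TranslitTag, tag: TranslitTag) -> bool:
    """True if every non-wildcard field of the pattern equals the tag's field."""
    return all(p == WILDCARD or p == v for p, v in zip(pattern, tag))


if __name__ == "__main__":
    french_to_hebrew = parse_tag("t-fr-*-*-iw-*-*")
    print(french_to_hebrew.target_lang)            # iw
    print(TranslitTag("fr", target_lang="ru"))     # t-fr-*-*-ru-*-*
```

Note that the shorter two-language form from the quoted message (t-fr-iw) is deliberately rejected here; accepting it as shorthand with the remaining fields defaulted would be a separate design decision.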