SC22/WG20 N866

From: Ordering@sesame.demon.co.uk
Sent: Friday, September 14, 2001 5:36 AM

------------------------------------------------------------

Meeting report: ISO/TC37/SC2/WG1 (Coding systems), Toronto, 2001-08-14

1. Meeting arrangements

ISO/TC37/SC2/WG1 (Coding systems) met on 2001-08-14 in Toronto. Present were Haavard Hjulstad (Norway, also Convenor ISO/TC37/SC2/WG1), Helen Hutcheson (Canada, also ISO/TC37/SC2 Secretary), Gerhard Budin (Austria, also ISO/TC37/SC2 Chair), Peter Port (Canada, also ISO/TC37/SC3 Chair), Gary Simons (US, also SIL International), Peter Constable (US, also SIL International), Marie Claire Morin (France), Jean Schwob (France), Chris Cox (UK), Marta Alonso (Spain), Yuka Sasaki (Japan), Glenn Patton (US, also ISO 639 Joint Advisory Committee member), Rebecca Guenther (US, also representing the ISO 639-2 Maintenance Agency, also ISO 639 Joint Advisory Committee chair), Janice Pereira (Canada), Jake Knoppers (Canada, also Convenor of ISO/IEC JTC1/SC32/WG1 (Open EDI) and designated ISO/IEC JTC1/SC32 Liaison to ISO/TC37/SC2), John Clews (UK, also Chair, ISO/TC46/SC2: Conversion of Written Languages, member of ISO/IEC JTC1/SC22/WG20: Internationalization), Christian Galinski (Austria, representing the ISO 639 Maintenance Agency), Jennifer De Camp (US), Monte George (US).

2. Progress on existing standards

ISO 639-1:2001 (2-letter codes) is agreed and awaits publication. It is intended to update and replace ISO 639 (2-letter codes). The issue of freezing the 2-letter repertoire remains open. It is planned that ISO 639-1 tables will be available via the Internet.

ISO 639-2 (3-letter codes) remains in use. It includes both Terminology codes and Bibliography codes, partly because ISO 639-1 and ISO 639-2 were developed by two different ISO committees, though there is now considerable cooperation through the ISO 639 Joint Advisory Committee. ISO 639-2 tables are available via the Internet.
At the Toronto meeting, people involved with language coding systems outside ISO were also present, so convergence of approaches is now more feasible than before.

3. Open discussion on extending language code standards

The meeting mainly comprised an open discussion about language coding, and about the need for, and feasibility of, standardizing and extending existing language coding.

3.1 Jennifer De Camp (US) described user needs and requirements, especially in the IT area, and in the area of national and local governmental need for language codes. Typical uses of language codes are for marking documents or files - to refer to that material - in order to apply IT tools, or Knowledge management tools - identifying statistical patterns of use. What is needed is a single set of codes, so that IT tools can be integrated with common off-the-shelf products. Voice recognition will need even more language entity information, including dialect. Jennifer De Camp noted various other data sources where language codes had been developed (e.g. SIL, Linguasphere, Microsoft, Opentype, etc.), many of which differed from each other. Multiple standards create additional costs, and ISO/TC37/SC2/WG1 should aim to develop a single extended language code if possible.

3.2 Peter Constable (USA) noted SIL's contact with IT users and developers. He had analysed the deficiencies of a number of coding systems in a previous International Unicode Conference paper, distributed to ISO/TC37/SC2 at its 2000 meeting in London, particularly relating to denotation (what exactly does this language name represent? A name alone is insufficient). In Toronto, in ISO/TC37/SC2/WG1 N76, he proposed specific principles for mapping between different coding schemes. Quite often direct 1:1 mapping was not possible.
Mapping of Language groups, and mapping of Major Language Variety (MLV), were the most useful principles that SIL had identified, but sometimes only indeterminate mappings could be arrived at, for example due to insufficient specificity in ISO 639 and ISO 639-2.

3.3 Jake Knoppers (Canada) described the use of language codes in relation to other data elements, and problems of combinations, especially in EDI and E-commerce applications. He raised queries about whether, in the 27 cases where they are different, implementers should use ISO terminology codes or ISO bibliography codes (the terminology codes were favoured for general use, as is the case in the Internet standard RFC 3066 (Language Tags)).

3.4 Haavard Hjulstad's paper ISO/TC37/SC2/WG1 N72 - Additional language coding - made suggestions on combining code elements, such as from:

   ISO 639: Language codes
   ISO 3166: Country codes
   ISO 15924: Script codes

Currently there is no specified way to combine codes, e.g. in which order. He suggested that an SGML-type specification might be produced of how language codes should be listed with other data elements. In a further paper (N71), he provided suggestions for 5-letter taxonomy codes for language groups, for different purposes from the existing codes in ISO 639-2.

3.5 Chris Cox (United Kingdom) presented an initial proposal from David Dalby (Linguasphere Observatory, UK) which suggested using alphanumeric language codes to extend language code provision, for specifying written languages. A suggestion that different meanings might be allocated to 2-letter and 3-letter code elements (e.g. en: specifically Written English; eng: oral English implied) was dropped, due to strong opposition.

4. Action agreed

ISO/TC37/SC2/WG1 agreed to recommend that ISO/TC37/SC2 should progress a New Work Item to develop a new ISO Technical Report on extending language codes, and should set up a Task Force of ISO/TC37/SC2/WG1.
Task Force members will be Gerhard Budin (Austria, Chair), Haavard Hjulstad (Norway), Jennifer De Camp (USA), and John Clews (United Kingdom). It is planned that October 15 2001 will be an initial reporting date to ISO/TC37/SC2/WG1.

The Task Force would
(a) identify user requirements for language codes,
(b) suggest possible methodologies for extension, and
(c) liaise with other bodies with an interest in this area.

Further progress is likely at various points before August 2002, when the next ISO/TC37 meetings will take place (2002-08-19 through 2002-08-23 in Vienna). A draft Technical Report is likely to be produced over the next few months.

John Clews

--
John Clews, Keytempo Limited (Information Management),
8 Avenue Rd, Harrogate, HG2 7PG
Email: Scripts@sesame.demon.co.uk
tel: +44 1423 888 432;
Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC/JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37: Terminology