A Taxonomy for TC304 - European Localization Requirements

Source: Keld Simonsen

Date: 1998-06-02

1 Introduction and scope

A common way to approach standardization systematically is to classify the subject area, that is, to develop a taxonomy. This helps in two ways:

- a taxonomy helps to identify all aspects of the domain in question which might be subject to standardization;

- a taxonomy helps to provide a logical structure for the standardization activity.

A taxonomy has been developed of relevant concepts in the domain of character-set technology, based on user requirements for functionality, as discussed in Clause 4 of Part I of this report.

By way of an application, all known current standards and standardization activities have been grouped according to this taxonomy, thus forming another type of taxonomy, that of the standards themselves.

2 A taxonomy for European localization requirements


Figure 1: Topical map of user requirements in European localization requirements

The present classification of the concepts was made by identifying commonalities, such as characters, sets, fonts and rules relating to presentation. The analysis was based on a much wider view of "multi-cultural support", as shown in Figure 1, which attempts to map some of its concepts. Areas relevant to this report were chosen and developed into the full taxonomy, shown in clause 3.2. This latter choice comprises the technology which relates to methods for specifying, and rules governing, the creation of unique properties and codes which facilitate the presentation, storage and transmission of individual characters.

The taxonomy in clause 3.2 was based on references ISO/IEC TR 10000-1, ISO TR 12382 and IEC 824, and on the activities of the appropriate standardization bodies, most notably the work of CEN/TC304 and ISO/IEC/JTC 1.

3 Description of classification

3.1 Description

Figure 1 shows the background on which the taxonomy itself is based. User requirements may be summed up in the single phrase "multi-cultural support", that is, the need to accommodate all the requirements of different types of user, whether they are racial, national, typographical, occupational or individual. The figure is not intended to be exhaustive or fully developed, but it allows the choice for the TC304 taxonomy to be based on logical analysis. The primary choice was for text-based topics, in line with the capability of computer technology to code, store and process individual characters.

The taxonomy in clause 3.2 takes the classic form of a tree structure, in which two major classes are recognized: Locales and Characters. The former deals with the cultural environment of the user, the latter with the smallest divisible parts that make up the messages which are being electronically processed.

A taxonomy of any set of phenomena can be constructed in several ways, depending on its purpose and on the criteria applied. (For instance, a number of persons may be grouped first by age, then by gender, then by place of residence -- or in precisely the opposite order, according to need.) A taxonomy for standardization purposes naturally has to take into account the most practical ways to group existing standards and standardization projects, as well as the logical connections between them and any conceptual "holes" which may need to be filled in order to cover the full need for standardization.

The following taxonomy is thus intended to provide a map for almost all of the user requirements identified in Part I (see the application in Part III). The level of subordination therefore goes very deep in some cases -- this does not mean that the actual standardization projects need a taxonomy of the same complexity. When a sub-level is empty of existing or future standards, the entries in that sub-level are simply collapsed and only the level above remains.
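The collapsing rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the report: the function names parent and collapse are hypothetical, the codes are a small excerpt from clause 3.2, and the set of "active" codes (those with existing or future standards) is assumed for the example. The sketch relies on the fact that each level of the taxonomy appends one digit to the code of the level above.

```python
def parent(code):
    """Return the code one level up, or None for a major class such as 'L/'."""
    prefix, _, digits = code.partition("/")
    if not digits:
        return None
    return f"{prefix}/{digits[:-1]}"


def collapse(codes, active):
    """Drop every sub-level in which no entry (or descendant) is in `active`."""
    kept = []
    for code in codes:
        up = parent(code)
        # A sub-level survives if any code strictly below its parent is active.
        if up is None or any(a != up and a.startswith(up) for a in active):
            kept.append(code)
    return kept


# Excerpt of the Locales branch; `active` is an assumption for illustration.
codes = ["L/", "L/2", "L/21", "L/211", "L/2111", "L/212", "L/3", "L/31"]
active = {"L/211"}
collapsed = collapse(codes, active)
print(collapsed)
```

With these assumed inputs, the sub-levels below L/211 and L/3 contain nothing active and disappear, while L/212 remains because its sibling L/211 keeps that sub-level populated.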

3.2 Taxonomy for TC304

/ (no id) TAXONOMY FOR TC304

L/ LOCALES
|----- L/1 Specifications
|      |----- L/11 Languages
|      |      |----- L/111 Natural languages
|      |             |----- L/1111 Vocabulary
|      |             |      |----- L/11111 Standard terminology
|      |             |      |----- L/11112 Thesauri
|      |             |      |----- L/11113 Standard phrases
|      |             |      |----- L/11114 Translation
|      |             |----- L/1112 Grammar
|      |             |----- L/1113 Orthography
|      |             |      |----- L/11131 Alphabet
|      |             |      |----- L/11132 Spelling
|      |             |      |----- L/11133 Use of special characters
|      |             |      |----- L/11134 Capitalization
|      |             |      |----- L/11135 Hyphenation
|      |             |      |----- L/11136 Punctuation
|      |             |      |----- L/11137 Transcription
|      |             |      |----- L/11138 Ordering
|      |             |      |      |----- L/111381 Europe
|      |             |      |      |----- L/111382 World-wide
|      |             |      |----- L/11139 Personal names and titles
|      |             |----- L/1114 Speech
|      |----- L/12 Cultural conventions
|      |      |----- L/121 Cultural elements
|      |             |----- L/1211 Orthography
|      |             |      |----- L/12111 Date and time format
|      |             |      |----- L/12112 Numeric separators
|      |             |      |----- L/12113 Monetary format
|      |             |      |----- L/12114 Telephone number format
|      |             |      |----- L/12115 Payment number format
|      |             |      |----- L/12116 Mail address format
|      |             |      |----- L/12117 National places
|      |             |----- L/1212 Measurement system
|      |             |----- L/1213 Layout styles
|      |             |----- L/1214 Paper sizes
|      |----- L/13 Operating system dependency
|             |----- L/131 POSIX
|             |      |----- L/1311 Europe
|             |      |----- L/1312 World-wide
|             |----- L/132 Other
|----- L/2 Registration
|      |----- L/21 Procedures
|             |----- L/211 Europe
|             |      |----- L/2111 National
|             |----- L/212 World-wide
|----- L/3 Implementation
       |----- L/31 Fallback

C/ CHARACTERS
|----- C/1 Character information
|      |----- C/11 Identification
|      |      |----- C/111 Characters
|      |      |      |----- C/1111 Identifiers
|      |      |      |----- C/1112 Attributes
|      |      |----- C/112 Repertoires
|      |      |      |----- C/1121 Graphic characters
|      |      |      |      |----- C/11211 Natural language alphabets
|      |      |      |      |      |----- C/112111 Europe
|      |      |      |      |      |      |----- C/1121111 General
|      |      |      |      |      |      |----- C/1121112 Disabled/elderly
|      |      |      |      |      |----- C/112112 World-wide
|      |      |      |      |----- C/11212 Programming language alphabets
|      |      |      |      |----- C/11213 Non-alphabetic symbols
|      |      |      |             |----- C/112131 General
|      |      |      |             |----- C/112132 Disabled/elderly
|      |      |      |----- C/1122 Control functions
|      |      |      |      |----- C/11221 Europe
|      |      |      |      |      |----- C/112211 General
|      |      |      |      |      |----- C/112212 Disabled/elderly
|      |      |      |      |----- C/11222 World-wide
|      |      |      |----- C/1123 Registration
|      |      |----- C/113 Glyphs
|      |      |      |----- C/1131 Registration
|      |      |      |----- C/1132 Character correspondence
|      |      |----- C/114 Glyph repertoires
|      |             |----- C/1141 Registration
|      |             |----- C/1142 Repertoire correspondence
|      |----- C/12 Manipulation
|             |----- C/121 Transformation
|                    |----- C/1211 Case conversion
|                    |----- C/1212 Transliteration
|                    |----- C/1213 Fallback representation
|----- C/2 Input/output
|      |----- C/21 Input
|      |      |----- C/211 Keyboard
|      |      |      |----- C/2111 Europe
|      |      |      |----- C/2112 World-wide
|      |      |----- C/212 Other means
|      |----- C/22 Output
|             |----- C/221 Character repertoires
|             |      |----- C/2211 Europe
|             |      |----- C/2212 World-wide
|             |----- C/222 Character attributes
|----- C/3 Electronic processing
       |----- C/31 Coding schemes
       |      |----- C/311 Encoding of graphic characters
       |      |      |----- C/3111 7-bit method
       |      |      |----- C/3112 8-bit method
       |      |      |----- C/3113 Multiple-octet method
       |      |             |----- C/31131 Europe
       |      |             |----- C/31132 World-wide
       |      |----- C/312 Encoding of control functions
       |      |----- C/313 Code transformations
       |             |----- C/3131 UCS--UCS
       |             |----- C/3132 UCS--other coding schemes
       |                    |----- C/31321 Europe
       |                    |----- C/31322 World-wide
       |----- C/32 Interchange/communication
       |      |----- C/321 7-bit method
       |      |----- C/322 8-bit method
       |      |----- C/323 Multiple-octet method
       |----- C/33 Internationalization support
              |----- C/331 Programming languages
              |      |----- C/3311 Language-dependent
              |      |----- C/3312 Language-independent
              |----- C/332 Operating systems
              |----- C/333 Communications
                     |----- C/3331 Directory services
                     |----- C/3332 Telematics

4 Taxonomy of current standardization work and research

What follows is an application of the above taxonomy to the standardization and research projects currently under way. The purpose is to illustrate one use of the taxonomy as well as to provide a map of where the respective work is being carried out.

Code       Title                            Current standardization or research activity
/ (no id)  TAXONOMY                         CEN/TC304
L/         LOCALES                          -
L/1        Specifications                   -
L/11       Languages                        -
L/111      Natural languages                -
L/1111     Vocabulary                       ISO/TC 37, LRE - TRANSTERM, GENELEX
L/11111    Standard terminology             LRE - POINTER
L/11112    Thesauri                         -
L/11113    Standard phrases                 -
L/11114    Translation                      LRE - PAROLE, EUROTRA
L/1112     Grammar                          -
L/1113     Orthography                      -
L/11131    Alphabet                         CEN/TC304/WG2
L/11132    Spelling                         -
L/11133    Use of special characters        -
L/11134    Capitalization                   -
L/11135    Hyphenation                      -
L/11136    Punctuation                      -
L/11137    Transcription                    -
L/11138    Ordering                         -
L/111381   Europe                           CEN/TC304/WG1
L/111382   World-wide                       ISO/IEC/JTC1/SC22, ISO/TC46, ISO/TC37
L/11139    Personal names and titles        -
L/1114     Speech                           LRE - EAGLES, LRE - SPEECHDAT
L/12       Cultural conventions             ISO/IEC JTC1/SC22/WG20, X/Open, CEN/TC304/WG2
L/121      Cultural elements                -
L/1211     Orthography                      -
L/12111    Date and time format             -
L/12112    Numeric separators               -
L/12113    Monetary format                  -
L/12114    Telephone number format          PTTs, CEPT, ENO
L/12115    Payment number format            -
L/12116    Mail address format              CEN/PC8
L/12117    National places                  -
L/1212     Measurement system               -
L/1213     Layout styles                    -
L/1214     Paper sizes                      ISO/TC6, CEN/TC172
L/13       Operating systems dependency     -
L/131      POSIX                            -
L/1311     Europe                           -
L/1312     World-wide                       ISO/IEC JTC1/SC22/WG15
L/132      Other                            X/Open
L/2        Registration                     -
L/21       Procedures                       -
L/211      Europe                           CEN/TC304/WG2
L/2111     National                         -
L/212      World-wide                       -
L/3        Implementation                   -
L/31       Fallback                         -
C/         CHARACTERS                       -
C/1        Character information            -
C/11       Identification                   -
C/111      Characters                       ISO/IEC JTC1/SC2, SC18
C/1111     Identifiers                      -
C/1112     Attributes                       -
C/112      Repertoires                      ISO/IEC JTC1/SC2, SC18, SC22
C/1121     Graphic characters               -
C/11211    Natural language alphabets       -
C/112111   Europe                           CEN/TC304/WG3
C/1121111  General                          -
C/1121112  Elderly/disabled                 ISO/TC173
C/112112   World-wide                       -
C/11212    Programming language alphabets   -
C/11213    Non-alphabetic symbols           -
C/112131   General                          -
C/112132   Disabled/elderly                 TIDE
C/1122     Control functions                -
C/11221    Europe                           -
C/112211   General                          -
C/112212   Elderly/disabled                 -
C/11222    World-wide                       -
C/1123     Registration                     -
C/113      Glyphs                           ISO/IEC JTC1/SC18
C/1131     Registration                     -
C/1132     Character correspondence         -
C/114      Glyph repertoires                ISO/IEC JTC1/SC18
C/1141     Registration                     -
C/1142     Repertoire correspondence        -
C/12       Manipulation                     -
C/121      Transformation                   CEN/TC304/WG4
C/1211     Case conversion                  ISO/IEC JTC1/SC22/WG15, WG20
C/1212     Transliteration                  ISO TC46 (bibliographic)
C/1213     Fallback representation          -
C/2        Input/output                     -
C/21       Input                            ISO/IEC JTC1/SC18
C/211      Keyboard                         -
C/2111     Europe                           -
C/2112     World-wide                       -
C/212      Other means                      -
C/22       Output                           -
C/221      Character repertoires            -
C/2211     Europe                           -
C/2212     World-wide                       -
C/222      Character attributes             -
C/3        Electronic processing            -
C/31       Coding schemes                   ISO/IEC JTC1/SC2, SC22; CEN/TC 304/WG3
C/311      Encoding of graphic characters   ISO/IEC JTC1/SC18 (text layout)
C/3111     7-bit method                     CEN/TC304/WG3
C/3112     8-bit method                     CEN/TC304/WG3
C/3113     Multiple-octet method            CEN/TC304/WG3
C/31131    Europe                           -
C/31132    World-wide                       -
C/312      Encoding of control functions    ISO/IEC JTC1/SC18 (control functions)
C/313      Code transformations             CEN/TC304/WG4
C/3131     UCS--UCS                         -
C/3132     UCS--other coding schemes        -
C/31321    Europe                           -
C/31322    World-wide                       -
C/32       Interchange/communication        -
C/321      7-bit method                     EWOS: Use of ISO 2022 coding structure
C/322      8-bit method                     EWOS: Use of ISO 2022 coding structure
C/323      Multiple-octet method            EWOS: Use of ISO 10646 coding structure
C/33       Internationalization support     LRE - GLOSSASOFT, ISO/IEC JTC1/SC22/WG15 and WG20
C/331      Programming languages            -
C/3311     Language-dependent               -
C/3312     Language-independent             -
C/332      Operating systems                -
C/333      Communications                   -
C/3331     Directory services               -
C/3332     Telematics                       -
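
One way to use the mapping above is as a lookup table keyed by taxonomy code. The Python sketch below is illustrative only: the names ROWS, ACTIVITY, nearest_activity and gaps are hypothetical, the rows are a small excerpt of the table, and the idea that an entry without a recorded activity may be regarded as covered by its nearest ancestor is an assumption made for the example, not something the report states.

```python
def parent(code):
    """Return the code one level up, or None for a major class such as 'L/'."""
    prefix, _, digits = code.partition("/")
    if not digits:
        return None
    return f"{prefix}/{digits[:-1]}"


# (code, title, activity) rows copied from the table; "-" means none recorded.
ROWS = [
    ("L/12", "Cultural conventions", "ISO/IEC JTC1/SC22/WG20, X/Open, CEN/TC304/WG2"),
    ("L/121", "Cultural elements", "-"),
    ("L/1211", "Orthography", "-"),
    ("L/12111", "Date and time format", "-"),
    ("L/12114", "Telephone number format", "PTTs, CEPT, ENO"),
    ("L/3", "Implementation", "-"),
    ("L/31", "Fallback", "-"),
]
ACTIVITY = {code: activity for code, _title, activity in ROWS}


def nearest_activity(code):
    """Walk up the hierarchy to the first entry with a recorded activity.

    Assumption for illustration: an ancestor's activity covers its sub-levels.
    """
    while code is not None:
        activity = ACTIVITY.get(code, "-")
        if activity != "-":
            return code, activity
        code = parent(code)
    return None


# Entries with no activity of their own: candidate "holes" in the coverage.
gaps = [code for code, _title, activity in ROWS if activity == "-"]
print(nearest_activity("L/12111"))
print(gaps)
```

Under these assumptions, L/12111 (Date and time format) resolves to the bodies listed for L/12, whereas L/31 (Fallback) has no ancestor with a recorded activity in this excerpt and would be reported as a gap.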

5 Maintenance of the taxonomy

To allow widespread use of, and comment on, this taxonomy it is proposed that it should be published as a technical report and given adequate publicity. It is recommended that the upkeep, development and maintenance of the taxonomy should be the responsibility of CEN/TC304.