A Taxonomy for TC304 - European Localization Requirements

Source: Keld Simonsen

Date: 1998-06-02

1 Introduction and scope

A common way to approach standardization systematically is to develop a classification of the subject area, that is, a taxonomy. This helps in two ways:

- a taxonomy helps to identify all aspects of the domain in question which might be subject to standardization;

- a taxonomy helps to provide a logical structure for the standardization activity.

A taxonomy of the relevant concepts in the domain of character-set technology has been developed, based on the user requirements for functionality discussed in Clause 4 of Part I of this report.

By way of an application, all known current standards and standardization activities have been grouped according to this taxonomy, thus forming another type of taxonomy, that of the standards themselves.

2 A taxonomy for European localization requirements

Figure 1: Topical map of user requirements in European localization requirements

The present classification of the concepts was made by identifying commonalities, such as characters, sets, fonts and rules relating to presentation. The analysis was based on a much wider view of "multi-cultural support", as shown in Figure 1, which attempts to map some of its concepts. Areas relevant to this report were chosen and developed into the full taxonomy shown in clause 3.2. This selection comprises the technology relating to methods for specifying, and rules governing, the creation of unique properties and codes which facilitate the presentation, storage and transmission of individual characters.

The taxonomy in clause 3.2 was based on ISO/IEC TR 10000-1, ISO TR 12382 and IEC 824, and on the activities of the appropriate standardization bodies, most notably the work of CEN/TC304 and ISO/IEC JTC 1.

3 Description of classification

3.1 Description

Figure 1 shows the background on which the taxonomy itself is based. User requirements may be summed up in the single phrase "multi-cultural support", meaning the need to accommodate all the requirements of different types of user, whether racial, national, typographical, occupational or individual. The figure is neither exhaustive nor fully developed, but it allows the scope of the TC304 taxonomy to be chosen through logical analysis. The primary choice was for text-based topics, in line with the capability of computer technology to code, store and process individual characters.

The taxonomy in clause 3.2 takes the classic form of a tree structure, in which two major classes are recognized: Locales and Characters. The former deals with the cultural environment of the user, the latter with the smallest divisible parts that make up the messages being processed electronically.

A taxonomy of any phenomenon can be constructed in several ways, depending on its purpose and on the aspects applied. (For instance, a group of persons may be classified first by age, then by gender, then by place of residence, or in precisely the reverse order, according to need.) A taxonomy for standardization purposes naturally has to take into account the most practical ways to group existing standards and standardization projects, the logical connections between them, and any conceptual "holes" which may need to be filled in order to cover the full need for standardization.

The following taxonomy is thus intended to provide a map for almost all of the user requirements identified in Part I (see the application in Part III). The level of subordination therefore goes very deep in some cases; this does not mean that the actual standardization projects need a taxonomy of the same complexity. When a sub-level contains no existing or future standards, its entries are simply collapsed and only the level above remains.
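The collapsing rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the taxonomy itself: it assumes the numbering used in clause 3.2, where a code is a class prefix ("L/" or "C/") followed by digits and each extra digit marks one level deeper, and it treats a code as "active" if some existing or planned standard is filed under it. The names `parent`, `is_descendant` and `collapse` are hypothetical.

```python
from __future__ import annotations


def parent(code: str) -> str:
    """Parent under the assumed digit-per-level numbering: 'L/1113' -> 'L/111'.

    'L/1' -> 'L/'; a class root such as 'L/' maps to itself.
    """
    prefix, digits = code[:2], code[2:]
    return prefix + digits[:-1]


def is_descendant(code: str, ancestor: str) -> bool:
    """True if `code` lies strictly below `ancestor` (digit prefixes encode the tree)."""
    return code != ancestor and code.startswith(ancestor)


def collapse(codes: list[str], active: set[str]) -> list[str]:
    """Drop every sub-level in which no entry, nor anything below it, is active.

    A sub-level is the set of children of one parent, so a code survives when
    at least one active code lies strictly below its parent. Class roots are
    always kept.
    """
    kept = []
    for code in codes:
        if code.endswith("/"):
            kept.append(code)  # class root, e.g. 'L/' or 'C/'
        elif any(is_descendant(a, parent(code)) for a in active):
            kept.append(code)
    return kept


codes = ["L/", "L/1", "L/11", "L/111", "L/1111", "L/11111", "L/1112", "L/2", "L/21"]
print(collapse(codes, {"L/11111"}))
# ['L/', 'L/1', 'L/11', 'L/111', 'L/1111', 'L/11111', 'L/1112', 'L/2']
```

Here L/21 disappears because nothing under L/2 is active, while L/1112 survives as a sibling in a sub-level that does contain activity. Collapsing whole sub-levels, rather than single entries, keeps sibling entries together, which is one reading of the rule; pruning entry by entry would be an equally plausible alternative.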

3.2 Taxonomy for TC304

/ (no id) TAXONOMY FOR TC304

L/ LOCALES
|----- L/1 Specifications
| |----- L/11 Languages
| | |----- L/111 Natural languages
| | | |----- L/1111 Vocabulary
| | | | |----- L/11111 Standard terminology
| | | | |----- L/11112 Thesauri
| | | | |----- L/11113 Standard phrases
| | | | |----- L/11114 Translation
| | | |----- L/1112 Grammar
| | | |----- L/1113 Orthography
| | | | |----- L/11131 Alphabet
| | | | |----- L/11132 Spelling
| | | | |----- L/11133 Use of special characters
| | | | |----- L/11134 Capitalization
| | | | |----- L/11135 Hyphenation
| | | | |----- L/11136 Punctuation
| | | | |----- L/11137 Transcription
| | | | |----- L/11138 Ordering
| | | | | |----- L/111381 Europe
| | | | | |----- L/111382 World-wide
| | | | |----- L/11139 Personal names and titles
| | | |----- L/1114 Speech
| |----- L/12 Cultural conventions
| | |----- L/121 Cultural elements
| | | |----- L/1211 Orthography
| | | | |----- L/12111 Date and time format
| | | | |----- L/12112 Numeric separators
| | | | |----- L/12113 Monetary format
| | | | |----- L/12114 Telephone number format
| | | | |----- L/12115 Payment number format
| | | | |----- L/12116 Mail address format
| | | | |----- L/12117 National places
| | | |----- L/1212 Measurement system
| | | |----- L/1213 Layout styles
| | | |----- L/1214 Paper sizes
| |----- L/13 Operating system dependency
| | |----- L/131 POSIX
| | | |----- L/1311 Europe
| | | |----- L/1312 World-wide
| | |----- L/132 Other
|----- L/2 Registration
| |----- L/21 Procedures
| | |----- L/211 Europe
| | | |----- L/2111 National
| | |----- L/212 World-wide
|----- L/3 Implementation
| |----- L/31 Fallback

C/ CHARACTERS
|----- C/1 Character information
| |----- C/11 Identification
| | |----- C/111 Characters
| | | |----- C/1111 Identifiers
| | | |----- C/1112 Attributes
| | |----- C/112 Repertoires
| | | |----- C/1121 Graphic characters
| | | | |----- C/11211 Natural language alphabets
| | | | | |----- C/112111 Europe
| | | | | | |----- C/1121111 General
| | | | | | |----- C/1121112 Disabled/elderly
| | | | | |----- C/112112 World-wide
| | | | |----- C/11212 Programming language alphabets
| | | | |----- C/11213 Non-alphabetic symbols
| | | | | |----- C/112131 General
| | | | | |----- C/112132 Disabled/elderly
| | | |----- C/1122 Control functions
| | | | |----- C/11221 Europe
| | | | | |----- C/112211 General
| | | | | |----- C/112212 Disabled/elderly
| | | | |----- C/11222 World-wide
| | | |----- C/1123 Registration
| | |----- C/113 Glyphs
| | | |----- C/1131 Registration
| | | |----- C/1132 Character correspondence
| | |----- C/114 Glyph repertoires
| | | |----- C/1141 Registration
| | | |----- C/1142 Repertoire correspondence
| |----- C/12 Manipulation
| | |----- C/121 Transformation
| | | |----- C/1211 Case conversion
| | | |----- C/1212 Transliteration
| | | |----- C/1213 Fallback representation
|----- C/2 Input/output
| |----- C/21 Input
| | |----- C/211 Keyboard
| | | |----- C/2111 Europe
| | | |----- C/2112 World-wide
| | |----- C/212 Other means
| |----- C/22 Output
| | |----- C/221 Character repertoires
| | | |----- C/2211 Europe
| | | |----- C/2212 World-wide
| | |----- C/222 Character attributes
|----- C/3 Electronic processing
| |----- C/31 Coding schemes
| | |----- C/311 Encoding of graphic characters
| | | |----- C/3111 7-bit method
| | | |----- C/3112 8-bit method
| | | |----- C/3113 Multiple-octet method
| | | | |----- C/31131 Europe
| | | | |----- C/31132 World-wide
| | |----- C/312 Encoding of control functions
| | |----- C/313 Code transformations
| | | |----- C/3131 UCS--UCS
| | | |----- C/3132 UCS--other coding schemes
| | | | |----- C/31321 Europe
| | | | |----- C/31322 World-wide
| |----- C/32 Interchange/communication
| | |----- C/321 7-bit method
| | |----- C/322 8-bit method
| | |----- C/323 Multiple-octet method
| |----- C/33 Internationalization support
| | |----- C/331 Programming languages
| | | |----- C/3311 Language-dependent
| | | |----- C/3312 Language-independent
| | |----- C/332 Operating systems
| | |----- C/333 Communications
| | | |----- C/3331 Directory services
| | | |----- C/3332 Telematics
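Because each code in the tree is its parent's code extended by one digit, the numbering can be checked mechanically. The Python sketch below is an illustrative assumption rather than a TC304 tool: it reports duplicated codes and codes whose parent is missing, two slips that are easy to make when editing a tree of this depth by hand. `check_numbering` and the sample codes are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def parent(code: str) -> str:
    """Drop the last digit: 'C/11221' -> 'C/1122'; roots like 'C/' map to themselves."""
    return code[:2] + code[2:-1]


def check_numbering(codes: list[str]) -> list[str]:
    """Return human-readable problems found in a list of taxonomy codes."""
    counts = Counter(codes)
    problems = [f"duplicate code {c}" for c, n in counts.items() if n > 1]
    for code in counts:  # each distinct code once, in first-seen order
        if not code.endswith("/") and parent(code) not in counts:
            problems.append(f"{code} has no parent {parent(code)}")
    return problems


# Hypothetical fragment containing one repeated code and one mistyped code.
sample = ["C/", "C/1", "C/11", "C/112", "C/1122", "C/11221",
          "C/112211", "C/112211", "C/112222"]
for problem in check_numbering(sample):
    print(problem)
# duplicate code C/112211
# C/112222 has no parent C/11222
```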

4 Taxonomy of current standardization work and research

What follows is an application of the above taxonomy to standardization and research projects currently going on. The purpose is to illustrate one use of the taxonomy as well as to provide a map of where the respective work is being carried out.

Code	Title	Current standardization or research activity
/ (no id)	TAXONOMY	CEN/TC304
L/	LOCALES	-
L/1	Specifications	-
L/11	Languages	-
L/111	Natural languages	-
L/1111	Vocabulary	ISO/TC 37, LRE - TRANSTERM, GENELEX
L/11111	Standard terminology	LRE - POINTER
L/11112	Thesauri	-
L/11113	Standard phrases	-
L/11114	Translation	LRE - PAROLE, EUROTRA
L/1112	Grammar	-
L/1113	Orthography	-
L/11131	Alphabet	CEN/TC304/WG2
L/11132	Spelling	-
L/11133	Use of special characters	-
L/11134	Capitalization	-
L/11135	Hyphenation	-
L/11136	Punctuation	-
L/11137	Transcription	-
L/11138	Ordering	-
L/111381	Europe	CEN/TC304/WG1
L/111382	World-wide	ISO/IEC JTC1/SC22, ISO/TC46, ISO/TC37
L/11139	Personal names and titles	-
L/1114	Speech	LRE - EAGLES, LRE - SPEECHDAT
L/12	Cultural conventions	ISO/IEC JTC1/SC22/WG20, X/Open, CEN/TC304/WG2
L/121	Cultural elements	-
L/1211	Orthography	-
L/12111	Date and time format	-
L/12112	Numeric separators	-
L/12113	Monetary format	-
L/12114	Telephone number format	PTTs, CEPT, ENO
L/12115	Payment number format	-
L/12116	Mail address format	CEN/PC8
L/12117	National places	-
L/1212	Measurement system	-
L/1213	Layout styles	-
L/1214	Paper sizes	ISO/TC6, CEN/TC172
L/13	Operating system dependency	-
L/131	POSIX	-
L/1311	Europe	-
L/1312	World-wide	ISO/IEC JTC1/SC22/WG15
L/132	Other	X/Open
L/2	Registration	-
L/21	Procedures	-
L/211	Europe	CEN/TC304/WG2
L/2111	National	-
L/212	World-wide	-
L/3	Implementation	-
L/31	Fallback	-
C/	CHARACTERS	-
C/1	Character information	-
C/11	Identification	-
C/111	Characters	ISO/IEC JTC1/SC2, SC18
C/1111	Identifiers	-
C/1112	Attributes	-
C/112	Repertoires	ISO/IEC JTC1/SC2, SC18, SC22
C/1121	Graphic characters	-
C/11211	Natural language alphabets	-
C/112111	Europe	CEN/TC304/WG3
C/1121111	General	-
C/1121112	Disabled/elderly	ISO/TC173
C/112112	World-wide	-
C/11212	Programming language alphabets	-
C/11213	Non-alphabetic symbols	-
C/112131	General	-
C/112132	Disabled/elderly	TIDE
C/1122	Control functions	-
C/11221	Europe	-
C/112211	General	-
C/112212	Disabled/elderly	-
C/11222	World-wide	-
C/1123	Registration	-
C/113	Glyphs	ISO/IEC JTC1/SC18
C/1131	Registration	-
C/1132	Character correspondence	-
C/114	Glyph repertoires	ISO/IEC JTC1/SC18
C/1141	Registration	-
C/1142	Repertoire correspondence	-
C/12	Manipulation	-
C/121	Transformation	CEN/TC304/WG4
C/1211	Case conversion	ISO/IEC JTC1/SC22/WG15, WG20
C/1212	Transliteration	ISO/TC46 (bibliographic)
C/1213	Fallback representation	-
C/2	Input/output	-
C/21	Input	ISO/IEC JTC1/SC18
C/211	Keyboard	-
C/2111	Europe	-
C/2112	World-wide	-
C/212	Other means	-
C/22	Output	-
C/221	Character repertoires	-
C/2211	Europe	-
C/2212	World-wide	-
C/222	Character attributes	-
C/3	Electronic processing	-
C/31	Coding schemes	ISO/IEC JTC1/SC2, SC22; CEN/TC304/WG3
C/311	Encoding of graphic characters	ISO/IEC JTC1/SC18 (text layout)
C/3111	7-bit method	CEN/TC304/WG3
C/3112	8-bit method	CEN/TC304/WG3
C/3113	Multiple-octet method	CEN/TC304/WG3
C/31131	Europe	-
C/31132	World-wide	-
C/312	Encoding of control functions	ISO/IEC JTC1/SC18 (control functions)
C/313	Code transformations	CEN/TC304/WG4
C/3131	UCS--UCS	-
C/3132	UCS--other coding schemes	-
C/31321	Europe	-
C/31322	World-wide	-
C/32	Interchange/communication	-
C/321	7-bit method	EWOS: Use of ISO 2022 coding structure
C/322	8-bit method	EWOS: Use of ISO 2022 coding structure
C/323	Multiple-octet method	EWOS: Use of ISO 10646 coding structure
C/33	Internationalization support	LRE - GLOSSASOFT, ISO/IEC JTC1/SC22/WG15 and WG20
C/331	Programming languages	-
C/3311	Language-dependent	-
C/3312	Language-independent	-
C/332	Operating systems	-
C/333	Communications	-
C/3331	Directory services	-
C/3332	Telematics	-
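The table can also be read in reverse, to see where a given body is working or which entries have no recorded activity at all (candidate "holes" in the sense of clause 3.1). Below is a minimal Python sketch, assuming the three tab-separated columns shown above with "-" marking an empty entry; the embedded rows are copied from the table, while `parse_rows`, `codes_for` and `gaps` are hypothetical names.

```python
from __future__ import annotations

TABLE = """\
L/1112\tGrammar\t-
L/11131\tAlphabet\tCEN/TC304/WG2
L/12\tCultural conventions\tISO/IEC JTC1/SC22/WG20, X/Open, CEN/TC304/WG2
L/12114\tTelephone number format\tPTTs, CEPT, ENO
L/211\tEurope\tCEN/TC304/WG2
C/121\tTransformation\tCEN/TC304/WG4
C/1213\tFallback representation\t-
C/313\tCode transformations\tCEN/TC304/WG4
"""


def parse_rows(text: str) -> list[tuple[str, str, str]]:
    """Split tab-separated lines into (code, title, activity) triples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            code, title, activity = line.split("\t")
            rows.append((code, title, activity.strip()))
    return rows


def codes_for(rows: list[tuple[str, str, str]], body: str) -> list[str]:
    """Codes whose activity column mentions `body` (plain substring match)."""
    return [code for code, _, activity in rows if body in activity]


def gaps(rows: list[tuple[str, str, str]]) -> list[str]:
    """Codes with no recorded activity ('-')."""
    return [code for code, _, activity in rows if activity == "-"]


rows = parse_rows(TABLE)
print(codes_for(rows, "CEN/TC304/WG2"))  # ['L/11131', 'L/12', 'L/211']
print(codes_for(rows, "CEN/TC304/WG4"))  # ['C/121', 'C/313']
print(gaps(rows))                        # ['L/1112', 'C/1213']
```

A substring match is deliberately crude: abbreviated entries such as "SC18" in "ISO/IEC JTC1/SC2, SC18" only make sense in the context of the preceding name, so a fuller index would need to expand such abbreviations first.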

5 Maintenance of the taxonomy

To allow widespread use of, and comment on, this taxonomy, it is proposed that it be published as a technical report and given adequate publicity. It is recommended that the upkeep, development and maintenance of the taxonomy be the responsibility of CEN/TC304.