Project team report

CEN/TC304/PT01

User requirements study and programming for standardization in the field of Character Set Technology

or: Standards for providing multi-cultural functionality in IT-systems

Final report
1995-09-06

for approval by:
CEN, CENELEC, ETSI and EWOS

Editor: Þorvarður Kári Ólafsson


FOREWORD

This report was developed by a Project Team set up under mandate M/037 from the EC to develop a European strategy for Character Set Technology (CST) standardization. The work is a direct continuation of the CST workshop held in Luxemburg on 1-2 December 1994 [1].

The final report is submitted to CENELEC, ETSI, EWOS and CEN for approval. CEN/TC304 is responsible for its final approval.


CONTENTS

FOREWORD

PART 0 -- GENERAL

1. Introduction and background
2. Scope
3. Definitions, abbreviations and symbols
4. Conclusions and recommendations

PART I -- USER REQUIREMENTS STUDY

1. Introduction and scope
2. Definitions
3. Abbreviations
4. Statement of problem -- the users' requirements
5. The European context
6. Internationalization
7. Character sets
8. Implications of the ICT workshop recommendations
9. Summary of the requirements

PART II -- TAXONOMY

1. Introduction and scope
2. Taxonomy of character set technology
3. Description of classification
4. Taxonomy of current standardization and research
5. Maintenance of the taxonomy

PART III -- STRATEGY

1. Introduction
2. Promotion/awareness
3. Research and Development
4. Standardization
5. European and ISO/IEC standardization
6. Coordination
7. Role of particular organizations and standardization bodies

PART IV -- WORK NEEDED FOR MULTI-CULTURAL SUPPORT IN IT

1. Introduction and scope
2. Promotion and coordination activities
3. Research and development activities
4. Deriving the standardization action from the user requirements
5. Proposed European standardization work
6. Description of new CEN work
7. Description of new EWOS work
8. Description of new ETSI work
9. Description of new CENELEC work
10. Funding of the work

ANNEXES

A. Members of this PT

B. Bibliography

C. Mandate M/037

D. References to standards

E. General recommendations of the CST-workshop

F. The indigenous languages of Europe

G. Extract from the Nordic report on cultural requirements


PART 0 -- General


1. Introduction and background

From 1985 to 1992, a committee under ITSTC was responsible for European character set standardization. This was CEN/CENELEC/IT/WG-CSC, Character sets and coding. It developed several European functional pre-standards, ENV 41501 -- 41508, all based on 8-bit coding.

The work of other organizations then started to affect the future of CST standardization in Europe:

This development led to the disbanding of IT/WG-CSC and the establishment of CEN/TC304 in late 1992. The new committee has an extended scope, derived from a first draft taxonomy that was based on the Nordic report on cultural requirements.

In December 1993, the EC Senior Officials Group on IT Standardization (SOGITS), through its secretariat in EC DG III, issued a standardization mandate in the field of CST, asking CEN/TC304 to develop a European strategy for CST standardization.

To execute this mandate, TC304 organized a workshop and established the Project Team PT01. The workshop, on European User Requirements for Internationalization of IT and Character Set Technology, took place in Luxemburg on 1-2 December 1994. The Project Team has based its work on the results of this workshop [1]; see further Part I and Annex E.

The ultimate goal of this work is that new CST standards meet user requirements and provide a basis for a truly multilingual infrastructure in Europe. A further goal is that users know which standards are available and can obtain guidance on their use.

2. Scope

The scope of this work is the study of user requirements for standardization related to Character Set Technology (CST). The study should identify current problems and formulate requirements and needs for standards and standard solutions.

The report defines a European policy and strategy with respect to CST, aligns existing and ongoing standardization activities with this policy, and finally defines a European work programme for CST, including a taxonomy and European sub-repertoires of ISO/IEC 10646. This work is based on a clear definition of European user requirements. All standardization activities required to realise the user functionality should be defined.

See also Annex C.

3. Definitions, abbreviations, symbols

See Part I, clause 2 and clause 3.

4. Conclusions and recommendations

4.1 Conclusions

The project team has met the requirements of the mandate (see Annex C). A joint European work programme for standardization in the field of CST has been agreed between CEN, CENELEC, ETSI and EWOS (see Part IV). This work programme is based on a clear definition of user requirements (see Part I), gathered at the open workshop held in Luxemburg in December 1994. A taxonomy for CST is presented in Part II. Part III identifies an implementation strategy that involves not just standardization but also R-and-D, promotion and coordination activities.

4.2 Recommendations

Recommendation 1
CEN/TC304 should publish and maintain the taxonomy, which is a key tool for the management and coordination of the work programme.

Recommendation 2
The strategy defined in Part III should be implemented.

Recommendation 3
The work programme for standardization and other activities defined in Part IV should be implemented and partly funded by the EC.

PART I -- User requirements study

1. Introduction and scope

A workshop was set up by CEN/TC304 to give an extensive overview of user requirements concerning internationalization. Its results, documented in [1], were the main input to this study. The workshop agreed on the general recommendations found in Annex E.

This part aims to define the requirements for CST standardization in Europe from the point of view of the prospective users of the standards. These users include other standardizers, end users, IT applications, the IT industry, service providers and public procurers.

For the purposes of this study, user requirements are grouped in an order of priority, ranging from High through Medium to Low. A high priority means that the topic should be addressed urgently in order to improve the usability of IT for most people. Medium priority is assigned when the action would improve the usability of IT for some people, or improve it marginally for most users. Low priority topics would not improve usability for most users, but general benefits would accrue, or some selected groups of users would benefit marginally.

Note that this part of the report tries to describe the user requirements regardless of whether they are to do with standardization or not. The standardization perspective is introduced in Part III.

2. Definitions

For the purpose of this report the following definitions apply. The sources of definitions are given, except for definitions that are local to this report.

Attributes: those general aspects of a character used independently of design (font or style), to convey additional abstract meaning, e.g. bold or underline.

Character: a member of a set of elements used for the organization, control or representation of data [ISO 4873]. NOTE: Characters are sub-divided into graphic characters and control characters.

Character set (Character repertoire): a finite set of different characters that is complete for a given purpose [ISO 2382-4].

Character Set Technology: the technology for handling characters within IT and providing multi-cultural functionality in informatic and telematic systems. This includes input, coding, interchange, rendition, transformation, identification, ordering and other ways of manipulating character data by electronic means.

Charmap: a text file describing a coded character set. Each character set description file defines characteristics for a coded character set and the encoding of characters. Other information about the coded character set may also be in the file. Coded character set character values are defined using symbolic character names followed by character encoding values [ISO/IEC 9945-2].

Coded character set, Code: a set of unambiguous rules that establishes a character set and the one-to-one relationship between the characters of the set and their bit combinations [ISO 4873].

Coding scheme: a collection of rules that maps the elements of one set onto the elements of a second set.

Coding: the process of allocating unique bit combinations to an individual character according to established rules.

Combining character: a member of an identified subset of the coded character set of ISO/IEC 10646 intended for combination with the preceding non-combining graphic character, or with a sequence of combining characters preceded by a non-combining character [ISO/IEC 10646].

Control function: an action that affects the recording, processing, transmission or interpretation of data, and that has a coded representation consisting of one or more bit combinations [ISO 6429].

Cultural convention: a convention of an information system which is functionally common between regional cultures, but may differ in presentation, operation behaviour or degree of importance. NOTE: application or organizational cultures are not considered here [3].

Cultural register: a register of cultural conventions related to IT.

Cultural requirements: requirements due to properties of the language(s), commonly accepted rules for its use -- especially in written form -- or other special characteristics of a society in a certain geographic area [5].

Diacritical mark: a combining character which forms part of a letter.

Europe: the geographic area whose boundaries are the Ural Mountains, the Caspian, Black and Mediterranean Seas, the Atlantic Ridge and the North Pole.

Fallback representation: an approximate representation of a character, made for equipment not capable of representing the character correctly.

Font: a collection of glyph images having the same basic design, e.g. Courier Bold Oblique [ISO 9541-1].

Glyph: a recognisable abstract graphic symbol which is independent of any specific design [ISO 9541-1].

Graphic character: a character, other than a control function, that has a visual representation, normally hand-written, printed or displayed, and has a coded representation consisting of one or more bit combinations [ISO 4873].

Indigenous: native to a certain geographic area.

Internationalization: a process of producing an application platform or application which can easily be localized for (almost) any cultural environment. NOTE: an internationalized information system does not have a dependency on any specific culture unless it is localized to that selected culture [3].

Letter: a graphic character that is in the alphabet of a natural language [5].

Locale: the definition of the environment of a user that depends on language and cultural conventions. It is made up from one or more categories. Each category is identified by its name and controls specific aspects of the behaviour of components of the system [ISO/IEC 9945-2].

Localization: a process of adapting an internationalized application to a specific cultural environment [5].

Operating system: software that controls the execution of programs and that may provide services such as resource allocation, scheduling, input/output control and data management [ISO 2382-1].

Ordering: an operation by which two different objects (for example two character strings) are assigned a context-free deterministic ordering [3].

Script: a set of graphic characters used for the written form of one or more languages [3]. Examples: Latin script, Cyrillic script, Greek script.

Special character: a graphic character that is not a letter, a digit or a spacing character [ISO 2382-4].

Symbol: a character or letter, or a stylized representation of an object.

Taxonomy: a classification of concepts or a terminology system.

Telematics: the application of information and communications technologies and services, usually in direct combination.

Transcription: the process whereby the pronunciation of a given language is noted by the system of signs of a conversion language. A transcription system is of necessity based on the orthographic conventions of the conversion language. Transcription is not strictly reversible [ISO 3602].

Transformation: any conversion of coded character data, including transliteration, transcription, code conversion and fallback rules.

Transliteration: the process which consists of representing the characters of an alphabetical or syllable writing by the characters of a conversion alphabet. In principle, this conversion should be made character by character [ISO 3602]. NOTE: transliteration is a reversible process.

User interface: the part of a system with which the user interacts [ANSI X3.172].

Users: (1) of IT: persons or organizations utilising IT in their work or leisure; (2) of CST standards: users of IT, providers of IT technology and developers of IT standards.
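To make the definitions of ordering and cultural convention concrete, the following Python sketch contrasts sorting by code value with sorting by a culturally tailored alphabet. The simplified lower-case Icelandic alphabet table and the names `ICELANDIC_ALPHABET` and `icelandic_key` are hypothetical and chosen for illustration only; real collation rules also handle case, accents and multi-level comparison, which this sketch does not.

```python
# Hypothetical ordering table: the 32-letter lower-case Icelandic alphabet.
ICELANDIC_ALPHABET = "aábdðeéfghiíjklmnoóprstuúvxyýþæö"
RANK = {letter: i for i, letter in enumerate(ICELANDIC_ALPHABET)}

def icelandic_key(word: str) -> tuple[int, ...]:
    """Sort key following the table; other characters sort after it, by code value."""
    return tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower())

words = ["þorskur", "ás", "bók", "öl", "æður", "ad"]

print(sorted(words))                     # ['ad', 'bók', 'ás', 'æður', 'öl', 'þorskur']
print(sorted(words, key=icelandic_key))  # ['ad', 'ás', 'bók', 'þorskur', 'æður', 'öl']
```

Sorting by code value places "ás" after "bók" and "þorskur" last, whereas an Icelandic reader expects "á" to follow "a" and "þ" to precede "æ" and "ö". The same data thus needs different orderings in different cultural environments, which is why ordering rules belong in a locale rather than in the coded character set.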

3. Abbreviations

10646    see UCS
ANSI     American National Standards Institute
API      Application Programming Interface
ASCII    American Standard Code for Information Interchange (7-bit code covering the English alphabet)
ASN.1    Abstract Syntax Notation One
BLISS    a picture symbol language used by people with severe motor and speech disabilities
BMP      Basic Multilingual Plane (Plane 0 of ISO/IEC 10646, specified in Part 1)
CADDIA   Co-operation in Automation of Data and Documentation for Imports/Exports and the Management and Financial Control of the Agricultural Market
CCITT    see ITU
CEN      Comité Européen de Normalisation
CENELEC  Comité Européen de Normalisation Électrotechnique
CEPT     Conférence Européenne des Administrations des Postes et des Télécommunications (see also ETSI)
CLS      Cambridge Language Survey
CSC      Character Sets and Coding
CST      Character Set Technology
DIS      Draft International Standard
EAGLES   Expert Advisory Group on Language Engineering Standards
EBCDIC   Extended Binary Coded Decimal Interchange Code
EC       The European Commission
EDI      Electronic Data Interchange
EN       European Standard
ENO      European Numbering Office
ENS      European Nervous System
ENV      European Pre-Standard
EPHOS    European Procurement Handbook for Open Systems
ETSI     European Telecommunications Standards Institute
EU       The European Union
EWOS     European Workshop on Open Systems
HTTP     HyperText Transfer Protocol
ICT      Information and Communication Technology
IDA      Interchange of Data between Administrations
IE       Information Engineering
IEC      International Electrotechnical Commission
IETF     Internet Engineering Task Force
IRV      International Reference Version (of ISO/IEC 646)
ISO      International Organization for Standardization
IT       Information Technology
ITSTC    IT Steering Committee (of CEN, CENELEC and ETSI)
ITU      International Telecommunication Union
ITU-T    The Telecommunication Standardization Sector of ITU
JTC1     Joint Technical Committee no. 1 (of ISO/IEC, covering IT standardization)
LE       Language Engineering (follow-up of LRE)
LRE      Linguistic Research and Engineering
MIME     Multipurpose Internet Mail Extensions
MLIS     Multilingual Information Society
MOTIS    Message Oriented Text Interchange System
ODA      Open Document Architecture
OSI      Open Systems Interconnection
POSIX    Portable Operating System Interface
PT       Project Team
R-and-D  Research and Development
SC       Sub-Committee
SGML     Standard Generalized Markup Language
SOGITS   Senior Officials Group on IT Standardization
SQL      Structured Query Language
STRÍ     Staðlaráð Íslands (Icelandic Council for Standardization)
TC       Technical Committee
TC304    Technical Committee no. 304 (of CEN, covering Character Set Technology standardization)
TEDIS    Trade EDI Systems
TERENA   Trans-European Research and Education Networking Association (formerly RARE and EARN)
TG/CS    Technical (Liaison) Group on Character Sets (EWOS, formerly TLG/CS)
TIDE     Telematics for Disabled and Elderly people
TR       Technical Report
UCS      Universal Multiple-Octet Coded Character Set (ISO/IEC 10646)
WG       Working Group
X.400    ITU-T recommendation on MOTIS

4. Statement of problem -- the Users' requirements

User requirements are considered here in relation to the whole domain of information technology (IT), both actual and potential. In particular, the emergence of the Single European Market and the evolution of public and private services from a national orientation to a pan-European environment are leading to a number of new requirements for users of IT, affecting the fields of activity in which they may be engaged. The essential need is that incompatibilities within the IT domain should impose no constraint or effective limit on users' personal capabilities and performance. In other words, the technology must offer benefits that support and increase the ability of each individual to work or play as efficiently and as easily as possible.

4.1 The users

It is convenient to consider a number of large easily identifiable groups of users and then to examine their needs in relation to applications and tasks they need to perform.

4.1.1 The working population
There are some 180 million people actively employed in the European Union, another 5 million in EFTA and anything up to 500 million within continental Europe who may interact with one another and with those in countries outside Europe, truly a distributed international environment, or "global name-space".

A rapidly increasing percentage of these working people will make use of IT in some form or another, whether directly with word processing, or indirectly using databases or data processing. Their tasks will all involve the input and output of characters. To a large extent these characters will be those of their local language, but at some time or other many of the users will need to interact with other users whose language or culture uses a different character set.

To avoid incomprehension, or at least inefficiency, they must be able to access these "foreign", unfamiliar characters. Additionally, users will interact with IT "tools", e.g. software which may present feedback and help information in another language. Neither the tools nor the medium should impose any barrier to users' efficiency. Users require access to the scripts and fonts of their own language, and translation to and from other languages for interaction with other countries, nationalities and cultures.

Users in the working population are characterized by aptitudes, training and experience which generally fit the task they perform. That task may demand a high degree of "task fit", in cultural and language terms. Different and highly specific user requirements may often be identified within user groups or professions, such as authors and writers (technical and scientific, fiction), journalists, translators and interpreters, engineers, scientists, medical specialists, etc.

This whole group of users falls within defined age limits, roughly between 15 and 65, but still includes many people with special needs that are particularly relevant to this study, such as elderly and disabled people.

4.1.2 The leisure population
The working population overlaps with the users who employ IT to pursue leisure interests outside their work. One example is the increasing number of people who now regularly use a personal computer for home and leisure pursuits. In some European countries up to 25% of all private households possess a PC, and many are extending their interests to the Internet.

Other people need to interact with public services that use IT, such as information retrieval services, shopping and banking services, and even self-service vending machines for many everyday activities such as car parking, purchase of tickets and small items of general merchandise. This group is characterized by a much wider range of ages, possibly from as young as five up to 80 or even 90. A large proportion of this group may have special needs: not just visual and physical handicaps but mental handicaps as well, or needs that arise simply from being very young or very old. The range of capabilities and aptitudes is large, from people with no education or training to users with highly specialized training and education and a wealth of experience.

This leisure group has at least as great a requirement for language and cultural exchange as working people, if not greater. In general, however, their encounters with texts in other languages are more sporadic, and they are less familiar with IT. Another reason for highlighting this group is that part of it lives, temporarily or permanently, within a culture other than its own, and therefore needs to communicate with cultures other than the one in which it dwells.

4.1.3 People with disabilities and the elderly
As previously mentioned, a significant proportion of the population may be characterized as having some disability or suffering from the effects of ageing. In the EU states, some 2%, representing about 6.5 million people, have some visual deficiency, 2.7% a hearing problem and 2.3% some form of mental handicap, while as many as 6% have a physical deficiency. Reference may be made to various publications from the COST 219 programme. A significant number of people have speech impairments and motor disabilities, or other combined disabilities.

The significance of visual deficiencies lies especially with those having acuity or astigmatism problems, who may find it hard to perceive fine detail in characters, making it especially difficult to distinguish diacritical marks or the details of some non-Latin characters. More generally, the issues concern character form, font and size, all of which the user should be able to control to some extent.

Blind IT users are now beginning to master graphical user interfaces supported by combined sound and tactile interfaces, after a period of great difficulty caused by new technology that was inaccessible to them. When new CST is introduced, these users expect not to be left behind again, but to find their alternative ways of displaying and controlling information built into the user interfaces from the start. One such need is the alternative representation of characters in Braille for tactile display.

Physical deficiencies are relevant to this study because of their impact on the manipulation of IT input devices. There are also more general issues concerning control and display devices, e.g. text phones, Teletext (Minitel/Viewdata/Videotext), Braille (including keyboards), touch screens, mice and operating systems, which are not addressed in this study but fall within the domain of ETSI.

Nationally developed systems are now in a critical situation with respect to internationalization. One example is text telephony, used mainly by deaf, hard-of-hearing, speech-impaired and deaf-blind people for text conversation over the telephone network. Its users are impatient for an internationally usable replacement for the currently incompatible national systems.

Teletext is the medium used, for example, for TV subtitling. It has a general requirement for the internationalization of character representation, but also for standardizing the use of colours and other effects that express specific non-language content of the TV programme.

Finally, support for sign languages should be developed within the context of CST to meet the needs of transcribed documentation and of education in these languages. The symbolic languages used by people with language disabilities, i.e. Bliss and Pictogram, need support in the international IT environment for both local and remote communication. Pronunciation teaching is closely tied to modern IT implementations, and support for phonetic transcription is needed in that sector.

An important long-running initiative within the European Union is the TIDE programme (Telematics for Disabled and Elderly People). TIDE has been running since 1992 with the aim of improving the quality of life of disabled and elderly people and strengthening the European industry and market for products and services that meet their needs.

Past and current TIDE projects and other activities in the area are summarized below:

Braille: progress on standardization, needed because Braille alphabets are not all identical (e.g. work in ISO TC 173); work on contraction systems; text-to-Braille conversion.

BLISS: a picture symbol language for use by people with severe motor and speech disabilities, employing a standard Bliss-symbol Graphical Character Set (Reg. no. 169 in the ISO International Register of Coded Character Sets to be used with Escape Sequences).

Sign language representation: many deaf people communicate with sign language, which needs coded representation. TIDE has three projects dealing with signs.

Text telephones: used by many deaf, hearing impaired, speech impaired and deaf-blind people. Activities in COST 219 and COST 220, and standardization work in ETSI and ITU.

Teletext: international exchange of television programs, including Teletext, raises the problem of conversion and transliteration. A similar problem arises with sub-titling.

Character sets and control codes: minimal subsets of existing standards, e.g. ISO/IEC 10646-1 and ISO 6429, need to be defined. In particular, real-time control of conversation is critical.

Symbols for signing and posting: used, for example, as a standardized way of indicating which accessibility features for disabled people are available in an application. Work on defining more symbols is ongoing in ETSI and ITU.

4.1.4 Intermediate users
Intermediate users are users that need standards to provide services to the end users listed above. Their needs are thus always derived from the preceding end user requirements.

These needs are specialized and professional, and they come from users such as standardizers, political administration, procurers, manufacturers, application developers, service providers, user organizations and professionals such as librarians and linguists. These people normally also act as end users -- using the technology for their job and leisure -- but have the added role of being responsible for others' use of the technology too.

The requirements of intermediate users stem from the internals of the technology, that is, how it works, and not just from the results, which are the main interest of end users. Many of the requirements on how it works are essentially the same as end user requirements, such as functionality, consistency, inter-operability, precision, efficiency, security and economic viability.

Other standardization requirements, however, arise from the question of how to make it work: guides for making standards in the field, guides for implementation in applications, standards for APIs to be used by application developers, generic standards on APIs for developers of programming language standards, standardized profiles specifying which coded character sets to use in communications, registry standards for reference in other standards, and an overall description of all existing and required standardization efforts, with priorities and work allocation. In fact almost all of the standards are written for intermediate users, as end users do not care how, for example, characters are coded, applications are programmed or letters are communicated, as long as it works.

A summary of requirements in the field of CST for different intermediate types of users is given below. The individual requirements are further elaborated and listed in clauses 6 and 7.

Standardizers have major requirements at several levels. First of all, they use each other's standards: for example, coded character set standards are used in programming languages, and programming languages are used to define database languages. For this use to be consistent, for example across programming languages or communication standards, guides are needed on the use of CST in such standards. Member bodies need their national and cultural specifications to be precisely defined and uniquely referenceable, and standards writers need to be able to use this information in a well-defined way. Last, but not least, standardizers need to know how all the standards work together in an appropriate model, so that they know what is available or expected and can plan accordingly.

Political administrators such as the European Commission or national governments require that the CST requirements of their citizens be fulfilled. They need a standards apparatus to ensure that this can be achieved in a non-monopolistic market in an economically viable and inter-operable way. They also need overviews of the technology so that they can fund what is necessary to meet the citizens' requirements.

Procurers, manufacturers, application developers and service providers require the standards, possibly with guides to their use, as well as information on available standards and new work.

User organizations and professionals require standards to fulfil the requirements of the people they work for. For them, overviews of needed standards and guides are essential to avoid gaps.

Many standardization efforts will thus need to satisfy a number of user audiences and the diverse audiences may make it more difficult to reach consensus.

4.2 Technical aspects

4.2.1 General
The applications of IT considered in detail in this report cover text processing, but similar considerations apply to text when used for many other applications.

The first basic task is reading. Clearly many other applications, if not all, involve this. Thus, information retrieval of all kinds, from public information sources such as signs and notices, transport departure and arrival boards, timetables, electronic data bases and news sources, teletext, etc., all require perception and above all comprehension of the displayed text images.

The second basic task is writing. Interactive tasks, where users have a requirement to input characters from a keyboard, offer an additional complication, and users must be offered a familiar character set, which they must be able to recognise, or the instructions on how to enable it. User interface design is outside the scope of this work and much design guidance material is available, including that on characters.

Character encoding has evolved to facilitate transliteration (the representation of spelling in the characters of another language) and conversion. Over time this has satisfied more and more of the users' needs, but the field is constantly expanding. Many coding schemes have been employed, and their requirements are discussed in clause 7.

4.2.2 Text and data processing and communication
To make use of any language of choice, a user needs the ability to have input and output in that language, including all the characters, accents and diacritical marks necessary to support it.
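Diacritical marks add a complication here: ISO/IEC 10646 contains both precomposed letters and combining characters, so the same accented letter may be coded either as one character or as a base letter followed by a combining character. The Python sketch below uses the standard `unicodedata` module to show both codings of "é". The canonical composition form (NFC) it relies on comes from the later normalization rules of the Unicode Standard, not from this report, and `same_text` is a hypothetical helper.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"   # é coded as one character: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)            # False: the coded sequences differ
print(same_text(precomposed, decomposed))   # True: the same letter to the reader
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```

Equipment that claims to support a language therefore has to display, input and compare both codings consistently, or a user searching for a word may fail to find text that looks identical on screen.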

Transliteration and transcription are used when the letters of the original text are unavailable or incomprehensible for most readers. In bibliographical or other scholarly work the need for an exact representation of the original text calls for transliteration, while transcription is used to write e.g. Greek and Russian names in books and newspapers for the common reader. It may also be necessary to incorporate fallback representation to avoid annoying errors when trying to write certain letters where adequate input/output facilities are not available. In addition, a user needs to be able to receive and convey information that may be in a convention or notation peculiar to a particular nationality or culture (including business or scientific sub-cultures).
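A fallback representation can be sketched by decomposing each letter, dropping its combining marks and substituting letters that have no decomposition from a small table. The Python sketch below assumes output equipment limited to ASCII; the `EXTRA_FALLBACKS` table, its values and the `fallback` function are hypothetical illustrations, not fallback rules taken from any standard.

```python
import unicodedata

# Hypothetical fallback table for letters that have no canonical decomposition
# into a base letter plus combining marks (values chosen for illustration only).
EXTRA_FALLBACKS = {
    "Þ": "Th", "þ": "th", "Ð": "D", "ð": "d",
    "Æ": "AE", "æ": "ae", "ß": "ss",
}

def fallback(text: str, missing: str = "?") -> str:
    """Approximate text for equipment that can only show ASCII characters."""
    out = []
    for ch in text:
        if ch in EXTRA_FALLBACKS:
            out.append(EXTRA_FALLBACKS[ch])
            continue
        # Split an accented letter into base letter + combining marks; drop the marks.
        base = "".join(c for c in unicodedata.normalize("NFD", ch)
                       if not unicodedata.combining(c))
        out.append(base if base.isascii() else missing)
    return "".join(out)

print(fallback("Þorvarður Kári Ólafsson"))  # Thorvardur Kari Olafsson
print(fallback("Ø"))                         # ? (no rule for this letter)
```

The result is deliberately lossy: "Olafsson" could come from either "Olafsson" or "Ólafsson", so unlike transliteration as defined in clause 2, a fallback cannot be reversed.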

The following are areas where different notations may be used, which may be culturally dependent (a more comprehensive treatment is given in [5]):

In all applications, it must be possible to have the messages to the user, and the responses expected from the user, in the language of choice. This aspect is discussed further in clause 6.

The Basic Multilingual Plane (BMP) of the Universal Character Set has room for some 65.000 characters, and with the addition of further planes the repertoire is expected to grow to as many as 250.000 characters. Users will expect to be able to enter and see any or all of these characters with their equipment. This aspect is discussed further in clause 7.

4.3 Conclusions

4.3.1 Summary of task and user requirements
While this review highlights the very wide diversity of users and of the tasks they perform using IT, it is possible to summarize their requirements in a single generic requirement. Users need to be able to:

To facilitate communication and interaction outside the boundaries of their own culture or nation, many elements of these requirements need a degree of standardization. This is addressed in later clauses of this report. Many individual and national or cultural requirements have already been identified, and some references to standards in this area are attached in Annex D.

4.3.2 Prioritization of user requirements
High priority requirements:

Medium priority requirements:

5. The European context

5.1 The multilingual environment

Europe comprises some 45 states with a total of around 700 million inhabitants. In these states some 160 different indigenous languages (of which perhaps 70% have written forms) are spoken (see Annex F), each with its particular socio-cultural characteristics. In addition, a large number of non-indigenous languages are spoken by substantial immigrant communities.

The development of education, business, communication and leisure, together with many other factors, has led to a situation where most Europeans are able to understand, and in many cases also make themselves understood in, more than one language.

Europe today thus finds itself in a unique situation in the developed world: in a relatively small, densely populated area, many different languages and cultures are mixed. These languages are used in rapidly growing communication between different peoples and areas, which places obvious demands on the means of communication.

5.2 A growing need

5.2.1 Needs
Intra-European trade is increasing, as is business in general, and intra-European travel and administrative contacts are growing accordingly. There is also a growth in intra-European tourism and other leisure activities involving communication by many different means.

The recent and ongoing upsurge in the development and use of multimedia is a striking example of culture-dependent, software-based communication. The need for localization of such software is an obvious one; the need for internationalization and localization of means of communication for professional purposes is of older date and becomes more urgent by the day.

5.2.2 Timeliness
a. Coming together

As of 1 January 1995, the European Union has three new members, bringing two new Union languages, as well as the Sámi languages, into this very close co-operation.

Discussions on the inclusion of the former Eastern Bloc European countries are currently progressing apace. Even though it will in all likelihood take 5-10 years before the first extension of the Union in that direction is made, business contacts are already growing fast and administrative co-operation is being discussed.

It is thus clear that from the point of view of closer co-operation, the time is very ripe to start work on CST standardization for European needs.

b. Falling apart

In view of the cultural disintegration that has already led to conflict and even civil war in several Eastern European countries, it can hardly be doubted that better text communication facilities between the different cultural groups can only lead to better understanding and greater concord.

5.2.3 EU initiatives
Because of the need for a flow of administrative information between the EU Member States, and between the Member States and the Commission, the Commission long ago began to initiate projects that could provide the necessary conditions.

One of the most ambitious projects, the European Nervous System (ENS), also known as "Support for the Establishment of Trans-European Networks between Administrations", is currently encountering practical and financial difficulties, and national implementations seem uncertain.

Related to ENS are the recently made Council Decisions on "a series of guidelines for trans-European data communications networks between administrations" and on a Community programme to "support the implementation of trans-European networks for the interchange of data between administrations (IDA)". These projects seem to be relatively firmly based in all relevant authorities.

Already in use is the Co-operation in Automation of Data and Documentation for Imports/Exports and the Management and Financial Control of the Agricultural Market (CADDIA). There are pilot projects running in some Member States; others are being installed.

Then there is the development in the EDI area, driven primarily by the private market but with substantial support from the EU in the form of the TEDIS (Trade EDI Systems) programme, now in its seventh year.

Also, of course, the Commission is perhaps the world's biggest user of translation services, and as such has far-reaching interests in tools that make it possible to transpose text from one culture/locale to another.

Last but not least, one should mention the EC White Paper proposals of 1993 for substantial investments in a European information infrastructure, and the EC Action Plan of 1994 on Europe's way to the information society.

Preservation and promotion of cultural and linguistic diversity is one of the guiding principles on which EU policy for the information society rests. In May 1995 the Commission (XIII-E and III-F) will prepare a Communication addressing European linguistic issues and ways to stimulate the emerging language-based industry.

This Communication will address the stimulation, coordination and regulatory initiatives to be undertaken in co-operation with the Member States to create a linguistic infrastructure of resources and services that improve language communication. It will also address how to overcome the language barriers that hamper the development of the information society. The proposed measures will increase the use and efficiency of information and communication systems, while contributing to the enrichment of Europe's linguistic diversity and reinforcing Europe's language industry [2].

The conclusion is that the European Union is making large efforts, and planning perhaps even larger ones, to create a basis for an extensive, multi-purpose data communications network encompassing all Member States and reaching out to prospective members. Clearly these efforts will be seriously hampered if facilities for handling alphabetical and cultural differences are not in place.

5.2.4 EPHOS
It should be mentioned that the work within European Procurement Handbook for Open Systems (EPHOS), which in its second phase [7] includes character repertoires, has been criticized for recommending outdated (and little-used) methodology. Any future standardization work in the CST area needs to liaise with the EPHOS work in order to keep it up to date and make sure that the EPHOS recommendations are in line with CST standards.

5.3 European research

The Commission is funding a Linguistic Research and Engineering programme, an effort to involve industry in linguistic engineering and to provide European users with an infrastructure that will enable some 360 million EU citizens to handle text information in a variety of languages. One such project is GLOSSASOFT, intended to result in guidelines for developing and re-engineering software products with internationalized features. The Expert Advisory Group on Language Engineering Standards (EAGLES) is working on the linguistic problems in IT, with the aim of developing common functional specifications for the description and manipulation of language-specific data. The Cambridge Language Survey (CLS) is aimed at assisting in the creation of large electronic dictionaries and includes aspects of cultural differences within languages and language groups.

In 1994, three preparatory actions on reusable language resources were launched within the LRE-programme. They are POINTER on terminology data, SPEECHDAT on spoken language resources and PAROLE on harmonized textual and lexical resources and tools. These actions are expected to lead to standards proposals in this field that the EU may wish to develop into more formal European or international standards.

There are also other programmes, such as EUROTRA, TELELANG, TRANSTERM, GENELEX, EUROLANG and GRAAL, that work on the linguistic problems within IT.

However, it should be pointed out that the majority of these projects are not concerned with character set issues, focusing instead on the linguistic aspects. Many European languages can be written and presented using a single 8-bit code table, and for that reason the character set issue is treated as a lower-layer problem.
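The point about a single 8-bit code table can be illustrated with a small sketch using Python's standard codecs. The helper `fits` and the sample phrases are illustrative assumptions: text in several Western European languages fits into ISO 8859-1 at one octet per character, whereas a Polish place name needs a different part of the ISO 8859 series.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given coded character set."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative sample phrases.
SAMPLES = {
    "French": "Noël à l'école",
    "German": "Grüße",
    "Danish": "rødgrød med fløde",
    "Spanish": "año",
}

# All samples fit ISO 8859-1, at one octet per character.
print(all(fits(t, "iso-8859-1") and len(t.encode("iso-8859-1")) == len(t)
          for t in SAMPLES.values()))                           # True
# A Polish place name needs another part of the ISO 8859 series.
print(fits("Łódź", "iso-8859-1"), fits("Łódź", "iso-8859-2"))  # False True
```

No single table in the series covers all these samples together; mixing them requires switching tables or a larger coded character set such as UCS (clause 7).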

There is also some privately funded research and development aimed at providing multilingual services in SQL data bases in more than 50 languages simultaneously, in a heterogeneous environment. The results of this work are encouraging and deserve more support, since both aspects of the problem are addressed: the interchange of data coded in different character sets, and the linguistic aspects of such data and information.

Practically oriented work is being done both nationally and internationally within the European standardization framework. The CEN work described in Part 0 is one example.

In general, for reasons mainly to do with the multilingual European scene described in 5.1 above, research concerning different aspects of CST is being carried out in many places in Europe.

5.4 The global connection

Duplication of standardization work must be avoided. The basic principle for standardization in Europe is therefore that, when there is a need for a European standard, the work should be done on a European rather than a global basis only when either

the topic is not relevant to other parts of the world, or

for whatever reason, global standardization will not be done in the foreseeable future.

In the second case, the European results will always be used as input into ISO/IEC if and when a corresponding project is started there.

Although the Internet has made some progress on internationalization support in telecommunication standards, the same is not true for ISO/IEC and ITU-T. Work on internationalization for OSI purposes is done only in the Directory and Virtual Terminal contexts. Basic work has been done within the framework of POSIX standardization, but there are problems with referencing that work outside POSIX standards. And while a basis for multilingual character set coding now exists in the form of ISO/IEC 10646 (see clause 7), much more pressure is needed to achieve the desired level of implementation.

In Europe, even if the EU Member States do not always properly perceive the need for CST standardization, there is a strong standardization community as well as strong encouragement and support from the Commission. The latter includes support for investigating actual user needs.

5.5 Conclusions

5.5.1 Start in Europe
At the workshop mentioned in the Introduction there was a strong consensus on the need for means to facilitate computer-based communication between people of different languages and cultures, and also for means to localize application program interfaces in Europe. While some standardization to this end is under way in ISO/IEC (see above), much is lacking, in particular those parts which are culturally dependent.

Some European standardization work in this area has already been started, such as the project to standardize a register of cultural specifications. This fact, together with the prominent position enjoyed by European researchers and technicians in the field of CST, is a good argument for Europe taking the lead in standardization. That way Europe should be able to sustain (and build on) its current competence as well as pull other regions along: even if languages and cultural factors are locally dependent, the methodology is universally applicable.

It has been suggested that European CST standardization could include a pilot project, perhaps a reference implementation, exploring the practical use of ISO/IEC 10646 and also providing material for implementation guides.

The CST workshop (see clause 1) furthermore stressed that close co-operation between R&D in this field and the standardization efforts is required.

5.5.2 Language priorities
Resources for CST standardization are limited, and work cannot be started on all languages used by large population groups in Europe. It has therefore been suggested that priority should be given to indigenous languages, also because Europe should not tell non-European countries how to standardize CST.

A strong argument against leaving non-indigenous languages outside the scope of CST standardization, however, is the fact that there are large groups in the EU speaking Arabic, Hindi, etc., and that these people have a legal right not to be discriminated against. Hence the standardization bodies (and the Commission) must be prepared to argue their case if a decision as proposed above is made and the matter is subsequently brought before the Court of Justice.

Another argument for including non-European languages is trade and business: it makes good economic sense to facilitate business relationships by providing for localization.

However, it should be noted that standards and implementations for many "immigrant" languages and scripts are presently being developed in the home countries and regions of origin (Arabic, Indic, and East-Asian for instance). Standards bodies like CEN/TC304 should liaise with the corresponding members in those areas to facilitate provision of the necessary tools. For the present, it should remain a European priority to accommodate Europe's indigenous languages, however large or small, since no one else will be able to do so.

5.5.3 General European requirements
High priority requirements

Medium priority requirements

6. Internationalization

6.1 General introduction

Internationalization in the IT context means making it possible to use the same application in different cultural environments.

Users want applications that are truly adapted to their cultural environment, so that the use of the application feels completely natural with respect to the user's cultural expectations.

6.2 Man-machine dialogue

The main problem is the support for language. An application issues a number of messages to the user and asks for corresponding input. Both message and input are language-dependent. In addition, they are very much application-dependent.

Some language-dependent messages and input are commonly used, for example asking for confirmation or rejection (yes/no/cancel?), and may be standardized.

The vast majority of messages, however, will be defined by the specific application. The best solution here would be to help applications manage messages and input according to the different languages and cultures, and to give users mechanisms to supply their own input to the application. A standard allowing users (or national distributors or user consultants) to specify input and messages for a given application would significantly advance the availability of customized man-machine dialogue.
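A minimal sketch of keeping application messages in per-language tables that users or distributors can extend, as suggested above. The message identifiers, texts and the `MessageCatalog` class are hypothetical; a real system would rely on a platform message-catalogue mechanism instead.

```python
# Hypothetical message identifiers and texts, for illustration only.
DEFAULT_MESSAGES = {
    "en": {"confirm_delete": "Delete the file?", "yes": "yes", "no": "no"},
    "de": {"confirm_delete": "Datei löschen?", "yes": "ja", "no": "nein"},
}


class MessageCatalog:
    """Messages stored per language, separate from the application logic."""

    def __init__(self, defaults: dict):
        # Copy, so that later additions do not change the defaults.
        self._tables = {lang: dict(msgs) for lang, msgs in defaults.items()}

    def supply(self, lang: str, messages: dict) -> None:
        """Let a user, national distributor or consultant add or replace messages."""
        self._tables.setdefault(lang, {}).update(messages)

    def get(self, lang: str, msg_id: str) -> str:
        return self._tables[lang][msg_id]

    def is_yes(self, lang: str, answer: str) -> bool:
        """Interpret a yes/no answer using the words of the chosen language."""
        return answer.strip().lower() == self.get(lang, "yes").lower()


catalog = MessageCatalog(DEFAULT_MESSAGES)
# A distributor supplies Danish messages without touching the application.
catalog.supply("da", {"confirm_delete": "Slet filen?", "yes": "ja", "no": "nej"})
print(catalog.get("da", "confirm_delete"))  # Slet filen?
```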

As guidance for the use of a consistent terminology in a language, a recommended list of IT-related terms in each language could prove very useful.

In essence, the user requires that the man-machine dialogue be in a language that he can understand, normally his own. If the dialogue is available only in other languages, then a natural-language translator must be present. This will also facilitate communication between people who do not share a common language.

Having all applications issue messages and accept input in all the languages of the world is a large and difficult task.

6.3 Coded character set support

Another big task is the support for coded character sets, which is fundamental for internationalization and independent of most other internationalization issues. The character set requirements are discussed separately in clause 7 below.

6.4 Specific cultural issues

Many issues are involved in the users' needs for proper cultural support. The report "Nordic cultural requirements on IT" [5] is a good catalogue of the needs. Amongst the issues can be mentioned:

Machine translation services give an additional dimension to these problems.

6.5 Current application support

Various levels of support for the above requirements can be found in today's applications, much of the support being provided by the operating system. It is recommended that as much as possible of the general cultural support be available to all applications, in a uniform way, and a natural place to put these services would be the operating system.

6.5.1 POSIX coverage
A number of cultural conventions have been catalogued by the POSIX operating system standard (ISO/IEC 9945 parts 1 and 2), which via POSIX locales supports character classification (digits, letters, etc.), character string ordering, numeric formatting, currency amount formatting, time and date formats, and rudimentary message support (yes/no). Character set support is handled via POSIX "charmaps". A few other culturally dependent conventions can also be handled by POSIX, including time zone and daylight saving time rules. The specification format is extensible, so it can cover other internationalization issues.
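The locale categories listed above can be queried through Python's `locale` module, which wraps the C library's POSIX locale functions (`setlocale`, `localeconv`, `strxfrm`, `nl_langinfo`). This is a minimal sketch: `cultural_conventions` is a hypothetical helper, and any locale name other than "C" (or "POSIX") is an assumption about what is installed on the system.

```python
import locale


def cultural_conventions(name: str = "C") -> dict:
    """Query some POSIX locale categories for the named locale.

    The previous global locale setting is restored afterwards. The name must
    denote a locale installed on the system; "C" (or "POSIX") always exists.
    """
    saved = locale.setlocale(locale.LC_ALL)      # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        conv = locale.localeconv()               # LC_NUMERIC and LC_MONETARY
        info = {
            "decimal_point": conv["decimal_point"],
            "thousands_sep": conv["thousands_sep"],
            "int_curr_symbol": conv["int_curr_symbol"],
            # LC_COLLATE: order strings by the locale's collation rules
            "collated": sorted(["b", "B", "a"], key=locale.strxfrm),
        }
        if hasattr(locale, "nl_langinfo"):       # not available on all platforms
            info["date_format"] = locale.nl_langinfo(locale.D_FMT)  # LC_TIME
            info["yes_expr"] = locale.nl_langinfo(locale.YESEXPR)   # LC_MESSAGES
        return info
    finally:
        locale.setlocale(locale.LC_ALL, saved)


print(cultural_conventions("C"))
```

In the C locale the decimal point is "." and "B" sorts before "a", because that locale orders strings by code value; a national locale, where installed, would report its own conventions through the same calls.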

6.5.2 Other work
A number of applications, and especially proprietary operating systems, have specified cultural conventions covering much the same issues as POSIX. However, POSIX is currently the only official standard that contains stringent specifications of both the application program interface and the data. Some other standards, such as SGML, contain stringent specifications of coded character sets, and SQL has interfaces for many cultural specifications, including sorting and character sets. But the only generally agreed specification method for cultural conventions is that of the POSIX standard, which also provides accompanying APIs to handle the specifications. Currently, though, there is no way of uniquely referring to POSIX data; the forthcoming CEN cultural registration standard will provide this.

6.6 POSIX as a model for internationalization

The POSIX standards provide a model for internationalization on an application platform, by specifying functions whose behaviour depends on dynamically changeable cultural parameters, and by adding a standardized way to specify the culture-dependent information. This provides a modularity that makes adapting an application to a particular market independent of writing the application. It shortens time to market, reduces the cost of providing the application in a given cultural environment, and is a reasonably efficient way, in machine terms, of implementing internationalization support. This is the model for internationalization currently being developed in ISO by ISO/IEC JTC1/SC22/WG20, and it is recommended that this model be applied.
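The separation between application code and cultural data can be sketched as follows. The function `format_amount` and the two data tables are hypothetical: the values are illustrative (XEU is the ISO 4217 code for the ECU) and are not taken from any registered locale. Adapting the function to another market means supplying another table, not changing the code.

```python
from decimal import Decimal

# Hypothetical cultural data, kept outside the application code.  The values
# are illustrative only; XEU is the ISO 4217 code for the ECU.
CULTURAL_DATA = {
    "point-decimal": {"decimal": ".", "group": ",", "currency": "XEU",
                      "currency_first": True},
    "comma-decimal": {"decimal": ",", "group": ".", "currency": "XEU",
                      "currency_first": False},
}


def format_amount(amount: Decimal, data: dict) -> str:
    """Format a currency amount; every cultural choice comes from `data`."""
    sign = "-" if amount < 0 else ""
    whole, frac = f"{abs(amount):.2f}".split(".")
    groups = []
    while len(whole) > 3:              # group digits in threes from the right
        groups.insert(0, whole[-3:])
        whole = whole[:-3]
    groups.insert(0, whole)
    number = sign + data["group"].join(groups) + data["decimal"] + frac
    if data["currency_first"]:
        return f'{data["currency"]} {number}'
    return f'{number} {data["currency"]}'


amount = Decimal("1234567.5")
print(format_amount(amount, CULTURAL_DATA["point-decimal"]))  # XEU 1,234,567.50
print(format_amount(amount, CULTURAL_DATA["comma-decimal"]))  # 1.234.567,50 XEU
```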

6.7 Remaining user requirements

As described above there is an internationally agreed model for internationalization, and an accompanying specification method for data, and how to use the data via standardized APIs. This is a useful base to build further standardization on, but a number of technical issues need to be addressed before the user requirements can be fulfilled.

Satisfying the user requirements fully in all applications may take a long time. It is desirable that the user get the best possible service in the meantime. This may be accomplished by specifying preferences for alternatives, for example messages in another language when support for the preferred language is not available. Another example is the provision of an alternative representation when the available equipment is not capable of correctly presenting the characters. The internationalization model needs to be enhanced to provide fallback representation, and APIs need to be specified for fallback support.
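The two kinds of fallback mentioned above, an alternative language and an alternative character representation, can be sketched as follows. The function names, the preference list and the `CHAR_FALLBACK` table are hypothetical; the table itself illustrates that fallbacks are culturally dependent.

```python
def pick_language(preferred: list, available: set) -> str:
    """Return the user's most preferred language that the application supports."""
    for lang in preferred:
        if lang in available:
            return lang
    raise LookupError("none of the preferred languages is available")


# Hypothetical fallback representations.  Real fallbacks are themselves
# culturally dependent (for example, 'ö' may be written "oe" or "o").
CHAR_FALLBACK = {
    "æ": "ae", "Æ": "Ae", "ø": "oe", "Ø": "Oe", "å": "aa", "Å": "Aa",
    "ä": "ae", "ö": "oe", "ü": "ue", "Ü": "Ue", "ß": "ss",
}


def with_fallback(text: str, can_display) -> str:
    """Replace characters the device cannot present by a fallback, else '?'."""
    return "".join(
        ch if can_display(ch) else CHAR_FALLBACK.get(ch, "?") for ch in text
    )


def ascii_only(ch: str) -> bool:
    """Example device capability: only 7-bit characters can be shown."""
    return ord(ch) < 128


print(pick_language(["is", "da", "en"], {"en", "da"}))  # da
print(with_fallback("Grüße", ascii_only))               # Gruesse
```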

6.7.1 Additional cultural information
There is a need to specify cultural conventions beyond what is currently possible in POSIX. Currently, this is done in free text. Later a standardized formal specification may be developed. Work should be started on how to specify and use further cultural convention data beyond current standardized methods. Some work on this is undertaken in ISO/IEC JTC1/SC22/WG20, which is producing a TR on the framework and model for internationalization, and an international standard for specifying cultural convention data.

6.7.2 Cultural register
It would be very helpful to software developers if data on cultural elements were available. The proposed CEN cultural register would meet this need by allowing world-wide visibility of, and unique references to, cultural conventions. A register may also accommodate new formalized specification methods.

Another problem is that for a number of cultures data are not available, and reliable data are hard to obtain. A specified process for collecting them would help. CEN/TC304 is currently working on a standard for the registration of such data. National work on collecting, and obtaining consensus on, cultural data should be encouraged.

When reliable data are available, they should be uniquely identifiable world-wide and easily accessible. Then any software producer could get hold of the information, and every user would be able to know and specify precisely what behaviour is wanted. The data should be available in a formally specified form and electronically, so that applications could process them without change (possibly via operating system services) and automatically deliver the desired support. The forthcoming CEN Cultural Register Standard provides a means for this.
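A minimal sketch of what unique, world-wide referencing makes possible: an application stores only an identifier and retrieves the formal specification from the register. The identifier scheme, the `CulturalRegister` class and the sample entry are hypothetical and do not reflect the actual format of the forthcoming CEN standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CulturalSpec:
    identifier: str   # unique world-wide identifier (hypothetical scheme)
    kind: str         # e.g. "POSIX locale", "POSIX charmap", "narrative"
    content: str      # the formal (or free-text) specification


class CulturalRegister:
    """Minimal in-memory stand-in for a register of cultural specifications."""

    def __init__(self):
        self._entries = {}

    def register(self, spec: CulturalSpec) -> None:
        if spec.identifier in self._entries:
            raise ValueError(f"identifier {spec.identifier!r} is already registered")
        self._entries[spec.identifier] = spec

    def lookup(self, identifier: str) -> CulturalSpec:
        return self._entries[identifier]


register = CulturalRegister()
register.register(CulturalSpec(
    identifier="example-register/0001",   # hypothetical identifier
    kind="POSIX locale",
    content='LC_NUMERIC\ndecimal_point "<comma>"\nthousands_sep "<period>"\nEND LC_NUMERIC\n',
))
print(register.lookup("example-register/0001").kind)  # POSIX locale
```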

For the use of the POSIX locales and charmaps, reference to the Cultural Register should be built into the standards concerned; examples include:

This should be done at the ISO level, and it is therefore desirable that the forthcoming CEN cultural register standard be further developed into an ISO/IEC standard.

6.7.3 Support in programming language and other standards
The work of adapting existing standards to be able to use the cultural conventions is expected to involve a significant number of standards. The implementation of the cultural support should be consistent across applications and programming languages. Therefore, guidance to the standards writers is required on the use of references and for the use of data via functions etc.

Such guidance for the design of programming languages is the subject of a revision of ISO/IEC TR 10176. Guidelines for APIs for cultural conventions, and programming-language-independent APIs, are the subjects of ISO internationalization projects in ISO/IEC JTC1/SC22/WG20.

6.8 Specification technique guidance

6.8.1 Guidance to member bodies on specification of cultural conventions
The specification of cultural conventions in formal POSIX notation may be complex or involve arbitrary choices; an example is specifying culturally correct ordering of character strings for the new ISO/IEC 10646 standard, which covers almost all languages of the world. Guidance to member bodies on the specification of cultural conventions is needed; it would also bring some consistency to cultural specifications. A work item on the production of such a guide has been proposed by the POSIX working group, ISO/IEC JTC1/SC22/WG15.

6.8.2 European locale
There is a need for specification of default European rules for many of the cultural conventions. The existence of a "European locale" would make it much easier to specify the national locales by concentrating on those aspects which are undeniably national and leaving the rest to follow the "European culture".
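The layering of national locales on a "European locale" can be sketched with Python's `collections.ChainMap`: a national specification lists only the conventions that are national, and every other lookup falls through to the shared defaults. All names and values here are hypothetical illustrations, not proposed European defaults.

```python
from collections import ChainMap

# Hypothetical "European" default conventions; the values are illustrative
# and are not taken from any standard or registered locale.
EUROPEAN_DEFAULTS = {
    "date_format": "%d.%m.%Y",
    "time_format": "%H:%M",
    "first_weekday": "Monday",
    "decimal_point": ",",
    "paper_size": "A4",
}


def national_locale(national_rules: dict) -> ChainMap:
    """A national locale states only what is national; the rest is inherited."""
    return ChainMap(national_rules, EUROPEAN_DEFAULTS)


# Hypothetical national specification that differs in two conventions only.
example = national_locale({"decimal_point": ".", "date_format": "%d/%m/%Y"})
print(example["decimal_point"], example["paper_size"])  # . A4
```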

6.8.3 Ordering of large repertoires
For the sorting of strings specifically, users would like a uniform method giving consistent results in all applications, also in a world-wide distributed environment. Having separate specifications for all possible character sets, on a wide range of application platforms, would be cumbersome for the user. A base standard for the sorting of strings is therefore necessary. Simple ways of specifying small deviations from such a standard would be needed, both for individual users to get precisely what they want and for identified cultures to specify their normal way of sorting. CEN/TC304 has a work item on ordering the characters of a European subset of ISO/IEC 10646-1, which has recently gone out for TC enquiry. ISO/IEC JTC1/SC22/WG20 has just started a project on a default ordering of ISO/IEC 10646-1.
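The idea of a base ordering plus small culture-specific deviations (a tailoring) can be sketched as follows. This is a simplified two-level sketch under stated assumptions: accents are ignored at the first level, the original string breaks ties, and the hypothetical `DANISH_LIKE` and `SWEDISH_LIKE` tables only move a few letters after "z". It is not the ordering of the CEN work item or of ISO/IEC JTC1/SC22/WG20; real orderings handle many more cases.

```python
import unicodedata


def _base_letter(ch: str) -> str:
    """Strip diacritical marks, e.g. 'é' -> 'e' (first-level comparison)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def make_sort_key(tailoring=None):
    """Build a sort key from a simple base ordering plus an optional tailoring.

    `tailoring` maps lower-case letters to their rank among letters that a
    culture sorts after 'z'; all other characters use the base ordering.
    """
    tailoring = tailoring or {}

    def key(word: str):
        first_level = []
        for ch in word.lower():
            if ch in tailoring:
                first_level.append((1, tailoring[ch]))     # tailored: after 'z'
            else:
                first_level.append((0, _base_letter(ch)))  # base: ignore accents
        # Second level: the original string breaks ties (case, accents).
        return (first_level, word)

    return key


DANISH_LIKE = {"æ": 0, "ø": 1, "å": 2}    # Æ, Ø, Å after Z
SWEDISH_LIKE = {"å": 0, "ä": 1, "ö": 2}   # Å, Ä, Ö after Z

words = ["zebra", "ål", "abe", "øl", "æble"]
print(sorted(words, key=make_sort_key()))             # ['abe', 'ål', 'zebra', 'æble', 'øl']
print(sorted(words, key=make_sort_key(DANISH_LIKE)))  # ['abe', 'zebra', 'æble', 'øl', 'ål']
```

The base key is shared; each culture contributes only a small table, which mirrors the "small deviations from a base standard" described above.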

6.9 Promotion

Once formal cultural data are available and can be referenced in the standards, they need to be applied by a number of different agents for products to become available to end users. These agents include national member bodies, the IT industry, communication service operators, and user communities such as translation offices, computational linguistics societies, procurers, etc., in Europe and the rest of the world. It is important that these agents be informed of new developments so that they can take advantage of them. Contact should be established with all these agents, and information sent to them regularly, preferably by electronic means.

6.10 Conclusions

All in all, there are a number of areas where a coordinated effort could improve the situation significantly for users. Only some of these are best addressed at the European level; others should be pursued at the global level.

High priority requirements
Medium priority requirements

7. Character sets

7.1 The evolution

Rather than simply being coded as graphic patterns, graphic images in common use, such as letters, digits and mathematical symbols, are coded as characters. This facilitates the handling of these symbols in computers and also communication between computers. Character coding has achieved this efficiently and has, over time, satisfied more and more of the users' needs. In the course of this evolution many coding schemes have been employed.

A short list of coded character sets currently in use includes the 7-bit ISO/IEC 646, its predecessor ASCII, and other national variants of ISO/IEC 646; the 8-bit ISO 8859 series (regional sets covering e.g. Eastern and Western European languages, Greek and Arabic); ISO 6937 (covering the Latin script in an 8/16-bit code); and the Japanese, Chinese and Korean 14-bit sets (JIS X0208 and X0212, GB 2312 and KSC 5601, respectively).

The only standardized coded character set intended to cover all languages of the world is the recent ISO/IEC 10646, the Universal Multiple-Octet Coded Character Set (UCS). To serve that purpose, however, it uses different encoding forms and different levels of implementation. The standard is still being extended, but with some 33.000 characters it already covers most of the languages in use today.

There is also a myriad of non-standard character sets developed by manufacturers to cover the same needs. Notably, IBM has been creative, with a family of EBCDIC character sets (the foremost example of sets incompatible with ISO/IEC 646) and PC code pages.
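The incompatibility between these sets is easy to demonstrate with Python's standard codecs: the same character receives different codes, or none at all. The helper `code_of` is hypothetical, and the codec names are Python's (code page 500 is used here as one representative EBCDIC set).

```python
# Python codec names for some of the sets mentioned in this clause.
CODECS = {
    "ASCII": "ascii",
    "ISO 8859-1": "iso-8859-1",
    "PC code page 437": "cp437",
    "PC code page 850": "cp850",
    "EBCDIC code page 500": "cp500",
}


def code_of(ch: str, codec: str):
    """Return the hexadecimal code of ch in the given set, or None if absent."""
    try:
        return ch.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return None


for name, codec in CODECS.items():
    print(f"{name:22} A={code_of('A', codec)}  é={code_of('é', codec)}")
```

The letter "A" has the same code in the ISO/IEC 646-based sets but a different one in EBCDIC, while "é" is absent from ASCII and coded differently in ISO 8859-1 and in the PC code pages.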

X.400 applications support the 8-bit coding systems defined by ISO 6937 and any set specified in the ISO/IEC character set registry. The X.500 Directory Service uses the T.61 (ISO 6937) coding, which caters for almost all characters used by Latin-alphabet languages.

Recently, new objects have been introduced in the Directory Service that make internationalization and language specification possible, but as yet no applications with these new features exist. Since June 1992 the Internet has also had the means to exchange messages containing multiple character sets, through the methods defined in the MIME (Multipurpose Internet Mail Extensions) specification. The World Wide Web service has now started to implement its underlying protocol, HTTP, which is based on 8-bit character coding.
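The MIME mechanism labels each body part with its own charset, so one message can carry text in several coded character sets. A minimal sketch using Python's standard `email` package; the helper names `build_message` and `read_message` and the sample texts are illustrative.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(parts) -> bytes:
    """parts: (text, charset) pairs; each body part is labelled with its charset."""
    msg = MIMEMultipart()
    for text, charset in parts:
        msg.attach(MIMEText(text, "plain", charset))
    return msg.as_bytes()


def read_message(raw: bytes) -> list:
    """Decode every text part using the charset declared in its header."""
    msg = message_from_bytes(raw)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            data = part.get_payload(decode=True)   # undo QP/base64 transfer encoding
            texts.append(data.decode(part.get_content_charset()))
    return texts


raw = build_message([("Grüße aus Berlin", "iso-8859-1"), ("Καλημέρα", "iso-8859-7")])
print(read_message(raw))
```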

7.2 Towards the ideal situation

Users have to accept what the suppliers offer, and the selection of character sets for any one product is usually narrow. Over the years, therefore, an installed base of hardware, software and data has been built up which uses a range of different character sets and codes. Consequently, although UCS is able to cater for all current and projected needs, a number of different coding schemes will continue to exist for the foreseeable future.

Nevertheless, the user ideally should perceive that the character set interface of his choice is the same all over the world, regardless of region or country where the data originated, and without intrusion of any underlying technical complexities of communication service or application program.

There is therefore a major need for conversions between all existing coded character sets. UCS is the primary building block for this, since it will encompass all the other character sets. Its implementation will provide the required integrity of characters as well as the required support for multi-linguality in the advanced network services.
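Using UCS as the pivot means that each set needs only a conversion to and from UCS, rather than one conversion per pair of sets. A minimal sketch, in which Python's `str` type stands in for the UCS representation and the helpers `convert` and `tables_needed` are hypothetical. Characters missing from the target set must either raise an error or be replaced, which is where the fallback requirements of clause 6.7 come in.

```python
def convert(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Convert text between two coded character sets, using UCS as the pivot."""
    ucs_text = data.decode(source)          # source set -> UCS
    return ucs_text.encode(target, errors)  # UCS -> target set


def tables_needed(n_sets: int) -> tuple:
    """Conversions needed for n sets: direct pairwise versus via a UCS pivot."""
    direct = n_sets * (n_sets - 1)  # one conversion per ordered pair of sets
    pivot = 2 * n_sets              # to and from UCS for each set
    return direct, pivot


print(convert("café".encode("iso-8859-1"), "iso-8859-1", "cp850"))  # b'caf\x82'
print(tables_needed(20))                                           # (380, 40)
# A character missing from the target set must be handled explicitly:
print(convert("Ωmega".encode("iso-8859-7"), "iso-8859-7", "iso-8859-1", "replace"))  # b'?mega'
```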

7.3 The current situation

Many companies in the computer industry recognize the UCS Basic Multilingual Plane (BMP; ISO/IEC 10646-1) as a universal coding solution, but few products supporting this standard are on the market yet. This may be because of its complexity and the lack of sophisticated devices for input (no special keyboards and drivers exist) and output (devices able to render and present the whole character content of the BMP simultaneously).

Thus, despite some promising signs, opinion in Europe and the rest of the world is still divided on the use of character sets. Users still use equipment based on 8-bit coding, the telematic services do not widely implement UCS, and the old standards are still implemented.

Some of the major reasons for the lack of UCS implementations are the following:


Despite this, one property of existing IT applications is encouraging: some of the most important current telematic services are able to transport a large number of different character sets, including UCS.

7.4 Discussion of issues

First, the provision of UCS, which gives users the required capabilities, is discussed; then the requirements for coexistence with other character sets are described.

7.4.1 UCS provision
Users need to be able to handle any conceivable character in the world, and ISO/IEC 10646-1 (the BMP of UCS) aims to provide this capability. There is no other candidate for this purpose in the standards world, and the industry standard Unicode is incorporated in the UCS standard. A major user requirement is thus the provision of UCS, possibly in steps.

a) Input

It is expected that input will in future be generated on keyboards not very different from current ones, which have about 100 individual keys. There is a user requirement for standards for generating UCS characters using only a limited set of keyboard keys. The methods currently vary from culture to culture, and standardization should cater for this cultural variance while also offering one or more globally standardized input methods for UCS. The ISO keyboard working group (ISO/IEC JTC1/SC18/WG9) has recently started a new project on global input method standards, while the CEN cultural registration standard can cater for the registration of culturally dependent input methods.
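Two ways of generating UCS characters from a limited set of keys can be sketched as follows: dead keys, where a diacritic key followed by a base letter yields the composed character, and direct entry of the hexadecimal code position. The `DEAD_KEYS` assignments are hypothetical (real assignments vary from culture to culture, as noted above), and Unicode normalization form NFC from Python's `unicodedata` module is used for the composition step.

```python
import unicodedata

# Hypothetical dead-key assignments: each dead key produces a combining mark
# that is composed with the following base letter.
DEAD_KEYS = {
    "´": "\u0301",  # combining acute accent
    "`": "\u0300",  # combining grave accent
    "^": "\u0302",  # combining circumflex accent
    "¨": "\u0308",  # combining diaeresis
    "~": "\u0303",  # combining tilde
    "°": "\u030A",  # combining ring above
}


def compose(dead_key: str, base: str) -> str:
    """Dead key then base letter -> precomposed character where one exists."""
    return unicodedata.normalize("NFC", base + DEAD_KEYS[dead_key])


def hex_entry(digits: str) -> str:
    """Keyboard-independent method: type the UCS code position in hexadecimal."""
    return chr(int(digits, 16))


print(compose("´", "e"), compose("°", "A"), hex_entry("03A9"))  # é Å Ω
```

Where no precomposed character exists (for example a q with an acute accent), NFC leaves the base letter followed by the combining mark, which a UCS implementation at a suitable level can still present.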

b) Processing, communication and storage

The different encoding forms and implementation levels of UCS have different properties and place different requirements on applications. Applications are usually written in a programming language and rely on an operating system, so proper programming and portability of applications require that programming languages and operating systems support the various encoding forms and levels of UCS. The ISO/IEC group on programming languages, operating systems and programming environments (ISO/IEC JTC1/SC22) has several projects that address UCS support in the operating system, including POSIX, a technical report on an internationalization framework, a guide on the design of programming languages, and language-independent APIs for internationalization.
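One practical consequence of the implementation levels can be shown briefly. The sketch below assumes the ISO/IEC 10646-1 rule that implementation level 1 does not permit combining characters, and approximates "combining character" by the general categories Mn, Mc and Me from Python's `unicodedata` module; the restricted list that distinguishes level 2 from level 3 is not modelled. It shows that the same visible text may or may not fit level 1, depending on whether it is held in precomposed or decomposed form.

```python
import unicodedata

# Approximation of "combining character": Unicode combining-mark categories.
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def uses_combining_characters(text: str) -> bool:
    """True if any character falls in a combining-mark category."""
    return any(unicodedata.category(ch) in COMBINING_CATEGORIES for ch in text)


def fits_level_1(text: str) -> bool:
    """Approximate check: level 1 data must not contain combining characters."""
    return not uses_combining_characters(text)


decomposed = "cafe\u0301"                               # e + COMBINING ACUTE ACCENT
precomposed = unicodedata.normalize("NFC", decomposed)  # ends in U+00E9 instead
```

Here `decomposed` does not fit level 1 while `precomposed` does. Sequences with no precomposed equivalent still need level 2 or 3 after normalization.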

Communications standards need to be upgraded to be able to handle the different forms and levels of UCS, in a similar way to the operating systems and programming languages described above.

As the different encodings and levels of UCS have different strengths and weaknesses, UCS can be expected to remain present in different forms for the foreseeable future, so the ability to convert between these coding forms is a major requirement. Specifications of these conversions are included in the standard, but the capability must also be present in applications if users are to avoid problems. Software for conversion between UCS encodings is freely available. A related issue, conversion to and from smaller character sets, is covered in clause 7.4.2.
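To make the conversion requirement concrete, the following sketch converts between two UCS forms: two-octet UCS-2 (assumed here to be big-endian) into UTF-8, one of the UCS transformation formats. It is written out by hand to show the bit layout and is only a sketch: it stops at code position 0x10FFFF (the original UTF-8 definition also covered the full 31-bit UCS-4 range with five- and six-octet sequences), and it performs no validity checks. In practice the freely available conversion software mentioned above, or a library codec, would be used; the result can be checked against Python's built-in `utf-8` codec.

```python
def ucs2_decode(data: bytes) -> list[int]:
    """Read big-endian UCS-2 (two octets per character) into code positions."""
    if len(data) % 2:
        raise ValueError("UCS-2 data must have an even number of octets")
    return [data[i] << 8 | data[i + 1] for i in range(0, len(data), 2)]


def utf8_encode(code_points: list[int]) -> bytes:
    """Encode UCS code positions as UTF-8 octets."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:        # 1 octet:  0xxxxxxx
            out.append(cp)
        elif cp < 0x800:     # 2 octets: 110xxxxx 10xxxxxx
            out += bytes([0xC0 | cp >> 6, 0x80 | cp & 0x3F])
        elif cp < 0x10000:   # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
            out += bytes([0xE0 | cp >> 12, 0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
        elif cp < 0x110000:  # 4 octets: 11110xxx followed by three continuation octets
            out += bytes([0xF0 | cp >> 18, 0x80 | (cp >> 12) & 0x3F,
                          0x80 | (cp >> 6) & 0x3F, 0x80 | cp & 0x3F])
        else:
            raise ValueError(f"code position {cp:#x} is outside this sketch")
    return bytes(out)
```

For example, `utf8_encode([0xE9])` yields the two octets `C3 A9` for é.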

Getting applications to support UCS is thus a major goal. This cannot be controlled by standards authorities or public authorities, but progress can be helped by public procurement requirements and publicly funded programmes for development and awareness.

c) Output

There are a number of output requirements for the support of UCS:

d) Subsets of UCS

Some of the implementation problems discussed above could be solved by the provision of subsets of UCS. CEN/TC304 has already started work on defining European subsets, particularly aimed at solving the problems of outputting the full character set of UCS.

It is estimated that full UCS implementation will be costly in the first stages of UCS use, and that manufacturers will then implement only a subset. To ensure that a common subset usable by the vast majority of European users is available at a reasonable price, and to guide manufacturers, it will help users and procurers of systems if European subsets of UCS are specified, encompassing all characters used in European languages plus other frequently used characters. Such subsets may also serve as a basis for further standardization work, for example on sorting, keeping that work reasonably limited while still useful in a European environment. Only a small number of subsets should be specified. CEN is currently working on such European subsets of UCS.
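Defining a subset in terms of UCS code positions makes conformance easy to check mechanically. The sketch below represents a subset as a list of code-position ranges and reports which characters of a text fall outside it. The ranges shown are hypothetical, chosen only for illustration (a few BMP blocks used by European languages); they are not the European subsets being defined by CEN/TC304.

```python
# Hypothetical subset for illustration only; NOT a CEN European subset.
ILLUSTRATIVE_SUBSET = [
    (0x0020, 0x007E),  # Basic Latin graphic characters
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
    (0x0384, 0x03CE),  # part of the Greek block
]


def in_subset(ch: str, ranges=ILLUSTRATIVE_SUBSET) -> bool:
    """True if the character's code position lies in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def characters_outside(text: str, ranges=ILLUSTRATIVE_SUBSET) -> list[str]:
    """Distinct characters of text not covered by the subset, in order of first use."""
    missing: list[str] = []
    for ch in text:
        if not in_subset(ch, ranges) and ch not in missing:
            missing.append(ch)
    return missing
```

With these ranges, Polish or Greek text such as "Łódź" or "Ελλάδα" is fully covered, whereas Cyrillic text would be reported character by character, showing where a procurer's requirements exceed the subset.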

7.4.2 Coexistence issues
A major user need is to have access to conversions between all coded character sets that are implemented in their equipment.
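One way to meet this need without maintaining a separate table for every pair of character sets is to convert through UCS: with N coded character sets, N mappings to and from UCS replace the N(N-1) direct pairwise conversions. The minimal Python sketch below relies on the standard library's codecs (the codec names `latin_1`, `cp850` and `iso8859_7` are Python's); the `errors` argument marks where the handling of inconvertible characters, discussed next, comes in.

```python
def convert(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Convert octets from one coded character set to another, using UCS as the pivot."""
    text = data.decode(source)                 # source octets -> UCS characters
    return text.encode(target, errors=errors)  # UCS characters -> target octets


# é is 0xE9 in ISO 8859-1 and 0x82 in IBM code page 850.
latin1_to_pc = convert(b"caf\xe9", "latin_1", "cp850")

# Greek alpha (0xE1 in ISO 8859-7) has no ISO 8859-1 equivalent;
# errors="replace" substitutes "?", while the default "strict" raises an error.
greek_to_latin1 = convert(b"\xe1", "iso8859_7", "latin_1", errors="replace")
```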

Fallback methods

Inherent in the specification of character set conversion is the question of how to handle characters that cannot be converted, using either information-preserving or information-losing fallback techniques. As requirements differ, different solutions can be expected to be needed. One requirement is information-preserving fallback, where the fallback representation is legible to the user in the available character set and is applied only when needed, so that the user is not restricted further than the limited character set itself requires. Another requirement is to preserve the number of characters, even at the cost of losing information.
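The two requirements can be contrasted in a short sketch. Both functions below apply a fallback only to characters missing from the target repertoire (here the graphic characters of the ISO 646 IRV, i.e. positions 0x20-0x7E). The first is information preserving: each substitution is introduced by "&" (a literal "&" is doubled) and uses a two-character mnemonic, or the hexadecimal code position when no mnemonic is known. The second preserves the number of characters but may lose information: it keeps only the base letter of a decomposable character and otherwise substitutes "?". The mnemonic table and the "&u" hexadecimal form are hypothetical, loosely in the spirit of RFC 1345, and are not taken from any CEN/TC304/WG4, IETF or TERENA specification.

```python
import unicodedata

# Target repertoire: graphic characters of ISO 646 IRV (positions 0x20-0x7E).
ISO646_IRV = {chr(c) for c in range(0x20, 0x7F)}

INTRO = "&"  # hypothetical introducer marking a fallback sequence
# Hypothetical two-character mnemonics (illustrative only).
MNEMONICS = {"é": "e'", "ä": "a:", "ö": "o:", "ü": "u:", "ß": "ss", "ø": "o/", "þ": "th"}


def fallback_preserving(text: str, repertoire=ISO646_IRV) -> str:
    """Information-preserving fallback: every substitution is marked by INTRO."""
    out = []
    for ch in text:
        if ch == INTRO:
            out.append(INTRO * 2)                 # escape the introducer itself
        elif ch in repertoire:
            out.append(ch)                        # no fallback needed
        elif ch in MNEMONICS:
            out.append(INTRO + MNEMONICS[ch])     # legible mnemonic
        else:
            out.append(f"{INTRO}u{ord(ch):04X}")  # last resort: hex code position
    return "".join(out)


def fallback_same_length(text: str, repertoire=ISO646_IRV, unknown: str = "?") -> str:
    """Length-preserving fallback: exactly one output character per input character."""
    out = []
    for ch in text:
        if ch in repertoire:
            out.append(ch)
        else:
            base = unicodedata.normalize("NFD", ch)[0]  # e.g. 'é' -> 'e'
            out.append(base if base in repertoire else unknown)
    return "".join(out)
```

For "Smørrebrød & café" the first function gives "Sm&o/rrebr&o/d && caf&e'", and the second gives "Sm?rrebr?d & cafe" (ø has no decomposition, so it becomes "?"), which has the same length as the input.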

This work is being investigated by different groups, e.g. CEN/TC304/WG4, IETF and TERENA. All this work needs to be brought together in a single set of specifications.

7.4.3 Coexistence
Users may have data represented in different forms at various stages of computing for the foreseeable future. These will eventually most likely be various forms of UCS, but data encoded by other methods will remain in common use for a long period; it is not realistic to presuppose a situation in which only one character set exists. It is thus essential that tools be provided for the coexistence of UCS and other coded character sets. A guide for users, with recommendations on how to use character sets in a heterogeneous character set world, would also be important.

7.4.4 Other coded character sets
Although UCS is able to cover all European needs, there is still a need to develop and maintain other coded character sets. Some standards have special coding needs, such as the bar-code standards or the optically recognisable character set OCR-B. These standards do not cater for accented European characters and need to be enhanced. Other current standards, which are still very much in use, need to be maintained, for example to align them with UCS.

7.5 Conclusions

High priority requirements:
Medium priority requirements:
Low priority requirements:

8. Implications of the ICT workshop recommendations

The EC held a European ICT standardization policy workshop on 28-30 November 1994. The Project Team has considered the implications of the initial conclusions of this workshop, given in Annex E. Only the CST-specific requirements are discussed here.

8.1 Organizational structure

The high-level strategy group proposed by the ICT workshop should be composed in such a way that it is able to take due account of the linguistic aspects of the information society.

The linguistic aspects can easily get lost in structures such as Fora, Consortia and ad-hoc developments of publicly available specifications. This kind of structure may counteract a true multilingual infrastructure in Europe.

The single workshop approach is not in itself a threat to a true multilingual infrastructure in Europe, as long as the proper management tools are in place.

8.2 Future role of national standardization organizations

In the field of CST, national standards are a tool to provide reliable documentation of the cultural conventions throughout Europe. The national bodies also provide an important link to end-users that otherwise would not be able to make themselves heard.

For minority cultural groups, national bodies may not be the best channel for being heard internationally, so other channels, which do not exist today, must also be found.

8.3 International standardization

While most of the IT standardization work should be international, Europe should take the lead in the internationalization of IT, which means developing European base standards in the field, for promotion as international standards later.

8.4 Other issues

The other main results of the ICT workshop were agreed upon by the CST workshop and PT01.

8.5 CST-specific organizational requirements

High priority

9. Summary of user requirements

9.1 The users

The general requirements of the average user may be summarized thus:

This also applies to communication with and among disabled people. Here there is a need to support other methods of input and output, both for common languages and for specific languages suited to other modes of perception and production. Examples are the symbolic language BLISS and natural sign languages. For industry to be able to supply products meeting these requirements, CST standardization must include them in the common work.

9.2 Cultural conventions

There is a need for proper documentation and standardization of the cultural conventions in Europe, such as orthography, in order to achieve appropriate application of those conventions in IT services and products.

Examples of such services and products are

The industry needs coordinated registration procedures at the European level.
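As a minimal sketch of what documented cultural conventions allow an application to do, the code below holds two conventions (numeric separators and date format, cf. taxonomy classes L/12112 and L/12111 in Part II) in a record and applies them. The record layout, the field names and the locale keys are hypothetical, not the format of the CEN cultural register or of POSIX locale definitions; the values given are the common German and British conventions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CulturalConventions:
    """Hypothetical record of a few cultural conventions (field names are illustrative)."""
    decimal_point: str
    thousands_sep: str
    date_pattern: str  # str.format pattern using the fields d, m and y


CONVENTIONS = {
    "de_DE": CulturalConventions(",", ".", "{d:02d}.{m:02d}.{y:04d}"),
    "en_GB": CulturalConventions(".", ",", "{d:02d}/{m:02d}/{y:04d}"),
}


def format_number(value: float, conv: CulturalConventions, decimals: int = 2) -> str:
    """Format with neutral separators, then swap in the cultural ones in one pass."""
    neutral = f"{value:,.{decimals}f}"  # neutral form, e.g. 1,234,567.89
    table = str.maketrans({",": conv.thousands_sep, ".": conv.decimal_point})
    return neutral.translate(table)


def format_date(d: date, conv: CulturalConventions) -> str:
    return conv.date_pattern.format(d=d.day, m=d.month, y=d.year)
```

With these records, 1234567.891 is shown as "1.234.567,89" under `de_DE` and "1,234,567.89" under `en_GB`; keeping the conventions in data rather than in program logic is what allows them to be registered and maintained separately.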

9.3 Internationalization and Europe

The problems related to the internationalization of IT are not specific to Europe, but because of the level of technical development and the large number of languages, they are of particular concern to Europeans. European action is therefore needed on standardization work in this field, especially for the European languages. Liaison with organizations outside Europe is necessary, both because Europeans need to communicate in other languages and because work on the general principles of internationalization must be coordinated.

9.4 UCS

The European user requirements can be met through the implementation of the BMP of ISO/IEC 10646-1 (UCS). Since the number of characters in UCS is very large, a subset needs to be defined for European purposes.

Solutions are also needed to related problems such as

Existing IT standards which include or presuppose the use of coded character sets need to be updated to cater to the use of UCS.

9.5 Coexistence

For a long time to come, IT products using the 8-bit coding method will continue to exist and be used alongside UCS products. It must be possible to communicate between those two environments; therefore tools for transformation will be needed.

In addition, means for code transformation between different coding methods in user equipment and telematic services must be provided either in the user environment or in the services. The use of a European UCS subset must be taken into account.

A particular way of handling the coexistence problem is fallback representation. Proper methods for transcription and transliteration, which also take into account the needs of disabled and elderly people, must be provided.


PART II -- A Taxonomy Of Character Set Technology

1 Introduction and scope

A common way to approach standardization systematically is to develop a classification of the subject area, or taxonomy. This helps in two ways:

- a taxonomy helps to identify all aspects of the domain in question which might be subject to standardization;

- a taxonomy helps to provide a logical structure for the standardization activity.

A taxonomy has been developed of relevant concepts in the domain of character-set technology, based on user requirements for functionality, as discussed in Clause 4 of Part I of this report.

By way of an application, all known current standards and standardization activities have been grouped according to this taxonomy, thus forming another type of taxonomy, that of the standards themselves.

2 A taxonomy of character set technology


Figure 1: Topical map of user requirements in CST

The present classification of CST concepts was made by identifying commonalities, such as characters, sets, fonts and rules relating to presentation. The analysis was based on a much wider view of "multi-cultural support", as shown in Figure 1, which attempts to map some of its concepts. Areas relevant to this report were chosen and developed into the full taxonomy shown in clause 3.2. This selection comprises the technology relating to methods for specifying, and rules governing, the creation of unique properties and codes that facilitate the presentation, storage and transmission of individual characters.

The taxonomy in clause 3.2 was based on ISO/IEC TR 10000-1, ISO TR 12382 and IEC 824 and on the activities of the appropriate standardization bodies, most notably CEN/TC304 and ISO/IEC JTC 1.

3 Description of classification

3.1 Description

Reference to Figure 1 shows the background upon which the taxonomy itself is based. User requirements may be summed up in the single phrase "multi-cultural support", being the need to accommodate all the requirements of different types of user, whether they are racial, national, typographical, occupational or individual. The figure is not intended to be exhaustive, nor fully developed, but allows the choice for the CST taxonomy to be based on logical analysis. The primary choice was for text based topics, in line with the capability of computer technology to code, store and process individual characters.

The taxonomy in clause 3.2 takes the classic form of a tree structure, in which two major classes are recognized: Locales and Characters. The former deals with the cultural environment of the user, the latter with the smallest divisible parts that make up the messages being processed electronically.

A taxonomy of whatever phenomena can be constructed in several ways, depending on its purpose and the aspects applied. (For instance, a number of persons may be grouped firstly according to age, then according to gender, then according to place of living -- or precisely the other way around, according to need.) A taxonomy for standardization purposes naturally has to take into account the most practical ways to group existing standards and standardization projects as well as the logical connections between them and any conceptual "holes" which may need to be filled in order to cover the full need for standardization.

The following taxonomy is thus intended to provide a map for almost all of the user requirements identified in Part I (see the application in Part III). The level of subordination therefore goes very deep in some cases; this does not mean that the actual standardization projects need a taxonomy of the same complexity. When a sub-level contains no existing or future standards, the entries in that sub-level are simply collapsed and only the level above remains.

3.2 Taxonomy for CST and internationalization

/ (no id) TAXONOMY OF CST AND INTERNATIONALIZATION

L/ LOCALES
|----- L/1 Specifications
| |----- L/11 Languages
| | |----- L/111 Natural languages
| | |----- L/1111 Vocabulary
| | | |----- L/11111 Standard terminology
| | | |----- L/11112 Thesauri
| | | |----- L/11113 Standard phrases
| | | |----- L/11114 Translation
| | |----- L/1112 Grammar
| | |----- L/1113 Orthography
| | | |----- L/11131 Alphabet
| | | |----- L/11132 Spelling
| | | |----- L/11133 Use of special characters
| | | |----- L/11134 Capitalization
| | | |----- L/11135 Hyphenation
| | | |----- L/11136 Punctuation
| | | |----- L/11137 Transcription
| | | |----- L/11138 Ordering
| | | | |----- L/111381 Europe
| | | | |----- L/111382 World-wide
| | | |----- L/11139 Personal names and titles
| | |----- L/1114 Speech
| |----- L/12 Cultural conventions
| | |----- L/121 Cultural elements
| | |----- L/1211 Orthography
| | | |----- L/12111 Date and time format
| | | |----- L/12112 Numeric separators
| | | |----- L/12113 Monetary format
| | | |----- L/12114 Telephone number format
| | | |----- L/12115 Payment number format
| | | |----- L/12116 Mail address format
| | | |----- L/12117 National places
| | |----- L/1212 Measurement system
| | |----- L/1213 Layout styles
| | |----- L/1214 Paper sizes
| |----- L/13 Operating system dependency
| |----- L/131 POSIX
| | |----- L/1311 Europe
| | |----- L/1312 World-wide
| |----- L/132 Other
|----- L/2 Registration
| |----- L/21 Procedures
| |----- L/211 Europe
| | |----- L/2111 National
| |----- L/212 World-wide
|----- L/3 Implementation
|----- L/31 Fallback

C/ CHARACTERS
|----- C/1 Character information
| |----- C/11 Identification
| | |----- C/111 Characters
| | | |----- C/1111 Identifiers
| | | |----- C/1112 Attributes
| | |----- C/112 Repertoires
| | | |----- C/1121 Graphic characters
| | | | |----- C/11211 Natural language alphabets
| | | | | |----- C/112111 Europe
| | | | | | |----- C/1121111 General
| | | | | | |----- C/1121112 Disabled/elderly
| | | | | |----- C/112112 World-wide
| | | | |----- C/11212 Programming language alphabets
| | | | |----- C/11213 Non-alphabetic symbols
| | | | | |----- C/112131 General
| | | | | |----- C/112132 Disabled/elderly
| | | |----- C/1122 Control functions
| | | | |----- C/11221 Europe
| | | | | |----- C/112211 General
| | | | | |----- C/112212 Disabled/elderly
| | | | |----- C/11222 World-wide
| | | |----- C/1123 Registration
| | |----- C/113 Glyphs
| | | |----- C/1131 Registration
| | | |----- C/1132 Character correspondence
| | |----- C/114 Glyph repertoires
| | |----- C/1141 Registration
| | |----- C/1142 Repertoire correspondence
| |----- C/12 Manipulation
| |----- C/121 Transformation
| |----- C/1211 Case conversion
| |----- C/1212 Transliteration
| |----- C/1213 Fallback representation
|----- C/2 Input/output
| |----- C/21 Input
| | |----- C/211 Keyboard
| | | |----- C/2111 Europe
| | | |----- C/2112 World-wide
| | |----- C/212 Other means
| |----- C/22 Output
| |----- C/221 Character repertoires
| | |----- C/2211 Europe
| | |----- C/2212 World-wide
| |----- C/222 Character attributes
|----- C/3 Electronic processing
|----- C/31 Coding schemes
| |----- C/311 Encoding of graphic characters
| | |----- C/3111 7-bit method
| | |----- C/3112 8-bit method
| | |----- C/3113 Multiple-octet method
| | |----- C/31131 Europe
| | |----- C/31132 World-wide
| |----- C/312 Encoding of control functions
| |----- C/313 Code transformations
| |----- C/3131 UCS--UCS
| |----- C/3132 UCS--other coding schemes
| | |----- C/31321 Europe
| | |----- C/31322 World-wide
|----- C/32 Interchange/communication
| |----- C/321 7-bit method
| |----- C/322 8-bit method
| |----- C/323 Multiple-octet method
|----- C/33 Internationalization support
|----- C/331 Programming languages
| |----- C/3311 Language-dependent
| |----- C/3312 Language-independent
|----- C/332 Operating systems
|----- C/333 Communications
|----- C/3331 Directory services
|----- C/3332 Telematics

4 Taxonomy of current standardization work and research

What follows is an application of the above taxonomy to standardization and research projects currently going on. The purpose is to illustrate one use of the taxonomy as well as to provide a map of where the respective work is being carried out.

Code | Title | Current standardization or research activity
/ (no id) | TAXONOMY OF CST AND INTERNATIONALIZATION | CEN/TC304
L/ | LOCALES | -
L/1 | Specifications | -
L/11 | Languages | -
L/111 | Natural languages | -
L/1111 | Vocabulary | ISO/TC 37, LRE - TRANSTERM, GENELEX
L/11111 | Standard terminology | LRE - POINTER
L/11112 | Thesauri | -
L/11113 | Standard phrases | -
L/11114 | Translation | LRE - PAROLE, EUROTRA
L/1112 | Grammar | -
L/1113 | Orthography | -
L/11131 | Alphabet | CEN/TC304/WG2
L/11132 | Spelling | -
L/11133 | Use of special characters | -
L/11134 | Capitalization | -
L/11135 | Hyphenation | -
L/11136 | Punctuation | -
L/11137 | Transcription | -
L/11138 | Ordering | -
L/111381 | Europe | CEN/TC304/WG1
L/111382 | World-wide | ISO/IEC JTC1/SC22, ISO/TC46, ISO/TC37
L/11139 | Personal names and titles | -
L/1114 | Speech | LRE - EAGLES, LRE - SPEECHDAT
L/12 | Cultural conventions | ISO/IEC JTC1/SC22/WG20, X/Open, CEN/TC304/WG2
L/121 | Cultural elements | -
L/1211 | Orthography | -
L/12111 | Date and time format | -
L/12112 | Numeric separators | -
L/12113 | Monetary format | -
L/12114 | Telephone number format | PTTs, CEPT, ENO
L/12115 | Payment number format | -
L/12116 | Mail address format | CEN/PC8
L/12117 | National places | -
L/1212 | Measurement system | -
L/1213 | Layout styles | -
L/1214 | Paper sizes | ISO/TC6, CEN/TC172
L/13 | Operating system dependency | -
L/131 | POSIX | -
L/1311 | Europe | -
L/1312 | World-wide | ISO/IEC JTC1/SC22/WG15
L/132 | Other | X/Open
L/2 | Registration | -
L/21 | Procedures | -
L/211 | Europe | CEN/TC304/WG2
L/2111 | National | -
L/212 | World-wide | -
L/3 | Implementation | -
L/31 | Fallback | -
C/ | CHARACTERS | -
C/1 | Character information | -
C/11 | Identification | -
C/111 | Characters | ISO/IEC JTC1/SC2, SC18
C/1111 | Identifiers | -
C/1112 | Attributes | -
C/112 | Repertoires | ISO/IEC JTC1/SC2, SC18, SC22
C/1121 | Graphic characters | -
C/11211 | Natural language alphabets | -
C/112111 | Europe | CEN/TC304/WG3
C/1121111 | General | -
C/1121112 | Disabled/elderly | ISO/TC173
C/112112 | World-wide | -
C/11212 | Programming language alphabets | -
C/11213 | Non-alphabetic symbols | -
C/112131 | General | -
C/112132 | Disabled/elderly | TIDE
C/1122 | Control functions | -
C/11221 | Europe | -
C/112211 | General | -
C/112212 | Disabled/elderly | -
C/11222 | World-wide | -
C/1123 | Registration | -
C/113 | Glyphs | ISO/IEC JTC1/SC18
C/1131 | Registration | -
C/1132 | Character correspondence | -
C/114 | Glyph repertoires | ISO/IEC JTC1/SC18
C/1141 | Registration | -
C/1142 | Repertoire correspondence | -
C/12 | Manipulation | -
C/121 | Transformation | CEN/TC304/WG4
C/1211 | Case conversion | ISO/IEC JTC1/SC22/WG15, WG20
C/1212 | Transliteration | ISO/TC46 (bibliographic)
C/1213 | Fallback representation | -
C/2 | Input/output | -
C/21 | Input | ISO/IEC JTC1/SC18
C/211 | Keyboard | -
C/2111 | Europe | -
C/2112 | World-wide | -
C/212 | Other means | -
C/22 | Output | -
C/221 | Character repertoires | -
C/2211 | Europe | -
C/2212 | World-wide | -
C/222 | Character attributes | -
C/3 | Electronic processing | -
C/31 | Coding schemes | ISO/IEC JTC1/SC2, SC22; CEN/TC304/WG3
C/311 | Encoding of graphic characters | ISO/IEC JTC1/SC18 (text layout)
C/3111 | 7-bit method | CEN/TC304/WG3
C/3112 | 8-bit method | CEN/TC304/WG3
C/3113 | Multiple-octet method | CEN/TC304/WG3
C/31131 | Europe | -
C/31132 | World-wide | -
C/312 | Encoding of control functions | ISO/IEC JTC1/SC18 (control functions)
C/313 | Code transformations | CEN/TC304/WG4
C/3131 | UCS--UCS | -
C/3132 | UCS--other coding schemes | -
C/31321 | Europe | -
C/31322 | World-wide | -
C/32 | Interchange/communication | -
C/321 | 7-bit method | EWOS: Use of ISO 2022 coding structure
C/322 | 8-bit method | EWOS: Use of ISO 2022 coding structure
C/323 | Multiple-octet method | EWOS: Use of ISO 10646 coding structure
C/33 | Internationalization support | LRE - GLOSSASOFT, ISO/IEC JTC1/SC22/WG15 and WG20
C/331 | Programming languages | -
C/3311 | Language-dependent | -
C/3312 | Language-independent | -
C/332 | Operating systems | -
C/333 | Communications | -
C/3331 | Directory services | -
C/3332 | Telematics | -

5 Maintenance of the taxonomy

To allow widespread use of, and comment on, this taxonomy it is proposed that it should be published as a technical report and given adequate publicity. It is recommended that the upkeep, development and maintenance of the taxonomy should be the responsibility of CEN/TC304.

PART III -- Strategy for implementation

1. Introduction

The user needs as described in Part I reflect both a need to be able to use IT equipment in the most natural ways possible and a need to support the diversity of European cultures. Coordinated action on different levels is necessary in order to satisfy those needs. Work is required on standardization, on promotion of the standardization work and its results, and on both underlying research and subsequent product development.

2. Promotion/awareness

Promotion and awareness of the standards is a prerequisite for successful standardization and implementation of the standards. Possible such activities include workshops, demonstrations, conferences, publications of various types, and the use of electronic means such as the World Wide Web. The results of the European CST standardization and its application in the IT product development could well be promoted together.

3. Research and Development

The EU supports a wide range of projects in the domains of telematics and language engineering. Many of them have issues in common with CST and internationalization. Three categories can be identified:

Research and development related to CST standardization could be included in the EC Commission's IV Framework programme.

See also Part IV.

4. Standardization

Many CST and internationalization standards exist today at the global, European and national levels (see Annex D). However, there is a general tendency for a long time to pass before they are used or referenced as appropriate in other IT standards. This is clearly detrimental to the development of IT products. The same slowness can be seen in the development of liaisons between CST and other standardization organizations. This relative isolation of related areas must not continue.

The descriptions of work items and programmes in Part IV have been made with this in mind. However, constant vigilance is necessary if the required awareness is to be built and maintained.

5. European and ISO/IEC standardization

There is a clear need in Europe for a common information infrastructure and for IT products that are compatible and versatile from the CST point of view. Europe must therefore provide the CST and internationalization standards where ISO/IEC does not. Public funding of this work is likely to be necessary, as well as purposeful and concerted efforts by industry and other organizations.

6. Coordination

It is obvious from the preceding paragraphs that the aims described require well-organized coordination of work by many parties. Again, the work programmes described in Part IV have been drawn up with this in mind. In addition, any public funding of this work should include a requirement for coordination, possibly also identifying liaison parties.

7. Role of particular organizations and standardization bodies

The objectives identified in this part require the collective efforts of many organizations and bodies. What is required from the most significant of those is described here.

PART IV -- Work needed for multi-cultural support in IT

1. Introduction and scope

This part of the report identifies the work needed to achieve multi-cultural support in IT applications in Europe.

Promotion, coordination, research and development activities, based on the discussion in Part III, are outlined in clauses 2 and 3.

In clause 4, the requirements in Part I are grouped according to the taxonomy in Part II. In clause 5, the items in clause 4 are then subdivided according to status: completed or ongoing work, and new work to start now or later.

New work items are described in clauses 6-9. Each clause relates to the organization recognized as mainly responsible for the respective technical area.

Finally the financial aspects are discussed in clause 10.

2. Promotion and coordination activities

2.1 Promotion

The use of CST standards should be promoted in a number of ways, some of which are:

2.2 Coordination

A number of standardization groups are involved in the field. Liaison and co-operation between these groups are vital, to keep each other informed of their work and to align the standards. For example, there are a number of standards in the communication and programming language areas that should utilise the CEN cultural register, and thus need to be changed to refer to this registry standard. A number of liaison officers should be available for this task.

Fora for disseminating information on standardization work and obtaining user input have proven useful. Regular workshops are another way of obtaining user input.

R-and-D projects developing reference implementations are one way to make the standards more practicable. Such projects could be supported through the normal R-and-D programmes of the EU.

3. Research and development activities

As explained in Part III, the R-and-D relations to standardization fall into 3 main categories:

3.1 Required pre-standardization work

Research into the history and use of indigenous European characters is needed by several of the work items proposed for CEN in clauses 4-6. It is expected that this research is carried out within the projects of CEN/TC304 on a case-by-case basis.

3.2 Developing R-and-D proposed standards into formal standards

Standards proposals are expected from research projects on at least the following subjects:

CEN/TC304 is expected to develop such proposals into formal standards. This is reflected in the standardization programme below.

3.3 Development of products that implement UCS

There are many EC projects on language engineering and telematics. The character set problem is largely neglected in most of these projects, whose main focus is on the linguistic part. Most of this research is limited to one or a few languages that can be written and presented using a single 8-bit code. For this reason the character set problem tends to be ignored, leaving the multilingual aspects out of the research.

The character set problem appears in all its dimensions when the products are put on the network and included in telematic services. In telematic services the lowest common denominator is still ISO 646, with some minor exceptions, e.g. in electronic mail, where 8-bit character set codes are used.

For this reason the scope of the R-and-D work in developing and implementing Character Sets Technology and Cultural Conventions for IT is in the CST taxonomy class C/3332 (Internationalization support in Telematics). This would provide multi-cultural support facilities in the telematic services which are now under development in Europe (ENS programme is just one of them).
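Where transport is limited to ISO 646, UCS data can still be carried if it is first transformed into octets and then into 7-bit-safe characters. The sketch below assumes UTF-8 as the UCS transformation format and uses MIME quoted-printable encoding (RFC 1521) through Python's standard `quopri` module; this is one existing technique chosen for illustration, not a method proposed in this report, and base64 would serve equally.

```python
import quopri


def to_7bit(text: str) -> bytes:
    """UCS text -> UTF-8 octets -> quoted-printable (only octets below 0x80)."""
    return quopri.encodestring(text.encode("utf-8"))


def from_7bit(data: bytes) -> str:
    """Reverse the transformation at the receiving end."""
    return quopri.decodestring(data).decode("utf-8")


message = "\u0141\u00f3d\u017a, \u00de\u00f3rsh\u00f6fn, Z\u00fcrich"  # Łódź, Þórshöfn, Zürich
wire = to_7bit(message)
```

Every octet of `wire` is below 0x80, so it can pass through a 7-bit channel, and `from_7bit(wire)` restores the original text. The cost is expansion: each non-ASCII character becomes several "=XX" triplets.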

The scope of the R-and-D work should be in specifying, implementing and developing:

4. Deriving the standardization action from the user requirements

The following table is a mapping of the user requirements in Part I onto the taxonomy in Part II. A separate (third) column gives a more specific description of the required action. Column 4 refers to the grouping into phases as described in the next clause.

CodeTitleActionPhaseUser requirement
----StandardizationOther
/ (no id)TAXONOMYa) Publish it
b) Maintain it
1
4
--
L/LOCALES----
L/1Specifications---6B3, 6B13: The availability and use of cultural specifications should be promoted
L/11Languages----
L/111Natural languages----
L/1111Vocabulary----
L/11111Standard terminologyNational translation of IT-terminology46B6: Standardized IT terminology (per language)-
L/11112ThesauriTerminological data exchange format4-R-and-D (Transterm, Genelex and Pointer)
L/11113Standard phrasesa) Standardized interface with localizers
b) Guidelines and procedures for development of internationalized software products
4
3
6B8: API standard for dialogue interface

4A7: Application prompts for appropriate orthography/notation

6B14: IT standards to be enhanced to support internationalization

R-and-D (Glossasoft)

4A8: Applications to implement dialogue with user in language of choice

L/11114TranslationHarmonized textual and lexical resources and tools (for automatic translation)46B12: Standard for machine translationR-and-D (Parole)
L/1112Grammarsee L/1114-6B12: Standard for machine translationR-and-D
L/1113Orthography----
L/11131AlphabetTechnical report on repertoires of the indigenous languages of Europe1-7B10: Research on history and use of European characters.
L/11132Spelling----
L/11133Use of special characters----
L/11134Capitalization----
L/11135Hyphenation----
L/11136Punctuation----
L/11137Transcription----
L/11138Ordering----
L/111381Europea) Standard ordering of the minimum European subset
b) Standard ordering of all European characters
1


2

6B17: Standard for "European" ordering-
L/111382World-wideStandard for deterministic ordering of all UCS characters46B15b: Standard for default ordering of UCS characters-
L/11139Personal names and titles----
L/1114SpeechModels of spoken language4-R-and-D (Speechdat)
L/12Cultural conventions----
L/121Cultural elements----
L/1211Orthography----
L/12111Date and time format----
L/12112Numeric separators----
L/12113Monetary format----
L/12114Telephone number format----
L/12115Payment number format----
L/12116Mail address format----
L/12117National places----
L/1212Measurement system----
L/1213Layout styles----
L/1214Paper sizes----
L/13Operating system dependency----
L/131POSIX----
L/1311EuropeStandard for European locale36B2: Standard for European locale-
L/1312World-wideUpdate POSIX to cover more cultural conventions46B1: Extend formal specification techniques to cover more classes of cultural conventions-
L/132OtherFormal specification techniques for cultural data46B13: Standard for specification of cultural data independent of POSIX-
L/2Registration----
L/21Procedures----
L/211Europea) European Cultural Register (ENV)
b) Technical report on unregistered cultural conventions
1


2

6A3: Registration procedures6B9: Central process to collect data from National Bodies

7B6: Encourage registration by CEN

L/2111Nationala) National standards on cultural conventions
b) Guidelines on national specifications of cultural conventions
3


3

6A2: National Bodies to collect cultural data6A4: Guidelines on the specification of cultural conventions
L/212World-wideInternational cultural registry (EN)36B5: Transform European registration procedures into global ones-
L/3Implementation----
L/31FallbackUpdate POSIX to include fallback locales46B7: Enhance formal specification techniques to allow for fallback locales-
C/ | CHARACTERS | - | - | - | -
C/1 | Character information | - | - | - | -
C/11 | Identification | - | - | - | -
C/111 | Characters | - | - | - | 4B9: User manual on purpose and use of all UCS characters. Priority: Europe.
C/1111 | Identifiers | Short identifiers for characters | 4 | - | 6B8: Applications to permit identification of all characters.
C/1112 | Attributes | - | - | - | -
C/112 | Repertoires | Revision of R-IT-04 | 2 | - | 7B11: Guidelines on use of char. sets in Europe
C/1121 | Graphic characters | a) Transpose UCS into EN; b) Update UCS to include missing European characters; c) Maintenance of UCS | a) 3; b) 4; c) 3 | 7A1: Standard on all European characters | 4B7: Access to all characters of UCS; 7C1: Applications to handle all UCS characters
C/11211 | Natural language alphabets | - | - | - | -
C/112111 | Europe | Standard on minimum European subset(s) | 1 | 7A8: Definition of European subsets of ISO/IEC 10646-1; 4B6: Symbols to be available to users | -
C/1121111 | General | - | - | - | -
C/1121112 | Disabled/elderly | 8-bit Braille character set | 3 | 4A5: Char. sets and control code reps should be developed to cater for the needs of disabled | -
C/112112 | World-wide | - | - | - | -
C/11212 | Programming language alphabets | - | - | - | -
C/11213 | Non-alphabetic symbols | - | - | - | -
C/112131 | General | - | 4 | - | -
C/112132 | Disabled/elderly | General symbol language representation | 3 | 4B2: Applications to include symbol language (Bliss or similar); 4A4: Standard on min. symbols subset for disabled and elderly | -
C/1122 | Control functions | - | - | - | -
C/11221 | Europe | - | - | - | -
C/112211 | General | Standard on minimum European subset of control functions | 4 | - | -
C/112212 | Disabled/elderly | - | - | - | -
C/11222 | World-wide | - | - | - | -
C/1123 | Registration | - | - | - | -
C/113 | Glyphs | a) Character-glyph model for Europe; b) International char-glyph model | a) 3; b) 4 | 4A2: Permit selection of a variety of glyphs/repertoires/fonts/sizes | -
C/1131 | Registration | - | - | - | -
C/1132 | Character correspondence | Character-glyph correspondence for Europe | 3 | 4A2: Permit selection of a variety of glyphs/repertoires/fonts/sizes | -
C/114 | Glyph repertoires | Enhanced OCR-B std. for European use | 4 | 7B13: Enhanced OCR-B is needed for Europe | -
C/1141 | Registration | - | - | - | -
C/1142 | Repertoire correspondence | - | - | - | -
C/12 | Manipulation | - | - | - | -
C/121 | Transformation | - | - | - | -
C/1211 | Case conversion | - | - | - | -
C/1212 | Transliteration | Technical report on transliteration in Europe | 2 | 4B4: Applications to provide transliteration possibilities | -
C/1213 | Fallback representation | a) Fallback to ASCII; b) General European rules for fallback | a) 2; b) 3 | 4A6, 7B4, 7B8: Specifications to be provided, including characters as yet undefined | -
C/2 | Input/output | Input/output devices for disabled and elderly people | 3 | 4A3: Facilities should cater to disabled and elderly people; 4A4: Standard on min. symbols subset for disabled and elderly; 4A5: Char. sets and control code reps should be developed to cater for the needs of disabled | -
C/21 | Input | - | - | - | -
C/211 | Keyboard | - | - | - | -
C/2111 | Europe | a) Standardized profile of UCS keyboard for Europe; b) Transpose ISO/IEC 9995 into EN | a) 3; b) 3 | 4B7: Permit users to generate and see all UCS characters; 7A3: Keyboard standard for all European characters | -
C/2112 | World-wide | - | - | 7C2: Keyboard standard(s) for all UCS characters | -
C/212 | Other means | - | - | - | -
C/22 | Output | - | - | - | -
C/221 | Character repertoires | - | - | - | -
C/2211 | Europe | see C/1213 and C/112111 | - | 7A7: Output media to be able to handle all European UCS characters | -
C/2212 | World-wide | - | - | 7C3: Output media to be able to handle all UCS characters | -
C/222 | Character attributes | see C/1213 and C/112111 | - | 4A2: Output media to be able to handle many fonts and sizes | -
C/3 | Electronic processing | - | - | - | -
C/31 | Coding schemes | - | - | - | -
C/311 | Encoding of graphic characters | Bar coding of European characters | 4 | 7B12: Bar code standard for European use needed | -
C/3111 | 7-bit method | Inter-working with Telex | 1 | - | -
C/3112 | 8-bit method | 8-bit sets for Europe | 1 | (Coexistence) | -
C/3113 | Multiple-octet method | see C/323 | 2 | 7B1b: Guidance on design of language-independent API in relation to UCS | 7A11: Promotion of use of UCS; 7A2: Implementation of UCS; 7B2: Application support of UCS to be encouraged by R-and-D, public procurement
C/31131 | Europe | see C/11211 | - | 7A8: European subset of UCS | -
C/31132 | World-wide | - | - | - | -
C/312 | Encoding of control functions | - | - | - | -
C/313 | Code transformations | Technical report on tools and transformation tables | 4 | 7B9: All code transformation standards to be compatible | -
C/3131 | UCS--UCS | Guide on conversion between UCS options | 3 | 7A6: Standards for transformations between different UCS options. Priority: UCS-2, UCS-4, UTF-8 | -
C/3132 | UCS--other coding schemes | Model for transformations between European coded character sets | 2 | 7A10: Standards for transformations between UCS options and other encodings | -
C/31321 | Europe | - | - | - | -
C/31322 | World-wide | - | - | - | -
C/32 | Interchange/communication | - | - | - | -
C/321 | 7-bit method | see C/322 | - | - | -
C/322 | 8-bit method | Guidance on ISO 2022 | 0 | - | -
C/323 | Multiple-octet method | Use of the ISO/IEC 10646 code structure | 2 | 7A5: Communication standards to support UCS | -
C/33 | Internationalization support | - | - | - | 4A1: Applications to allow use of full orthography in language of choice; 4A9: Public IT services to allow use of any language
C/331 | Programming languages | Guidelines for UCS in programming languages | 3 | 7A4: Programming languages and operating systems should support UCS alphabets | 7B1a: Guidelines for design in relation to UCS
C/3311 | Language-dependent | a) Support for UCS in programming languages; b) Guidelines for the design of internationalization | a) 4; b) 3 | - | 6B10: Guidelines on internationalization functionality in programming languages
C/3312 | Language-independent | Language-independent API specification | 4 | 6B15a: Language-independent API specification for internationalization | -
C/332 | Operating systems | Support for Locales in POSIX operating systems | 3 | 7A4: Programming languages and operating systems should support UCS alphabets | -
C/333 | Communications | - | - | - | -
C/3331 | Directory services | Introduction of locales in ISO standard on The Directory | 3 | 6B4: Reference to the Cultural Register to be built into relevant IT standards | -
C/3332 | Telematics | Multi-cultural support in various application standards: a) ETSI guide; c) DVB; d) Videotex; e) Radio Paging; f) GSM text comm.; g) MHS; h) ODA; i) SGML; j) RDS; k) HBES; l) IC-cards; m) Traffic Telematics; n) Medical Informatics; o) Techn. draw+doc; p) Library Informatics | a) 3; c) 3; d) 3; e) 3; f) 3; g) 3; h) 3; i) 3; j) 4; k) 3; l) 4; m) 4; n) 4; o) 4; p) 4 | 4B3: Telematic and IT services to use same technology | 7A2: Implementation of UCS

5. Proposed European standardization programme

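The work in the preceding tables has been grouped into four phases:

Phase 1: European standards/reports to be published soon
Phase 2: Other ongoing European work
Phase 3: New work to start immediately
Phase 4: Other new work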

The phase numbers are also included in clause 4 above, for cross-reference purposes.

An attempt has been made to identify, for each item, the organization responsible, related work, type of deliverable and a tentative time-table.

Already published results of European work are listed for information in clause 5.1.

5.1 Phase 0: Published European standards/reports

Taxonomy class | Title of document | Organization | Deliverable | Published | Action
C/112 | European functional standards for character sets and their coding | CEN | R-IT-04 | 1990 | See Phase 1
C/112 | Character repertoires and their coding | EWOS | TLG/PT 001 report | 1991 | none
C/32 | Usage of coded character sets and repertoires in EWOS | EWOS | EWOS ETR | 1992 | none
C/3111 | Character repertoire and coding for inter-working with telex services | CEN | ENV 41504 +AC | 1990, 1991 | Replace by EN 1922
C/3112 | European character repertoires and their coding | CEN | ENV 41503 +AC | 1990, 1991 | Replace by EN 1923
C/3112 | Graphic character repertoire and coding for line drawing | CEN | ENV 41505 | 1991 | Replace by EN 1923
C/3112 | East European graphic character repertoires and their coding | CEN | ENV 41508 +AC | 1990, 1991 | Replace by EN 1923
C/3332 | European ODA profiles | EWOS | ENV 41509, 41510, 41511 | - | Support UCS, see C/3332h
C/3332 | Videotex presentation layer data syntax | ETSI | ETS 300072 | 1991 | Support UCS, see C/3332d
C/3332 | International Videotex inter-working | ETSI | ETS 300105 | 1991 | Support UCS, see C/3332d
C/3332 | Basic and recommended additional requirements for terminal equipment supporting Teletex application | ETSI | ETS 300015 | 1995 | Support UCS, see C/3332g
C/3332 | Specification of the Radio Data System (RDS) | CENELEC | EN 50067 | 1993 | Support UCS, see C/3332j
C/3332 | European Radio Message System (ERMES); Part 2: Service aspects | ETSI | ETS 300133-2 | 1992 | Support UCS, see C/3332e-f
C/3332 | Numeric keyboard for home electronic systems | CENELEC | EN 60948 | 1990 | a) Replace by EN ISO 9995, see C/2111b; b) Support UCS in HBES, see C/3332k

5.2 Phase 1: European standards/reports to be published soon

Item | Title | Org./Deliv. | Latest document | Formal vote starts | Expected publication
/ (no id) | Taxonomy of character set technology (TC-P4) | CEN CR | this report | 1995-12 | 1996-02
L/11131 | Repertoires of letters used for writing the indigenous languages of Europe (WG2-P11) | CEN CR | CEN/TC304 N379+BÁC5 | 1996-04 | 1996-06
L/211 | Procedures for European registration of cultural elements (WG2-P2.1) | CEN ENV | prENV 12005 | 1995-07 | 1996-02
C/112111 | European subsets of ISO/IEC 10646-1 (WG3-P10) | CEN ENV | ENV 1973 | 1995-06 | 1996-02
C/3111 | Character repertoire and coding for inter-working with Telex Services (WG3-P8.3) | CEN EN | prEN 1922 | 1995-05 | 1996-05
C/3112 | European repertoires and coding for Information processing (WG3-P6) | CEN EN | prEN 1923 | 1995-05 | 1996-05
C/322 | International Standardized Profiles -- Character set code structure based on ISO 2022 -- Part 1: FCS 111 -- 2022 Option 1 (TG-CS) | EWOS EN-ISP | prDISP 12070-01 | 1995-01 | 1995-12

5.3 Phase 2: Other ongoing European work

Item | Title | Org./Deliv. | Latest document | Enquiry stage | Ready for formal vote
L/111381a | Multilingual minimum subset ordering rules for Europe (WG1-P1.2) | CEN ENV | CEN/TC304 N436 | 1995-07 (TC) | 1996-12
L/111381b | Multilingual extended subset ordering rules for Europe (WG1-P1.3) | CEN ENV | CEN/TC304 N436 | 1996-06 (TC) | 1997-06
L/211 | Cultural elements (unregistered ones) (WG2-P2.2) | CEN CR | CEN/TC304 N449 | 1996-04 (TC) | 1998-04
C/1121a | Transposition of ISO/IEC 10646-1 into EN (WG3) | CEN EN-IS | ISO/IEC 10646:1993 +DCOR1 +DAM1-4 | 1995-09 (Public) | 1996-02
C/112 | Guide on the use of character sets in Europe (Revision of R-IT-04) (TC-P3) | CEN CR | R-IT-04 | 1996-10 (TC) | 1997-04
C/1212 | Description of problems and issues of transliteration and transcription within Europe (WG4-P12) | CEN CR | CEN/TC304 N336 | 1996-04 (TC) | 1996-10
C/1213a | European conversion and fallback rules -- Number 1: Conversion from European subsets of UCS into ASCII (WG4-P9.2) | CEN EN | CEN/TC304 N446 | 1996-10 (Public) | 1997-10
C/31311 | General model for character transformation (WG4-P9.1) | CEN ENV | CEN/TC304 N217 | 1996-10 (TC) | 1997-04
C/323 | Use of the ISO/IEC 10646 code structure (TG-CS) | EWOS EN-ISP | - | 1996-06 (ED) | 1997-10

5.4 Phase 3: New work to start immediately

Item | Title | Org./Deliv. | Related work | Proposed enquiry stage | Proposed ready for FV
L/11113a | Guidelines and procedures for development of internationalized software products (Glossasoft R-and-D results) | CEN ENVs | Glossasoft, L/11113b | 1996-06 (TC) | 1996-12
L/1311 | European default locale | CEN ENV | L/2111 | 1996-09 (TC) | 1997-06
L/2111a | National standards on cultural conventions | NBs stds | L/1211, L/211 | 1995-1996 (national) | 1995-1997
L/2111b | Guidelines on national specifications of cultural conventions | CEN CR-IS | JTC1/SC22, L/2111a | 1996-06 (TC) | 1996-12
L/212 | International cultural registry | CEN EN-IS | JTC1/SC22; replaces ENV of L/211a | 1996-03 (Public) | 1997-03
C/1121c | Maintenance of ISO/IEC 10646 | CEN EN-AMD/COR | ISO/IEC 10646 | on a stand-by basis | on a stand-by basis
C/1121112 | Common 8-bit Braille character set | CEN EN-IS | ISO TC173, JTC1 | 1996-12 (TC) | 1997-06
C/112132 | General symbol language representation | CEN ENV | TIDE, JTC1 | 1996-12 (TC) | 1997-06
C/113a | Character-glyph model for Europe | CEN ENV | C/113b (JTC1), C/1132 | 1996-12 (TC) | 1997-06
C/1132 | Character-glyph correspondence for Europe | CEN ENV | C/113a+b | 1996-12 (TC) | 1997-06
C/114 | Enhanced OCR-B standard for European use | CEN EN-IS | ISO 1073-2, SC17/ICAO | 1996-12 (Public) | 1997-12
C/1213b | General European rules for fallback representation | CEN EN | C/1213a, C/1212, National work | 1996-10 (Public) | 1997-10
C/2 | Input/output devices for disabled and elderly people (TC-HF) | ETSI ETSs | CEN, ISO, ETSI-workshop | various | various
C/2111a | Standardized profile of UCS keyboard for Europe | CEN ENV | ISO/CD 14755, ISO/IEC 9995, C/1213 | 1996-09 (TC) | 1997-06
C/2111b | Transposition of ISO/IEC 9995 into EN | CEN EN-IS | ETSI-HF, CLC, EN 60948, CEN/TC122, CEN/TC224 | 1996-02 (Public) | 1996-09
C/3131 | Guide on conversion between UCS coding forms | CEN CR-IS | ISO 9945-2b, WG20-APIs | 1996-06 (TC) | 1996-12
C/331 | Guidelines for UCS in programming languages | CEN CR-IS | rev2 TR 10176, C/3311a+b | 1996-01 (CD/TC) | 1997-06
C/3311b | Guidelines for the design of internationalization | CEN CR-IS | = CD TR 10176, C/3311a | 1996-01 (CD/TC) | 1996-07
C/332 | Support for Locale registry in POSIX operating systems | CEN EN-IS | rev. ISO 9945, L/31, L/1312 | 1997 | 1998
C/3331 | Introduction of locales in ISO/IEC 9594 The Directory | CEN IS-AMD | ISO/IEC 9594 | 1996-01 (CD-AMD) | 1997-06
C/3332a | Guidelines on providing multilingual functionality in ETSI standards (TC-HF) | ETSI ETR | all C/3332; PT and external experts needed | 1996-01 (very urgent) | 1996-06
C/3332c | UCS in Digital Video Broadcasting (DVB) -- revised ETS 300468 (ETSI/EBU JTC) | ETSI ETS | EBU, CLC TC106, ENV 1973 | 1996 | 1997
C/3332d | UCS in Videotex (TE1) | ETSI ETSs | old ETSs, ITU-T/SG8, ENV 1973 | 1996 | 1998
C/3332e | UCS in Radio Paging (RES4) | ETSI ETS | GSM, ENV 1973 | 1996 | 1997
C/3332f | UCS in text communication over GSM (TC-SMG) | ETSI ETS | RES4, ENV 1973 | 1996 | 1997
C/3332g | UCS in MHS (EG-MHS) | EWOS ISs | ETSI/TE3, JTC1/SC18, CEN/TC304 | 1996 | 1998
C/3332h | UCS in ODA (EG-SMMI) | EWOS ISs | ETSI/TE, JTC1/SC18, CEN/TC304 | 1996 | 1998
C/3332i | UCS in SGML (EG-SMMI) | EWOS ISs | ETSI TE, JTC1/SC18, CEN/TC304 | 1996 | 1998
C/3332k | UCS in HBES (TC 105) | CLC ENs | EN 60948, EN 50090, CEN/TC304 | 1996-12 (Public) | 1997-12

5.5 Phase 4: Other new work

Item | Title | Org./Deliv. | Related work | Work can start | Could be ready for FV
/ (no id) | Taxonomy of character set technology (Revision) | CEN CR/M-IT | all | 1996/7 | 1998
L/11111 | National translation of standard IT-terminology | NBs stds | JTC1/SC1 | various | various
L/11112a | Terminological data exchange format (Transterm, Pointer, Genelex R-and-D results) | CEN EN-IS | R-and-D projects, ISO TC37 (DIS 12200 and 12620) | 1996 | 1998
L/11113b | Message interface with localizers | CEN EN-IS | Replaces ENV | 1996 | 1997
L/11114 | Harmonized textual and lexical resources for automatic translation | CEN EN | Parole R-and-D results | 1996/7 | 1998
L/111382 | Deterministic ordering of all UCS characters | CEN EN-IS | JTC1/SC22, WG20; replaces ENVs | 1995 (CD in Oct) | 1997/8
L/1114 | Models of spoken language (Speechdat R-and-D results) | CEN ENV | Speechdat | 1997 | 1998
L/1312 | Update POSIX to cover more cultural conventions | CEN EN-IS | rev ISO 9945, L/31 | 1996 | 1998
L/132 | Formal specification techniques for cultural data (in addition to POSIX) | CEN EN-IS | JTC1/SC22 or /SC21 | 1995 | 1998
L/31 | Update POSIX to include locale default rules | CEN EN-IS | rev ISO 9945, L/1312, C/332 | 1996 | 1998
C/1111 | Short identifiers for characters | CEN EN-IS | JTC1/SC2 | 1996/7 | 1998
C/1121b | Update UCS to include missing European characters | CEN EN-IS | ISO/IEC 10646 rev. EN-IS | 1995/6 | 1999
C/112211 | Minimum set of control functions for Europe | CEN ENV | May be followed by EN-ISP (JTC1) | 1995 | 1998
C/113b | Global Character-Glyph model | CEN EN-IS | JTC1; C/113a replaces ENV | 1996/7 | 1998/9
C/311 | Bar coding of European characters (TC225) | CEN EN | prEN 1923, ENV 1973 | 1996 | 1998
C/313 | Tools and transformation tables | CEN CR/ENV | all C/313 | 1996 | 1997
C/3311a | Support for UCS in programming languages | CEN ISs | JTC1/SC22 groups | 1995 | various
C/3312 | Language independent API specification for internationalization and UCS | CEN CR-IS | JTC1/SC22, WG20 | 1995 | 1997/8
C/3332j | UCS in RDS (TC107) | CLC EN | prEN 1923, ENV 1973 | 1996/7 | 1997/8
C/3332l | UCS in IC-cards (TC224) | CEN EN | prEN 1923, ENV 1973 | 1996 | 1998
C/3332m | UCS in Traffic Telematics (TC278) | CEN EN | prEN 1923, ENV 1973 | 1996 | 1998
C/3332n | UCS in Medical Informatics (TC251) | CEN EN | EWOS, CLC, TC304-stds | 1996 | 1997/8
C/3332o | UCS in Technical drawings and documentation | CEN EN-ISs | JTC1/SC24, IEC+ISO TCs, ENV 1973 | 1996 | various
C/3332p | UCS in Library Informatics (ISO TC46) | CEN EN-ISs | EWOS EG-LIB, ISO TC171, ENV 1973, prENV 12005 | 1996 | various

6. Description of new CEN work

6.1 CEN/TC304

/ (no id) Taxonomy of character set technology (revision). Phase: 4.

L/11113a Guidelines and procedures for development of internationalized software products. Phase: 3

L/11114 Harmonized textual and lexical resources and tools for automatic translation. Phase: 4.

L/1114 Models of spoken language. Phase: 4.

L/1311 European default locale. Phase: 3.

C/112132 General symbol language representation. Phase: 3

C/112211 Minimum set of control functions for Europe. Phase: 4

C/113a Character-glyph model for Europe. Phase: 3

C/1132 Character/Glyph correspondence for Europe. Phase: 3

C/1213b General European rules for fallback representation. Phase: 3

C/2111a Standardized profile of UCS keyboard for Europe. Phase: 3

C/313 Tools and transformation tables. Phase: 4

6.2 Other CEN committees

6.2.1 CEN/TC224
C/3332l See 6.5.2.

6.2.2 CEN/TC225
C/311 Bar coding of European characters. Phase: 4
6.2.3 CEN/TC251
C/3332n UCS in Medical Informatics. Phase: 4.
6.2.4 CEN/TC278
C/3332m See 6.4.5.

6.2.5 CEN/TC293
C/1121112 See 6.4.4.

6.3 CEN National member bodies

L/11111 National translation of standard IT-terminology. Phase: 4.

L/2111 National standards on cultural conventions. Phase: 3

6.4 Work in co-operation with ISO TCs

This clause lists work of interest to Europe that normally would be done by ISO. Application of the Vienna agreement is recommended for all these items, at different levels of co-operation. The exact co-operation level is to be determined by CEN. In some cases, CEN may offer to do the work, develop an international standard and invite countries outside CEN to participate.

6.4.1 ISO TC10 (and CEN TC304)
C/3332o UCS in Technical drawings and technical documentation. Phase: 4
6.4.2 ISO TC37 (and CEN TC304)

L/11112a Terminological data exchange format. Phase: 4.

6.4.3 ISO TC46 (and CEN TC304)

C/3332p UCS in Library Informatics. Phase: 4

6.4.4 ISO TC173 (and CEN TC293)

C/1121112 Common 8-bit Braille character set. Phase: 3

6.4.5 ISO TC204 (and CEN TC278)
C/3332m UCS in Traffic Telematics. Phase: 4.

6.5 Work in co-operation with ISO/IEC JTC1

This clause lists work of interest to Europe that normally would be done by ISO/IEC JTC1. Application of the Vienna agreement is recommended for all these items, at different levels of co-operation. The exact co-operation level is to be determined by CEN. In some cases, CEN may offer to do the work, develop an international standard and invite countries outside CEN to participate.

6.5.1 JTC1/SC2 (and CEN TC304)

C/1111 Short identifiers for characters. Phase: 4

C/1121b Update 10646 to include missing European characters. Phase: 4

C/1121c Maintenance of ISO/IEC 10646. Phase: 3.

C/113b Global character-glyph model. Phase: 4

C/114 Enhanced OCR-B standard for European use. Phase: 4.

6.5.2 JTC1/SC17 (and CEN TC224)
C/3332l UCS in IC-cards. Phase: 4
6.5.3 JTC1/SC18 (and CEN TC304)
C/2111b Transposition of ISO/IEC 9995 into EN. Phase: 3
6.5.4 JTC1/SC21 (and CEN TC304)
L/132 see 6.5.5

C/3331 Introduction of locales in ISO/IEC 9594 The Directory. Phase: 3

6.5.5 JTC1/SC22 (and CEN TC304)
L/11113b Message interface with localizers. Phase: 4.

L/111382 Deterministic ordering of all UCS characters. Phase: 4.

L/1312 Update POSIX to cover more cultural conventions. Phase: 4.

L/132 Formal specification techniques for cultural data (in addition to POSIX). Phase: 4


L/2111b Guidelines on national specifications of cultural conventions. Phase: 3

L/212 International cultural registry. Phase: 3

L/31 Update POSIX to include locale default rules. Phase: 4

C/3131 Guide on conversion between UCS coding forms. Phase: 3

C/331 Guidelines for UCS in programming languages. Phase: 3.

C/3311a Support for UCS in programming languages. Phase: 4.

C/3311b Guidelines for the design of internationalization. Phase: 3.

C/3312 Language independent API specification for internationalization and UCS. Phase: 4.

C/332 Support for Locale registry in POSIX operating systems. Phase: 3.

6.5.6 JTC1/SC24
C/3332o see 6.4.1.

7. Description of new EWOS work

All these work items should be started immediately. Many organizations are involved and the work will take years.

C/3332g UCS in MHS. Phase: 3

C/3332h UCS in ODA. Phase: 3

C/3332i UCS in SGML. Phase: 3

8. Description of new ETSI work

Work should be started immediately in appropriate ETSI groups on these items with the work being coordinated by TC-HF. The guidelines are the most urgent item, because they will assist in the execution of the work in other ETSI groups; a project team including external experts is probably needed for this item.

C/2 Input/output devices for disabled and elderly people. Phase: 3

C/3332a Guidelines on providing multilingual functionality in ETSI standards. Phase: 3

C/3332c UCS in Digital Video Broadcasting. Phase: 3

C/3332d UCS in Videotex. Phase: 3

C/3332e UCS in radio paging. Phase: 3

C/3332f UCS in text communication over GSM. Phase: 3.

C/3332g UCS in MHS see clause 7.

C/3332h UCS in ODA see clause 7.

C/3332i UCS in SGML see clause 7.

9. Description of new CENELEC work

C/3332c UCS in Digital Video Broadcasting see clause 8.

C/3332j UCS in RDS. Phase: 4

C/3332k UCS in HBES. Phase: 3

10. Funding of the work

The work needed to implement UCS in Europe cannot be done on a completely voluntary basis. European authorities can ensure timely progress by partly funding the work. The CEN and ETSI work should be covered by separate EC mandates, one for each organization. Clauses 6, 7 and 9 provide input to the EC for the drafting of such mandates. The EWOS work is expected to be covered by the existing global mandate for EWOS.

Annex A

Members of this PT

Chris Makemson, United Kingdom (chairman)
Borka Jerman-Blažič, Slovenia
Donald Anderson, Norway
Keld Simonsen, Denmark
Mats Linder, Sweden
Sten G. Lindberg, Sweden
Þorvarður Kári Ólafsson, STRÍ (secretary)


Annex B

Bibliography

[1] Proceedings of the CST-workshop, held in Luxemburg 1-2 December 1994.

[2] http://www.ispo.cec.be/infosoc/legreg/actionla.html -- Europe's way to the information society -- an action plan (Updated version, May 1995).

[3] ISO/IEC PDTR 11017 Framework and requirements for Internationalization (JTC1 SC22 N277R).

[4] IOS Press: Language Industries Atlas.

[5] STRÍ TS3: Nordic Cultural Requirements on Information Technology (INSTA Technical report, STRÍ TS3, 1992).

[6] JTC1 N2406: Recommendation on Coordination of Internationalization Activities.

[7] EPHOS Handbook II, EPHOS Project Office, 1994.

[8] CEN/TC 304 N379+B C5, Draft for P11: Repertoires of letters used for writing the indigenous languages of Europe.

[9] ISO/IEC JTC1 N1335 Final report of ISO/IEC JTC1 TSG-1 on Standards necessary to define Interfaces for Application Portability (IAP).

[10] EWOS/TLG/PT 001 report: Character repertoires and their coding.

[11] EWOS guide: Usage of coded character sets and repertoires in EWOS.

[12] Hugh A. Tucker: Application of Norms and Standards for Information Services: Character Codes in SGML (Final report for EC DG.XIII-E/1, October 1994).

[13] Cordis Focus Supplement 6, 17 February 1995: Europe's way to the Information Society.

[14] CEN R-IT-04 European functional standards for character sets and their coding.

[15] EC-DG.XIII Linguistic Research and Engineering (LRE) -- An overview, June 1994.

[16] http://www.echo.lu -- Multilingual Action Plan (MLAP) -- LRE -- Overview of the actions launched in 1994 (October 1994).

Various published articles on Internationalization of IT.

Additional references are in annexes C-F.


Annex C

Mandate M/037

COMMISSION OF THE EUROPEAN COMMUNITIES
Brussels, 4th October 1993
ADB/CK/
III.B.2
M 037

STANDARDIZATION MANDATE ADDRESSED TO CEN/CENELEC/ETSI IN THE AREA OF CHARACTER TECHNOLOGY

PURPOSE

The standardization organizations CEN, CENELEC and ETSI should define a coherent work programme for character repertoires and coding, including a Taxonomy for Character Set Technology.

HISTORY

Recent years have seen a major technological evolution, accompanied by an equally important cost reduction, which together have made information processing very widespread and no longer the reserved domain of highly qualified specialists.

In parallel, the evolution of the communication concepts, materialized by the OSI standards, has created a new environment which does not constrain a processing system by limiting its communication capabilities to a closed set of proprietary protocols.

Therefore, it is obvious that the definition and coding of the character repertoires have to be revised and a policy must be defined in accordance with the new context and the new requirements.

It is also obvious that the current systems must be adapted to the new situation.

Consequently, character repertoire solutions are required for the new environment, together with a clear migration strategy for existing equipment.

The European standardization organizations have up to now published several standardization documents concerning character sets and coding, e.g.:

a) European standards:

Several of the above-mentioned European standards have been published as a result of mandates BC-IT-08 (1986) and BC-IT-98 (1988).

b) Reports:

JUSTIFICATION

At international level as well as at European level, several activities have taken place that will influence standardization activities on character sets definitions and codings; e.g.:
The above-mentioned events justify a reassessment of European standardization, and therefore the Commission invites the standardization bodies to:

a) Define a European policy and strategy with respect to Character Repertoires and their encodings.

b) Align the existing and ongoing standardization activities with this policy.

ORDER

The Commission invites CEN in co-operation with CENELEC, ETSI and EWOS to define a European work programme for Character Repertoires and coding including a taxonomy for character set technology.

The work programme should:

EXECUTION OF THE MANDATE

Annex D

References to standards


D.1 International coding standards

Standards for registration of character sets:

D.2 Regional Standards

European standards:
Chinese standards:
Taiwanese standards:
Japanese standards:
Korean standard:

D.3 Standards related to internationalization

Standards related to Data Input Service:
Standards related to Cultural Conventions:
Standards related to Date Format Service:
Standards related to Time Format Service:
Standards related to Numeric Formatting Service:
Standards related to Currency Formatting Service:
Standards related to Sorting and Collation Service:

D.4 Standards related to the visual representation of characters (glyphs):

D.5 Other standards:

ISO 9241: This standard covers the ergonomic aspects of hardware and software products. When complete, it will consist of 20 parts, covering the different elements of H/W and S/W ergonomics, such as visual display devices, presentation of information, user guidance, menu dialogues, etc. The standard will have significant impact on the internationalization of products through the "ease of use" requirement, which, amongst other things, means the support of local language and cultural conventions.

ISO DIS 8613: Office Document Architecture and Interchange Format (ODA/ODIF)

ISO/IEC TR 10000-1, 2nd edition, 1992. Information technology -- Framework and taxonomy of International Standardized Profiles.

ISO/IEC TR 12382:1992. Permuted index of the vocabulary of information technology.

ISO 2382 Data processing -- Vocabulary (multiple parts).

IEC 824 Series of recommendations on Electrotechnical terminology

Annex E

General recommendations of the CST-workshop

held in Luxemburg 1-2 December 1994

GENERAL SITUATION:

Guidelines for implementation of CST-standards are needed.
Awareness, information clearing house, links to applied R-and-D.

Standardize at the right time (not too early, not too late)

The recommendations of the ICT-workshop need consideration by PT01

EUROPEAN WORK TO BE DONE:

Guidelines for migration to ISO/IEC 10646 are needed

Suppliers need reliable information on cultural conventions in Europe

The common cultural conventions in Europe need to be documented,
providing a basis for specifying local conventions

National bodies have an important role to play

Europe should be the leading force (locomotive) for internationalization of IT, strongly linked to basic R-and-D


Annex F

The indigenous languages of Europe

The information in F.1-F.8 is taken from [8]. The languages have been sorted by family and subfamily, though this is not apparent in this list because the branches have been deleted and only the "root" and "stem" are given here. There may be errors or omissions in the list, since it is a draft. However, its main purpose is to illustrate the scope and complexity of the language map of Europe.

The information in F.9 is taken from ENV 1973 and lists only those languages covered by the Minimum European Subset of UCS.

F.1 Afro-Asiatic languages

F.2 Basque

F.3 Caucasian languages

F.4 Eskimo-Aleut languages

F.5 Indo-European languages

F.6 Mongolian languages

F.7 Turkic languages

F.8 Uralic languages

F.9 Languages covered by the Minimum European Subset (MES)

According to Annex B of ENV 1973, at least the following languages are covered by the Minimum European Subset of UCS:

Annex G

Extract from the Nordic report on cultural requirements [5]

This annex contains the table of contents from the Nordic report "Nordic cultural requirements on Information Technology", ISBN 9979-9004-3-1. Copies of the full report can be obtained from:

STRÍ, Keldnaholt; IS-112 Reykjavík; Iceland; Fax:+354 587-7409; E-mail:stri@iti.is

G.0 Table of contents

0 Preface

1 Introduction
2 Characters
3 Text
4 Data elements
5 User interfaces
6 Specific applications of Information Technology
7 Legal and regulatory demands
Appendices
A Bibliography
B List of addresses
C Acronyms and index of definitions
D Keyboard layouts

TABLES and FIGURES
The report contains 48 tables and 1 figure

HTML Michael Everson, everson@indigo.ie, Everson Gunn Teoranta, Dublin, 1996-06-10.