CEN Guide to the Use of Character Sets in Europe
TC 304

UCS - Introduction


Origins and aims of the UCS

The Universal Multiple-Octet Coded Character Set, more simply known as the UCS, is intended to provide a single coded character set for the encoding of the written forms of all the languages of the world and of a wide range of additional symbols that may be used in conjunction with such languages. It is intended not only to cover languages in current use, but also languages of the past and such additions as may be required in the future.

The coding provided by the UCS is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written forms of the languages.

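To achieve these aims, the UCS is a multi-part standard under continuous development. The first edition of part 1 was published in 1993 as:

ISO/IEC 10646-1:1993, Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane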

At the time of writing, two Technical Corrigenda and Amendments 1 to 9 (Cor.1-2, AMD.1-9) have been published. Amendments 10 to 27 are in preparation. This guide covers both the base standard and the latest available texts of all these corrigenda and amendments.

The Basic Multilingual Plane (BMP) referred to in this title is a subset of the full UCS that may be encoded in 16 bits. It therefore provides a total of 65,536 character positions, a large proportion of which have already been allocated. The full UCS allows for 31-bit coding (there is a 32nd bit that is constrained to be zero) and so provides for over two thousand million character positions. It should therefore have ample space to fulfil its aim of covering all languages.
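The figures quoted above follow directly from the bit widths. As a minimal illustrative sketch (the variable names are hypothetical and are not taken from the standard), the arithmetic can be checked as follows:

```python
# Code-position counts implied by the bit widths quoted in the text.
# The names below are illustrative only; they do not come from the standard.
BMP_BITS = 16   # width of the 16-bit form that covers the BMP
UCS_BITS = 31   # usable width of the full UCS (the 32nd bit is always zero)

bmp_positions = 2 ** BMP_BITS
ucs_positions = 2 ** UCS_BITS

print(f"BMP positions: {bmp_positions:,}")   # 65,536
print(f"UCS positions: {ucs_positions:,}")   # 2,147,483,648

# Number of BMP-sized planes that fit into the full 31-bit space.
print(f"BMP-sized planes: {ucs_positions // bmp_positions:,}")  # 32,768
```

The last figure shows that the BMP is only one of 32,768 equally sized planes available in the full coding space.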

For many applications of the UCS, the characters of the BMP are all that will be required. It would be very wasteful of resources if a 32-bit coding were imposed on applications that require only a subset that can be encoded in 16 bits. The UCS therefore specifies more than one form of coding for its characters, in particular providing for encoding of the BMP in a 16-bit form.
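The saving offered by the 16-bit form can be illustrated with a short sketch. This is not the definition given in the standard: the function names are hypothetical, big-endian octet order is assumed for the sake of the example, and no account is taken of reserved or unallocated positions.

```python
import struct

BMP_LIMIT = 0x10000      # first code position that no longer fits in 16 bits
UCS_LIMIT = 0x80000000   # first value that would require the 32nd bit

def encode_16bit(code: int) -> bytes:
    """Pack a BMP code position into two octets (big-endian is an assumption)."""
    if not 0 <= code < BMP_LIMIT:
        raise ValueError(f"{code:#x} lies outside the BMP")
    return struct.pack(">H", code)

def encode_32bit(code: int) -> bytes:
    """Pack any UCS code position into four octets; the top bit stays zero."""
    if not 0 <= code < UCS_LIMIT:
        raise ValueError(f"{code:#x} lies outside the 31-bit UCS range")
    return struct.pack(">I", code)

code = 0x20AC  # EURO SIGN, a character of the BMP
print(encode_16bit(code).hex())  # 20ac
print(encode_32bit(code).hex())  # 000020ac
```

For a text consisting entirely of BMP characters, the 16-bit form in this sketch needs half the octets of the 32-bit form, which is the economy the paragraph above describes.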

The UCS standard will be extended in future by the publication of further parts and of further editions of the existing part 1. Future editions will incorporate all corrigenda and amendments published before them. They may in addition include further changes that have not been published separately in this way. It is the declared intention that all such extensions of the UCS will be upwardly compatible, i.e. that they will add the coding of additional characters but that, once included, no character will be withdrawn or have its coding changed. The scope of the standard is, however, so wide that such an intention is difficult to maintain. It has, indeed, already been broken in published corrigenda and amendments. Nevertheless it is hoped that it will not be necessary in future to make any further exceptions to this important feature.

The UCS and UNICODE

The UCS has been developed under the auspices of Joint Technical Committee 1 (JTC 1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). ISO maintains a World Wide Web site, which includes its catalogue and ordering information, at:

http://www.iso.ch

The JTC 1 subcommittee responsible for the UCS is SC 2, which maintains an official information service at:

http://www.dkuug.dk/JTC1/SC2

The UCS is closely related to a commercial character encoding called UNICODE™, prepared by The Unicode Consortium (e-mail: unicode-inc@unicode.org) and published as:

The Unicode Standard, Worldwide Character Encoding

which is now at Version 2.1. Information concerning UNICODE™ is available at:

http://www.unicode.org

Roughly speaking, UNICODE™ can be regarded as being the 16-bit coding of the BMP of the UCS. There is effective cooperation between the Unicode Consortium and ISO/IEC JTC 1/SC 2 which should ensure that this compatibility is maintained in future enhancements to the BMP. However, UNICODE™ is not simply the BMP of the UCS as it includes guidelines for usage that are not present in the equivalent ISO standard.

Because UNICODE™ is restricted to the BMP of the UCS, the positioning of characters in future additions to the UCS becomes more significant. More details of the organization of the BMP are given in the chapter of this guide on the Basic Multilingual Plane.

