CEN Guide to the Use of Character Sets in Europe

TC 304

UCS - Basic Multilingual Plane (BMP)

Relationship to 8-bit codes

For reasons of compatibility, the row of the BMP with R = 00 has been given the structure of an 8-bit code according to ISO/IEC 2022. This requires that

the code positions 0000-001F and 0080-009F are reserved for the coding of control functions (prior to Amendment 3, only 0000-001F was available for the coding of control functions as 0080-009F was reserved for future standardization);
the code position 007F is reserved for the DELETE character (for historic reasons that have long since ceased to be relevant);
the code position 0020 is allocated to the SPACE character.

This enables the coded representation of a control function to be obtained by a simple algorithm from its coded representation in an 8-bit code in accordance with ISO/IEC 2022. The algorithm is described elsewhere in this guide.

The graphic characters in the remaining 190 code positions of row 00 are allocated in accordance with the 8-bit code specified in

ISO/IEC 8859-1:1997, Information technology - 8-bit single-byte coded graphic character sets - Part 1: Latin Alphabet No.1.

That code, and therefore row 00 of the BMP, contains graphic characters used for general purpose applications in typical office environments in at least the following languages:

Albanian, Basque, Breton, Catalan, Danish, Dutch, English, Faroese, Finnish, French (with restrictions), Frisian, Gaelic, Galician, German, Greenlandic, Icelandic, Irish Gaelic (new orthography), Italian, Latin, Luxemburgish, Norwegian, Portuguese, Rhaeto-Romanic, Spanish and Swedish.

This incorporation of ISO/IEC 8859-1 in particular makes the cells 21-7E of row 00 have the same allocations as the graphic characters of ASCII, which in its internationally standardized form is also known as the International Reference Version (IRV) of:

ISO/IEC 646:1991, Information technology - ISO 7-bit single-byte coded character set for information interchange.
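The structure of row 00 described above can be checked mechanically. The following is a minimal sketch, not part of the standard: the function name classify_row00 is illustrative, and Python's "latin-1" and "ascii" codecs are assumed to stand in for ISO/IEC 8859-1 and the IRV of ISO/IEC 646 respectively.

```python
def classify_row00(cell: int) -> str:
    """Classify a cell (00-FF) of BMP row 00 according to the rules above."""
    if not 0x00 <= cell <= 0xFF:
        raise ValueError("row 00 only has cells 00-FF")
    if cell <= 0x1F or 0x80 <= cell <= 0x9F:
        return "control"      # reserved for control functions
    if cell == 0x7F:
        return "delete"       # DELETE
    if cell == 0x20:
        return "space"        # SPACE
    return "graphic"          # allocated as in ISO/IEC 8859-1

graphic_cells = [c for c in range(0x100) if classify_row00(c) == "graphic"]
print(len(graphic_cells))     # 190

# Every ISO/IEC 8859-1 octet decodes to the row-00 character in the same cell.
latin1_matches = all(bytes([c]).decode("latin-1") == chr(c) for c in graphic_cells)

# Cells 21-7E carry the same allocations as the graphic characters of ASCII.
ascii_matches = all(bytes([c]).decode("ascii") == chr(c) for c in range(0x21, 0x7F))
print(latin1_matches, ascii_matches)
```

The count of 190 is simply the 94 cells 21-7E plus the 96 cells A0-FF.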

The 5 zones of the BMP

To aid its interpretation and development, the Basic Multilingual Plane is divided into five zones corresponding to the following code positions:

A-zone: code positions 0000-4DFF but excluding the positions 0000-001F and 0080-009F reserved for control characters and 007F reserved for the DELETE character (leaving 19903 positions)
I-zone: code positions 4E00-9FFF (20992 positions)
O-zone: code positions A000-D7FF (14336 positions)
S-zone: code positions D800-DFFF (2048 positions)
R-zone: code positions E000-FFFD (8190 positions)

The R-zone terminates at FFFD as positions FFFE and FFFF are reserved; see the section of this guide on the 4-octet code structure of the UCS.

Each zone has a distinctive use:

the A-zone is used for alphabetic and syllabic scripts together with various symbols;
the I-zone is used for Chinese/Japanese/Korean (CJK) unified ideographs;
the O-zone is used for the Korean Hangul syllabic script, and for various other scripts;
the S-zone is reserved for use with transformation format UTF-16;
the R-zone is known as the restricted use zone and contains sets of graphic characters for various uses including private use that is outside the scope of standardization.
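The zone boundaries and position counts given above can be reproduced with a short lookup. This is an illustrative sketch only; the names ZONES and zone_of are assumptions made for the example and do not come from the standard.

```python
from collections import Counter
from typing import Optional

# (zone letter, first code position, last code position), as listed above
ZONES = [
    ("A", 0x0000, 0x4DFF),
    ("I", 0x4E00, 0x9FFF),
    ("O", 0xA000, 0xD7FF),
    ("S", 0xD800, 0xDFFF),
    ("R", 0xE000, 0xFFFD),
]

def zone_of(cp: int) -> Optional[str]:
    """Return the zone letter of a BMP code position, or None if it is in no zone."""
    # Control positions 0000-001F and 0080-009F, and DELETE at 007F, are excluded.
    if 0x0000 <= cp <= 0x001F or 0x007F <= cp <= 0x009F:
        return None
    for name, first, last in ZONES:
        if first <= cp <= last:
            return name
    return None  # FFFE and FFFF (or a value outside the BMP)

sizes = Counter(z for z in map(zone_of, range(0x10000)) if z is not None)
for name, _, _ in ZONES:
    print(name, sizes[name])
```

Running the sketch gives the figures quoted in the list of zones: 19903, 20992, 14336, 2048 and 8190 positions.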

The transformation format UTF-16 was introduced by Amendment 1 to the first edition of ISO/IEC 10646-1, which also created the S-zone by splitting the O-zone. Prior to that amendment the O-zone extended to code position DFFF. UTF-16 extends the two-octet coding of the BMP into a variable-length coding. In that coding the characters of all zones of the BMP (P=00) other than the S-zone are encoded in two octets, while characters of any of the sixteen planes P=01 to P=10 (remember that 10 here is a hexadecimal value) are encoded in four octets.
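As an illustration of the variable-length coding, the following sketch encodes a single code position in UTF-16 with the most significant octet first. The function name utf16_encode is an assumption for the example. The division of the S-zone into a first half (D800-DBFF) for the leading two octets and a second half (DC00-DFFF) for the trailing two octets is not stated in this section; it is taken from the definition of UTF-16.

```python
def utf16_encode(cp: int) -> bytes:
    """Encode one code position as UTF-16 octets, most significant octet first."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("UTF-16 reaches planes 00 to 10 (hexadecimal) only")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("S-zone positions do not represent characters")
    if cp <= 0xFFFF:
        return cp.to_bytes(2, "big")              # BMP: two octets
    offset = cp - 0x10000                         # 20-bit value for planes 01-10
    high = 0xD800 + (offset >> 10)                # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)               # lower 10 bits
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")

print(utf16_encode(0x00E9).hex())    # two octets
print(utf16_encode(0x1D11E).hex())   # four octets
```

The result can be compared with Python's built-in "utf-16-be" codec, which should produce the same octets for any encodable character.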

Alphabetic and syllabic scripts of the A-zone

The A-zone is structured into named blocks, each consisting of a consecutive range of cells. Each block is allocated to a related set of characters, although a block may contain individual cells that are currently unallocated. The characters in the UCS from a particular script may be grouped together in a single block (such as BENGALI) or they may be divided among several blocks (such as BASIC ARABIC and ARABIC EXTENDED). The characters of the Latin script occupy the first four named blocks BASIC LATIN, LATIN-1-SUPPLEMENT, LATIN EXTENDED-A, LATIN EXTENDED-B but in addition there is one further block of Latin characters, LATIN EXTENDED ADDITIONAL, which occurs further into the code table.

Separate from the block structure, but closely related to it, is the concept of a collection of characters. A collection is the subset of characters allocated to a specified range of cells. The difference between a block and a collection is that the cells of a collection need not be consecutive and two collections may overlap. Collections are assigned both a name and a number. Blocks divide the code space into separate areas that are allocated for a coherent purpose. Collections put blocks and/or individual characters together to form subsets of practical significance. A user may then put several collections together to form a subset meeting a particular need, such as communication in English and Hebrew.
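The relationship between collections and user-defined subsets can be sketched as follows. The data structure and the names COLLECTIONS, build_subset and in_subset are assumptions made for illustration; the cell ranges are those given for these collections in this guide.

```python
from typing import Dict, List, Tuple

Ranges = List[Tuple[int, int]]

# A collection is modelled as a list of (first, last) cell ranges.
# HEBREW EXTENDED shows that the cells of a collection need not be consecutive.
COLLECTIONS: Dict[str, Ranges] = {
    "BASIC LATIN": [(0x0020, 0x007E)],
    "BASIC HEBREW": [(0x05D0, 0x05EA)],
    "HEBREW EXTENDED": [(0x0590, 0x05CF), (0x05EB, 0x05FF)],
}

def build_subset(*names: str) -> Ranges:
    """Put several collections together to form a subset."""
    return [r for name in names for r in COLLECTIONS[name]]

def in_subset(cp: int, subset: Ranges) -> bool:
    """Report whether a code position falls within the subset."""
    return any(first <= cp <= last for first, last in subset)

english_hebrew = build_subset("BASIC LATIN", "BASIC HEBREW")
print(in_subset(ord("A"), english_hebrew))   # Latin capital letter A
print(in_subset(0x05D0, english_hebrew))     # Hebrew letter alef
print(in_subset(0x00E9, english_hebrew))     # e with acute, not in this subset
```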

The following table shows the blocks and collections of the first nine rows of the A-zone, comprising cells 0000-08FF. It gives both the name and the range of cells that comprise the block. With the exception of the collection HEBREW EXTENDED, which is formed from two blocks, there is a one-to-one correspondence between blocks and collections for the characters in these nine rows. The first column of the table gives the number assigned to the collection; the collection name is the same as that of the block.

Blocks and Collections of rows 00-08 of the UCS
(collection = block, except for collection 13; *,† = contains combining characters; see the section below on combining characters for the significance of these markings)
1	BASIC LATIN	0020-007E
2	LATIN-1-SUPPLEMENT	00A0-00FF
3	LATIN EXTENDED-A	0100-017F
4	LATIN EXTENDED-B	0180-024F
5	IPA EXTENSIONS	0250-02AF
6	SPACING MODIFIER LETTERS	02B0-02FF
7†	COMBINING DIACRITICAL MARKS	0300-036F
8	BASIC GREEK	0370-03CF
9	GREEK SYMBOLS AND COPTIC	03D0-03FF
10†	CYRILLIC	0400-04FF
	(Reserved for future standardization)	0500-052F
11	ARMENIAN	0530-058F
	HEBREW EXTENDED-A (31 further Hebrew characters have been allocated to previously reserved cells in this block by Amd. 7)	0590-05CF
12	BASIC HEBREW	05D0-05EA
	HEBREW EXTENDED-B	05EB-05FF
13*	HEBREW EXTENDED (This collection comprises the two blocks HEBREW EXTENDED-A and HEBREW EXTENDED-B)
14*	BASIC ARABIC	0600-065F
15*	ARABIC EXTENDED	0660-06FF
85	SYRIAC (added by Amd.27, hence the out-of-sequence number)	0700-074F
	(Reserved for future standardization)	0750-077F
86*	THAANA (added by Amd.24, hence the out-of-sequence number)	0780-07BF
	(Reserved for future standardization)	07C0-08FF
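Because blocks are consecutive, non-overlapping ranges, the block containing a given cell can be found by a binary search over the block start positions. The sketch below uses the ranges from the table above as given in this guide; the names BLOCKS and block_of are illustrative assumptions.

```python
from bisect import bisect_right
from typing import Optional

# (first cell, last cell, block name) for rows 00-08, in ascending order
BLOCKS = [
    (0x0020, 0x007E, "BASIC LATIN"),
    (0x00A0, 0x00FF, "LATIN-1-SUPPLEMENT"),
    (0x0100, 0x017F, "LATIN EXTENDED-A"),
    (0x0180, 0x024F, "LATIN EXTENDED-B"),
    (0x0250, 0x02AF, "IPA EXTENSIONS"),
    (0x02B0, 0x02FF, "SPACING MODIFIER LETTERS"),
    (0x0300, 0x036F, "COMBINING DIACRITICAL MARKS"),
    (0x0370, 0x03CF, "BASIC GREEK"),
    (0x03D0, 0x03FF, "GREEK SYMBOLS AND COPTIC"),
    (0x0400, 0x04FF, "CYRILLIC"),
    (0x0530, 0x058F, "ARMENIAN"),
    (0x0590, 0x05CF, "HEBREW EXTENDED-A"),
    (0x05D0, 0x05EA, "BASIC HEBREW"),
    (0x05EB, 0x05FF, "HEBREW EXTENDED-B"),
    (0x0600, 0x065F, "BASIC ARABIC"),
    (0x0660, 0x06FF, "ARABIC EXTENDED"),
    (0x0700, 0x074F, "SYRIAC"),
    (0x0780, 0x07BF, "THAANA"),
]
_STARTS = [first for first, _, _ in BLOCKS]

def block_of(cp: int) -> Optional[str]:
    """Return the name of the block containing cp, or None if cp is in no block."""
    i = bisect_right(_STARTS, cp) - 1      # last block starting at or before cp
    if i >= 0:
        _, last, name = BLOCKS[i]
        if cp <= last:
            return name
    return None                            # control, reserved or out of range

print(block_of(0x00E9))   # LATIN-1-SUPPLEMENT
print(block_of(0x0500))   # None (reserved for future standardization)
```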

Certain characters in the blocks LATIN-1-SUPPLEMENT and LATIN EXTENDED-B have had their names changed by Technical Corrigendum 1 (1996) since the publication of the first edition of the standard in 1993. In the first of these blocks the characters affected are:

LATIN CAPITAL LIGATURE AE, renamed to
- LATIN CAPITAL LETTER AE (ash);
LATIN SMALL LIGATURE AE, renamed to
- LATIN SMALL LETTER AE (ash).

In the other block the affected characters are these same characters with added diacritical marks MACRON or ACUTE. The same name changes will be made in the next editions of the parts of ISO/IEC 8859 in which these characters appear.

The next five rows, 09-0D, are allocated to scripts that require the two special characters

ZERO WIDTH NON-JOINER (code position 200C)
ZERO WIDTH JOINER (code position 200D)

in the coding of languages written in those scripts. As with rows 00-08, there is a collection corresponding to each block, but for these rows the collection consists of the characters allocated to that block together with these two special characters.

The following table shows the blocks and collections of rows 09-0D of the A-zone, comprising cells 0900-0DFF. It gives both the name and the range of cells that comprise the block. The table also gives the number assigned to the collection that consists of the characters allocated to the block together with the additional characters at positions 200C and 200D. The collection name is the same as that of the block on which it is based.

Blocks and Collections of Rows 09-0D of the UCS
(collection = block + 200C + 200D; * = contains combining characters)
16*	DEVANAGARI	0900-097F
17*	BENGALI	0980-09FF
18*	GURMUKHI	0A00-0A7F
19*	GUJARATI	0A80-0AFF
20*	ORIYA	0B00-0B7F
21*	TAMIL	0B80-0BFF
22*	TELUGU	0C00-0C7F
23*	KANNADA	0C80-0CFF
24*	MALAYALAM	0D00-0D7F
84*	SINHALA (added by Amd.21, hence the out-of-sequence number)	0D80-0DFF
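The rule "collection = block + 200C + 200D" can be illustrated with a small sketch. For simplicity it treats every cell of the block as a member, including cells that are currently unallocated; the names indic_collection and covers are assumptions made for the example.

```python
ZWNJ = 0x200C   # ZERO WIDTH NON-JOINER
ZWJ = 0x200D    # ZERO WIDTH JOINER

def indic_collection(first: int, last: int) -> set:
    """Cells of the block plus the two special characters 200C and 200D."""
    return set(range(first, last + 1)) | {ZWNJ, ZWJ}

def covers(text: str, cells: set) -> bool:
    """Report whether every character of text lies within the given cells."""
    return all(ord(ch) in cells for ch in text)

devanagari_block = set(range(0x0900, 0x0980))
devanagari_collection = indic_collection(0x0900, 0x097F)

# Three Devanagari characters with a ZERO WIDTH JOINER between them
sample = "\u0915\u094D\u200D\u0937"
print(covers(sample, devanagari_block))        # False: 200D lies outside the block
print(covers(sample, devanagari_collection))   # True: the collection adds 200C and 200D
```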

The remainder of the first 32 rows, namely rows 0E-1F, are either reserved or allocated to further scripts that correspond to collections on a one-to-one basis without additional characters. These are shown in the following table:

Blocks and Collections of Rows 0E-1F
(collection = block; * = contains combining characters)
25*	THAI	0E00-0E7F
26*	LAO	0E80-0EFF
72*	BASIC TIBETAN (added by Amd.6, hence the out-of-sequence number)	0F00-0FBF
	(Reserved for future standardization)	0FC0-109F
28	GEORGIAN EXTENDED (note that the collection number is out of sequence)	10A0-10CF
27	BASIC GEORGIAN	10D0-10FF
29	HANGUL JAMO	1100-11FF
73	ETHIOPIC (added by Amd.10, hence the out-of-sequence number)	1200-137F
	(Reserved for future standardization)	1380-139F
75	CHEROKEE (added by Amd.12, hence the out-of-sequence number)	13A0-13FF
74	UNIFIED CANADIAN ABORIGINAL SYLLABICS (added by Amd.11, hence the out-of-sequence number)	1400-167F
82	OGHAM (added by Amd.20, hence the out-of-sequence number)	1680-169F
83	RUNIC (added by Amd.19, hence the out-of-sequence number)	16A0-16FF
87*	BURMESE (added by Amd.26, hence the out-of-sequence number)	1700-177F
88*	KHMER (added by Amd.25, hence the out-of-sequence number)	1780-17FF
	(Reserved for future standardization)	1800-1DFF
30	LATIN EXTENDED ADDITIONAL (one additional Latin character has been allocated to a previously reserved cell in this block by Amd.7.)	1E00-1EFF
31	GREEK EXTENDED	1F00-1FFF

The next nine rows of the A-zone, rows 20-28, contain symbols of various sorts and for various scripts, including technical and special-purpose symbols. They are followed by a further seven rows that are at present unallocated. This area of the A-zone is structured as follows:

Blocks and Collections of Rows 20-2F
(collection = block; † = contains combining characters)
32	GENERAL PUNCTUATION	2000-206F
33	SUPERSCRIPTS AND SUBSCRIPTS	2070-209F
34	CURRENCY SYMBOLS	20A0-20CF
35†	COMBINING DIACRITICAL MARKS FOR SYMBOLS	20D0-20FF
36	LETTERLIKE SYMBOLS	2100-214F
37	NUMBER FORMS	2150-218F
38	ARROWS	2190-21FF
39	MATHEMATICAL OPERATORS	2200-22FF
40	MISCELLANEOUS TECHNICAL	2300-23FF
41	CONTROL PICTURES	2400-243F
42	OPTICAL CHARACTER RECOGNITION	2440-245F
43	ENCLOSED ALPHANUMERICS	2460-24FF
44	BOX DRAWING	2500-257F
45	BLOCK ELEMENTS	2580-259F
46	GEOMETRIC SHAPES	25A0-25FF
47	MISCELLANEOUS SYMBOLS	2600-26FF
48	DINGBATS	2700-27BF
	(Reserved for future standardization)	27C0-27FF
80	BRAILLE PATTERNS (added by Amd.16)	2800-28FF
	(7 more rows reserved for future standardization)	2900-2FFF

The next 30 rows contain alphabetic scripts and symbols that are used by languages that also make use of ideographic scripts. The reference to CJK in the titles of some of the blocks of these rows is to unified Chinese/Japanese/Korean characters; see the section on ideographic scripts for more information. The blocks and collections of these rows are as follows:

Blocks and Collections of Rows 30-4D
(collection = block; * = contains combining characters)
49*	CJK SYMBOLS AND PUNCTUATION	3000-303F
50*	HIRAGANA	3040-309F
51	KATAKANA	30A0-30FF
52	BOPOMOFO	3100-312F
53	HANGUL COMPATIBILITY JAMO	3130-318F
54	CJK MISCELLANEOUS	3190-319F
55	ENCLOSED CJK LETTERS AND MONTHS	3200-32FF
56	CJK COMPATIBILITY	3300-33FF
81	CJK UNIFIED IDEOGRAPHS EXTENSION A (Amd.17)	3400-4DBF
	(Reserved for future standardization)	4DC0-4DFF

The CJK COMPATIBILITY block includes many symbols for scientific units that have been coded in Chinese national standards as if they were ideographs. Examples, together with their coding, are

mm³ (cubic millimetres): SQUARE MM CUBED (coded at 33A3)
µs (microsecond): SQUARE MU S (coded at 33B2)
rad/s² (radians per second per second, a unit of angular acceleration): SQUARE RAD OVER S SQUARED (coded at 33AF)
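The three examples can be looked up by code position. The sketch below uses Python's unicodedata module, on the assumption that its character names agree with those of ISO/IEC 10646 for these positions; the variable name unit_names is illustrative.

```python
import unicodedata

# Character names for the three code positions cited above
unit_names = {cp: unicodedata.name(chr(cp)) for cp in (0x33A3, 0x33B2, 0x33AF)}

for cp, name in unit_names.items():
    print(f"{cp:04X}  {name}")
```

Each unit symbol occupies a single cell, which is what allows it to be handled in the same way as an ideograph.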

The last 26 rows of the A-zone, rows 34-4D, now contain CJK Unified Ideographs Extension A (Amendment 17). However, these rows were allocated in the first edition of ISO/IEC 10646-1 to the Hangul syllabic script, divided into three blocks and corresponding collections numbered 57-59. Amendment 5 to this first edition deleted these allocations and created instead an allocation for a substantially larger set of Hangul syllabic characters in the O-zone. This was accepted as a violation of the principle that published allocations would not be changed, but there were compelling reasons to adopt this change. It will not be taken as a precedent for future changes of a similar nature.

Unified ideographs of the I-zone

The I-zone of the BMP is allocated as a single block to Chinese/Japanese/Korean unified ideographs, and it correspondingly forms a single collection. For completeness this is shown in the following table:

The one Block and Collection of the I-zone
60	CJK UNIFIED IDEOGRAPHS	4E00-9FFF

Amendment 8 has added an informative Annex S to ISO/IEC 10646-1, which describes the unification procedure. This section of the guide is based on that annex.

The I-zone contains 20992 code positions, of which 20902 are currently allocated to specific ideographs. These ideographs were derived from over 54000 ideographs which are found in various different national and regional standards for coded character sets. A process of unification was applied in which single ideographs from two or more of the source standards were associated together and assigned to a single code position in the I-zone. The ideographs that are thus associated are described, for the purposes of the UCS, as unified. To preserve data integrity, any ideographs that are separately encoded in any one of the source standards were not unified. Also ideographs that are unrelated in historical derivation are not unified. However, some ideographs encoded in two different standards for the same language may have been unified.

The unification process is based on the shapes of the ideographs, analyzed according to a systematic procedure. Any ideograph is composed of geometric elements which may themselves be composite structures and possibly ideographs in their own right. This enables the structure of an ideograph to be described by a component tree, where the top node is the ideograph itself and the bottom nodes are primitive elements. When two ideographs are compared, their component trees are compared to see if they agree in all of the following aspects:

the number of components;
the relative position of the components in each complete ideograph;
the structure of the corresponding components.

If all of these aspects agree then the ideographs are considered to have the same abstract shape and are therefore unified. Annex S to ISO/IEC 10646-1 contains a listing of pairs or triples of ideographs that would have been unified under these rules except for the criteria concerning historical derivation or separate encoding in an existing standard.
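The three-part comparison can be pictured as a recursive walk over two component trees. The following is a loose sketch only: the Component class, its position labels and the placeholder element names are all hypothetical, and the sketch does not model the exceptions for historical derivation or separate encoding in a source standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    label: str                      # hypothetical identifier of an element
    position: str = "whole"         # hypothetical relative position, e.g. "left"
    children: List["Component"] = field(default_factory=list)

def same_abstract_shape(a: Component, b: Component) -> bool:
    """Compare two component trees on the three aspects listed above."""
    if len(a.children) != len(b.children):          # number of components
        return False
    if not a.children:                              # primitive elements
        return a.label == b.label
    for ca, cb in zip(a.children, b.children):
        if ca.position != cb.position:              # relative position
            return False
        if not same_abstract_shape(ca, cb):         # structure of components
            return False
    return True

# Placeholder trees: x and y agree in all aspects, z arranges the parts differently.
x = Component("source-1", children=[Component("p", "left"), Component("q", "right")])
y = Component("source-2", children=[Component("p", "left"), Component("q", "right")])
z = Component("source-3", children=[Component("p", "top"), Component("q", "bottom")])

print(same_abstract_shape(x, y))   # True
print(same_abstract_shape(x, z))   # False
```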

Unified ideographs are named and listed in the code pages of ISO/IEC 10646-1 in a manner separate from that used for other scripts. For each unified ideograph, the listing reproduces all (which may only be one) of the graphic symbols (source ideographs) that have been unified into that code position. For each graphic symbol it specifies the source standard from which the graphic symbol is taken and the coded representation of the symbol in that standard. The name assigned to each unified ideograph is algorithmically generated by appending its two-octet coded representation to "CJK UNIFIED IDEOGRAPH-", for example CJK UNIFIED IDEOGRAPH-4E00.
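The naming rule is simple enough to state in a few lines. The sketch below generates the name from the code position; the function name cjk_unified_name is an assumption, and the comparison relies on Python's unicodedata module using the same algorithmic names.

```python
import unicodedata

def cjk_unified_name(cp: int) -> str:
    """Generate the name of a unified ideograph in the I-zone from its code position."""
    if not 0x4E00 <= cp <= 0x9FFF:
        raise ValueError("code position is outside the I-zone")
    # Append the four hexadecimal digits of the two-octet coded representation.
    return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"

print(cjk_unified_name(0x4E00))
print(cjk_unified_name(0x4E00) == unicodedata.name(chr(0x4E00)))
```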

The information concerning CJK unified ideographs has now been replaced by Amd.13.

The Hangul syllabics of the O-zone and Yi

Amendment 5 to the first edition of ISO/IEC 10646-1 specified a change in the encoding of Hangul syllabic script. Prior to that Amendment, the last 26 rows of the A-zone (row numbers 34-4D) were allocated to the Hangul syllabic script and the entire O-zone was reserved for future standardization. Due to a major revision of the corresponding Korean national standard shortly after the final text of the first edition was agreed, it became necessary to accommodate substantially more syllabic characters into the UCS. To include these additional characters, the total space required would be almost 44 rows.

It was decided that this circumstance was sufficiently exceptional to merit violating the principle that code positions, once allocated, should not be changed. The Hangul syllabic characters already encoded would be moved from the A-zone to the O-zone, where there was sufficient space to include both the original and the additional characters in a single block, with a corresponding single collection. The amendment contains the statement that this change is not intended to be regarded as a precedent for other changes of allocation in future editions. This statement will itself be incorporated into future editions.

Amendment 14 has added the syllables and radicals of the Yi script to the O-Zone.

Following these amendments, the O-zone has the structure shown in the following table:

The Blocks and Collections of the O-zone
76	YI SYLLABLES	A000-A48F
77	YI RADICALS	A490-A4CF
	(Reserved for future standardization)	A4D0-ABFF
71	HANGUL EXTENDED	AC00-D7A3
	(Reserved for future standardization)	D7A4-D7FF

Amendment 5 contains a mapping table giving the correspondence between the code positions before and after this amendment for the characters originally allocated to rows 34-4D.

The Hangul syllabic characters are assigned names that follow the naming rules used for alphabetic scripts, e.g. HANGUL SYLLABLE GEOLH (KEOLH), rather than the algorithmic name structure used for the CJK unified ideographs of the I-zone.
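The size of the HANGUL EXTENDED block, and the contrast between the two naming styles, can be checked with a short sketch. It assumes that the names reported by Python's unicodedata module correspond to those of ISO/IEC 10646; the constant and variable names are illustrative.

```python
import unicodedata

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3   # range of the HANGUL EXTENDED block

count = HANGUL_LAST - HANGUL_FIRST + 1       # number of code positions in the block
rows = count / 256                           # a row holds 256 cells
print(count, rows)                           # 11172 positions, 43.640625 rows

first_hangul_name = unicodedata.name(chr(HANGUL_FIRST))
first_cjk_name = unicodedata.name(chr(0x4E00))
print(first_hangul_name)   # descriptive name built from the syllable
print(first_cjk_name)      # algorithmic name built from the code position
```

The figure of 43.64 rows is consistent with the statement that almost 44 rows were required.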

The restricted use R-zone

The R-zone is distinguished from the remainder of the BMP in that its code positions are allocated for use only in special circumstances. There are three distinct uses for the R-zone:

Private use characters: These may be specific user-defined characters or may be dynamically-redefinable characters. In either case an agreement is necessary between sender and recipient, outside the scope of ISO/IEC 10646, if these are to be exchanged meaningfully between two communicating parties.
Presentation forms of characters: A presentation form is an alternative form, for use in a particular context, to the nominal form of a character or sequence of characters from the other zones of graphic characters. The transformation from the nominal form to the presentation forms may involve substitution, superimposition or combination. The rules for such transformations are outside the scope of ISO/IEC 10646.
Presentation forms are not normally intended to be used as a substitute for the nominal forms, but specific applications may use them in this way for particular purposes such as compatibility with existing devices.
The specification of presentation forms within ISO/IEC 10646, an example of which is LATIN SMALL LIGATURE FI at code position FB01, blurs the distinction between characters and glyphs discussed elsewhere in this guide.
Compatibility characters: Compatibility characters are included in the UCS primarily for compatibility with existing coded character sets to allow two-way code conversion without loss of information.

As with the other zones, it is divided into blocks and collections but the block for private use consists, by its very nature, only of unallocated code positions. The structure of this zone is as follows:

The Blocks and Collections of the R-zone
(collection = block; *,† = contains combining characters)
61	PRIVATE USE AREA	E000-F8FF
62	CJK COMPATIBILITY IDEOGRAPHS	F900-FAFF
63*	ALPHABETIC PRESENTATION FORMS	FB00-FB4F
64	ARABIC PRESENTATION FORMS-A	FB50-FDFF
	(Reserved for future standardization)	FE00-FE1F
65†	COMBINING HALF MARKS	FE20-FE2F
66	CJK COMPATIBILITY FORMS	FE30-FE4F
67	SMALL FORM VARIANTS	FE50-FE6F
68	ARABIC PRESENTATION FORMS-B	FE70-FEFE
	(The single character at code position FEFF is not in any of the blocks into which the BMP is divided. Its significance is explained in the chapter of this guide on Serial Transmission of the UCS)	FEFF
69	HALFWIDTH AND FULLWIDTH FORMS	FF00-FFEF
70	SPECIALS	FFF0-FFFD
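Two of the uses of the R-zone can be illustrated briefly. In the sketch below, the private use test follows the range in the table above. The mapping from a presentation form back to its nominal form uses Unicode compatibility normalization (NFKC) from Python's unicodedata module; that is a facility of the Unicode Standard assumed here for illustration, not a rule defined by ISO/IEC 10646, which leaves such transformations out of scope. The function names are assumptions.

```python
import unicodedata

def is_private_use(cp: int) -> bool:
    """Report whether a code position lies in the PRIVATE USE AREA block."""
    return 0xE000 <= cp <= 0xF8FF

def nominal_form(text: str) -> str:
    """Approximate the nominal form of presentation forms (assumption: NFKC)."""
    return unicodedata.normalize("NFKC", text)

ligature = "\uFB01"
print(unicodedata.name(ligature))         # LATIN SMALL LIGATURE FI
print(nominal_form(ligature))             # the two letters f and i
print(is_private_use(0xE000))             # True
print(unicodedata.category("\uE000"))     # Co (private use)
```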

Recall that the final two positions FFFE, FFFF are required to be left unused in every plane of the UCS. The collection numbered 200 is one of a number of special-purpose collections that have been assigned numbers in the range 200-299. See the chapter of this guide on repertoires and subsets for more information.
