SC22/WG20 N871
 
     REPORT ON CHARACTER SET POLICY                        VERSION 1.1
                                         1998-10-12, corr. 2001-09-21
                                                      J. W. van Wingen

POLICY OF THE NETHERLANDS GOVERNMENT AND RELATED INSTITUTIONS REGARDING THE USE OF CODED CHARACTER SETS

This document illustrates the policy of the Netherlands Government and related institutions regarding the use of coded character sets. It consists of an extract from NEN 1888, General Personal Data - Definition, character sets and interchange formats, at present at the stage of approved committee draft, translated into English.

Only the material related to the use of character sets is included here. Any reference to normative issues is to be made to the original Dutch version of NEN 1888 (when finally approved). For the correctness of the translation only this author is responsible.

4 Character sets

4.1 The following character sets are distinguished

4.1.1 Character set EDIFACT-level B (82 characters and SPACE) This set includes 26 capital letters and 26 small letters, 10 digits, SPACE and 20 special characters, and is specified in the "directories" attached to ISO/IEC 9735-1 (EDIFACT) as the UN/ECE level B character set.

4.1.2 Character set ASCII (94 characters and SPACE) This set includes 26 capital letters and 26 small letters, 10 digits, SPACE and 32 special characters, and is specified in ISO/IEC 646, International Reference Version (registered as ISO-IR 6). The set is a superset of EDIFACT Level B. For the 12 special characters not in 4.1.1, see Annex D.3.

4.1.3 Character set Latin1 (190 characters and SPACE) This set includes ASCII and 96 additional characters: extra letters and letters with diacritics, which are required for writing most Western European languages correctly, together with special characters, as specified in ISO/IEC 8859-1 (registered as ISO-IR 100).

4.1.4 Character set Latin5 (190 characters and SPACE) This set includes ASCII and 96 additional characters: extra letters and letters with diacritics, which are required for writing most Western European languages (but not Icelandic) and Turkish correctly, together with special characters, as specified in ISO/IEC 8859-9 (registered as ISO-IR 148).

4.1.5 Character set GBA (292 characters and SPACE) This set includes letters with which all European languages using Latin script can be written, including those of Latin1 and Latin5, 10 digits and a number of special characters. For the specification of the GBA set, see Annex C.

NOTE The GBA set contains the same letters and digits as specified in the Repertoire of ISO/IEC 6937, with the exclusion of the IJ and ij as single characters, and a number of special characters.
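
As an illustration of the difference between the Latin1 and Latin5 repertoires, the following sketch tests whether a text fits in one of the sets of 4.1. It relies on the Python standard codecs `ascii`, `latin_1` (ISO/IEC 8859-1) and `iso8859_9` (ISO/IEC 8859-9). The function name `fits`, the table `CODECS` and the sample names are assumptions made for this example only; they are not part of NEN 1888.

```python
# Python standard codec names; pairing them with the set names of 4.1
# is an assumption made for this illustration. These codecs also accept
# control characters, which are outside the sets described in 4.1.
CODECS = {
    "ASCII": "ascii",
    "Latin1": "latin_1",     # ISO/IEC 8859-1
    "Latin5": "iso8859_9",   # ISO/IEC 8859-9
}


def fits(text, set_name):
    """Return True if every character of text can be coded in the set."""
    try:
        text.encode(CODECS[set_name])
        return True
    except UnicodeEncodeError:
        return False


# An Icelandic name needs Latin1, a Turkish name needs Latin5.
for sample in ("Jansen", "Þórsson", "Yıldız"):
    print(sample, {name: fits(sample, name) for name in CODECS})
```

A name that fits in neither Latin1 nor Latin5 would, under this policy, call for the GBA set.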

4.2 Coding of character sets

If an ISO coding system is applied for a character set of 4.1, the following coding shall be used.

 
      EDIFACT-level B:      ISO/IEC 646, at 7-bit coding
                            ISO/IEC 8859-1 or 8859-9, at 8-bit coding
      ASCII:                ISO/IEC 646, at 7-bit coding
                            ISO/IEC 8859-1 or 8859-9, at 8-bit coding
      Latin1:               ISO/IEC 8859-1
      Latin5:               ISO/IEC 8859-9
      GBA:                  ISO/IEC 6937 or
                            ISO/IEC 10646-1

NOTES on GBA:

1. ISO/IEC 6937 codes characters with diacritic marks with two bytes, and all others with a single byte. It is assumed that hardware and software are capable of treating such characters as if they take up no more than a single position. The standard was intended for text communication, where this aspect is not much of a problem. Use of the coding method of ISO/IEC 6937 in text processing, on the contrary, makes severe demands, which may result in higher costs.

2. ISO/IEC 10646-1 offers a choice between uniform coding with two bytes (UCS-2) and mixed coding with one or two bytes (UTF-8; one character of the GBA set, the OHM SIGN, requires three bytes); see Annex C.

3. The coding system "Unicode", maintained by the Unicode Consortium, includes the GBA set, with the same codes for its characters as ISO/IEC 10646-1.
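
The two-byte coding of ISO/IEC 6937 mentioned in note 1 can be illustrated with a minimal decoding sketch. It covers only a few of the non-spacing diacritic bytes that appear in the list of Annex C, treats all bytes below 80 as ASCII, and handles the anomalous code C267 for SMALL LETTER G WITH CEDILLA described in the notes of Annex C. The names `DIACRITICS`, `EXCEPTIONS` and `decode_6937` are hypothetical, and the sketch is not a complete implementation of the standard.

```python
import unicodedata

# Non-spacing diacritic bytes of ISO/IEC 6937 (as used in Annex C) and
# the matching combining characters of ISO/IEC 10646. Partial table.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCF: "\u030C",  # caron
}

# C267 is coded as if it were "g with acute" but means g with cedilla.
EXCEPTIONS = {b"\xC2\x67": "\u0123"}


def decode_6937(data):
    """Decode a byte string; each coded character becomes one position."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in DIACRITICS and i + 1 < len(data):
            pair = data[i:i + 2]
            if pair in EXCEPTIONS:
                out.append(EXCEPTIONS[pair])
            else:
                # Base letter followed by combining mark, then composed.
                combined = chr(data[i + 1]) + DIACRITICS[byte]
                out.append(unicodedata.normalize("NFC", combined))
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte {byte:02X} not handled in this sketch")
    return "".join(out)


coded = bytes.fromhex("5A6FC865")        # Z, o, diaeresis + e
text = decode_6937(coded)
print(len(coded), "bytes,", len(text), "positions:", text)
```

The byte count and the number of positions differ as soon as a diacritic occurs, which is the property that makes fixed-width text processing of ISO/IEC 6937 data demanding.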

 
     ANNEX   C                                             VERSION 1.6
                                                            1998-10-12
                                                       J. W. van Wingen

(This Annex is normative.)

THE GBA SET

C.1 Introduction

The characters that are permitted in the GBA system are specified in this Annex. There is a distinction between the "repertoire" of a character set and the actual coding of each member of the set. The repertoire specifies only the characters themselves, which we classify into LETTERS, DIGITS and SPECIAL CHARACTERS. In this Annex the repertoire is normative; all other data are informative only. In the listing, what is part of the GBA set is indicated with "permitted", and what is not with "not permitted".

Each character may be coded according to a particular method, applied uniformly to a given collection of data. The method depends on the application selected. When data are transferred to a different application, the coding may have to be converted, even if both applications use the same characters. Which characters are permitted is specified in the repertoire selected. The repertoire is given simply as a list, in which each character is included with its standardised unique name, which is always written in capitals.

The GBA set is a repertoire which is a subset of that in ISO/IEC 6937. In the past the International Telecommunication Union (ITU) also defined a subset, which has become known as CCITT T.61, or as the Teletex set. The GBA set has been derived from that. In the meantime, however, differences have crept in among the special characters, because ITU did not immediately follow the ISO developments, nor GBA those of ITU. Consequently, the GBA set can no longer be considered identical to the Teletex set.

The GBA repertoire may be coded in different ways:

  1. according to ISO/IEC 6937;
  2. with uniformly 2 bytes, according to ISO/IEC 10646-1 (this is called UCS-2);
  3. with 1 byte for the characters from ASCII and 2 bytes for all others (3 bytes for the OHM SIGN only; see the notes in C.2), according to ISO/IEC 10646-1 (this is called UTF-8). This is thus also a "mixed coding" system. Between UCS-2 and UTF-8 coding there is a fixed relation, specified by an algorithm. (This is described in ISO/IEC 10646-1, Annex D.)
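
The fixed relation between UCS-2 and UTF-8 can be sketched as follows for code positions up to FFFF. The function name `ucs2_to_utf8` is an assumption made for this illustration, and the sketch does not treat the surrogate range D800 to DFFF specially. The sample code positions are taken from the list and the notes in this Annex.

```python
def ucs2_to_utf8(code):
    """Return the UTF-8 bytes for one UCS-2 code position (0..0xFFFF)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("not a UCS-2 code position")
    if code < 0x80:
        # One byte: the ASCII characters keep their code.
        return bytes([code])
    if code < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


# SMALL A, SMALL A WITH ACUTE, SMALL O WITH DOUBLE ACUTE, OHM SIGN
for code in (0x0061, 0x00E1, 0x0151, 0x2126):
    print(f"{code:04X} -> {ucs2_to_utf8(code).hex().upper()}")
```

Every character of the GBA set with a UCS-2 code below 0800 therefore takes one or two bytes in UTF-8; only a code of 0800 or higher takes three.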

In the list (first letters, then digits, then special characters) each line, that is each character, gives the following columns: TRFO, SID, NAME, and the code in TRFO, in ISO/IEC 6937, and in ISO/IEC 10646-1 (UTF-8 and UCS-2).

Not all the information given will be required in daily practice. The data provided, however, may prevent unneeded searching in special cases.
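
Since the UTF-8 and UCS-2 codes in each line of the list are related by a fixed algorithm, a line can be checked mechanically. The sketch below splits a line into columns, assuming the layout shown by the column headers of the list, and compares the UTF-8 column with the value computed by Python's built-in encoder. The names `parse_row` and `utf8_matches_ucs2` are hypothetical; the sample lines are copied from the list.

```python
def parse_row(line):
    """Split one line of the list into its columns (layout assumed)."""
    fields = line.split()
    trfo_code, iso6937, utf8, ucs2 = fields[-4:]
    return {
        "trfo": fields[0],
        "sid": fields[1],
        "name": " ".join(fields[2:-4]),
        "trfo_code": trfo_code,
        "iso6937": iso6937,
        "utf8": utf8,
        "ucs2": ucs2,
    }


def utf8_matches_ucs2(row):
    """True if the UTF-8 column agrees with the UCS-2 column."""
    expected = chr(int(row["ucs2"], 16)).encode("utf-8").hex().upper()
    return expected == row["utf8"]


ROWS = [
    "/a LA11  LATIN SMALL   LETTER A WITH ACUTE            2F61 C261 C3A1 00E1",
    "=d LD61  LATIN SMALL   LETTER D WITH STROKE           3DF2   F2 C491 0111",
    " 1 ND01  DIGIT ONE                                      31   31   31 0031",
]

for line in ROWS:
    row = parse_row(line)
    print(row["sid"], row["name"], utf8_matches_ucs2(row))
```

Lines in which a name runs into the code columns, or in which a code is replaced by dots, would need extra handling that this sketch does not attempt.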

C.2 Notes on the list (this is C.3 in the Dutch version)

In the list the following notes are referenced by "note", followed by the corresponding number.

  1. The Dutch IJ is always written (and coded) as TWO letters, to prevent confusion. Once printed, it can no longer be seen whether the character IJ represents one or two letters. This may cause problems when searching files for a name.
  2. The letters marked as "obsolete" were intended either for an orthography now abolished (Greenlandic), or for characters that once occurred on a keyboard no longer in production (Catalan, Afrikaans).
  3. The following characters have been given a code in ISO/IEC 6937:1994 that differs from that in CCITT T.61, which has since been withdrawn. With respect to coding, the GBA set is based on the old T.61, and this will remain so for the foreseeable future. It is strongly recommended to avoid these characters as much as possible, and to be alert to possible confusion between the present GBA coding and that in ASCII (and in 6937 and UTF-8).

                                                                          GBA  ASCII
             $$ &dollar SC03  DOLLAR SIGN                                  A4   24
             ## &num    SM01  NUMBER SIGN                                  A6   23

  4. In 6937 there is no CAPITAL ETH (and thus none in the GBA set either, although it is present in 10367 and in 10646). For the capital letter, the CAPITAL D WITH STROKE will serve, which has the same visual shape.
  5. The following character:

                                                          GBA/6937
$g &gcedil LG41  LATIN SMALL   LETTER G WITH CEDILLA          C267

was called in the past:

 
           LG11  LATIN SMALL   LETTER G WITH ACUTE

It corresponds with the following capital letter:

 
$G &Gcedil LG42  LATIN CAPITAL LETTER G WITH CEDILLA          CB47

At the revision of ISO 6937 (from 1983 to 1994) the name was changed, but the coding was left unmodified. Whereas all other letters with CEDILLA have a coding that starts with CB, the small "g" has C2, as if it still had ACUTE. (In Latvian the CEDILLA is printed either as an inverted comma above the "g", or as a diacritic with a strong likeness to an ACUTE accent.)

  6. Where four dots (....) occur in the UTF-8 column of the tables, this means that the character has to be coded with THREE bytes. In the GBA set only the OHM SIGN requires this. For this reason use of this character is discouraged. Its code in UTF-8 is E284A6.
  7. When names are entered in files, one should take into account that several languages allow typographical variants for representing the same character, which may confuse people who are unaware of this. A reader in the Netherlands will have no problem deciding, in his own language, which printed letter corresponds to a given handwritten letter. With other languages the choice may be more difficult. The NNI has available detailed information regarding the characters in current use in European languages, and their relation to those in the GBA set.
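
The rule of note 1, that IJ is always coded as two letters, can be applied when data coded according to ISO/IEC 10646-1 are received from elsewhere. The following minimal sketch replaces the single characters LATIN CAPITAL LIGATURE I J (0132) and LATIN SMALL LIGATURE I J (0133) by two letters each. The names `IJ_TABLE` and `normalise_ij` are assumptions made for this example.

```python
# Map the single-character ligatures to two ordinary letters each.
IJ_TABLE = str.maketrans({"\u0132": "IJ", "\u0133": "ij"})


def normalise_ij(name):
    """Return name with any IJ ligature written as two letters."""
    return name.translate(IJ_TABLE)


for name in ("\u0132sselstein", "D\u0133kstra", "Dijkstra"):
    print(normalise_ij(name))
```

After this step a search for a name gives the same result whether the source used one character or two.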
 
 GBA repertoire
 
---------------------------------------------------------------------------
    LETTERS PERMITTED  IN GBA                          code in    10646-1
TRFO SID  NAME                                        TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
 a LA01  LATIN SMALL   LETTER A                         61   61   61 0061
 A LA02  LATIN CAPITAL LETTER A                         41   41   41 0041
 b LB01  LATIN SMALL   LETTER B                         62   62   62 0062
 B LB02  LATIN CAPITAL LETTER B                         42   42   42 0042
 c LC01  LATIN SMALL   LETTER C                         63   63   63 0063
 C LC02  LATIN CAPITAL LETTER C                         43   43   43 0043
 d LD01  LATIN SMALL   LETTER D                         64   64   64 0064
 D LD02  LATIN CAPITAL LETTER D                         44   44   44 0044
 e LE01  LATIN SMALL   LETTER E                         65   65   65 0065
 E LE02  LATIN CAPITAL LETTER E                         45   45   45 0045
 f LF01  LATIN SMALL   LETTER F                         66   66   66 0066
 F LF02  LATIN CAPITAL LETTER F                         46   46   46 0046
 g LG01  LATIN SMALL   LETTER G                         67   67   67 0067
 G LG02  LATIN CAPITAL LETTER G                         47   47   47 0047
 h LH01  LATIN SMALL   LETTER H                         68   68   68 0068
 H LH02  LATIN CAPITAL LETTER H                         48   48   48 0048
 i LI01  LATIN SMALL   LETTER I                         69   69   69 0069
 I LI02  LATIN CAPITAL LETTER I                         49   49   49 0049
 j LJ01  LATIN SMALL   LETTER J                         6A   6A   6A 006A
 J LJ02  LATIN CAPITAL LETTER J                         4A   4A   4A 004A
 k LK01  LATIN SMALL   LETTER K                         6B   6B   6B 006B
 K LK02  LATIN CAPITAL LETTER K                         4B   4B   4B 004B
 l LL01  LATIN SMALL   LETTER L                         6C   6C   6C 006C
 L LL02  LATIN CAPITAL LETTER L                         4C   4C   4C 004C
 m LM01  LATIN SMALL   LETTER M                         6D   6D   6D 006D
 M LM02  LATIN CAPITAL LETTER M                         4D   4D   4D 004D
 n LN01  LATIN SMALL   LETTER N                         6E   6E   6E 006E
 N LN02  LATIN CAPITAL LETTER N                         4E   4E   4E 004E
 o LO01  LATIN SMALL   LETTER O                         6F   6F   6F 006F
 O LO02  LATIN CAPITAL LETTER O                         4F   4F   4F 004F
 p LP01  LATIN SMALL   LETTER P                         70   70   70 0070
 P LP02  LATIN CAPITAL LETTER P                         50   50   50 0050
 q LQ01  LATIN SMALL   LETTER Q                         71   71   71 0071
 Q LQ02  LATIN CAPITAL LETTER Q                         51   51   51 0051
 r LR01  LATIN SMALL   LETTER R                         72   72   72 0072
 R LR02  LATIN CAPITAL LETTER R                         52   52   52 0052
 s LS01  LATIN SMALL   LETTER S                         73   73   73 0073
 S LS02  LATIN CAPITAL LETTER S                         53   53   53 0053
 t LT01  LATIN SMALL   LETTER T                         74   74   74 0074
 T LT02  LATIN CAPITAL LETTER T                         54   54   54 0054
 u LU01  LATIN SMALL   LETTER U                         75   75   75 0075
 U LU02  LATIN CAPITAL LETTER U                         55   55   55 0055
 v LV01  LATIN SMALL   LETTER V                         76   76   76 0076
 V LV02  LATIN CAPITAL LETTER V                         56   56   56 0056
 w LW01  LATIN SMALL   LETTER W                         77   77   77 0077
 W LW02  LATIN CAPITAL LETTER W                         57   57   57 0057
 x LX01  LATIN SMALL   LETTER X                         78   78   78 0078
 X LX02  LATIN CAPITAL LETTER X                         58   58   58 0058
 y LY01  LATIN SMALL   LETTER Y                         79   79   79 0079
 Y LY02  LATIN CAPITAL LETTER Y                         59   59   59 0059
 z LZ01  LATIN SMALL   LETTER Z                         7A   7A   7A 007A
 Z LZ02  LATIN CAPITAL LETTER Z                         5A   5A   5A 005A
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA                           code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
/a LA11  LATIN SMALL   LETTER A WITH ACUTE            2F61 C261 C3A1 00E1
/A LA12  LATIN CAPITAL LETTER A WITH ACUTE            2F41 C241 C381 00C1
/c LC11  LATIN SMALL   LETTER C WITH ACUTE            2F63 C263 C487 0107
/C LC12  LATIN CAPITAL LETTER C WITH ACUTE            2F43 C243 C486 0106
/e LE11  LATIN SMALL   LETTER E WITH ACUTE            2F65 C265 C3A9 00E9
/E LE12  LATIN CAPITAL LETTER E WITH ACUTE            2F45 C245 C389 00C9
/i LI11  LATIN SMALL   LETTER I WITH ACUTE            2F69 C269 C3AD 00ED
/I LI12  LATIN CAPITAL LETTER I WITH ACUTE            2F49 C249 C38D 00CD
/l LL11  LATIN SMALL   LETTER L WITH ACUTE            2F6C C26C C4BA 013A
/L LL12  LATIN CAPITAL LETTER L WITH ACUTE            2F4C C24C C4B9 0139
/n LN11  LATIN SMALL   LETTER N WITH ACUTE            2F6E C26E C584 0144
/N LN12  LATIN CAPITAL LETTER N WITH ACUTE            2F4E C24E C583 0143
/o LO11  LATIN SMALL   LETTER O WITH ACUTE            2F6F C26F C3B3 00F3
/O LO12  LATIN CAPITAL LETTER O WITH ACUTE            2F4F C24F C393 00D3
/r LR11  LATIN SMALL   LETTER R WITH ACUTE            2F72 C272 C595 0155
/R LR12  LATIN CAPITAL LETTER R WITH ACUTE            2F52 C252 C594 0154
/s LS11  LATIN SMALL   LETTER S WITH ACUTE            2F73 C273 C59B 015B
/S LS12  LATIN CAPITAL LETTER S WITH ACUTE            2F53 C253 C59A 015A
/u LU11  LATIN SMALL   LETTER U WITH ACUTE            2F75 C275 C3BA 00FA
/U LU12  LATIN CAPITAL LETTER U WITH ACUTE            2F55 C255 C39A 00DA
/y LY11  LATIN SMALL   LETTER Y WITH ACUTE            2F79 C279 C3BD 00FD
/Y LY12  LATIN CAPITAL LETTER Y WITH ACUTE            2F59 C259 C39D 00DD
/z LZ11  LATIN SMALL   LETTER Z WITH ACUTE            2F7A C27A C5BA 017A
/Z LZ12  LATIN CAPITAL LETTER Z WITH ACUTE            2F5A C25A C5B9 0179
---------------------------------------------------------------------------
\a LA13  LATIN SMALL   LETTER A WITH GRAVE            5C61 C161 C3A0 00E0
\A LA14  LATIN CAPITAL LETTER A WITH GRAVE            5C41 C141 C380 00C0
\e LE13  LATIN SMALL   LETTER E WITH GRAVE            5C65 C165 C3A8 00E8
\E LE14  LATIN CAPITAL LETTER E WITH GRAVE            5C45 C145 C388 00C8
\i LI13  LATIN SMALL   LETTER I WITH GRAVE            5C69 C169 C3AC 00EC
\I LI14  LATIN CAPITAL LETTER I WITH GRAVE            5C49 C149 C38C 00CC
\o LO13  LATIN SMALL   LETTER O WITH GRAVE            5C6F C16F C3B2 00F2
\O LO14  LATIN CAPITAL LETTER O WITH GRAVE            5C4F C14F C392 00D2
\u LU13  LATIN SMALL   LETTER U WITH GRAVE            5C75 C175 C3B9 00F9
\U LU14  LATIN CAPITAL LETTER U WITH GRAVE            5C55 C155 C399 00D9
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA                           code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
>a LA15  LATIN SMALL   LETTER A WITH CIRCUMFLEX       3E61 C361 C3A2 00E2
>A LA16  LATIN CAPITAL LETTER A WITH CIRCUMFLEX       3E41 C341 C382 00C2
>c LC15  LATIN SMALL   LETTER C WITH CIRCUMFLEX       3E63 C363 C489 0109
>C LC16  LATIN CAPITAL LETTER C WITH CIRCUMFLEX       3E43 C343 C488 0108
>e LE15  LATIN SMALL   LETTER E WITH CIRCUMFLEX       3E65 C365 C3AA 00EA
>E LE16  LATIN CAPITAL LETTER E WITH CIRCUMFLEX       3E45 C345 C38A 00CA
>g LG15  LATIN SMALL   LETTER G WITH CIRCUMFLEX       3E67 C367 C49D 011D
>G LG16  LATIN CAPITAL LETTER G WITH CIRCUMFLEX       3E47 C347 C49C 011C
>h LH15  LATIN SMALL   LETTER H WITH CIRCUMFLEX       3E68 C368 C4A5 0125
>H LH16  LATIN CAPITAL LETTER H WITH CIRCUMFLEX       3E48 C348 C4A4 0124
>i LI15  LATIN SMALL   LETTER I WITH CIRCUMFLEX       3E69 C369 C3AE 00EE
>I LI16  LATIN CAPITAL LETTER I WITH CIRCUMFLEX       3E49 C349 C38E 00CE
>j LJ15  LATIN SMALL   LETTER J WITH CIRCUMFLEX       3E6A C36A C4B5 0135
>J LJ16  LATIN CAPITAL LETTER J WITH CIRCUMFLEX       3E4A C34A C4B4 0134
>o LO15  LATIN SMALL   LETTER O WITH CIRCUMFLEX       3E6F C36F C3B4 00F4
>O LO16  LATIN CAPITAL LETTER O WITH CIRCUMFLEX       3E4F C34F C394 00D4
>s LS15  LATIN SMALL   LETTER S WITH CIRCUMFLEX       3E73 C373 C59D 015D
>S LS16  LATIN CAPITAL LETTER S WITH CIRCUMFLEX       3E53 C353 C59C 015C
>u LU15  LATIN SMALL   LETTER U WITH CIRCUMFLEX       3E75 C375 C3BB 00FB
>U LU16  LATIN CAPITAL LETTER U WITH CIRCUMFLEX       3E55 C355 C39B 00DB
>w LW15  LATIN SMALL   LETTER W WITH CIRCUMFLEX       3E77 C377 C5B5 0175
>W LW16  LATIN CAPITAL LETTER W WITH CIRCUMFLEX       3E57 C357 C5B4 0174
>y LY15  LATIN SMALL   LETTER Y WITH CIRCUMFLEX       3E79 C379 C5B7 0177
>Y LY16  LATIN CAPITAL LETTER Y WITH CIRCUMFLEX       3E59 C359 C5B6 0176
---------------------------------------------------------------------------
%a LA17  LATIN SMALL   LETTER A WITH DIAERESIS        2561 C861 C3A4 00E4
%A LA18  LATIN CAPITAL LETTER A WITH DIAERESIS        2541 C841 C384 00C4
%e LE17  LATIN SMALL   LETTER E WITH DIAERESIS        2565 C865 C3AB 00EB
%E LE18  LATIN CAPITAL LETTER E WITH DIAERESIS        2545 C845 C38B 00CB
%i LI17  LATIN SMALL   LETTER I WITH DIAERESIS        2569 C869 C3AF 00EF
%I LI18  LATIN CAPITAL LETTER I WITH DIAERESIS        2549 C849 C38F 00CF
%o LO17  LATIN SMALL   LETTER O WITH DIAERESIS        256F C86F C3B6 00F6
%O LO18  LATIN CAPITAL LETTER O WITH DIAERESIS        254F C84F C396 00D6
%u LU17  LATIN SMALL   LETTER U WITH DIAERESIS        2575 C875 C3BC 00FC
%U LU18  LATIN CAPITAL LETTER U WITH DIAERESIS        2555 C855 C39C 00DC
%y LY17  LATIN SMALL   LETTER Y WITH DIAERESIS        2579 C879 C3BF 00FF
%Y LY18  LATIN CAPITAL LETTER Y WITH DIAERESIS        2559 C859 C5B8 0178
---------------------------------------------------------------------------
~a LA19  LATIN SMALL   LETTER A WITH TILDE            7E61 C461 C3A3 00E3
~A LA20  LATIN CAPITAL LETTER A WITH TILDE            7E41 C441 C383 00C3
~n LN19  LATIN SMALL   LETTER N WITH TILDE            7E6E C46E C3B1 00F1
~N LN20  LATIN CAPITAL LETTER N WITH TILDE            7E4E C44E C391 00D1
~o LO19  LATIN SMALL   LETTER O WITH TILDE            7E6F C46F C3B5 00F5
~O LO20  LATIN CAPITAL LETTER O WITH TILDE            7E4F C44F C395 00D5
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA                           code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
*c LC21  LATIN SMALL   LETTER C WITH CARON            2A63 CF63 C48D 010D
*C LC22  LATIN CAPITAL LETTER C WITH CARON            2A43 CF43 C48C 010C
*d LD21  LATIN SMALL   LETTER D WITH CARON            2A64 CF64 C48F 010F
*D LD22  LATIN CAPITAL LETTER D WITH CARON            2A44 CF44 C48E 010E
*e LE21  LATIN SMALL   LETTER E WITH CARON            2A65 CF65 C49B 011B
*E LE22  LATIN CAPITAL LETTER E WITH CARON            2A45 CF45 C49A 011A
*l LL21  LATIN SMALL   LETTER L WITH CARON            2A6C CF6C C4BE 013E
*L LL22  LATIN CAPITAL LETTER L WITH CARON            2A4C CF4C C4BD 013D
*n LN21  LATIN SMALL   LETTER N WITH CARON            2A6E CF6E C588 0148
*N LN22  LATIN CAPITAL LETTER N WITH CARON            2A4E CF4E C587 0147
*r LR21  LATIN SMALL   LETTER R WITH CARON            2A72 CF72 C599 0159
*R LR22  LATIN CAPITAL LETTER R WITH CARON            2A52 CF52 C598 0158
*s LS21  LATIN SMALL   LETTER S WITH CARON            2A73 CF73 C5A1 0161
*S LS22  LATIN CAPITAL LETTER S WITH CARON            2A53 CF53 C5A0 0160
*t LT21  LATIN SMALL   LETTER T WITH CARON            2A74 CF74 C5A5 0165
*T LT22  LATIN CAPITAL LETTER T WITH CARON            2A54 CF54 C5A4 0164
*z LZ21  LATIN SMALL   LETTER Z WITH CARON            2A7A CF7A C5BE 017E
*Z LZ22  LATIN CAPITAL LETTER Z WITH CARON            2A5A CF5A C5BD 017D
---------------------------------------------------------------------------
#a LA23  LATIN SMALL   LETTER A WITH BREVE            2361 C661 C483 0103
#A LA24  LATIN CAPITAL LETTER A WITH BREVE            2341 C641 C482 0102
#g LG23  LATIN SMALL   LETTER G WITH BREVE            2367 C667 C49F 011F
#G LG24  LATIN CAPITAL LETTER G WITH BREVE            2347 C647 C49E 011E
#u LU23  LATIN SMALL   LETTER U WITH BREVE            2375 C675 C5AD 016D
#U LU24  LATIN CAPITAL LETTER U WITH BREVE            2355 C655 C5AC 016C
---------------------------------------------------------------------------
+o LO25  LATIN SMALL   LETTER O WITH DOUBLE ACUTE     2B6F CD6F C591 0151
+O LO26  LATIN CAPITAL LETTER O WITH DOUBLE ACUTE     2B4F CD4F C590 0150
+u LU25  LATIN SMALL   LETTER U WITH DOUBLE ACUTE     2B75 CD75 C5B1 0171
+U LU26  LATIN CAPITAL LETTER U WITH DOUBLE ACUTE     2B55 CD55 C5B0 0170
---------------------------------------------------------------------------
@a LA27  LATIN SMALL   LETTER A WITH RING ABOVE       4061 CA61 C3A5 00E5
@A LA28  LATIN CAPITAL LETTER A WITH RING ABOVE       4041 CA41 C385 00C5
@u LU27  LATIN SMALL   LETTER U WITH RING ABOVE       4075 CA75 C5AF 016F
@U LU28  LATIN CAPITAL LETTER U WITH RING ABOVE       4055 CA55 C5AE 016E
---------------------------------------------------------------------------
@c LC29  LATIN SMALL   LETTER C WITH DOT ABOVE        4063 C763 C48B 010B
@C LC30  LATIN CAPITAL LETTER C WITH DOT ABOVE        4043 C743 C48A 010A
@e LE29  LATIN SMALL   LETTER E WITH DOT ABOVE        4065 C765 C497 0117
@E LE30  LATIN CAPITAL LETTER E WITH DOT ABOVE        4045 C745 C496 0116
@g LG29  LATIN SMALL   LETTER G WITH DOT ABOVE        4067 C767 C4A1 0121
@G LG30  LATIN CAPITAL LETTER G WITH DOT ABOVE        4047 C747 C4A0 0120
@I LI30  LATIN CAPITAL LETTER I WITH DOT ABOVE        4049 C749 C4B0 0130
@i LI61  LATIN SMALL   LETTER DOTLESS I               4069   F5 C4B1 0131
@z LZ29  LATIN SMALL   LETTER Z WITH DOT ABOVE        407A C77A C5BC 017C
@Z LZ30  LATIN CAPITAL LETTER Z WITH DOT ABOVE        405A C75A C5BB 017B
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA                           code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
=a LA31  LATIN SMALL   LETTER A WITH MACRON           3D61 C561 C481 0101
=A LA32  LATIN CAPITAL LETTER A WITH MACRON           3D41 C541 C480 0100
=e LE31  LATIN SMALL   LETTER E WITH MACRON           3D65 C565 C493 0113
=E LE32  LATIN CAPITAL LETTER E WITH MACRON           3D45 C545 C492 0112
=i LI31  LATIN SMALL   LETTER I WITH MACRON           3D69 C569 C4AB 012B
=I LI32  LATIN CAPITAL LETTER I WITH MACRON           3D49 C549 C4AA 012A
=o LO31  LATIN SMALL   LETTER O WITH MACRON           3D6F C56F C58D 014D
=O LO32  LATIN CAPITAL LETTER O WITH MACRON           3D4F C54F C58C 014C
=u LU31  LATIN SMALL   LETTER U WITH MACRON           3D75 C575 C5AB 016B
=U LU32  LATIN CAPITAL LETTER U WITH MACRON           3D55 C555 C5AA 016A
---------------------------------------------------------------------------
=d LD61  LATIN SMALL   LETTER D WITH STROKE           3DF2   F2 C491 0111
=D LD62  LATIN CAPITAL LETTER D WITH STROKE           3DE2   E2 C490 0110
=h LH61  LATIN SMALL   LETTER H WITH STROKE           3DF4   F4 C4A7 0127
=H LH62  LATIN CAPITAL LETTER H WITH STROKE           3DE4   E4 C4A6 0126
=l LL61  LATIN SMALL   LETTER L WITH STROKE           3DF8   F8 C582 0142
=L LL62  LATIN CAPITAL LETTER L WITH STROKE           3DE8   E8 C581 0141
$o LO61  LATIN SMALL   LETTER O WITH STROKE           24F9   F9 C3B8 00F8
$O LO62  LATIN CAPITAL LETTER O WITH STROKE           24E9   E9 C398 00D8
=t LT61  LATIN SMALL   LETTER T WITH STROKE           3DFD   FD C5A7 0167
=T LT62  LATIN CAPITAL LETTER T WITH STROKE           3DED   ED C5A6 0166
---------------------------------------------------------------------------
$c LC41  LATIN SMALL   LETTER C WITH CEDILLA          2463 CB63 C3A7 00E7
$C LC42  LATIN CAPITAL LETTER C WITH CEDILLA          2443 CB43 C387 00C7
$g LG41  LATIN SMALL   LETTER G WITH CEDILLA  (note5) 2467 C267 C4A3 0123
$G LG42  LATIN CAPITAL LETTER G WITH CEDILLA          2447 CB47 C4A2 0122
$k LK41  LATIN SMALL   LETTER K WITH CEDILLA          246B CB6B C4B7 0137
$K LK42  LATIN CAPITAL LETTER K WITH CEDILLA          244B CB4B C4B6 0136
$l LL41  LATIN SMALL   LETTER L WITH CEDILLA          246C CB6C C4BC 013C
$L LL42  LATIN CAPITAL LETTER L WITH CEDILLA          244C CB4C C4BB 013B
$n LN41  LATIN SMALL   LETTER N WITH CEDILLA          246E CB6E C586 0146
$N LN42  LATIN CAPITAL LETTER N WITH CEDILLA          244E CB4E C585 0145
$r LR41  LATIN SMALL   LETTER R WITH CEDILLA          2472 CB72 C597 0157
$R LR42  LATIN CAPITAL LETTER R WITH CEDILLA          2452 CB52 C596 0156
$s LS41  LATIN SMALL   LETTER S WITH CEDILLA          2473 CB73 C59F 015F
$S LS42  LATIN CAPITAL LETTER S WITH CEDILLA          2453 CB53 C59E 015E
$t LT41  LATIN SMALL   LETTER T WITH CEDILLA          2474 CB74 C5A3 0163
$T LT42  LATIN CAPITAL LETTER T WITH CEDILLA          2454 CB54 C5A2 0162
---------------------------------------------------------------------------
$a LA43  LATIN SMALL   LETTER A WITH OGONEK           2461 CE61 C485 0105
$A LA44  LATIN CAPITAL LETTER A WITH OGONEK           2441 CE41 C484 0104
$e LE43  LATIN SMALL   LETTER E WITH OGONEK           2465 CE65 C499 0119
$E LE44  LATIN CAPITAL LETTER E WITH OGONEK           2445 CE45 C498 0118
$i LI43  LATIN SMALL   LETTER I WITH OGONEK           2469 CE69 C4AF 012F
$I LI44  LATIN CAPITAL LETTER I WITH OGONEK           2449 CE49 C4AE 012E
$u LU43  LATIN SMALL   LETTER U WITH OGONEK           2475 CE75 C5B3 0173
$U LU44  LATIN CAPITAL LETTER U WITH OGONEK           2455 CE55 C5B2 0172
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA                           code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
&a LA51  LATIN SMALL   LETTER AE                      26F1   F1 C3A6 00E6
&A LA52  LATIN CAPITAL LETTER AE                      26E1   E1 C386 00C6
&o LO51  LATIN SMALL   LIGATURE O E                   26FA   FA C593 0153
&O LO52  LATIN CAPITAL LIGATURE O E                   26EA   EA C592 0152
&s LS61  LATIN SMALL   LETTER SHARP S (German)        26FB   FB C39F 00DF
&n LN61  LATIN SMALL   LETTER ENG (Sami)              26FE   FE C58B 014B
&N LN62  LATIN CAPITAL LETTER ENG (Sami)              26EE   EE C58A 014A
&d LD63  LATIN SMALL   LETTER ETH (Icelandic)         26F3   F3 C3B0 00F0
&D LD64  LATIN CAPITAL LETTER ETH (Icelandic) (note4) 26E3   .. C390 00D0
&t LT63  LATIN SMALL   LETTER THORN (Icelandic)       26FC   FC C3BE 00FE
&T LT64  LATIN CAPITAL LETTER THORN (Icelandic)       26EC   EC C39E 00DE
---------------------------------------------------------------------------
   LETTERS PERMITTED IN GBA, BUT OBSOLETE     (note2) code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
~i LI19 #LATIN SMALL   LETTER I WITH TILDE            7E69 C469 C4A9 0129
~I LI20 #LATIN CAPITAL LETTER I WITH TILDE            7E49 C449 C4A8 0128
~u LU19 #LATIN SMALL   LETTER U WITH TILDE            7E75 C475 C5A9 0169
~U LU20 #LATIN CAPITAL LETTER U WITH TILDE            7E55 C455 C5A8 0168
&k LK61 #LATIN SMALL   LETTER KRA (Greenlandic)       266B   F0 C4B8 0138
&l LL63 #LATIN SMALL   LETTER L WITH MIDDLE DOT       266C   F7 C580 0140
&L LL64 #LATIN CAPITAL LETTER L WITH MIDDLE DOT       264C   E7 C4BF 013F
=n LN63 #LATIN SMALL   LETTER N PRECEDED BY APOSTROPHE 3D6E   EF C589 0149
---------------------------------------------------------------------------
   LETTERS NOT PERMITTED IN GBA, BUT IN 6937  (note1) code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
&i LI51  LATIN SMALL   LIGATURE I J                   26F6   F6 C4B3 0133
&I LI52  LATIN CAPITAL LIGATURE I J                   26E6   E6 C4B2 0132
---------------------------------------------------------------------------
   PERMITTED DIGITS                                   code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
 1 ND01  DIGIT ONE                                      31   31   31 0031
 2 ND02  DIGIT TWO                                      32   32   32 0032
 3 ND03  DIGIT THREE                                    33   33   33 0033
 4 ND04  DIGIT FOUR                                     34   34   34 0034
 5 ND05  DIGIT FIVE                                     35   35   35 0035
 6 ND06  DIGIT SIX                                      36   36   36 0036
 7 ND07  DIGIT SEVEN                                    37   37   37 0037
 8 ND08  DIGIT EIGHT                                    38   38   38 0038
 9 ND09  DIGIT NINE                                     39   39   39 0039
 0 ND10  DIGIT ZERO                                     30   30   30 0030
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   SPECIAL CHARACTERS PERMITTED IN GBA                code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@2 NS02  SUPERSCRIPT TWO                              4032   B2 C2B2 00B2
@3 NS03  SUPERSCRIPT THREE                            4033   B3 C2B3 00B3
---------------------------------------------------------------------------
_2 NF01  VULGAR FRACTION ONE HALF                     5F32   BD C2BD 00BD
_4 NF04  VULGAR FRACTION ONE QUARTER                  5F34   BC C2BC 00BC
_3 NF05  VULGAR FRACTION THREE QUARTERS               5F33   BE C2BE 00BE
---------------------------------------------------------------------------
++ SA01  PLUS SIGN                                    2B2B   2B   2B 002B
 < SA03  LESS-THAN SIGN                                 3C   3C   3C 003C
== SA04  EQUALS SIGN                                  3D3D   3D   3D 003D
>> SA05  GREATER-THAN SIGN                            3E3E   3E   3E 003E
_+ SA02  PLUS-MINUS SIGN                              5F2B   B1 C2B1 00B1
_: SA06  DIVISION SIGN                                5F3A   B8 C3B7 00F7
_* SA07  MULTIPLICATION SIGN                          5F2A   B4 C397 00D7
---------------------------------------------------------------------------
_f SC01  CURRENCY SIGN                                5F66   A8 C2A4 00A4
_L SC02  POUND SIGN                                   5F4C   A3 C2A3 00A3
$$ SC03  DOLLAR SIGN                          (note3) 2424   24   24 0024
_c SC04  CENT SIGN                                    5F63   A2 C2A2 00A2
_Y SC05  YEN SIGN                                     5F59   A5 C2A5 00A5
---------------------------------------------------------------------------
## SM01  NUMBER SIGN                          (note3) 2323   23   23 0023
%% SM02  PERCENT SIGN                                 2525   25   25 0025
&& SM03  AMPERSAND                                    2626   26   26 0026
** SM04  ASTERISK                                     2A2A   2A   2A 002A
@@ SM05  COMMERCIAL AT                                4040   40   40 0040
*( SM06  LEFT SQUARE BRACKET                          2A28   5B   5B 005B
*) SM08  RIGHT SQUARE BRACKET                         2A29   5D   5D 005D
 | SM13  VERTICAL LINE                                  7C   7C   7C 007C
_m SM17  MICRO SIGN                                   5F6D   B5 C2B5 00B5
_O SM18  OHM SIGN                             (note6) 5F4F   E0 .... 2126
@0 SM19  DEGREE SIGN                                  4030   B0 C2B0 00B0
_o SM20  MASCULINE ORDINAL INDICATOR                  5F6F   EB C2BA 00BA
_a SM21  FEMININE ORDINAL INDICATOR                   5F61   E3 C2AA 00AA
#S SM24  SECTION SIGN                                 2353   A7 C2A7 00A7
#P SM25  PILCROW SIGN                                 2350   B6 C2B6 00B6
#. SM26  MIDDLE DOT                                   232E   B7 C2B7 00B7
---------------------------------------------------------------------------
SP SP01  SPACE                                          20   20   20 0020
 ! SP02  EXCLAMATION MARK                               21   21   21 0021
*! SP03  INVERTED EXCLAMATION MARK                    2A21   A1 C2A1 00A1
 " SP04  QUOTATION MARK                                 22   22   22 0022
 ' SP05  APOSTROPHE                                     27   27   27 0027
 ( SP06  LEFT PARENTHESIS                               28   28   28 0028
 ) SP07  RIGHT PARENTHESIS                              29   29   29 0029
 , SP08  COMMA                                          2C   2C   2C 002C
__ SP09  LOW LINE                                     5F5F   5F   5F 005F
 - SP10  HYPHEN-MINUS                                   2D   2D   2D 002D
 . SP11  FULL STOP                                      2E   2E   2E 002E
// SP12  SOLIDUS                                      2F2F   2F   2F 002F
 : SP13  COLON                                          3A   3A   3A 003A
 ; SP14  SEMICOLON                                      3B   3B   3B 003B
 ? SP15  QUESTION MARK                                  3F   3F   3F 003F
*? SP16  INVERTED QUESTION MARK                       2A3F   BF C2BF 00BF
*< SP17  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK    2A3C   AB C2AB 00AB
*> SP18  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK   2A3E   BB C2BB 00BB
---------------------------------------------------------------------------
 
---------------------------------------------------------------------------
   SPECIAL CHARACTERS NOT PERMITTED IN GBA, BUT ASCII code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@\ SD13  GRAVE ACCENT                                 405C   60   60 0060
@> SD15  CIRCUMFLEX ACCENT                            403E   5E   5E 005E
~~ SD19  TILDE                                        7E7E   7E   7E 007E
---------------------------------------------------------------------------
\\ SM07  REVERSE SOLIDUS                              5C5C   5C   5C 005C
 { SM11  LEFT CURLY BRACKET                             7B   7B   7B 007B
 } SM14  RIGHT CURLY BRACKET                            7D   7D   7D 007D
---------------------------------------------------------------------------
   SPECIAL CHARACTERS NOT PERMITTED IN GBA, BUT 6937  code in    10646-1
TRFO SID NAME                                         TRFO 6937 UTF-8 UCS-2
---------------------------------------------------------------------------
@1 NS01  SUPERSCRIPT ONE                              4031   D1 C2B9 00B9
---------------------------------------------------------------------------
=1 NF18  VULGAR FRACTION ONE EIGHTH                   3D31   DC .... 215B
=3 NF19  VULGAR FRACTION THREE EIGHTHS                3D33   DD .... 215C
=5 NF20  VULGAR FRACTION FIVE EIGHTHS                 3D35   DE .... 215D
=7 NF21  VULGAR FRACTION SEVEN EIGHTHS                3D37   DF .... 215E
---------------------------------------------------------------------------
@/ SD11  ACUTE ACCENT                                 402F C220 C2B4 00B4
@% SD17  DIAERESIS                                    4025 C820 C2A8 00A8
@* SD21  CARON                                        402A CF20 CB87 02C7
@# SD23  BREVE                                        4023 C620 CB98 02D8
@" SD25  DOUBLE ACUTE ACCENT                          402B CD20 CB9D 02DD
@0 SD27  RING ABOVE                                   4030 CA20 CB9A 02DA
@. SD29  DOT ABOVE                                    402E C720 CB99 02D9
@= SD31  MACRON                                       403D C520 C2AF 00AF
_) SD41  CEDILLA                                      5F29 CB20 C2B8 00B8
_( SD43  OGONEK                                       5F28 CE20 CB9B 02DB
---------------------------------------------------------------------------
_- SM12  HORIZONTAL BAR                               5F2D   D0 .... 2015
_< SM30  LEFTWARDS ARROW                              5F3C   AC .... 2190
_> SM31  RIGHTWARDS ARROW                             5F3E   AE .... 2192
_A SM32  UPWARDS ARROW                                5F41   AD .... 2191
_V SM33  DOWNWARDS ARROW                              5F56   AF .... 2193
#c SM52  COPYRIGHT SIGN                               2363   D3 C2A9 00A9
#r SM53  REGISTERED SIGN                              2372   D2 C2AE 00AE
#t SM54  TRADE MARK SIGN                              2374   D4 .... 2122
*| SM65  BROKEN BAR                                   237C   D7 C2A6 00A6
 ^ SM66  NOT SIGN                                       5E   D6 C2AC 00AC
_J SM93  MUSIC NOTE       (EIGHTH  NOTE IN 10646)     5F4A   D5 .... 266A
---------------------------------------------------------------------------
@( SP19  LEFT SINGLE QUOTATION MARK                   4028   A9 .... 2018
@) SP20  RIGHT SINGLE QUOTATION MARK                  4029   B9 .... 2019
@{ SP21  LEFT DOUBLE QUOTATION MARK                   405B   AA .... 201C
@} SP22  RIGHT DOUBLE QUOTATION MARK                  405D   BA .... 201D
---------------------------------------------------------------------------
   SP31  NO-BREAK SPACE                                 ..   A0 C2A0 00A0
   SP32  SOFT HYPHEN                                    ..   FF C2AD 00AD
---------------------------------------------------------------------------
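
The two computed columns of the tables above can be checked mechanically. The following is a minimal sketch in Python, not part of NEN 1888; the function names utf8_hex and trfo_hex are chosen here for illustration only. It assumes that the TRFO code column holds the ASCII codes of a two-character TRFO representation, and that the UTF-8 column is the UTF-8 encoding of the code position in the UCS-2 column. Code positions from 0800 upwards need three bytes in UTF-8, which is presumably why the tables show "...." for them.

```python
def utf8_hex(ucs2_hex: str) -> str:
    """UTF-8 encoding, as upper-case hex, of a code position given in hex."""
    return chr(int(ucs2_hex, 16)).encode("utf-8").hex().upper()


def trfo_hex(trfo: str) -> str:
    """ASCII codes, as upper-case hex, of a two-character TRFO representation."""
    return trfo.encode("ascii").hex().upper()


if __name__ == "__main__":
    # (TRFO, UCS-2) pairs copied from rows of the tables above.
    rows = [("&k", "0138"), ("@2", "00B2"), ("_:", "00F7"), ("_O", "2126")]
    for trfo, ucs2 in rows:
        print(trfo, trfo_hex(trfo), utf8_hex(ucs2), ucs2)
```

For the row of LATIN SMALL LETTER KRA, for instance, this reproduces the values 266B and C4B8 given in the table.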
 
     ANNEX D (informative)                                 VERSION 1.5
                                                            1998-10-12

REPRESENTATION OF THE CHARACTERS OF THE GBA SET WITH THE CHARACTERS OF THE ASCII SET WITH LOSS OF INFORMATION

D.1 Introduction

If an application has at its disposal only the 52 letters, the 10 digits and the 33 special characters of ASCII (95 in total), submitted texts can be made manageable by replacing every unavailable character by one or more characters taken from ASCII. Different GBA characters are then mapped onto the same ASCII character, which causes loss of information. This method is known internationally as "fall back".

The rules for this transformation are described below.

D.2 Rules

1. From letters carrying a diacritic mark (those that are coded in ISO/IEC 6937 with two bytes) the diacritic mark is removed (this is the first byte of the code).

2. The following letters of the so-called supplementary set in ISO/IEC 6937 (those coded with 1 byte) are modified as follows:

 
TRFO SGML  SID   Name                                 becomes from   to
@i &inodot LI61  LATIN SMALL   LETTER DOTLESS I             i  F5    69
=d &dstrok LD61  LATIN SMALL   LETTER D WITH STROKE         d  F2    64
=D &Dstrok LD62  LATIN CAPITAL LETTER D WITH STROKE         D  E2    44
=h &hstrok LH61  LATIN SMALL   LETTER H WITH STROKE         h  F4    68
=H &Hstrok LH62  LATIN CAPITAL LETTER H WITH STROKE         H  E4    48
=l &lstrok LL61  LATIN SMALL   LETTER L WITH STROKE         l  F8    6C
=L &Lstrok LL62  LATIN CAPITAL LETTER L WITH STROKE         L  E8    4C
$o &ostrok LO61  LATIN SMALL   LETTER O WITH STROKE         o  F9    6F
$O &Ostrok LO62  LATIN CAPITAL LETTER O WITH STROKE         O  E9    4F
=t &tstrok LT61  LATIN SMALL   LETTER T WITH STROKE         t  FD    74
=T &Tstrok LT62  LATIN CAPITAL LETTER T WITH STROKE         T  ED    54
&d &eth    LD63  LATIN SMALL   LETTER ETH (Icelandic)       d  F3    64
&k &kgreen LK61  LATIN SMALL   LETTER KRA (Greenlandic)     q  F0    71

3. The following letters will be transformed into two characters:

 
&a &aelig  LA51  LATIN SMALL   LETTER AE                   ae  F1  6165
&A &AElig  LA52  LATIN CAPITAL LETTER AE                   AE  E1  4145
&l &lmidot LL63  LATIN SMALL   LETTER L WITH MIDDLE DOT    l.  F7  6C2E
&L &Lmidot LL64  LATIN CAPITAL LETTER L WITH MIDDLE DOT    L.  E7  4C2E
&o &oelig  LO51  LATIN SMALL   LIGATURE O E                oe  FA  6F65
&O &OElig  LO52  LATIN CAPITAL LIGATURE O E                OE  EA  4F45
&s &szlig  LS61  LATIN SMALL   LETTER SHARP S (German)     ss  FB  7373
&t &thorn  LT63  LATIN SMALL   LETTER THORN (Icelandic)    th  FC  7468
&T &THORN  LT64  LATIN CAPITAL LETTER THORN (Icelandic)    TH  EC  5448
&n &eng    LN61  LATIN SMALL   LETTER ENG (Sami)           ng  FE  6E67
&N &ENG    LN62  LATIN CAPITAL LETTER ENG (Sami)           NG  EE  4E47
=n &napos  LN63  LATIN SMALL   LETTER N PRECEDED BY        'n  EF  276E
                                               APOSTROPHE

4. The following special characters will be transformed to 1 character:

 
@2 &sup2   NS02  SUPERSCRIPT TWO                            2  B2    32
@3 &sup3   NS03  SUPERSCRIPT THREE                          3  B3    33
_: &divide SA06  DIVISION SIGN                              :  B8    3A
_* &times  SA07  MULTIPLICATION SIGN                        x  B4    78
_f &curren SC01  CURRENCY SIGN                              *  A8    2A
_L &pound  SC02  POUND SIGN                                 L  A3    4C
$$ &dollar SC03 =DOLLAR SIGN                                ?  24    3F
_c &cent   SC04  CENT SIGN                                  c  A2    63
_Y &yen    SC05  YEN SIGN                                   Y  A5    59
## &num    SM01 =NUMBER SIGN                                ?  23    3F
@@ &commat SM05 =COMMERCIAL AT                              ?  40    3F
*( &lsqb   SM06 =LEFT SQUARE BRACKET                        (  5B    28
*) &rsqb   SM08 =RIGHT SQUARE BRACKET                       )  5D    29
 | &verbar SM13 =VERTICAL LINE                              ?  7C    3F
_m &micro  SM17  MICRO SIGN                                 ?  B5    3F
@0 &deg    SM19  DEGREE SIGN                                o  B0    6F
_o &ordm   SM20  MASCULINE ORDINAL INDICATOR                o  EB    6F
_a &ordf   SM21  FEMININE ORDINAL INDICATOR                 a  E3    61
#S &sect   SM24  SECTION SIGN                               ?  A7    3F
#P &para   SM25  PILCROW SIGN                               ?  B6    3F
#. &middot SM26  MIDDLE DOT                                 .  B7    2E
*! &iexcl  SP03  INVERTED EXCLAMATION MARK                  !  A1    21
__ &lowbar SP09 =LOW LINE                                   -  5F    2D
*? &iquest SP16  INVERTED QUESTION MARK                     ?  BF    3F

5. The following special characters will be transformed to 2 characters:

 
*< &laquo  SP17  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK  << AB  3C3C
*> &raquo  SP18  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK >> BB  3E3E

6. The following special characters will be transformed to 3 characters:

 
_2 &frac12 NF01  VULGAR FRACTION ONE HALF                 1/2  BD  312F32
_4 &frac14 NF04  VULGAR FRACTION ONE QUARTER              1/4  BC  312F34
_3 &frac34 NF05  VULGAR FRACTION THREE QUARTERS           3/4  BE  332F34
_+ &plusmn SA02  PLUS-MINUS SIGN                          +/-  B1  2B2F2D
_O &ohm    SM18  OHM SIGN                                 Ohm  E0  4F686D

7. All other characters remain unchanged.
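
The rules above can be illustrated with a minimal sketch in Python. It is not part of NEN 1888 and rests on several assumptions: the text is held as Unicode characters rather than as ISO/IEC 6937 codes, so rule 1 is approximated by canonical decomposition (NFD) and removal of the combining mark; only part of the lists of rules 2 to 6 is included; and a character for which no rule is found is replaced by "?", a choice made for this sketch only. The names REPLACEMENTS and fall_back are hypothetical.

```python
import unicodedata

# Rules 2 to 6 of D.2, keyed by Unicode character instead of ISO/IEC 6937
# code (an assumption of this sketch). Only part of the lists is included.
REPLACEMENTS = {
    "\u0131": "i",                   # dotless i
    "\u0111": "d", "\u0110": "D",    # d with stroke
    "\u0142": "l", "\u0141": "L",    # l with stroke
    "\u00F8": "o", "\u00D8": "O",    # o with stroke
    "\u00F0": "d",                   # eth
    "\u0138": "q",                   # kra
    "\u00E6": "ae", "\u00C6": "AE",  # ae
    "\u0153": "oe", "\u0152": "OE",  # ligature oe
    "\u00DF": "ss",                  # sharp s
    "\u00FE": "th", "\u00DE": "TH",  # thorn
    "\u00B2": "2", "\u00B3": "3",    # superscript two and three
    "\u00F7": ":", "\u00D7": "x",    # division and multiplication sign
    "\u00AB": "<<", "\u00BB": ">>",  # double angle quotation marks
    "\u00BD": "1/2", "\u00BC": "1/4", "\u00BE": "3/4",
    "\u00B1": "+/-",                 # plus-minus sign
}


def fall_back(text: str) -> str:
    """Map a text to ASCII with loss of information (sketch of D.2)."""
    result = []
    for char in text:
        if char in REPLACEMENTS:
            result.append(REPLACEMENTS[char])      # rules 2 to 6
        elif ord(char) < 128:
            result.append(char)                    # rule 7: unchanged
        else:
            # Rule 1: remove the diacritic mark, keep the basic letter.
            decomposed = unicodedata.normalize("NFD", char)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            if base and all(ord(c) < 128 for c in base):
                result.append(base)
            else:
                result.append("?")  # no rule found: choice of this sketch
    return "".join(result)
```

ASCII characters are left unchanged here, in line with the remark in D.3 note 1 on remaining in an ASCII environment; a transformation down to EDIFACT Level B would also have to apply the rows marked with = in rule 4.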

D.3 Notes to the rules:

1. The lists of characters for transformation given above contain only characters from the GBA set; for other characters no rules are presented here. The characters to which a mapping is specified are those included in the EDIFACT Level B repertoire. This repertoire does not contain the following ASCII characters: # $ @ \ [ ] ^ ` { | } ~ which would therefore be transformed as well, in as far as they occur in the GBA set. If one remains in an ASCII environment, however, it makes little sense to transform these characters to others (they are marked with = before the name of the character under D.2 sub 4).

2. The following characters have a code in ISO/IEC 6937:1994 that differs from their code in CCITT T.61, which has since been withdrawn. With respect to coding, the GBA set is still based on the old T.61. There are no plans to change these codes in the near future.

 
                                                       GBA ASCII
$$ &dollar SC03  DOLLAR SIGN                            A4  24
## &num    SM01  NUMBER SIGN                            A6  23

D.4 Example

As an example a French text is given here, together with its version after transformation with loss.

ORIGINAL

Composée en 1936, exécutée pour la première fois au cours du Festival de Venise de 1937, la musique de la Suite Provençale s'affirme, au cœur de l' œuvre immense de Darius Milhaud, comme l'une de ses réussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimée, avec ses hôtels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voûte des platanes, tandis qu'au-delà des champs de vignes et sombres haies de cyprès, la courbe nette de la Sainte Victoire s'érige dans le bleu d'un ciel nimbé de grise vapeur que Cézanne a si bien fixé avec un amour égal à son souci de la plus stricte vérité.

AFTER TRANSFORMATION

Composee en 1936, executee pour la premiere fois au cours du Festival de Venise de 1937, la musique de la Suite Provencale s'affirme, au coeur de l' oeuvre immense de Darius Milhaud, comme l'une de ses reussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimee, avec ses hotels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voute des platanes, tandis qu'au-dela des champs de vignes et sombres haies de cypres, la courbe nette de la Sainte Victoire s'erige dans le bleu d'un ciel nimbe de grise vapeur que Cezanne a si bien fixe avec un amour egal a son souci de la plus stricte verite.

 
     ANNEX E (informative)                                 VERSION 1.3
                                                            1998-10-12

REPRESENTATION OF THE CHARACTERS OF THE GBA SET WITH THE CHARACTERS OF THE ASCII SET WITHOUT LOSS OF INFORMATION

E.1 Introduction

If an application has at its disposal only the 52 letters, the 10 digits and the 33 special characters of ASCII (95 in total), submitted texts can be input without loss of information only if some characters are transformed to more than one character taken from ASCII.

It is important to follow a uniform transformation scheme that can be performed by a simple program, that allows the original to be reconstructed, and that keeps the text as readable as possible after the transformation.

Good experience has been gained with the following conventions. All GBA characters not in ASCII are turned into two ASCII characters, a special character followed by a letter, so that no information is lost. A text transformed in this way can be typed in directly on an ASCII keyboard, or converted by a program from a text provided by the GBA system that contains letters with diacritics. The text remains directly readable if one knows the conventions for the diacritics, contrary to methods like uuencode, base64 or MIME, which require deciphering first.

E.2 Notation

For denoting accented letters and other letters not in the basic alphabet of 26 Latin letters the following rules apply. Any such letter is written as one basic letter preceded by one special character, chosen as follows:

 
    /  ACUTE
    \  GRAVE
    >  CIRCUMFLEX
    %  DIAERESIS
    ~  TILDE
    *  CARON
    #  BREVE
    +  DOUBLE ACUTE
    @  RING ABOVE or DOT ABOVE and DOTLESS I
    =  MACRON or STROKE (but this not on O)
    $  CEDILLA or OGONEK and O WITH STROKE (Danish etc.)
    &  LIGATURE or special form (AE OE ETH THORN ENG SHARP S)
    _  LOW LINE

In Annex C one may find for every non-ASCII character the transformed representation (TRFO).

If any of these special characters occurs in a text as itself, and is thus not meant as part of a transformed representation, it must be replaced by TWO of that same character. Thus / is replaced by //, and % by %%.
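
The notation can be illustrated with a minimal sketch in Python. It is not part of NEN 1888 and covers only a subset: the diacritics listed in MARK_PREFIX, a few special forms taken from the TRFO column, and the doubling of special characters. It assumes that the text is held as Unicode characters and uses canonical decomposition (NFD) to separate letter and diacritic. The names MARK_PREFIX, SPECIAL_FORMS, SPECIALS, encode and decode are hypothetical, and the representations that Annex C gives for other special characters are left out.

```python
import unicodedata

# Prefix special for each combining mark, following the list in E.2 (subset).
MARK_PREFIX = {
    "\u0301": "/",   # acute
    "\u0300": "\\",  # grave
    "\u0302": ">",   # circumflex
    "\u0308": "%",   # diaeresis
    "\u0303": "~",   # tilde
    "\u030C": "*",   # caron
    "\u0327": "$",   # cedilla
}
# Letters without canonical decomposition, with their TRFO form (subset).
SPECIAL_FORMS = {
    "\u00E6": "&a", "\u00C6": "&A", "\u0153": "&o", "\u0152": "&O",
    "\u00DF": "&s", "\u00F8": "$o", "\u00D8": "$O",
}
SPECIALS = set("/\\>%~*#+@=$&_")
PREFIX_MARK = {prefix: mark for mark, prefix in MARK_PREFIX.items()}
REVERSE_FORMS = {trfo: char for char, trfo in SPECIAL_FORMS.items()}


def encode(text: str) -> str:
    """Transform a text to ASCII without loss of information."""
    result = []
    for char in text:
        decomposed = unicodedata.normalize("NFD", char)
        if char in SPECIAL_FORMS:
            result.append(SPECIAL_FORMS[char])
        elif char in SPECIALS:
            result.append(char * 2)            # special as itself: doubled
        elif len(decomposed) == 2 and decomposed[1] in MARK_PREFIX:
            result.append(MARK_PREFIX[decomposed[1]] + decomposed[0])
        elif ord(char) < 128:
            result.append(char)
        else:
            raise ValueError("not covered by this sketch: %r" % char)
    return "".join(result)


def decode(text: str) -> str:
    """Reconstruct the original from a text produced by encode()."""
    result = []
    i = 0
    while i < len(text):
        char = text[i]
        if char in SPECIALS and i + 1 < len(text):
            pair = text[i:i + 2]
            if pair[1] == char:
                result.append(char)            # doubled special
            elif pair in REVERSE_FORMS:
                result.append(REVERSE_FORMS[pair])
            elif char in PREFIX_MARK:
                composed = pair[1] + PREFIX_MARK[char]
                result.append(unicodedata.normalize("NFC", composed))
            else:
                raise ValueError("not covered by this sketch: %r" % pair)
            i += 2
        else:
            result.append(char)
            i += 1
    return "".join(result)
```

Applied to the first words of the example in E.3, encode("Composée en 1936") yields "Compos/ee en 1936", and decode returns the original again.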

E.3 Example

As an example a text in French is presented, with its transformed version.

ORIGINAL

Composée en 1936, exécutée pour la première fois au cours du Festival de Venise de 1937, la musique de la Suite Provençale s'affirme, au cœur de l' œuvre immense de Darius Milhaud, comme l'une de ses réussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aimée, avec ses hôtels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche voûte des platanes, tandis qu'au-delà des champs de vignes et sombres haies de cyprès, la courbe nette de la Sainte Victoire s'érige dans le bleu d'un ciel nimbé de grise vapeur que Cézanne a si bien fixé avec un amour égal à son souci de la plus stricte vérité.

AFTER TRANSFORMATION

Compos/ee en 1936, ex/ecut/ee pour la premi\ere fois au cours du Festival de Venise de 1937, la musique de la Suite Proven$cale s'affirme, au c&our de l' &ouvre immense de Darius Milhaud, comme l'une de ses r/eussites les plus accomplies. C'est Aix, sa ville natale et si tendrement aim/ee, avec ses h>otels historiques, son cours Mirabeau, ses fontaines dont l'eau ruisselle en jasant sous la fraiche vo>ute des platanes, tandis qu'au-del\a des champs de vignes et sombres haies de cypr\es, la courbe nette de la Sainte Victoire s'/erige dans le bleu d'un ciel nimb/e de grise vapeur que C/ezanne a si bien fix/e avec un amour /egal \a son souci de la plus stricte v/erit/e.

 
     ANNEX F (informative)                                 VERSION 1.3
                                                            1998-10-12

REPRESENTATION OF CHARACTERS FROM NON-LATIN SCRIPTS WITH CHARACTERS FROM THE GBA SET (TRANSLITERATION AND TRANSCRIPTION)

F.1 Introduction

When handling documents or names in non-Latin scripts, either transliteration or transcription has to be applied, because as a general rule the official in charge can neither be expected nor required to deal with scripts other than Latin.

Transliteration is the transformation in which each character of the other script is converted to one or more characters of the Latin script, in a way that may be called mechanical, in principle without human intervention. Back transliteration is defined analogously, but sometimes a certain letter combination cannot be reduced unambiguously to a single letter: "ue", for instance, need not have resulted from a letter with umlaut, as the German word "aktuell" shows.

Transcription is the transformation in which the given text is converted to Latin script while keeping as closely as possible to the original pronunciation. A given letter may thus result in different letters according to context, or the result may depend on external knowledge. Example: Ersjev (transliterated) becomes Jersjov (transcribed). Here the first "e" is changed to "je", but the second "e" to "o".

In applying these methods in practice, one has to distinguish documents from names. Documents are handled in the civil service either as originals (thus in the script of origin) or in translation; transliteration and transcription do not matter here. With names the case is different, because they have to be entered into administrative systems that handle Latin script only. For writing names to be included in the Population Register the Agreement of Berne (Trb. 1974 nr. 31) has to be followed, unless deviation is authorised by Circulaire from the Minister of Justice. This implies that names in Latin script remain unmodified from the reading (including diacritics) in the original as presented. Names in non-Latin script shall be kept untranslated, and rendered by a letter-to-letter conversion that is as strict as possible (transliteration). ISO standards, if existing, shall be applied in this transformation.

When these rules were evaluated after several years, it appeared that strict application of transliteration gives rise to serious problems. It has therefore been decided, and announced by circulaire by the Staatssecretaris of Justice, that for a number of scripts deviation from the rules is permitted.

F.2 Greek script

The Latin reading of the name in a Greek passport shall be adhered to. Because several Greek letters and combinations of letters, like "ei" and "oi", are pronounced as "i" and transcribed that way, back transcription is not possible without thorough knowledge of the Greek language. The original Greek spelling should therefore have been stored too, but in general there is no provision for this.

Furthermore, it is the custom in Greece to transcribe non-Greek words to Greek script according to certain rules. Should these foreign words or names be transliterated back to Latin script according to ISO 843, strange results emerge.

 
      Don Giovanni         Nton Tziobani
      Dirk Bogarde         Nterk Mpougkarnt
      Van den Broek        Ban nten Mprouk

Thus the Latin spelling, presented with expertise by Greek authorities, shall be adhered to. This implies transcription. There is no way to restore "ntokimanter" to "documentaire" by applying a fixed rule.

F.3 Cyrillic script

In former Yugoslavia ISO 9, which has a 1-to-1 mapping, has never been applied for transliteration. Usually some single Cyrillic letters are transformed to two Latin ones, like lj and nj. This usage shall be followed.

With Russian there are problems, because transliteration does not reflect pronunciation. According to ISO 9 one should write "Elcin", and not "Jeltsin" as one normally sees. Conversely, one finds "Potemkin" and not "Patjomkin", as the pronunciation would lead one to expect.

Persons of German origin whose name was written in Russia in Cyrillic characters are permitted, on their return to the West, to use the original spelling of their name again, if it can be proved how it was written in the past. Many Russian personal names are of foreign origin, and are often no longer recognizable as such (Kjoei-Cui, Katuar-Catoire, Metner-Medtner). A name like that of Grigori Shneerson can be reduced to Schneiersohn, but Schneyersohn is possible as well.

F.4 Hebrew and Arabic script

Should ISO 259 (Hebrew) or ISO 233 (Arabic) be used, letter / diacritic combinations may appear that are not included in the GBA set.

F.5 General

The conclusion is that if the nation of origin provides a name written in Latin script, this name shall be adopted. Should uncertainty remain, the judgement of an accredited translator shall be followed.