From taylor@limbo Tue Jan 14 11:24:33 1992
Received: from uucp-gw-1.pa.dec.com by dkuug.dk via EUnet with SMTP (5.64+/8+bit/IDA-1.2.8)
	id AA08055; Tue, 14 Jan 92 11:24:33 +0100
Received: by uucp-gw-1.pa.dec.com; id AA18234; Tue, 14 Jan 92 01:49:33 -0800
Received: by limbo; Mon, 13 Jan 92 23:34:36 pst
Message-Id: <9201140734.AA01852@limbo.intuitive.com>
Subject: Re: Support for symbolic character names
To: wg14@dkuug.dk, i18n@dkuug.dk
Date: Mon, 13 Jan 92 23:34:33 PST
From: Dave Taylor
Reply-To: Dave Taylor
Organization: Intuitive Systems, Mountain View, California +1 (415) 966-1151
X-Mailer: Elm [version 2.02]
X-Charset: ASCII
X-Char-Esc: 29

Regarding the ideas about international character labelling, a few thoughts.

First off, Unicode and MULTInational standards of that ilk solve this problem rather directly. If "UNICODE 540" is always 'o/' (a character I cannot duplicate here, alas), then it's always the same for everyone.

For symbolic names, I suggest further that there be standard header files that define an English name for each character, so we might have something like:

	#define LOWER_O_SLASH	540

to define the character mnemonically. In my upcoming book "Global Software", I have extensive examples of this type of mnemonic approach for 8859-1 characters. It makes the code very clean, and also makes the collating / transliteration tables nice and obvious too.

Indeed, one wonders why we don't just have everything defined that way anyway, so that regular C could contain tests like:

	if (ch == COLON || ch == EXCLAMATION_MARK || ch == ASTERISK)

rather than the much less portable, and more cryptic, tests like:

	if (ch == ':' || ch == '!' || ch == '*')

The place where it's most problematical, btw, is when a programmer wants to compare a character to the single quote ASCII character, leading to an ugliness that we've all seen:

	if (ch == ''')

or

	if (ch == '\'')

both of which are pretty sad solutions to the problem, really.
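A minimal sketch of how the mnemonic tests above might look in practice. The macro values are ISO 8859-1 code points, chosen here as an assumption (the message itself only gives the illustrative value 540), and the names APOSTROPHE, is_marker and is_single_quote are hypothetical, not taken from any existing header:

```c
/*
 * Hypothetical contents of a character-mnemonic header.
 * Values are ISO 8859-1 code points (an assumption for this sketch).
 */
#define EXCLAMATION_MARK 0x21
#define APOSTROPHE       0x27
#define ASTERISK         0x2A
#define COLON            0x3A
#define LOWER_O_SLASH    0xF8

/*
 * Returns 1 if ch is one of the three punctuation marks tested in the
 * message, 0 otherwise. Taking unsigned char keeps values above 0x7F
 * (such as LOWER_O_SLASH) from turning negative on signed-char systems.
 */
static int is_marker(unsigned char ch)
{
    return ch == COLON || ch == EXCLAMATION_MARK || ch == ASTERISK;
}

/*
 * Returns 1 if ch is the single quote. The mnemonic avoids writing an
 * escaped character literal in the comparison.
 */
static int is_single_quote(unsigned char ch)
{
    return ch == APOSTROPHE;
}
```

Under these assumptions, is_marker(0x3A) returns 1 and is_single_quote(0x27) returns 1. Because the macros expand to integer constants at preprocessing time, they add no code to the compiled program.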
In any case, I support the approach of having mnemonics, and note strongly that the key to having anything of this ilk work is to have the *mnemonic* already available on all the systems targeted. Perhaps a publicly available, and X/Open (etc) proposed standard on as a system include file? Perhaps it could be automatically included when is included in a program, even? Remember, it's not going to add one iota of code to the application, and modern-day computers should be fast enough that even another few thousand preprocessor defines should have no noticeable effect on compiler performance.

Before I leave this note: it is true that I am suggesting a set of mnemonics defined in English. Indeed, it's just as ethnocentric as all the original computer design that -- significantly -- got us into this mess in the first place. Mea culpa.

But having the 'standard' mnemonics in English doesn't preclude localization teams from having their own application-specific mapping of English to local language for use within their code. Perhaps something like:

	#define ENYE	LOWER_N_TILDE

or similar (I know how to pronounce the Spanish name for the 'n' with a tilde, but don't know how to spell that word. My apologies!). Note that this would not only be just as portable as using the English (read "standard") mnemonics, but would offer the additional boon of being a localized definition that could, among other things, be globally replaced without any danger to the integrity of the code.

						-- Dave Taylor

Intuitive Systems                              SunWorld Magazine
Mountain View, CA                              San Francisco, CA
taylor@intuitive.com     taylor@netcom.com     taylor@sunworld.com

ps: can someone fiddle with the mail headers so we get a standard Reply-To: that points to the entire list? When I composed this message I must have spent five minutes trying to puzzle out the headers and addresses, and am still not sure it's right...