From keld@dkuug.dk Thu Jan 9 19:54:22 1992
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
        id AA05626; Thu, 9 Jan 92 19:54:22 +0100
Date: Thu, 9 Jan 92 19:54:22 +0100
From: Keld J|rn Simonsen
Message-Id: <9201091854.AA05626@dkuug.dk>
To: LALOVIC%torolab5.vnet.ibm.com@xopen.co.uk, i18n@dkuug.dk, wg14@dkuug.dk
Subject: Re: (XoJIG 420) (i18n.140) Re: (XoJIG 413) (i18n.135)
X-Charset: ASCII
X-Char-Esc: 29

> Re: (SC22WG14.165)
> Re: support for symbolic character names
>
> > > > . . . text deleted
> >
> >The mechanism is to support portability of international C programs.
> >Consider a text with my name in it, Keld J|rn Simonsen.
> >The | looks fine on my terminal, but probably not on yours.
> >It is a lowercase-o-with-stroke. If I want to write a portable
> >program, which tests for this letter, then it is very difficult.
> >It depends on the execution character set. In the one I am really
> >using now (IBM865) it is one value, in ISO8859-1 it is another,
> >and in Macintosh it is yet another. If I can use the localedef
> >notation, I would just refer to the symbolic character name,
> >and the execution locale will do the proper naming for me.
>
> There are two issues here: portability of source code, and
> data integrity. I agree with Keld that source code portability
> demands a standard way of de-referencing characters that
> are not available in a particular environment. However, to
> deal with the second issue a different approach is needed.
>
> Keld correctly observes that his scheme is in line with the
> current practice in C (e.g. \t, \n), but the scheme goes one
> step further. It assumes that symbolic character names will
> be resolved during the execution, as opposed to compilation.

All standard C compilers have a source character set and an execution
character set.
Of course the symbolic character names must be dealt with at
compilation time, and the compiler also deals with other aspects of
the execution character set when it translates character and string
literals. The compiler should be able to know which execution
character set it is compiling for. Maybe that is not clearly defined,
but then we should address this general problem.

> As far as I know C resolves symbolic character names during
> compilation, thus the object code can only work correctly
> if it is executed in the same environment in which the source
> was compiled.

As noted above, yes, for the same execution character set that was
known at compile time.

> Similarly POSIX locale must be compiled by localedef utility
> before it can be used, but the compilation produces an object
> which is fixed to a particular code set (the one corresponding
> to the charmap).

Yes, but there may be many such charmaps or locales, one for each
binding of the locale to a charmap.

> The current practice, therefore, is not in line with Keld's
> assumption that symbolic character names are resolved at
> execution time. Due to performance issues, the current practice
> is unlikely to change in the future, so other means of dealing
> with data integrity must be considered.

The proposal can be bound to the environment known at compile time,
for the strings which are also translated according to knowledge at
compile time. For the functions, I see no problem in binding them at
execution time. Future standards need future products to be written,
and there are certainly bigger items on the table than this one.

> One possibility is to base the processing environment on UCS
> (Universal Character Set) such as ISO 10646, and convert on
> its boundaries. For example, if a display device does not have
> all characters that appear in a text to be displayed, the text
> can be converted such that unavailable characters are replaced
> with Keld's symbolic names.
> The opposite conversion would apply
> to keyboard entry, i.e. from symbolic names to UCS code points.

Yes, that is another approach giving something similar in functionality.

> >
> >Keld
>
> | Milos Lalovic A3/979/895/TOR |

Keld
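
To make the binding question concrete, here is a minimal sketch in C.
It is not part of the proposal or of any standard: the table, the
function names and the spelling of the charset labels are all made up
for illustration. It shows the same letter, lowercase o with stroke,
having a different code in each of the three character sets mentioned
above, and a lookup that binds the name to a code at execution time.
A compiler doing the binding at compilation time would in effect pick
one row of such a table and emit that value as a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: the code of lowercase o with stroke in three
 * character sets. The labels are illustrative, not registered names. */
struct charmap_entry {
    const char *charset;   /* charset label */
    unsigned char code;    /* code of o-with-stroke in that charset */
};

static const struct charmap_entry o_stroke[] = {
    { "IBM865",    0x9B },
    { "ISO8859-1", 0xF8 },
    { "Macintosh", 0xBF },
};

/* Return the code of o-with-stroke in the named charset,
 * or -1 if the charset is not in the table. */
int o_stroke_code(const char *charset)
{
    size_t i;

    assert(charset != NULL);
    for (i = 0; i < sizeof o_stroke / sizeof o_stroke[0]; i++)
        if (strcmp(o_stroke[i].charset, charset) == 0)
            return o_stroke[i].code;
    return -1;
}

/* Test whether c is o-with-stroke in the named charset.
 * The comparison is bound to the charset at execution time. */
int is_o_stroke(int c, const char *charset)
{
    int code = o_stroke_code(charset);

    return code >= 0 && c == code;
}
```

With this table, is_o_stroke(0xF8, "ISO8859-1") is true while
is_o_stroke(0xF8, "IBM865") is false, which is exactly why a test
written against a fixed code value is not portable.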