SC22/WG20 N624
From: Miles Ellis [Miles.Ellis@etrc.ox.ac.uk]
Sent: Monday, November 16, 1998 1:22 PM
To: Winkler, Arnold F (WG20)
Subject: Re: I18N in Fortran
I am sorry for the delay in getting back to you.
At our meeting in Trollhättan in June, WG5 spent a substantial amount of time discussing the exact nature of the i18n support that we should put in Fortran. Interestingly, the English-speaking members were much more radical in their suggestions than those who would actually be affected by any i18n features!
We finally adopted a paper which had been developed in subgroup as our position, and forwarded this to J3, who are the primary (technical) development body for Fortran 2000. It reads as follows:
The subgroup was of the unanimous opinion that the aim should be that a standard-conforming program should continue to compile when moved from one cultural environment to another, but that it should be possible to write it so that it can handle data of different cultures. This should be explained in the Introduction to the Standard.
To achieve the first aim, the source form should be as for Fortran 95 in these respects:
(a) characters of non-default kind should be permitted only in character contexts and in comments,
(b) the decimal point in a real or complex constant should be a period, and
(c) quotation marks should be as for Fortran 95.
To achieve the second aim, at least the following should be added:
(a) a mechanism to specify the kind value for the ISO 10646 character set on those processors that support it,
(b) a means to alter formatted, namelist, and list-directed i/o to use a comma as the decimal point. The subgroup proposes a new io-control-spec for a READ or WRITE statement:
DECIMAL = scalar-default-char-exp
where the scalar-default-char-exp shall evaluate to COMMA or POINT. This shall control the decimal point for i/o by the statement, with any comma separators replaced by semicolons when commas are used for decimal points. The default in the absence of a DECIMAL io-control-spec shall be POINT. Also, there shall be a new edit descriptor for formatted i/o:
decimal-edit-desc is DC
which will change the representation of decimal points until another decimal-edit-desc is encountered.
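The separator rule in (b) can be modelled outside Fortran. The following Python sketch is illustrative only: the helper names `format_list` and `parse_list` are hypothetical, the fixed two-digit formatting is an assumption, and the code mimics the proposed COMMA/POINT behaviour for a simple list of reals rather than reproducing any Fortran processor.

```python
def _mode(decimal):
    """Validate the DECIMAL value; the proposal allows only COMMA or POINT."""
    mode = decimal.strip().upper()
    if mode not in ("POINT", "COMMA"):
        raise ValueError("DECIMAL must evaluate to COMMA or POINT")
    return mode


def format_list(values, decimal="POINT", digits=2):
    """Write reals as a separated list (hypothetical helper).

    POINT (the default): period as decimal point, comma as separator.
    COMMA: comma as decimal point, so the separator becomes a semicolon.
    """
    mode = _mode(decimal)
    fields = [f"{v:.{digits}f}" for v in values]
    if mode == "COMMA":
        return "; ".join(f.replace(".", ",") for f in fields)
    return ", ".join(fields)


def parse_list(text, decimal="POINT"):
    """Read a separated list of reals back in (hypothetical helper)."""
    mode = _mode(decimal)
    separator = ";" if mode == "COMMA" else ","
    fields = [f.strip() for f in text.split(separator) if f.strip()]
    if mode == "COMMA":
        # Restore the period so that float() can parse each field.
        fields = [f.replace(",", ".") for f in fields]
    return [float(f) for f in fields]


if __name__ == "__main__":
    print(format_list([1.5, 2.25]))                   # POINT mode
    print(format_list([1.5, 2.25], decimal="COMMA"))  # COMMA mode
    print(parse_list("1,50; 2,25", decimal="COMMA"))
```

The point of the sketch is that the separator has to change together with the decimal mark: in COMMA mode a field such as 1,50 would otherwise be indistinguishable from two separate items.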
Let me elaborate on this slightly.
The first aim is the one over which I had several arguments with Alain, in particular, when I was a member of WG20. The Fortran community believes that the highest priority must be attached to ensuring that source text is completely portable across platforms and cultures. Our Japanese colleagues, for example, simply do not want to write programs that would require ANY alteration before they could be given, sold, or otherwise transferred to non-Japanese computer systems. This is what we mean by portability!!!!!
We therefore rejected the use of any new and/or extra characters in identifiers or Fortran keywords.
Note that Fortran already allows any characters which are capable of being represented on the processor to be used in comments and character strings, and has done since long before anyone had even thought of the acronym i18n! If a programmer wishes to explain the code in his/her own language using any characters available then it may not print correctly on another computer system, but since the comment will be ignored by the Fortran processor anyway, who cares? Similarly, if a character data string contains characters that are not understood by the processor it doesn't really matter since the worst that can happen is that an unintelligible message may be printed out. And what's new about that? (;-)
However, we did wish to move as far as seemed reasonable in the direction of adopting cultural conventions other than the Anglo/American one, and two areas where we thought we could take action were in the ready use of ISO 10646 and in the use of commas as decimal separators. Of course we already can use ISO 10646 as another KIND of character, but the KIND feature is not particularly portable as there is no standard definition of which KIND number represents which character repertoire. The only code that is definitively identified in Fortran is 7-bit ASCII (though not as a KIND), and I think the proposal is that there should be a simple way of stating that ISO 10646 is the default character set.
We did come up with a number of other proposals during the meeting, mainly in an attempt to salve the consciences of those of us who don't need any of this stuff for our own programming, but the non-English speakers did not want them!
We also discussed "culturally correct sorting", as per WG20, and decided that it would not be difficult to add as a module if it was really needed once there was a Standard, but that we could not justify adding anything until such a Standard was in existence. In any event, even those whose sorting algorithms might be different from a straight character ordering system, using the internal code point ordering, could not summon up much enthusiasm for anything else, on the grounds that the applications that really need such culturally correct sorting are not written in Fortran.
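To make the distinction concrete, here is a small Python sketch contrasting a straight code point ordering with a rough accent- and case-insensitive ordering. It is only an illustration: `folded_key` is a hypothetical helper that strips combining marks, and it is a crude stand-in, not an implementation of any collation Standard.

```python
import unicodedata


def code_point_sort(words):
    """Sort by internal code point order; this is Python's default for str."""
    return sorted(words)


def folded_key(word):
    """Hypothetical sort key: ignore accents and case, then break ties."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


words = ["zebra", "\u00c4pfel", "apple", "Zoo"]  # "\u00c4" is A with diaeresis

# Code point order: upper case before lower case, accented letters last.
print(code_point_sort(words))

# Folded order: the accented word sorts next to "apple".
print(sorted(words, key=folded_key))
```

In code point order the accented word lands after "zebra", whereas the folded key places it beside "apple". A genuinely culturally correct ordering would depend on the language and its collation rules, which is why it was seen as something to add only once a Standard existed.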
So, to summarise:
1. WG5 does not intend to do anything that will ever require a program to be altered in order for it to run correctly on another platform; this means that only a common subset of characters will ever be used for keywords and identifiers, and that any given character will always have the same meaning in these contexts.
2. WG5 intends to make it easy for a program to reverse the interpretation of commas and periods in numeric data fields, to accord with common practice in the user's environment.
3. WG5 intends to provide a standard means of specifying that a character variable or constant is of the KIND corresponding to ISO 10646.
We believe that, at this point in time, this meets all the user requirements for i18n in Fortran.
(In other words, it may not be "politically correct" but it's what the users want!)
Dr Miles Ellis
Director: Educational Technology Resources Centre
University of Oxford, 37 Wellington Square, Oxford OX1 2JF, ENGLAND
Telephone: +44 1865 270528 Fax: +44 1865 270527