Document number:    SC22/WG14 N780 (J11 97-144)
Title:              POSIX Alignment
Author:             Keld Simonsen
Author affiliation: DKUUG
Postal address:     Fruebjergvej 3, DK-2100 København
Email address:      keld@dkuug.dk
Telephone number:   +45 3122-6543
Fax number:         +45 3325-6543
Sponsor:            DS
Date:               1997-09-28
Proposal category:
   __ Editorial change/non-normative contribution
   XX Correction
   XX New feature
   __ Addition to obsolescent feature list
   __ Other (please specify)
Area of standard affected:
   XX Environment
   XX Language
   __ Preprocessor
   XX Library
      XX Macro/typedef/tag name
      XX Function
      XX Header
   __ Other (please specify)
Prior art:          ISO/IEC 9945 POSIX standards
Target audience:    general
Related documents:  N431 (Rationale and analysis), N507, N538, N586, N658, N665
Proposal attached:  proposal paper
Review committee:   Keld Simonsen, Rex Jaeschke, Doug Gwyn, Frank Farance, Clive Feather
Status:             stage 3, principally agreed

Abstract: The paper gives proposals for alignment of C9X with the POSIX standards with respect to internationalization features.

Introduction

This paper details changes to the C standard to align it with POSIX System API (C language) (POSIX-1) and ISO/IEC 9945-2:1993 POSIX Shell and Utilities (POSIX-2). It does not cover newer proposals for POSIX or other related specifications that are not yet international standards.

This document builds on N431, which gave an overview of internationalisation in the C and POSIX standards, compared the functionality and features provided, and mentioned other incompatibilities between the C and POSIX standards. N431 thus gave the background and rationale for the proposed changes, and it was decided at the Copenhagen meeting to do further work based on it. This paper describes in detail what those changes should be.

7.3.1 Character testing functions

POSIX-2 adds in its section 2.5.2 a class "blank", consisting initially of the characters <space> and <tab>.
We should support this class by adding the function isblank() (as well as iswblank()), which is similar to the isspace() function except that the test is for a standard blank character; the characters covered initially are only space (' ') and horizontal tab ('\t').

-------- Start New Section ----------

7.3.1.3 The isblank function

Synopsis

[1]     #include <ctype.h>
        int isblank(int c);

Description:

[2] The isblank function tests for any character that is a standard blank character or is one of an implementation-defined set of characters, for which isalnum is false. The standard blank characters are the following: space (' '), and horizontal tab ('\t'). In the "C" locale, isblank returns true only for the standard blank characters.

-------- End New Section ----------

-------- Start New Section ----------

7.17.2.1.3 The iswblank function

Synopsis

[1]     #include <wctype.h>
        int iswblank(wint_t c);

Description:

[2] The iswblank function tests for any wide character that is a standard blank wide character or is one of an implementation-defined set of wide characters, for which iswalnum is false. The standard blank wide characters are the following: space (L' '), and horizontal tab (L'\t'). In the "C" locale, iswblank returns true only for the standard blank wide characters.
-------- End New Section ----------

POSIX has this table in the standard, so it is proposed to add this table at the end of clause 7.3.

-------- Start New Section ----------

Table 1: Valid Character Class Combinations

In                          Can also belong to
Class    upper lower alpha digit space cntrl punct graph print xdigit blank

upper      +     +     A     x     x     x     x     A     A     +      x
lower      +     +     A     x     x     x     x     A     A     +      x
alpha      +     +     +     x     x     x     x     A     A     +      x
digit      x     x     x     +     x     x     x     A     A     A      x
space      x     x     x     x     +     +     *     *     *     x      +
cntrl      x     x     x     x     +     +     x     x     x     x      +
punct      x     x     x     x     +     x     +     A     A     x      +
graph      +     +     +     +     +     x     +     +     A     +      +
print      +     +     +     +     +     x     +     +     +     +      +
xdigit     +     +     +     +     x     x     x     A     A     +      x
blank      x     x     x     x     A     +     *     *     *     x      +

NOTES:

Note 1: Explanation of codes:
        A   Automatically included; see text
        +   Permitted
        x   Mutually exclusive
        *   See note 2

Note 2: The <space> character, which is part of the space and blank classes, cannot belong to punct or graph, but automatically shall belong to the print class. Other space or blank characters can be classified as punct, graph, and/or print.

-------- End New Section ----------

7.3.2 Character case mapping functions

C has only an implicit statement on locale dependence for the case mapping functions, referring to isupper/islower. The locale dependence can be made explicit by adding text to the descriptions of to[w]upper() and to[w]lower(), as follows:

-------- Start Changed Section ----------

7.3.2.1 The tolower function

Returns

[#3] If the argument is a character for which isupper is true and there is a corresponding character ===>as specified by the current locale<=== for which islower is true, the tolower function returns the corresponding character; otherwise, the argument is returned unchanged.
-------- End Changed Section ----------

-------- Start Changed Section ----------

7.3.2.2 The toupper function

Returns

[#3] If the argument is a character for which islower is true and there is a corresponding character ===>as specified by the current locale<=== for which isupper is true, the toupper function returns the corresponding character; otherwise, the argument is returned unchanged.

-------- End Changed Section ----------

-------- Start Changed Section ----------

7.17.3.1.1 The towlower function

Returns

[#3] If the argument is a wide character for which iswupper is true and there is a corresponding wide character ===>as specified by the current locale<=== for which iswlower is true, the towlower function returns the corresponding wide character; otherwise, the argument is returned unchanged.

-------- End Changed Section ----------

-------- Start Changed Section ----------

7.17.3.1.2 The towupper function

Returns

[#3] If the argument is a wide character for which iswlower is true and there is a corresponding wide character ===>as specified by the current locale<=== for which iswupper is true, the towupper function returns the corresponding wide character; otherwise, the argument is returned unchanged.

-------- End Changed Section ----------

7.4 Localization

The POSIX-2 standard was approved after adoption of the C standard, and it contains a format for specifying locales and accompanying charmaps. This is a valuable and standardized way of specifying locales and should be mentioned as a footnote, as follows:

-------- Start Changed Section ----------

7.5 Localization

[#3] The macros defined are NULL (described in 7.1.6); and

        LC_ALL LC_ALL *Footnote*
        LC_COLLATE
        ...

Footnote: POSIX-2 specifies locale and charmap formats that may be used to specify locales for C.

-------- End Changed Section ----------

A reference to the POSIX-2 standard should be added to the informative bibliography.
The entry is:

        ISO/IEC 9945-2:1993 Information technology - Portable Operating System Interface (POSIX) - Part 2: Shell and Utilities.

7.5.2.1 int_curr_symbol different from currency_symbol

Because the order in which a local currency amount is written may differ from the order in which an international currency amount is written, it is proposed to add the following members (none of which are part of the POSIX spec) to the lconv struct:

-------- Start Changed Section ----------

7.5 Localization

[#2] ...

        char int_p_cs_precedes;         /* CHAR_MAX */
        char int_p_sep_by_space;        /* CHAR_MAX */
        char int_n_cs_precedes;         /* CHAR_MAX */
        char int_n_sep_by_space;        /* CHAR_MAX */
        char int_p_sign_posn;           /* CHAR_MAX */
        char int_n_sign_posn;           /* CHAR_MAX */

-------- End Changed Section ----------

-------- Start Changed Section ----------

7.5.2.1 The localeconv function

[#3] ...

        char int_p_cs_precedes
                Set to 1 or 0 if the int_curr_symbol respectively precedes or succeeds the value for a nonnegative formatted monetary quantity.

        char int_p_sep_by_space
                Set to 1 or 0 if the int_curr_symbol respectively is or is not separated by a space from the value for a nonnegative formatted monetary quantity.

        char int_n_cs_precedes
                Set to 1 or 0 if the int_curr_symbol respectively precedes or succeeds the value for a negative formatted monetary quantity.

        char int_n_sep_by_space
                Set to 1 or 0 if the int_curr_symbol respectively is or is not separated by a space from the value for a negative formatted monetary quantity.

        char int_p_sign_posn
                Set to a value indicating the positioning of the positive_sign for a nonnegative formatted monetary quantity.

        char int_n_sign_posn
                Set to a value indicating the positioning of the negative_sign for a negative formatted monetary quantity.

-------- End Changed Section ----------

In section 7.5.2.1 the examples need to be enhanced. There cannot be a point after ITL. The Netherlands uses a kind of small "f". Norway has at least a space between "kr" and the value.
We need examples with all the new variables, int_p_cs_precedes etc. This is all done in the text below.

-------- Start Changed Section ----------

7.5.2.1 The localeconv function

Examples

[#8] The following table illustrates the rules which may well be used by five countries to format monetary quantities.

Country       Positive format   Negative format   International format

Italy         L.1.234           -L.1.234          ITL 1.234
Netherlands   f 1.234,56        f -1.234,56       NLG 1.234,56
Norway        kr 1.234,56       kr 1.234,56-      NOK 1.234,56
Switzerland   SFrs.1,234.56     SFrs.1,234.56C    CHF 1,234.56
Finland       1.234,56 mk       -1.234,56 mk      FIM 1.234,56

[#9] For these five countries, the respective values for the monetary members of the structure returned by localeconv are:

                       Italy    Netherlands  Norway   Switzerland  Finland

int_curr_symbol        "ITL "   "NLG "       "NOK "   "CHF "       "FIM "
currency_symbol        "L."     "f"          "kr"     "SFrs."      "mk"
mon_decimal_point      ""       ","          ","      "."          ","
mon_thousands_sep      "."      "."          "."      ","          "."
mon_grouping           "\3"     "\3"         "\3"     "\3"         "\3"
positive_sign          ""       ""           ""       ""           ""
negative_sign          "-"      "-"          "-"      "C"          "-"
int_frac_digits        0        2            2        2            2
frac_digits            0        2            2        2            2
p_cs_precedes          1        1            1        1            0
p_sep_by_space         0        1            0        0            1
n_cs_precedes          1        1            1        1            0
n_sep_by_space         0        1            0        0            1
p_sign_posn            1        1            1        1            1
n_sign_posn            1        4            2        2            1
int_p_cs_precedes      1        1            1        1            1
int_p_sep_by_space     0        1            0        0            1
int_n_cs_precedes      1        1            1        1            1
int_n_sep_by_space     0        1            0        0            1
int_p_sign_posn        1        1            1        1            1
int_n_sign_posn        1        4            2        2            4

-------- End Changed Section ----------

7.4.2.1 p_sep_by_space and n_sep_by_space

POSIX has added a third possibility for a formatted monetary quantity, so now we have:

        No space separates the currency_symbol from the value.
        A space separates the symbol from the value.
        *New* A space separates the symbol and the value, if these entities are next to each other.

-------- Start Changed Section ----------

7.5.2.1 The localeconv function

[#3] ...
        char p_sep_by_space
                Set to 0 if no space separates the currency_symbol from the value for a nonnegative formatted monetary quantity; set to 1 if a space separates the symbol from the value; and set to 2 if a space separates the symbol and the value, if adjacent.

        char n_sep_by_space
                Set to 0 if no space separates the currency_symbol from the value for a negative formatted monetary quantity; set to 1 if a space separates the symbol from the value; and set to 2 if a space separates the symbol and the value, if adjacent.

        char int_p_sep_by_space
                Set to 0 if no space separates the int_curr_symbol from the value for a nonnegative formatted monetary quantity; set to 1 if a space separates the symbol from the value; and set to 2 if a space separates the symbol and the value, if adjacent.

        char int_n_sep_by_space
                Set to 0 if no space separates the int_curr_symbol from the value for a negative formatted monetary quantity; set to 1 if a space separates the symbol from the value; and set to 2 if a space separates the symbol and the value, if adjacent.

-------- End Changed Section ----------

---------------- added section in rationale -----------------

This section should go into the rationale.

A table giving example formats for the combinations of p_cs_precedes, p_sign_posn and p_sep_by_space is given below, given that the positive_sign is "+" and the currency_symbol is "$".

                                           p_sep_by_space
                                        2          1          0

p_cs_precedes = 1   p_sign_posn = 0     ($ 1.25)   ($ 1.25)   ($1.25)
                    p_sign_posn = 1     + $1.25    +$ 1.25    +$1.25
                    p_sign_posn = 2     $1.25 +    $ 1.25+    $1.25+
                    p_sign_posn = 3     + $1.25    +$ 1.25    +$1.25
                    p_sign_posn = 4     $ +1.25    $+ 1.25    $+1.25

p_cs_precedes = 0   p_sign_posn = 0     (1.25 $)   (1.25 $)   (1.25$)
                    p_sign_posn = 1     +1.25 $    +1.25 $    +1.25$
                    p_sign_posn = 2     1.25$ +    1.25 $+    1.25$+
                    p_sign_posn = 3     1.25+ $    1.25 +$    1.25+$
                    p_sign_posn = 4     1.25$ +    1.25 $+    1.25$+

------------------------------------------------------------------
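
The interaction of p_cs_precedes, p_sign_posn and p_sep_by_space in the rationale table can be sketched in C. The function below is only an illustration under stated assumptions: format_money and all of its parameter names are hypothetical and appear nowhere in the proposal, the value is passed in already formatted as a string, and only the p_sep_by_space values 0 and 1 are handled. The value 2 is deliberately rejected, because this sketch does not attempt to model the placement of the space in that case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * format_money: hypothetical helper, not part of the proposal.
 * Writes "value" combined with "symbol" and "sign" into out[0..n-1],
 * following cs_precedes, sep_by_space (0 or 1 only) and sign_posn (0..4).
 * Returns 0 on success, -1 on unsupported arguments or a too-small buffer.
 */
int format_money(char *out, size_t n, const char *value,
                 const char *symbol, const char *sign,
                 int cs_precedes, int sep_by_space, int sign_posn)
{
    char group[64];          /* the symbol, plus the sign when attached to it */
    const char *lead = "";   /* text placed before everything else */
    const char *trail = "";  /* text placed after everything else */
    const char *gap;
    int len;

    assert(out != NULL && n > 0);

    if (sep_by_space != 0 && sep_by_space != 1)
        return -1;           /* value 2 is not modelled in this sketch */
    if (strlen(symbol) + strlen(sign) >= sizeof group)
        return -1;

    gap = sep_by_space ? " " : "";
    strcpy(group, symbol);

    switch (sign_posn) {
    case 0:                  /* parentheses surround quantity and symbol */
        lead = "(";
        trail = ")";
        break;
    case 1:                  /* sign precedes quantity and symbol */
        lead = sign;
        break;
    case 2:                  /* sign follows quantity and symbol */
        trail = sign;
        break;
    case 3:                  /* sign immediately precedes the symbol */
        strcpy(group, sign);
        strcat(group, symbol);
        break;
    case 4:                  /* sign immediately follows the symbol */
        strcat(group, sign);
        break;
    default:
        return -1;
    }

    if (cs_precedes)
        len = snprintf(out, n, "%s%s%s%s%s", lead, group, gap, value, trail);
    else
        len = snprintf(out, n, "%s%s%s%s%s", lead, value, gap, group, trail);

    return (len < 0 || (size_t)len >= n) ? -1 : 0;
}
```

With the table's assumptions (positive_sign "+", currency_symbol "$"), a call such as format_money(buf, sizeof buf, "1.25", "$", "+", 1, 1, 4) should leave "$+ 1.25" in buf, which is the entry for p_cs_precedes = 1, p_sign_posn = 4, p_sep_by_space = 1.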