Document number: SC22/WG14 N714 (J11 97-077)
Title: POSIX Alignment
Author: Keld Simonsen
Author affiliation: DKUUG
Postal address: Fruebjergvej 3, DK-2100 København
Email address: keld@dkuug.dk
Telephone number: +45 3122-6543
Fax number: +45 3325-6543
Sponsor: DS
Date: 1997-06-08
Proposal category:
  __ Editorial change/non-normative contribution
  XX Correction
  XX New feature
  __ Addition to obsolescent feature list
  __ Other (please specify)
Area of standard affected:
  XX Environment
  XX Language
  __ Preprocessor
  XX Library
    XX Macro/typedef/tag name
    XX Function
    XX Header
  __ Other (please specify)
Prior art: ISO/IEC 9945 POSIX standards
Target audience: general
Related documents: N431 (Rationale and analysis), N507, N538, N586, N658, N665
Proposal attached: proposal paper
Review committee: Keld Simonsen, Rex Jaeschke, Doug Gwyn, Frank Farance, Clive Feather
Status: stage 3, principally agreed

Abstract: This paper proposes changes to align C9X with the POSIX standards with respect to internationalization features.

Introduction

This paper proposes changes to the C standard to align it with two POSIX standards: POSIX System API (C language) (POSIX-1) and ISO/IEC 9945-2:1993 POSIX Shell and Utilities (POSIX-2). It does not cover newer POSIX proposals or other related specifications that are not yet International Standards.

It builds on N431, which gave an overview of internationalization in the C and POSIX standards, compared the functionality and features they provide, and also mentioned other incompatibilities between them. N431 thus gave the background and rationale for the proposed changes, and at the Copenhagen meeting it was decided to base further work on N431. This paper describes in detail what the changes should be.

Frank Farance has taken an action item on a proposal that addresses non-internationalization-oriented alignment with POSIX.1, such as that described in clause 8 of ISO/IEC 9945-1.
One reviewer recommended introducing two new functions, chtype() and ischtype(), to match wctype() and iswctype(), but I have not included text for this, as it would seem to go beyond what WG14 agreed in principle.

There is a separate paper for strftime() POSIX alignment.

The following section numbers refer to C9X Draft 8.

Changes to N586

Changes to the N586 document are:
  LC_MESSAGES description with yesexpr and noexpr deleted.
  Reserving a "std/" namespace for setlocale() deleted.
  Numerical thousands and decimal separators of more than one character deleted.

Changes from N658 (1997-02-10)

Changes to the N658 document are:
  lconv MAXINT==-1 clause deleted.
  POSIX.1 clause 8 alignment note deleted.
  Some minor text added as discussed on 1997-02-13.
  Class table added.
  *_sign_posn (value 5) changed to *_sep_by_space (value 2).
  Table of p_cs_precedes, p_sep_by_space and p_sign_posn added.
  Description of lconv struct entries for time added.

7.3.1 Character testing functions

POSIX-2 adds, in its section 2.5.2, a class "blank", consisting initially of the <space> and <tab> characters. This character class should be added, possibly through a function isblank() that is similar to isspace() except that it tests for a standard blank character; initially the only characters covered are space (' ') and horizontal tab ('\t'). Similarly, a function iswblank() should be added.

Text for 7.3.1.12:

  "7.3.1.12 The isblank function

  Synopsis

  [1] #include <ctype.h>
      int isblank(int c);

  Description

  [2] The isblank function tests for any character that is a standard blank character or is one of an implementation-defined set of characters for which isalnum is false. The purpose of the function is to test for blank characters within a line. The standard blank characters are the following: space (' ') and horizontal tab ('\t'). In the "C" locale, isblank returns true only for the standard blank characters."
Table 1: Valid Character Class Combinations

  In         Can also belong to
  class      upper  lower  alpha  digit  space  cntrl  punct  graph  print  xdigit  blank
  upper        +      +      A      x      x      x      x      A      A      +       x
  lower        +      +      A      x      x      x      x      A      A      +       x
  alpha        +      +      +      x      x      x      x      A      A      +       x
  digit        x      x      x      +      x      x      x      A      A      A       x
  space        x      x      x      x      +      +      *      *      *      x       +
  cntrl        x      x      x      x      +      +      x      x      x      x       +
  punct        x      x      x      x      +      x      +      A      A      x       +
  graph        +      +      +      +      +      x      +      +      A      +       +
  print        +      +      +      +      +      x      +      +      +      +       +
  xdigit       +      +      +      +      x      x      x      A      A      +       x
  blank        x      x      x      x      A      +      *      *      *      x       +

Notes:

Note 1: Explanation of codes:
  A  Automatically included; see text
  +  Permitted
  x  Mutually exclusive
  *  See note 2

Note 2: The <space> character, which is part of the space and blank classes, cannot belong to punct or graph, but shall automatically belong to the print class. Other space or blank characters can be classified as punct, graph, and/or print.

7.3.2 Character case mapping functions

C has only an implicit statement on locale dependence for the case mapping functions, through its reference to isupper/islower. The locale dependence can be made explicit by adding "as specified by the current locale" to both the toupper() and tolower() descriptions, so that the text reads (for tolower):

  "If the argument is a character for which isupper is true and there is a corresponding character as specified by the current locale for which islower is true, the tolower function returns the corresponding character; otherwise, the argument is returned unchanged."

7.4 Localization

The POSIX-2 standard was approved after the adoption of the C standard, and it contains a format for specifying locales and accompanying charmaps. This is a valuable, standardized way of specifying locales; however, many C compilers do not operate under a POSIX operating system. It is therefore proposed to add, in 7.4 after the section on the macros (LC_ALL etc.), a non-normative note:

  "Footnote: POSIX-2 specifies locale and charmap formats that may be used to specify locales for C."

A reference to the POSIX-2 standard should be added to the informative bibliography.
7.4.2.1 p_sep_by_space and n_sep_by_space

POSIX has added a third value, and it is therefore proposed to change the descriptions of p_sep_by_space, n_sep_by_space, int_p_sep_by_space and int_n_sep_by_space to: set to 0 if no space separates the currency_symbol from the value for a nonnegative formatted monetary quantity, set to 1 if a space separates the symbol from the value, and set to 2 if a space separates the symbol and the sign string, if adjacent. Variations of this definition for international and/or negative values are generated by using int_curr_symbol and negative in place of currency_symbol and nonnegative, respectively.

The table below gives example formats for the combinations of p_cs_precedes, p_sign_posn and p_sep_by_space, assuming that positive_sign is "+" and currency_symbol is "$".

                                 p_sep_by_space
                           2          1          0
  p_cs_precedes = 1
    p_sign_posn = 0     ($ 1.25)   ($ 1.25)   ($1.25)
    p_sign_posn = 1     + $1.25    +$ 1.25    +$1.25
    p_sign_posn = 2     $1.25 +    $ 1.25+    $1.25+
    p_sign_posn = 3     + $1.25    +$ 1.25    +$1.25
    p_sign_posn = 4     $ +1.25    $+ 1.25    $+1.25
  p_cs_precedes = 0
    p_sign_posn = 0     (1.25 $)   (1.25 $)   (1.25$)
    p_sign_posn = 1     +1.25 $    +1.25 $    +1.25$
    p_sign_posn = 2     1.25$ +    1.25 $+    1.25$+
    p_sign_posn = 3     1.25+ $    1.25 +$    1.25+$
    p_sign_posn = 4     1.25$ +    1.25 $+    1.25$+

7.5.2.1 int_curr_symbol different from currency_symbol

As the local currency may be written in a different order from the international currency, it is proposed to add the following 6 members to the lconv struct:

  int_p_cs_precedes
  int_p_sep_by_space
  int_n_cs_precedes
  int_n_sep_by_space
  int_p_sign_posn
  int_n_sign_posn

with wording equivalent to that for p_cs_precedes etc., where "currency_symbol" is replaced with "int_curr_symbol" in 7.5.2.1[3].

In section 7.5.2.1 the examples need to be enhanced: there cannot be a point after ITL, the Netherlands uses a kind of small "f", and Norway has at least a space between "kr" and the value. We also need examples that use all the new members, int_p_cs_precedes etc.
Differences from POSIX

This proposal introduces the following changes from POSIX (all of them additions):

Adds to the lconv struct:
  int_p_cs_precedes
  int_p_sep_by_space
  int_n_cs_precedes
  int_n_sep_by_space
  int_p_sign_posn
  int_n_sign_posn