ISO/IEC JTC1/SC22/WG14 N780

                     Document number: SC22/WG14 N780 (J11 97-144)

Title: POSIX Alignment
Author: Keld Simonsen
Author affiliation: DKUUG
Postal address: Fruebjergvej 3, DK-2100 København Ø
Email address: keld@dkuug.dk
Telephone number: +45 3122-6543
Fax number: +45 3325-6543
Sponsor: DS
Date: 1997-09-28
Proposal category:
    __ Editorial change/non-normative contribution
    XX Correction
    XX New feature
    __ Addition to obsolescent feature list
    __ Other (please specify)
Area of standard affected:
    XX Environment
    XX Language
    __ Preprocessor
    XX Library
       XX Macro/typedef/tag name
       XX Function
       XX Header
    __ Other (please specify)
Prior art: ISO/IEC 9945 POSIX standards
Target audience: general

Related documents: N431 (Rationale and analysis), N507, N538,
N586, N658, N665
Proposal attached: proposal paper
Review committee: Keld Simonsen, Rex Jaeschke, Doug Gwyn,
Frank Farance, Clive Feather
Status: stage 3, principally agreed

Abstract: 

This paper proposes changes to align C9X with the POSIX standards
with respect to internationalization features.

Introduction

This paper details changes to the C standard to align it with POSIX System
API (C language) (POSIX-1) and ISO/IEC 9945-2:1993 POSIX Shell and
Utilities (POSIX-2). It does not cover newer proposals for POSIX or other
related specifications that are not yet international standards.

This document builds on N431, which gave an overview of
internationalization in the C and POSIX standards, compared the
functionality and features provided, and noted other
incompatibilities between the C and POSIX standards. N431 thus gave the
background and rationale for the proposed changes, and it was decided at
the Copenhagen meeting to do further work based on it. This paper
describes in detail what those changes should be.


7.3.1 Character testing functions

POSIX-2 adds in its section 2.5.2 a class "blank", consisting initially of
the characters <space> and <tab>. We should support this class by adding
the function isblank() (as well as iswblank()), which is similar to the
isspace() function except that it tests for a standard blank character;
initially the only characters covered are space (' ') and horizontal
tab ('\t').

-------- Start New Section ----------

7.3.1.3 The isblank function

Synopsis

[1] 

        #include <ctype.h>
        int isblank(int c);

Description:

[2] The isblank function tests for any character that is a standard blank
character or is one of an implementation-defined set of characters, for
which isalnum is false. The standard blank characters are the
following: space (' '), and horizontal tab ('\t'). In the "C" locale,
isblank returns true only for the standard blank characters.

-------- End New Section ----------
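
As an illustration of how the proposed function might be used, the
following minimal sketch skips leading blank characters in a string. The
helper name skip_blanks is hypothetical and not part of the proposal; the
sketch assumes a library that already provides isblank as described above.

```c
#include <ctype.h>

/* Hypothetical helper (not part of the proposal): returns a pointer to
   the first character of s that is not a blank character in the
   current locale. */
const char *skip_blanks(const char *s)
{
    /* The cast to unsigned char avoids undefined behaviour when plain
       char is signed and holds a negative value. */
    while (*s != '\0' && isblank((unsigned char)*s))
        s++;
    return s;
}
```

In the "C" locale this stops at a new-line character, for which isspace
is true but isblank is not.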



-------- Start New Section ----------

7.17.2.1.3 The iswblank function

Synopsis

[1] 

        #include <wctype.h>
        int iswblank(wint_t c);

Description:

[2] The iswblank function tests for any wide character that is a standard
blank wide character or is one of an implementation-defined set of wide
characters, for which iswalnum is false.  The standard
blank wide characters are the following: space (L' '), and horizontal tab
(L'\t'). In the "C" locale, iswblank returns true only for the standard
blank wide characters.

-------- End New Section ----------
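
The wide-character variant could be used in the same way. The sketch
below counts the blank wide characters in a wide string; the helper name
count_wblanks is hypothetical, and the sketch assumes a library that
already provides iswblank as described above.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of the proposal): counts the wide
   characters in ws for which iswblank is true in the current locale. */
size_t count_wblanks(const wchar_t *ws)
{
    size_t n = 0;

    for (; *ws != L'\0'; ws++)
        if (iswblank((wint_t)*ws))
            n++;
    return n;
}
```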

POSIX includes the following table, so it is proposed to add it at the
end of clause 7.3.

-------- Start New Section ----------

Table 1: Valid Character Class Combinations

In     Can also belong to
Class  upper   lower alpha digit space cntrl punct graph  print  xdigit blank

upper  +       +     A     x     x     x     x     A      A      +      x
lower  +       +     A     x     x     x     x     A      A      +      x 
alpha  +       +     +     x     x     x     x     A      A      +      x 
digit  x       x     x     +     x     x     x     A      A      A      x 
space  x       x     x     x     +     +     *     *      *      x      +
cntrl  x       x     x     x     +     +     x     x      x      x      +
punct  x       x     x     x     +     x     +     A      A      x      +
graph  +       +     +     +     +     x     +     +      A      +      +
print  +       +     +     +     +     x     +     +      +      +      +
xdigit +       +     +     +     x     x     x     A      A      +      x 
blank  x       x     x     x     A     +     *     *      *      x      +

NOTES:

Note 1: Explanation of codes:

	A Automatically included; see text
	+ Permitted
	x Mutually exclusive
	* See note 2

Note 2: The <space> character, which is part of the space and blank classes,
cannot belong to punct or graph, but shall automatically belong to the
print class. Other space or blank characters can be classified as punct,
graph, and/or print.

-------- End New Section ----------
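
Some of the relationships in the table can be checked mechanically for
single-byte characters. The sketch below tests a handful of the "A" and
"x" entries against the classification functions of the current locale.
The function name count_class_violations is hypothetical, only a subset
of the table is covered, and isblank is assumed to be available.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical checker (not part of the proposal): returns the number
   of violations of a few table entries found among the single-byte
   character codes of the current locale. */
int count_class_violations(void)
{
    int bad = 0;

    for (int c = 0; c <= UCHAR_MAX; c++) {
        /* "A" entries: membership in the first class implies
           membership in the second. */
        if ((isupper(c) || islower(c)) && !isalpha(c))
            bad++;                      /* upper, lower -> alpha */
        if (isblank(c) && !isspace(c))
            bad++;                      /* blank -> space */
        if (isgraph(c) && !isprint(c))
            bad++;                      /* graph -> print */

        /* "x" entries: the two classes are mutually exclusive. */
        if (isdigit(c) && isalpha(c))
            bad++;                      /* digit vs. alpha */
        if (iscntrl(c) && isprint(c))
            bad++;                      /* cntrl vs. print */
    }
    return bad;
}
```

A conforming "C" locale is expected to yield zero violations; a non-zero
result in another locale would point at a locale definition that does not
follow the table.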


7.3.2 Character case mapping functions

C has only an implicit statement on locale dependence for the case mapping
functions, referring to isupper/islower. The locale dependence can be made
explicit by adding text to the descriptions of to[w]upper() and
to[w]lower(), as follows:

-------- Start Changed Section ----------

7.3.2.1 The tolower function

Returns

[#3] If the argument is a character for which isupper is true and there is
a corresponding character ===>as specified by the current locale<=== for
which islower is true, the tolower function returns the corresponding
character; otherwise, the argument is returned unchanged.

-------- End Changed Section ----------



-------- Start Changed Section ----------

7.3.2.2  The toupper function

Returns

[#3] If the argument is a character for which islower is true and there is
a corresponding character ===>as specified by the current locale<=== for
which isupper is true, the toupper function returns the corresponding
character; otherwise, the argument is returned unchanged.

-------- End Changed Section ----------


-------- Start Changed Section ----------

7.17.3.1.1 The towlower function

Returns

[#3] If the argument is a wide character for which iswupper is true and 
there is a corresponding wide character ===>as specified by the current
locale<=== for which iswlower is true, the towlower function returns the
corresponding wide character; otherwise, the argument is returned
unchanged.

-------- End Changed Section ----------



-------- Start Changed Section ----------

7.17.3.1.2 The towupper function

Returns

[#3] If the argument is a wide character for which iswlower is true and 
there is a corresponding wide character ===>as specified by the current
locale<=== for which iswupper is true, the towupper function returns the
corresponding wide character; otherwise, the argument is returned
unchanged.

-------- End Changed Section ----------
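
The effect of the added wording can be seen in a simple case-conversion
loop: which characters are changed depends on the LC_CTYPE category of
the current locale. The sketch below is illustrative only; the helper
name upcase is hypothetical.

```c
#include <ctype.h>

/* Hypothetical helper (not part of the proposal): converts s to upper
   case in place, using the case mapping specified by the current
   locale, and returns s. */
char *upcase(char *s)
{
    for (char *p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}
```

In the "C" locale only the letters a to z are mapped. After a call such as
setlocale(LC_CTYPE, "") further single-byte letters may be mapped as well,
depending on the locale selected by the environment.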


7.4 Localization

The POSIX-2 standard was approved after adoption of the C standard, and it
contains a format for specifying locales and accompanying charmaps. This
is a valuable and standardized way of specifying locales and should be
mentioned as a footnote, as follows:


-------- Start Changed Section ----------

7.5  Localization <locale.h>

[#3] The macros defined are NULL (described in 7.1.6); and
	LC_ALL *Footnote*
	LC_COLLATE
	...

Footnote: POSIX-2 specifies locale and charmap formats that may be used to
specify locales for C.

-------- End Changed Section ----------

A reference to the POSIX-2 standard should be added to the informative
bibliography. 

The entry is:

ISO/IEC 9945-2:1993 Information technology - Portable Operating System
Interface (POSIX) - Part 2: Shell and Utilities.

 
7.5.2.1 int_curr_symbol different from currency_symbol

Because the conventions for writing an amount with the local currency
symbol may differ from those for writing it with the international
currency symbol, it is proposed to add the following members (none of
which are part of the POSIX spec) to the lconv struct:

-------- Start Changed Section ----------

7.5  Localization <locale.h>

[#2]	...

	char int_p_cs_precedes;        /* CHAR_MAX */
	char int_p_sep_by_space;       /* CHAR_MAX */
	char int_n_cs_precedes;        /* CHAR_MAX */
	char int_n_sep_by_space;       /* CHAR_MAX */
	char int_p_sign_posn;          /* CHAR_MAX */
	char int_n_sign_posn;          /* CHAR_MAX */

-------- End Changed Section ----------


-------- Start Changed Section ----------

7.5.2.1  The localeconv function

[#3] ...

char int_p_cs_precedes	Set to 1 or 0 if the int_curr_symbol respectively
precedes or succeeds the value for a nonnegative formatted monetary
quantity.

char int_p_sep_by_space	Set to 1 or 0 if the int_curr_symbol respectively
is or is not separated by a space from the value for a nonnegative
formatted monetary quantity.

char int_n_cs_precedes	Set to 1 or 0 if the int_curr_symbol respectively
precedes or succeeds the value for a negative formatted monetary quantity.

char int_n_sep_by_space	Set to 1 or 0 if the int_curr_symbol respectively
is or is not separated by a space from the value for a negative formatted
monetary quantity.

char int_p_sign_posn	Set to a value indicating the positioning of the
positive_sign for a nonnegative formatted monetary quantity.

char int_n_sign_posn	Set to a value indicating the positioning of the
negative_sign for a negative formatted monetary quantity. 

-------- End Changed Section ----------
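
A program could inspect the proposed members in the same way as the
existing ones. The sketch below tests whether a locale leaves all six new
members unspecified, that is, set to CHAR_MAX as indicated for the "C"
locale in the structure comments above. The helper name
intl_format_unspecified is hypothetical, and the sketch assumes a library
whose struct lconv already contains the proposed members.

```c
#include <limits.h>
#include <locale.h>

/* Hypothetical helper (not part of the proposal): returns 1 if all six
   proposed int_* formatting members of lc are CHAR_MAX (no information
   available), otherwise 0. */
int intl_format_unspecified(const struct lconv *lc)
{
    return lc->int_p_cs_precedes  == CHAR_MAX
        && lc->int_p_sep_by_space == CHAR_MAX
        && lc->int_n_cs_precedes  == CHAR_MAX
        && lc->int_n_sep_by_space == CHAR_MAX
        && lc->int_p_sign_posn    == CHAR_MAX
        && lc->int_n_sign_posn    == CHAR_MAX;
}
```

A typical call would be intl_format_unspecified(localeconv()), made after
any setlocale call that selects the LC_MONETARY category.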


The examples in section 7.5.2.1 need to be enhanced:

	There cannot be a point after ITL.
	The Netherlands uses a kind of small "f".
	Norway has at least a space between "kr" and the value.

Examples are also needed for all the new members, int_p_cs_precedes etc.
All of this is done in the text below.

-------- Start Changed Section ----------

7.5.2.1 The localeconv function

Examples

[#8] The following table illustrates the rules which may well be used by
five countries to format monetary quantities.

Country      Positive format Negative format International format

Italy        L.1.234        -L.1.234       ITL 1.234
Netherlands  f 1.234,56     f -1.234,56    NLG 1.234,56
Norway       kr 1.234,56    kr 1.234,56-   NOK 1.234,56
Switzerland  SFrs.1,234.56  SFrs.1,234.56C CHF 1,234.56
Finland      1.234,56 mk    -1.234,56 mk   FIM 1.234,56

[#9] For these five countries, the respective values for the monetary 
members of the structure returned by localeconv are:

                     Italy  Netherlands Norway Switzerland Finland

int_curr_symbol         "ITL " "NLG "    "NOK " "CHF "     "FIM "
currency_symbol         "L."   "f"       "kr"   "SFrs."    "mk"
mon_decimal_point       ""     ","       ","    "."        ","
mon_thousands_sep       "."    "."       "."    ","        "."
mon_grouping            "\3"   "\3"      "\3"   "\3"       "\3"
positive_sign           ""     ""        ""     ""         ""
negative_sign           "-"    "-"       "-"    "C"        "-"
int_frac_digits         0      2         2      2          2
frac_digits             0      2         2      2          2
p_cs_precedes           1      1         1      1          0
p_sep_by_space          0      1         1      0          1
n_cs_precedes           1      1         1      1          0
n_sep_by_space          0      1         1      0          1
p_sign_posn             1      1         1      1          1
n_sign_posn             1      4         2      2          1
int_p_cs_precedes       1      1         1      1          1
int_p_sep_by_space      0      1         0      0          1
int_n_cs_precedes       1      1         1      1          1
int_n_sep_by_space      0      1         0      0          1
int_p_sign_posn         1      1         1      1          1
int_n_sign_posn         1      4         2      2          4

-------- End Changed Section ----------



7.4.2.1 p_sep_by_space and n_sep_by_space

POSIX has added a third possibility for a formatted monetary quantity, so
now we have:

	No space separates the currency_symbol from the value.
	A space separates the symbol from the value.
*New*	A space separates the symbol and the value, if these entities are
	next to each other.


-------- Start Changed Section ----------

7.5.2.1  The localeconv function

[#3] ...

char p_sep_by_space	Set to 0 if no space separates the currency_symbol
from the value for a nonnegative formatted monetary quantity; set to 1 if
a space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.

char n_sep_by_space	Set to 0 if no space separates the currency_symbol
from the value for a negative formatted monetary quantity; set to 1 if a
space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.

char int_p_sep_by_space	Set to 0 if no space separates the int_curr_symbol
from the value for a nonnegative formatted monetary quantity; set to 1 if
a space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.

char int_n_sep_by_space	Set to 0 if no space separates the int_curr_symbol
from the value for a negative formatted monetary quantity; set to 1 if a
space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.

-------- End Changed Section ----------

---------------- added section in rationale -----------------

This section should go into the rationale.

The table below gives example formats for the combinations of
p_cs_precedes, p_sign_posn and p_sep_by_space, assuming that the
positive_sign is "+" and the currency_symbol is "$".

                                          p_sep_by_space
                                          2        1        0
p_cs_precedes = 1 p_sign_posn = 0      ($ 1.25) ($ 1.25) ($1.25)
                  p_sign_posn = 1      + $1.25  +$ 1.25  +$1.25
                  p_sign_posn = 2      $1.25 +  $ 1.25+  $1.25+
                  p_sign_posn = 3      + $1.25  +$ 1.25  +$1.25
                  p_sign_posn = 4      $ +1.25  $+ 1.25  $+1.25

p_cs_precedes = 0 p_sign_posn = 0      (1.25 $) (1.25 $) (1.25$)
                  p_sign_posn = 1      +1.25 $  +1.25 $  +1.25$
                  p_sign_posn = 2      1.25$ +  1.25 $+  1.25$+
                  p_sign_posn = 3      1.25+ $  1.25 +$  1.25+$
                  p_sign_posn = 4      1.25$ +  1.25 $+  1.25$+

------------------------------------------------------------------
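
The columns of the table above for p_sep_by_space values 1 and 0 can be
reproduced with a small formatting routine. The sketch below is one
possible reading of those two columns, not an interface proposed by this
paper: the function name format_amount and its parameters are
hypothetical, the value is passed as an already formatted digit string,
and the p_sep_by_space value 2 is deliberately left unhandled.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the proposal): writes a monetary
   amount into out (of size n) and returns the snprintf result, or -1
   for a sep_by_space or sign_posn value that is not handled here. */
int format_amount(char *out, size_t n, const char *value,
                  const char *symbol, const char *sign,
                  int cs_precedes, int sign_posn, int sep_by_space)
{
    const char *sp;

    if (sep_by_space == 0)
        sp = "";                /* no space between symbol and value */
    else if (sep_by_space == 1)
        sp = " ";               /* a space between symbol and value */
    else
        return -1;              /* value 2 is not covered by this sketch */

    switch (sign_posn) {
    case 0:                     /* parentheses surround the quantity */
        return cs_precedes
            ? snprintf(out, n, "(%s%s%s)", symbol, sp, value)
            : snprintf(out, n, "(%s%s%s)", value, sp, symbol);
    case 1:                     /* sign precedes quantity and symbol */
        return cs_precedes
            ? snprintf(out, n, "%s%s%s%s", sign, symbol, sp, value)
            : snprintf(out, n, "%s%s%s%s", sign, value, sp, symbol);
    case 2:                     /* sign succeeds quantity and symbol */
        return cs_precedes
            ? snprintf(out, n, "%s%s%s%s", symbol, sp, value, sign)
            : snprintf(out, n, "%s%s%s%s", value, sp, symbol, sign);
    case 3:                     /* sign immediately precedes the symbol */
        return cs_precedes
            ? snprintf(out, n, "%s%s%s%s", sign, symbol, sp, value)
            : snprintf(out, n, "%s%s%s%s", value, sp, sign, symbol);
    case 4:                     /* sign immediately succeeds the symbol */
        return cs_precedes
            ? snprintf(out, n, "%s%s%s%s", symbol, sign, sp, value)
            : snprintf(out, n, "%s%s%s%s", value, sp, symbol, sign);
    default:
        return -1;
    }
}
```

For example, with value "1.25", symbol "$", sign "+", cs_precedes 1,
sign_posn 4 and sep_by_space 1, the routine produces "$+ 1.25", matching
the corresponding table entry.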