Document number: SC22/WG14 N714 (J11 97-077)

Title: POSIX Alignment
Author: Keld Simonsen
Author affiliation: DKUUG
Postal address: Fruebjergvej 3, DK-2100 København Ø
Email address: keld@dkuug.dk
Telephone number: +45 3122-6543
Fax number: +45 3325-6543
Sponsor: DS
Date: 1997-06-08
Proposal category:
    __ Editorial change/non-normative contribution
    XX Correction
    XX New feature
    __ Addition to obsolescent feature list
    __ Other (please specify)
Area of standard affected:
    XX Environment
    XX Language
    __ Preprocessor
    XX Library
       XX Macro/typedef/tag name
       XX Function
       XX Header
    __ Other (please specify)
Prior art: ISO/IEC 9945 POSIX standards
Target audience: general

Related documents: N431 (Rationale and analysis), N507, N538,
N586, N658, N665
Proposal attached: proposal paper
Review committee: Keld Simonsen, Rex Jaeschke, Doug Gwyn,
Frank Farance, Clive Feather
Status: stage 3, principally agreed

Abstract: 
This paper proposes changes to align C9X with the POSIX
standards with respect to internationalization features.

Introduction

This paper proposes changes to the C standard to align it with the POSIX
standards POSIX System API (C language) (POSIX-1) and ISO/IEC 9945-2:1993 POSIX
Shell and Utilities (POSIX-2). It does not cover newer proposals for POSIX or other
related specifications that are not yet international standards.

It builds on N431, which gave an overview of internationalization in the C and
POSIX standards, compared the functionality and features they provide, and also
noted other incompatibilities between the C and POSIX standards. N431 thus gave the
background and rationale for the proposed changes, and at the Copenhagen meeting it
was decided to do further work based on it. This paper describes in detail what the
changes should be.

Frank Farance has taken an action item on a proposal that addresses alignment with
POSIX.1 in areas other than internationalization, such as those described in clause 8 of
ISO/IEC 9945-1.

One reviewer recommended introducing two new functions, chtype() and ischtype(), to
match wctype() and iswctype(), but I have not included text for them, as this would seem
to go beyond what WG14 agreed to in principle.

There is a separate paper for strftime() POSIX alignment.

The following section numbers refer to the C9X Draft 8.

Changes to N586

Changes to the N586 document are:

LC_MESSAGES description with yesexpr and noexpr deleted
Reserving a "std/" namespace for setlocale() deleted
Numeric thousands and decimal separators of more than one character deleted.

Changes from N658 (1997-02-10)

Changes to the N658 document are:
lconv MAXINT==-1 clause deleted.
POSIX.1 clause 8 alignment note deleted.
Some minor text added, as discussed on 1997-02-13.
Class table added.
*_sign_posn (value 5) changed to *_sep_by_space (value 2).
Table of p_cs_precedes, p_sep_by_space and p_sign_posn added.
Description of lconv struct entries for time added.

7.3.1 Character testing functions

POSIX-2 (section 2.5.2) adds a character class "blank", which initially consists of the
characters <space> and <tab>. This character class should be added to C, for example by
adding a function isblank() that is similar to isspace() except that it tests for a standard
blank character; initially the only standard blank characters are space (' ') and horizontal
tab ('\t'). Similarly, a function iswblank() should be added. Text for 7.3.1.12:

"7.3.1.12 The isblank function

Synopsis

[1] 

        #include <ctype.h>
        int isblank(int c);

Description

[2] The isblank function tests for any character that is a standard blank character or is one
of an implementation-defined set of characters for which isalnum is false. The function is
intended for testing for blank characters within a line. The standard blank characters are
the following: space (' ') and horizontal tab ('\t'). In the "C" locale, isblank returns true
only for the standard blank characters.
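A minimal sketch of the "C" locale behaviour described above, assuming a C99 or later library that already provides isblank(). The helper names c_locale_isblank and count_class are hypothetical: the first hard-codes the two standard blank characters (a real isblank() consults the current locale), and the second counts how many unsigned char values a classification function accepts, which makes the difference from isspace() visible.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference version of isblank() for the "C" locale only:
   true just for the standard blank characters space and horizontal tab. */
static bool c_locale_isblank(int c)
{
    return c == ' ' || c == '\t';
}

/* Count the values in the unsigned char range for which `classify`
   (isblank, isspace, ...) returns true in the current locale. */
static int count_class(int (*classify)(int))
{
    int n = 0;

    assert(classify != NULL);
    for (int c = 0; c <= UCHAR_MAX; c++)
        if (classify(c))
            n++;
    return n;
}
```

In a program that has not called setlocale(), the "C" locale is in effect, so count_class(isblank) should return 2 and count_class(isspace) should return 6 (space, '\t', '\n', '\v', '\f' and '\r').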


Table 1: Valid Character Class Combinations

In     Can also belong to
Class  upper   lower alpha digit space cntrl punct graph  print  xdigit blank

upper  +       +     A     x     x     x     x     A      A      +      x
lower  +       +     A     x     x     x     x     A      A      +      x 
alpha  +       +     +     x     x     x     x     A      A      +      x 
digit  x       x     x     +     x     x     x     A      A      A      x 
space  x       x     x     x     +     +     *     *      *      x      +
cntrl  x       x     x     x     +     +     x     x      x      x      +
punct  x       x     x     x     +     x     +     A      A      x      +
graph  +       +     +     +     +     x     +     +      A      +      +
print  +       +     +     +     +     x     +     +      +      +      +
xdigit +       +     +     +     x     x     x     A      A      +      x 
blank  x       x     x     x     A     +     *     *      *      x      +

NOTES:
Note 1: Explanation of codes:
A Automatically included; see text
+ Permitted
x Mutually exclusive
* See note 2


Note 2: The <space> character, which belongs to the space and blank classes, cannot belong
to punct or graph, but shall automatically belong to the print class. Other space or blank
characters can be classified as punct, graph and/or print.
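The blank row of Table 1 can be spot-checked against whatever locale is current. The sketch below (check_blank_row is a hypothetical name, and it tests only three of the row's rules) walks the unsigned char range and counts violations: every blank character must also be a space character ("A"), no blank character may be alphanumeric or a hexadecimal digit ("x"), and, per note 2, <space> must be printable but neither graph nor punct.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical check of part of the "blank" row of Table 1 against the
   current locale. Returns the number of rule violations found. */
static int check_blank_row(void)
{
    int violations = 0;

    /* The standard blank characters are blank in every locale. */
    assert(isblank(' ') && isblank('\t'));

    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (!isblank(c))
            continue;
        if (!isspace(c))                 /* blank implies space ("A") */
            violations++;
        if (isalnum(c) || isxdigit(c))   /* mutually exclusive ("x") */
            violations++;
    }

    /* Note 2: <space> is print, but neither graph nor punct. */
    if (!isprint(' ') || isgraph(' ') || ispunct(' '))
        violations++;

    return violations;
}
```

In the "C" locale check_blank_row() should return 0; calling it after setlocale(LC_ALL, "") applies the same checks to the user's locale.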


7.3.2 Character case mapping functions

C states the locale dependence of the case mapping functions only implicitly, through the
references to isupper and islower. The locale dependence can be made explicit by adding
"as specified by the current locale" to both the toupper() and tolower() descriptions, so
that the tolower description reads:

"If the argument is a character for which isupper is true and there is a corresponding
character as specified by the current locale for which islower is true, the tolower function
returns the corresponding character; otherwise, the argument is returned unchanged."
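As an illustration of this wording, the sketch below lowercases a string in place with tolower(); the function name lowercase_in_place is hypothetical. The cast to unsigned char matters: tolower() accepts only EOF or a value representable as unsigned char, so passing a plain char that happens to be negative is undefined. Characters with no lowercase counterpart in the current locale, such as digits and punctuation, come back unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: convert every character of s to lowercase as
   specified by the LC_CTYPE category of the current locale. */
static void lowercase_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}
```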


7.4 Localization

The POSIX-2 standard was approved after the C standard was adopted, and it contains a
format for specifying locales and accompanying charmaps. This is a valuable, standardized
way of specifying locales; on the other hand, many C compilers do not operate under a
POSIX operating system. It is therefore proposed to add the following non-normative
footnote to 7.4, after the section on the locale category macros (LC_ALL etc.):

"Footnote:
POSIX-2 specifies locale and charmap formats that may be used to specify locales for C."

A reference to the POSIX-2 standard should be added to the informative bibliography. 
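From the C side, a locale built from such a definition is still selected by name through setlocale(). The sketch below (try_locale is a hypothetical helper) checks whether a named locale is available and then restores the previous setting. Which names other than "C" exist, and whether they are compiled from POSIX-2 locale and charmap sources, is implementation-specific.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: report whether setlocale() accepts `name` for
   LC_ALL, then restore the locale that was in effect before the call. */
static bool try_locale(const char *name)
{
    assert(name != NULL);

    /* Save a copy of the current locale name: the string returned by
       setlocale() may be overwritten by the next call. */
    const char *cur = setlocale(LC_ALL, NULL);
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return false;
    strcpy(saved, cur);

    bool ok = setlocale(LC_ALL, name) != NULL;

    setlocale(LC_ALL, saved);   /* restore the previous locale */
    free(saved);
    return ok;
}
```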

7.4.2.1 p_sep_by_space and n_sep_by_space

POSIX has added a third value, 2, and it is therefore proposed to change the descriptions of
p_sep_by_space, n_sep_by_space, int_p_sep_by_space and int_n_sep_by_space to:

set to 0 if no space separates the currency_symbol from the value for a nonnegative
formatted monetary quantity, set to 1 if a space separates the symbol from the value, and
set to 2 if a space separates the symbol and the sign string, if adjacent.

Variations of this definition for the international and/or negative values are obtained by
substituting int_curr_symbol for currency_symbol and negative for nonnegative, respectively.

The table below gives example formats for each combination of p_cs_precedes, p_sign_posn
and p_sep_by_space, assuming that positive_sign is "+" and currency_symbol is "$".

                                         p_sep_by_space
                                       2           1             0

p_cs_precedes = 1    p_sign_posn = 0   ($ 1.25)    ($ 1.25)      ($1.25)
                     p_sign_posn = 1   + $1.25     +$ 1.25       +$1.25
                     p_sign_posn = 2   $1.25 +     $ 1.25+       $1.25+
                     p_sign_posn = 3   + $1.25     +$ 1.25       +$1.25
                     p_sign_posn = 4   $ +1.25     $+ 1.25       $+1.25

p_cs_precedes = 0    p_sign_posn = 0   (1.25 $)    (1.25 $)      (1.25$)
                     p_sign_posn = 1   +1.25 $     +1.25 $       +1.25$
                     p_sign_posn = 2   1.25$ +     1.25 $+       1.25$+
                     p_sign_posn = 3   1.25+ $     1.25 +$       1.25+$
                     p_sign_posn = 4   1.25$ +     1.25 $+       1.25$+
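The combinations can also be generated mechanically. The sketch below (format_nonneg is a hypothetical name; the amount is passed in already formatted, e.g. "1.25") follows the interpretation of p_sep_by_space that was finally adopted in C99 7.11.2.1: 1 means a space separates the value from the symbol, or from the symbol and sign together when they are adjacent; 2 means a space separates the symbol and sign when they are adjacent, and otherwise separates the sign from the value. Under that interpretation three cells of the p_sep_by_space = 2 column come out differently from the table above: ($1.25), (1.25$) and + 1.25$.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: place currency symbol `sym` and sign string `sign`
   around the formatted amount `value` according to cs_precedes, sign_posn
   and sep_by_space (C99 interpretation). Writes at most `size` bytes. */
static void format_nonneg(char *out, size_t size, const char *value,
                          const char *sym, const char *sign,
                          int cs_precedes, int sign_posn, int sep_by_space)
{
    assert(out != NULL && size > 0);
    assert(sign_posn >= 0 && sign_posn <= 4);
    assert(sep_by_space >= 0 && sep_by_space <= 2);

    const char *sp1 = (sep_by_space == 1) ? " " : "";  /* next to value */
    const char *sp2 = (sep_by_space == 2) ? " " : "";  /* next to sign  */
    char core[64];

    /* Symbol and sign are adjacent for sign_posn 3 and 4, and for 1 or 2
       when the sign lands on the symbol's side of the value. */
    int adjacent = sign_posn == 3 || sign_posn == 4 ||
                   (sign_posn == 1 && cs_precedes) ||
                   (sign_posn == 2 && !cs_precedes);

    if (adjacent) {
        char group[32];   /* e.g. "+$", "$ +" */
        if (sign_posn == 1 || sign_posn == 3)
            snprintf(group, sizeof group, "%s%s%s", sign, sp2, sym);
        else
            snprintf(group, sizeof group, "%s%s%s", sym, sp2, sign);
        if (cs_precedes)
            snprintf(out, size, "%s%s%s", group, sp1, value);
        else
            snprintf(out, size, "%s%s%s", value, sp1, group);
        return;
    }

    /* Not adjacent: join symbol and value first, then attach the sign. */
    if (cs_precedes)
        snprintf(core, sizeof core, "%s%s%s", sym, sp1, value);
    else
        snprintf(core, sizeof core, "%s%s%s", value, sp1, sym);

    if (sign_posn == 0)
        snprintf(out, size, "(%s)", core);
    else if (sign_posn == 1)
        snprintf(out, size, "%s%s%s", sign, sp2, core);
    else    /* sign_posn == 2 */
        snprintf(out, size, "%s%s%s", core, sp2, sign);
}
```

For example, format_nonneg(buf, sizeof buf, "1.25", "$", "+", 1, 1, 2) stores "+ $1.25" in buf, matching the corresponding cell of the table.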

 
7.5.2.1 int_curr_symbol different from currency_symbol

Because the order in which local currency is written may differ from the order in which
international currency is written, it is proposed to add the following six members to the
lconv struct:

int_p_cs_precedes
int_p_sep_by_space
int_n_cs_precedes
int_n_sep_by_space
int_p_sign_posn
int_n_sign_posn

with wording equivalent to that for p_cs_precedes etc., in which "currency_symbol" is
replaced by "int_curr_symbol" in 7.5.2.1[3].
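C99 eventually adopted these members (spelled int_p_sign_posn and int_n_sign_posn) as char members of struct lconv, where CHAR_MAX means that the value is not available in the current locale. The sketch below shows one way a program might consume int_p_cs_precedes; the helper name int_cs_precedes and its fallback to p_cs_precedes are assumptions of this example, not part of the proposal.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: the cs_precedes value to use when formatting a
   nonnegative amount with int_curr_symbol. The international member is
   preferred; if the locale marks it unavailable (CHAR_MAX), this sketch
   falls back to the local p_cs_precedes. */
static int int_cs_precedes(const struct lconv *lc)
{
    assert(lc != NULL);
    if (lc->int_p_cs_precedes != CHAR_MAX)
        return lc->int_p_cs_precedes;
    return lc->p_cs_precedes;
}
```

The same pattern would apply to the other five members. In the "C" locale both members are CHAR_MAX, so the helper returns CHAR_MAX there.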

The examples in section 7.5.2.1 need to be enhanced:
There cannot be a period after ITL.
The Netherlands uses a kind of small "f".
Norway has at least a space between "kr" and the value.
Examples are needed for all the new members, int_p_cs_precedes etc.

Differences from POSIX

This proposal introduces the following changes from POSIX (all additions):

Adds to lconv struct:
       int_p_cs_precedes
       int_p_sep_by_space
       int_n_cs_precedes
       int_n_sep_by_space
       int_p_sign_posn
       int_n_sign_posn

<END OF DOCUMENT>