ISO/IEC JTC1/SC22/WG14 N892

	Document Number: N892	
	Date: 13-Sept-1999

	-Issue #1-

Author: Clive Feather <clive@demon.net>
Date: 1999-06-27

Subject: default argument conversion of float _Complex


Summary
-------

For consistency with real floating types, the type float _Complex should be
promoted by the default argument promotions to double _Complex.


Suggested Technical Corrigendum
-------------------------------

Change 6.5.2.2p6 in part, from:

    and arguments that have type float are promoted to double.

to:

    and arguments that have a corresponding real type of float are
    promoted, without change of type domain, to a type whose corresponding
    real type is double.
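
As an illustration only (it is not part of the proposed wording), the
following sketch shows the conversion the change would make implicit. The
names sum_complex and sum_two_float_complex are hypothetical. The variadic
function reads each argument as double complex, so a caller holding
float complex values converts them explicitly, which is what the proposed
default argument promotion would otherwise do.

```c
#include <assert.h>
#include <complex.h>
#include <stdarg.h>

/* Hypothetical helper: sums n complex arguments, each of which must
   arrive as double complex. */
double complex sum_complex(int n, ...)
{
    assert(n >= 0);
    va_list ap;
    double complex total = 0.0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, double complex);
    va_end(ap);
    return total;
}

/* Converts explicitly before the call; the proposed promotion would
   perform the same conversion without the casts. */
double complex sum_two_float_complex(float complex a, float complex b)
{
    return sum_complex(2, (double complex)a, (double complex)b);
}
```

Writing the casts out keeps the call well defined whether or not the
implementation promotes float complex arguments itself.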


	-Issue #2-

Author: Clive Feather <clive@demon.net>
Date: 1999-06-27

Subject: handling of imaginary types


Summary
-------

The handling of imaginary types in the Standard is somewhat inconsistent. For
example, they are not mentioned at all in 6.2.5 (other than a footnote), but
are treated as first-class types in 6.7.2. Annex G makes certain assumptions
about such types but these assumptions are not supported by the Standard.


Details
-------

There are two reasonable approaches that could be followed. The first is to
remove all mention of imaginary types from the main text of the Standard
and put them all in Annex G.  The second is to make the basic properties of
imaginary types part of the main language (while still making them
optional), leaving Annex G to handle the details of ISO 60559 imaginary
types.

After some thought, the author of this DR feels that imaginary types are
experimental enough that the first approach is better and has worded the
Suggested Technical Corrigendum on that basis.

The keyword _Imaginary is mentioned in 6.4.1, 6.7.2, and 7.3.1. These
references - and any related text - are all to be removed and replacement
wording added to Annex G.

A new subclause G.4.4 is added. This specifies the practical implications
of giving imaginary types the same representation and alignment as real
floating types.


Suggested Technical Corrigendum
-------------------------------

Delete "_Imaginary" from the list of keywords in 6.4.1. If this is felt to
be too radical, instead add the following text to paragraph 2:

    The keyword _Imaginary is not used in the C language, but is reserved
    for specifying imaginary types such as described in Annex G.

Delete "_Imaginary" from 6.7.2p1 and the three imaginary cases from 6.7.2p2.
Change 6.7.2p3 to read:

    The type specifier _Complex shall not be used if the implementation
    does not provide complex types.

Delete 7.3.1p3. Delete "imaginary" from 7.3.1p5. Replace 7.3.1p4 with:

    The macro
        I
    expands to _Complex_I.

Add a new paragraph before G.2p1:

    There is a new keyword _Imaginary used to specify imaginary types.
    It is used as a type-specifier within declaration-specifiers in the
    same way as _Complex is (thus "_Imaginary float" is a valid type name).

Add a new subclause G.4.4:

    G.4.4 Interchangeable values

    Though imaginary types are not compatible with the corresponding real
    type, values of one may be used where the other is expected in the
    following cases. In each case the value is converted to the value of the
    other type that has the same representation (that is, by multiplying by
    the imaginary unit when converting to an imaginary type, and by dividing
    by the imaginary unit when converting to a real type).
    - one type is the type of the parameter, and the other type the
      type of the argument, when a function is called without a prototype
      in scope; [*]
    - one type is the type of an argument corresponding to a trailing
      ellipsis in a function call and the other is specified as the
      type argument of an invocation of the va_arg macro;
    - one type is the type of an argument to a function such as fprintf
      or the type pointed to by an argument to a function such as fscanf,
      and the other is the type implied by the corresponding conversion
      specifier.

    [*] If a prototype is in scope, conversion is as if by assignment and
    the value will be converted to zero.

Replace G.6p1 with:

    The macros
        imaginary
    and
        _Imaginary_I
    are defined, respectively, as _Imaginary and a constant expression of
    type const float _Imaginary with the value of the imaginary unit.
    The macro
        I
    is defined to be _Imaginary_I (not _Complex_I as stated in 7.3).
    Notwithstanding the provisions of 7.1.3, a program may undefine and
    then perhaps redefine the macro imaginary.


Afternote
---------

If WG14 wishes to take the alternative approach of moving _Imaginary types
more firmly into the body of the Standard, then the following areas would
be affected.
- Do not make any of the above changes.
- Add text to 4p6 explaining that imaginary types are never required.
- Merge the text from G.2 into 6.2.5.
- Merge the existing text from G.4 into 6.3.1.
- Make the cases described in the new G.4.4 above further cases of the
  relevant subclauses (6.5.2.2, 7.15.1.1, 7.19.6.1, 7.19.6.2, 7.24.2.1,
  and 7.24.2.2).
- Move G.5.1p1 and G.5.2p1 into 6.5.5 and 6.5.6.
- Delete G.6p1.

	-Issue #3-

Author: Clive Feather <clive@demon.net>
Date: 1999-09-06

Subject: ambiguity in initialization


Summary
-------

When there is more than one initializer for the same object it is not
clear whether both initializers are actually evaluated. Wording changes
are proposed to clarify this.


Details
-------

Subclause 6.7.8 paragraph 19 reads:

    The initialization shall occur in initializer list order, each
    initializer provided for a particular subobject overriding any
    previously listed initializer for the same subobject; all subobjects
    that are not initialized explicitly shall be initialized implicitly
    the same as objects that have static storage duration.

Paragraph 23 reads:

    The order in which any side effects occur among the initialization
    list expressions is unspecified.

If the same object is initialized twice, as in:

    int a [2] = { f (0), f (1), [0] = f (2) };

the term "overriding" could be taken to mean that the first initializer is
ignored completely, or it could be taken to mean that the expression is
evaluated and then discarded. The proposed wording change assumes the
latter.
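
The difference between the two readings can be observed by counting calls.
The sketch below is an illustration only; the function name
count_initializer_calls and the counter are hypothetical. Under the reading
assumed by this proposal the count is 3; an implementation that ignores the
overridden initializer completely would report 2. The values stored in the
array are the same in both cases.

```c
#include <assert.h>
#include <stddef.h>

static int calls;               /* number of times f has been called */

static int f(int x)
{
    ++calls;
    return x * 10;
}

/* Initializes an array in which element 0 has two initializers, stores
   the resulting elements through first and second, and returns the
   number of calls to f that the initialization made. */
int count_initializer_calls(int *first, int *second)
{
    assert(first != NULL && second != NULL);
    calls = 0;
    int a[2] = { f(0), f(1), [0] = f(2) };
    *first = a[0];
    *second = a[1];
    return calls;
}
```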


Suggested Technical Corrigendum
-------------------------------

Replace 6.7.8 paragraph 23 with:

    All the initialization list expressions are evaluated, even if the
    resulting value will be overridden, but the order in which any side
    effects occur is unspecified.


	-Issue #4-

Author: Clive Feather <clive@demon.net>
Date: 1999-09-11

Subject: binding of multibyte conversion state objects


Summary
-------

There is a general belief that multibyte conversion state objects (or at
least those associated with streams) are bound to a given locale when first
used. However, the Standard does not support this belief and in fact
contradicts it. The author believes that common practice matches the belief
and not the Standard, and therefore submits that the latter should be
changed.


Discussion
----------

Consider the following function to convert a file from one encoding to
another. The macro CONVERT is assumed to convert a wide character between
the two encodings - it is expected that this will often be a no-op. Error
checking has been omitted for clarity.

    void conv_file (char *fn1, char *fn2, char *loc1, char *loc2)
    {
        setlocale (LC_CTYPE, loc1);
        FILE *f1 = fopen (fn1, "r");
        wint_t c = fgetwc (f1);    /* wint_t so that WEOF can be held */

        setlocale (LC_CTYPE, loc2);
        FILE *f2 = fopen (fn2, "w");

        while (c != WEOF)
        {
            fputwc (CONVERT (c), f2);
            c = fgetwc (f1);
        }
    }

Most people would be surprised to discover that this code does not work
unless two extra setlocale calls are added to the inner loop:

        {
            setlocale (LC_CTYPE, loc2);
            fputwc (CONVERT (c), f2);
            setlocale (LC_CTYPE, loc1);
            c = fgetwc (f1);
        }

Similarly, consider a function to convert a block from one multibyte
encoding to another:

    void conv_data (char *p1, char *p2, char *loc1, char *loc2)
    {
        char *p1e = strchr (p1, '\0') + 1;
        setlocale (LC_CTYPE, loc1);
        mbstate_t mb1 = { 0 };
        wchar_t c;
        p1 += mbrtowc (&c, p1, p1e - p1, &mb1);

        setlocale (LC_CTYPE, loc2);
        mbstate_t mb2 = { 0 };
        p2 += wcrtomb (p2, CONVERT (c),  &mb2);

        while (c != 0)
        {
            p1 += mbrtowc (&c, p1, p1e - p1, &mb1);
            p2 += wcrtomb (p2, CONVERT (c),  &mb2);
        }
    }

Again this actually requires two setlocale calls in the inner loop.

Clause 7.24.6p3 reads, in part:

    If an mbstate_t object has been altered by any of the functions
    described in this subclause, and is then used with a different
    multibyte character sequence, or in the other conversion direction,
    or with a different LC_CTYPE category setting than on earlier
    function calls, the behavior is undefined.

Put another way, each mbstate_t object is initially "unbound" (if it is
initialized to zero) and then becomes "bound" by any call to a function
such as mbrtowc or wcrtomb. When "bound" it can only be used in the
same direction with the same string as originally bound, and only when the
LC_CTYPE category is that in effect when it was bound.

With ordinary mbstate_t objects this is an annoyance; one implication is
that a new object must be created every single time a new string is to be
converted (the Standard does not provide any way to "unbind" the object).
With the mbstate_t object inside a FILE structure it is even worse, because
it makes it impossible to (for example) write to a file, rewind it, and
then read the same file. Similarly, the internal mbstate_t objects used
when the mbstate_t pointer argument is set to NULL can be used for only
one string in the entire program!
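
For ordinary objects, the usage that the existing wording does permit can
be sketched as follows: a fresh zero-valued mbstate_t for every string, used
in one direction only and without any intervening change of locale. The
function name count_mb_chars is hypothetical and the code is an
illustration, not proposed text.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Counts the multibyte characters in s under the current LC_CTYPE
   setting.  Returns (size_t)-1 on an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    assert(s != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);      /* new object, initial state */

    size_t remaining = strlen(s);
    size_t count = 0;
    while (remaining > 0) {
        size_t r = mbrtowc(NULL, s, remaining, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;      /* encoding error or incomplete */
        if (r == 0)
            break;                  /* null character reached */
        s += r;
        remaining -= r;
        count++;
    }
    return count;
}
```

Because the object is local to the call, it is never reused with a second
string or a second locale, which is the only pattern the quoted paragraph
clearly allows.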

Users of mbstate_t objects (including those in FILE structures) expect them
to continue to work even when the locale is changed, and expect to be able
to use them for more than a single purpose.


Potential solutions
-------------------

Any change to the Standard must not affect existing code. Luckily this
is fairly simple: existing correct code can only use the mbstate_t object
in the locale it was bound to and in the same way as when it was initially
bound. However, any change should also make the above code work.

One naive proposal is that, whenever an mbstate_t object returns to the
initial state, it becomes unbound and can then be rebound. However, in most
locales the initial state occurs often, and therefore there is a major risk
that the object will be bound to the wrong locale at that time.

The actual proposal below works in four stages:

(1) Once an mbstate_t object is bound to a locale, it remains bound to
that locale even after a call to setlocale(). [This gives meaning to code
that is currently undefined, but does not affect any existing correct code.]

(2) Explicit mechanisms are provided to change the status of an mbstate_t
object in several ways:
- return to the initial state, making it available for future conversion
  in either direction [for the wide->multi direction this was already
  possible using wcrtomb, while for the multi->wide direction a minor change
  is made to mbrtowc utilising previously unspecified behaviour];
- unbind the object, returning it to the initial state [done by assigning
  a zero value to it or with the new __mbsbind function];
- bind to a new locale, returning the mbstate_t object to the initial state
  [done with the new __mbsbind function].

(3) Various changes are made to the way that the mbstate_t object hidden
in a FILE is processed. The previous changes ensure that, once bound to a
locale, it remains bound to it. It can be returned explicitly to the
initial state by a call to fseek, and also returns to the initial state
after input reaches end-of-file (these choices were made to correspond
with the requirements of 7.19.5.3p6 for changing I/O direction). Finally
a new __mbsfbind function is provided to unbind the object or bind it to
the current locale.

(4) The internal mbstate_t objects associated with the mbrlen, mbrtowc,
wcrtomb, mbsrtowcs, and wcsrtombs functions can only be used with the
locale they initially bind to. Semantics are proposed to force the object
to the unbound state; these use previously impossible cases where they
exist, or in the case of wcrtomb are upwards compatible.
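
The intended "bound" behaviour of stage (1) can be modelled in user code
today. The sketch below is an illustration under stated assumptions, not
the proposed mechanism: struct bound_mbstate, bound_unbind, and
bound_mbrtowc are hypothetical names, and the model switches locale with
setlocale around every call, which a real implementation would avoid. The
first use binds the object to the current LC_CTYPE setting; later uses
convert in that setting whatever the current one is.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

#define LOCALE_NAME_MAX 128

/* Hypothetical model: a conversion state plus the name of the LC_CTYPE
   setting it is bound to.  An empty name means "unbound". */
struct bound_mbstate {
    mbstate_t state;
    char locale[LOCALE_NAME_MAX];
};

/* Makes the object unbound and in the initial conversion state. */
void bound_unbind(struct bound_mbstate *b)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
}

/* Like mbrtowc, but converts in the setting the object is bound to. */
size_t bound_mbrtowc(wchar_t *pwc, const char *s, size_t n,
                     struct bound_mbstate *b)
{
    assert(b != NULL);

    char saved[LOCALE_NAME_MAX];
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return (size_t)-1;
    strcpy(saved, cur);             /* the returned string may be reused */

    if (b->locale[0] == '\0')
        strcpy(b->locale, saved);   /* first use: bind to current setting */
    else if (setlocale(LC_CTYPE, b->locale) == NULL)
        return (size_t)-1;          /* bound setting no longer available */

    size_t r = mbrtowc(pwc, s, n, &b->state);
    setlocale(LC_CTYPE, saved);     /* restore the caller's setting */
    return r;
}
```

In this model the mbstate_t member is only ever passed to mbrtowc under the
setting it was first used with, so the model itself stays within the
existing rules.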


Efficiency
----------

Some concerns have been expressed that, on some implementations, changing
locale is a "heavyweight" operation and that such changes should therefore
be resisted. However, the proposed changes do not actually alter the
situation: the same locales will still be used at the same time. What does
become possible is for the mbstate_t to usefully contain a pointer to
cached locale information, and for the cache to be maintained more
effectively.


Suggested Technical Corrigendum
-------------------------------

(Changes concerning explicit mbstate_t objects.)

Change 7.24.6p3 to:

    [#3] The initial conversion state corresponds, for a
    conversion in either direction, to the beginning of a new
    multibyte character in the initial shift state. An mbstate_t
    object may be "unbound" or "bound". A zero-valued mbstate_t
    object is (at least) one way to describe an unbound object,
    and if an mbstate_t object is assigned such a value it
    becomes unbound. All unbound mbstate_t objects are in the
    initial conversion state. An unbound object can be used to
    initiate conversion involving any multibyte character
    sequence, in any LC_CTYPE category setting, and then becomes
    bound to that category setting. When a bound mbstate_t
    object is used with any of the functions described in this
    subclause, the category it is bound to is used irrespective
    of the current LC_CTYPE category setting. If an mbstate_t
    object has been altered by any of the functions described in
    this subclause so as to not be in the initial conversion
    state, and is then used with a different multibyte character
    sequence, or in the other conversion direction, than on the
    most recent such function call, the behavior is undefined.290)

Append to footnote 290:

    Furthermore, provided that the object is in the initial
    conversion state, it can then be used in converting a new
    string or in the other direction.
 
Add a new subclause 7.24.6.2.2:

    7.24.6.2.2  The __mbsbind function

    Synopsis

    [#1]

            #include <wchar.h>
            int __mbsbind(mbstate_t *ps, int loc);

    Description

    [#2] The value of loc shall be 0 or 1. If ps is not a null
    pointer, the pointed-to mbstate_t object is made unbound
    (if loc is 0) or bound to the current LC_CTYPE category
    setting in the initial conversion state (if loc is 1),
    irrespective of its previous state.

    Returns

    [#3] The __mbsbind function returns zero normally or a
    negative value if ps is a null pointer or some other error
    occurred.290a)

    290a The __mbsbind function is not required to detect any
    other errors.

Change 7.24.6.3p1 and 7.24.6.4p1 from:

    [...] which is initialized at program startup to the initial
    conversion state. [...]
to:
    [...] which is initialized at program startup to the unbound
    state. [...]

Change 7.24.6.3.2p2 to:

    [#2] If s is a null pointer, the mbrtowc function is
    equivalent to the call:

                    mbrtowc(NULL, "", 1, ps)

++  except that the resulting state described is the initial
++  conversion state even if an encoding error occurred.290b)

    In this case, the values of the parameters pwc and n are
    ignored.

++  290b The only possible return values are 0 and (size_t)-1.
++  The effect is reliably to set *ps to the initial conversion
++  state while remaining bound.

In 7.24.6.3.2p4, change "positive" to "<= n" (the two error values are
actually large and positive).
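
(Illustration, not part of the proposed text.) Because size_t is unsigned,
the values (size_t)-1 and (size_t)-2 compare greater than zero, so a test
for a "positive" result does not separate them from a byte count. A caller
therefore checks the two error values first, as in the sketch below; the
names enum mb_result and classify_mbrtowc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum mb_result { MB_NUL, MB_CHAR, MB_INCOMPLETE, MB_INVALID };

/* Classifies the value r returned by a call mbrtowc(pwc, s, n, ps). */
enum mb_result classify_mbrtowc(size_t r, size_t n)
{
    if (r == (size_t)-1)
        return MB_INVALID;      /* encoding error */
    if (r == (size_t)-2)
        return MB_INCOMPLETE;   /* n bytes do not complete a character */
    if (r == 0)
        return MB_NUL;          /* the null wide character was stored */
    assert(r <= n);             /* a byte count never exceeds n */
    return MB_CHAR;
}
```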

Append a footnote to 7.24.6.3.3p2:

    291a  The effect is reliably to set *ps to the initial
    conversion state while remaining bound.


(Changes concerning mbstate_t objects associated with streams.)

Append to 7.19.2p6:

    If a wide character input function encounters end-of-file, or
    after a successful call to the fseek function, the mbstate_t
    object associated with the stream describes the initial
    conversion state.

Append to the last sentence of 7.19.9.2p5:

    and if the stream is wide-oriented the associated mbstate_t
    object shall be set to the initial conversion state.

In 7.24.3.1p2, change:

to:
    [...] If the stream
    is at end-of-file, the end-of-file indicator for the stream
++  is set, the mbstate_t object associated with the stream is
++  set to the initial conversion state,
    and fgetwc returns WEOF. [...]

Add a new subclause 7.24.6.2.3:

    7.24.6.2.3  The __mbsfbind function

    Synopsis

    [#1]

            #include <stdio.h>
            #include <wchar.h>
            int __mbsfbind(FILE *stream, int loc);

    Description

    [#2] The __mbsfbind function is equivalent to:

            if (fwide(stream, 1) > 0)
                __mbsbind(&mbsobj, loc);

    where mbsobj is the mbstate_t object associated with stream
    when it is wide-oriented, except that the returned value
    can represent different errors.

    Returns

    [#3] The __mbsfbind function returns zero normally or a
    negative value if some error occurred.290c)

    290c The __mbsfbind function is not required to detect any
    errors.
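
(Illustration, not part of the proposed text.) The guard in the equivalence
above relies on the existing fwide function, which attempts to make a
stream wide-oriented when its second argument is positive and returns a
positive value if the stream is wide-oriented after the call. A minimal
sketch of that guard alone, with the hypothetical name make_wide:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if stream is wide-oriented after the call (either because
   it already was or because it had no orientation yet), 0 otherwise. */
int make_wide(FILE *stream)
{
    assert(stream != NULL);
    return fwide(stream, 1) > 0;
}
```

A stream that has already been given byte orientation is left unchanged by
fwide, so in that case the guard fails and the proposed __mbsfbind would
not touch any conversion state.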


(Changes associated with internal mbstate_t objects.)

Append to 7.24.6.3.2p3:

    As a special case, if n is (size_t)-1 then ps becomes unbound
    irrespective of its previous state and an unspecified value
    is returned.

Append to 7.24.6.3.3p2:

    If additionally ps is a null pointer, the internal mbstate_t
    object becomes unbound irrespective of its previous state.

Append to 7.24.6.4p2:

    As a special case, if src is a null pointer then the normal
    behaviour of the function is ignored and instead ps becomes
    unbound irrespective of its previous state; an unspecified
    value is returned.