Document Number: N892
Date: 13-Sept-1999

-Issue #1-

Author: Clive Feather
Date: 1999-06-27
Subject: default argument conversion of float _Complex

Summary
-------

For consistency with real floating types, the type float _Complex should
be promoted by the default argument promotions to double _Complex.

Suggested Technical Corrigendum
-------------------------------

Change 6.5.2.2p6 in part, from:

    and arguments that have type float are promoted to double.

to:

    and arguments that have a corresponding real type of float are
    promoted, without change of type domain, to a type whose
    corresponding real type is double.

-Issue #2-

Author: Clive Feather
Date: 1999-06-27
Subject: handling of imaginary types

Summary
-------

The handling of imaginary types in the Standard is somewhat inconsistent.
For example, they are not mentioned at all in 6.2.5 (other than a
footnote), but are treated as first-class types in 6.7.2. Annex G makes
certain assumptions about such types but these assumptions are not
supported by the Standard.

Details
-------

There are two reasonable approaches that could be followed. The first is
to remove all mention of imaginary types from the main text of the
Standard and put them all in Annex G. The second is to make the basic
properties of imaginary types part of the main language (while still
making them optional), leaving Annex G to handle the details of IEC 60559
imaginary types.

After some thought, the author of this DR feels that imaginary types are
experimental enough that the first approach is better and has worded the
Suggested Technical Corrigendum on that basis.

The keyword _Imaginary is mentioned in 6.4.1, 6.7.2, and 7.3.1. These
references - and any related text - are all to be removed and replacement
wording added to Annex G.

A new subclause G.4.4 is added. This specifies the practical implications
of giving imaginary types the same representation and alignment as real
floating types.

Suggested Technical Corrigendum
-------------------------------

Delete "_Imaginary" from the list of keywords in 6.4.1. If this is felt
to be too radical, instead add the following text to paragraph 2:

    The keyword _Imaginary is not used in the C language, but is
    reserved for specifying imaginary types such as described in
    Annex G.

Delete "_Imaginary" from 6.7.2p1 and the three imaginary cases from
6.7.2p2. Change 6.7.2p3 to read:

    The type specifier _Complex shall not be used if the
    implementation does not provide complex types.

Delete 7.3.1p3. Delete "imaginary" from 7.3.1p5. Replace 7.3.1p4 with:

    The macro I expands to _Complex_I.

Add a new paragraph before G.2p1:

    There is a new keyword _Imaginary used to specify imaginary types.
    It is used as a type-specifier within declaration-specifiers in the
    same way as _Complex is (thus "_Imaginary float" is a valid type
    name).

Add a new subclause G.4.4:

    G.4.4 Interchangeable values

    Though imaginary types are not compatible with the corresponding
    real type, values of one may be used where the other is expected in
    the following cases. In each case the value is converted to the
    value of the other type that has the same representation (that is,
    by multiplying by the imaginary unit when converting to an imaginary
    type, and by dividing by the imaginary unit when converting to a
    real type).

    - one type is the type of the parameter, and the other type the type
      of the argument, when a function is called without a prototype in
      scope; [*]

    - one type is the type of an argument corresponding to a trailing
      ellipsis in a function call and the other is specified as the type
      argument of an invocation of the va_arg macro;

    - one type is the type of an argument to a function such as fprintf
      or the type pointed to by an argument to a function such as
      fscanf, and the other is the type implied by the corresponding
      conversion specifier.

    [*] If a prototype is in scope, conversion is as if by assignment
    and the value will be converted to zero.

Replace G.6p1 with:

    The macros imaginary and _Imaginary_I are defined, respectively, as
    _Imaginary and a constant expression of type const float _Imaginary
    with the value of the imaginary unit. The macro I is defined to be
    _Imaginary_I (not _Complex_I as stated in 7.3). Notwithstanding the
    provisions of 7.1.3, a program may undefine and then perhaps
    redefine the macro imaginary.

Afternote
---------

If WG14 wishes to take the alternative approach of moving _Imaginary
types more firmly into the body of the Standard, then the following
areas would be affected.

- Do not make any of the above changes.
- Add text to 4p6 explaining that imaginary types are never required.
- Merge the text from G.2 into 6.2.5.
- Merge the existing text from G.4 into 6.3.1.
- Make the cases described in the new G.4.4 above further cases of the
  relevant subclauses (6.5.2.2, 7.15.1.1, 7.19.6.1, 7.19.6.2, 7.24.2.1,
  and 7.24.2.2).
- Move G.5.1p1 and G.5.2p1 into 6.5.5 and 6.5.6.
- Delete G.6p1.

-Issue #3-

Author: Clive Feather
Date: 1999-09-06
Subject: ambiguity in initialization

Summary
-------

When there is more than one initializer for the same object it is not
clear whether both initializers are actually evaluated. Wording changes
are proposed to clarify this.

Details
-------

Subclause 6.7.8 paragraph 19 reads:

    The initialization shall occur in initializer list order, each
    initializer provided for a particular subobject overriding any
    previously listed initializer for the same subobject; all subobjects
    that are not initialized explicitly shall be initialized implicitly
    the same as objects that have static storage duration.

Paragraph 23 reads:

    The order in which any side effects occur among the initialization
    list expressions is unspecified.

If the same object is initialized twice, as in:

    int a [2] = { f (0), f (1), [0] = f (2) };

the term "overriding" could be taken to mean that the first initializer
is ignored completely, or it could be taken to mean that the expression
is evaluated and then discarded. The proposed wording change assumes the
latter.

Suggested Technical Corrigendum
-------------------------------

Replace 6.7.8 paragraph 23 with:

    All the initialization list expressions are evaluated, even if the
    resulting value will be overridden, but the order in which any side
    effects occur is unspecified.

-Issue #4-

Author: Clive Feather
Date: 1999-09-11
Subject: binding of multibyte conversion state objects

Summary
-------

There is a general belief that multibyte conversion state objects (or at
least those associated with streams) are bound to a given locale when
first used. However, the Standard does not support this belief and in
fact contradicts it. The author believes that common practice matches
the belief and not the Standard, and therefore submits that the latter
should be changed.

Discussion
----------

Consider the following function to convert a file from one encoding to
another. The macro CONVERT is assumed to convert a wide character
between the two encodings - it is expected that this will often be a
no-op. Error checking has been omitted for clarity.

    void conv_file (char *fn1, char *fn2, char *loc1, char *loc2)
    {
        setlocale (LC_CTYPE, loc1);
        FILE *f1 = fopen (fn1, "r");
        wchar_t c = fgetwc (f1);
        setlocale (LC_CTYPE, loc2);
        FILE *f2 = fopen (fn2, "w");
        while (c != WEOF)
        {
            fputwc (CONVERT (c), f2);
            c = fgetwc (f1);
        }
    }

Most people would be surprised to discover that this code does not work
unless two extra setlocale calls are added to the inner loop:

        {
            setlocale (LC_CTYPE, loc2);
            fputwc (CONVERT (c), f2);
            setlocale (LC_CTYPE, loc1);
            c = fgetwc (f1);
        }

Similarly, consider a function to convert a block from one multibyte
encoding to another:

    void conv_data (char *p1, char *p2, char *loc1, char *loc2)
    {
        char *p1e = strchr (p1, '\0') + 1;
        setlocale (LC_CTYPE, loc1);
        mbstate_t mb1 = { 0 };
        wchar_t c;
        p1 += mbrtowc (&c, p1, p1e - p1, &mb1);
        setlocale (LC_CTYPE, loc2);
        mbstate_t mb2 = { 0 };
        p2 += wcrtomb (p2, CONVERT (c), &mb2);
        while (c != 0)
        {
            p1 += mbrtowc (&c, p1, p1e - p1, &mb1);
            p2 += wcrtomb (p2, CONVERT (c), &mb2);
        }
    }

Again this actually requires two setlocale calls in the inner loop.

Clause 7.24.6p3 reads, in part:

    If an mbstate_t object has been altered by any of the functions
    described in this subclause, and is then used with a different
    multibyte character sequence, or in the other conversion direction,
    or with a different LC_CTYPE category setting than on earlier
    function calls, the behavior is undefined.

Put another way, each mbstate_t object is initially "unbound" (if it is
initialized to zero) and then becomes "bound" by any call to a function
such as mbrtowc or wcrtomb. When "bound" it can only be used in the same
direction with the same string as originally bound, and only when the
LC_CTYPE category is that in effect when it was bound.

With ordinary mbstate_t objects this is an annoyance; one implication is
that a new object must be created every single time a new string is to
be converted (the Standard does not provide any way to "unbind" the
object).
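
As a hedged illustration of the "new object for every string" implication,
the sketch below gives each conversion its own zero-valued mbstate_t. The
helper name to_wide and its calling convention are assumptions made for
this example only; they are not part of the Standard or of this proposal,
and error handling is kept minimal.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string s into out, which has
   room for cap wide characters including the terminator, under the
   current LC_CTYPE setting. Returns the number of wide characters stored
   (excluding the terminator), or (size_t)-1 on failure. */
size_t to_wide(const char *s, wchar_t *out, size_t cap)
{
    mbstate_t st;
    size_t n = 0;
    size_t left = strlen(s) + 1;      /* include the terminating null */

    memset(&st, 0, sizeof st);        /* fresh, zero-valued state per string */
    while (n < cap) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, left, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;        /* encoding error or incomplete input */
        out[n] = wc;
        if (r == 0)
            return n;                 /* terminating null has been stored */
        n++;
        s += r;
        left -= r;
    }
    return (size_t)-1;                /* out is too small */
}
```

Because the state object never outlives one call, it is never reused with
a second string, which is the only pattern the current wording of 7.24.6p3
clearly permits.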

With the mbstate_t object inside a FILE structure it is even worse,
because it makes it impossible to (for example) write to a file, rewind
it, and then read the same file. Similarly, the internal mbstate_t
objects used when the mbstate_t pointer argument is set to NULL can be
used for only one string in the entire program!

Users of mbstate_t objects (including those in FILE structures) expect
them to continue to work even when the locale is changed, and expect to
be able to use them for more than a single purpose.

Potential solutions
-------------------

Any change to the Standard must not affect existing code. Luckily this
is fairly simple: existing code must only use the mbstate_t object in
the locale it was bound to and in the same way as when it was initially
bound. However, any change should also make the above code work.

One naive proposal is that, whenever an mbstate_t object returns to the
initial state, it becomes unbound and can then be rebound. However, in
most locales the initial state occurs often, and therefore there is a
major risk that the object will be bound to the wrong locale at that
time.

The actual proposal below works in four stages:

(1) Once an mbstate_t object is bound to a locale, it remains bound to
    that locale even after a call to setlocale(). [This gives meaning to
    code that is currently undefined, but does not affect any existing
    correct code.]

(2) Explicit mechanisms are provided to change the status of an
    mbstate_t object in several ways:

    - return to the initial state, making it available for future
      conversion in either direction [for the wide->multi direction this
      was already possible using wcrtomb, while for the multi->wide
      direction a minor change is made to mbrtowc utilising previously
      unspecified behaviour];

    - unbind the object, returning it to the initial state [done by
      assigning a zero value to it or with the new __mbsbind function];

    - bind to a new locale, returning the mbstate_t object to the
      initial state [done with the new __mbsbind function].

(3) Various changes are made to the way that the mbstate_t object hidden
    in a FILE is processed. The previous changes ensure that, once bound
    to a locale, it remains bound to it. It can be returned explicitly
    to the initial state by a call to fseek, and also returns to the
    initial state after input reaches end-of-file (these choices were
    made to correspond with the requirements of 7.19.5.3p6 for changing
    I/O direction). Finally a new __mbsfbind function is provided to
    unbind the object or bind it to the current locale.

(4) The internal mbstate_t objects associated with the mbrlen, mbrtowc,
    wcrtomb, mbsrtowcs, and wcsrtombs functions can only be used with
    the locale they initially bind to. Semantics are proposed to force
    the object to the unbound state; these use previously impossible
    cases where they exist, or in the case of wcrtomb are upwards
    compatible.

Efficiency
----------

Some concerns have been expressed that, on some implementations,
changing locale is a "heavyweight" operation and that such changes
should therefore be resisted. However, the proposed changes do not
actually alter the situation: the same locales will still be used at the
same time. What does become possible is for the mbstate_t to usefully
contain a pointer to cached locale information, and for the cache to be
maintained more effectively.
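
The bound/unbound life cycle of stages (1) and (2) can be pictured with a
small toy model. This is only a sketch under stated assumptions: the type
model_state, its fields, and the functions model_use and model_bind are
hypothetical names invented for this illustration, locales are
represented by plain strings, and the negative return for a bad loc value
is a choice made here. It is not the proposed __mbsbind interface and not
any real implementation's mbstate_t layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for an mbstate_t under the proposal. */
struct model_state {
    const char *bound;   /* locale the object is bound to; NULL if unbound */
    int shift;           /* 0 means the initial conversion state */
};

/* Stage (1): the first use binds an unbound object to the current locale;
   a bound object keeps using its own locale whatever the current setting
   is. Returns the locale name that the conversion would use. */
const char *model_use(struct model_state *st, const char *current)
{
    if (st->bound == NULL)
        st->bound = current;
    return st->bound;
}

/* Stage (2): toy counterpart of the proposed __mbsbind(ps, loc).
   loc == 0 unbinds, loc == 1 binds to the current locale; either way the
   object is put back in the initial conversion state. Rejecting other
   loc values with -1 is an assumption of this model. */
int model_bind(struct model_state *st, int loc, const char *current)
{
    if (st == NULL || (loc != 0 && loc != 1))
        return -1;
    st->bound = loc ? current : NULL;
    st->shift = 0;
    return 0;
}
```

In this model the bound field plays the role of the "pointer to cached
locale information" mentioned under Efficiency: once set, later
conversions need not consult the current LC_CTYPE setting at all.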

Suggested Technical Corrigendum
-------------------------------

(Changes concerning explicit mbstate_t objects.)

Change 7.24.6p3 to:

    [#3] The initial conversion state corresponds, for a conversion in
    either direction, to the beginning of a new multibyte character in
    the initial shift state. An mbstate_t object may be "unbound" or
    "bound". A zero-valued mbstate_t object is (at least) one way to
    describe an unbound object, and if an mbstate_t object is assigned
    such a value it becomes unbound. All unbound mbstate_t objects are
    in the initial conversion state. An unbound object can be used to
    initiate conversion involving any multibyte character sequence, in
    any LC_CTYPE category setting, and then becomes bound to that
    category setting. When a bound mbstate_t object is used with any of
    the functions described in this subclause, the category it is bound
    to is used irrespective of the current LC_CTYPE category setting.
    If an mbstate_t object has been altered by any of the functions
    described in this subclause so as to not be in the initial
    conversion state, and is then used with a different multibyte
    character sequence, or in the other conversion direction, than on
    the most recent such function call, the behavior is undefined.290)

Append to footnote 290:

    Furthermore, provided that the object is in the initial conversion
    state, it can then be used in converting a new string or in the
    other direction.

Add a new subclause 7.24.6.2.2:

    7.24.6.2.2 The __mbsbind function

    Synopsis

    [#1]
        #include <wchar.h>
        int __mbsbind(mbstate_t *ps, int loc);

    Description

    [#2] The value of loc shall be 0 or 1. If ps is not a null pointer,
    the pointed-to mbstate_t object is made unbound (if loc is 0) or
    bound to the current LC_CTYPE category setting in the initial
    conversion state (if loc is 1), irrespective of its previous state.

    Returns

    [#3] The __mbsbind function returns zero normally or a negative
    value if ps is a null pointer or some other error occurred.290a)

    290a The __mbsbind function is not required to detect any other
         errors.

Change 7.24.6.3p1 and 7.24.6.4p1 from:

    [...] which is initialized at program startup to the initial
    conversion state. [...]

to:

    [...] which is initialized at program startup to the unbound
    state. [...]

Change 7.24.6.3.2p2 to:

       [#2] If s is a null pointer, the mbrtowc function is equivalent
       to the call:

               mbrtowc(NULL, "", 1, ps)

    ++ except that the resulting state described is the initial
    ++ conversion state even if an encoding error occurred.290b)
       In this case, the values of the parameters pwc and n are ignored.

    ++ 290b The only possible return values are 0 and (size_t)-1.
    ++      The effect is reliably to set *ps to the initial conversion
    ++      state while remaining bound.

In 7.24.6.3.2p4, change "positive" to "<= n" (the two error values are
actually large and positive).

Append a footnote to 7.24.6.3.3p2:

    291a The effect is reliably to set *ps to the initial conversion
         state while remaining bound.

(Changes concerning mbstate_t objects associated with streams.)

Append to 7.19.2p6:

    If a wide character input function encounters end-of-file, or after
    a successful call to the fseek function, the mbstate_t object
    associated with the stream describes the initial conversion state.

Append to the last sentence of 7.19.9.2p5:

    and if the stream is wide-oriented the associated mbstate_t object
    shall be set to the initial conversion state.

In 7.24.3.1p2, change:

to:

       [...] If the stream is at end-of-file, the end-of-file
       indicator for the stream
    ++ is set, the mbstate_t object associated with the stream is
    ++ set to the initial conversion state, and fgetwc returns WEOF.
       [...]

Add a new subclause 7.24.6.2.3:

    7.24.6.2.3 The __mbsfbind function

    Synopsis

    [#1]
        #include <stdio.h>
        #include <wchar.h>
        int __mbsfbind(FILE *stream, int loc);

    Description

    [#2] The __mbsfbind function is equivalent to:

        if (fwide(stream, 1) > 0)
            __mbsbind(&mbsobj, loc);

    where mbsobj is the mbstate_t object associated with stream when it
    is wide-oriented, except that the returned value can represent
    different errors.

    Returns

    [#3] The __mbsfbind function returns zero normally or a negative
    value if some error occurred.290c)

    290c The __mbsfbind function is not required to detect any errors.

(Changes associated with internal mbstate_t objects.)

Append to 7.24.6.3.2p3:

    As a special case, if n is (size_t)-1 then ps becomes unbound
    irrespective of its previous state and an unspecified value is
    returned.

Append to 7.24.6.3.3p2:

    If additionally ps is a null pointer, the internal mbstate_t object
    becomes unbound irrespective of its previous state.

Append to 7.24.6.4p2:

    As a special case, if src is a null pointer then the normal
    behaviour of the function is ignored and instead ps becomes unbound
    irrespective of its previous state; an unspecified value is
    returned.