Document: N1930
Author: Jens Gustedt, INRIA
Date: 2015-04-24
Subject: Controlling expression of _Generic primary expression

Controlling expression of _Generic primary expression

Summary

This is a follow-up to the now closed DR 423, which resulted in the clarification of the status of qualifications of rvalues.

This defect report aims to clarify the status of the controlling expression of a _Generic primary expression:

Does the controlling expression of a _Generic primary expression undergo any type of conversion to calculate the type that is used to do the selection?

Implementers have given different answers to this question; gcc (choice 1 in the following) on one side and clang and IBM (choice 2) on the other side went quite opposite ways, resulting in severe incompatibility for _Generic expressions that use qualifiers or arrays.

char const* a = _Generic("bla", char*: "blu");                 // clang error
char const* b = _Generic("bla", char[4]: "blu");               // gcc error
char const* c = _Generic((int const){ 0 }, int: "blu");        // clang error
char const* d = _Generic((int const){ 0 }, int const: "blu");  // gcc error
char const* e = _Generic(+(int const){ 0 }, int: "blu");       // both ok
char const* f = _Generic(+(int const){ 0 }, int const: "blu"); // both error

The last two lines, where gcc and clang agree, point to the nature of the problem: gcc treats all such expressions as rvalues and applies all applicable conversions of 6.3.2.1, that is, lvalue-to-rvalue and array-to-pointer conversions. clang treats them as lvalues.

Problem discussion

The question is whether or not the conversions of 6.3 apply to the controlling expression.

Integer promotions

Applying promotions would mean that we could not distinguish narrow integer types from int. There is no indication that the text implies that form of conversion, nor that anybody has proposed to use _Generic like this.
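The following is a minimal sketch of this point, not taken from any implementation; the macro name KIND, its numeric codes and the wrapper functions are hypothetical. A cast yields an rvalue of exactly the named type, so the selection should be the same under either reading, and promotion only happens where an operator asks for it.

```c
#include <assert.h>  /* static_assert */

/* Hypothetical classifier: 1 for short, 2 for int, 0 for anything else. */
#define KIND(X) _Generic((X), short: 1, int: 2, default: 0)

/* A cast yields an rvalue of exactly the named type, so none of the
   conversions of 6.3.2.1 changes it: short stays distinguishable. */
static_assert(KIND((short)0) == 1, "short rvalue selects short");

/* Promotion only happens when an operator such as unary + requests it. */
static_assert(KIND(+(short)0) == 2, "promoted short selects int");

/* Run-time wrappers, only so that the results can be inspected. */
int kind_of_short(void)          { return KIND((short)0); }
int kind_of_promoted_short(void) { return KIND(+(short)0); }
```

If integer promotions were applied to the controlling expression itself, the first assertion could not hold.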

Choice 1: Consequences of lvalue conversion

All conversions in 6.3.2.1 p2 describe what in normal CS language would be called the evaluation of an object. The paragraph has no provision for applying them to types alone. In particular it includes the special clause that uninitialized register variables lead to undefined behavior when undergoing lvalue conversion. As a consequence:

Any lvalue conversion of an uninitialized register variable leads to undefined behavior.

And thus

Under the hypothesis that the controlling expression undergoes lvalue conversion, any _Generic primary expression that uses an uninitialized register variable as controlling expression leads to undefined behavior.

Choice 2: Consequences of not doing conversions

In view of the resolution of DR 423 (rvalues drop qualifiers), using _Generic primary expressions with objects in the controlling expression may have results that appear surprising.

#define F(X) _Generic((X), char const: 0, char: 1, int: 2)
char const strc[] = "";
F(strc[0])   // -> 0
F(""[0])     // -> 1
F(+strc[0])  // -> 2

So the problem here is that there is no type-agnostic operator that results in a simple lvalue conversion of char const objects to char; all such operators also promote char to int.

Under the hypothesis that the controlling expression doesn't undergo conversion, any _Generic primary expression that uses a qualified lvalue of narrow type T can't directly trigger the association for T itself.
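One way to stay independent of this effect is to associate both the qualified and the unqualified type with the same result. The following is a minimal sketch under that assumption; the macro name IS_CHAR and the wrapper functions are hypothetical, and only const is covered (volatile or _Atomic variants would need entries of their own). Compilers that apply lvalue conversion may warn that the qualified entry can never be selected.

```c
#include <assert.h>  /* static_assert */

/* Hypothetical macro: 1 if X has type char, with or without const. */
#define IS_CHAR(X) _Generic((X), char: 1, char const: 1, default: 0)

char const strc[] = "";

/* strc[0] is an lvalue of type char const.  With lvalue conversion the
   association for char is taken, without it the one for char const;
   both give 1. */
static_assert(IS_CHAR(strc[0]) == 1, "qualified lvalue");
static_assert(IS_CHAR(""[0]) == 1, "unqualified lvalue");

/* Unary + promotes to int, so neither char association matches. */
static_assert(IS_CHAR(+strc[0]) == 0, "promoted value");

/* Run-time wrappers, only so that the results can be inspected. */
int is_char_qualified(void) { return IS_CHAR(strc[0]); }
int is_char_promoted(void)  { return IS_CHAR(+strc[0]); }
```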

Non-equivalence of the two approaches

For many areas the two approaches are feature equivalent, that is, both allow the same semantic concepts to be implemented, but with different syntax. Rewriting code that was written with one of the choices in mind to the other choice is in general not straightforward and probably can't be automated.

Application work around

Since C implementations today have already taken different paths for this feature, applications should be careful when using _Generic to remain in the intersection of these two interpretations. A certain number of design questions should be answered when implementing a type generic macro.

The following lists different strategies for common scenarios that can be used to code type generic macros that work with either choice 1 or choice 2.

Wide integers and floating point types

This is e.g. the case for the C library interfaces in <tgmath.h>. If we know that the possible type of the argument is restricted in such a way, the easiest is to apply the unary plus operator +, as in

  #define F(X) _Generic(+(X),             \
    default: doubleFunc,                  \
    int: intFunc,                         \
    ...                                   \
    _Complex long double: cldoubleFunc)(X)

  #define fabs(X) _Generic(+(X),          \
    default: fabs,                        \
    float: fabsf,                         \
    long double: fabsl)(X)

This + sign ensures an lvalue-to-rvalue conversion, and it makes compilation fail for pointer types or arrays. It also forcibly promotes narrow integer types, usually to int. For the latter case of fabs, all integer types map to the double version of the function, and the argument is eventually converted to double before the call is made.
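As an illustration, the first macro can be completed into something that compiles. This is only a sketch: the three back-end functions, the character tags they return, and the f_of_* wrappers are hypothetical stand-ins for real implementations.

```c
#include <assert.h>  /* static_assert */

/* Hypothetical back ends; each returns a tag naming the selected branch. */
static int doubleFunc(double x)       { (void)x; return 'd'; }
static int intFunc(int x)             { (void)x; return 'i'; }
static int ldoubleFunc(long double x) { (void)x; return 'l'; }

#define F(X) _Generic(+(X),             \
    default: doubleFunc,                \
    int: intFunc,                       \
    long double: ldoubleFunc)(X)

/* The selection itself can be checked at translation time. */
static_assert(_Generic(+(short)0, int: 1, default: 0),
              "unary + promotes short to int");

/* Narrow and qualified integers both end up in the int branch;
   float has no association of its own and takes the default. */
int f_of_short(void) { short s = 1;          return F(s); }
int f_of_cint(void)  { int const c = 2;      return F(c); }
int f_of_float(void) { float x = 3.0f;       return F(x); }
int f_of_ldbl(void)  { long double y = 4.0L; return F(y); }
```

Because the controlling expression +(X) is an rvalue in all cases, these selections should not depend on which of the two choices the compiler implements.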

Adding pointer types and converting arrays

If we also want to capture pointer types and convert arrays to pointers, we should use +0.

  #define F(X) _Generic((X)+0,            \
    default: doubleFunc,                  \
    char*: stringFunc,                    \
    char const*: stringFunc,              \
    int: intFunc,                         \
    ...                                   \
    _Complex long double: cldoubleFunc)(X)

This binary + ensures that any array is first converted to a pointer; the properties of 0 ensure that this constant works well with all the types that are to be captured, here. It also forcibly promotes narrow integer types, usually to int.
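A compilable sketch of this strategy could look as follows. The back-end functions, their return tags and the f_of_* wrappers are hypothetical, and the association list is cut down to a few types.

```c
#include <assert.h>  /* static_assert */

/* Hypothetical back ends; each returns a tag naming the selected branch. */
static int doubleFunc(double x)      { (void)x; return 'd'; }
static int intFunc(int x)            { (void)x; return 'i'; }
static int stringFunc(char const *s) { (void)s; return 's'; }

#define F(X) _Generic((X)+0,            \
    default: doubleFunc,                \
    char*: stringFunc,                  \
    char const*: stringFunc,            \
    int: intFunc)(X)

/* Binary + forces the array-to-pointer conversion of a string literal. */
static_assert(_Generic("abc"+0, char*: 1, default: 0),
              "array operand decays to a pointer");

int f_of_literal(void)     { return F("abc"); }
int f_of_const_array(void) { char const a[] = "x"; return F(a); }
int f_of_short(void)       { short s = 1; return F(s); }
int f_of_double(void)      { return F(3.5); }
```

The controlling expression (X)+0 is never evaluated; only its type matters, so the pointer arithmetic has no run-time effect.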

Converting arrays, only

If we know that a macro will only be used for array and pointer types, we can use the [] operator:

  #define F(X) _Generic(&((X)[0]),        \
    char*: stringFunc,                    \
    char const*: stringFunc,              \
    wchar_t*: wcsFunc,                    \
    ...                                   \
    )(X)

This operator applies only to array or pointer types and produces an error if presented with any integer type.
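The following sketch fills in the elided parts with hypothetical names (the two back-end functions, their return tags, and the f_of_* wrappers), just to show the selection at work for narrow and wide strings.

```c
#include <assert.h>  /* static_assert */
#include <stddef.h>  /* wchar_t */

/* Hypothetical back ends; each returns a tag naming the selected branch. */
static int stringFunc(char const *s) { (void)s; return 's'; }
static int wcsFunc(wchar_t const *s) { (void)s; return 'w'; }

#define F(X) _Generic(&((X)[0]),        \
    char*: stringFunc,                  \
    char const*: stringFunc,            \
    wchar_t*: wcsFunc,                  \
    wchar_t const*: wcsFunc)(X)

/* The address of the first element always has pointer type, whether
   X itself is an array or a pointer. */
static_assert(_Generic(&(L"ab"[0]), wchar_t*: 1, default: 0),
              "wide string literal yields wchar_t*");

int f_of_literal(void)     { return F("abc"); }
int f_of_const_array(void) { char const a[] = "x"; return F(a); }
int f_of_wide(void)        { return F(L"abc"); }
```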

Using qualifiers of types or arrays

If we want a macro that selects differently according to type qualification or according to different array size, we can use the & operator:

  #define F(X) _Generic(&(X),        \
    char**: stringFunc,              \
    char(*)[4]: string4Func,         \
    char const**: stringFunc,        \
    char const(*)[4]: string4Func,   \
    wchar_t**: wcsFunc,              \
    ...                              \
    )(X)
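To see the effect of the & operator in isolation, here is a minimal sketch that maps each pointer type to a number instead of a function. The macro name KIND, the numeric codes, the object names and the wrapper functions are all hypothetical.

```c
#include <assert.h>  /* static_assert */

/* Hypothetical classifier on the type of &(X). */
#define KIND(X) _Generic(&(X),          \
    char**: 1,                          \
    char const**: 2,                    \
    char(*)[4]: 3,                      \
    char const(*)[4]: 4,                \
    default: 0)

char *ptr;
char const *cptr;
char arr4[4];
char const carr4[4] = "abc";
char arr5[5];

/* Taking the address preserves both qualification and array size. */
static_assert(KIND(ptr) == 1, "pointer to char");
static_assert(KIND(cptr) == 2, "pointer to char const");
static_assert(KIND(arr4) == 3, "array of 4 char");
static_assert(KIND(carr4) == 4, "array of 4 char const");
static_assert(KIND(arr5) == 0, "other array sizes fall through");
static_assert(KIND("abc") == 3, "string literal of 4 characters");

/* Run-time wrappers, only so that the results can be inspected. */
int kind_ptr(void)   { return KIND(ptr); }
int kind_carr4(void) { return KIND(carr4); }
int kind_arr5(void)  { return KIND(arr5); }
```

Since & requires an lvalue operand, a macro built this way can't be applied to rvalues or to objects declared with the register storage class.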

Possible solutions

The above discussion describes what can be read from the text of C11 alone, and not the intent of the committee. I think that if the committee had wanted choice 2, the standard text would not have looked much different from what we have now. Since the intent of the committee to go for choice 1 also does not seem very clear from any additional text (e.g. minutes of the meetings), I think the reading of choice 2 should be the preferred one.

Suggested Technical Corrigendum (any choice)

Amend the list in footnote 121 for objects with register storage class. Change

Thus, the only operators that can be applied to an array declared with storage-class specifier register are sizeof and _Alignof.

to

Thus, an identifier with array type and declared with storage-class specifier register may only appear in primary expressions and as operand to sizeof and _Alignof.

Suggested Technical Corrigendum (Choice 2)
Change 6.5.1.1 p3, first sentence

The controlling expression of a generic selection is not evaluated and the type of that expression is used without applying any conversions described in Section 6.3.

Add _Generic to the exception list in 6.3.2.1 p3 to make it clear that array-to-pointer conversion applies to none of the controlling or association expressions if they are lvalues of array type.

Except when it is the controlling expression or an association expression of a _Generic primary expression, or is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

Also add a forward reference to _Generic in 6.3.2.

Suggested Technical Corrigendum (Choice 1)
If the intent of the committee had been choice 1 or similar, bigger changes of the standard would be indicated. I only list some of the areas that would need changes:

Also, add _Generic to the exception list in 6.3.2.1 p3 to make it clear that array-to-pointer conversion applies to none of the association expressions if they are lvalues of array type.

Except when it is an association expression of a _Generic expression, or is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

Suggested Technical Corrigendum (Status quo)
A third possibility would be to leave this leeway to implementations. I strongly object to that, but if so, I would suggest adding a phrase to 6.5.1.1 p3 like:

... in the default generic association. Whether or not the type of the controlling expression is determined as if any of conversions described in Section 6.3 are applied is implementation defined. None of the expressions ...


