N3695
Strict Compatibility for Translation-Time Type Matching Improvements

Published Proposal,

Previous Revisions:
None
Authors:
Paper Source:
GitHub
Issue Tracking:
GitHub
Project:
ISO/IEC 9899 Programming Languages — C, ISO/IEC JTC1/SC22/WG14
Proposal Category:
Change Request, Feature Request
Target:
C2y

Abstract

The recent changes to generic selection have allowed an undefined behavior that was only available to functions as a necessary evil to now leak into generic. This paper attempts to provide some stricter constraints without jeopardizing the use cases of the necessary feature.

1. Changelog

1.1. Revision 0 - January 7th, 2025

2. Introduction and Motivation

There are several strange hiccups and problems with _Generic as it concerns constant sized arrays, variable length arrays, _BitInt, and other (potentially new) feature sets. Aaron Ballman’s [N3260] provided a way to do direct type matching, which eliminated some of these concerns when using a type as the controlling indicator for generic selection. But, this was not enough to stop the few situations where implementation quirks were exposed and the inadequacies of type compatibility for a compile-time, zero run-time cost feature that _Generic was meant to be.

Additionally, the integration of Martin Uecker’s [N3348] has let an existing Undefined Behavior leak into _Generic by allow Variable-Length Arrays (VLAs) of non-constant size match with arrays of both constant and non-constant sizes (including ones whose size does not match the input, perhaps at multiple dimensions). The problem was acknowledged during the integration of [N3348] by WG14, but it was voted in with the promise that a paper would provide a better array matching capability later on.

There was some degree of agreement to pursue a similar-but-differently-specified approach to matching that was contained in [N3441]. Unfortunately, a revision of this paper was not ready for WG14 by the previous meeting, and therefore it was skipped over and WG14 simply settled on Martin Uecker’s [N3348]. At the meeting, it was voiced that if it was possible, another paper doing better matching should be brought forward, and one of us (JeanHeyd) volunteered to produce a paper that was in line with Martin Uecker’s design in [N3348].

The core approach for multidimensional matching and stricter matching from our [N3441] is still possible even after the integration of Martin Uecker’s tweaks, repurposing the use of [*] and [] in multidimensional matching to be more in line with what Martin Uecker designed and what WG14 has accepted in the current Working Draft of the C Standard. This paper sets out performing that behavior and extending the concept even to _BitInt, to aid with improved _Generic matching.

2.1. Unusual Array Behavior

Consider the following:

int main () {
  int arr[10] = {};
  int result = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  return result;
}

This works just fine: constant sized arrays are considered compatible only with arrays of the same type and constant size. The above programs compiles, runs, returns 0 dependably, and exits. Consider the same program, however, with a variable length array for the controlling type of the generic selection expression:

int main () {
  int n = 20;
  int vla[n] = {};
  int result = _Generic(typeof(vla),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  return result;
}

This program is fine even with the typeof(vla) thanks to the latest changes in the C Working Draft ([N3348]). The problem is after that, as every non-default branch of this generic selection is considered a match. Every variable length array is compatible with every other kind of array insofar as the compatibility rules are concerned. This provokes a constraint violation after the whole _Generic, in that only one branch of a generic selection expression may match the controlling type (or none, in which case there must be a default branch).

[N3348] exacerbated this problem by not only leaving the compatibility rules around this matter unresolved, but introducing variable length array types as a plausible branch for generic selection by stripping the constraint out:

int main () {
  int n = 20;
  int vla[n] = {};
  int result = _Generic(typeof(vla),
    int[20]: 0, // VLA... matches?
    int[11]: 1,
    int[10]: 2,
    int[ n]: 3, // VLA matches?? Double-compatibility issue??
    default: 4
  );
  return result;
}

Unfortunately, this too results in the same problem: all of these branches are considered compatible with one another under the changes and direction that [N3348] added to the C standard around August 2025. Even if one went back to matching on constant sized arrays for the controlling type, this code would still not compile because the VLA branch is considered compatible with all of the other type branches in the list: the compiler would reject the code still as no two generic selection branches may contain a compatible type, either:

int main () {
  int n = 20;
  int arr[20] = {};
  int result = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[20]: 4,
    int[ n]: 3, // compiler error: non-distinct branch from
                 // every other generic selection branch
    default: 4
  );
  return result;
}

This continues to deteriorate when using int[*] to match "any-sized integer array"; both of them are compatible with one another and they both match on arrays that are either variable length arrays or constant sized arrays. Nominally, this might not be a problem, except there is further issue: the compatibility rules themselves have Bad Behavior on them even if you strip out all of the compatible match branches and only have one:

int main () {
  int n = 10;
  int arr[n] = {}; // variable length array
  int result = _Generic(typeof(arr),
    int[11] : 0, // this matches for some reason???
    default: 1
  );
  return result;
}

This program returns 0, which makes no sense from the perspective of _Generic, which is a translation time-capable matching algorithm. The sizes do not match, but because we defined this in terms of compatibility (and all constant sized and variable length arrays are compatible with one another) we have introduced undefined behavior here. Worse, this gives the impression that the array has the size 11 when it clearly does not. This is easily spotted in these simple toy programs, but is far less usable when applied to larger programs with much more complex control flow and no ahead-of-time knowable constant values.

The acceptance of [N3348] made this situation slightly worse by allowing variable length arrays to be put inside of _Generic as well, leading to a situation where variable length arrays can easily match array types that are not of the same length.

int main () {
  int n = 10;
  int arr[20] = {};
  int result = _Generic(typeof(arr),
    int[n] : 0, // fixed-size arrays now match against any variable length array
    default: 1
  );
  return result;
}

Given these issues, we propose a general overhaul and a new phrase-of-power that we are going to term strict compatibility, that would be applied within generic selection context. The goal is to remove the recently introduced undefined behavior in generic selection and bring better type matching.

3. Design

This paper is, exclusively, focusing on improvements to _Generic. In particular, we:

We do this by defining "strict compatibility" first and foremost, which more precisely allows Known Constant Length Arrays, Incomplete Arrays, and Variable Length Arrays to match an array dimension with a size specification of T[*]. T[] allows Known Constant Length Arrays and Incomplete Arrays, but not Variable Length Arrays. T[CONSTANT] precisely matches T[CONSTANT], where the two constants compute the same integer constant expression.

Then, we define a second algorithm in the Generic Selection section of the standard that ranks certain kinds of array matches higher than others based on the use of [], [*]/[non_constant], and [CONSTANT]. The algorithm effectively states that, provided all of the strictly compatible associations are all derived from pointer or array types, we count how many constant, unknown size, and "any" (variable length or *) matches there are. The one with the highest "constant" dimensions wins, and if not that then the one with the highest "unknown size" dimensions win, and then finally the one with the highest "any" dimension matches wins. If any are equal and we run out of discriminators, then too many are compatible and it results in an error. The reason it is important to count is because you want something that looks like this:

int main () {
  return _Generic(int[2][2],
    int[][2]: 0,
    int[2][]: 1
  );
}

to be an error, while this:

int main () {
  return _Generic(int[2][2],
    int[2][2]: 0,
    int[2][]:  1
  );
}

should return 0. It also makes it so [*] binds "last" since it binds to any given dimension, while [] either binds to itself or a constant:

int main () {
  return _Generic(int[2][2],
    int[2][*]: 1
    int[2][]:  0
  )
  + _Generic(int[2][2],
    int[*][*]: 1
    int[*][]:  0
  );
  // equivalent to return 0 + 0;
}

Obviously, [2] and [3] are considered different for the purposes of the algorithm as well (they have to compute the same known constant value in order to match), which finally gives up proper separation of array types in _Generic.

3.1. Undefined Behavior / Perceived Value Mismatch

This paper eliminates the undefined behavior of matching constant-sized arrays or incomplete array to variable-length arrays and vice-versa. The case of undefined behavior / wrongful perception that it does not fix that was introduced by [N3348] is where a VLA is matched to another VLA of the wrong size:

int main () {
  int n = foo();
  int m = n + 1;
  _Generic(int[n][m],
    int[n][m]: 1, // dimensions are swappe (incorrect), but
    // generic selection still picks this one
  );
}

This is because the variable length array types are all substituted with [*], such that the compiler reads it as:

int main () {
  int n = foo();
  int m = n + 1;
  _Generic(int[n][m],
    int[*][*]: 1, // all variable length dimensions are
    // replaced with [*], so technically fine even if
    // visually misleading
  );
}

This can be problematic for reading purposes and for other uses. But, this 'cat" is already "out of the bag", so to speak, since the changes in [N3348]. There’s no real fix for this, unfortunately, other than just going back to banning the use of non-[*] variable length arrays in the type names of generic associations. But we removed that constraint, so this paper simply leaves this one untouched case of seeming undefined behavior in place, even though by-the-specification the replacement makes it somewhat "better'.

3.2. Array Usage Examples: Constant Sized Input

Here is an example of expected behavior from matching on a constant sized array with the whole gamut of different types deployed:

#include <assert.h>

int main () {
  int arr[10] = {};
	
  int result_constants = _Generic(typeof(arr),
    int[10]: 0,
    int[11]: 1,
    int[20]: 2,
    default: 3
  );
  assert(result_constants == 0);

  int result_constant_and_incomplete = _Generic(typeof(arr),
    int[10]: 0,
    int[]: 1,
    default: 2
  );
  assert(result_constant_and_incomplete == 0);

  int result_incomplete = _Generic(typeof(arr),
    int[]: 0,
    default: 1
  );
  assert(result_incomplete == 0);

  int result_incomplete_and_vla = _Generic(typeof(arr),
    int[]: 0,
    int[*]: 1,
    default: 2
  );
  assert(result_incomplete_and_vla == 0);

  int result_incomplete_constant_and_vla = _Generic(typeof(arr),
    int[10]: 0,
    int[]: 1,
    int[*]: 2,
    default: 3
  );
  assert(result_incomplete_constant_and_vla == 0);
	
  return 0;
}

3.3. Array Usage Examples: Variable Length Input

Here is a similar example, but with the input array being a VLA:

#include <assert.h>

int main () {
  int n = 10;
  int vla[n] = {};
	
  int vla_result_constants = _Generic(typeof(vla),
    int[10]: 1,
    int[11]: 2,
    int[20]: 3,
    default: 0
  );
  assert(vla_result_constants == 0);

  int vla_result_constant_and_incomplete = _Generic(typeof(vla),
    int[10]: 1,
    int[]:   0,
    default: 2
  );
  assert(vla_result_constant_and_incomplete == 0);

  int vla_result_incomplete = _Generic(typeof(vla),
    int[]:   1,
    default: 0
  );
  assert(vla_result_incomplete == 0);

  int vla_result_incomplete_and_vla = _Generic(typeof(vla),
    int[]:   1,
    int[*]:  0,
    default: 2
  );
  assert(vla_result_incomplete_and_vla == 0);

  int vla_result_incomplete_constant_and_vla = _Generic(typeof(vla),
    int[10]: 1,
    int[]:   2,
    int[*]:  0,
    default: 3
  );
  assert(vla_result_incomplete_constant_and_vla == 0);
	
  return 0;
}

4. Prior Art

There is no prior art for this. We intend to gauge Committee reaction since this was supported by commentary during the integration of Martin Uecker’s [N3348].

5. Wording

The wording is relative to the latest Working Draft.

5.1. Intent

The intent of this wording is to provide:

5.2. Modify §6.2.7 "Compatible type and composite type"

6.2.7 Compatible type and composite type

Two types are compatible types if they are the same. Additional rules for determining whether two types are compatible are described in 6.7.3 for type specifiers, in 6.7.4 for type qualifiers, and in 6.7.7 for certain declarators. Two pointer types are compatible if both are identically qualified and both are pointers to compatible types. Two array types are compatible if and only if both of the following hold:

  • they have compatible element types;
  • and, if both array length expressions are present, and are integer constant expressions, then they same constant value.
If the two array types are used in a context which requires them to be compatible, the behavior is undefined if the lengths of both are specified and the corresponding array length expressions evaluate to unequal values. Moreover, two complete structure, union, or enumerated types declared with the same tag are compatible if members satisfy the following requirements:
  • there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types;

  • if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier;

  • and, if one member of the pair is declared with a name, the other is declared with the same name.

For two structures, corresponding members shall be declared in the same order. For two unions declared in the same translation unit, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values; if one has a fixed underlying type, then the other shall have a compatible fixed underlying type. For determining type compatibility, anonymous structures and unions are considered a regular member of the containing structure or union type, and the type of an anonymous structure or union is considered compatible with the type of another anonymous structure or union, respectively, if their members fulfill the preceding requirements.

Furthermore, two structure, union, or enumerated types declared in separate translation units are compatible in the following cases:

  • both are declared without tags and they fulfill the preceding requirements;

  • both have the same tag and are completed somewhere in their respective translation units and they fulfill the preceding requirements;

  • both have the same tag and at least one of the two types is not completed in its translation unit.

Otherwise, the structure, union, or enumerated types are incompatible.

Two types are strictly compatible if, in certain cases, they meet more stringent requirements than just being compatible. Two pointer types are strictly compatible if the referenced types are strictly compatible and both are identically qualified. For two array types:

  • if they both have a known constant length, then the two types are strictly compatible if the constant lengths are the same and the element types are strictly compatible;

  • if they both have an unknown size, then the two types are strictly compatible if the element types are strictly compatible;

  • if they both have a length expression of * , then the two types are strictly compatible if the element types are strictly compatible;

  • if one has a known constant length and the other has unknown size, then the two types are strictly compatible if the element types are strictly compatible;

  • if one has a non-constant length and the other has unknown size, then the two types are strictly compatible if the element types are strictly compatible;

  • or, if one has a non-constant length and the other has an array length expression of *, then the two types are strictly compatible if the element types are strictly compatible.

Otherwise, the two array types are not strictly compatible. Other types are strictly compatible if they are compatible.

All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

...

5.3. Modify §6.5.2.1 "Generic selection"

6.5.2.1 Generic selection
Syntax
...
Constraints

A generic selection shall have no more than one default generic association. No two generic associations in the same generic selection shall specify strictly compatible types. If the generic controlling operand is an assignment expression, the controlling type of the generic selection expression is the type of the assignment expression as if it had undergone an lvalue conversion, array to pointer conversion, or function to pointer conversion. Otherwise, the controlling type of the generic selection expression is the type designated by the type name. The controlling type shall be compatible with at most one of the types named in the generic association list. If a generic selection has no default generic association, its controlling type shall be compatible with exactly one of the types named in its generic association list. One generic association shall be the selected association as described further in this subclause, unless there is a default generic association in the generic association list, in which case there may be one or zero selected associations.

Semantics

Let determine-association(Tcontrol, Tassoc) be the algorithm which takes two strictly compatible types Tcontrol and Tassoc as parameters. It computes an ordered triplet of integer values as the result, with each value of the triplet called (Nconstant, Nunknown, Nany). Each value of the ordered triplet Nconstant, Nunknown, and Nany that make up the result of the algorithm starts at 0, and then is modified as described in the following.

  • If Tassoc is a pointer type, then the current result is added to the result of determine-association utilized on Tcontrol and Tassoc’s referenced type and that is yielded as the result.

  • Otherwise, if Tassoc is an array type, then the values of the triplet are added to the result of determine-association utilized on with the element types of Tcontrol and Tassoc. Then, the values are increased by the following before the algorithm then yields that as the result:

    • if Tcontrol has a known constant length and Tassoc has a known constant length, increase Nconstant by 1;

    • if Tcontrol has unknown size and Tassoc has either a known constant length or unknown size, increase Nunknown by 1;

    • or, if Tcontrol or Tassoc has a size of * or a non-constant, variable length, increase Nany by one.

  • Otherwise, if Tassoc is not an array type or a pointer type, the algorithm yields the current value of the triplet (Nconstant, Nunknown, Nany).

NOTE   The following table shows sample invocations and utilizations of determine-association.

Tcontrol Tassoc determine-association(Tcontrol, Tassoc)
int int (0, 0, 0)
int[] int[] (0, 1, 0)
int[20] int[20] (1, 0, 0)
int[20] int[] (0, 1, 0)
int[20] int[*] (0, 0, 1)
int[*] int[20] (0, 0, 1)
int[] int[20] (0, 1, 0)
int[*][] int[*][*] (0, 0, 2)
int[*][] int[][*] (0, 0, 2)
int[*][] int[*][] (0, 1, 1)
int(*)[20] int(*)[] (0, 1, 0)
int(*)[20][20] int(*)[*][20] (1, 0, 1)
int[20][20][20] int[20][20][20] (3, 0, 0)
int[20][20][20] int[20][*][*] (1, 0, 2)
int[20][20][20] int[20][][20] (2, 1, 0)
int[20][20][20] int[][][20] (1, 2, 0)

Let the controlling type be "Tcontrol" and the generic association’s type name be "Tassoc". Let "compatible associations" be the generic associations where Tcontrol is strictly compatible with Tassoc.

  • If there are zero compatible associations and there is no default generic association, a constraint is violated.

  • Otherwise, if there are zero compatible associations and there is a default generic association, that is the selected association.

  • Otherwise, if there is only one compatible association, that is the selected association.

  • Otherwise, there are two or more compatible associations. Each compatible association whose type name is Tcompat is used to produce a set of ordered integer triplets by invoking the algorithm determine-association(Tcontrol, Tcompat), where Tcontrol and Tcompat are the arguments. The selected association is the compatible association with the ordered triplet that is lexicographically greatest, nonzero and unique.✨FN) If any of the resulting triplets is equivalent to (0, 0, 0), then there are multiple compatible associations where one or more are neither pointer type derivations or array type derivations and a constraint has been violated. If there is not a unique, lexicographically greatest ordered triplet, then there are multiple compatible associations and a constraint is violated.

✨FN) (1, 0, 0) is greater than (0, 1, 0) and (2, 0, 0) is greater than (1, 500, 42).

The generic controlling operand is not evaluated. If a generic selection has a selected association generic association with a type name that is compatible with the controlling type , then the result expression of the generic selection is the expression in that generic association. Otherwise, the result expression of the generic selection is the expression in the default generic association. None of the expressions from any other generic association of the generic selection is evaluated.

The type and value of a generic selection are identical to those of its result expression. It is an lvalue, a function designator, or a void expression if its result expression is, respectively, an lvalue, a function designator, or a void expression.

EXAMPLE    A cbrt type-generic macro can be implemented as follows:
#define cbrt(X) _Generic((X), \
  long double: cbrtl, \
  default: cbrt, \
  float: cbrtf \
  )(X)
EXAMPLE    The following two generic selection expressions select different associations because the assign- ment expression operand undergoes lvalue conversion while the type name operand is unchanged:
int func(const int i) {
  return _Generic(i,
    int : 0, // int is selected
    const int : 1,
    default : 2) +
  _Generic(typeof(i),
    int : 0,
    const int : 1, // const int is selected
    default : 2);
}

Thus, this function returns 1.

EXAMPLE    The following generic selection expressions are valid and all evaluate to 1.
void foo(int n, int m) {
  _Generic(int[3][2], int[3][*]: 1, int[2][*]: 0);
  _Generic(int[3][2], int[*][2]: 1, int[*][3]: 0);
  _Generic(int[3][n], int[3][*]: 1, int[2][*]: 0);
  _Generic(int[n][m], int[*][*]: 1, char[*][*]: 0);
  _Generic(int(*)[2], int(*)[*]: 1);
}
EXAMPLE    Generic selection can match multidimensional arrays by using incomplete array types or variable length array types.
int main () {
  int result = _Generic(int[20][10],
    int[][*]: 0,
    default: 1
  );
  return result; // return 0
}

Constant array types are associated with other constant array types before they are associated with incomplete array types.

int main () {
  static_assert(_Generic(int[20][10],
    int[][]: 1,
    default: 0
  ))
  static_assert(_Generic(int[20][10],
    int[][]:   0,
    int[20][]: 1,
    default:   0
  ));
  static_assert(_Generic(int[20][10],
    int[][]:     0,
    int[20][]:   0,
    int[20][10]: 1,
    default:     0
  ));
}

Multiple types of equivalent compatible association with the controlling operand is a constraint violation.

int main () {
  static_assert(_Generic(int[20][10],
    int[][]:   0,
    int[][10]: 1,
    int[][*]:  0,
    default:   0
  ));

  static_assert(_Generic(int[20][10],
    int[*][]: 0,
    int[][]:  1,
  ));
  static_assert(_Generic(int[][],
    int[*][]: 0,
    int[][]:  1,
  ));
  static_assert(_Generic(int[][],
    int[][]:  1,
    int[56][]: 0,
  ));
	
  _Generic(int,
    int: 1,
    int: 1, // constraint violation
  );
  _Generic(int[20][10],
    int[20][]: 1,
    int[20][]: 1, // constraint violation
  );
  _Generic(int[20][10],
    int[][]: 1,
    int[][]: 1, // constraint violation
  );
  _Generic(int[20][10],
    int[*][]: 1,
    int[*][]: 1, // constraint violation
  );
  _Generic(int[20][10],
    int[20][]: 1,
    int[][10]: 1, // constraint violation
  );
}

EXAMPLE   A variable length array is associated with variable length array types before incomplete array types.

extern int n;

int main () {
  int vla[n] = {};
	
  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[11]: 0,
    int[20]: 0,
    default: 1
  ));

  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[]:   1,
    default: 0
  ));

  static_assert(_Generic(typeof(vla),
    int[]:   0,
    default: 1
  ));

  static_assert(_Generic(typeof(vla),
    int[]:   0,
    int[*]:  1,
    default: 0
  ));

  static_assert(_Generic(typeof(vla),
    int[10]: 0,
    int[]:   0,
    int[*]:  1,
    default: 0
  ));
}

int n = 10;

5.4. Delete compatibility requirements from §6.7.7.2 "Pointer declarators" to move them to Compatible/composite types section

2 For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.

5.5. Modify §6.7.7.3 "Array declarators", moving compatible requirements to Compatible/composite types section and allowing multiple incomplete array type specifications in generic selection

6.5.2.1 Generic selection
Constraints

In addition to optional type qualifiers and the keyword static, the [ and ] can delimit an expression or *. If they delimit an expression, called the array length expression, the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. a function type. The element type shall not be an incomplete type, except when used as part of the potentially nested sequence of declarators or abstract declarators for the controlling type of a generic selection or as part of the type name for a generic association. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

...
Semantics
...
...

Two array types are compatible if and only if both of the following hold:

  • They have compatible element types.

  • If both array length expressions are present, and are integer constant expressions, then they same constant value.

If the two array types are used in a context which requires them to be compatible, the behavior is undefined if the lengths of both are specified and the corresponding array length expressions evaluate to unequal values.

References

Informative References

[N3260]
Aaron Ballman. N3260 - Generic selection expression with a type operand. May 12th, 2024. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3260.pdf
[N3348]
Martin Uecker. N3348 - Matching of Multi-Dimensional Arrays in Generic Selection Expressions. June 28th, 2024. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3348.pdf
[N3441]
JeanHeyd Meneide; Shepherd. _Generic and VLA realignment and improvement, r1 (updates N3331). January 25th, 2025. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3441.htm