Clarify array length specifications and sizeof expressions

Jens Gustedt, INRIA and ICube, France
Martin Uecker, Graz University of Technology, Austria

2023-12-10

document history

document number   date     comment
n3187             202312   this paper, original proposal

license

CC BY, see https://creativecommons.org/licenses/by/4.0

1 Problem description

1.1 What evaluation do we want for array lengths and sizeof expressions?

With C23, the behavior of certain expressions that involve variable array lengths and sizeof has changed from prescribed behavior to implementation-defined behavior.

The problem is an innocent-looking phrase in 6.5.3.4 p2 (for sizeof):

… If the type of the operand is a variable length array type, the operand is evaluated; …

which of course states a necessity: to determine the size of a VLA, an evaluation of the hidden state is mandatory. But the generality of that phrase implies a probably unwanted “side effect”. Expressions that use casts can be of VLA type and still use other operators with side effects outside the VLA type expression itself. An example:

sizeof *(double(*)[n])++p

Here, p is assumed to be a pointer to some complete type, and n is an identifier with integer type and value; the pointer value is cast to a pointer to array type, which is then indirected to an array type. So the type of the operand is double[n], which may be either a constant length or a variable length array.

For C17, the only clear case where the array could be a constant length array was when n designated an enumeration constant. For n the name of a variable, it was perhaps debatable whether the standard allowed it to be a constant expression, but the common interpretation excluded this, and WG14 in fact voted to clarify the text in this sense, see below. So the above sizeof expression was commonly treated as not being a constant expression. Thus the operand had to be evaluated and the side effect of the increment operator had to be applied, too.

GCC has followed this line of argument since early versions; in particular, it has const-qualified variables that it accepts as “constant expression” (for example for the initialization of static variables) but not as “integer constant expression”.

Since the specification didn’t seem clear enough, during the elaboration of C23 we first (in Jan 2022) constrained that property for n by making it explicit that a variable (and other similar forms of constant expressions of integer type) cannot constitute an integer constant expression, and that in consequence ++p had to be evaluated. Later (Jun 2023), we reversed that position and decided to leave the possibility of having such a variable as an integer constant expression to the discretion of the implementation. Namely, in C23 we now have the following:

14 An implementation may accept other forms of constant expressions; however, it is implementation-defined whether they are an integer constant expression.

which changes the circumstances in which ++p is evaluated from mandatory to implementation-defined.

So C23 introduced a normative change, here, that now makes programs containing such code non-portable: there is no feature test for this property proposed by the standard.
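The two clear cases can be sketched as follows. This is only an illustration under stated assumptions: the function names, the buffer and the out-parameter are hypothetical, and the variable case assumes an implementation that supports VLA types. The case of a const-qualified n is deliberately left out, since with C23 its outcome is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>

enum { FIXED_LEN = 2 };  /* enumeration constant: an integer constant expression */

/* Constant length: the operand of sizeof is not evaluated, so p stays put.
   Returns how far p has moved; stores the computed size in *size. */
ptrdiff_t advance_with_constant_length(size_t *size) {
    double buf[8] = { 0 };
    double *p = buf;
    *size = sizeof *(double(*)[FIXED_LEN])++p;
    return p - buf;
}

/* Variable length: the operand has VLA type, so it is evaluated
   and the increment of p takes effect. */
ptrdiff_t advance_with_variable_length(int n, size_t *size) {
    assert(n > 0 && n < 8);
    double buf[8] = { 0 };
    double *p = buf;
    *size = sizeof *(double(*)[n])++p;
    return p - buf;
}
```

Both functions compute the same size, but only the second one moves p. Some compilers warn about the side effect in the unevaluated operand of the first function.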

We think that this new portability problem is merely artificial and should not be blamed on the concept of VLA. It is only there because we did not constrain the permitted expressions in sizeof operands enough. As a relatively simple solution we propose to ban operators with side effects from all array length expressions and from all sizeof expressions where the operand has array type. Choosing all array types here is deliberate, because we want a constraint to trigger for all code, regardless of which side of the implementation-defined barrier the particular development platform is found on.

With this choice of not having side effects, we will still have that some implementations treat a specific array as constant length and others treat it as variable length. Nevertheless, other than for some very special _Generic expressions, this distinction alone does not have many implications for real-life programs. For the variable length case there could be, in principle, an lvalue conversion of some internal state, but that state does not change during the lifetime of the type. So in reality, applications will not notice any difference whichever way their implementation goes.
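That the hidden state of a VLA type is fixed once the type is declared can be observed directly. The following is a minimal sketch, assuming an implementation that supports VLA with automatic storage duration; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Declares a VLA of n elements, then modifies n and checks that
   the size of the array did not follow the modification. */
size_t size_after_changing_n(int n) {
    assert(n > 0);
    int a[n];                  /* length is captured here */
    size_t before = sizeof a;
    n *= 2;                    /* does not affect the type of a */
    size_t after = sizeof a;
    assert(before == after);
    return after;
}
```

A call such as size_after_changing_n(3) yields 3 * sizeof(int), not twice that value.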

Another option would be to ban these operators in all sizeof expressions, not only for array types. The proposed text could be easily changed. Nevertheless, we thought that this might have too big an impact on code that has nothing to do with VLA or array length expressions in general, and should thus be avoided at first.

1.2 What is the prototype for a function with VLA parameters?

By going through the standard with this problem in mind we found another inconsistency that we think should be improved at the same time, namely the status of parameters of variably modified type. For these there are basically two sets of rules:

These specifications leave a gap (or two, depending on the point of view), namely what the prototype of a function that only has a definition is at the point where it is called.

void foo(size_t n, char buf[n][n]) {
    ...
    // what type for foo for recursive call?
}

void bar(void) {
    // what type for foo on call?
}

Here, it is clear that for any call one array dimension is rewritten to a pointer. But what about the second? In principle, inside the function itself it could be as if we had a declaration that somehow kept the other dimension tied to the variable n. But that would mean that foo is somehow an external symbol with a VM type in its type description.

There is currently no text for this: the text for declaration-only functions clearly does not apply in either case, and the text for definitions would imply a VM type for the parameters that would live in file scope.
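The interplay between a declaration and a definition can be sketched as below. The function names row_size and row_size_of_4x4 are hypothetical, and the code assumes an implementation that supports variably modified types. In the declaration the lengths are written as *, which is how length expressions at function prototype scope are treated; in the definition the inner length n is known inside the body.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration only (function prototype scope): the lengths act as *. */
size_t row_size(size_t n, char buf[*][*]);

/* Definition: buf is adjusted to char (*)[n], and within the body
   the inner length n is known, so sizeof buf[0] yields n. */
size_t row_size(size_t n, char buf[n][n]) {
    assert(n > 0);
    return sizeof buf[0];
}

/* A caller only needs a type for row_size in which no length
   depends on a variable that is local to the definition. */
size_t row_size_of_4x4(void) {
    char matrix[4][4] = { 0 };
    return row_size(4, matrix);
}
```

Here the declaration and the definition are compatible, which matches the reading that the identifier has a type as if it were only declared.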

1.3 What is the correct terminology when speaking of arrays?

Currently the standard badly mixes terminology when talking about arrays, namely it talks of array size, length and bound when referring basically to the same concept. This is quite confusing, but could be fixed relatively easily. We propose to consistently talk about

For arrays this is consistent with the use in “variable length array”.

2 Array declarators

Clause 6.7.6.2 is very confusing, because for example it talks about a “size” which refers to the assignment-expression in the syntax, but which is precisely not a size but the number of array elements. Also, it leaves some evaluations of length expressions and the visibility of their side effects to the mercy of the implementation, and the visible type of a function definition after the function body ends is obscure.

The current specification reads:

6.7.6.2 Array declarators

Constraints

1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or a *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

2 …

3 …

4 If the size is not present, the array type is an incomplete type. If the size is * instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array size expression in one of those declarators173); such arrays are nonetheless complete types. If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)

5 If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a typeof or sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated. Where a size expression is part of the operand of an alignof operator, that expression is not evaluated.

We propose to add terminology to talk consistently of “array length” and to better distinguish arrays with variable length and constant length. Then we propose a normative change by clearly defining rules for the evaluation of array length expressions. Normative changes are marked like this, non-normative changes are not marked specially.

1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an assignment expression or *. If they delimit an assignment expression, it shall have an integer type and it or any subexpression shall not use assignment, increment and decrement operators and shall not apply the indirection and array subscript operators to a pointer to volatile-qualified target type. If the assignment expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

2 …

3 …

4 If present, the assignment expression or the * punctuator is called the length of the declarator. If the length is not present, the array type is an incomplete type. If the length is * instead of being an assignment expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array length expression in one of those declarators173); such arrays are nonetheless complete types. If the length is an integer constant expression and the element type has a known constant size, the array type is a constant length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)

5 If the length is an assignment expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *. Otherwise, each time the length is evaluated it shall have a value greater than zero and the evaluation shall have no side effects. Nevertheless, for a function f that has such a definition with variably modified types, the identifier f has a type as if only declared (and not defined) and the same replacement rules as for function prototype scope apply for the purpose of determining the type of f; this notwithstanding, within their visibility scope, function parameters with variably modified type have a known length as described previously. The length of each instance of a variable length array type is stored in a hidden state of the execution as if an object of const-qualified but not volatile-qualified integer type is declared in the same scope as the declared array type; it does not change during the lifetime of the array type. Where a length expression is part of the operand of an alignof operator, that expression is not evaluated.
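To illustrate the effect of the proposed constraint, the following sketch contrasts a length expression that is valid today but would become a constraint violation with a rewrite that hoists the side effect out of the declarator. Both function names are hypothetical, and the code assumes an implementation that supports VLA.

```c
#include <assert.h>
#include <stddef.h>

/* Valid in C23: the increment is evaluated as part of the declaration.
   Under the proposed p1 this length expression would be rejected. */
size_t length_with_increment(int *n) {
    assert(*n > 0);
    int a[(*n)++];
    return sizeof a / sizeof a[0];
}

/* Rewrite that would satisfy the proposed constraint: the side effect
   happens first, and the length expression is a plain identifier. */
size_t length_without_side_effect(int *n) {
    assert(*n > 0);
    int len = (*n)++;
    int a[len];
    return sizeof a / sizeof a[0];
}
```

Both functions produce an array with the old value of *n as length and leave *n incremented by one; only the place where the side effect occurs differs.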

The new constraints can’t cover function calls, because their side effects (or lack thereof) are not visible directly. We propose to add a recommended practice after p6

Recommended practice

6’ If an array length expression contains a function call, it is recommended that the called function is unsequenced such that under no circumstances side effects may occur. In addition, it is recommended that, as far as possible, array length expressions that produce side effects are diagnosed.
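A function call inside a length expression that follows this recommendation could look as sketched below. The macro UNSEQUENCED and the function names are hypothetical; the macro only expands to [[unsequenced]] where the implementation reports support for that attribute through __has_c_attribute, and expands to nothing otherwise. VLA support is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Use the C23 attribute only where the implementation knows it. */
#if defined(__has_c_attribute)
#  if __has_c_attribute(unsequenced)
#    define UNSEQUENCED [[unsequenced]]
#  endif
#endif
#ifndef UNSEQUENCED
#  define UNSEQUENCED  /* attribute not available: expands to nothing */
#endif

/* Pure arithmetic on its argument: no side effects and no access
   to any state other than the parameter. */
size_t padded_length(size_t n) UNSEQUENCED {
    return n + 1;
}

/* The length expression calls a function that cannot have side effects. */
size_t count_elements(size_t n) {
    assert(n > 0);
    char text[padded_length(n)];
    return sizeof text;
}
```

The attribute is written after the parameter list because it applies to the function type.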

2.1 Alternative variant that would be more restricted

The changes proposed above have the disadvantage that they introduce new undefined behavior in the standard, in particular for side effects that would be hidden in function calls or that would be triggered by floating point computations. With C23 we indeed have the possibility to better describe what we expect and to place the prohibition completely into the constraint:

1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an assignment expression or *. If they delimit an assignment expression, it shall have an integer type and it or any subexpression shall not use assignment, increment and decrement operators, shall not apply the indirection and array subscript operators to a pointer to volatile-qualified target type, shall not perform evaluation or arithmetic of floating point type, and shall not evaluate function call expressions unless the corresponding function pointer has a type that has the [[unsequenced]] attribute. If the assignment expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.

2 …

3 …

4 If present, the assignment expression or the * punctuator is called the length of the declarator. If the length is not present, the array type is an incomplete type. If the length is * instead of being an assignment expression, the array type is a variable length array type of unspecified size, which can only be used as part of the nested sequence of declarators or abstract declarators for a parameter declaration, not including anything inside an array length expression in one of those declarators173); such arrays are nonetheless complete types. If the length is an integer constant expression and the element type has a known constant size, the array type is a constant length array type; otherwise, the array type is a variable length array type. (Variable length arrays with automatic storage duration are a conditional feature that implementations need not support; see 6.10.9.3.)

5 If the length is an assignment expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *. Otherwise, each time the length is evaluated it shall have a value greater than zero. Nevertheless, for a function f that has such a definition with variably modified types, the identifier f has a type as if only declared (and not defined) and the same replacement rules as for function prototype scope apply for the purpose of determining the type of f; this notwithstanding, within their visibility scope, function parameters with variably modified type have a known length as described previously. The length of each instance of a variable length array type is stored in a hidden state of the execution as if an object of const-qualified but not volatile-qualified integer type is declared in the same scope as the declared array type; it does not change during the lifetime of the array type. Where a length expression is part of the operand of an alignof operator, that expression is not evaluated.

NOTE: The exclusion of operations in the constraints ensures that array length expressions will only perform value computations but never initiate side effects, see 5.1.2.3.

No recommended practice would be necessary for this variant.

3 The sizeof and alignof operators

Here, the current text reads:

6.5.3.4 The sizeof and alignof operators

Constraints

1 The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to the parenthesized name of such a type, or to an expression that designates a bit-field member. The alignof operator shall not be applied to a function type or an incomplete type.

Semantics

2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

This text has several problems, marked with strike through.

  1. The first is that “name of the type” is not an introduced term. The correct term is “type name”, which is the term that the syntax derivation also uses. But also, the mention in p2 doesn’t even contribute to more clarity; the syntax derivation already has enough information, which need not be repeated.
  2. The second is that specifying at that place that it is an integer does not help at all; p5 provides the necessary details for that later, anyhow.
  3. Third, people are often confused by the model of evaluation that is used here.
  4. Then, “integer constant” is well defined, but describes a lexical term which one should probably better call “integer literal”. We don’t think that a lexical replacement of a sizeof expression by a number token with integer type is intended here.

We propose to replace that text with the following, again only normative changes are marked:

6.5.3.4 The sizeof and alignof operators

Constraints

1 The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to a parenthesized type name, or to an expression that designates a bit-field member; if the type is an array type the operand shall not use assignment, increment and decrement operators and shall not apply the indirection and array subscript operators to a pointer to a volatile-qualified target type. The alignof operator shall not be applied to a function type or an incomplete type.

Semantics

2 The sizeof operator yields the size (in bytes) of its operand; it is determined from the type of the operand. If that type is a variable length array type the sizeof expression is said to be variable. In that case, the operand is evaluated and the sizeof expression is not an integer constant expression; the evaluation shall not produce side effects. Otherwise, the result is determined at translation time, the operand is not evaluated, and the sizeof expression is said to be constant and is an integer constant expression.
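The distinction between a constant and a variable sizeof expression can be sketched as follows. The names LEN and variable_sizeof are hypothetical, and the second half assumes an implementation that supports variably modified types.

```c
#include <assert.h>
#include <stddef.h>

enum { LEN = 4 };

/* Constant sizeof: the operand is not evaluated and the expression is an
   integer constant expression, so it may appear in a static assertion. */
_Static_assert(sizeof(double[LEN]) == LEN * sizeof(double),
               "constant sizeof is determined at translation time");

/* Variable sizeof: the type name has variable length array type, so the
   result is only known during execution and is not an integer
   constant expression. */
size_t variable_sizeof(size_t n) {
    assert(n > 0);
    return sizeof(double[n]);
}
```

Using variable_sizeof(n) in a place that requires an integer constant expression, for example in a static assertion, would not be valid.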

We also propose to add a recommended practice, here.

Recommended practice

If the operand of a sizeof expression has an array type and contains a function call, it is recommended that the function has the unsequenced property.

The confusion in terminology is also present in the example of p8, where a comment uses the term “execution time sizeof” which has not been introduced. We propose to change the term to “variable sizeof”.

A similarly problematic denomination is also present in example 5 of 6.7.2.5 p10 for the typeof operators. A similar change should be applied there.

3.1 Alternative variant

Similarly to the above for array lengths, we may already have a variant that puts all restrictions into the constraint section.

6.5.3.4 The sizeof and alignof operators

Constraints

1 The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to a parenthesized type name, or to an expression that designates a bit-field member; if the type is an array type the operand shall not use assignment, increment and decrement operators, shall not apply the indirection and array subscript operators to a pointer to a volatile-qualified target type, shall not perform evaluation or arithmetic of floating point type, and shall not evaluate function call expressions unless the corresponding function pointer has a type that has the [[unsequenced]] attribute. The alignof operator shall not be applied to a function type or an incomplete type.

Semantics

2 The sizeof operator yields the size (in bytes) of its operand; it is determined from the type of the operand. If that type is a variable length array type the sizeof expression is said to be variable. In that case, the operand is evaluated and the sizeof expression is not an integer constant expression. Otherwise, the result is determined at translation time, the operand is not evaluated, and the sizeof expression is said to be constant and is an integer constant expression.

4 Changing terminology

As a consequence, we propose the following changes in terminology, to be applied throughout the document.

C23                        C2y
array bound                array length
array size (expression)    array length (expression)
non-variable length array  constant length array
execution time sizeof      variable sizeof
bounds-checking            length-checking

Among others, this implies changes in Annex K (which becomes “Length-checking interfaces”) but not in Annex L where the term bound is used in a more generic way.

5 Questions

  1. Does WG14 want to apply the changes in #2?
  2. Does WG14 want to apply the changes in #2.1?
  3. Does WG14 want to apply the changes in #3?
  4. Does WG14 want to apply the changes in #3.1?
  5. Does WG14 want to engage in the changes in terminology as described in #4?