N2996
Enhancements to Enumerations

Published Proposal,

Previous Revisions:
N2963 (r4), N2904 (r3), N2575 (r2), n2533 (r1), n2008 (r0)
Authors:
Clive Pygott (LDRA Ltd.)
Latest:
https://thephd.dev/_vendor/future_cxx/papers/C - Enhanced Enumerations.html
Paper Source:
GitHub ThePhD/future_cxx
Issue Tracking:
GitHub
Project:
ISO/IEC JTC1/SC22/WG14 9899: Programming Language — C
Proposal Category:
Feature Request
Target:
C23

Abstract

Enumerations should have the ability to specify the underlying type to aid in portability and usability across platforms, across ABIs, and across languages (for serialization and similar purposes).

1. Changelog

1.1. Revision 5 - June 17th, 2022

1.2. Revision 4 - April 12th, 2022

1.3. Revision 3 - January 1st, 2022

1.4. Revision 2 - October 4th, 2020

1.5. Revision 1 - June 28th, 2020

1.6. Revision 0 - February 17th, 2016

2. Introduction and Motivation

C normally tries to pick int for its enumerations, but it’s entirely unspecified what the type for the enum will end up being. Its constants (and the initializers for those constants) are always treated as ints, which is not very helpful for individuals who want to use things like enumerations in their bitfields with specific kinds of properties. This means it’s impossible to portably define an enumeration, which drastically decreases its usefulness and makes it harder to rely on enumeration values (and consequently, their type) in standard C code. This has led to a number of communities and tools attempting to do enumerations differently in several languages, or in the case of C++ simply enhancing enumerations with specific features to make them both portable and dependable.

This proposal provides an underlying enumeration type, specified after a colon of the _identifier_ for the enumeration name, to give the enumeration a dependable type. It makes the types for each of the enumeration constants the same as the specified underlying type, while leaving the current enumerations as unspecified as they were in their old iterations. It does not attempt to solve problems outside the scope of making sure that constants with specified underlying type are dependable, and attempts to make forward declaration of enumerations work across implementations.

3. Prior Art

C++ has this as a feature for their enumerations. Certain C compilers have this as an extension in their C compilation modes specifically, including Clang.

4. Design

The design of this feature follows C++'s syntax for both compatibility reasons and because the design is genuinely simple and useful:

enum a : unsigned long long {
	a0 = 0xFFFFFFFFFFFFFFFFULL
	// ^ not a constraint violation with a 64-bit unsigned long long
};

Furthermore, the type of a0 is specified to be unsigned long long, such that this program:

enum a : unsigned long long {
	a0 = 0xFFFFFFFFFFFFFFFFULL
};

int main () {
	return _Generic(a0, unsigned long long: 0, default: 1);
}

exits with a return value of 0. Note that because this change is entirely opt-in, no previous code is impacted, and code that was originally a syntax violation becomes well-formed with the same semantics as its C++ counterpart. The interesting component of this proposal - that is currently marked optional - addresses a separate issue found in the current enumeration specification.
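As a hedged illustration of the opt-in nature of the feature, the sketch below uses the new syntax only when the translator reports C23, and otherwise falls back to a common pre-C23 idiom. The names permission, PERM_READ, PERM_WRITE, and PERM_EXEC are hypothetical and not part of this proposal, and the fallback branch is an assumption about how existing code typically pins a storage type.

```c
#include <assert.h>  /* static_assert */
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
/* C23: the enumeration itself carries the fixed underlying type. */
enum permission : uint_least8_t {
	PERM_READ  = 0x1,
	PERM_WRITE = 0x2,
	PERM_EXEC  = 0x4
};
typedef enum permission permission;
#else
/* Before C23: the constants have type int, so a separate typedef
   is used to pin the storage type (assumed fallback idiom). */
enum {
	PERM_READ  = 0x1,
	PERM_WRITE = 0x2,
	PERM_EXEC  = 0x4
};
typedef uint_least8_t permission;
#endif

/* Either branch gives the same storage size for `permission`. */
static_assert(sizeof(permission) == sizeof(uint_least8_t),
              "permission has the size of its intended underlying type");
```

Only the C23 branch gives the constants themselves a dependable type; the fallback merely fixes the size of objects declared as permission.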

4.1. Unsigned, Wraparound, and Overflow Semantics

Consider the code sample:

enum flags : unsigned int {
	a = 0x01,
	// …
	o = 0x8000,
	p = 0x10000,
	// …
	low_16_merged_flags = 0xFFFF,
	alternative_p // implicit 0xFFFF + 1
};

This code is (intentionally) a footgun. For starters, int and unsigned int need not be 32 bits wide: their lowest requirement is 16 bits. This means that the p flag is not guaranteed to be within the representable range of an unsigned int. There is also the problem of the enumeration constant that comes after low_16_merged_flags, namely alternative_p. It is implicitly the same as p, because 0xFFFF + 1 yields 0x10000. This, too, is outside the range of a 16-bit unsigned integer type in C.

There are 2 ways to resolve this tension.

The first is to allow this code to compile, and perform silent wraparound on p and alternative_p. This means that, regardless of the user intent, on a platform with a 16-bit unsigned int the specified value (p) and implicit value (alternative_p) would both take on a value of 0, since 0x10000 reduced modulo 2^16 is 0. If this code was meant to be ported between platforms, it compiles silently but behaves incorrectly when run. Tests, fuzzing, and other mechanisms may catch the problem and remind the user to pick a more appropriate underlying type, or to check the flag values more carefully.
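To make the effect of this first option concrete, the following minimal sketch computes what silent wraparound would produce for a given width. The helper wrap_to_width is a hypothetical name introduced only for illustration; it assumes a width between 1 and 63 bits.

```c
#include <assert.h>  /* static_assert */
#include <limits.h>

/* Hypothetical helper: reduce `value` modulo 2^width_bits, which is the
   result of converting it to an unsigned type that is width_bits wide.
   Assumes 1 <= width_bits <= 63. */
unsigned long long wrap_to_width(unsigned long long value, unsigned width_bits)
{
	unsigned long long mask = (1ULL << width_bits) - 1ULL;
	return value & mask;
}

/* The language performs the same reduction for an explicit cast. */
static_assert((unsigned short)(USHRT_MAX + 1u) == 0,
              "conversion to unsigned short wraps around to 0");
```

With a width of 16, both 0x10000 and 0xFFFF + 1 reduce to 0, while a width of 32 leaves 0x10000 unchanged.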

The second way to solve this is to make the above a constraint violation. That means both p and alternative_p, when ported to a platform where unsigned int is 16 bits wide, will loudly complain that the value is inappropriate. This would prevent compilation on such platforms, rather than require testing, fuzzing, and other techniques to handle the range of values.

This proposal goes with the second way. It is a far better user experience to prevent compilation where possible: silent wraparound is a property of the machine and done for performance and hardware reasons. For interpreted implementations, the translation step still has to take care of the expression because it is considered a constant expression. Enumeration initialization should be robust C code that remains free of error over the long term.
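Code that must still build with pre-C23 translators can approximate this kind of loud failure with a static assertion. The following is only a sketch and not part of the proposed wording; REQUIRE_FITS and fits_unsigned are hypothetical names, and the check covers non-negative values only.

```c
#include <assert.h>   /* static_assert */
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro: stop translation if `value` exceeds `type_max`. */
#define REQUIRE_FITS(value, type_max) \
	static_assert((unsigned long long)(value) <= (unsigned long long)(type_max), \
	              #value " is not representable in the intended type")

/* Passes everywhere: uint_least32_t is at least 32 bits wide. */
REQUIRE_FITS(0x10000, UINT_LEAST32_MAX);

/* Would stop translation where unsigned int is 16 bits wide:
   REQUIRE_FITS(0x10000, UINT_MAX); */

/* Run-time counterpart of the same comparison (hypothetical helper). */
bool fits_unsigned(unsigned long long value, unsigned long long type_max)
{
	return value <= type_max;
}
```

The difference from the proposal is that such a check must be written by hand for every constant, whereas a fixed underlying type makes the translator perform it automatically.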

Users who would like to avoid such errors will be reminded to select from the wide variety of battle-tested integer types in <stdint.h>, provided for their convenience, when such cases arise in C23 and beyond:

#include <stdint.h>

enum flags : uint_least32_t { // 👍!
	a = 0x01,
	// …
	o = 0x8000,
	p = 0x100000, // works fine
	// p = 0x100000u, // would also work fine
	// …
	low_16_merged_flags = 0xFFFF,
	alternative_p // implicit 0xFFFF + 1,
	              // works fine for 32-bit
};

It is better to provide an error that prevents non-portable code from exhibiting non-portable behavior, while portable code compiles, works, and runs across all platforms as expected. Finally, users who want the wraparound behavior can perform a manual cast to get what they want:

enum flags : unsigned int {
	a = 0x01,
	// …
	o = 0x8000,
	p = (unsigned int)0x100000, // cast: wraparound explicit
	// p = 0x100000u, // alternative, literal suffix: explicit (any errors handled by literal)
	// …
	low_16_merged_flags = 0xFFFF,
	alternative_p // implicit 0xFFFF + 1, constraint violation
};

This is also consistent with existing practice around the subject (Clang x86-64 trunk).

4.2. Bit-Precise Integer Types and bool?

Integers such as _BitInt(31) are, currently, allowed as an extension for an underlying enumeration type in Clang. However, when this was discussed with the Clang implementers, there was sentiment that this just "happened to work" and was not a fully planned part of the _BitInt/_ExtInt integration plan. They proposed that they would implement a diagnostic for it for future versions of Clang. In the standard, we do not want to step on the toes of anyone who may want to develop extensions in this place, especially when it comes to whether or not bit-precise enumeration types undergo integer promotion or follow the same rules for enumeration constants and similar. Therefore, we exclude them as usable types at this time.

We do not exclude bool from the possible set of types. It is allowed in C++ and other C extensions, and it allows for an API to provide mnemonic or otherwise fitting names for binary choices without needing to resort to a bit-field of a particular type. This provides a tangible benefit to code. Values outside of true or false can be errored/warned on when creating a bool enumeration, but that is a quality of implementation decision.

4.3. Variables, Declarations, and Parsing (Oh my!)

Currently, parsers for C may not properly handle the following code:

int main () {
	enum e : long long value = 0;
	return 0;
}

A sufficiently weak parser implementation can determine that this is an enumeration of underlying type long, and leave the declaration name to be the second long. This is a constraint violation, thanks to declaring a variable named long, and there is no workaround for it. There are several options to help accommodate this problem:

  1. for enumerations declaring variables, putting an underlying type is not allowed unless the enumeration is also being defined or is used purely as a forward declaration (no identifier);

  2. for enumerations declaring type definitions, putting an underlying type is not allowed unless the enumeration is also being defined (as you cannot forward-declare a type definition, this does not have the same exemption as #1 on this list); and,

  3. as a fallout from #1, because this can never be used to declare an object, any use of an equals sign or similar to provide an initializer to initialize the value is also illegal if there is a specifier for the underlying type.

This forms a comprehensive set of fixes for the given issues. Finally, if an identifier is present, the implementation is required to consume the longest token sequence that would compose a single type name (named in the C grammar as specifier-qualifier-list), before the opening brace { is provided.

4.4. Type of Enumeration Constants

Given this code sample:

enum e : unsigned short {
    x
};

int main () {
    return _Generic(x, enum e: 0, default: 1);
}

The program returns 0. x is considered to have type enum e, which is compatible with unsigned short. Therefore, the following program would be a constraint violation regarding _Generic, because two of its associations name compatible types:

enum e : unsigned short {
    x
};

int main () {
    return _Generic(x, enum e: 0, unsigned short: 2, default: 1);
}

Furthermore, this program would return 0:

enum e : unsigned short {
    x
};

int main () {
    return _Generic(x, unsigned short: 0, default: 1);
}

since the enumerated type is compatible with the underlying type (but not the other way around).
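For contrast, the sketch below shows what generic selection reports for an enumeration without a fixed underlying type, using only C11 features. TYPE_NAME, plain, plain_zero, and the two functions are hypothetical names introduced for illustration.

```c
#include <assert.h>  /* static_assert */
#include <string.h>

/* Hypothetical macro: map the type of an expression to a readable name. */
#define TYPE_NAME(expr) _Generic((expr), \
	char: "char", signed char: "signed char", unsigned char: "unsigned char", \
	short: "short", unsigned short: "unsigned short", \
	int: "int", unsigned int: "unsigned int", \
	long: "long", unsigned long: "unsigned long", \
	long long: "long long", unsigned long long: "unsigned long long", \
	default: "other")

enum plain { plain_zero };

/* Without a fixed underlying type, the constant has type int. */
static_assert(sizeof(plain_zero) == sizeof(int),
              "enumeration constant has the size of int");

const char *plain_constant_type(void)
{
	return TYPE_NAME(plain_zero);
}

/* The enumerated type itself is compatible with an
   implementation-defined integer type, so this result varies. */
const char *plain_enum_type(void)
{
	return TYPE_NAME((enum plain)plain_zero);
}
```

plain_constant_type() yields "int", while the result of plain_enum_type() depends on the implementation, which is exactly the portability gap a fixed underlying type closes.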

4.5. Incomplete Types?

Previous revisions of this paper attempted to say that enumerations declared without underlying types could be considered incomplete types, similar to structures and unions. This may not always work: a forward-declared enumeration without an underlying type may be compatible with any integer type, and it is not guaranteed that all pointers to integer types have the same storage and alignment requirements, so the compatibility rules (and the ability to pun between pointers of said types) may break down. Does there exist an implementation where int*, long long*, void*, and similar do not exhibit the same storage and alignment requirements (not of what they point to, but of the literal pointer value itself)? It is dubious to answer "yes". But the rule in §6.2.5¶25 gives pointers to structures and unions the same representation and alignment requirements, while giving no such guarantee for pointers to the integer types:

A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.53) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

§6.2.5¶31, ISO/IEC 9899:202x, C Standard Working Draft, April 12th, 2022

So we cannot guarantee that the requirements for compatibility (pointer values to any two types have the same storage and alignment) are met. That rule has been there for a long time, so they must have a good reason for not allowing it for the integer types. (… Right?)

Nothing needs to be said for enumerations with fixed underlying types because enumerations with fixed underlying types are always complete, and therefore need no special rules for handling their existence as an "incomplete" pointer.
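The distinction drawn by the quoted rule can be sketched as follows. This is only an illustration under the quoted guarantees; first_opaque, second_opaque, and int_pointers_share_size are hypothetical names.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>

/* Two incomplete structure types: pointers to them can still be used. */
struct first_opaque;
struct second_opaque;

/* Guaranteed: all pointers to structure types share one representation,
   and void * shares its representation with pointers to character types. */
static_assert(sizeof(struct first_opaque *) == sizeof(struct second_opaque *),
              "pointers to structure types have the same size");
static_assert(sizeof(void *) == sizeof(char *),
              "void * and char * have the same size");

/* Not guaranteed by the quoted rule: the answer is a property of the
   implementation, so it is computed rather than asserted. */
bool int_pointers_share_size(void)
{
	return sizeof(int *) == sizeof(long long *);
}
```

Because no such guarantee exists for pointers to integer types, a pointer to an enumeration whose compatible integer type is not yet known cannot be given a dependable representation.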

5. Proposed Wording

The following wording is relative to N2912.

5.1. Intent

The intent of the wording is to provide the ability to express enumerations with the underlying type present. In particular:

5.2. Proposed Specification

5.2.1. Modify Section §6.2.7 Compatible type and composite type, paragraph 1

… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: if one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: … For two enumerations, corresponding members shall have the same values; if one has a fixed underlying type, then the other shall have a compatible fixed underlying type.

5.2.2. Modify Section §6.4.4.3 Enumeration constants

6.4.4.3 Enumeration constants
Syntax

enumeration-constant:

identifier

Semantics

An identifier declared as an enumeration constant for an enumeration without a fixed underlying type has type int. An identifier declared as an enumeration constant for an enumeration with a fixed underlying type has the associated enumerated type.

An enumeration constant may be used in an expression (or constant expression) wherever a value of standard or extended integer type may be used.

Forward references: enumeration specifiers (6.7.2.2).

5.2.3. Modify Section §6.7.2.2 Enumeration specifiers

6.7.2.2 Enumeration specifiers
Syntax

enum-specifier:

enum attribute-specifier-sequenceopt identifieropt enum-type-specifieropt { enumerator-list }

enum attribute-specifier-sequenceopt identifieropt enum-type-specifieropt { enumerator-list , }

enum identifier enum-type-specifieropt

enumerator-list:

enumerator

enumerator-list , enumerator

enumerator:

enumeration-constant attribute-specifier-sequenceopt

enumeration-constant attribute-specifier-sequenceopt = constant-expression

enum-type-specifier:

: specifier-qualifier-list

All enumerations have an underlying type. The underlying type can be explicitly specified using an enum-type-specifier and is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible type, which is either a signed or unsigned integer type, or char.

Constraints

For an enumeration with a fixed underlying type, an enumeration constant with a constant expression that defines its value shall:

— have that value be representable as that fixed underlying type without conversion, if the fixed underlying type is not bool; or

— be implicitly converted to 1 or 0 following the usual conversion rules for bool (6.3.1.2), if the underlying type is bool.

The definition of an enumeration constant without a defining constant expression shall not overflow or wraparound the fixed underlying type by adding 1 to the previous enumeration constant.

For an enumeration without a fixed underlying type, the expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.

If an enum type specifier is present, then the longest possible sequence of tokens that can be interpreted as a specifier qualifier list is interpreted as part of the enum type specifier. It shall name an integer type, or char, that is not an enumeration or bit-precise integer type.

An enum specifier of the form

enum identifier enum-type-specifier

may not appear except in a declaration of the form

enum identifier enum-type-specifier ;

unless it is immediately followed by an opening brace, an enumerator list (with an optional ending comma), and a closing brace.

If two enum specifiers that include an enum type specifier declare the same type, the underlying types shall be compatible.
Semantics
The optional attribute specifier sequence in the enum specifier appertains to the enumeration; the attributes in that attribute specifier sequence are thereafter considered attributes of the enumeration whenever it is named. The optional attribute specifier sequence in the enumerator appertains to that enumerator.
The identifiers in an enumerator list of an enumeration without a fixed underlying type are declared as constants that have type int. The identifiers in an enumerator list of an enumeration with fixed underlying type are declared as constants whose types are the same as the enumerated type. They may appear wherever such are permitted.133) An enumerator with = defines its enumeration constant as the value of the constant expression. If the first enumerator has no =, the value of its enumeration constant is 0. Each subsequent enumerator with no = defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant. (The use of enumerators with = may produce enumeration constants with values that duplicate other values in the same enumeration.) The enumerators of an enumeration are also known as its members.
For all enumerations without a fixed underlying type, each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type (excluding the bit-precise integer types). The choice of type is implementation-defined139), but shall be capable of representing the values of all the members of the enumeration.
[📝 NOTE TO EDITOR: The wording in the above paragraph for "excluding the bit-precise…" is identical to that in the "Improved Normal Enumerations" Proposal, and should be appropriately merged if both papers are added to the standard.]

For all enumerations with a fixed underlying type, the enumerated type is compatible with the underlying type of the enumeration. After possible lvalue conversion, a value of the enumerated type behaves the same as the same value with the underlying type, in particular with all aspects of promotion, conversion and arithmetic.FN0✨)

FN0✨) This means in particular that if the compatible type is bool, values of the enumerated type behave in all aspects the same as bool and the members only have values 0 and 1. If it is a signed integer type and the constant expression of an enumeration constant overflows, a constraint for constant expressions (6.6) is violated.
An enumerated type declaration without a fixed underlying type is an incomplete type until immediately after the } that terminates the list of enumerator declarations, and complete thereafter. An enumerated type declaration of an enumeration with a fixed underlying type declares a complete type immediately after its enum type specifier.

EXAMPLE The following fragment: …

EXAMPLE Even if the value of an enumeration constant is generated by the implicit addition of 1, an enumeration with a fixed underlying type does not exhibit typical overflow behavior:

#include <limits.h>

enum us : unsigned short {
	us_max = USHRT_MAX,
	us_violation, /* Constraint violation:
	                 USHRT_MAX + 1 would wraparound. */
	us_violation_2 = us_max + 1, /* Maybe constraint violation:
	                                USHRT_MAX + 1 may be promoted to "int", and
	                                result is too wide for the underlying type. */
	us_wrap_around_to_zero = (unsigned short)(USHRT_MAX + 1) /* Okay: conversion
	                          done in constant expression before conversion to
	                          underlying type: unsigned semantics okay. */
};

enum ui : unsigned int {
	ui_max = UINT_MAX,
	ui_violation, /* Constraint violation:
	                 UINT_MAX + 1 would wraparound. */
	ui_no_violation = ui_max + 1, /* Okay: Arithmetic performed as typical
	                                  unsigned integer arithmetic: conversion
	                                  from a value that is already 0 to 0. */
	ui_wrap_around_to_zero = (unsigned int)(UINT_MAX + 1) /* Okay: conversion
	                          done in constant expression before conversion to
	                          underlying type: unsigned semantics okay. */
};

int main () {
	// Same as return 0;
	return ui_wrap_around_to_zero
	       + us_wrap_around_to_zero;
}

EXAMPLE The following fragment:

#include <limits.h>

enum E1: short;
enum E2: short;
enum E3;
enum E4 : unsigned long long;

enum E1 : short { m11, m12 };
enum E1 x = m11;

enum E2 : long { m21, m22 }; /* Constraint violation: different underlying types */

enum E3 {
	m31,
	m32,
	m33 = sizeof(enum E3) /* Constraint violation: E3 is incomplete */
};
enum E3 : int; /* Constraint violation: E3 previously had no underlying type */

enum E4 : unsigned long long {
	m40 = sizeof(enum E4),
	m41 = ULLONG_MAX,
	m42 /* Constraint violation: unrepresentable value (wraparound) */
};

enum E5 y; /* Constraint violation: incomplete type */
enum E6 : long int z; /* Constraint violation: enum-type-specifier
                         with identifier in declarator */
enum E7 : long int = 0; /* Constraint violation:
                           enum-type-specifier with initializer */

demonstrates many of the properties of multiple declarations of enumerations with underlying types. Particularly, enum E3 is declared without an underlying type first, therefore a redeclaration with an underlying type second is a violation. Because it is not complete at that time within its enumerator list, sizeof(enum E3) is a constraint violation within the enum E3 definition. enum E4 is complete as it is being defined, therefore sizeof(enum E4) is not a constraint violation.

EXAMPLE The following fragment:
enum no_underlying {
	a0
};

int main () {
	int a = _Generic(a0,
		int: 2,
		unsigned char: 1,
		default: 0
	);
	int b = _Generic((enum no_underlying)a0,
		int: 2,
		unsigned char: 1,
		default: 0
	);
	return 0;
}

demonstrates the implementation-defined nature of the underlying type of enumerations using generic selection (6.5.1.1). The value of a after its initialization is 2. The value of b after its initialization is implementation-defined: the enumeration must be compatible with a type large enough to fit the values of its enumeration constants. Since the only value is 0 for a0, b may hold any of 2, 1, or 0.

Now, consider a similar fragment, but using a fixed underlying type:

enum underlying : unsigned char {
	b0
};

int main () {
	int a = _Generic(b0,
		int: 2,
		unsigned char: 1,
		default: 0
	);
	int b = _Generic((enum underlying)b0,
		int: 2,
		unsigned char: 1,
		default: 0
	);
	return 0;
}

Here, we are guaranteed that a and b are both initialized to 1. This makes enumerations with a fixed underlying type more portable.

EXAMPLE Enumerations with a fixed underlying type must have their braces and the enumerator list specified as part of their declaration if they are not a standalone declaration:
void f1 (enum a : long b); /* Constraint violation */
void f2 (enum c : long { x } d);
enum e : int f3(); /* Constraint violation */

typedef enum t u;
typedef enum v : short W; /* Constraint violation */
typedef enum q : short { s } R;

struct s1 {
	int x;
	enum e : int : 1; /* Constraint violation */
	int y;
};

enum forward; /* Constraint violation */
extern enum forward fwd_val0; /* Constraint violation: incomplete type */
extern enum forward* fwd_ptr0; /* Constraint violation: enums cannot be
                                  used like other incomplete types */
extern int* fwd_ptr0; /* Constraint violation: incompatible with incomplete type */

enum forward1 : int;
extern enum forward1 fwd_val1;
extern int fwd_val1;
extern enum forward1* fwd_ptr1;
extern int* fwd_ptr1;

int main () {
	enum e : short;
	enum e : short f = 0; /* Constraint violation */
	enum g : short { y } h = y;
	return 0;
}

Forward references: generic selection (6.5.1.1), tags (6.7.2.3), declarations (6.7), declarators (6.7.6), function declarations (6.7.6.3), type names (6.7.7).

5.2.4. Modify Section §6.7.2.3 Tags

6.7.2.3 Tags
Constraints

A type specifier of the form

struct-or-union attribute-specifier-sequenceopt identifieropt { member-declaration-list }

or

enum attribute-specifier-sequenceopt identifieropt enum-type-specifieropt { enumerator-list }

or

enum attribute-specifier-sequenceopt identifieropt enum-type-specifieropt { enumerator-list , }

declares a structure, union, or enumerated type. …

A declaration of the form

struct-or-union attribute-specifier-sequenceopt identifier ;

or
enum identifier enum-type-specifier ;

specifies a structure, union, or enumerated type and declares the identifier as a tag of that type.142) The optional attribute specifier sequence appertains to the structure or union type being declared; the attributes in that attribute specifier sequence are thereafter considered attributes of the structure or union type whenever it is named.

If a type specifier of the form

struct-or-union attribute-specifier-sequenceopt identifier

occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible, then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.143)

143) A similar construction with enum that does not contain a fixed underlying type does not exist. Enumerations with a fixed underlying type are always complete after the enum type specifier.

If a type specifier of the form

struct-or-union attribute-specifier-sequenceopt identifier

or

enum identifier enum-type-specifier

occurs other than as part of one of the above forms, and a declaration of the identifier as a tag is visible, then it specifies the same type as that other declaration, and does not redeclare the tag.

5.2.5. Add implementation-defined enumeration behavior to Annex J

6. Acknowledgements

Thanks to:

We hope this paper serves you all well.