N2995
__supports_literal

Published Proposal,

Previous Revisions:
N2962 (r0)
Author:
Latest:
https://thephd.dev/_vendor/future_cxx/papers/C - __supports_literal.html
Paper Source:
github.com/ThePhD/future_cxx
Issue Tracking:
GitHub
Proposal Category:
Feature Request
Target:
C2y/C3a
Project:
ISO/IEC JTC1/SC22/WG14 9899: Programming Language — C

Abstract

This proposal gives compilers the ability to advertise what token sequences are recognizable and what kind of literal they represent to end-users. The construct is a preprocessor construct and aids in replacing the previous limitations of the maximum integer type as well as solve the issues with floating point, string, decimal floating pointer, and other literal creation for an implementation.

1. Changelog

1.1. Revision 1 - June 17th, 2022

1.2. Revision 0 - April 12th, 2022

2. Introduction and Motivation

Users did not have a way to adequately query support of specific kinds of literals or the maximum integer their processor could handle without triggering constraint violations for handling tokens in the preprocessor. It was only in the upcoming C23 Standard that users received a way to check whether or not an integer constant they wrote was valid by checking the width of that value against (U)INTMAX_WIDTH.

This, however, still leaves other problems out in the open.

There are implementations with special floating point, string, and other kinds of literals built in as extensions to the language. They cannot be adequately captured by the standard and thus leaves users to fall back to using special compiler-specific preprocessor macros to gate usage so as not to produce errors. Worse, the integer problem ties the Application Binary Interface problems of intmax_t to all literals made by the implementation, typically limiting them to 64-bit maximum values for their literals. This has becoming an increasingly difficult problem, and many proposals - including one for bit-precise integer types (_BitInt(...), [N2946]) and one for the bit-specific integer types ([N2889]) - are trying to divorce intmax_t from certain types so they can be larger than the type.

While these are all good movements in the right direction, we propose a more fundamental fix that can be applied to more cases and allow for easier future growth in the space of literals, while simultaneously improving the situation around integer-based literals related to intmax_t. The feature is a __supports_literal preprocessor directive, which returns a positive integer for recognized constant/literal token sequence, and 0 otherwise.

3. Design

__supports_literal is a preprocessor macro function that is usable in #if preprocessor directives and produces an integer constant expression. It takes a pp-tokens sequence and determines whether or not the given sequence resolves to a literal. The return values are as follows:

This allows an implementation to properly advertise support for various constructs as provided by their implementation. For example, the GNU Compiler Collection (GCC) supports "raw string literals" (see: [raw-string-literals]), but only with -std=gnu11 instead of -std=c11. Therefore, it can be detected by writing:

#if __supports_literal(R"meow(🐈😸😹😺😻)meow")
	// Supports the raw string literal syntax
#else
	// Does NOT support raw string literal syntax
#endif

A more concrete example of this are types which extend the typical preprocessor syntax beyond just larger integer support or similar. For example, shading languages like HLSL and GLSL have built-in support for providing floating constants as a 16-bit type named "half" with the syntax 32.h, where h is the floating point suffix here. Some implementations simply support 128-bit integers natively in their compilers as well, but often cannot give support for it in the preprocessor due to the limitations of intmax_t and the ABI problems which surround upgrading the type. Finally, there are extensions to write decimal floating point types directly as floating point constants as well.

This syntax enables software engineers hoping to write robust C code to check for the support for all of these things and act accordingly, allowing for a consistent way to enable constants of types outside of what is currently blessed by the C Standard and already in use with many existing implementations and direct derivatives of C.

4. Wording

Wording is relative to [N2912].

4.1. Modify 6.10.1 Conditional inclusion with new syntax, a __supports_literal expression, and new Recommended Practice

6.10.1 Conditional inclusion
Syntax

has-include-expression:

__has_include ( header-name )

__has_include ( header-name-tokens )

supports-literal-expression:

__supports_literal ( pp-tokens )

Constraints
The expression that controls conditional inclusion shall be an integer constant expression except that: identifiers (including those lexically identical to keywords) are interpreted as described below182) and it may contain zero or more defined macro expressions, has_include expressions, supports_literal expressions, and/or has_c_attribute expressions as unary operator expressions.
Each supports_literal expression is replaced by a nonzero pp-number matching the form of an integer constant if the implementation supports the given preprocessing token sequence. The preprocessing tokens are processed just as in normal text and all macro replacement occurs once for the preprocessing tokens within the expression. The value of the replacement pp-number integer constant for supported token sequences is:

1 if the preprocessing token sequence is valid and forms an integer constant;

2 if the preprocessing token sequence is valid and forms a bit-precise integer constant;

3 if the preprocessing token sequence is valid and forms a standard floating point constant, decimal floating point constant (H.5.2), or complex type (complex and decimal floating point types are conditional features that implementations need not support; see 6.10.8.3.);

4 if the preprocessing token sequence is either the token true or the token false false;

5 if the preprocessing token sequence is valid and forms a character constant;

6 if the preprocessing token sequence is valid and forms a string literal;

— an integer constant less than or equal to -1 in value, if the preprocessing token sequence forms a implementation-defined recognized constantFN✨0); or,

0, if it matches none of the aboveFN✨1).

The preprocessing token sequence shall match the form of the associated constant or literal when the value of the replacement is non-zero.

Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and evaluations of defined macro expressions, has_include expressions, supports_literal expressions, and has_c_attribute expressions have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types types with a width that are greater than or equal to intmax_t and uintmax_t defined in the header <stdint.h>.183) This includes interpreting character constants, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.
EXAMPLE This demonstrates a way to check for very large integer literals and, if supported, use it as the type for a large integer type.
#if __supports_literal(340282366920938463463374607431768211455)
	// use potential implementation-defined large literal type
	#define SUPPORTS_LONG_LITERAL 1
	typedef typeof(340282366920938463463374607431768211455) long_literal_t;
#else
	#define SUPPORTS_LONG_LITERAL 0
	// no big literal type
#endif

The long_literal_t type need not be a standard integer type, as integer literals in particular are allowed to select an extended integer type if the value exceeds the representable values of the standard integer types (6.4.4.1). This allows the user to query support for such types.

EXAMPLE Some literals are composed of multiple tokens. The following program:
#define STRA "A"
#define STRB "B"

int main () {
#if __supports_literal(STRA STRB) == 6
	return 0;
#else
	return 1;
#endif
}

is equivalent to this program:

int main () {
	return 0;
}
EXAMPLE Some token sequence operations may appear to look like multi-token literals, but they are in fact literals with operators applied to them and therefore the supports literal expression is replaced by 0:
static_assert(__supports_literal(-1) == 0, "operators applied to literals "
	"form non-literals.");
static_assert(__supports_literal(+1) == 0, "operators applied to literals "
	"form non-literals.");

-1 and +1 and similar constructs qualify as constant expressions, but they do not satisfy the requirements of the supports literal expression.

Recommended Practice
For supports_literal expressions, an implementation which recognizes a given implementation-defined constant or literal should replace the expression with a pp-number of the form -xxx, where the numeric value of "xxx" provides meaningful information to the end-user. Some meaningful values may be:

Value Suggested Meaning
-1 Implementation-defined integer constants, e.g. for flexible, unlimited precision integer constants or extended integer constants.
-2 Implementation-defined bit-precise integer constants, e.g. those potentially beyond the width of BITINT_MAXWIDTH but nonetheless supported by the implementation’s preprocessor directives or constant expression implementation.
-3 Implementation-defined floating constants, e.g. for implementation-defined decimal floating point beyond what is specified in Annex H and similar constructs.
-4 Implementation-defined predefined constants, e.g. implementation-defined boolean or pointer constants.
-5 Implementation-defined character constants, e.g. character constants which imbue a given encoding for the produced character value using a special prefix.
-6 Implementation-defined string literals, e.g. for different formats of string literals which treat escape sequences or provide translation-time format substitution information.

4.2. Add a extra paragraph to H.5.2 Constants in Annex H for __supports_literal expression

H.5.2 Conditional inclusion
For supports literal expressions (6.10.1), any floating suffix from the expanded list above used to compose a valid floating constant shall return a value of 3.

References

Informative References

[N2889]
Jens Gustedt. N2889 - Pointers and integer types. March 22nd, 2022. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2889.htm
[N2912]
ISO/IEC JTC1 SC22 WG14 - Programming Languages, C; JeanHeyd Meneide; Freek Wiedijk. N2912: ISO/IEC 9899:202x - Programming Languages, C. June 8th, 2022. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
[N2946]
Aaron Ballman; Philipp Klause Krause; Elizabeth Andrews. N2946 - _BitInt Fixes. March 22nd, 2022. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2946.pdf
[RAW-STRING-LITERALS]
cppreference.com. String literal - C++ Language. March 24th, 2022. URL: https://en.cppreference.com/w/cpp/language/string_literal