Simple TU initialization and cleanup handling with dependencies

Jens Gustedt, INRIA and ICube, France

2023-12-10

document history

document number   date     comment
n3185             202312   this paper, original proposal

license

CC BY, see https://creativecommons.org/licenses/by/4.0

Introduction and overview

Dynamic initialization of global data is often a difficult task in C and C++. Until recently C had no mandatory tool that would handle this. C++ has constructor calls for static data, but these do not easily respect a dependency order between different translation units (TUs).

There is one particular compiler extension that is meant to deal with these problems, namely the [[gnu::constructor]] attribute, which is widely implemented in the field. In its general form (which has issues with some compilers) it accepts a numerical priority as a parameter, so that different TUs are initialized according to their priority. This feature is difficult to handle in larger projects, because dependencies are not made explicit and because priorities have to be assigned and reassigned to TUs (much as line numbers in BASIC) as a project grows.
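To make the priority mechanism concrete, here is a minimal sketch for GCC or Clang. It uses the older `__attribute__((constructor(N)))` spelling of the same attribute, on the assumption that it is accepted by more compiler versions than the bracket syntax. The names `startup_order`, `startup_unit`, `logger_startup` and `tracker_startup` are hypothetical and exist only to make the startup order observable.

```c
#include <stddef.h>

/* Hypothetical bookkeeping, only here to make the startup order visible. */
static const char *startup_order[2];
static size_t startup_count = 0;

/* GNU extension: constructors run before main, lower priority values first.
   Priorities up to 100 are reserved for the implementation. */
__attribute__((constructor(102)))
static void tracker_startup(void) {
    startup_order[startup_count++] = "tracker";
}

/* Defined later in the file, but runs earlier because 101 < 102. */
__attribute__((constructor(101)))
static void logger_startup(void) {
    startup_order[startup_count++] = "logger";
}

/* Returns the name of the i-th unit that was initialized, or NULL. */
const char *startup_unit(size_t i) {
    return i < startup_count ? startup_order[i] : NULL;
}
```

Nothing in the source says that tracker needs logger. That relation is only encoded in the two numbers, which is the maintenance problem described above.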

Also, this feature has the possible disadvantage of initializing unconditionally, even for program parts that might eventually not be used. In contrast to that, the C standard has tools that must be triggered explicitly and thus may avoid expensive initialization for unused parts of a program or library.

In C23 we now have three mandatory functions that handle initialization and cleanup, namely call_once, atexit and at_quick_exit. The basic level of this proposal uses these exclusively to create a feature that solves user triggered initialization, associated cleanup and initialization dependencies à la C in a simplistic and unexciting way.

A second level then builds upon the first and adds unconditional initialization. This second level would need implementation-specific tools, such as the above-mentioned vendor attribute or a C++ constructor for static data.

1 The proposal

We present two levels of specification. The first level provides initialization that is triggered by a userspace macro and that only uses C23 standard features under the hood. The second level provides unconditional mandatory initialization for TUs that request it. Only that second level needs compiler magic for its implementation.

Note also that this proposal does not handle thread-specific initialization and cleanup. For these the C standard foresees the tss_t type and functions such as tss_create. Wrapping these in more convenient interfaces could be the subject of a different proposal.

1.1 Application triggered initialization

The basic feature provides four macro interfaces which we hope are easy to comprehend and only generate minimal overhead.

1.1.1 Defining the initialization code for a TU or feature group: ONCE_DEFINE

Any invocation of this definition macro should only be compiled once, so it would typically be used in a .c file. Its syntax is a macro invocation followed by a compound statement that forms the body of an internal initialization function:

ONCE_DEFINE ( identifier ) compound-statement

It defines one function with a signature

void long_name_composed_with_logger_as_an_init_function(void);

This function will later be called under the hood, similar to call_once, in places that must ensure that a global initialization of the feature has taken place.

The generated name should be unique and not conflict with any other user space name or other once-feature that has been defined elsewhere, as long as the used identifier is unique within a project.

An example could look as follows:

FILE* logfile = nullptr;

ONCE_DEFINE(logger) {
    logfile = fopen("my-favorite", "w");
}

This registers the depending code to be executed much as a function would be called when using call_once. But note that the user here does not have to specify a once_flag, nor do they have to invent a naming convention that ties such a flag and the function together.
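For comparison, the following sketch shows roughly what a user would write by hand with call_once to get the same effect. The names `logger_flag`, `logger_init`, `logger_ensure` and `logger_calls` are hypothetical, and the call counter is there only for illustration. NULL is used instead of nullptr so that the sketch also compiles before C23.

```c
#include <stdio.h>
#include <stdlib.h>
#if __STDC_VERSION_STDLIB_H__ < 202311L
#include <threads.h>   /* call_once lived here before C23 */
#endif

FILE *logfile = NULL;

/* Hypothetical names: the user has to invent them and keep them in sync. */
static once_flag logger_flag = ONCE_FLAG_INIT;
static unsigned logger_init_calls = 0;   /* for illustration only */

static void logger_init(void) {
    logfile = fopen("my-favorite", "w");
    ++logger_init_calls;
}

/* Every caller has to pair the right flag with the right function. */
void logger_ensure(void) {
    call_once(&logger_flag, logger_init);
}

/* Reports how often the initialization body has actually run. */
unsigned logger_calls(void) {
    return logger_init_calls;
}
```

The flag, the function and the wrapper are three separate names that must be kept consistent by convention. That bookkeeping is what ONCE_DEFINE is meant to hide.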

1.1.2 Executing the initialization code exactly once: ONCE_DEPEND

This macro hides a function call to the function that was defined by ONCE_DEFINE somewhere in the program, not necessarily in the same TU. It has to appear in block scope at a place where several declarations and statements can be placed. Typically it would appear at the beginning of functions that depend on the proper initialization of the feature:

int main(void) {
    ONCE_DEPEND(logger);

    fprintf(logfile, "We are up!\n");

}

In particular it can be used to express dependencies between different TU when placed into the initialization code of another once feature:

ONCE_DEFINE(tracker) {
    ONCE_DEPEND(logger);

}

Now whenever a user uses ONCE_DEPEND(tracker) in their code, the initialization of logger is launched as well. In particular logger is initialized before the rest of the initializer of tracker is executed, and so that code can already rely upon logger and, e.g., use logfile.

If the initialization code itself needs interfaces from another TU, it is important that this dependency is marked inside the code of ONCE_DEFINE as shown above; thereby it is guaranteed that the two initialization codes are chained in the correct order, regardless of the circumstances in which initialization is triggered.

1.1.3 Defining the cleanup code for a TU or feature group: ONCE_ATEXIT

Use of this macro is optional, but an invocation must be located in the same TU as the corresponding ONCE_DEFINE. It provides a way to specify cleanup code that is executed as if by an atexit handler. The syntax is similar to the definition syntax: a macro invocation followed by a compound statement that makes up the body of an internal function:

ONCE_ATEXIT ( identifier ) compound-statement

Thus the following example

ONCE_ATEXIT(logger) {
    fclose(logfile);
}

executes a call to fclose at any regular program termination.

These handlers are executed in the reverse of the order in which the initializations have been called dynamically. So in our example above, if the chaining was triggered by a call to ONCE_DEPEND(tracker), we would see the following ordering

ONCE_DEPEND(tracker)
→ ONCE_DEPEND(logger)
// initialization code of logger
// initialization code of tracker



exit -> // atexit code of tracker
        // atexit code of logger

This order is robust, even if ONCE_DEPEND(logger) is called first in some other part of the executable that is independent of tracker.

1.1.4 Defining quick cleanup code for a TU or feature group: ONCE_AT_QUICK_EXIT

The use of this macro is optional and works analogously to ONCE_ATEXIT:

ONCE_AT_QUICK_EXIT ( identifier ) compound-statement

only that the depending block forms the body of a function that is handed to at_quick_exit instead of atexit.

1.2 Unconditional initialization

An initialization that is not necessarily triggered explicitly needs additional support that goes beyond C23, for example the mentioned GNU attribute. We propose that, in addition to the above, two supplementary macros are provided.

1.2.1 Initialization by means of ONCE_DEFINE_STRONG

This macro is similar to ONCE_DEFINE but guarantees unconditional initialization, if the platform supports such a thing. If this variant is used, it is still important to maintain dependencies between TUs by means of the ONCE_DEPEND macro. That macro then guarantees that the initialization code is called in the right order: whichever initialization code is called first by the system at startup, a marked dependency will trigger the other TU before executing the remainder.

1.2.2 Fallback dependency marking with ONCE_DEPEND_WEAK

When using strong initialization, marking dependencies in code outside initialization is actually not necessary. To address this possible optimization a second dependency macro can be used. In contexts that support unconditional initialization it basically does nothing. Otherwise, it falls back to the full dependency macro ONCE_DEPEND.
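The paper does not give an implementation of this second level. The following is only a sketch of how it might be layered on the GNU constructor attribute, assuming GCC or Clang. The first-level macros are deliberately simplified here (no cleanup hooks), and `ONCE_NAME_CTOR` as well as the groups `cache` and `settings` are hypothetical names introduced for the illustration.

```c
#include <stdlib.h>
#if __STDC_VERSION_STDLIB_H__ < 202311L
#include <threads.h>
#endif

/* Simplified first-level macros without cleanup hooks, for illustration. */
#define ONCE_NAME(NAME) NAME ## _init_generated_once
#define ONCE_NAME_USER(NAME) NAME ## _user_generated_once
#define ONCE_NAME_FLAG(NAME) NAME ## _flag_generated_once
#define ONCE_NAME_CTOR(NAME) NAME ## _ctor_generated_once /* hypothetical */

#define ONCE_DEPEND(NAME)            \
  extern void ONCE_NAME(NAME)(void); \
  ONCE_NAME(NAME)()

#define ONCE_DEFINE(NAME)                                   \
  static void ONCE_NAME_USER(NAME)(void);                   \
  void ONCE_NAME(NAME)(void) {                              \
    static once_flag ONCE_NAME_FLAG(NAME) = ONCE_FLAG_INIT; \
    call_once(&ONCE_NAME_FLAG(NAME), ONCE_NAME_USER(NAME)); \
  }                                                         \
  static void ONCE_NAME_USER(NAME)(void)

#if defined(__GNUC__)
/* Assumption: a constructor without priority triggers the group at startup. */
# define ONCE_DEFINE_STRONG(NAME)                         \
  void ONCE_NAME(NAME)(void);                             \
  __attribute__((constructor))                            \
  static void ONCE_NAME_CTOR(NAME)(void) {                \
    ONCE_NAME(NAME)();                                    \
  }                                                       \
  ONCE_DEFINE(NAME)
# define ONCE_DEPEND_WEAK(NAME) ((void)0)
#else
/* No unconditional initialization available: fall back to the first level. */
# define ONCE_DEFINE_STRONG(NAME) ONCE_DEFINE(NAME)
# define ONCE_DEPEND_WEAK(NAME) ONCE_DEPEND(NAME)
#endif

static int settings_ready = 0;
static int cache_saw_settings = 0;

/* cache needs settings; the dependency is marked inside the initializer. */
ONCE_DEFINE_STRONG(cache) {
    ONCE_DEPEND(settings);
    cache_saw_settings = settings_ready;
}

ONCE_DEFINE_STRONG(settings) {
    settings_ready = 1;
}

/* Ordinary code only needs the weak form. */
int cache_is_valid(void) {
    ONCE_DEPEND_WEAK(cache);
    return cache_saw_settings;
}
```

Whichever of the two constructors the system happens to run first, the ONCE_DEPEND inside the initializer of cache ensures that settings is initialized before the remainder of that initializer executes.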

2 A reference implementation

The following reference implementation of the first level of macros is header only, simple, efficient and in essence fits on one page. A trade-off between some form of efficiency and the number of visible external names is chosen. We don’t think that efficiency is really of high importance here. The “critical” part would be ONCE_DEPEND, which in this implementation results in two nested function calls. But if it turns out to be critical for user code, this could be reduced to just one function call (by playing some inline games) or just one atomic exchange (a bit more involved and needs more system support such as futex).

First, the tools that we need are already grouped in a single header, <stdlib.h>, so we propose to target that header for the additions as well.

#include <stdlib.h>
#if __STDC_VERSION_STDLIB_H__ < 202311L
#include <threads.h>
#endif

#define ONCE_NAME(NAME) NAME ## _init_generated_once
#define ONCE_NAME_USER(NAME) NAME ## _user_generated_once
#define ONCE_NAME_INTERNAL(NAME) NAME ## _internal_generated_once
#define ONCE_NAME_FLAG(NAME) NAME ## _flag_generated_once
#define ONCE_NAME_ATEXIT_INTERNAL(NAME) NAME ## _atexit_internal_generated_once
#define ONCE_NAME_ATEXIT(NAME) NAME ## _atexit_generated_once
#define ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME) NAME ## _at_quick_exit_internal_generated_once
#define ONCE_NAME_AT_QUICK_EXIT(NAME) NAME ## _at_quick_exit_generated_once

Note that call_once has only been in <stdlib.h> since C23. Before that it was only in the optional header <threads.h>, which we include as a fallback. It should easily be possible to define other fallbacks, for example by using POSIX threads.
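As an illustration of such a fallback, the following sketch maps the once mechanism onto pthread_once. The `fallback_` prefixed names are hypothetical and chosen only to avoid clashing with the standard names on platforms that already provide them. An actual fallback would presumably define once_flag, ONCE_FLAG_INIT and call_once themselves. The `demo_` functions only exercise the wrapper.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for once_flag, ONCE_FLAG_INIT and call_once. */
typedef pthread_once_t fallback_once_flag;
#define FALLBACK_ONCE_FLAG_INIT PTHREAD_ONCE_INIT

static inline void fallback_call_once(fallback_once_flag *flag,
                                      void (*func)(void)) {
    /* pthread_once runs func exactly once per flag. call_once has no
       return value, so an error can only be checked, not reported. */
    int rc = pthread_once(flag, func);
    assert(rc == 0);
    (void)rc;
}

static unsigned demo_runs = 0;

static void demo_init(void) {
    ++demo_runs;
}

/* Triggers the initialization and reports how often it has run so far. */
unsigned demo_trigger(void) {
    static fallback_once_flag flag = FALLBACK_ONCE_FLAG_INIT;
    fallback_call_once(&flag, demo_init);
    return demo_runs;
}
```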

Note also that for convenience we use macros that implement the internal naming convention that is used here. These could easily be adapted as needed.

ONCE_DEPEND has no surprises

#define ONCE_DEPEND(NAME)           \
 extern void ONCE_NAME(NAME)(void); \
 ONCE_NAME(NAME)()

ONCE_DEFINE is slightly more complicated. In addition to the function with external linkage that we have declared it defines several symbols with internal linkage.

#define ONCE_DEFINE(NAME)                                       \
  /* Forward declarations. */                                   \
  static void ONCE_NAME_USER(NAME)(void);                       \
  static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void);   \
  static void (*const ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME))(void); \
  /* This function is used with call_once */                    \
  static void ONCE_NAME_INTERNAL(NAME)(void) {                  \
    ONCE_NAME_USER(NAME)();                                     \
    if (ONCE_NAME_ATEXIT_INTERNAL(NAME)) {                      \
      atexit(ONCE_NAME_ATEXIT_INTERNAL(NAME));                  \
    }                                                           \
    if (ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME)) {               \
      at_quick_exit(ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME));    \
    }                                                           \
  }                                                             \
  /* This is the function called by ONCE_DEPEND */              \
  void ONCE_NAME(NAME)(void) {                                  \
    /* The once flag is hidden inside */                        \
    static once_flag ONCE_NAME_FLAG(NAME) = ONCE_FLAG_INIT;     \
    call_once(&ONCE_NAME_FLAG(NAME), ONCE_NAME_INTERNAL(NAME)); \
  }                                                             \
  /* This has the user code for initialization */               \
  static void ONCE_NAME_USER(NAME)(void)

The single point of entry ONCE_NAME(NAME) ensures that the linkage namespace is not polluted with more than one symbol and the once_flag and called user functions are glued together without possibility of bypass.

The fact that ONCE_NAME_ATEXIT_INTERNAL(NAME) is a static function pointer variable comes into play if a ONCE_ATEXIT definition is provided by the user.

#define ONCE_ATEXIT(NAME)                                    \
  static void ONCE_NAME_ATEXIT(NAME)(void);                  \
  static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void) \
                      = ONCE_NAME_ATEXIT(NAME);              \
  static void ONCE_NAME_ATEXIT(NAME)(void)

This now defines and initializes the ONCE_NAME_ATEXIT_INTERNAL(NAME) variable with a pointer to a static function that holds the user code for cleanup. Above, the pointer is only passed as an argument to atexit if it is non-null. Since it is const-qualified and static, any decent compiler should be able to optimize that code efficiently.

The same mechanism is used to define and register code that would be provided for at_quick_exit.

#define ONCE_AT_QUICK_EXIT(NAME)                                    \
  static void ONCE_NAME_AT_QUICK_EXIT(NAME)(void);                  \
  static void (*const ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME))(void) \
                      = ONCE_NAME_AT_QUICK_EXIT(NAME);              \
  static void ONCE_NAME_AT_QUICK_EXIT(NAME)(void)
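To see the pieces working together, the following sketch copies the macros from this section into a single file (leaving out ONCE_AT_QUICK_EXIT, which is not used here) and applies them to the logger and tracker groups from the earlier examples. The trace buffer and the functions `note`, `start_all` and `trace_entry` are hypothetical helpers added only to record what happens.

```c
#include <stdio.h>
#include <stdlib.h>
#if __STDC_VERSION_STDLIB_H__ < 202311L
#include <threads.h>
#endif

#define ONCE_NAME(NAME) NAME ## _init_generated_once
#define ONCE_NAME_USER(NAME) NAME ## _user_generated_once
#define ONCE_NAME_INTERNAL(NAME) NAME ## _internal_generated_once
#define ONCE_NAME_FLAG(NAME) NAME ## _flag_generated_once
#define ONCE_NAME_ATEXIT_INTERNAL(NAME) NAME ## _atexit_internal_generated_once
#define ONCE_NAME_ATEXIT(NAME) NAME ## _atexit_generated_once
#define ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME) NAME ## _at_quick_exit_internal_generated_once

#define ONCE_DEPEND(NAME)            \
  extern void ONCE_NAME(NAME)(void); \
  ONCE_NAME(NAME)()

#define ONCE_DEFINE(NAME)                                            \
  static void ONCE_NAME_USER(NAME)(void);                            \
  static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void);        \
  static void (*const ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME))(void); \
  static void ONCE_NAME_INTERNAL(NAME)(void) {                       \
    ONCE_NAME_USER(NAME)();                                          \
    if (ONCE_NAME_ATEXIT_INTERNAL(NAME)) {                           \
      atexit(ONCE_NAME_ATEXIT_INTERNAL(NAME));                       \
    }                                                                \
    if (ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME)) {                    \
      at_quick_exit(ONCE_NAME_AT_QUICK_EXIT_INTERNAL(NAME));         \
    }                                                                \
  }                                                                  \
  void ONCE_NAME(NAME)(void) {                                       \
    static once_flag ONCE_NAME_FLAG(NAME) = ONCE_FLAG_INIT;          \
    call_once(&ONCE_NAME_FLAG(NAME), ONCE_NAME_INTERNAL(NAME));      \
  }                                                                  \
  static void ONCE_NAME_USER(NAME)(void)

#define ONCE_ATEXIT(NAME)                                    \
  static void ONCE_NAME_ATEXIT(NAME)(void);                  \
  static void (*const ONCE_NAME_ATEXIT_INTERNAL(NAME))(void) \
                      = ONCE_NAME_ATEXIT(NAME);              \
  static void ONCE_NAME_ATEXIT(NAME)(void)

/* Hypothetical trace buffer that records which initializers have run. */
static const char *trace[4];
static size_t trace_len = 0;

static void note(const char *what) {
    if (trace_len < 4) trace[trace_len++] = what;
}

ONCE_DEFINE(logger) { note("init logger"); }
ONCE_ATEXIT(logger) { puts("cleanup logger"); }

ONCE_DEFINE(tracker) {
    ONCE_DEPEND(logger);   /* dependency marked inside the initializer */
    note("init tracker");
}
ONCE_ATEXIT(tracker) { puts("cleanup tracker"); }

/* Triggers logger first, then tracker twice; returns how many
   initializers have run in total. */
size_t start_all(void) {
    ONCE_DEPEND(logger);
    ONCE_DEPEND(tracker);
    ONCE_DEPEND(tracker);
    return trace_len;
}

const char *trace_entry(size_t i) {
    return i < trace_len ? trace[i] : NULL;
}
```

Each initializer runs once even though logger is requested both directly and through tracker. Since each cleanup handler is registered with atexit right after its group's initializer has run, the handler for tracker is expected to run before the one for logger at program termination.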

3 Possible extensions

In the implementation on which this proposal is based we have already added a marking of the compiled TU by means of [[maybe_unused]] static strings. This allows extracting an initialization dependency graph from the generated executable.

More generally, implementations in a compiler itself (not via macros) could detect initialization loops and stop translation if any are found.
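What such a check amounts to can be sketched as a depth-first search over the direct dependency relation. This is not taken from any implementation. The flat adjacency-matrix representation, the limit `MAX_GROUPS` and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_GROUPS = 8 };   /* hypothetical limit for this sketch */

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* depends[a * n + b] is true if group a directly depends on group b. */
static bool visit(size_t node, size_t n, const bool depends[],
                  enum visit_state state[]) {
    if (state[node] == IN_PROGRESS) return true;   /* back edge: a loop */
    if (state[node] == DONE) return false;         /* already checked */
    state[node] = IN_PROGRESS;
    for (size_t next = 0; next < n; ++next) {
        if (depends[node * n + next] && visit(next, n, depends, state)) {
            return true;
        }
    }
    state[node] = DONE;
    return false;
}

/* Returns true if the dependency relation among n groups has a loop. */
bool has_init_cycle(size_t n, const bool depends[]) {
    assert(n <= MAX_GROUPS);
    enum visit_state state[MAX_GROUPS] = { UNVISITED };
    for (size_t i = 0; i < n; ++i) {
        if (visit(i, n, depends, state)) return true;
    }
    return false;
}
```

For the example groups, tracker depending on logger is a chain and is accepted. If logger also depended on tracker, the search would report a loop.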

4 Standardese

7.24.4.9 Initialization, cleanup and dependency between translation units

Synopsis

#include <stdlib.h>

ONCE_DEFINE ( identifier ) compound-statement

ONCE_DEPEND ( identifier ) ;

ONCE_ATEXIT ( identifier ) compound-statement

ONCE_AT_QUICK_EXIT ( identifier ) compound-statement

ONCE_DEFINE_STRONG ( identifier ) compound-statement

ONCE_DEPEND_WEAK ( identifier ) ;

Description

The macros in this clause provide means of executing the compound statements either at program startup or at program termination just as if called or registered with the call_once, atexit or at_quick_exit library functions. These calls can be triggered in an application-controlled way by using the macros for dependencies. In particular, with these macros applications are able to mark dependencies in initialization between different translation units.

Each identifier that is used with ONCE_DEFINE or ONCE_DEFINE_STRONG identifies a specific initialization group. An invocation of ONCE_DEPEND(ID) within the compound statement of an invocation ONCE_DEFINE(JE) or ONCE_DEFINE_STRONG(JE) constitutes a direct initialization dependency from group JE to group ID. The transitive closure of the direct initialization dependency relation shall form an acyclic directed graph.

7.24.4.9.1 Conditional initialization

7.24.4.9.1.1 The ONCE_DEFINE macro

The ONCE_DEFINE macro registers its argument ID as a name of an initialization group that is valid within the whole program and associates the compound statement as to be executed when the initialization of the group ID is requested.

Any invocation of this macro shall be located in file scope. For any identifier ID, at most one invocation of either ONCE_DEFINE(ID) or ONCE_DEFINE_STRONG(ID) shall be present in the whole program. The effect is the same as the definition of a function that has the compound statement as the function body, that has external linkage and that has an implementation-defined name that uses the identifier ID to create a unique reserved identifier that does not collide with any identifier specified by the application. Two invocations with different identifiers shall use different such generated names.

The group ID shall be initialized when and only if an invocation of the macro ONCE_DEPEND( ID) is executed.

7.24.4.9.1.2 The ONCE_DEPEND macro

Invocations of the ONCE_DEPEND(ID) macro shall be placed in block scope at a point where several declarations and statements are permitted by the syntax. An invocation of ONCE_DEPEND shall not appear in the compound statement that is associated to an invocation of ONCE_ATEXIT or ONCE_AT_QUICK_EXIT.

When an invocation of the ONCE_DEPEND macro is met during program execution it triggers the initialization of the group ID. Similar to a call to call_once this initialization shall be performed at most once per program execution. For any evaluation that is sequenced after such an invocation this initialization shall have been performed in its entirety and all side effects shall be visible. After the initialization of group ID has been completed, subsequent calls ONCE_DEPEND(ID) have no effect.

7.24.4.9.1.3 The ONCE_ATEXIT macro

The ONCE_ATEXIT macro associates the compound statement to be executed when the atexit handler for the group ID is triggered on termination of the program.

Any invocation of this macro shall be located in file scope. For any identifier ID, at most one invocation ONCE_ATEXIT(ID) shall be present in the whole program. For each invocation ONCE_ATEXIT(ID) there shall be an invocation ONCE_DEFINE(ID) or ONCE_DEFINE_STRONG(ID) that is situated in the same translation unit.

The effect is the same as the following.

7.24.4.9.1.4 The ONCE_AT_QUICK_EXIT macro

This macro is the same as ONCE_ATEXIT, only that the code is registered with at_quick_exit instead of atexit.

7.24.4.9.2 Unconditional initialization

The following macros describe groups and dependencies for which the intent is that the initialization code is executed unconditionally at program startup. Whether or not such an unconditional initialization is supported is implementation-defined. Nevertheless these macros are mandatory.

7.24.4.9.2.1 The ONCE_DEFINE_STRONG macro

Similar to ONCE_DEFINE, the ONCE_DEFINE_STRONG macro registers its argument ID as a name of an initialization group that is valid within the whole program and associates the compound statement that is to be executed when the initialization of the group ID is performed.

If application code is executed that depends upon the registered initialization code for ID, either the implementation shall support unconditional initialization, or an invocation of ONCE_DEPEND(ID) or ONCE_DEPEND_WEAK(ID) shall have been sequenced before. In particular, if ONCE_ATEXIT(ID) or ONCE_AT_QUICK_EXIT(ID) are present within the same program this initialization shall be sequenced before any call to exit or quick_exit, respectively.

7.24.4.9.2.2 The ONCE_DEPEND_WEAK macro

An invocation of ONCE_DEPEND_WEAK shall not appear in the compound statement that is associated to an invocation of ONCE_DEFINE, ONCE_DEFINE_STRONG, ONCE_ATEXIT or ONCE_AT_QUICK_EXIT.

This macro is the same as ONCE_DEPEND, only that the implementation may remove any effect of this macro if it supports unconditional initialization.