Thread-Local Storage

ISO/IEC JTC1 SC22 WG21 N2147 = 07-0007 - 2007-01-05

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org

This proposal is a revision of N1966 = 06-0036 - 2006-02-23.

Introduction

In multi-threaded applications, there often arises the need to maintain data that is unique to a thread. We call this thread-local storage.

Several techniques have been used to accomplish this task. Notable among them is the POSIX pthread_getspecific and pthread_setspecific facility. Unfortunately, this facility is clumsy and slow. In addition, the facility is not particularly helpful when converting a single-threaded application to a multi-threaded application.

Several vendors have provided a language extension for a new storage class that indicates that a variable has thread storage duration. Use of thread variables is relatively easy and access to thread variables is relatively fast. In addition, the conversion of a single-threaded application using static-duration variables to a multi-threaded application using thread-duration variables requires less wholesale program restructuring.
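As a rough illustration of the contrast drawn above, the following sketch keeps a per-thread counter twice: once through the POSIX key interface and once through a __thread variable. It assumes a POSIX platform and a compiler that already provides the __thread extension (GCC or Clang, for example), and it uses std::thread only as a convenient way to start a thread. The names bump_posix, bump_tls, and run_on_new_thread are hypothetical.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>

// --- POSIX key-based approach: explicit key, lazy allocation, untyped pointer ---
static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void free_counter(void* p) { delete static_cast<int*>(p); }

static void make_key() {
    int rc = pthread_key_create(&counter_key, free_counter);
    assert(rc == 0);
    (void)rc;
}

int bump_posix() {
    pthread_once(&counter_once, make_key);
    int* counter = static_cast<int*>(pthread_getspecific(counter_key));
    if (counter == nullptr) {               // first use on this thread
        counter = new int(0);
        pthread_setspecific(counter_key, counter);
    }
    return ++*counter;
}

// --- Storage-class approach: the former "static int counter" gains one keyword ---
static __thread int counter = 0;

int bump_tls() { return ++counter; }

// Hypothetical helper: calls bump three times on a fresh thread, returns the last value.
int run_on_new_thread(int (*bump)()) {
    int last = 0;
    std::thread worker([&] {
        for (int i = 0; i < 3; ++i) last = bump();
    });
    worker.join();
    return last;
}
```

In both versions a fresh thread starts counting from zero regardless of how often the calling thread has already bumped its own counter. Only the second version leaves the body of the original single-threaded function unchanged.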

Roughly equivalent extensions are available from:
GNU Thread-Local Storage
Hewlett-Packard Using Thread Local Storage
Hewlett-Packard Tru64 UNIX to HP-UX STK: critical Impact: TLS - feature differences (CrCh320)
Intel Thread-local Storage
Microsoft Thread Local Storage
Sun Microsystems Thread-Local Storage

The C++ standard should adopt existing practice for thread-local storage. In addition, the C++ standard should extend existing practice to enable broader use.

Proposal

The specification outline is as follows. We defer detailed changes to the text of the standard to the final section.

Thread Storage Duration

Add a new storage duration called thread storage duration. Objects with thread storage duration are unique to each thread.

Those objects which may have static storage duration may have thread storage duration instead. These objects include namespace-scope variables, function-local static variables, and class static member variables.

Storage Class __thread

Add __thread, a new keyword and storage class specifier. The __thread specifier indicates that the variable has thread storage duration.

Variables declared with the __thread specifier are bound as they would be without the __thread specifier.
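The three kinds of declaration listed above might look as follows. This is a minimal sketch that assumes a compiler already offering __thread as an extension (GCC or Clang, for example); the names request_depth, next_id, Stats, and fresh_thread_next_id are hypothetical, and std::thread is used only to start a second thread.

```cpp
#include <cassert>
#include <thread>

// Namespace-scope variable with thread storage duration.
__thread int request_depth = 0;

// Function-local static variable with thread storage duration.
int next_id() {
    static __thread int id = 0;
    return ++id;
}

// Class static member: __thread appears on the declaration and the definition.
struct Stats {
    static __thread long hits;
};
__thread long Stats::hits = 0;

// Hypothetical helper: returns the first id handed out on a fresh thread.
int fresh_thread_next_id() {
    int seen = 0;
    std::thread worker([&] {
        ++request_depth;      // touches only the worker's own copies
        ++Stats::hits;
        seen = next_id();
    });
    worker.join();
    assert(seen >= 1);
    return seen;
}
```

Each thread numbers its ids independently, so a fresh thread starts at 1 even after the calling thread has drawn several ids, and the caller's request_depth and Stats::hits remain untouched by the worker.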

Addresses of Thread Variable

The address-of operator (&), when applied to a thread variable, is evaluated at run time and returns the address of the current thread's variable. Therefore, the address of a thread variable is not a constant.

Thread-local storage defines lifetime and scope, not accessibility. That is, one may take the address of a thread-local variable and pass it to other threads.

The address of a thread variable is stable for the lifetime of the corresponding thread. The address of a thread variable may be freely used during the variable's lifetime by any thread in the program. When a thread terminates, all addresses of that thread's variables are invalid and may not be used.
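The address rules above can be sketched as follows, again assuming a compiler with the __thread extension and using std::thread only to create a second thread. The names slot, addresses_differ, and write_through_pointer are hypothetical. All pointer comparisons and dereferences happen while both threads are still alive, since an address becomes invalid once its thread terminates.

```cpp
#include <cassert>
#include <thread>

__thread int slot = 0;
// Note: &slot is evaluated at run time, so it cannot serve as an address constant.

// Returns true if a second thread obtains a different address for the same name.
bool addresses_differ() {
    int* mine = &slot;
    bool differ = false;
    std::thread worker([&] { differ = (&slot != mine); });
    worker.join();
    return differ;
}

// Hands the caller's address to another thread, which writes through it.
int write_through_pointer() {
    slot = 1;
    int* mine = &slot;
    std::thread worker([mine] {
        *mine = 42;   // modifies the caller's object through the pointer
        slot = 7;     // modifies the worker's own, distinct object
    });
    worker.join();
    assert(&slot == mine);   // the caller's address is stable for its lifetime
    return slot;
}
```

After write_through_pointer returns, the caller sees the value written through the pointer (42), not the value the worker stored in its own copy.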

Thread Variable Dynamic Initialization

A thread variable may be statically initialized as would any other static-duration variable.

At present, no implementation of thread-local storage supports dynamic initialization (or, presumably, non-trivial destructors). There was mild consensus at the Mont Tremblant meeting to support dynamic initialization of function-local, thread-local variables. The initialization of such variables is already guarded and synchronous, so new technology is not required. On the other hand, the implementation for dynamic initialization of namespace-scope variables is much more difficult, and may require additional linker and operating system support. There was no consensus to support dynamic initialization of namespace-scope variables at this time. However, interviews with prospective users indicated a firm desire for full dynamic initialization of thread storage duration variables. The programmers simply did not want to partition their types this way.

Dynamic initialization and destruction can be implemented with either of two approaches.

.init sections
Extend the semantics of .init sections to also include sections for thread-local storage. These thread-local inits will be invoked whenever the corresponding storage section is allocated. This approach requires operating-system support.
initialized flags
The compiler inserts dynamic tests on an initialized flag into the program before each access to a thread-local variable. The initialization of a thread-local variable must initialize all such variables defined within its translation unit. Note, though, that initializations should be marked complete before executing the initialization to prevent recursive attempts to initialize the same variable. (Such recursive initializations have undefined behavior and are governed by the zero-initialization clause.) This approach does not require operating-system support, but has higher run-time cost.

In either case, the initialization of a thread-local variable must place the destruction on a thread-local list for subsequent handling on exit from the thread (potentially with cancellation cleanup functions).
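The initialized-flags approach, together with the thread-local destruction list, might be approximated by hand as in the sketch below. It is not what any particular compiler emits. It assumes a compiler with the __thread extension for statically initialized data, and every name in it (Cleanup, register_cleanup, run_thread_cleanups, greeting, and so on) is hypothetical. The function run_thread_cleanups stands in for the hook that a real implementation would invoke on exit from the thread.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical per-thread list of pending destructions.
struct Cleanup {
    void (*destroy)(void*);
    void* object;
    Cleanup* next;
};
static __thread Cleanup* cleanup_head = nullptr;

static void register_cleanup(void (*destroy)(void*), void* object) {
    cleanup_head = new Cleanup{destroy, object, cleanup_head};
}

// Stand-in for the thread-exit hook: destroys in reverse order of registration.
void run_thread_cleanups() {
    while (cleanup_head != nullptr) {
        Cleanup* entry = cleanup_head;
        cleanup_head = entry->next;
        entry->destroy(entry->object);
        delete entry;
    }
}

// Conceptual source line:  __thread std::string greeting("hello");
// Lowered form: raw storage, an object pointer, and one flag per translation unit.
alignas(std::string) static __thread unsigned char greeting_storage[sizeof(std::string)];
static __thread std::string* greeting_object = nullptr;
static __thread bool unit_initialized = false;

static void destroy_string(void* p) {
    using String = std::string;
    static_cast<String*>(p)->~String();
}

static void ensure_unit_initialized() {
    if (unit_initialized) return;
    unit_initialized = true;   // marked complete first to stop recursive attempts
    greeting_object = new (greeting_storage) std::string("hello");
    register_cleanup(destroy_string, greeting_object);
}

// Every access to the variable passes through the flag test.
std::string& greeting() {
    ensure_unit_initialized();
    assert(greeting_object != nullptr);
    return *greeting_object;
}
```

The flag test on every access is the run-time cost mentioned above. The .init-section approach would avoid it by running the initializers when the thread's storage section is allocated.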

Other Issues

Some other issues are not properly part of the C++ standard but deserve mention because they affect real programs.

Dynamic Libraries

The allocation of thread-local storage for the full product of threads and dynamic libraries could result in very large storage requirements. The Sun Microsystems implementation only allocates thread-local storage for a dynamic library when the thread uses a variable from that library. That is, the Sun implementation allocates memory lazily for each thread and dynamic library pair. To avoid bloated programs, the language definition must permit this optimization.

The system may immediately deallocate the storage associated with a thread and dynamic library pair when either the thread terminates or the library is closed. The system is not required to deallocate immediately. However, the system is required to not leak storage. Thread-local storage for a thread must be reclaimed no later than a subsequent thread creation. Thread-local storage for a library within a thread must be reclaimed no later than a subsequent open of that library. (Opening another library does not require storage reclamation, though doing so would certainly reduce storage consumption.)

While storage deallocation can be deferred, variable destruction must not be deferred because destruction depends on access to thread state. In the presence of programmed closing of a dynamic library, its thread-local variables may need to be destructed out of order with respect to thread-local variables outside of the library.

System Interface

When dlsym() is used on a thread variable, the address returned will be the address of the currently executing thread's variable.

Standard Changes

The text of the standard changes as specified in this section.

2.11 Keywords [lex.key]

To table 3, add __thread.

3.6.1 Main function [basic.start.main]

In paragraph 4, edit as follows. This change is the minimum necessary to accommodate thread-duration objects. A more robust specification of termination is needed. See 18.4 Start and termination [support.start.term].

Calling the function std::exit(int) declared in <cstdlib> (18.4) terminates the program without leaving the current block or current thread and hence without destroying any objects with automatic storage duration (12.4) or thread storage duration (3.7.2(new)). If std::exit is called to end a program during the destruction of an object with static or thread storage duration, the program has undefined behavior.

3.6.2 Initialization of non-local objects [basic.start.init]

Before paragraph 1, add a new paragraph

There are two broad classes of non-local objects, those with static storage duration (3.7.1) and those with thread storage duration (3.7.2(new)). Objects with static storage duration are initialized as a consequence of program initiation. Objects with thread storage duration are initialized as a consequence of thread initiation. Within each initiation, initialization occurs as follows.

In paragraph 1, edit

Objects with static storage duration (3.7.1) or thread storage duration (3.7.2(new)) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static or thread storage duration and an object of POD type with static or thread storage duration can be initialized with a constant expression (5.19);

In paragraph 2, edit

An implementation is permitted to perform the initialization of an object of namespace scope with static storage duration as a static initialization even if such initialization is not required to be done statically, provided that

In paragraph 3, edit

It is implementation-defined whether or not the dynamic initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace scope and with static storage duration is done before the first statement of main. ....

After paragraph 3, add new paragraph 4.

It is implementation-defined whether or not the dynamic initialization (8.5, 9.4, 12.1, 12.6.1) of an object of namespace scope and with thread storage duration is done before the first statement of the initial function of the thread. If the initialization is deferred to some point in time after the first statement of the initial function of the thread, it shall occur before the first use of any object with thread storage duration defined in the same translation unit as the object to be initialized.
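The per-thread initialization described in this paragraph can be observed with a small experiment. Because the existing __thread implementations do not support dynamic initialization, the sketch below assumes a later compiler that provides the C++11 thread_local keyword as a stand-in for __thread. The names Tracker, tracker, constructions, and constructions_for are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> constructions{0};

struct Tracker {
    int value;
    Tracker() : value(10) { ++constructions; }   // dynamic initialization
};

// Namespace-scope object with thread storage duration and a non-trivial constructor.
thread_local Tracker tracker;

// Starts thread_count threads that each use tracker, and returns how many
// constructions took place while they ran.
int constructions_for(int thread_count) {
    assert(thread_count >= 0);
    int before = constructions.load();
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back([] { tracker.value += 1; });   // first use on this thread
    }
    for (auto& worker : workers) worker.join();
    return constructions.load() - before;
}
```

Whether an implementation initializes eagerly at thread start or defers until first use, each worker that touches tracker constructs its own object exactly once, so constructions_for(3) is expected to report three constructions.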

In existing paragraph 4, edit

If construction or destruction of a non-local static object of namespace scope ends in throwing an uncaught exception, the result is a call to std::terminate (18.7.3.3).

3.6.3 Termination [basic.start.term]

In paragraph 1, edit

Destructors (12.4) for initialized objects of static storage duration (declared at block scope or at namespace scope) are called as a result of returning from main and as a result of calling exit (18.3). Destructors (12.4) for initialized objects with thread storage duration (declared at block scope or at namespace scope) are called as a result of returning from the initial function of a thread. When the initial function of a thread is the main function, the objects are destructed before those of static storage duration. These objects are destroyed in the reverse order of the completion of their constructor or of the completion of their dynamic initialization. If an object is initialized statically, the object is destroyed in the same order as if the object was dynamically initialized. For an object of array or class type, all subobjects of that object are destroyed before any local object with static storage duration initialized during the construction of the subobjects is destroyed.
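The destruction rule for thread storage duration can be illustrated with a short experiment. The sketch assumes a later compiler that provides the C++11 thread_local keyword in place of __thread, since non-trivial destructors are not supported by the existing extensions. The names Noisy, record, thread_body, and run_and_collect are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

std::mutex log_mutex;
std::vector<std::string> event_log;   // static storage duration, shared by all threads

void record(const std::string& event) {
    std::lock_guard<std::mutex> lock(log_mutex);
    event_log.push_back(event);
}

struct Noisy {
    std::string name;
    explicit Noisy(std::string n) : name(std::move(n)) { record("construct " + name); }
    ~Noisy() { record("destroy " + name); }
};

// Initial function of the worker thread.
void thread_body() {
    thread_local Noisy first("first");     // block scope, thread storage duration
    thread_local Noisy second("second");
    record("body returns");
}

// Runs thread_body on a fresh thread and returns the events in order.
std::vector<std::string> run_and_collect() {
    {
        std::lock_guard<std::mutex> lock(log_mutex);
        event_log.clear();
    }
    std::thread worker(thread_body);
    worker.join();
    std::lock_guard<std::mutex> lock(log_mutex);
    assert(event_log.size() == 5);
    return event_log;
}
```

Under the rule quoted above, the log should show both constructions, then the return from the initial function, and then the destructions in reverse order: "destroy second" followed by "destroy first".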

In paragraph 4, edit

Calling the function std::abort() declared in <cstdlib> terminates the program without executing destructors for objects with automatic, thread, or static storage duration and without calling the functions passed to std::atexit().

3.7 Storage Duration [basic.stc]

To the list of storage durations in paragraph 1, between static and automatic, add thread storage duration.

In paragraph 2, edit

Static, thread, and automatic durations are associated with objects introduced by declarations (3.1) and implicitly created by the implementation (12.2).

In paragraph 3, edit

The storage class specifiers static, __thread, and auto are related to storage duration as described below.

3.7.1 Static storage duration [basic.stc.static]

In paragraph 1, edit

All objects which do not have dynamic storage duration, do not have thread storage duration, and are not local, have static storage duration.

3.7.2(new) Thread storage duration [basic.stc.thread]

Add a new section after 3.7.1 Static storage duration [basic.stc.static] with the following contents.

All objects declared with the __thread keyword have thread storage duration. The storage for these objects shall last for the duration of the thread in which they are created. There is a distinct object per thread, and use of the declared name refers to the object associated with the current thread.

An object with thread storage duration shall be initialized before its first use, and if initialized, shall be destroyed on thread exit.

3.7.3.1(old) Allocation functions [basic.stc.dynamic.allocation]

In paragraph 4, edit

[ Note: in particular, a global allocation function is not called to allocate storage for objects with static storage duration (3.7.1), for objects with thread storage duration (3.7.2(new)), for objects of type std::type_info (5.2.8), for the copy of an object thrown by a throw expression (15.1). --end note ]

3.8 Object Lifetime [basic.life]

In paragraph 8, edit

If a program ends the lifetime of an object of type T with static (3.7.1), thread (3.7.2(new)), or automatic (3.7.3(new)) storage duration and if T has a non-trivial destructor,

In footnote 40, edit

that is, an object for which a destructor will be called implicitly -- either upon exit from the block for an object with automatic storage duration, upon exit from the thread for an object with thread storage duration, or upon exit from the program for an object with static storage duration.

In paragraph 9, edit

Creating a new object at the storage location that a const object with static, thread, or automatic storage duration occupies or, at the storage location that such a const object used to occupy before its lifetime ended results in undefined behavior.

5.19 Constant expressions [expr.const]

Paragraph 2 remains unchanged, interpreting "static" as modifying initialization rather than as a reference to duration.

Other expressions are considered constant-expressions only for the purpose of non-local static object initialization (3.6.2). Such constant expressions shall evaluate to one of the following:

Paragraphs 4 (address constant expressions) and 5 (reference constant expressions) remain unchanged. The omission of thread storage duration becomes significant, though, in that objects with thread storage duration do not have constant addresses.

6.7 Declaration statement [stmt.dcl]

In paragraph 4, edit

The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) or thread storage duration (3.7.2(new)) is performed before any other initialization takes place. A local object of POD type (3.9) with static or thread storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize an object with static or thread storage duration in namespace scope (3.6.2).

Paragraph 5 is unchanged, which by implication states that thread storage duration objects must be destructed.

7.1.1 Storage class specifiers [dcl.stc]

In paragraph 1, add "__thread" to the list of storage class specifiers.

In paragraph 1, edit

At most one storage-class-specifier shall appear in a given decl-specifier-seq, except that __thread may appear with static and extern. If __thread does appear, it shall be present in all declarations referring to the same object.

After paragraph 3, add a new paragraph

The __thread specifier can be applied only to the names of objects of block scope that also specify static or to the names of objects of namespace scope. It specifies that the named object has thread storage duration (3.7.2(new)).

In paragraph 4, edit

A static specifier used in the declaration of an object declares the object to have static storage duration (3.7.1), unless accompanied by the __thread specifier, which declares the object to have thread storage duration (3.7.2(new))

Paragraph 5 on extern is missing the parallel text.

8.5 Initializers [dcl.init]

In paragraph 2, edit

Automatic, register, thread, static, and namespace-scoped external variables can be initialized by arbitrary expressions involving literals and previously declared variables and functions.

Paragraph 7 remains unchanged, which implies that thread storage duration objects may be uninitialized at program startup.

8.5.1 Aggregates [decl.init.aggr]

In paragraph 14, edit as follows. The expanded scope of 3.6.2 leaves this text mostly untouched.

When an aggregate with static or thread storage duration is initialized with a brace-enclosed initializer-list, if all the member initializer expressions are constant expressions, and the aggregate is a POD type, the initialization shall be done during a static phase of initialization (3.6.2); otherwise, it is unspecified whether the initialization of members with constant expressions takes place during the static phase or during the dynamic phase of initialization.

9.2 Class members [class.mem]

In paragraph 6, edit

A member shall not be declared to have automatic storage duration (auto, register), with the __thread storage-class-specifier unless also declared static, or with the extern storage-class-specifier.

9.4.2 Static data members [class.static.data]

In paragraph 1, edit

A static data member is not part of the subobjects of a class. For such a member declared __thread, there is only one copy of the member per thread. For such a member not declared __thread, there is only one copy of the data member shared by all the objects of the class.

12.1 Constructors [class.ctor]

In paragraph 8, edit

Default constructors are called implicitly to create class objects of static, thread, or automatic storage duration (3.7.1, 3.7.2(new), 3.7.3(new)) defined without an initializer (8.5), ...

12.2 Temporary objects [class.temporary]

In paragraph 5, edit

In addition, the destruction of temporaries bound to references shall take into account the ordering of destruction of objects with static, thread, or automatic storage duration (3.7.1, 3.7.2(new), 3.7.3(new));

12.4 Destructors [class.dtor]

In paragraph 10, edit

Destructors are invoked implicitly (1) for a constructed object with static storage duration (3.7.1) at program termination (3.6.3), (new) for a constructed object with thread storage duration (3.7.2(new)) at thread exit, (2) for a constructed object with automatic storage duration (3.7.3(new)) when the block in which the object is created exits (6.7), (3) for a constructed temporary object when the lifetime of the temporary object ends (12.2), (4) for a constructed object allocated by a new-expression (5.3.4), through use of a delete-expression (5.3.5), (5) in several situations due to the handling of exceptions (15.3).

12.6.1 Explicit initialization [class.expl.init]

In paragraph 4, edit

[ Note: the order in which objects with static or thread storage duration are initialized is described in 3.6.2 and 6.7. -- end note ]

15.3 Handling an exception [except.handle]

In paragraph 4, edit

Exceptions thrown in destructors of objects with static storage duration or in constructors of static-duration namespace-scope objects are not caught by a function-try-block on main(). Likewise, exceptions thrown in destructors of objects with thread storage duration or in constructors of thread-duration namespace-scope objects are not caught by a function-try-block on the initial function of the thread.

15.5.1 The std::terminate() function [except.terminate]

In paragraph 1, in the list of causes for termination, edit

when construction or destruction of a non-local object with static or thread storage duration exits using an exception (3.6.2), or

Another possibility is to propagate the exception to the joiner, but then there would be no distinction between the thread function exiting with an exception and one of its thread-duration objects exiting with an exception.

18.4 Start and termination [support.start.term]

In paragraph 3, edit

The program is terminated without executing destructors for objects of automatic, thread, or static storage duration and without calling the functions passed to atexit() (3.6.3).

Paragraph 8 discusses the interaction of destruction and calling exit. The following edit is the minimum possible change to the standard to accommodate thread storage duration objects.

The function exit() has additional behavior in this International Standard: