Doc. No.:	WG14/N1551
Date:	2010-02-14
Reply to:	Hans-J. Boehm
Phone:	+1-650-857-3406
Email:	Hans.Boehm@hp.com

N1551: Thread support in the library

Unfortunately, the C library, as defined in the first CD, does not address a number of library issues that arise as the result of the introduction of threads. I believe that it is critical to address these issues, since little portable code can be written without addressing several of them. This concern is reflected in several national body comments.

This is our initial attempt to do so in the main library sections of the standard. I did not go through annex K carefully, but it may be fine as is.

As observed below, annex K potentially offers solutions to some of the problems identified here. Unfortunately, it does not do so consistently, and it is not required to be implemented on all platforms supporting threads. Thus it cannot be leveraged to address threads issues as is.

This proposal follows Posix whenever possible. Both due to time constraints, and because reflector discussions indicated controversy about the overall direction of the solution, I often do not include precise C standard wording in cases in which Posix already provides the necessary specifications. It should be fairly straightforward to derive precise wording from the Posix specification.

Although there are cases in which the Posix solutions are not clearly technically optimal, I feel that even in the controversial cases, specifically implicit locking for I/O operations, the Posix approach is sufficiently well-established practice, even on non-Posix systems, that any other solution is inconsistent with current practice, and not viable, at this stage. I discuss this in a bit more detail in the appropriate section below.

Michael Wong and Jim Thomas provided helpful comments on an earlier draft of this paper.

Reconsider N1371

The changes to atexit, at_quick_exit, mbrlen and friends suggested by N1371 should be incorporated into the C standard, as they have been for C++0x. Some of the other changes proposed there, e.g. to setlocale and getenv, already seem to have been incorporated in other ways.

More precisely specify when library calls introduce data races

Currently 7.1.4p5 specifies restrictions on the data that can be accessed by library functions, and hence on the data races they can introduce. I believe we need an additional restriction, analogous to 17.6.4.8p5 [res.on.data.races] in the C++ FCD (WG21/N3092), that prevents functions like qsort from needlessly following pointers in their arguments, or memcpy from accessing memory outside the specified range. For example, a memcpy implementation that copies whole words, and then restores the "past the end" bytes afterwards, should not be correct.

Wording change:

Add the sentence

Library functions shall access or modify only those memory locations they are required to access or modify to fulfill their specifications.

as the second-to-the-last sentence of 7.1.4p5. Note that library routines implemented in C should naturally satisfy this constraint. Assembly language code can usually read additional data without making that visible to the user. That remains allowed by the "as if" rule. Writing additional data, even if the original values are rewritten, causes real bugs and needs to be prohibited.
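To illustrate the restriction, here is a minimal sketch of a word-at-a-time copy that never writes outside the requested range. The name copy_exact is hypothetical and is not proposed wording; the inner memcpy calls are used only as a portable way to load and store a word without alignment assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies exactly n bytes and writes nothing else. */
void *copy_exact(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk phase: move whole words while at least one full word remains. */
    while (n >= sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, s, sizeof word);  /* portable unaligned load */
        memcpy(d, &word, sizeof word);  /* portable unaligned store */
        d += sizeof word;
        s += sizeof word;
        n -= sizeof word;
    }

    /* Tail phase: copy the remaining bytes one at a time, rather than
       storing a full word and restoring the bytes past the end. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Because the tail is copied bytewise, a neighbouring object that another thread is updating concurrently is never written by this function.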

`strerror`, `strtok`, `rand`, and `asctime`

The descriptions of the first three functions specify that calls may introduce data races, though they say nothing about when this may happen. They leave the programmer with no obvious viable alternatives to use in a multithreaded program. For example, a library writer has no way to safely invoke rand, since the library provides no convention for protecting such calls with a common lock. There is no way to preclude simultaneous rand calls by other libraries from other threads.
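As a rough sketch of the alternative, the Posix rand_r function keeps the generator state in an object supplied by the caller, so a library can own its state without coordinating with anyone else. The struct rng type and the rng_init and rng_below helpers are hypothetical names introduced here for illustration, and the sketch assumes a platform that declares rand_r.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: a platform that declares rand_r */
#include <stdlib.h>

/* Hypothetical per-caller generator state; each thread or library owns one. */
struct rng {
    unsigned int seed;
};

void rng_init(struct rng *g, unsigned int seed)
{
    g->seed = seed;
}

/* Returns a value in [0, bound) for bound > 0. Only *g is read and written,
   so calls using distinct rng objects cannot race with each other.
   The modulo introduces a small bias, which this sketch ignores. */
int rng_below(struct rng *g, int bound)
{
    return rand_r(&g->seed) % bound;
}
```

Two generators initialized with the same seed produce the same sequence, independently of any rand or srand calls made elsewhere in the program.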

strerror may be a somewhat special case, both in that it is unclear to me whether strerror itself could be made thread-safe as strerror_l already is for Posix, and in that the optional annex K strerror_s already appears to provide the same functionality, but with a different argument order. A Google code search suggests that strerror is used by far the most frequently (about 500K uses), with strerror_r far behind it (about 2500 uses) and strerror_s far behind that (about 100 uses). This suggests that making strerror itself thread safe would be clearly the best solution, if technically feasible. Here I assume it is not, but it makes sense to introduce strerror_r, since it is the most widely used thread-safe alternative.

strtok also has an annex K version, strtok_s that may be intended to be thread-safe, and thus could possibly serve as a replacement for strtok_r. However, the description in K.3.7.3.1 is unclear. It talks about sequences of calls, in which the last argument must remain the same. The example implies that it should be possible to have multiple such sequences in progress at once, which should also make it possible to use it from multiple threads. But that wouldn't otherwise have been my reading of the normative text. A Google code search for strtok_s turns up few hits, and the top ones seem inconsistent with the annex K specification, in that they have only three arguments. The strtok_r function is far more established.
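For comparison, the following sketch shows two strtok_r sequences in progress at once, which strtok cannot support because it has a single hidden state. The function name count_nested_tokens is hypothetical; the strtok_r calls use the three-argument Posix signature.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: a platform that declares strtok_r */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits text into records at ';' and each record into
   fields at ',', returning the total number of fields. The text is modified
   in place, as with strtok. */
size_t count_nested_tokens(char *text)
{
    size_t count = 0;
    char *outer_state;  /* state of the record-level sequence */

    for (char *rec = strtok_r(text, ";", &outer_state);
         rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state;  /* independent state of the field-level sequence */

        for (char *field = strtok_r(rec, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            count++;
        }
    }
    return count;
}
```

Since all state lives in the caller's outer_state and inner_state objects, the same property that allows nesting also allows concurrent use from several threads on distinct strings.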

The asctime function has very similar issues. But since it is defined in terms of an implementation, no further text is needed to address data races.

Wording change:

Add Posix functions strerror_r, strtok_r, rand_r, and asctime_r.

In 7.23.5.8p6, replace

~~The strtok function is not required to avoid data races.~~

with

The strtok function accesses and modifies static duration objects not referenced by the arguments. These objects are accessed only by strtok. Calls to strtok from multiple threads that are unordered by happens before introduce data races.

In 7.23.6.2p3, replace

~~The strerror function is not required to avoid data races.~~

with

The strerror function may modify static duration objects. Any such objects are accessed only by strerror. Calls to strerror from multiple threads that are unordered by happens before may introduce data races.

In 7.22.2.1p3, replace

~~The rand function is not required to avoid data races.~~

with

The rand function accesses and modifies static duration objects. These objects are accessed only by rand and srand. Calls to rand or srand from multiple threads that are unordered by happens before introduce data races.

Add a very similar paragraph after 7.22.2.2p2:

The srand function modifies static duration objects. These objects are accessed only by rand and srand. Calls to rand or srand from multiple threads that are unordered by happens before introduce data races.

Possible strerror alternatives:

Require support of the Posix strerror_l function in addition to, or instead of, strerror_r.
Do not add strerror_r and instead rely on the existing, but very rarely used, strerror_s. Require strerror_s for implementations that support threads. Note that relying on any of the Annex K functions for thread support may turn out to confuse users, since there is no rand_s. Thus such a solution would either require use of both _r and _s functions, or would require the invention of completely new functions duplicating Posix functionality.
Require strerror to be thread-safe, e.g. by preallocating all possible results, or possibly by relying on thread-local storage.

Possible strtok alternatives:

Instead of, or in addition to, strtok_r, rely on strtok_s for thread-safety. Clarify that a sequence of strtok_s calls consists of several, not necessarily consecutive, calls made using the same ptr and s1max values. Require strtok_s for applications that support threads.
Instead require that strtok maintains its state in a thread-local variable.

Possible asctime alternatives:

Instead of, or in addition to, asctime_r, rely on asctime_s for thread-safety. This has the same issues as above. asctime_s is currently not required, and is much less frequently used than asctime_r.

`setjmp/longjmp`

The current description does not restrict the longjmp target context to correspond to the current thread. But I think all current implementations will either immediately crash or, even worse, end up with two threads running on the same stack. The restriction proposed here is essentially copied from the Posix description of longjmp, with slight rewording to match the C standard.

Wording change:

Add

The effect of a call to longjmp where the contents of the jmp_buf argument were not saved by the calling thread is undefined.

after 7.13.2.1p2.
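One way a program can stay within this restriction is to keep the jmp_buf in thread-local storage, so that a thread can only ever jump to a context it saved itself. The sketch below assumes C1x _Thread_local support; the names recovery_point, fail, and run_guarded are hypothetical.

```c
#include <setjmp.h>

/* One buffer per thread: longjmp can then only target a context that was
   saved by the calling thread. */
static _Thread_local jmp_buf recovery_point;

/* Abandons the current work and returns control to run_guarded. */
static void fail(void)
{
    longjmp(recovery_point, 1);
}

/* Returns 0 on normal completion, -1 if the work bailed out via longjmp. */
int run_guarded(int should_fail)
{
    if (setjmp(recovery_point) != 0)
        return -1;  /* reached by longjmp from fail() */
    if (should_fail)
        fail();
    return 0;
}
```

With a single jmp_buf of static storage duration shared by all threads, one thread could longjmp into a context saved by another, which is exactly the case the proposed sentence makes undefined.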

`malloc`/`free`

It needs to be clear that allocation and deallocation functions implicitly avoid data races on the underlying heap, and that a modification of a memory location p, followed by a deallocation of p, followed by a reallocation and access of the same memory location p in another thread, does not introduce a data race on p. At the same time, we need to ensure that allocating p in one thread, and deallocating it in another, without intervening synchronization, remains a data race. (Doing this requires memory_order_relaxed atomic operations, but is possible.) It is hard to interpret the current specification as satisfying all these constraints. The clarification must allow thread-local allocation caches and should require malloc calls to order memory accesses only where absolutely necessary. Requiring all of these calls to acquire and release a particular lock, for example, would be an overconstraint, since it would give malloc some fence-like properties, which may be expensive to enforce.

Note that getting this right is subtle, and has significant impact on static analysis of code that calls memory management functions.

Wording change:

Insert the following after 7.22.3p1:

For purposes of determining the existence of a data race, memory allocation functions behave as though they accessed only memory locations accessible through their arguments, and not other static duration storage. These functions may however visibly modify the storage that they allocate or deallocate. A call to free or realloc that deallocates a region p of memory synchronizes with any allocation call that allocates all or part of the region p. This synchronization occurs after any modification of p by the deallocating function, and before any such modification by the allocating function.
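The user-visible side of this rule can be sketched as follows: memory allocated in one thread may be freed in another, but the hand-off itself must be synchronized by the program. This is an illustrative sketch using C1x atomics, not part of the proposed wording; the mailbox object and the mailbox_put and mailbox_take functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical single-slot mailbox shared by a producer and a consumer.
   Static storage duration, so it starts out as a null pointer. */
static _Atomic(int *) mailbox;

/* Producer side: allocate, initialize, then publish with release ordering.
   Returns 0 on success, -1 if allocation fails or the slot is occupied. */
int mailbox_put(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = value;

    int *expected = NULL;
    if (!atomic_compare_exchange_strong_explicit(&mailbox, &expected, p,
                                                 memory_order_release,
                                                 memory_order_relaxed)) {
        free(p);  /* never published, so no other thread can have seen it */
        return -1;
    }
    return 0;
}

/* Consumer side: take ownership with acquire ordering, read, then free.
   Returns 0 and stores the value in *out, or -1 if the slot was empty. */
int mailbox_take(int *out)
{
    int *p = atomic_exchange_explicit(&mailbox, NULL, memory_order_acquire);
    if (p == NULL)
        return -1;
    *out = *p;
    free(p);  /* ordered after the producer's write by release/acquire */
    return 0;
}
```

The release store in mailbox_put and the acquire exchange in mailbox_take provide the intervening synchronization; passing the pointer through a plain, non-atomic variable instead would leave the allocation and the free unordered.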

Stdio

Currently it is unclear whether unordered stdio accesses to the same stream result in data races, and whether unordered accesses to the same stream via functions like fputs can result in interleaved characters. This is unacceptable:

It means that common current practice for multithreaded C code, which is portable to all current platforms of which I am aware, would not result in conforming C1x code. For example, it effectively outlaws the common practice of calling fprintf(stderr, ...) to report errors from multiple threads, or from libraries that may be called from multiple threads. Making such code conform again would require pervasive changes. Google code search finds about 10 times more occurrences of the above pattern than references to pthread_mutex_lock. Thus compensating for this omission may well dominate all other costs of converting a program to C1x threads.
There is no real way to repair such code, since the standard defines no way to add locking to, for example, fprintf(stderr, ...) calls.
As a practical matter, it means that essentially all current implementations need to maintain their locking behavior, which is expensive at times, but portable clients are not allowed to take advantage of it.

Thus I believe it is imperative that the locking behavior of these calls be properly specified. And they must be specified in a way that is consistent with current practice.

Current practice is that all currently specified stdio calls implicitly acquire a lock on the accessed stream, and that additional functions are provided to

explicitly acquire and release the (reentrant) lock on a stream, so that character interleaving can be avoided even if multiple calls are used.
perform single character IO without locking to avoid the locking overhead when it would dominate.
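Both facilities can be sketched together with the Posix functions. The helper name write_pair is hypothetical; flockfile, funlockfile, and putc_unlocked are used with their Posix signatures, and the sketch assumes a platform that declares them.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: a platform that declares flockfile */
#include <stdio.h>

/* Hypothetical helper: writes "key=value\n" as one unit that output from
   other threads cannot interleave with.
   Returns 0 on success, -1 if any write failed. */
int write_pair(FILE *out, const char *key, const char *value)
{
    int failed = 0;

    flockfile(out);  /* hold the stream lock across all calls below */

    /* Character output without per-call locking; the lock is already held. */
    for (const char *p = key; *p != '\0'; p++)
        failed |= (putc_unlocked(*p, out) == EOF);
    failed |= (putc_unlocked('=', out) == EOF);

    /* An ordinary stdio call locks the stream again; this is fine because
       the lock is reentrant. */
    failed |= (fputs(value, out) == EOF);
    failed |= (putc_unlocked('\n', out) == EOF);

    funlockfile(out);
    return failed ? -1 : 0;
}
```

Without the explicit flockfile and funlockfile pair, each individual call would still be free of data races under the Posix rules, but output from another thread could appear between the key and the value.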

I propose to follow this widely established precedent here. Mailing list discussions suggested that it is controversial. On the other hand, they also appeared to confirm that all major platforms currently follow the proposed path.

Wording change:

This is a rough initial attempt at wording to follow Posix locking conventions.

Add Posix functions flockfile, funlockfile, getc_unlocked, putc_unlocked, getchar_unlocked, putchar_unlocked.

Insert the following paragraphs after 7.21.2p6:

Each stream has an associated lock that is used to prevent data races when multiple threads access a stream, and to restrict the interleaving of stream operations performed by multiple threads. Only one thread may hold this lock at a time. The lock is reentrant: A single thread may hold the lock multiple times at a given time.

All functions that read, write, position, or query the position of a stream, except for putc_unlocked, getc_unlocked, putchar_unlocked, and getchar_unlocked, lock the stream, as though with flockfile, before accessing it. They release the lock associated with the stream, as though with funlockfile, when the access is complete.
