Doc. No.:	WG21/N2802=J16/08-0312
Date:	2008-12-04
Reply to:	Hans-J. Boehm
Phone:	+1-650-857-3406
Email:	Hans.Boehm@hp.com

N2802: A plea to reconsider detach-on-destruction for thread objects

The destructor for std::thread is currently described as having the following effect:

If joinable() then detach(), otherwise no effects. [Note: Destroying a joinable thread can be unsafe if the thread accesses objects or the standard library unless the thread performs explicit synchronization to ensure that it does not access the objects or the standard library past their respective lifetimes. Terminating the process with _exit or quick_exit removes some of these obligations. - end note]

This was discussed at the Kona meeting and quite controversial at the time. The current solution was adopted in large part because it is consistent with Boost.

We would nonetheless like to reopen the discussion, because we feel this is possibly the most dangerous feature being added to C++0x. Unlike other "dangerous" features in the language, this appears to us to have near zero potential benefit, in terms of either performance or user convenience. In retrospect, we also feel that even those of us who initially opposed this approach did not fully appreciate the problems it introduces.

Why is detaching on destruction so dangerous?

Consider the following simple example of a trivially parallelized naive Fibonacci function:

int fib(int n) {
    if (n <= 1) return n;
    int fib1, fib2;
    
    std::thread t([=, &fib1]{fib1 = fib(n-1);});
    fib2 = fib(n-2);
    t.join();
    return fib1 + fib2;
}

We first create a second thread to perform the first recursive subcomputation and then perform the second subcomputation ourselves. We then wait for the thread we created before computing the final result.

We claim the basic structure of this function is likely to be fairly typical for simple parallel algorithms. Since thread::join() does not pass on a value, it seems to be natural to assign to a shared variable in the child thread. (One could easily add code to limit the number of threads, etc., but that's irrelevant to this discussion.)

So far, we believe this code is fine (modulo the stupid algorithm, possible overflows in the result, and possible syntax errors). But now consider adding a simple error check, or adding a call to a function that might throw an exception, as in

// Don't write this code !!!
int fib(int n) {
    if (n <= 1) return n;
    int fib1, fib2;
    
    std::thread t([=, &fib1]{fib1 = fib(n-1);});
    fib2 = fib(n-2);
    if (fib2 < 0) throw ...
    t.join();
    return fib1 + fib2;
}

This revised version of the function will probably still work most of the time, but it contains a horrible, and essentially impossible to debug, memory overwrite error. Since the error occurs rarely, it is difficult to test for. If the resulting program is run in a trusted context, it also introduces a potential security hole. The presence of this error seems subtle enough that novice programmers are unlikely to avoid it.

This code may fail whenever the exception is actually thrown. When that happens, the thread computing fib1 for the invocation of fib() that threw the exception will be detached. If we are in the process of computing fib2 for our caller, the same will happen in our caller; the thread computing fib1 on its behalf will also be detached, and so on. After that, the stack is unwound, the exception is hopefully caught, and the main computation continues.

But all those threads computing fib1 are still running! And as they finish, they will write to all those instances of fib1. Which are no longer there, since the stack has been unwound. In its place will be the stack corresponding to the continuing computation that was initiated when the exception was caught.

Thus we now have a large number of threads writing to various locations on the user's stack. By the time the user tries to debug the resulting mess, there is a good chance they will all be gone, leaving him/her with nothing but a stack with mysteriously smashed values. Or those might no longer be visible either because a return address may have been overwritten, causing the main program to take a wild branch. Or conceivably, a malicious and very clever user might have arranged to invoke the fib on an argument that intentionally causes a return address to be overwritten in just the right way ...

Templatizing the original fib() function with respect to the argument type will result in the same problem, even without the explicit throw, since operations on the argument type might themselves throw exceptions, e.g. because an unbounded integer arithmetic operation failed to allocate memory.

The fundamental issue is that direct use of std::thread is not exception safe, if the created thread accesses any object b with bounded lifetime. An exception thrown between the creation of the thread and joining it will leave a detached "escaped" thread that can access, and potentially write to b after the end of its lifetime.

The object b accessed by an escaped thread does not need to have automatic storage duration for this problem to arise. It can be deallocated on the heap and subsequently deallocated by its owner. Or it can even be statically allocated and destroyed at program termination; there is often no way to ensure that such escaped threads finish before static destructors are called.

The use of futures to return values may help in some cases, but is a very incomplete solution. To see this consider what would have happened if we had used the same approach as above to naively parallelize a simple quicksort by running the recursive invocations in parallel. In this case, the recursive invocations will write to the original array, which will typically be allocated in some stack frame. The return value is not terribly interesting, but a detached thread would still overwrite the parent stack at unpredictable times.

Possible workarounds

We can think of the following plausible workarounds, given the current std::thread specification:

Arrange that threads only access "permanent" objects. This appears impractical, or at least very convoluted, for examples like the one in the preceding section. For heap allocated objects, this generally requires that either garbage collection is in use, or the thread and its owner both have a shared_ptr to the object. For statically allocated objects it usually requires use of _exit or quick_exit to avoid static destructor invocation. It is clearly difficult to enforce when threads need to access objects that are parameters to the function creating the thread. It also requires that threads created using this model somehow avoid calling into the standard library after main() exits (see [thread.threads.constr]), which again seems impractical in real cases.
Arrange that threads are only created using a wrapper for std::thread that overrides the default destructor behavior, or are created just before a guard object that joins or otherwise ensures termination of the thread. Except in really simple cases, this appears to be the only practical solution.

Thus it appears to us that the only way to live with the current default is to essentially always explicitly override it. This seems like a poor justification for adding a very unsafe feature to the language.

Boost Experience

This problem appears to be subtle enough that many programmers do not recognize its existence. Most of the examples in a 2002 Doctor Dobbs article on Boost Threads are not completely correct with respect to the committee draft. If join() throws (which it may), they can access the standard library after static destructors are called.

Toy examples, as in the above paper and in the Wikipedia example, are sometimes correct by virtue of not being able to throw exceptions between creation and join. They also generally avoid the worst of the problems by not returning values from the child thread.

Real examples we found with Google code search (e.g. libopenvrml) generally did not address this issue, and appeared to be generally incorrect, though it was difficult to sufficiently understand the lifetime issues based on a quick code inspection. It seems to be common to create a thread in a member function, and pass the this pointer to the thread, without worrying about unexpected thread destruction. This appears to be wrong unless the this object is never destroyed. The most transparent nontrivial case we found had essentially the structure of our (broken) example above.

We did not find any boost::thread uses that were both correct in this respect, and were robust with respect to both the addition of standard library calls in the thread, and exception producing calls between thread creation and join. We expect that most code maintainers would expect to be able to add such calls. In essence, it seems that the small amount of correct code was essentially correct by accident.

Solutions

Essentially any reasonable destructor behavior other than detaching the thread would be an acceptable solution and, in our opinion, a vast improvement. Given the absence of a cancellation facility, acceptable options include:

Joining the thread. This has the unfortunate consequence that throwing an exception will wait for threads being destroyed. But this is much more benign than stack overwrite errors by another thread. And if it results in serious responsiveness issues, e.g. because it creates deadlocks, it is debuggable. It does not risk security holes.
Any specification that allows or requires the implementation to treat destruction of a joinable thread as an error. Lawrence Crowl suggested simply replacing the detach() call in the current specification with a call to terminate() by analogy with unhandled exceptions. This is another error condition that cannot be safely reported via an exception.

I originally favored the first solution, but most of the feedback since then has favored the second, and I'm warming up to it. The first solution makes a large number of simple use cases implicitly safe, at the cost of introducing some potentially tricky performance debugging problems when it results in performance better than an outright deadlock, but worse than acceptable.

I don't recall the second alternative receiving serious consideration in the initial discussion, but certainly a case can be made for it. Abandoning a thread by detaching it is not safe. Joining it implicitly risks unexpected delays. It seems that exception-safe code will usually need to address this issue explicitly. One way to encourage it to do so is to encourage the implementation to report when it doesn't.

Proposed wording:

There was a consensus in the mailing list discussion that the thread destructor and move assignment operators should perform the same action on the thread object being destroyed/overwritten. This is reflected here:

Alternative 1:

Replace detach() in the paragraph 30.2.1.3p1 we quoted above [thread.threads.destr] with join(). Remove the note attached to that paragraph.

In 30.2.1.4 [thread.thread.assign], replace detach() with join().

Alternative 2:

Replace detach() in the paragraph 30.2.1.3 we quoted above [thread.threads.destr] with terminate(). Replace the note attached to the paragraph with: [Note: Either implicitly detaching or joining a joinable() thread in its destructor could result in difficult to debug correctness (for detach) or performance (for join) bugs encountered only when an exception is raised. Thus the programmer must ensure that the destructor is never executed while the thread is still joinable. --end note]

In 30.2.1.4 [thread.thread.assign], replace detach() with terminate().