Doc. No.: WG21/N3679
Date: 2013-05-05
Reply to: Hans-J. Boehm
Phone: +1-650-857-3406
Email: Hans.Boehm@hp.com

N3679: Async() future destructors must wait

We've had repeated debates about the desirability of having futures returned by async() wait in their destructors for the underlying task to complete. See, for example, N3630 and its predecessor N3451. This has turned into a particularly sensitive issue, since future destructors do not block consistently, but only when the future was returned by async(), potentially making futures difficult to use in general-purpose code.

A number of the older papers, e.g. N3630, argued that future destructors should not block at all. Here we argue that such a change would go too far: it would introduce subtle program bugs, which are likely to be exploitable as security holes. A very similar argument was presented, in slide form, at the Bristol SG1 meeting. It contributed to an alternative proposal, N3637, which was almost voted into the working paper.

The only point of this paper is to document more of the discussion leading to N3637 in the interest of avoiding future repetition.

The basic issue

Futures returned by async() with the async launch policy wait in their destructor for the associated shared state to become ready. This prevents a situation in which the associated thread continues to run, and there is no longer a means to wait for it to complete because the associated future has been destroyed. Without heroic efforts to otherwise wait for completion, such a "run-away" thread can continue to run past the lifetime of the objects on which it depends.

As an example, consider the following pair of functions:

void f() {
  vector<int> v;
  ...
  do_parallel_foo(v);
  ...
}

void do_parallel_foo(vector<int>& v) {
  auto fut = no_join_async([&] {...  foo(v); return ...; });
  a: ...
  fut.get();
  ...
}

If no_join_async() returns a future whose destructor does not wait for the async task to complete, everything may work well until the code at a: throws an exception. At that point nothing waits for the async task to complete, and it may continue to run past the exit from both do_parallel_foo() and f(), causing the async task to access and overwrite memory previously allocated to v well past its lifetime.

The end result is likely to be a cross-thread "memory smash" of the kind described in N2802, which arises under similar conditions.

This problem is of course avoided if get() or wait() is called on no_join_async()-generated futures before they are destroyed. The difficulty, as in N2802, is that an unexpected exception may cause that code to be bypassed. Thus some sort of scope guard is usually needed to ensure safety. If the programmer forgets to add the scope guard, it appears likely that an attacker could generate e.g. a bad_alloc exception at an opportune point to take advantage of the oversight, and cause a stack to be overwritten. It may be possible to also control the data used to overwrite the stack, and thus gain control over the process. This is a sufficiently subtle error that, in our experience, it is likely to be overlooked in real code.
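The kind of guard the previous paragraph describes might look roughly as follows. This is a minimal sketch, not part of this paper's proposal: no_join_async() is emulated here with a packaged_task run on a detached thread (which yields a future whose destructor does not block), and future_joiner is an illustrative name.

#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Emulation of a hypothetical no_join_async(): the returned future's
// destructor does NOT wait, because the task runs on a detached thread.
template <typename F>
auto no_join_async(F f) -> std::future<decltype(f())> {
  std::packaged_task<decltype(f())()> task(std::move(f));
  auto fut = task.get_future();
  std::thread(std::move(task)).detach();
  return fut;
}

// Scope guard: waits for the future in its destructor, so that an exception
// thrown before get() cannot leave the task running past v's lifetime.
template <typename Fut>
struct future_joiner {
  Fut& fut;
  ~future_joiner() { if (fut.valid()) fut.wait(); }
};

std::size_t do_parallel_foo(std::vector<int>& v) {
  auto fut = no_join_async([&] { std::sort(v.begin(), v.end());
                                 return v.size(); });
  future_joiner<decltype(fut)> guard{fut};  // the guard the text describes
  // ... code that may throw (the point labelled a: above) ...
  return fut.get();
}

If the programmer omits the guard (or equivalent), the code is correct only on the exception-free path, which is precisely the kind of oversight described above.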

Not all dangling pointers are created equal

It has repeatedly been argued that this is no worse than existing dangling-pointer issues, such as those introduced by lambda expressions with reference captures. Here we argue that it is in fact worse, by contrasting the two corresponding examples below. Both examples operate on a vector v passed in as a parameter. In both cases, the function foo should normally ensure that there are no references to v once foo() returns, since there is no reason to expect that v will still be around. On the left side, we assume a hypothetical no_join_async() whose returned future does not block in its destructor, as above.

Async-induced dangling reference (the "left" version):

void foo(vector<int> &v)
{
  auto f = no_join_async([&] {...
    sort(v); return v.size(); });
  a: ...
  // drop f
}

Lambda-induced dangling reference (the "right" version):

function<int()> foo(vector<int> &v) {
  function<int()> f = [&] {... sort(v); return v.size(); };
  a: ...
  return f;
}

Both pieces of code are buggy, or at least very brittle. On the left, v may be accessed after the return of foo() because the asynchronous task continues to run. On the right side, the returned lambda expression has captured v by reference. There is no guarantee that v still exists when the lambda expression is invoked.
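For concreteness (this illustration is not part of the original comparison), the right-hand bug is triggered only when a caller stores the returned closure and invokes it after v has been destroyed; make_sorter below is a hypothetical caller:

#include <algorithm>
#include <functional>
#include <vector>

// The right-hand version from above, written out in full.
std::function<int()> foo(std::vector<int>& v) {
  std::function<int()> f = [&] { std::sort(v.begin(), v.end());
                                 return static_cast<int>(v.size()); };
  return f;
}

// Hypothetical caller that lets the captured reference dangle.
std::function<int()> make_sorter() {
  std::vector<int> v{3, 1, 2};
  return foo(v);    // the closure captures v by reference
}                   // v is destroyed here

int main() {
  auto f = make_sorter();
  return f();       // undefined behaviour: the closure touches the dead v
}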

But there are several reasons to consider the version on the left significantly more hazardous: