Clarifying C++ Futures

ISO/IEC JTC1 SC22 WG21 N3170 = 10-0160 - 2010-10-17

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Anthony Williams, anthony@justsoftwaresolutions.co.uk
Howard Hinnant, howard.hinnant@gmail.com

Introduction
Easy Concurrency
    Early Call
    Late Return
        Multiple Get
        Const Reference
        Naming Types
        Undefined Behavior
    Deferred Functions
Copyable Versus Move-Only Types
Get Value Concurrently
    External Synchronization
    Moving Get
    Move Invalidation
Atomic Operation
Summary
Solution
    US 202 — 30.6.8/18
    US 203 — 30.6.8
    US 204 — 30.6.8/7-8
    US 205 — 30.6.9/3
Wording
    30.6.1 Overview [futures.overview]
    30.6.6 Class template future [futures.unique_future]
    30.6.7 Class template shared_future [futures.shared_future]
    30.6.8 Class template atomic_future [futures.atomic_future]
    30.6.9 Function template async [futures.async]

Introduction

At the Summer 2010 meeting of the C++ Standards Committee, it became apparent that members had different models for how futures were intended to be used, and thus different views on the interface of futures. This paper explores the uses of futures, problems with those uses, and solutions to those problems. It captures and extends the discussion at the meeting. We analyze the implications of the current interface. We present a relatively minimal solution. Finally, we propose normative wording.

This paper specifically addresses the following national body comments.

US 202 — 30.6.8/18

The note in this paragraph says "unlike future, calling get more than once on the same atomic_future object is well defined and produces the result again." There is nothing in future that says anything negative about calling get more than once.

Remove this note, or add words to the requirements for future that reflect what this note says.

US 203 — 30.6.8

Both future and shared_future specify that calling most member functions on an object for which valid() == false produces undefined behavior. There is no such statement for atomic_future.

Add a new paragraph after 30.6.8 [futures.atomic_future]/2 with the same words as 30.6.7 [futures.shared_future]/3.

US 204 — 30.6.8/7-8

According to the definition of atomic_future, all members of atomic_future are synchronizing except constructors. However, it would probably be appropriate for a move constructor to be synchronizing on the source object. If not, the postconditions on paragraphs 7-8, might not be satisfied. This may be applicable if a collection of futures are being doled out to a set of threads that process their value.

Make the move constructor for atomic future lock the source.

US 205 — 30.6.9/3

The third sentence says "If the invocation is not deferred, a call to a waiting function on an asynchronous return object that shares the associated asynchronous state created by this async call shall block until the associated thread has completed." The next sentence says "If the invocation is not deferred, the join() on the created thread..." Blocking until a thread completes is not necessarily a join.

Decide whether the requirement is to block until finished or to call join, and rewrite to match.

Easy Concurrency

There are many uses for futures. In this section, we concentrate on the use case for obtaining easy concurrency from sequential code. Given the following example,


void original( type arg ) {
    ....1
    ....2  type var = work( arg );
    ....3  if ( cond1 ) {
    ....4      use1( var );
    ....5  } else {
    ....6      use2( var );
    ....7  }
}

the task is to use async and future to obtain concurrency within the example.

Early Call

Technique: The working technique is to move the call of work earlier, so that work executes concurrently with code ....1.


void early_call( type arg ) {
    ....1  auto ftr = async( [=]{ return work( arg ); } );
    ....2  type var = ftr.get();
    ....3  if ( cond1 ) {
    ....4      use1( var );
    ....5  } else {
    ....6      use2( var );
    ....7  }
}

This code works with the existing unique future, which requires that get be called at most once. The approach works in general, even when the return value is not stored.


void early_not_stored( type arg ) {
    ....1  auto ftr = async( [=]{ return work( arg ); } );
    ....2  use1( ftr.get() );
}

Late Return

Technique: Another approach to easy concurrency is to change uses of any stored return value into access to the future value, effectively moving the time the return value is needed to later. So, for the original example above, the work executes concurrently with code ....3 and either ....4 or ....6.


void late_return( type arg ) {
    ....1  
    ....2  auto ftr = async( [=]{ return work( arg ); } );
    ....3  if ( cond1 ) {
    ....4      use1( ftr.get() );
    ....5  } else {
    ....6      use2( ftr.get() );
    ....7  }
}

Multiple Get

Problem: This code works with a unique future only if get() is called at most once. Any edit that introduces a second call to get results in undefined behavior.


void bad_second_use( type arg ) {
    ....1  
    ....2  auto ftr = async( [=]{ return work( arg ); } );
    ....3  if ( cond1 ) {
    ....4      use1( ftr.get() );
    ....5  } else {
    ....6      use2( ftr.get() );
    ....7  }
    ....8  use3( ftr.get() ); // second use is undefined
}

Workaround: The example can be made to work with a shared_future.


void good_second_use( type arg ) {
    ....1  
    ....2  shared_future<type> ftr
               = async( [=]{ return work( arg ); } );
    ....3  if ( cond1 ) {
    ....4      use1( ftr.get() );
    ....5  } else {
    ....6      use2( ftr.get() );
    ....7  }
    ....8  use3( ftr.get() ); // second use is defined
}

Const Reference

Problem: The return late approach fails for slightly more complicated examples. For instance, modify the else clause in our example to either read or write var.


void write_to_get( type arg ) {
    ....1  
    ....2  shared_future<type> ftr
               = async( [=]{ return work( arg ); } );
    ....3  if ( cond1 ) {
    ....4      use1( ftr.get() );
    ....5  } else {
    ....6      if ( cond2 ) {
    ....7          use2( ftr.get() );
    ....8      } else {
    ....9          ftr.get() = something();
                   // assign to const reference!
    ....10     }
    ....11 }
    ....12 use3( ftr.get() );
}

The problem is that the get() on shared_future returns a const reference, which means that the update of the variable is ill-formed.

Solution: The return late approach can work in general only if the get function returns a non-const reference to the appropriate storage. That is, the shared_future::get call serves as a replacement for naming the former variable that held the function return value.
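For illustration, a minimal sketch of such an interface; the class name, the member layout, and the placement of the non-const reference are hypothetical, not working-draft wording.


// Hypothetical sketch: a get() that returns a non-const reference into the
// liaison state, so the call can stand in for the variable that formerly
// held the function's return value.
template< typename R >
class writable_shared_future {    // name is illustrative only
public:
    R& get();                     // non-const reference to the stored value
    ....
};

void write_through_get( writable_shared_future<type>& ftr ) {
    ftr.get() = something();      // now well-formed
}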

Naming Types

Problem: Programmers must name the return type when declaring the shared_future; auto is not available within template argument lists.

Solution: The burden of manual type inference can be eliminated by providing a second async that returns a shared_future.


void auto_shared( type arg ) {
    ....1  
    ....2  auto ftr
               = shared_async( [=]{ return work( arg ); } );
    ........
}

Solution: The burden of manual type inference can also be eliminated by providing an operation to turn an rvalue unique future into an rvalue shared_future:


template< typename Type >
shared_future< Type > future< Type >::share() { .... }

This changes the above code to:


void auto_shared( type arg ) {
    ....1  
    ....2  auto ftr
               = async( [=]{ return work( arg ); } ).share();
    ........
}

Observation: The cost of conversion between unique future and shared_future is very low because they share the same representation.

Undefined Behavior

Problem: Failing to apply the workaround results in undefined behavior.

Strawman: To avoid undefined behavior, the library committee showed a strong preference [13-2-0-4] for having a second use of such a get throw an exception. Unfortunately, that approach turns a problem that is diagnosable at compile time into a defined run-time behavior. That is, a programmer could exploit the defined throw to control normal program execution.

Solution: Leave the behavior as undefined, but encourage implementations to detect a second call and then either call std::terminate or throw an exception.
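For example, a sketch (in the style of the earlier examples) of what a user of such a checking implementation would see; the diagnostic is a quality-of-implementation matter, not guaranteed behavior.


void diagnosed_second_get( type arg ) {
    auto ftr = async( [=]{ return work( arg ); } );
    type first = ftr.get();
    type again = ftr.get();  // undefined; a checking implementation is
                             // encouraged to call std::terminate or to
                             // throw an exception here
}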

Problem: Programmers are forced to use a shared_future for some uses even when no sharing between threads is intended. The core of the problem is that the unique future was designed around move-only types, and thus permits only one call to get.

Solution: We can address the problem by separating the sharing of futures from their applicability to move-only types. In particular, we can make sharing a feature of the future type, and 'move only' a feature of operations on that future. There is support [9-5-1-3] for allowing multiple get operations on a unique future for a copyable type.

Note: This solution also substantially reduces the need for easy type conversion, because the unique future alone then suffices for the easy-concurrency conversions above.

Problem: In generic contexts, different constraints on get depending on the template argument may lead to subtly broken code.

Solution: Separate the get operation into two variants, one intended to be called only once and one intended to be called multiple times. The latter does not support move-only types, and so would not be available in specializations of futures on move-only types. There is support [10-2-1-4] for renaming the only-once get to something else. Suggestions include release, pop, and drain.
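One possible shape for such a split, shown only as a sketch; the name release is taken from the suggestions above and is not proposed wording.


// Illustrative sketch of the two get variants.
template< typename R >
class future_sketch {
public:
    R release();             // single-use: moves the value out; works for
                             // move-only and copyable R alike
    const R& get() const;    // multi-use: not provided when R is move-only
    ....
};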

Deferred Functions

Technique: One of the mechanisms that supports easy concurrency is deferred functions. This mechanism enables the system to adapt to the amount of parallelism available, which avoids over-subscribing the system. That is, programmers can specify more concurrency than they think they need with confidence that the excess will not have a significant cost.

Observation: The single-use get of unique future keeps the cost of excess concurrency low by avoiding threads when they are not helpful. The implementation need not store the function result, but can instead return it directly. The implementation does not need to catch exceptions, but can instead pass them up unhindered.
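For example, the following usage sketch follows the style of the earlier examples; launch::deferred is the C++11 spelling of the deferred policy.


void deferred_example( type arg ) {
    // With the deferred policy no thread is created; work is invoked lazily,
    // in this thread, at the first call to get or wait.
    auto ftr = async( launch::deferred, [=]{ return work( arg ); } );
    type var = ftr.get();    // work( arg ) runs here and its result (or
                             // exception) is handed directly to the caller
    use1( var );
}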

Copyable Versus Move-Only Types

The current futures have seemingly inconsistent support for copyable and move-only argument types.

Observation: The current unique future supports only a single-use, value-returning get. This design is most appropriate to move-only types.

Observation: The current shared_future supports only a multi-use, const-reference returning get. This design is most appropriate to copyable types. In particular, it does not support moving a result out of the future. This design means that its use with move-only types is restricted to accessing the result through the const reference returned. This may or may not be a problem, depending on the type (e.g. a const unique_ptr<T> still allows non-const access to the T object).

Observation: The single-use value get is move optimized. E.g. std::vector<int> v = ftr.get().

Observation: The const reference get is access optimized. E.g. int i = ftr.get()[3].

Get Value Concurrently

The shared_future is designed to be shared between threads, that is, to allow multiple concurrent get operations.

Note: There is some committee support [4-10-2-1] for renaming shared_future to multi_future.

External Synchronization

Problem: The get operation on shared_future currently returns a const reference. This reference persists beyond the scope of any locking that operations on the future can provide. Thus, access through the reference requires programmer-provided external synchronization.

Solution: Programmers can avoid that external synchronization by returning a type whose const operations are safe to call concurrently.
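For example, the following type (purely illustrative) is not safe for concurrent const access, because its const accessor mutates a cache.


class widget {
public:
    const std::string& name() const {
        if ( cached_name_.empty() )
            cached_name_ = compute_name();   // mutation inside a const member
        return cached_name_;
    }
private:
    std::string compute_name() const;
    mutable std::string cached_name_;        // two concurrent readers can race here
};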

Problem: Objects in the standard library must meet this const requirement, but it is not presently a requirement on user types, and the existing specification for shared_future does not impose such a requirement.

Solution: Normatively, make this const requirement apply to the argument types of shared_future.

Problem: Such a const requirement would fail for a get returning a non-const reference, which was our solution to an earlier problem.

Solution: We can provide a different get operation that returns a copy of the future object. This copying would be done under a lock internal to the future's liaison state.
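A minimal sketch of such a copy-returning get, assuming a hypothetical mutex and value member in the liaison state; the names are illustrative only.


template< typename R >
struct liaison_state_sketch {    // hypothetical shared state
    std::mutex mutex;
    R value;
};

template< typename R >
class shared_future_sketch {
public:
    R get_value() const {
        std::lock_guard< std::mutex > hold( state_->mutex );
        return state_->value;    // the copy is made while the lock is held;
                                 // copy elision can remove the extra copy
    }
private:
    liaison_state_sketch<R>* state_;
};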

Note: This lock is not required for a unique future, because a unique future a priori cannot have its value copied out concurrently.

Note: The copy-returning get can exploit copy elision. Returning a copy is not necessarily any more expensive than returning a reference and copying the object outside the call.

Problem: The use of a non-const-reference-returning get requires care in managing external synchronization.

Solution: Do not provide a non-const-reference-returning get for shared_future. (This solution constrains expert programmers.)

Moving Get

Problem: A version of get that returns a copy is not applicable to move-only types.

Solution: Disable shared_future for move-only types. This solution is more extreme than the status quo.

Solution: Disable copying of move-only types out of shared_future, permitting only const lvalue access. This solution is the status quo.

Problem: Sharing a future for move-only types would enable "first available consumer" program structures, and hence disabling shared_future for move-only types, or providing only a reference-returning get, would limit programmers.

Solution: Provide a get that moves out its value, and throws an exception on a second call.

Problem: The above solution puts exception handling into non-exceptional code. Indeed, with a shared_future, one can expect multiple calls because otherwise a unique future would suffice.

Strawman: Return a default-constructed object when the value has already been moved out. Unfortunately, this strawman puts an undesirable constraint on programmers: the value type must be default-constructible, and a default-constructed result is indistinguishable from a genuine one.

Solution: We can avoid the multiple-move attempt by providing a get operation that move-assigns the value if and only if it has not been moved. That get operation would return a boolean indicating whether or not the move occurred.

For example,


moveonly var;
....
if ( ftr.moving_get( var ) ) {
    .... // var has the value; use it.
}
else {
    .... // another thread already moved the value;
    // var does not have the value; do something else.
}

Solution: We can avoid the multiple-move attempt by providing a get operation that calls a user function if and only if the value has not been moved. That function would receive the moved value as a parameter. For convenience, that get operation would return a boolean indicating whether or not the function was called.

For example, we can forward the value to a function.


....
if ( ftr.conditional_get( [](moveonly arg){ forward( std::move(arg) ); } ) ) {
    .... // value was forwarded.
}
else {
    .... // value was not forwarded.
}

Move Invalidation

Problem: A call-once get changes the future state to invalid. The wait operations have a precondition of valid. For unique future this precondition is not a problem. However, for a shared_future, another thread could invalidate the precondition just before a call to wait.

Solution: Permit wait calls on an invalid future. Return immediately from the wait.
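The race, sketched with the placeholders of the earlier examples; the invalidating, single-use get on shared_future is the proposed operation, not the current one.


void consumer( shared_future<type> ftr ) {
    if ( ftr.valid() ) {
        // Another thread sharing the same liaison state may perform a
        // single-use (invalidating) get between the check and the wait.
        ftr.wait();   // proposed: return immediately if ftr is now invalid,
                      // rather than violating a valid() precondition
    }
}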

Atomic Operation

The working draft defines an atomic_future for direct sharing of future variables. This type is modeled on shared_future, which is not a bad approach at the high level. Indeed the discussion above that applies to shared_future also applies to atomic_future. Unfortunately, the other details of the interface of atomic_future seem less sound.

Problem: The copy assign operation implies a transactional read and write, which is less performant than alternatives, as it implies acquiring a pair of locks, e.g. with std::lock().

Solution: Assign a shared_future to an atomic_future, rather than assigning an atomic_future. Do the same with the copy constructor.

Problem: The copy assign operation returns a reference to the left-hand side, which is problematic, as the subsequent read implied by that reference is not coordinated with the write.

Solution: Return an rvalue rather than an lvalue.

Problem: There is no mechanism to obtain a shared_future from an atomic_future.

Solution: Add an implicit conversion operator on atomic_future yielding a shared_future.

Problem: There is no mechanism to swap the liaison state of an atomic_future.

Solution: Add an exchange operation, like those of chapter 29.

Problem: There is no mechanism to compare-and-swap the liaison state of an atomic_future.

Solution: Add a compare_exchange operation, like those of chapter 29. Unlike those operations, the measure of equality is whether or not the futures refer to the same liaison state.
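Taken together, the conversion, exchange, and compare_exchange operations proposed above might be declared as follows; this is an illustrative sketch, not proposed wording.


template< typename R >
class atomic_future_sketch {
public:
    atomic_future_sketch( const shared_future<R>& src );        // construct from a shared_future
    shared_future<R> operator=( const shared_future<R>& src );  // assign from a shared_future,
                                                                 // returning an rvalue
    operator shared_future<R>() const;                           // obtain a shared_future

    shared_future<R> exchange( shared_future<R> desired );       // install the new liaison state,
                                                                  // returning the previous one
    bool compare_exchange( shared_future<R>& expected,           // "equal" means: refers to the
                           shared_future<R> desired );           // same liaison state
};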

Observation: The packaged_task that sets a future's liaison state happens before successful get operations (30.6.4 [futures.state]), and so the get operation provides memory synchronization. Thus there appears to be no need for memory synchronization specifications on atomic_future operations.

Problem: The sequential consistency of atomic_future operations is not specified.

Solution: The solution is arguable. However, for consistency with other atomic operations that do not specify memory order, atomic_future operations should be sequentially consistent.

Problem: The liaison state is atomically reference counted, and extra work is performed on temporaries.

Solution: Add move operations. These operations only affect performance.

Solution: This consideration also applies to shared_future.

Problem: The get operation returns a const reference, as does that of shared_future. This return implies that future values must be safe for concurrent read-only access.

Solution: Add a locking get to copy out the future value, even when copies are mutating.

Problem: Such an interface would be incompatible with sharing the liaison state between shared_future and atomic_future, which in turn implies the solutions above may be ill-advised.

Solution: Give ourselves more time to develop the atomic_future interface by removing atomic_future from the library.

Problem: The existing interface is already in the FCD, and requires consensus to remove.

Summary

The existing futures interfaces have problems. Our solution consists of two primary approaches: we separate the sharing of futures from their support for move-only types, and we separate the single-use get from the multi-use get.

We preserve the distinction that a unique future has unique ownership of the get side of the liaison state. We also preserve the distinction that a shared_future enables multiple ownership of the get side of the liaison state. Finally, we preserve the distinction that an atomic_future enables concurrent access to a future variable.

The operations on unique future and shared_future discussed in this paper and their analysis are as follows.

(Entries that name only the future type apply to both move-only and copyable value types.)

get, only once, returning rvalue
    throw on second call
        unique future: problem: no diagnostics, throw abuse
        shared_future: problem: non-exceptional throw
    undefined (terminate) on second call
        unique future: okay: supports call early, diagnostics
        shared_future: unsupportable: no effective way to ensure that another thread will not grab the value

conditional get, returning bool
    move assign
        unique future: unneeded: testing valid beforehand is sufficient
        shared_future: okay: minimal, simple support for move-only types in shared_future
    callback
        shared_future: okay: generalizes support for move-only types in shared_future

get, called many times
    returns rvalue
        unique future <moveonly>: unsupportable: requires copying non-copyable types
        unique future <copyable>: okay: supports value futures
        shared_future <moveonly>: unsupportable: requires copying non-copyable types
        shared_future <copyable>: okay: supports locked copying
    returns non-const lvalue
        unique future <moveonly>: restricted: must use result in place
        unique future <copyable>: okay: supports return value late
        shared_future: tricky: requires external synchronization
    returns const lvalue
        unique future <moveonly>: restricted: must use result in place, does not support general late return
        unique future <copyable>: restricted: does not support general late return
        shared_future <moveonly>: restricted: must use result in place, requires thread-safe const
        shared_future <copyable>: restricted: requires thread-safe const

valid
    unique future: okay
    shared_future: tricky: one-way unstable; in the presence of moving gets, true does not imply that it will remain so

wait operations
    unique future: okay
    shared_future: modify: needs update for invalid futures

Observations on the atomic_future interface are as follows.

copy assignment
    const atomic_future& argument, atomic_future& lvalue result
        problem: implies transactional semantics, uncoordinated read and write
    const shared_future& argument, shared_future rvalue result
        okay: similar to atomic (chapter 29) operations

copy constructor
    atomic_future argument
        problem: atomic types should be variables, not values; similar to atomic (chapter 29) operations
    shared_future argument, shared_future rvalue result
        okay: similar to atomic (chapter 29) operations

move assignment
    atomic_future&& argument, atomic_future& lvalue result
        problem: atomic types should be variables, not values; similar to atomic (chapter 29) operations
    shared_future&& argument, shared_future rvalue result
        optional: implementations can avoid some reference counting

move constructor
    atomic_future argument
        problem: atomic types should be variables, not values; similar to atomic (chapter 29) operations
    shared_future argument, shared_future rvalue result
        optional: implementations can avoid some reference counting

conversion operator, shared_future rvalue result
    const member function
        okay: similar to atomic (chapter 29) operations
    && member function
        optional: implementations can avoid some reference counting

exchange function, shared_future rvalue result
    plain member function
        okay: similar to atomic (chapter 29) operations
    && member function
        optional: implementations can avoid some reference counting

compare_exchange function
    plain member function
        okay: similar to atomic (chapter 29) operations

Solution

We choose generally conservative solutions to the issues raised by the national body comments.

US 202 — 30.6.8/18

The national body comment is:

The note in this paragraph says "unlike future, calling get more than once on the same atomic_future object is well defined and produces the result again." There is nothing in future that says anything negative about calling get more than once.

Remove this note, or add words to the requirements for future that reflect what this note says.

This comment indicates a misinterpretation of the intent of the future design. We address weaknesses in the design exposed by the misinterpretation and the subsequent committee discussion.

Multiple future::get call diagnosis

The current unique future makes multiple calls to get undefined behavior. We propose to add a suggestion that implementers detect a second call and throw future_error with an error condition of future_errc::no_state.

Support for restricted late return

We ease use of the late-return pattern by providing an easy conversion from future to shared_future. We propose to add a future::share operation to enable effective use of auto.

Enable safe concurrent get

The current definition of shared_future enables data races on get. We propose to avoid this by adding a requirement that the argument type to shared_future meet the 'const means concurrency-safe' requirement in place for the standard library. [FIX Also for atomic_future if we keep it.]

US 203 — 30.6.8

The national body comment is:

Both future and shared_future specify that calling most member functions on an object for which valid() == false produces undefined behavior. There is no such statement for atomic_future. Add a new paragraph after 30.6.8 [futures.atomic_future]/2 with the same words as 30.6.7 [futures.shared_future]/3.

We believe the proposed resolution goes in the wrong direction. An atomic_future may become invalid through the actions of concurrent agents, and hence making operations on an invalid atomic_future undefined is unsupportable. While shared_future does not currently suffer from this problem, plausible future extensions could introduce it. So, we propose to change the shared_future operations that currently require valid() == true to instead throw future_error with an error condition of future_errc::no_state.

US 204 — 30.6.8/7-8

The national body comment is:

According to the definition of atomic_future, all members of atomic_future are synchronizing except constructors. However, it would probably be appropriate for a move constructor to be synchronizing on the source object. If not, the postconditions on paragraphs 7-8, might not be satisfied. This may be applicable if a collection of futures are being doled out to a set of threads that process their value.

Make the move constructor for atomic future lock the source.

We could not agree on what the interface to atomic_future should be nor could we find a use case for it. Therefore, we propose to remove atomic_future.

However, if we continue to support atomic_future, then we concur with the proposed resolution.

US 205 — 30.6.9/3

The national body comment is:

The third sentence says "If the invocation is not deferred, a call to a waiting function on an asynchronous return object that shares the associated asynchronous state created by this async call shall block until the associated thread has completed." The next sentence says "If the invocation is not deferred, the join() on the created thread..." Blocking until a thread completes is not necessarily a join.

Decide whether the requirement is to block until finished or to call join, and rewrite to match.

[FIX Note changed proposal.] We propose to clarify that "block until finished" is the same as "as if join()", which preserves the happens-before chain.

Wording

30.6.1 Overview [futures.overview]

Adjust the synopsis to reflect the changes below.

30.6.6 Class template future [futures.unique_future]

Edit paragraph 3 as follows.

The effect of calling any member function other than the destructor, the move-assignment operator, or valid on a future object for which valid() == false is undefined. [Note: Implementations are encouraged to detect this case and throw future_error with an error condition of future_errc::no_state. —end note]

Edit the synopsis as follows.


namespace std {
  template <class R>
  class future {
  public:
    future();
    future(future &&);
    future(const future& rhs) = delete;
    ~future();
    future& operator=(const future& rhs) = delete;
    future& operator=(future&&);

    shared_future<R> share();

    ....

After paragraph 11, add a new function entry as follows.

shared_future<R> share();

Returns: shared_future<R>(std::move(*this));

Postcondition: valid() == false

Remove paragraphs 13, 19, 21, and 24.

Requires: valid() == true.

30.6.7 Class template shared_future [futures.shared_future]

Edit paragraph 3 as follows.

The effect of calling any member function other than the destructor, the copy-assignment operator, the move-assignment operator, or valid() on a shared_future object for which valid() == false is to throw future_error with an error condition of future_errc::no_state.

Edit paragraph 19 as follows.

Requires: The copy constructor of R and calls through the reference returned from get() shall not introduce a data race (17.6.3.10 [res.on.objects]).

Remove paragraphs 24, 26, and 29.

Requires: valid() == true.

30.6.8 Class template atomic_future [futures.atomic_future]

Remove this section. Otherwise, edit as below.

Edit paragraph 2 as follows.

Unlike future and shared_future, member functions of atomic_future other than constructors and destructors are synchronization operations (1.10). The copy constructor is a synchronizing operation on the argument object, but not the constructed object. Accessor member functions perform acquire operations on the object. All member function calls shall be included in the order of memory_order_seq_cst operations (29.3). All synchronizing operations on atomic_future are sequentially consistent (1.10, 29.3).

Remove paragraph 13.

Throws: future_error with an error condition of no_state if the precondition is not met.

After paragraph 14 add a new paragraph as follows.

Requires: The copy constructor of R and calls through the reference returned from get() shall not introduce a data race (17.6.3.10 [res.on.objects]).

Edit paragraphs 21, 25, and 29 as follows.

Throws: future_error with an error condition of no_state if valid() == false, otherwise future_error if an error condition occurs.

30.6.9 Function template async [futures.async]

Edit paragraph 5 as follows.

Synchronization: the invocation of async happens before (1.10) the invocation of f. [Note: this statement applies even when the corresponding future object is moved to another thread. —end note] If the invocation is not deferred, a call to a waiting function on an asynchronous return object that shares the associated asynchronous state created by this async call shall block until the associated thread has completed, as if joined ([thread.thread.member]). If the invocation is not deferred, the join() on the created thread happens-before (1.10) the first function that successfully detects the ready status of the associated asynchronous state returns or before the function that gives up the last reference to the associated asynchronous state returns, whichever happens first. If the invocation is deferred, the completion of the invocation of the deferred function happens-before the calls to the waiting functions return.