Remove std::reference_closure

ISO/IEC JTC1 SC22 WG21 N2845 = 09-0035 - 2009-03-05

Lawrence Crowl, crowl@google.com, Lawrence@Crowl.org
Douglas Gregor, doug.gregor@gmail.com
David Abrahams, dave@boostpro.com

Introduction
Issues
Benchmark
std::function Optimization
    Small Function Object
    Direct Copy Call
    Move Semantics
    Use LLVM
    Results
    Sources
Future Optimization
Conclusion
Proposal
    5.1.1 Lambda expressions [expr.prim.lambda]
    20.6.18 Class template reference_closure [func.referenceclosure]

Introduction

The specification of lambda expressions adopted with N2550 Lambda Expressions and Closures: Wording for Monomorphic Lambdas (Revision 4) included a specification that closures consisting only of references be implemented as a class derived from std::reference_closure. The intent of this specification was to enable improved performance of a class of closures across binary interfaces.

N2830 Problems with reference_closure proposed that std::reference_closure be removed from the language and provided some evidence for that position. N2839 Response to "Problems with reference_closure" disputed some of that evidence and argued for keeping std::reference_closure. This paper provides new techniques for aggressive optimization of std::function and corresponding benchmark results that show that the relative cost of std::function to std::reference_closure can be much lower than previous evidence suggested. This new evidence enables a consensus agreement to remove std::reference_closure.

This paper summarizes the issues, describes the new std::function optimization techniques, presents the benchmark results, and proposes changes to the working draft.

Issues

Closures have anonymous types, and are hence not suitable for binary interfaces. The expected development model for binary interfaces using closures is to first represent the closures with std::function. When there is evidence of a need for additional performance, an additional overloaded interface uses std::reference_closure to handle the appropriate subset more efficiently.

There are problems with taking this approach.

The user must write the additional overloads. This work can be ameliorated by having both versions use a common templated implementation.
Not all closure types are handled by std::reference_closure. This lack of support could be ameliorated by changing the lambda, but the workaround is not generally applicable.
The closure type must be derived from std::reference_closure, which requires the closure type to contain a function pointer that it might not otherwise require. This unused space can be ameliorated by a compiler that does function cloning and parameter propagation.
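The overload pattern and the representation discussed above can be sketched as follows. Since std::reference_closure is not assumed to be available, `ref_closure` below is a hypothetical stand-in that holds the pair of a function pointer and a scope pointer; `make_ref_closure`, `run_twice`, `run_twice_impl`, and `count_calls` are likewise illustrative names, not part of any proposal.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for std::reference_closure<void()>: a function
// pointer plus a pointer to the state of the enclosing scope.
struct ref_closure {
    void (*fn)(void* scope);
    void* scope;
    void operator()() const {
        assert(fn != nullptr);
        fn(scope);
    }
};

// Illustrative helper: wraps a callable that must outlive the ref_closure.
template <class F>
ref_closure make_ref_closure(F& f) {
    return ref_closure{[](void* p) { (*static_cast<F*>(p))(); }, &f};
}

// Common templated implementation shared by both binary interfaces.
template <class Callback>
void run_twice_impl(Callback cb) {
    cb();
    cb();
}

// General interface: accepts any closure.
void run_twice(std::function<void()> cb) { run_twice_impl(cb); }

// Additional overload for the reference-only subset.
void run_twice(ref_closure cb) { run_twice_impl(cb); }

int count_calls() {
    int n = 0;
    auto work = [&] { ++n; };
    run_twice(work);                    // selects the std::function overload
    run_twice(make_ref_closure(work));  // selects the ref_closure overload
    return n;
}
```

In this sketch the user still writes two entry points, but the body lives once in `run_twice_impl`, which is the amelioration the first item describes.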

There are problems with not taking this approach.

There is no indication in use of the std::function parameter type that the closure will not be used past completion of the function. That is, there is no obvious guarantee that the closure type will be used only during its lifetime. So, there is a risk of use after destruction. This risk can be ameliorated by passing the std::function by reference.
Implementations of std::function are slower than implementations of std::reference_closure.
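A minimal sketch of the by-reference convention mentioned above, assuming a hypothetical interface `apply_n` and caller `sum_below`. The reference parameter only signals intent; nothing prevents a callee from copying and retaining the std::function.

```cpp
#include <cassert>
#include <functional>

// Taking the callback by const reference signals that it is borrowed for
// the duration of the call; this function neither copies nor stores it.
int apply_n(const std::function<void(int)>& cb, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) cb(i);
    return n;
}

int sum_below(int n) {
    int total = 0;  // captured by reference by the closure below
    apply_n([&](int i) { total += i; }, n);
    return total;   // the closure is not used past this point
}
```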

Benchmark

Since the purpose of std::reference_closure is performance, a benchmark is appropriate. The benchmark measures the penalties of using lambdas as a control abstraction, and early results for that benchmark influenced the decision to adopt std::reference_closure.

The basis of the benchmark is that:

Users form lambdas to describe tasks, e.g., [&]() { do_some_work(); }.
The lambdas are passed into a parallel scheduling library as tasks.
The parallel scheduling library executes the tasks, often predominantly in a serial context. (That is, the exploited parallelism may be much lower than the possible parallelism.)

The benchmark itself consists of many repetitions of the following.

Logical Action	Representation Operation
Build the closure object.	n/a
Pass the closure to the task scheduler as a "callback".	construct
Copy the callback to the execution engine.	copy
Invoke the callback to the original closure object.	indirect call
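The four steps in the table can be sketched with std::function as the callback representation. This is not the benchmark source; `schedule`, `execute`, and `run_benchmark` are hypothetical names chosen for illustration, and no timing is performed.

```cpp
#include <cassert>
#include <functional>

using callback = std::function<void()>;

// Hypothetical execution engine: owns its copy and invokes it.
void execute(callback cb) { cb(); }                 // indirect call

// Hypothetical task scheduler: passing cb by value to execute() copies it.
void schedule(const callback& cb) { execute(cb); }  // copy

long run_benchmark(long repetitions) {
    assert(repetitions >= 0);
    long counter = 0;
    for (long i = 0; i < repetitions; ++i) {
        auto closure = [&]() { ++counter; };  // build the closure object
        callback cb(closure);                 // construct the callback
        schedule(cb);                         // copy, then indirect call
    }
    return counter;
}
```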

The benchmark environment consists of:

Mac OS 10.5.6
2.66GHz MacBook Pro
Apple GCC 4.0.1
Apple libstdc++ 4.0
TR1 implementation of std::function

The initial results are similar to those obtained at the adoption of std::reference_closure. Those results show std::function with 23.5 times the overhead of std::reference_closure.

`std::function` Optimization

The methodology of the optimization work is:

Ensure that the benchmark is testing what it is meant to test.
Optimize the hot path in std::function without a loss of generality.

Small Function Object

The implementation of std::function has a "small function object" optimization. This optimization eliminates a malloc and free pair on each copy. Unfortunately, this optimization was not enabled. Specializing the trait corrected the problem. We anticipate that the problem will not arise in C++0x implementations.

Direct Copy Call

The implementation of std::function's copy constructor uses an indirect call. This call is needed for the general case. When the "small function object" has a trivial copy constructor, the implementation can simply copy the bits and avoid that call.
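The idea can be illustrated with a much-simplified wrapper. This is a sketch under stated assumptions, not the optimized std::tr1::function: `small_fn` is a hypothetical class that handles only void() callables small enough for its buffer, and `bump`, `traced`, and `demo_copies` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

class small_fn {
    static constexpr std::size_t kSize = 4 * sizeof(void*);
    alignas(std::max_align_t) unsigned char buf_[kSize];
    void (*invoke_)(void*) = nullptr;
    void (*copy_)(void*, const void*) = nullptr;  // null: copying the bits suffices
    void (*destroy_)(void*) = nullptr;            // null: nothing to destroy

public:
    template <class F>
    small_fn(F f) {
        static_assert(sizeof(F) <= kSize, "sketch handles small objects only");
        static_assert(alignof(F) <= alignof(std::max_align_t), "over-aligned type");
        ::new (static_cast<void*>(buf_)) F(std::move(f));
        invoke_ = [](void* p) { (*static_cast<F*>(p))(); };
        if (!std::is_trivially_copy_constructible<F>::value)
            copy_ = [](void* d, const void* s) { ::new (d) F(*static_cast<const F*>(s)); };
        if (!std::is_trivially_destructible<F>::value)
            destroy_ = [](void* p) { static_cast<F*>(p)->~F(); };
    }

    small_fn(const small_fn& o)
        : invoke_(o.invoke_), copy_(o.copy_), destroy_(o.destroy_) {
        if (copy_) copy_(buf_, o.buf_);         // general case: indirect call
        else std::memcpy(buf_, o.buf_, kSize);  // fast path: copy the bits
    }

    small_fn& operator=(const small_fn&) = delete;
    ~small_fn() { if (destroy_) destroy_(buf_); }

    void operator()() { invoke_(buf_); }
    bool copies_bitwise() const { return copy_ == nullptr; }
};

// Trivially copyable function object: eligible for the fast path.
struct bump {
    int* target;
    void operator()() const { ++*target; }
};

// A user-provided copy constructor makes this type non-trivial to copy.
struct traced {
    int* target;
    explicit traced(int* t) : target(t) {}
    traced(const traced& o) : target(o.target) {}
    void operator()() const { ++*target; }
};

int demo_copies() {
    int hits = 0;
    bump simple{&hits};
    small_fn a(simple);
    small_fn b(a);
    assert(b.copies_bitwise());   // copied without an indirect call
    traced general(&hits);
    small_fn c(general);
    small_fn d(c);
    assert(!d.copies_bitwise());  // copied through the function pointer
    a(); b(); c(); d();
    return hits;
}
```

The choice between the two paths is made once, at construction, so the copy constructor tests only a stored pointer.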

Move Semantics

The benchmark copies the callback. In C++0x, we would move from it, which eliminates a single branch in the copy (move) operation.
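A minimal sketch of handing the callback to the execution engine by move rather than by copy. The names `execute`, `schedule_by_move`, and `demo_move` are hypothetical, and the sketch makes no claim about how many branches a given implementation saves.

```cpp
#include <cassert>
#include <functional>
#include <utility>

using callback = std::function<int()>;

// Hypothetical execution engine that takes ownership of the callback.
int execute(callback cb) {
    assert(cb);  // a target must be present
    return cb();
}

// The scheduler no longer needs cb after the hand-off, so it moves it.
int schedule_by_move(callback cb) {
    return execute(std::move(cb));
}

int demo_move() {
    int value = 7;
    return schedule_by_move([&] { return value; });
}
```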

Use LLVM

The LLVM compiler generated somewhat better code than the GCC compiler.

Results

The results of the benchmark ranged from an overhead factor of 1.6 for the 32-bit architecture to 2.2 for the 64-bit architecture. (The difference is probably mostly because the 64-bit architecture passes small structs in registers.)

Sources

The benchmark is in Boost Subversion at http://svn.boost.org/svn/boost/sandbox/reference_closure.

The optimized std::tr1::function is on the committee's Wiki (functional/functional_iterate.h). This version can drop in to Apple GCC 4.0.1. An unencumbered version of the optimized std::tr1::function will be available in the Boost repository.

Future Optimization

The compiled implementation of std::reference_closure is generally fairly good. However, it has some unnecessary memory operations and could yield performance improvements with optimizer attention.

The implementation of std::function throws an exception if its function pointer is null. This implies testing that pointer for null, which is expensive. The implementation could use a pointer to a function that throws rather than a null pointer, thus saving the branch.
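The throwing-stub technique can be sketched on a deliberately tiny wrapper. `fn_ptr_wrapper`, `throw_bad_call`, `twice`, and `empty_call_throws` are hypothetical names, and the wrapper holds only plain int(int) function pointers; it is not how any particular std::function is implemented.

```cpp
#include <cassert>
#include <functional>  // std::bad_function_call

// Stub that stands in for "no target": calling it throws.
[[noreturn]] int throw_bad_call(int) { throw std::bad_function_call(); }

class fn_ptr_wrapper {
    int (*target_)(int);

public:
    // The empty state stores the stub instead of a null pointer.
    fn_ptr_wrapper() : target_(&throw_bad_call) {}
    explicit fn_ptr_wrapper(int (*f)(int)) : target_(f ? f : &throw_bad_call) {}

    bool empty() const { return target_ == &throw_bad_call; }

    // No null test on the call path: an empty wrapper reaches the stub.
    int operator()(int x) const { return target_(x); }
};

int twice(int x) { return 2 * x; }

bool empty_call_throws() {
    fn_ptr_wrapper f;
    assert(f.empty());
    try {
        f(1);
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```

The null check moves from every invocation to construction, where it runs once per wrapper.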

Conclusion

We conclude that std::function has, and will likely continue to have, roughly double the overhead of std::reference_closure. However, there are significant compiler implementation and user programmability costs associated with a second, logically equivalent, binary representation for closures. On balance, we recommend removing std::reference_closure.

Proposal

We propose to remove std::reference_closure from the standard.

5.1.1 Lambda expressions [expr.prim.lambda]

Remove paragraph 12.

If every name in the effective capture set is preceded by & and the lambda expression is not mutable, F is publicly derived from std::reference_closure<R(P)> (20.6.18), where R is the return type and P is the parameter-type-list of the lambda expression. Converting an object of type F to type std::reference_closure<R(P)> and invoking its function call operator shall have the same effect as invoking the function call operator of F. [Note: This requirement effectively means that such F’s must be implemented using a pair of a function pointer and a static scope pointer. —end note]

20.6.18 Class template `reference_closure` [func.referenceclosure]

Remove the entire section 20.6.18 (from N2800) Class template reference_closure [func.referenceclosure], including

20.6.18.1 Construct, copy, destroy [func.referenceclosure.cons]
20.6.18.2 Observer [func.referenceclosure.obs]
20.6.18.3 Invocation [func.referenceclosure.invoke]
20.6.18.4 Comparison [func.referenceclosure.compare]