Improving shared_ptr for C++0x

Overview
Allocator Support
Aliasing Support
Object Creation
Move Support
Atomic Access
Cycle Collection

I. Overview

While shared_ptr has already proven its utility time and again, we have observed frequent requests for enhancements in several legitimate and key areas:

Ability to control its internal allocations;
Aliasing support;
A factory function that eliminates the need for an explicit new expression;
Reduced reference counting overhead when using shared_ptr in "mostly unique" ownership situations such as containers;
A variant of shared_ptr that is atomic, that is, safe to be manipulated from multiple threads without synchronization;
A way to reclaim cyclic shared_ptr structures;
A variant of shared_ptr that is confined to a single thread and uses unsynchronized reference count updates.

This document proposes additions to the C++0x standard to address the first six. Some of the proposed additions are essentially a subset of those presented in N1851 by Kliatchko and Rocha. We refer the interested reader to that paper for a more extensive rationale.

This proposal makes use of variadic templates and rvalue references. These C++0x features markedly improve the usability and effectiveness of several of the proposed additions.

II. Allocator Support

The default behavior of shared_ptr is to allocate its control block using new. This precludes its use in contexts where uncontrolled dynamic allocations are not allowed, such as embedded environments or the implementation of ::operator new. The proposed addition allows the user to supply an allocator that shared_ptr will use.

Impact:

This feature affects the interface of shared_ptr, allowing its broader use, and is therefore strongly recommended to be added to the C++0x standard.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following constructor:

template<class Y, class D, class A> shared_ptr( Y * p, D d, A a );

and the following member function:

template<class Y, class D, class A> void reset( Y * p, D d, A a );

Change the section:

template<class Y, class D> shared_ptr( Y * p, D d );
Requires: p shall be convertible to T*. D shall be CopyConstructible. The copy constructor and destructor of D shall not throw exceptions. The expression d(p) shall be well-formed, shall have well defined behavior, and shall not throw exceptions.

Effects: Constructs a shared_ptr object that owns the pointer p and the deleter d.

in [util.smartptr.shared.const] to:

template<class Y, class D> shared_ptr( Y * p, D d );
template<class Y, class D, class A> shared_ptr( Y * p, D d, A a );
Requires: p shall be convertible to T*. D shall be CopyConstructible. The copy constructor and destructor of D shall not throw. The expression d(p) shall be well-formed, shall have well defined behavior, and shall not throw. A shall be an allocator [allocator.requirements]. The copy constructor and destructor of A shall not throw.

Effects: Constructs a shared_ptr object that owns the pointer p and the deleter d. The second constructor shall use a copy of a to allocate memory for internal use.

Add the following to [util.smartptr.shared.mod]:

template<class Y, class D, class A> void reset( Y * p, D d, A a );
Effects: Equivalent to shared_ptr( p, d, a ).swap( *this ).

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35. See:

http://boost.cvs.sourceforge.net/*checkout*/boost/boost/boost/shared_ptr.hpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/smart_ptr/test/shared_ptr_alloc2_test.cpp

for reference.

III. Aliasing Support

Advanced users often require the ability to create a shared_ptr instance p that shares ownership with another (master) shared_ptr q but points to an object that is not a base of *q. *p may be a member or an element of *q, for example. This section proposes an additional constructor that can be used for this purpose.

An interesting side effect of this increase of expressive power is that now the *_pointer_cast functions can be implemented in user code. The make_shared factory function presented later in this document can also be implemented using only the public interface of shared_ptr via the aliasing constructor.

Impact:

This feature affects the interface of shared_ptr in a way that increases its expressive power and is therefore strongly recommended to be added to the C++0x standard.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following constructor:

template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );

Add the following to [util.smartptr.shared.const]:

template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );
Effects: Constructs a shared_ptr instance that stores p and shares ownership with r.

Postconditions: get() == p && use_count() == r.use_count().

Throws: nothing.

[Note: To avoid the possibility of a dangling pointer, the user of this constructor must ensure that p remains valid at least until the ownership group of r is destroyed. --end note.]

[Note: This constructor allows creation of an empty shared_ptr instance with a non-NULL stored pointer. --end note.]

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35. See:

http://boost.cvs.sourceforge.net/*checkout*/boost/boost/boost/shared_ptr.hpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/smart_ptr/test/shared_ptr_alias_test.cpp

for reference.

IV. Object Creation

Consistent use of shared_ptr can eliminate the need to use an explicit delete, but it currently provides no support in avoiding explicit new. There have been repeated requests from users for a factory function that creates an object of a given type and returns a shared_ptr to it. Besides convenience and style, such a function is also exception safe and considerably faster because it can use a single allocation for the object and its corresponding control block, eliminating a significant portion of shared_ptr's construction overhead. This function eliminates one of the major efficiency complaints about shared_ptr.

This section proposes a family of overloaded function templates, make_shared<T> and allocate_shared<T>, to address this need. make_shared uses the global operator new to allocate memory, whereas allocate_shared uses an user-supplied allocator, allowing finer control consistent with section II of this document.

Impact:

This feature does not affect the interface of shared_ptr. It is possible to implement in a non-intrusive way using only the public interface, as long as aliasing support is present. Access to implementation details can eliminate between 5 and 8 bytes of storage overhead on a typical 32 bit platform.

The addition is a strong candidate for the C++0x standard, but can be relegated to a technical report.

Proposed text:

In the following, the ... notation denotes a variable number of arguments. Once variadic templates are voted into the WP, the intent is to replace this notation with the appropriate variadic template syntax, for consistency with N2192.

Synopsis:

namespace std {
  template<class T> shared_ptr<T> make_shared();
  template<class T, class A> shared_ptr<T> allocate_shared( A const & a );

  template<class T, class A1, ..., class An> shared_ptr<T> make_shared( A1 const & a1, ..., An const & an );
  template<class T, class A, class A1, ..., class An> shared_ptr<T> allocate_shared( A const & a, A1 const & a1, ..., An const & an );
}

Description:

template<class T> shared_ptr<T> make_shared();
template<class T, class A> shared_ptr<T> allocate_shared( A const & a );

template<class T, class A1, ..., class An> shared_ptr<T> make_shared( A1 && a1, ..., An && an );
template<class T, class A, class A1, ..., class An> shared_ptr<T> allocate_shared( A const & a, A1 && a1, ..., An && an );
Requires: The expression new( pv ) T() or new( pv ) T( std::forward<A1>(a1), ..., std::forward<An>(an) ), where pv is a void* pointing to storage suitable to hold an object of type T, shall be well-formed and shall have well defined behavior. A shall be an allocator [allocator.requirements]. The copy constructor and destructor of A shall not throw.

Effects: Allocates memory suitable for an object of type T and constructs an object in it via the placement new expression new( pv ) T() or new( pv ) T( std::forward<A1>(a1), ..., std::forward<An>(an) ). allocate_shared uses a copy of a to allocate memory.

Returns: A shared_ptr instance that stores and owns the address of the newly constructed object of type T.

Postconditions: get() != 0 && use_count() == 1.

Throws: bad_alloc, or an implementation-defined exception when a resource other than memory could not be obtained.

Exception safety: If an exception is thrown, has no effect.

[Note: These functions will typically allocate more memory than sizeof(T) to allow for internal bookkeeping structures such as the reference counts. --end note]

Implementability:

A proof of concept non-intrusive implementation is available at:

http://www.pdimov.com/cpp/make_shared.cpp

This implementation uses variadic templates and rvalue references. When these features are not available, it falls back on a family of overloaded function templates taking arguments by const reference.

V. Move Support

Users often express concerns over the cost of copying a shared_ptr in situations where the source of the copy is no longer needed. To address this use case, N1851 proposes a separate smart pointer, managed_ptr, that is convertible from and to shared_ptr and enforces unique ownership.

The current proposal does not take this approach. Instead, we propose that move constructors and move assignment operators be added to shared_ptr. This allows a shared_ptr to be as efficient as an auto_ptr or the proposed unique_ptr when the source of the copy or assignment is a temporary or no longer needed. Move-aware standard containers will automatically take advantage of this optimization. As an example of the consequences, reallocating a vector< shared_ptr<T> > will no longer entail any reference count updates.

Impact:

This feature affects the interface of shared_ptr in a way that reduces its copy overhead and is in line with the rvalue recommendations for the standard library presented in N1859-N1862. We believe that it is a strong candidate for addition to the C++0x standard.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following:

shared_ptr( shared_ptr && r );
template<class Y> shared_ptr( shared_ptr<Y> && r );

shared_ptr& operator=( shared_ptr && r );
template<class Y> shared_ptr& operator=( shared_ptr<Y> && r );

Add the following to [util.smartptr.shared.const]:

shared_ptr( shared_ptr && r );
template<class Y> shared_ptr( shared_ptr<Y> && r );
Requires: For the second constructor Y* shall be convertible to T*.

Effects: Move-constructs a shared_ptr instance from r.

Postconditions: *this contains the old value of r. r is empty.

Throws: nothing.

Add the following to [util.smartptr.shared.assign]:

shared_ptr& operator=( shared_ptr && r );
template<class Y> shared_ptr& operator=( shared_ptr<Y> && r );
Effects: Equivalent to shared_ptr( move( r ) ).swap( *this ).

Returns: *this.

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35.

VI. Atomic Access

The level of thread safety offered by shared_ptr is the default for the standard library: const operations on the same instance can be performed concurrently by multiple threads, but updates require exclusive access. Some scenarios require stronger guarantees; there is considerable demand (on the Boost mailing lists and otherwise) for an atomic shared_ptr, one that can withstand concurrent updates from two threads without synchronization.

Instead of providing a separate atomic smart pointer, we propose additions to shared_ptr's interface to address this use case. A typical example scenario — a "lock-free" reader/writer pattern — that takes advantage of this functionality is outlined below:

shared_ptr<State> ps;

void reader()
{
    shared_ptr<State> p = ps.copy();
    use *p;
}

// single writer case
void writer()
{
    shared_ptr<State> p( new State( *ps ) );
    update *p reflecting new information;
    ps.replace( move( p ) );
}

// or, multiple writers case
void writer()
{
    shared_ptr<State> p = ps.copy();
    do
    {
        shared_ptr<State> q( new State( *p ) );
        update *q reflecting new information;
    }
    while( !ps.compare_swap( p, move( q ) ) );
}

Impact:

This feature affects the interface of shared_ptr. We believe that it is a strong candidate for addition to the C++0x standard.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following:

// [util.smartptr.shared.atomic], atomic access:

shared_ptr copy() const;
void replace( shared_ptr r );
bool compare_swap( shared_ptr & v, shared_ptr w );

Add a new section [util.smartptr.shared.atomic]:

A program is allowed to access the same shared_ptr instance concurrently from multiple threads if the access is done exclusively via the member functions in this section.
shared_ptr copy() const;
Returns: *this.

Throws: nothing.
void replace( shared_ptr r );
Effects: swap( r ).

Throws: nothing.
bool compare_swap( shared_ptr & v, shared_ptr w );
Effects: If *this is equivalent to v, assigns w to *this, otherwise assigns *this to v.

Returns: true if *this was equivalent to v, false otherwise.

Throws: nothing.

Remarks: two shared_ptr instances are equivalent if they store the same pointer value and share ownership.

Implementability:

A typical implementation would use a spinlock pool as described in N2145. In pseudocode, the functions will be implemented as follows:

shared_ptr copy() const
{
    lock the spinlock for *this;
    shared_ptr r( *this );
    unlock the spinlock for *this;
    return r;
}

void replace( shared_ptr r )
{
    lock the spinlock for *this;
    swap( r );
    unlock the spinlock for *this;
}

bool compare_swap( shared_ptr & v, shared_ptr w )
{
    lock the spinlock for *this;

    if( *this is equivalent to v )
    {
        swap( w );
        unlock the spinlock for *this;
        return true;
    }
    else
    {
        shared_ptr tmp( *this );
        unlock the spinlock for *this;
        tmp.swap( v );
        return false;
    }
}

Note that the code carefully avoids destroying a non-empty shared_ptr instance while holding the spinlock, as this may lead to user code being called and present an opportunity for a deadlock. It also attempts to limit the spinlock scope to the minimum necessary to prevent contention.

VII. Cycle Collection

The inability of shared_ptr to handle cyclic structures has long been cited as one of its important downsides. The proposed addition offers a way for users to opt to reclaim cyclic structures via implementing a support interface for their nodes. It imposes no overhead on existing non-cyclic uses that do not implement the interface.

Impact:

This feature does not affect the interface of shared_ptr, but requires support from the implementation in a similar way to enable_shared_from_this. We believe that it is therefore a candidate for addition to the C++0x standard.

Proposed text:

Add to the synopsis of <memory> the following:

class enable_sp_collect;
void sp_collect();

Add the following two sections after [util.smartptr.enab]:

Class enable_sp_collect

A class that derives from enable_sp_collect via public and unambiguous inheritance is called collectible. The implementation tracks all collectible classes owned by shared_ptr instances and can reclaim cycles of unreachable collectible objects.
namespace std {

  class enable_sp_collect
  {
  protected:
    ~enable_sp_collect() {}
  public:
    virtual void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v ) = 0;
    virtual void sp_reset() = 0;
  };

} // namespace std
virtual void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v ) = 0;
Effects: Appends all shared_ptr instances to collectible objects that *this owns to v.
virtual void sp_reset() = 0;
Effects: Resets or destroys all shared_ptr instances to collectible objects that *this owns.

Throws: nothing.

[Note: This function may be called while the collector is holding a lock. Performing anything else besides resetting the shared_ptr instances is strongly discouraged as it may lead to deadlocks. --end note]

[Example:
class Node: public enable_sp_collect
{
private:

  shared_ptr<Node> link;
  vector< shared_ptr<Node> > links;

public:

  void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v )
  {
    v.push_back( link );
    v.insert( v.end(), links.begin(), links.end() );
  }

  void sp_reset()
  {
    link.reset();
    links.clear();
  }
};
--end example]

sp_collect
namespace std {

  void sp_collect();

}
Effects: For every collectible object owned by a shared_ptr, calls sp_enumerate. Performs reachability analysis on the returned data and identifies the unreachable collectible objects. Retains temporary shared_ptr references to all unreachable objects. For every unreachable object, calls sp_reset. Drops the temporary references, causing the destruction of the unreachable objects.

[Note: This function is allowed to be called in a separate thread. --end note]

[Note: The reachability analysis may be performed using Christopher's algorithm. Using the instance graph, compute the use counts the objects would have if there were no external shared_ptr instances. Compare the computed use counts with the actual use counts. For every mismatch, mark the object as reachable from the outside. Propagate reachability throughout the graph. --end note]

Implementability:

A very preliminary proof of concept implementation is available at:

http://www.pdimov.com/cpp/enable_sp_collect.cpp

It is expected to be integrated with boost::shared_ptr in time for the mailing.

Thanks to Joe Gottman for his comments on the move support.

--end