Document number: N2297=07-0157
Programming Language C++, Library Subgroup
 
Peter Dimov, <pdimov@pdimov.com>
Beman Dawes, <bdawes@acm.org>
 
2007-06-21

Improving shared_ptr for C++0x, Revision 1

Changes in Revision 1

I. Overview

While shared_ptr has already proven its utility time and again, we have observed frequent requests for enhancements in several legitimate and key areas:

This document proposes additions to the C++0x standard to address the first six. Some of the proposed additions are essentially a subset of those presented in N1851 by Kliatchko and Rocha. We refer the interested reader to that paper for a more extensive rationale.

This proposal makes use of variadic templates and rvalue references. These C++0x features markedly improve the usability and effectiveness of several of the proposed additions.

II. Allocator Support

The default behavior of shared_ptr is to allocate its control block using new. This precludes its use in contexts where uncontrolled dynamic allocations are not allowed. The proposed addition allows the user to supply an allocator that shared_ptr will use.

Boost users have repeatedly been asking for the ability to control the internal control block allocation; one particularly amusing occurence was the desire to use a shared_ptr in the implementation of ::operator new. Typical environments where shared_ptr cannot be used as-is include embedded systems and computer and video games. Paul Pedriana (one of the primary authors of the Electronic Arts STL implementation) writes in N2271 that

EASTL's shared_ptr/weak_ptr allow the user to specify an allocator instead of implicitly using global new

and explains that

As described in the game software issues section, global new usage is often verboten in game software development, at least for console platforms. Thus any library facility which uses global operator new or any memory allocation that cannot be controlled by the user is unacceptable.

Asked to comment on the proposed addition, Paul kindly offered the following quote:

Certainly we (I can speak for all of Electronic Arts here) are in support of your proposal and you can document it as such. You could probably get other game developers and embedded developers to agree as well.

Vladimir Kliatchko and Ilougino Rocha offer additional arguments in favor of this functionality in N1851, complete with proposed wording. The suggested changes in this paper are a subset of those.

The key part of this proposal is that the allocator is not made part of shared_ptr's type. It is instead supplied as a parameter to the shared_ptr constructor in the same manner as the deleter is. This is made possible by the already existing deleter support infrastructure. Coupled with the fact that shared_ptr only allocates memory once in its constructor, and only deallocates it when the last instance in an ownership group is destroyed, this allows us to avoid all common sources of allocator problems. The allocator used for the creation of a particular shared_ptr instance remains purely an implementation detail for the user of that shared_ptr.

Why use the std::allocator interface if it's widely perceived as "broken"? The simple answer is that it works for our (and our users') purposes and is already in the Standard.

[Acknowledgments: this allocator interface has been independently suggested by Doug Gregor, Joe Gottman, Greg Colvin and others on the Boost list, as well as in N1851.]

Impact:

This feature extends the interface of shared_ptr in a backward-compatible way, allowing its broader use, and is therefore strongly recommended to be added to the C++0x standard. It does not impact existing uses, nor does it introduce binary compatibility issues.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following constructor:

template<class Y, class D, class A> shared_ptr( Y * p, D d, A a );

and the following member function:

template<class Y, class D, class A> void reset( Y * p, D d, A a );

Change the section:

template<class Y, class D> shared_ptr( Y * p, D d );

Requires: p shall be convertible to T*. D shall be CopyConstructible. The copy constructor and destructor of D shall not throw exceptions. The expression d(p) shall be well-formed, shall have well defined behavior, and shall not throw exceptions.

Effects: Constructs a shared_ptr object that owns the pointer p and the deleter d.

in [util.smartptr.shared.const] to:

template<class Y, class D> shared_ptr( Y * p, D d );
template<class Y, class D, class A> shared_ptr( Y * p, D d, A a );

Requires: p shall be convertible to T*. D shall be CopyConstructible. The copy constructor and destructor of D shall not throw. The expression d(p) shall be well-formed, shall have well defined behavior, and shall not throw. A shall be an allocator [allocator.requirements]. The copy constructor and destructor of A shall not throw.

Effects: Constructs a shared_ptr object that owns the pointer p and the deleter d. The second constructor shall use a copy of a to allocate memory for internal use.

Add the following to [util.smartptr.shared.mod]:

template<class Y, class D, class A> void reset( Y * p, D d, A a );

Effects: Equivalent to shared_ptr( p, d, a ).swap( *this ).

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35. See:

http://boost.cvs.sourceforge.net/*checkout*/boost/boost/boost/shared_ptr.hpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/smart_ptr/test/shared_ptr_alloc2_test.cpp

for reference.

III. Aliasing Support

Advanced users often require the ability to create a shared_ptr instance p that shares ownership with another (master) shared_ptr q but points to an object that is not a base of *q. *p may be a member or an element of *q, for example. This section proposes an additional constructor that can be used for this purpose.

An interesting side effect of this increase of expressive power is that now the *_pointer_cast functions can be implemented in user code. The make_shared factory function presented later in this document can also be implemented using only the public interface of shared_ptr via the aliasing constructor.

Impact:

This feature extends the interface of shared_ptr in a backward-compatible way that increases its expressive power and is therefore strongly recommended to be added to the C++0x standard. It introduces no source- and binary compatibility issues.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following constructor:

template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );

Add the following to [util.smartptr.shared.const]:

template<class Y> shared_ptr( shared_ptr<Y> const & r, T * p );

Effects: Constructs a shared_ptr instance that stores p and shares ownership with r.

Postconditions: get() == p && use_count() == r.use_count().

Throws: nothing.

[Note: To avoid the possibility of a dangling pointer, the user of this constructor must ensure that p remains valid at least until the ownership group of r is destroyed. --end note.]

[Note: This constructor allows creation of an empty shared_ptr instance with a non-NULL stored pointer. --end note.]

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35. See:

http://boost.cvs.sourceforge.net/*checkout*/boost/boost/boost/shared_ptr.hpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/smart_ptr/test/shared_ptr_alias_test.cpp

for reference.

IV. Object Creation

Consistent use of shared_ptr can eliminate the need to use an explicit delete, but it currently provides no support in avoiding explicit new. There have been repeated requests from users for a factory function that creates an object of a given type and returns a shared_ptr to it. Besides convenience and style, such a function is also exception safe and considerably faster because it can use a single allocation for the object and its corresponding control block, eliminating a significant portion of shared_ptr's construction overhead. This function eliminates one of the major efficiency complaints about shared_ptr.

This section proposes a family of overloaded function templates, make_shared<T> and allocate_shared<T>, to address this need. make_shared uses the global operator new to allocate memory, whereas allocate_shared uses an user-supplied allocator, allowing finer control consistent with section II of this document.

The rationale for choosing the name make_shared is that the expression make_shared<Widget>() can be read aloud and conveys the intended meaning. A free function also enables a non-intrusive implementation that can be delivered without modifications to an existing shared_ptr implementation.

Impact:

This feature does not affect the interface of shared_ptr. It is possible to implement in a non-intrusive way using only the public interface, as long as aliasing support is present. Access to implementation details can eliminate between 5 and 8 bytes of storage overhead on a typical 32 bit platform.

The addition is a strong candidate for the C++0x standard, but can be relegated to a technical report.

Proposed text:

Synopsis:

namespace std {
  template<class T> shared_ptr<T> make_shared();
  template<class T, class A> shared_ptr<T> allocate_shared( A const & a );

  template<class T, class A1, ..., class An> shared_ptr<T> make_shared( A1 const & a1, ..., An const & an );
  template<class T, class A, class A1, ..., class An> shared_ptr<T> allocate_shared( A const & a, A1 const & a1, ..., An const & an );
}

Description:

template<class T> shared_ptr<T> make_shared();
template<class T, class A> shared_ptr<T> allocate_shared( A const & a );

template<class T, class... Args> shared_ptr<T> make_shared( Args && ... args );
template<class T, class... Args> shared_ptr<T> allocate_shared( A const & a, Args && ... args );

Requires: The expression new( pv ) T() or new( pv ) T( std::forward<Args>(args)... ), where pv is a void* pointing to storage suitable to hold an object of type T, shall be well-formed and shall have well defined behavior. A shall be an allocator [allocator.requirements]. The copy constructor and destructor of A shall not throw.

Effects: Allocates memory suitable for an object of type T and constructs an object in it via the placement new expression new( pv ) T() or new( pv ) T( std::forward<Args>(args)... ). allocate_shared uses a copy of a to allocate memory.

Returns: A shared_ptr instance that stores and owns the address of the newly constructed object of type T.

Postconditions: get() != 0 && use_count() == 1.

Throws: bad_alloc, or an exception thrown from A::allocate or the constructor of T.

Exception safety: If an exception is thrown, has no effect.

Remarks: Implementations are encouraged, but not required, to perform no more than one memory allocation. [Note: This provides efficiency equivalent to an intrusive smart pointer. --end note]

[Note: These functions will typically allocate more memory than sizeof(T) to allow for internal bookkeeping structures such as the reference counts. --end note]

Implementability:

A proof of concept non-intrusive implementation is available at:

http://www.pdimov.com/cpp/make_shared.cpp

This implementation uses variadic templates and rvalue references. When these features are not available, it falls back on a family of overloaded function templates taking arguments by const reference.

V. Move Support

Users often express concerns over the cost of copying a shared_ptr in situations where the source of the copy is no longer needed. To address this use case, N1851 proposes a separate smart pointer, managed_ptr, that is convertible from and to shared_ptr and enforces unique ownership.

The current proposal does not take this approach. Instead, we propose that move constructors and move assignment operators be added to shared_ptr. This allows a shared_ptr to be as efficient as an auto_ptr or the proposed unique_ptr when the source of the copy or assignment is a temporary or no longer needed. Move-aware standard containers will automatically take advantage of this optimization. As an example of the consequences, reallocating a vector< shared_ptr<T> > will no longer entail any reference count updates.

Impact:

This feature affects the interface of shared_ptr in a way that reduces its copy overhead and is in line with the rvalue recommendations for the standard library presented in N1859-N1862. We believe that it is a strong candidate for addition to the C++0x standard.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following:

shared_ptr( shared_ptr && r );
template<class Y> shared_ptr( shared_ptr<Y> && r );

shared_ptr& operator=( shared_ptr && r );
template<class Y> shared_ptr& operator=( shared_ptr<Y> && r );

Add the following to [util.smartptr.shared.const]:

shared_ptr( shared_ptr && r );
template<class Y> shared_ptr( shared_ptr<Y> && r );

Requires: For the second constructor Y* shall be convertible to T*.

Effects: Move-constructs a shared_ptr instance from r.

Postconditions: *this contains the old value of r. r is empty.

Throws: nothing.

Add the following to [util.smartptr.shared.assign]:

shared_ptr& operator=( shared_ptr && r );
template<class Y> shared_ptr& operator=( shared_ptr<Y> && r );

Effects: Equivalent to shared_ptr( move( r ) ).swap( *this ).

Returns: *this.

Implementability:

This feature has been added to boost::shared_ptr and will be part of Boost 1.35.

VI. Atomic Access

The level of thread safety offered by shared_ptr is the default for the standard library: const operations on the same instance can be performed concurrently by multiple threads, but updates require exclusive access. Some scenarios require stronger guarantees; there is considerable demand (on the Boost mailing lists and otherwise) for an atomic shared_ptr, one that can withstand concurrent updates from two threads without synchronization.

Instead of providing a separate atomic smart pointer, we propose additions to shared_ptr's interface to address this use case. A typical example scenario — a "lock-free" reader/writer pattern — that takes advantage of this functionality is outlined below:

shared_ptr<State> ps;

void reader()
{
    shared_ptr<State> p = ps.copy();
    use *p;
}

// single writer case
void writer()
{
    shared_ptr<State> p( new State( *ps ) );
    update *p reflecting new information;
    ps.replace( move( p ) );
}

// or, multiple writers case
void writer()
{
    shared_ptr<State> p = ps.copy();
    do
    {
        shared_ptr<State> q( new State( *p ) );
        update *q reflecting new information;
    }
    while( !ps.compare_swap( p, move( q ) ) );
}

Impact:

This feature extends the interface of shared_ptr in a backward-compatible way. We believe that it is a strong candidate for addition to the C++0x standard. It introduces no source- or binary compatibility issues.

Proposed text:

Add to shared_ptr [util.smartptr.shared] the following:

// [util.smartptr.shared.atomic], atomic access:

shared_ptr copy() const;
void replace( shared_ptr r );
bool compare_swap( shared_ptr & v, shared_ptr w );

Add a new section [util.smartptr.shared.atomic]:

A program is allowed to access the same shared_ptr instance concurrently from multiple threads if the access is done exclusively via the member functions in this section.

shared_ptr copy() const;

Returns: *this.

Throws: nothing.

void replace( shared_ptr r );

Effects: swap( r ).

Throws: nothing.

bool compare_swap( shared_ptr & v, shared_ptr w );

Effects: If *this is equivalent to v, assigns w to *this, otherwise assigns *this to v.

Returns: true if *this was equivalent to v, false otherwise.

Throws: nothing.

Remarks: two shared_ptr instances are equivalent if they store the same pointer value and share ownership.

Implementability:

A typical implementation would use a spinlock pool as described in N2145. In pseudocode, the functions will be implemented as follows:

shared_ptr copy() const
{
    lock the spinlock for *this;
    shared_ptr r( *this );
    unlock the spinlock for *this;
    return r;
}

void replace( shared_ptr r )
{
    lock the spinlock for *this;
    swap( r );
    unlock the spinlock for *this;
}

bool compare_swap( shared_ptr & v, shared_ptr w )
{
    lock the spinlock for *this;

    if( *this is equivalent to v )
    {
        swap( w );
        unlock the spinlock for *this;
        return true;
    }
    else
    {
        shared_ptr tmp( *this );
        unlock the spinlock for *this;
        tmp.swap( v );
        return false;
    }
}

Note that the code carefully avoids destroying a non-empty shared_ptr instance while holding the spinlock, as this may lead to user code being called and present an opportunity for a deadlock. It also attempts to limit the spinlock scope to the minimum necessary to prevent contention.

VII. Cycle Collection

The inability of shared_ptr to handle cyclic structures has long been cited as one of its important downsides. The proposed addition offers a way for users to opt to reclaim cyclic structures via implementing a support interface for their nodes. It imposes no overhead on existing non-cyclic uses that do not implement the interface.

Impact:

This feature does not affect the interface of shared_ptr, but requires support from the implementation in a similar way to enable_shared_from_this. We believe that it is therefore a candidate for addition to the C++0x standard.

Proposed text:

Add to the synopsis of <memory> the following:

class enable_sp_collect;
void sp_collect();

Add the following two sections after [util.smartptr.enab]:

Class enable_sp_collect

A class that derives from enable_sp_collect via public and unambiguous inheritance is called collectible. The implementation tracks all collectible classes owned by shared_ptr instances and can reclaim cycles of unreachable collectible objects.

namespace std {

  class enable_sp_collect
  {
  protected:
    ~enable_sp_collect() {}
  public:
    virtual void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v ) = 0;
    virtual void sp_reset() = 0;
  };

} // namespace std
virtual void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v ) = 0;

Effects: Appends all shared_ptr instances to collectible objects that *this owns to v.

virtual void sp_reset() = 0;

Effects: Resets or destroys all shared_ptr instances to collectible objects that *this owns.

Throws: nothing.

[Note: This function may be called while the collector is holding a lock. Performing anything else besides resetting the shared_ptr instances is strongly discouraged as it may lead to deadlocks. --end note]

[Example:

class Node: public enable_sp_collect
{
private:

  shared_ptr<Node> link;
  vector< shared_ptr<Node> > links;

public:

  void sp_enumerate( vector< shared_ptr<enable_sp_collect> > & v )
  {
    v.push_back( link );
    v.insert( v.end(), links.begin(), links.end() );
  }

  void sp_reset()
  {
    link.reset();
    links.clear();
  }
};

--end example]

sp_collect

namespace std {

  void sp_collect();

}

Effects: For every collectible object owned by a shared_ptr, calls sp_enumerate. Performs reachability analysis on the returned data and identifies the unreachable collectible objects. Retains temporary shared_ptr references to all unreachable objects. For every unreachable object, calls sp_reset. Drops the temporary references, causing the destruction of the unreachable objects.

[Note: This function is allowed to be called in a separate thread. --end note]

[Note: The reachability analysis may be performed using Christopher's algorithm. Using the instance graph, compute the use counts the objects would have if there were no external shared_ptr instances. Compare the computed use counts with the actual use counts. For every mismatch, mark the object as reachable from the outside. Propagate reachability throughout the graph. --end note]

Implementability:

A very preliminary proof of concept implementation is available at:

http://www.pdimov.com/cpp/enable_sp_collect.cpp

It is expected to be integrated with boost::shared_ptr in time for the mailing.


Thanks to Joe Gottman for his comments on the move support.

Thanks to Jens Maurer for his comments on the make_shared wording.

--end