With the introduction of non-trivial copy constructors and assignment operators, C++ long ago lost the potential efficiency gains associated with expanding dynamic arrays in-place as provided by realloc in C. When dealing with dynamic arrays of objects in C++ (e.g. vector<T>), there exists no standard facility today by which the vector can expand its capacity without allocating a larger block of memory and copying each element over (using the element's copy constructor).
Move semantics (N1377) provides a tremendous performance boost when the need arises to reallocate the contiguous block of memory. This paper proposes to complement move semantics by giving the client (i.e. vector or string) a way to possibly delay the reallocation by expanding the block of memory in-place (and thus gain back what we lost when we had to give up realloc).
There are several pieces of functionality that should be in place in order to make in-place memory expansion for templates such as vector and string practical:
Strictly speaking, the first point isn't absolutely required. However I believe that without it, writing allocators capable of supplying expand-in-place functionality will be overly difficult for the average programmer. With that in mind, N1085 has been submitted to WG14 for review. WG14 subsequently passed it to WG21 for comment. This paper augments N1085 in the context of C++ applicability. N1085 proposes four new C functions which augment the malloc interface. This paper builds on that proposal, showing how an expanded allocator interface could make use of the C interface proposed in N1085, and how in turn vector could make use of such an allocator. This expanded allocator interface, along with the infrastructure (versioning) to detect that interface is herein respectfully proposed for standardization in C++0X.
This paper will propose five new member functions for the allocator requirements. These five new member functions are optional, and even the default std::allocator need not implement them. However, so that containers can query (at compile time) the allocator to discover if this expanded interface is available, it is convenient to version the allocator's API. Note: this paper does not address ABI versioning issues which is a separate topic.
The current (C++03) allocator interface will be referred to as version 1. The proposed addition to the allocator will be referred to as version 2. And there will exist a versioning infrastructure such that a container will be able to retrieve an allocator's version number with the following simple statement:
static const unsigned ver = std::version<Allocator>::value;
Where Allocator is the container's allocator type. This versioning system will even work with today's allocators (without modifying them) and return the value 1. There is also nothing which is allocator-specific about this versioning system and it could be applied in a number of other cases.
Allocators which wish to declare themselves as supporting the version 2 interface must include the following nested typedef which must be publicly accessible:
template <class T> class my_allocator
{
public:
    typedef std::version_type<my_allocator, 2> version;
    ...
};
This versioning system has been designed to reduce the risk of accidental conformance to a vanishingly low probability. The version_type<T, V> template has the following interface:
template <class T, unsigned V>
struct version_type
    : public integral_constant<unsigned, V>
{
    typedef T type;
    version_type(const version_type<T, 0>&);
};
Note the non-explicit constructor taking a version_type<T, 0>. The purpose of this constructor is to provide extra security against accidental conformance. The allocator must not only have a nested type named version, that type must also have the property that std::version_type<T,0> is implicitly convertible to it, else std::version will not recognize it as a version number. All std::version_type<T,N> types have this property because of this non-explicit constructor.
Note also that the client type is embedded in the version_type. The purpose of this is to allow base classes to have versions without propagating that version information to derived types. Thus derived types must explicitly be changed to reflect version changes. This insures that derived types won't accidently inherit an updated version number.
Additionally, if desired, clients can explicitly specialize the std::version traits class in order to specify their version number. This is simply an alternate syntax for versioning and has no different functionality.
Examples:
struct A
{
    typedef std::version_type<A, 2> version;
};
std::version<A>::value is 2.
struct B
{
    struct version {static unsigned const value = 3;};
};
std::version<B>::value is 1. B::version does not influence std::version<B>::value.
struct C
{
    static int const version = 4;
};
std::version<C>::value is 1. C::version does not influence std::version<C>::value.
struct D
{
    int version() const {return 5;}
};
std::version<D>::value is 1. D::version() does not influence std::version<D>::value.
struct E
{
};
std::version<E>::value is 1. Types with no nested version type are automatically assigned a version of 1.
struct F
{
};
namespace std
{
template <>
struct version< ::F >
{
    static const unsigned value = 6;
};
} // std
std::version<F>::value is 6. This example demonstrates the technique of registering a version number by specialization of std::version.
struct G
    : public A
{
};
std::version<G>::value is 1, despite the fact that std::version<A>::value is 2. Version numbers are not inheritable.
struct H
    : public G
{
    typedef std::version_type<H, 3> version;
};
std::version<H>::value is 3, its base class G has a version of 1, and the base class of G, A, has a version of 2.
Additionally, version should be insensitive to cv-qualifications. That is, an invariant is that version<A>::value == version<const A>::value. This reduces the chance of obtaining the wrong version number. Also in generic code it can be the case that a reference to some object A can be used instead of an A itself. Therefore version should operate on the referenced object in that case:
version<A>::value == version<const A&>::value
The "version 2" allocator interface is a strict superset of the current (version 1) allocator interface. This allows version 2 allocators to be used with containers (and any other objects) which only expect the version 1 interface. This fact also allows the presentation of the version 2 interface without repeating the current interface. In addition to the interface itself, also shown below is an example implementation based on N1085.
template <class T>
class allocator
{
public:
    ...
    // as today
    ...
    // plus:
    typedef version_type<allocator, 2> version;
    size_type size(pointer p) const throw() {return sizeof_alloc(p) / sizeof(T);}
    pointer allocate(size_type size_requested, size_type& size_received)
    {
        size_requested *= sizeof(T);
        pointer result = (pointer)request_malloc(size_requested, &size_received);
        if (result == 0)
            throw bad_alloc();
        size_received /= sizeof(T);
        return result;      
    }
    pointer request(size_type size_requested, size_type& size_received) throw()
    {
        size_requested *= sizeof(T);
        pointer result = (pointer)request_malloc(size_requested, &size_received);
        size_received /= sizeof(T);
        return result;
    }
    bool resize(pointer p, size_type size_requested, size_type& size_received) throw()
    {
        size_requested *= sizeof(T);
        int result = resize_alloc(p, size_requested, &size_received);
        size_received /= sizeof(T);
        return result != 0;
    }
    bool expand(pointer p, size_type min_size, size_type preferred_size, size_type& size_received) throw()
    {
        min_size       *= sizeof(T);
        preferred_size *= sizeof(T);
        int result = negotiate_alloc(p, min_size, preferred_size, &size_received);
        size_received /= sizeof(T);
        return result != 0;
    }
};
As one can immediately see, if the interface proposed in N1085 exists, it is quite easy to map that C interface to this proposed C++ allocator interface. One merely has to adjust the results for sizeof(T).
The size function allows the client to ask how many elements a previously allocated pointer actually points to. The amount returned may be greater than the number which was requested via a previous allocate. Clients such as string or vector can use this information to add to their capacity at no cost.
The allocate function is an overload of the existing allocate member function. This overload passes a reference to a size_type which will hold the number of elements actually allocated after a successful allocation. This saves the client from needing to make a separate call to size directly after an allocation.
The request member function is similar to allocate except that its failure mode is different. On failure instead of throwing an exception, it returns null. Also on failure the size_received parameter will hold a suggested size which may succeed in a future call to request (no guarantee is made). This member function can be handy if the client has some flexibility in how much needs to be allocated (e.g. I would like up to 100 elements, but I need at least 50).
The resize member function will attempt to change the size of a memory block in-place. It can attempt to either grow or shrink an allocation. If the change in size was successful, true is returned. Also in this case, size_received will contain the new size (in terms of number of elements). If the call was not successful (false was returned), then size_received will contain a suggested size which may succeed in a future call. Clients may use this function to implement "shrink to fit" functionality, or to attempt a capacity growth while avoiding a reallocation.
The expand member function is a more specialized version of resize which concentrates on growing the capacity. The client can specify a preferred size and a minimum acceptable size. The allocator should attempt to obtain the maximum amount of memory within that range that it can. Clients such as vector can use this member during push_back when size() == capacity(). For example the client might ask to expand the current memory block by at least one element, and up to a maximum of double the current capacity. On failure, false will be returned and the client will have to resort to allocating (or requesting) a larger block of memory.
In this section there is an example (and partial) implementation of std::vector which demonstrates how a few vector member functions would be implemented in order to be both compatible with version 1 allocators, and take advantage of the new functionality in version 2 allocators in order to boost performance.
The basic technique is to rely on private helper functions which compile-time dispatch based on the version number of the allocator. For example, the vector constructor that takes a size_type parameter indicating the number of elements to default construct might in turn call a private allocate helper function, which is split into two implementations: one if the allocator is version 1, and another if the allocator is version 2:
template <class T, class A>
class vector
{
private:
    typedef integral_constant<unsigned,                 1> allocator_v1;
    typedef integral_constant<unsigned,                 2> allocator_v2;
    typedef integral_constant<unsigned, version<A>::value> alloc_version;
    void allocate(size_type n) {allocate(n, alloc_version());}
    void allocate(size_type n, allocator_v1)
    {
        data_ = alloc_.allocate(n);
        capacity_ = n;
    }
    void allocate(size_type n, allocator_v2)
    {
        data_ = alloc_.allocate(n, capacity_);
    }
public:
    vector(size_type n)
    {
        allocate(n);
        // default construct n elements
        size_ = n;
    }
    ...
In the above example the implementation sets up integral_constant types which correspond to version 1 allocators, version 2 allocators, and the version number of the current allocator. One can then easily overload the private helper functions (allocate in this example) for the version 1 and version 2 implementations, and compile-time dispatch to them using the current allocator version.
In the above example the vector constructor simply calls the private allocate function to allocate sufficient memory before it constructs the proper number of elements and correctly sets size(). The private allocate function will behave as it does today if the vector has a version 1 allocator. That is, it simply allocates the memory and sets the capacity() to the same size as the number of elements requested by the constructor.
However if the allocator is version 2, a different private allocate helper is called which uses the new overloaded allocate function of version 2 allocators. This variant will set the vector's capacity() to at least the size requested by the constructor, and perhaps a little more depending upon the underlying allocator's implementation details. This variation might be especially effective when the sizeof(T) is small (e.g. vector<char>).
The current Freescale implementation of vector, combined with an extension allocator which calls malloc instead of new, implements this technique. The following example code demonstrates this "free" extra capacity:
#include <vector>
#include <iostream>
#include <memory>
int main()
{
    std::vector<char> v1(5);
    std::cout << v1.capacity() << '\n';
    std::vector<char, Metrowerks::malloc_alloc<char> > v2(5);
    std::cout << v2.capacity() << '\n';
}
The above code currently outputs:
5 12
This output indicates that when asking for 5 char's, the underlying malloc system actually delivers room for 12 (on this system). Without this proposal, the memory for the extra 7 char's goes completely to waste simply because the vector has no way to know that this additional memory exists.
The push_back member of vector can also be optimized in the presence of a version 2 allocator. Consider the following example code for push_back:
void push_back(const T& x)
{
    if (size_ < capacity_ || expand_by(1))
    {
        // sufficient capacity, append element
    }
    else
    {
        // insufficient capacity, reallocate
    }
}
Here if size() is threatening to exceed capacity(), before branching to the reallocation branch of the algorithm, an attempt is made to expand the current memory block in-place by calling the private helper function expand_by. Like the private helper function in the constructor example, expand_by can branch to different algorithms depending upon the version of the allocator.
bool expand_by(size_type n)
    {return expand_by(n, alloc_version());}
bool expand_by(size_type, allocator_v1) {return false;}
bool expand_by(size_type n, allocator_v2)
{
    size_type minimum_size = n + capacity_;
    size_type maximum_size = max(minimum_size, growth_factor * capacity_);
    size_type size_received;
    if (alloc_.expand(data_, minimum_size, maximum_size, size_received))
    {
        capacity_ = size_received;
        return true;
    }
    return false;
}
For version 1 allocators, it is never possible to expand a block of memory in place so expand_by simply returns false. Modern compilers should be able to optimize expand_by completely out of existence in this case.
If we have a version 2 allocator, then the new expand member of the allocator is called, asking for an expansion that is as much as twice (or whatever the growth_factor) the current capacity, or at least big enough to hold n more elements (where n == 1 in the case of push_back). The expand-in-place will not always be successful of course, but it is relatively inexpensive to ask the question and pays off handsomely when the underlying allocator can manage it.
A test with the Freescale implementation which does nothing but push_back a default constructed string 1000 times into a vector<string> shows that this optimization roughly doubles the speed of that loop.
This paper proposes the specification of the versioning system in terms of std::tr1::integral_constant (part of the type traits library) as if integral_constant was already part of the working draft. This dependence could easily be removed, but currently I do not see the need to do that.
This proposal, in its entirety, including N1085, has been implemented in the Freescale C and C++ libraries, except that the names of the C functions have been mangled with underscores, and the C++ versioning system was put in namespace Metrowerks.
The Freescale implementation of std::allocator does not support the version 2 interface, but an extension allocator does (Metrowerks::malloc_alloc). The Freescale std::vector and an extension container Metrowerks::cdeque recognize version 2 allocators and take advantage of their extended interface. Of course these containers continue to work with today's (version 1) allocators as well.
Note that in the proposed wording below, the implementation is required to provide infrastructure such as version and version_type, but is not required to provide a std::allocator which meets the version 2 allocator specification, nor to provide containers which recognize version 2 allocators (version 2 allocators will work with such containers, but offer no enhanced performance). Merely the syntax for easily allowing such optimizations is standardized. Market forces can dictate whether the optimization is actually provided on any given platform.
Also note that if an implementation is to provide a std::allocator which supports the version 2 interface, it must either detect that the application has not replaced operator new, or it must must implement the interface in such a way as to be prepared for the application to replace operator new (i.e. std::allocator is required to call operator new and in general can not assume anything about the behavior of this allocation system).
Modify the <utility> synopsis in 20.2:
namespace std {
  //  lib.operators, operators:
  namespace rel_ops {
    template<class T> bool operator!=(const T&, const T&);
    template<class T> bool operator> (const T&, const T&);
    template<class T> bool operator<=(const T&, const T&);
    template<class T> bool operator>=(const T&, const T&);
  }
  
  //  lib.pairs, pairs:
  template <class T1, class T2> struct pair;
  template <class T1, class T2>
    bool operator==(const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2>
    bool operator< (const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2>
    bool operator!=(const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2>
    bool operator> (const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2>
    bool operator>=(const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2>
    bool operator<=(const pair<T1,T2>&, const pair<T1,T2>&);
  template <class T1, class T2> pair<T1,T2> make_pair(T1, T2);
  //  lib.versioning, version:
  template <class T, unsigned V> struct version_type;
  template <class T> struct version;
}
Add new section: 20.2.3:
-1- A standardized infrastructure for giving class types an easily extracted version number is defined. Existing class types which lack explicit version numbers are automatically assigned a version number of 1 by this system. This system allows classes to communicate versioned interfaces to each other.
-1- The library provides a template which can be used to explicitly declare a version number for a user-defined class or struct.
template <class T, unsigned V>
struct version_type
    : public integral_constant<unsigned, V>
{
    typedef T type;
    version_type(const version_type<T, 0>&);
};
-2- It is unspecified whether the version_type constructor has a definition.
template <class T> struct version;
-1- version publicly derives from integral_constant<unsigned, 1> for all types T except for the case that T is a struct or class and meets all of the following requirements:
For types T meeting the above requirements, version<T> will publicly derive from integral_constant<unsigned, T::version::value>.
-2- In the case that T is a reference type, version will drop the reference and instead operate on the referenced type.
-3- In the case that T is a cv-qualified type, version will ensure that the same value will be returned as for a non-cv-qualified T.
-4- [Example:
struct A {};
const unsigned vA = std::version<A>::value;  // vA == 1
struct B {
    typedef std::version_type<B, 2> version;
};
const unsigned vB  = std::version<      B >::value; // vB  == 2
const unsigned vBr = std::version<const B&>::value; // vBr == 2
struct C : public B {};
const unsigned vC = std::version<C>::value;  // vC == 1
-- end example]
Add new section 20.1.6.1 - Optional Allocator requirements:
-1- Allocators may choose to implement an extended API in addition to those requirements defined in lib.allocator.requirements. If such an allocator declares its intention to conform to this extended interface, it must implement all of it.
-2- An allocator (named Alloc for example) declares its intent to conform to this optional interface by having the following nested declaration:
class Alloc
{
public:
    typedef std::version_type<Alloc, 2> version;
    /* ... */
};
-3- Allocators having this nested version type will define the following additional member functions:
size_type size(pointer p) const throw();
-4- Requires: p is non-null and has been returned by a previous call to allocate (either overload), or request, and has not yet been deallocated by deallocate.
-5- Returns: The returned value indicates that the client can store valid values of T to the range [ptr, ptr + returned-value) without fear of corrupting the heap.
-6- Throws: Nothing.
pointer allocate(size_type size_requested, size_type& size_received);
-7- Effects: Memory is allocated for size_requested objects of type T but objects are not constructed. allocate may raise an appropriate exception. On success, the returned pointer refers to the allocated memory, and size_received indicates the number of objects actually allocated (size_received >= size_requested). size_received will hold the same value that would be returned by the size function if supplied the pointer returned from this call. [Note: If size_requested == 0, the return value is unspecified, and throwing an exception is still possible. -- end note]
-8- Throws: Throws an exception if unable to allocate the requested memory.
pointer request(size_type size_requested, size_type& size_received) throw();
-9- Effects: Memory is allocated for size_requested objects of type T but objects are not constructed. request will not raise an exception. On success, the returned pointer refers to the allocated memory, and size_received indicates the number of objects actually allocated (size_received >= size_requested). On failure, a null pointer value will be returned and size_received will hold a suggested value for size_requested which may succeed in a future call to request (there is no guarantee that it will succeed). [Note: If size_requested == 0, the return value is unspecified. -- end note]
-10- Throws: Nothing.
bool resize(pointer p, size_type size_requested, size_type& size_received) throw();
-11- Requires: The pointer p is non-null and has been returned from one of the allocate member functions or the request member function and has not been deallocated with the deallocate member function.
-12- Effects: An attempt is made to change the size of the memory block referenced by p to fit size_requested objects of T. Whether or not this change in size is successful, p will still point to the block of memory after this call (the memory block will not be relocated in the client's address space, nor deallocated, even if size_requested == 0). On success size_received will hold the new size(p) and size_received >= size_requested. If size_requested < the size(p) just prior to this call, success will only be reported if this call to resize results in a decrease reported by size(p). If not successful, size_received will have a value that serves as a hint for a size_requested that may succeed in a future call (no guarantee of such success is offered). On failure, the value reported by size(p) after the call will be the same as was reported just prior to the call. [Note: Implementations are allowed to set size_received to size(p) on failure. -- end note]
-13- Returns: true if the resize succeeded, otherwise false.
-14- Throws: Nothing.
bool expand(pointer p, size_type min_size, size_type preferred_size, size_type& size_received) throw()
-15- Requires: The pointer p is non-null and has been returned from one of the allocate member functions or the request member function and has not been deallocated with the deallocate member function. min_size <= preferred_size.
-16- Effects: An attempt is made to change the size of the memory block referenced by p to fit preferred_size objects of T. If the allocator can not expand the size(p) to preferred_size, the allocator will attempt to expand the block to hold the maximum number of objects it can as long as this results in size(p) >= min_size. If the size(p) prior to the call to expand is already greater than or equal to min_size, success is assured. On success, size_received will have the value which would be returned by a subsequent call to size(p). On failure size_received will hold a hint for a preferred_size that may succeed in a future call to expand (no guarantee of such success is offered). On failure the value reported by size(p) is not changed by this call. Whether or not this change in size is successful, p will still point to the block of memory after this call (the memory block will not be relocated in the client's address space, nor deallocated, even if min_size == 0). [Note: Implementations are allowed to set size_received to size(p) on failure. -- end note]
-17- Returns: true if the expansion succeeded, otherwise false.
-18- Throws: Nothing.
I'm afraid that I do not know who to credit with the design of the proposed versioning system. My thanks go out to the C++ community in general for all of the help generously offered over the years.