14 crazy ideas for the standard library in C++0x

Author:	Thorsten Ottosen
Contact:	nesotto@cs.aau.dk
organizations:	Dezide Aps and Aalborg University
Date:	2005-08-24
Number:	WG21/N1870 and J16/05-0130
Working Group:	Evolution

Abstract

This paper discusses 14 small ideas for enhancing the standard library. The ideas all intend to make the library easier to use or more efficient.

Table of Contents

1 Introduction
2 Ideas for the container library
3 Ideas for utilities
4 Ideas for algorithms
5 Acknowledgements

1 Introduction

This is a small paper about various ideas for the standard library.

The ideas try to

make the standard library easier to use
make the standard library more efficient
make the standard more flexible

The author has often found the need for many of the facilities suggested below. Some has also been discussed in various C++ newsgroups.

We do not provide any wording yet. The LWG should discuss each item and then ask the author to provide wording if they like a particular idea.

2 Ideas for the container library

2.1 Add member `resize_capacity()` to `vector` and `basic_string`

Motivation:

Remove the need for "the swap trick" to trim excess capacity in a vector or a string. We can now just say
```
vector<T> vec = ...;
...
vec.resize_capacity( vec.size() );
```
Provide explicit control over the buffer size.

Remarks:

"The swap trick" is discussed in Effective STL, item 17. Its still a bit of a hack compared to genuine support for controling the capacity.

2.2 Require `size()` to be of constant complexity

Section 23.1/5 allows a container to implement size() with linear complexity. This flexibility can sometimes be used by implementers. However, it is doubtful if this is for the benefit of the users.

We therefore suggest that it size() is required to be of constant complexity.

Motivation:

If users forget to put the computation of the container's size outside the loop, but rather call size() repeatedly, a straight-forward O(n) loop becomes O(n^2).

Remarks:

The rest of the functions under Note A might be re-considered too.

2.3 Add `zero()` function to standard sequences

With the like advent of a new auto, it become much easier to specify iterators inside loops

for( auto i = cont.begin(), e = cont.end(); ... )

When two variables are initialized in one statement, it is required that they are deduced to be the same type. This raises a problem with sequences because

for( auto i = 0, s = cont.size(); i != s; ... )

is not legal. We have to do the rather awkward

for( auto i = decltype(cont.size())(0), s = cont.size(); i != s; ... )

Therefore we suggest to add a helper member function to sequences that return a zero of the correct size type:

size_type zero() const { return 0; }

which allows us to write

for( auto i = cont.zero(), s = cont.size(); i != s; ... )

2.4 Add splice-like functions to node-based containers

Move-semantics is going to give the standard library a well-deserved speed enhancement. However, for node-based containers we should be able to move individual nodes between containers of the same type. This essentially corresponds to the splice() members in list.

The speedup could be enormous for certain operations on node-based containers.

Therefore we suggest that the following members are added to map, multimap, set and multiset

container
{
    ...
    void splice( container&& );
    void splice( iterator, iterator, container& );
    void splice( iterator, container& ); 
}

which is guaranteed to reuse the internal nodes.

Remarks:

Would it be possible to do this for deque to?
Can this work even if the allocators are different?

2.5 Add `set_buffer( T*, size_t )` to `vector` and `basic_string`

It would often be convenient if continuous-storage containers could take ownership of a heap-allocated array. This is important in low level-code when either

one have a heap-allocated array already
one does not want to initialize the buffer, because it must be completely overwritten anyway

Remarks:

Can non-default allocators be a problem?

2.6 Add new `assign()` member to all sequences

The following member:

template <class T, ...>
class sequence 
{
  // call 'f' 'n' times
  // 'f()' takes a reference to an element to initialize
  template< class F > assign( size_type n, F f );
  ..
};

has the following advantages:

it allows users to fill the sequence using a functor

it allows advanced users to forgo default initialization by supplying an empty functor

2.7 Add new set/map lookup functions

Consider a simple word-counting example:

map<string,int> m;
...
for(...)
    m[ str(b,e) ]++;

We need to construct a temporary string just to check if we should increment the word count---and we then throw this temporary away. This can be very expensive when the key is a heavy object.

Therefore we propose that the following new lookup functions are added to map and unordered_map

template< class ComparableToKey >
value_type& lazy_insert( const ComparableToKey& );

If the object is not found, it needs to be converted. This should be done via a call to

template< class T, class U >
T convert_to( const U& r )
{
    // default implementation should use iterator-range
    // construction
}

Alternatively we could rely on implicit conversion by default (the important thing being that we can customize it).

For set and unordered_set we should add

template< class ComparableToKey >
std::pair<iterator,bool> lazy_insert( const ComparableToKey& );

Also, in a similar manner, we could add the following members to map, unordered_map, set and unordered_set:

template< class ComparableToKey >
iterator lazy_find( const ComparableToKey& r );

template< class ComparableToKey >
size_type lazy_count( const ComparableToKey& r );

Perhaps even extensions to lower_bound() etc. should be allowed.

3 Ideas for utilities

3.1 Add functors for conversion

Conversion is a very important correctness aspects of the language. The language is not too good in this respect, although good compilers do give warnings.

Motivation:

If there is no easy way to do a certain conversion, users tend to not do it.
It is not uncommon that users simply turn conversion warnings off. Conversions need to be painless.
We need these functors when applying a conversion to a sequence (perhaps via some kind of transform iterator).

Remarks:

The following functors are suggested:

template< class T >
struct static_cast_to
{
    template< class U >
    T operator()( U u ) const
    {
        return static_cast<T>(u);
    }
};

template< class T >
struct dynamic_cast_to
{
    template< class U >
    T operator()( U u ) const
    {
        return dynamic_cast<T>(u);
    }
};

template< class T >
struct const_cast_to
{
    template< class U >
    T operator()( U u ) const
    {
        return const_cast<T>(u);
    }
};

template< class T >
struct reinterpret_cast_to
{
    template< class U >
    T operator()( U u ) const
    {
        return reinterprest_cast<T>(u);
    }
};

If we are lucky enough to get a numeric_cast<T> proposal we should also add

template< class T >
struct numeric_cast_to
{
    template< class U >
    T operator()( U u ) const
    {
        return numeric_cast<T>(u);
    }
};

Discussion:

Are the arguments of operator()() properly specified?

3.2 Add iterator utilities from Boost

The iterator library could benefit from these handy utilities from Boost.Util

template <class T>
T next(T x) { return ++x; }

template <class T, class Distance>
T next(T x, Distance n)
{
    std::advance(x, n);
    return x;
}

template <class T>
T prior(T x) { return --x; }

template <class T, class Distance>
T prior(T x, Distance n)
{
    std::advance(x, -n);
    return x;
}

Motivation:

the utilities are very handy
the utilities are often requested by users
the utilities are very simple

3.3 Add header <std> and <tr1>

This would enable better support for precompiled headers. Even with modules in C++0x, this will be easy to do for many years until modules are widely available.

It is also considerably easier to teach newbies that they simply include <std>.

3.4 Standardize an `unspecified-bool` type

An unknown function-pointer is often used when eg. smart pointers provide an implicit conversion to bool, only in a more restrictive form that is only intended to be used in an if-statement:

smart_ptr<T> p = ...;
...
if( p )
{
    ...
}

The idiom is so common with library writers that we should standardize the type, for example

namespace std
{
    typedef unspecified-type restricted_bool;
    const restricted_bool      restricted_true;
    const restricted_bool      restricted_false;
}

4 Ideas for algorithms

4.1 Add `minmax()` algorithms from Boost

For some time these utilities have been part of boost.

The algorithms are easy to specify and complement min() and max() algorithms very well:

template <class T>
tuple<T const&, T const&> >
minmax(const T& a, const T& b);

template <class T, class BinaryPredicate>
tuple<T const&, T const&> >
minmax(const T& a, const T& b, BinaryPredicate comp);

template <class ForwardIterator>
std::pair<ForwardIterator,ForwardIterator>
minmax_element(ForwardIterator first, ForwardIterator last);

template <class ForwardIterator, class BinaryPredicate>
std::pair<ForwardIterator,ForwardIterator>
minmax_element(ForwardIterator first, ForwardIterator last,
               BinaryPredicate comp);

4.2 Add a few statistics algorithms to `<numeric>`

A few often use algorithms would be nice to see in <numeric>:

template< class T, class InputIterator >
T mean( InputIterator first, InputIterator last );

template< class T, class InputIterator >
T variance( InputIterator first, InputIterator last );

template< class T, class InputIterator >
T std_deviation( InputIterator first, InputIterator last );

template< class T, class InputIterator, class InputIterator2 >
T covariance( InputIterator first, InputIterator last,
              InputIterator2 first2, InputIterator2 last2 );

template< class T, class InputIterator, class InputIterator2 >
T
correlation_coefficient( InputIterator first, InputIterator last,
                         InputIterator2 first2, InputIterator2 last2 );

template< class T, class InputIterator, class InputIterator2 >
tuple<T,T,T> 
least_square_line( InputIterator first, InputIterator last, 
                   InputIterator2 first2, InputIterator2 last2 );

template< class T, class InputIterator, class InputIterator2 >
tuple<T,T>
least_square_line_without_correlation( InputIterator first, InputIterator last,
                                       InputIterator2 first2, InputIterator2 last2 );
                                       
template< class T >
T factorial( T n );

template< class T >
T nPr( T n, T r );

template< class T >
T nCr( T n, T r );

template< class T, class Integer >
T binomial( T p, Integer n, Integer r );

template< class T, class Integer >
T negative_binomial( T p, Integer n, Integer r );

template< class T, class Integer >
T poisson( T lambda, Integer r );

template< class T, class Integer >
T hypergeometric( Integer n, Integer r,
                  Integer blue, Integer red );

template< class T, class Integer >
T geometric( T p, Integer r );

Motivation:

many of these functions are frequently used
many of these functions are nice for beginners that need to make small games or be taught introductive statistics (this will benefit C++ as a language to start introductive programming in).
functions like mean(), std_deviation() and variance() are fairly easy to implement, but if high-quality single-pass algorithm takes considerable more time. As such, the library algorithm will be guaranteed to be optimal.
an implementation of all this may be found in the boost sandbox under the directory stat.

4.3 Consider adding additional algorithms

Consider adding some of the algorithms discussed on the Boost Wiki

In particular consider

is_sorted()
nonunique_copy()

5 Acknowledgements

The author has presented some ideas directly or indirectly taken from or inspired by the following people (at least):

David Abrahams
Howard Hinnant
Scott Meyers
Daniel Walker
Hervé Brönnimann
Michael Goldshteyn

Also thanks to the people on comp.std.c++ for their suggestions. I think some of the above ideas will make some of the suggestions much easier to write efficiently outside the standard.