Author: | Thorsten Ottosen |
---|---|

Contact: | nesotto@cs.aau.dk |

organizations: | Dezide Aps and Aalborg University |

Date: | 2005-08-24 |

Number: | WG21/N1870 and J16/05-0130 |

Working Group: | Evolution |

Abstract

This paper discusses 14 small ideas for enhancing the standard library. The ideas all intend to make the library easier to use or more efficient.

- 1 Introduction
- 2 Ideas for the container library
- 2.1 Add member
`resize_capacity()`to`vector`and`basic_string` - 2.2 Require
`size()`to be of constant complexity - 2.3 Add
`zero()`function to standard sequences - 2.4 Add splice-like functions to node-based containers
- 2.5 Add
`set_buffer( T*, size_t )`to`vector`and`basic_string` - 2.6 Add new
`assign()`member to all sequences - 2.7 Add new set/map lookup functions

- 2.1 Add member
- 3 Ideas for utilities
- 4 Ideas for algorithms
- 5 Acknowledgements

This is a small paper about various ideas for the standard library.

The ideas try to

- make the standard library easier to use
- make the standard library more efficient
- make the standard more flexible

The author has often found the need for many of the facilities suggested below. Some has also been discussed in various C++ newsgroups.

We do not provide any wording yet. The LWG should discuss each item and then ask the author to provide wording if they like a particular idea.

**Motivation**:

Remove the need for "the swap trick" to trim excess capacity in a vector or a string. We can now just say

vector<T> vec = ...; ... vec.resize_capacity( vec.size() );

Provide explicit control over the buffer size.

**Remarks**:

"The swap trick" is discussed in Effective STL, item 17. Its still a bit of a hack compared to genuine support for controling the capacity.

Section 23.1/5 allows a container to implement `size()`
with linear complexity. This flexibility can sometimes
be used by implementers. However, it is doubtful if this
is for the benefit of the users.

We therefore suggest that it `size()` is required to
be of constant complexity.

**Motivation**:

- If users forget to put the computation of the container's
size outside the loop, but rather call
`size()`repeatedly, a straight-forward O(n) loop becomes O(n^2).

**Remarks**:

The rest of the functions under Note A might be re-considered too.

With the like advent of a new `auto`,
it become much easier to specify iterators
inside loops

for( auto i = cont.begin(), e = cont.end(); ... )

When two variables are initialized in one statement, it is required that they are deduced to be the same type. This raises a problem with sequences because

for( auto i = 0, s = cont.size(); i != s; ... )

is not legal. We have to do the rather awkward

for( auto i = decltype(cont.size())(0), s = cont.size(); i != s; ... )

Therefore we suggest to add a helper member function to sequences that return a zero of the correct size type:

size_type zero() const { return 0; }

which allows us to write

for( auto i = cont.zero(), s = cont.size(); i != s; ... )

Move-semantics is going to give the standard library
a well-deserved speed enhancement. However,
for node-based containers we should be
able to move individual nodes between containers
of the same type. This essentially corresponds
to the `splice()` members in `list`.

The speedup could be enormous for certain operations on node-based containers.

Therefore we suggest that the following
members are added to `map`, `multimap`,
`set` and `multiset`

container{ ... void splice(container&&); void splice( iterator, iterator,container&); void splice( iterator,container&); }

which is guaranteed to reuse the internal nodes.

**Remarks**:

- Would it be possible to do this for
`deque`to? - Can this work even if the allocators are different?

It would often be convenient if continuous-storage containers could take ownership of a heap-allocated array. This is important in low level-code when either

- one have a heap-allocated array already
- one does not want to initialize the buffer, because it must be completely overwritten anyway

**Remarks**:

- Can non-default allocators be a problem?

The following member:

template <class T, ...> class sequence { // call 'f' 'n' times // 'f()' takes a reference to an element to initialize template< class F > assign( size_type n, F f ); .. };

has the following advantages:

- it allows users to fill the sequence using a functor

- it allows advanced users to forgo default initialization by supplying an empty functor

Consider a simple word-counting example:

map<string,int> m; ... for(...) m[ str(b,e) ]++;

We need to construct a temporary string just to check if we should increment the word count---and we then throw this temporary away. This can be very expensive when the key is a heavy object.

Therefore we propose that the following new lookup
functions are added to `map` and `unordered_map`

template< class ComparableToKey > value_type& lazy_insert( const ComparableToKey& );

If the object is not found, it needs to be converted. This should be done via a call to

template< class T, class U > T convert_to( const U& r ) { // default implementation should use iterator-range // construction }

Alternatively we could rely on implicit conversion by default (the important thing being that we can customize it).

For `set` and `unordered_set` we should add

template< class ComparableToKey > std::pair<iterator,bool> lazy_insert( const ComparableToKey& );

Also, in a similar manner, we could add the following members
to `map`, `unordered_map`, `set` and `unordered_set`:

template< class ComparableToKey > iterator lazy_find( const ComparableToKey& r ); template< class ComparableToKey > size_type lazy_count( const ComparableToKey& r );

Perhaps even extensions to `lower_bound()` etc. should be allowed.

Conversion is a very important correctness aspects of the language. The language is not too good in this respect, although good compilers do give warnings.

**Motivation**:

- If there is no easy way to do a certain conversion, users tend to not do it.
- It is not uncommon that users simply turn conversion warnings off. Conversions need to be painless.
- We need these functors when applying a conversion to a sequence (perhaps via some kind of transform iterator).

**Remarks**:

The following functors are suggested:

template< class T > struct static_cast_to { template< class U > T operator()( U u ) const { return static_cast<T>(u); } }; template< class T > struct dynamic_cast_to { template< class U > T operator()( U u ) const { return dynamic_cast<T>(u); } }; template< class T > struct const_cast_to { template< class U > T operator()( U u ) const { return const_cast<T>(u); } }; template< class T > struct reinterpret_cast_to { template< class U > T operator()( U u ) const { return reinterprest_cast<T>(u); } };

If we are lucky enough to get a `numeric_cast<T>`
proposal we should also add

template< class T > struct numeric_cast_to { template< class U > T operator()( U u ) const { return numeric_cast<T>(u); } };

**Discussion**:

Are the arguments of `operator()()` properly specified?

The iterator library could benefit from these handy utilities from Boost.Util

template <class T> T next(T x) { return ++x; } template <class T, class Distance> T next(T x, Distance n) { std::advance(x, n); return x; } template <class T> T prior(T x) { return --x; } template <class T, class Distance> T prior(T x, Distance n) { std::advance(x, -n); return x; }

**Motivation**:

- the utilities are very handy
- the utilities are often requested by users
- the utilities are very simple

This would enable better support for precompiled headers. Even with modules in C++0x, this will be easy to do for many years until modules are widely available.

It is also considerably easier to teach
newbies that they simply include `<std>`.

An unknown function-pointer is often used when
eg. smart pointers provide an implicit conversion
to `bool`, only in a more restrictive form that
is only intended to be used in an if-statement:

smart_ptr<T> p = ...; ... if( p ) { ... }

The idiom is so common with library writers that we should standardize the type, for example

namespace std { typedefunspecified-typerestricted_bool; const restricted_bool restricted_true; const restricted_bool restricted_false; }

For some time these utilities have been part of boost.

The algorithms are easy to specify and complement
`min()` and `max()` algorithms very well:

template <class T> tuple<T const&, T const&> > minmax(const T& a, const T& b); template <class T, class BinaryPredicate> tuple<T const&, T const&> > minmax(const T& a, const T& b, BinaryPredicate comp); template <class ForwardIterator> std::pair<ForwardIterator,ForwardIterator> minmax_element(ForwardIterator first, ForwardIterator last); template <class ForwardIterator, class BinaryPredicate> std::pair<ForwardIterator,ForwardIterator> minmax_element(ForwardIterator first, ForwardIterator last, BinaryPredicate comp);

A few often use algorithms would be nice
to see in `<numeric>`:

template< class T, class InputIterator > T mean( InputIterator first, InputIterator last ); template< class T, class InputIterator > T variance( InputIterator first, InputIterator last ); template< class T, class InputIterator > T std_deviation( InputIterator first, InputIterator last ); template< class T, class InputIterator, class InputIterator2 > T covariance( InputIterator first, InputIterator last, InputIterator2 first2, InputIterator2 last2 ); template< class T, class InputIterator, class InputIterator2 > T correlation_coefficient( InputIterator first, InputIterator last, InputIterator2 first2, InputIterator2 last2 ); template< class T, class InputIterator, class InputIterator2 > tuple<T,T,T> least_square_line( InputIterator first, InputIterator last, InputIterator2 first2, InputIterator2 last2 ); template< class T, class InputIterator, class InputIterator2 > tuple<T,T> least_square_line_without_correlation( InputIterator first, InputIterator last, InputIterator2 first2, InputIterator2 last2 ); template< class T > T factorial( T n ); template< class T > T nPr( T n, T r ); template< class T > T nCr( T n, T r ); template< class T, class Integer > T binomial( T p, Integer n, Integer r ); template< class T, class Integer > T negative_binomial( T p, Integer n, Integer r ); template< class T, class Integer > T poisson( T lambda, Integer r ); template< class T, class Integer > T hypergeometric( Integer n, Integer r, Integer blue, Integer red ); template< class T, class Integer > T geometric( T p, Integer r );

**Motivation**:

- many of these functions are frequently used
- many of these functions are nice for beginners that need to make small games or be taught introductive statistics (this will benefit C++ as a language to start introductive programming in).
- functions like
`mean()`,`std_deviation()`and`variance()`are fairly easy to implement, but if high-quality single-pass algorithm takes considerable more time. As such, the library algorithm will be guaranteed to be optimal. - an implementation of all this may be found in the boost sandbox
under the directory
`stat`.

Consider adding some of the algorithms discussed on the Boost Wiki

In particular consider

`is_sorted()``nonunique_copy()`

The author has presented some ideas directly or indirectly taken from or inspired by the following people (at least):

- David Abrahams
- Howard Hinnant
- Scott Meyers
- Daniel Walker
- HervĂ© BrĂ¶nnimann
- Michael Goldshteyn

Also thanks to the people on `comp.std.c++` for their
suggestions. I think some of the
above ideas will make some of the suggestions much
easier to write efficiently outside the standard.