Document number: N3046=10-0036

Alisdair Meredith
2010-02-16

N3046 Iterators in C++0x

Motivation for paper

While the C++ iterators and algorithms interface has proven remarkably successful, a number of short-comings have come to light over the decade since it was designed. Many of these were addressed by the concepts work occurring as part of the C++0x project, and with the passing of that feature, all the work done on iterators has been effectively lost. This paper sets out to resolve as many of the known design and documentation issues as possible, following the model laid out in the cocepts work.

Known Issues

There are a number of LWG issues in this area, highlighted in the list below:

Number Clause Description Status (before) Status (after)
594 20.2.1 [utility.arg.requirements] Disadvantages of defining Swappable in terms of CopyConstructible and Assignable Open Open
742 20.2.1 [utility.arg.requirements] Enabling swap for proxy iterators Open Open
408 24.2 [iterator.requirements] Is vector<reverse_iterator<char*> > forbidden? Open NAD Editorial
446 24.2 [iterator.requirements] Iterator equality between different containers Open NAD Editorial
1210 24.2 [iterator.requirements] iterator reachability should not require a container New NAD Editorial
1212 24.2 [iterator.requirements] result of post-increment/decrement operator New NAD Editorial
1213 24.2 [iterator.requirements] Meaning of valid and singular iterator underspecified New Open
1185 24.2 [iterator.requirements] iterator categories and output iterators New NAD Editorial
484 24.2.1 [input.iterators] Convertible to T NAD Future Open
485 24.2.2 [output.iterators] output iterator insufficiently constrained Ready NAD Editorial
476 24.2.3 [forward.iterators] Forward Iterator implied mutability NAD Dup 1185
1311 24.2.3 [forward.iterators] multi-pass property of Forward Iterator underspecified New NAD Editorial
299 24.2.4 [bidirectional.iterators] Incorrect return types for iterator dereference Open NAD Editorial
458 24.2.5 [random.access.iterators] unintended limitation for operator- NAD NAD
1079 24.2.5 [random.access.iterators] RandomAccessIterator's operator- has nonsensical effects clause Open NAD Editorial
940 24.4.4 [iterator.operations] std::distance Ready NAD Editorial
1211 24.5.3.1 [move.iterator] move iterators should be restricted as input iterators Open Open

Note that several of the issues marked NAD were resolved as such on the basis that concepts would solve the problem, or remove the underlying wording. That is no longer the case, so we revisit those issues in this paper to see if there should be a solution as part of the 0x wording.

Motivating Examples

Here are several simple examples of types that either fail to be iterators in unexpected ways, or appear to offer more than they can actually deliver, while staying within the letter of the standard.

Integral Range Iterator


template<unsigned M, unsigned N> {
  struct iterator {
    unsigned position;

    iterator(unsigned val) : position{val} {}
    
    unsigned operator*() const { return position; }
    
    iterator& operator++() { ++position; return *this; }
    iterator& operator++(int) { return iterator{++position}; }
    
    bool operator==(iterator rhs) const { 
      return position == rhs.position;
    }
  };
  
  static iterator begin() { return iterator{M}; }
  static iterator end()   { return iterator{N}; }
};

This example is typical of a range of problematic iterators frequently referred to as proxy iterators. The key problem is that dereferencing the iterator does not return a true native reference to the underlying element. Examples in the library would include the move_iterator adaptor and the well-known vector<bool>::iterator.

Note that the proxy in the example is quite subtle, as it would be quite reasonable to return a const unsigned & from operator*, yet this would still be a proxy iterator. The reason is that each iterator is maintaining its own copy of position, rather than sharing a reference to the same element in the underlying sequence. Naturally, it is difficult to share elements that exist purely in the abstract, rather than physically in memory. The key to this problem is the following requirement on Forward Iterators:

"If a and b are both dereferenceable, then a == b if and only if *a and *b are the same object."

Infinite Range

The following constant iterator allows iteration over an infinite sequence of values.


template<unsigned N> {
  struct iterator {
    static const unsigned value = N;

    iterator() {}
    
    const unsigned & operator*() const { return value; }
    
    iterator& operator++() { return *this; }
    iterator& operator++(int) { return *this; }
    
    bool operator==(iterator rhs) const { 
      return true;
    }
  };
  

Note that this iterator seems fine at first, and does not suffer the proxy issue as all references truly refer to the same object. However, it has the unusual property that the distance between any two iterators is zero. This will be very surprising for consumers of Forward Iterators and the more capable categories, yet there is nothing restricting this to a mere Input Iterator.

Note that it would be very simple to add support for a distance property to this iterator by adding an additional size_t member to track position. It would seem a stretch to describe such a range as truly infinite though, if the maximum expressible distance is distinctly finite.

A Single-pass Forward Iterator

Consider the following iterator adaptor:


template<typename Iter> 
struct shared_iterator {
  shared_ptr position;

  shared_iterator(Iter val) : position{new Iter{val}} {}
  
  unsigned operator*() const { return *position; }
  
  shared_iterator& operator++() { ++*position; return *this; }
  shared_iterator operator++(int) { 
    shared_iterator result{*this};
    ++*iterator;
    return result;
  }
  
  bool operator==(const shared_iterator& rhs) const { 
    return *position == *rhs.position;
  }
};

This example subverts the notion that a Forward Iterator is a multi-pass iterator by simply wrapping any other iterator in a shared_ptr, and deferring all operations to the shared iterator. All copies of this iterator share the same underlying base iterator, so incrementing any one will increment them all. Yet this adaptor still meets all the requirements of a conforming Forward Iterator. While iterating the underlying sequence is consumed, but all copied iterators reach the end at the same time, and will continue to behave correctly as past-the-end iterators at that point. This is the basis of LWG issue 1311.

Other issues

Inconsistent Style

One of the features of the 0x standardization process in the library clauses has been an improved consistency of wording and style. There is less duplication, as common terms are defined in a single place, typically the Requirement section of clause 20, and then referenced as needed. The iterators clause relied on concepts to provide the same consistency, so the wording has not benefitted from the same attention to detail. The effects are more noticeable as the core language grows, delivering an increasingly diverse range of syntax to effect the same expression.

Shuffle Iterators

One of the key discoveries of the concepts work was a new iterator concept dubbed the Shuffle Iterator. This combined many of the features of a mutable Forward Iterator with the Swappable concept in order to better describe the requirements on algorithms in clause 25.

It is not clear that the requirements for a Shuffle Iterator can be so easily expressed without the language of concepts, so this paper makes no attempt to add Shuffle Iterators into the standard taxonomy of iterators.

Proposed Solutions

The goal of this paper is in many ways to be evolutionary, not revolutionary. Rather than immediately solving all the possible problems, the goal is to update the wording of this clause to a level consistent with the rest of the working paper, and consistent with the lessons from concepts. This clarified wording will form a strong basis for future papers or issues to address specific concerns

Reference Types

It is generally thought that the biggest problem facing iterator consumers is the poorly defined support for proxied iterators. This paper starts to address that by naming the reference type returned by an iterator's operator*. This is transparently made possible by extending iterator_traits with a new type alias using the new decltype keyword. This automatically supports pointers and all existing iterators with an appropriate default, and could be opened up as an extension point in the future, once clear requirements on the reference type have been defined. We choose to go no further on this issue at the moment, waiting on clarification of the wording for Swappable types among other issues.

Re-use Common Definitions

There are many useful terms defined in clause 20.2 [Requirements] that can simplify the presentation of the iterator requirements. This happened by default in the concept-based version of the library. The benefit is clear, as many of these terms have been tweaked with a considerable attention to detail in the currently proposed standard. The requirements clauses and tables for each iterator category are adjusted accordingly.

Constant Iterators can be Random Access Iterators

LWG 476 points out that it is very easy to read into the current wording the requirement that all Forward, Bidirectional and Random Access Iterators must also be Output Iterators. Hence, std::vector<T>::const_iterator is merely an Input Iterator. This was much clearer in the concpets version of the clause, so that wording is adopted here.

Wording

24.2 Iterator requirements [iterator.requirements]

1 Iterators are a generalization of pointers that allow a C++ program to work with different data structures (containers) in a uniform manner. To be able to construct template algorithms that work correctly and efficiently on different types of data structures, the library formalizes not just the interfaces but also the semantics and complexity assumptions of iterators. All input iterators i support the expression *i, resulting in a value of some class, enumeration, or built-in type T, called the value type of the iterator. All output iterators support the expression *i = o where o is a value of some type that is in the set of types that are writable to the particular iterator type of i. All iterators i for which the expression (*i).m is well-defined, support the expression i->m with the same semantics as (*i).m. For every iterator type X for which equality is defined, there is a corresponding signed integral type called the difference type of the iterator.

2 Since iterators are an abstraction of pointers, their semantics is a generalization of most of the semantics of pointers in C++. This ensures that every function template that takes iterators works as well with regular pointers. This International Standard defines five categories of iterators, according to the operations defined on them: input iterators, output iterators, forward iterators, bidirectional iterators and random access iterators, as shown in Table 100. Table 100 — Relations among iterator categories Random Access ! Bidirectional ! Forward ! Input ! Output

3 Forward iterators satisfy all the requirements of the input and output iterators and can be used whenever either kindan input iterator is specified; Bidirectional iterators also satisfy all the requirements of the forward iterators and can be used whenever a forward iterator is specified; Random access iterators also satisfy all the requirements of bidirectional iterators and can be used whenever a bidirectional iterator is specified.

4 Besides its category, a forward, bidirectional, or random access iterator can also be mutable or constant depending on whether the result of the expression *i behaves as a reference or as a reference to a constant. Constant iterators do not satisfy the requirements for output iterators, and the result of the expression *i (for constant iterator i) cannot be used in an expression where an lvalue is required.

4 Iterators that further satisy all the requirements of output iterators are called mutable iterators. Nonmutable iterators are referred to as constant iterators.

5 Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator i for which the expression *i is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. Iterators can also have singular values that are not associated with any containersequence. [ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] Results of most expressions are undefined for singular values; the only exceptions are destroying an iterator that holds a singular value and the assignment of a non-singular value to an iterator that holds a singular value. In this case the singular value is overwritten the same way as any other value. Dereferenceable values are always non-singular.

After value-initialization, any iterator that satisfies the DefaultConstructible requirements ([defaultconstructible]) shall not introduce undefined behaviour when used as the source of a copy or move operation(*new). [Note: This guarantee is not offered for default-initialization (8.5 [dcl.init]), although the distinction only matters for types with trivial default constructors such as pointers, or aggregates holding pointers. This guarantee is essential for the implementation of the default constructor of generic iterator adaptors.— end note]

6 An iterator j is called reachable from an iterator i if and only if there is a finite sequence of applications of the expression ++i that makes i == j. If j is reachable from i, they refer to the same containersequence.

7 Most of the library's algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the oneelement pointed to by i and up to but not including the oneelement pointed to by j. Range [i,j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.

8 All the categories of iterators require only those functions that are realizable for a given category in constant time (amortized). Therefore, requirement tables for the iterators do not have a complexity column.

9 Destruction of an iterator may invalidate pointers and references previously obtained from that iterator.

10 An invalid iterator is an iterator that may be singular.267

11 In the following sections, a and b denote values of type X or const X, n denotes a value of the difference type Distanceiterator_traits<X>::difference_type, u, tmp, and m denote identifiers, r denotes a value of X&, t denotes a value of valueobject type T, o denotes a value of some type that is writable to the output iterator. difference_type and reference refer to the types iterator_traits<X>::difference_type and iterator_traits<X>::reference respectively.

267) This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.

*new) Typically such a value-initialized iterator will be a non-derefereneable past-the-end iterator for an empty sequence.

24.2.1 Iterator [iterator.iterators]

1 The Iterator requirements form the basis of the iterator concept taxonomy, and every iterator meets the Iterator requirements. This set of requirements specifies operations for dereferencing and incrementing the iterator, but provides no way to manipulate values. Most algorithms will require additional operations to read ([input.iterators]) or write ([output.iterators]) values, or to provide a richer set of iterator movements ([forward.iterators],[birectional.iterators], [random.access.iterators]).

2 A type X conforms to the Iterator requirements if:

  • X satisfies all the requirements of a CopyConstructible, CopyAssignable, Swappable, and Destructible type ([utility.arg.requirements])
  • There is a well-formed instantiation of the template iterator_traits<X>
  • The expressions shown in Table ?? Iterator Requirements are valid and have the indicated semantics
  • Table ?? - Iterator requirements
    Expression Return type Operational semantics Assertion/note/ pre-/post-condition
    *r reference pre: r is dereferenceable.
    ++r X&
    (void)r++ void { X tmp = r; ++r; return tmp; }

    24.2.12 Input iterators [input.iterators]

    1 A class or a built-in type X satisfies the requirements of an input iterator for the value type T if X satisfies the Iterator (24.2.1) and EqualityComparable (20.2) requirements , and the following expressions are valid, as shown in Table 101.

    2 In Table 101, the term the domain of == is used in the ordinary mathematical sense to denote the set of values over which == is (required to be) defined. This set can change over time. Each algorithm places additional requirements on the domain of == for the iterator values it uses. These requirements can be inferred from the uses that algorithm makes of == and !=. [ Example: the call find(a,b,x) is defined only if the value of a has the property p defined as follows: b has property p and a value i has property p if (*i==x) or if (*i!=x and ++i has property p). —end example ]

    Table 101 - Input iterator requirements
    OperationExpression Return Type Operational semantics Semantics,Assertion/note/ pre-/post-conditions
    X u(a); X post: u is a copy of a
    A destructor is assumed to be present and accessible.
    u = a; X& result: u
    post: u is a copy of a
    a == b convertible to bool == is an equivalence relation over its domain.
    a != b contextually convertible to bool !(a == b) pre: (a,b) is in the domain of ==.
    bool(a == b) != bool(a != b) over the domain of ==
    *a convertible to T pre: a is dereferenceable.
    If a == b and (a,b) is in the domain of == then *a is equivalent to *b.
    a->m type of X::m (*a).m pre: a is dereferenceable.
    pre: (*a).m is well-defined.
    Equivalent to (*a).m.
    ++r X& pre: r is dereferenceable.
    post: r is dereferenceable or r is past-the-end.
    post: any copies of the previous value of r are no longer required either to be dereferenceable or to be in the domain of ==.
    (void)r++ equivalent to (void)++r
    *r++ convertible to T { T tmp = *r; ++r; return tmp; } { T tmp = *r; ++r; return tmp; }

    3 [ Note: For input iterators, a == b does not imply ++a == ++b. (Equality does not guarantee the substitution property or referential transparency.) Algorithms on input iterators should never attempt to pass through the same iterator twice. They should be single pass algorithms. Value type T is not required to be an CopyAssignable type (230.2). These algorithms can be used with istreams as the source of the input data through the istream_iterator class template. —end note ]

    24.2.23 Output iterators [output.iterators]

    1 A class or a built-in type X satisfies the requirements of an output iterator if X is a CopyConstructible (34) and Assignablesatisfies the requirements of an Iterator type (23.2) and also the following expressions are valid, as shown in Table 102.

    Table 102 — Output iterator requirements
    Expression Return type Operational semantics Assertion/note/ pre-/post-condition
    X(a) a = t is equivalent to X(a) = t.
    note: a destructor is assumed.
    X u(a);
    X u = a;
    *r = o result is not used post: r is not required to be dereferenceable.
    post: r is incrementable.
    ++r X& &r == &++r.
    post: r is dereferenceable, unless otherwise specified.
    post: r is not required to be incrementable.
    r++ convertible to const X& { X tmp = r; ++r; return tmp; } post: r is dereferenceable, unless otherwise specified.
    post: r is not required to be incrementable.
    *r++ = o result is not used post: r is dereferenceable, unless otherwise specified.
    post: r is not required to be incrementable.

    2 [ Note: The only valid use of an operator* is on the left side of the assignment statement. Assignment through the same value of the iterator happens only once. Algorithms on output iterators should never attempt to pass through the same iterator twice. They should be single pass algorithms. Equality and inequality might not be defined. Algorithms that take output iterators can be used with ostreams as the destination for placing data through the ostream_iterator class as well as with insert iterators and insert pointers. —end note ]

    24.2.34 Forward iterators [forward.iterators]

    1 A class or a built-in type X satisfies the requirements of a forward iterator if the following expressions are valid, as shown in Table 103.:

  • X satisfies all the requirements of an input iterator
  • X satisfies all the requirements of a DefaultConstructible type ([utility.arg.requirements])
  • reference is an alias for T &(*new2).
  • The expressions shown in Table 103 are valid, and have the described semantics.
  • Objects of type X offer the multi-pass guarantee below.
  • When satisfying the listed requirements, two forward iterator objects a and b are considered equivalent if a == b. The domain of == for forward iterators is that of iterators into the same underlying sequence, or past-the-end iterators to that same sequence. In general, two past-the-end iterators derived from different sequences cannot be compared without further guarantees.

    Footnote new2) T may be const-qualified, for example if X is a constant iterator.

    Table 103 — Forward iterator requirements
    Expression Return type Operational semantics Assertion/note/ pre-/post-condition
    X u; note: u might have a singular value.
    note: a destructor is assumed.
    X() note: X() might be singular.
    X(a) a == X(a)
    X u(a);
    X u = a;
    X u; u = a; post: u == a
    a == b convertible to bool == is an equivalence relation.
    a != b convertible to bool !(a == b)
    r = a X& post: r == a
    *a T& if X is mutable,
    otherwise const T&
    reference
    pre: a is dereferenceable.
    a == b implies *a ==and *b refer to the same element.
    If X is mutable, *a = t is valid.
    a->m U& if X is mutable,
    otherwise const U&
    (*a).m pre: (*a).m is well-defined.
    ++r X& pre: r is dereferenceable.
    post: r is dereferenceable or r is past-the-end.
    r == s and r is dereferenceable implies ++r == ++s.
    &r == &++r.
    r++ convertible to const X& { X tmp = r; ++r; return tmp; }
    *r++ T& if X is mutable, otherwise const T&
    reference

    — If a and b are equal, then either a and b are both dereferenceable or else neither is dereferenceable.
    — If a and b are both dereferenceable, then a == b if and only if *a and *b are the same objectrefer to the same element in the underlying sequence.

    2 [ Note: The condition that a == b implies ++a == ++b (which is not true for input and output iterators) and the removal of the restrictions on the number of the assignments through the iterator (which applies to output iterators) allows the use of multi-pass one-directional algorithms with forward iterators. —end note ]

    Two derefenceable iterators a and b of type X deliver the mutli-pass guarantee if the following sequence of expression statements and assertions hold:

    
        assert(a == b);
        ++a;
        assert(a != b);
        ++b;
        assert(a == b);
    

    24.2.45 Bidirectional iterators [bidirectional.iterators]

    1 A class or a built-in type X satisfies the requirements of a bidirectional iterator if, in addition to satisfying the requirements for forward iterators, the following expressions are valid as shown in Table 104.

    Table 104 — Bidirectional iterator requirements (in addition to forward iterator)
    Expression Return type Operational semantics Assertion/note/ pre-/post-condition
    --r X& pre: there exists s such that r == ++s.
    post: r is dereferenceable.
    --(++r) == r.
    --r == --s implies r == s.
    &r == &--r.
    r-- convertible to const X& { X tmp = r; --r; return tmp; }
    *r-- convertible to T
    reference

    2 [ Note: Bidirectional iterators allow algorithms to move iterators backward as well as forward. —end note ]

    24.2.56 Random access iterators [random.access.iterators]

    1 A class or a built-in type X satisfies the requirements of a random access iterator if, in addition to satisfying the requirements for bidirectional iterators, the following expressions are valid as shown in Table 105.

    Table 105 — Random access iterator requirements (in addition to bidirectional iterator)
    Expression Return type Operational semantics Assertion/note/ pre-/post-condition
    r += n X&
    
    { Distancedifference_type m = n;
      if (m >= 0)
        while (m--)
          ++r;
        else while (m++)
          --r;
      return r; }
    
    a + n
    n + a
    X { X tmp = a; return tmp += n; } a + n == n + a.
    r -= n X& return r += -n;
    a - n X { X tmp = a; return tmp -= n; }
    b - a Distancedifference_type (a < b) ? distance(a,b) : -distance(b,a)
    return n;
    pre: there exists a value n of Distancedifference_type such that a + n == b.
    b == a + (b - a).
    a[n] convertible to const T &reference *(a + n)
    a < b contextually convertible to bool b - a > 0 < is a total ordering relation
    a > b contextually convertible to bool b < a > is a total ordering relation opposite to <.
    a >= b contextually convertible to bool !(a < b)
    a <= b contextually convertible to bool !(a > b)

    Synchronization

    In order to remain compatible with Ready issues currently going being handled by regular issues list processing, the following changes should also be made:

    Issue 940

    Change 24.4.4 [iterator.operations]/4+5 as indicated:

    
    template<class InputIterator>
      typename iterator_traits<InputIterator>::difference_type
        distance(InputIterator first, InputIterator last);
    

    4 Effects: If InputIterator meets the requirements of random access iterator then returns (last - first), otherwise rReturns the number of increments or decrements needed to get from first to last.

    5 Requires: If InputIterator meets the requirements of random access iterator then last shall be reachable from first or first shall be reachable from last, otherwise last shall be reachable from first.