[Ranges] Ranges and Iterables

Sean Parent sean_parent at mac.com
Sat Mar 1 00:55:41 CET 2014


On Fri, Feb 28, 2014 at 2:49 PM, Eric Niebler <eric.niebler at gmail.com>wrote:

>
>
> If we assume the delimiting predicate takes a position instead of a
> value, then even the case where the limiter is a position can be handled
> as a delimiter with bind. So the delimiter is the most general. Only the
> counted algorithms are the odd man out here.
>

Yes, I think there are interesting models where this generalization is
applicable (for example, a linked list where the last node points back to
itself). The ASL Forest data structure has a number of similar constructs.

I still view using operator==() for this as a bit of a hack. It isn't
horrible though since one can defined operator==() in a non-intrusive
fashion. And there are cases where we can do better if we know the end
position is in fact a position.


>
> > If we allow a position limiter to be of a different type, then equality
> > of the two positions becomes isomorphic with a position and a delimiter.
> > That is we treat operator==(T, U) as a predicate where U is bound.
>
> Right.
>
> > Some other observations. I do not know of any algorithm that requires
> > bidirectional iterators that could work on a range other than a position
> > range.
>
> std::regex_search is one. There may be others.
>

I should have stated that more strongly - any algorithm that requires
bidirectional iterators will end up requiring a positional range. I haven't
looked at std::regex_search implementation but I'm certain that the
iterator that is being backed up is derived from an iterator that has moved
forward and is now bounded by a copy of that iterator.


> > So the interesting cases that Eric's proposal does not handle (at least
> > not yet) are the cases where a range consists of an input or forward
> > iterator and the limiter is a count. I do feel strongly that these are
> > important cases.
>
> Basically true. Iterables can accommodate (iterator,count) pairs, but
> not without a perf hit in some cases that Sean has spotted. It's
> something I'm actively thinking about.
>

Keep thinking!


>
> > I also object to the notion of infinite ranges - which are simply
> > unbounded positions. As for the problem of an N way zip, provide an
> > interface to zip that takes two tuples, one of iterators and one of
> > ranges. This would generally be a useful construct anyway as often times
> > when zipping items together it is known that the sizes are equal and so
> > there is no need for redundant bounds checking.
>
> I could live with that, although it feels like a more cumbersome
> interface. Infinite ranges allow for a simpler, more uniform API.
>

Abstraction isn't about hiding the complexity that is there, it is about
capturing the complexity that is there in a minimal form.


>
> You allow someone to build a range with an iterator and a delimiting
> predicate. If this predicate always returns false, isn't that an
> infinite range? And if someone passed such a range to zip, wouldn't it
> "work"? If if the predicate were a constant expression wouldn't it have
> exactly zero overhead? Are you going to tell people they can't do that,
> and if so, why?
>

There isn't anyway to enforce such a limit. But I wouldn't require that
someone package an iterator into a rage just to use it in an interface
either. Let me be honest and just use an unbounded iterator.


>
> > I would define a range along these lines.
> >
> > A type T denotes a range if:
> >
> > It has a begin() -> position_t<R>
> > It provides at least one of the following limiter operations:
> >    end() -> position_t <R>
> >    size() -> distance_type_t<R>
> >    delimiter() -> bool (const value_type_t<R>&)
> >    sentinel() -> value_type_t<R>
> >
> > All of the above operations must be constant time if provided.
> > If sentinel() is provided then delimiter() must also be provided and be
> > equivalent to { return x == sentinel(r); }. This should probably be
> > handled by having a default implementation of delimiter().
> > If the position is random access, then either both size() and end() are
> > provided or neither. Probably provided with a default implementation of
> > size().
> >
> > Although at first glance this seems like it would create a mess of
> > dispatch with algorithms, I don't believe that to be true, it doesn't
> > change the fundamentals of the underlying algorithms and so most
> > interfaces will simply be written in terms of a relatively small number
> > of basis algorithms.
>
> Even at second or third glance (for me), it seems like it would create a
> mess of dispatch. A range-based for_each, accumulate, or transform (to
> pick 3 at random) would need to do one of 4 different things depending
> on the type of range passed in (assuming delimiting predicate takes
> position instead of value):
>
> - PositionRange -> dispatch to (r.begin(),[&](i){return i==r.end();})
> - CountedRange -> dispatch to (r.begin(),n)
> - SentinelRange -> dispatch to (r.begin(),[&](i){return
>                                           *i==sentinel(r);})
> - DelimitedRange -> dispatch to (r.begin(),delimiter(r));
>


I don't think I'd even dispatch in these cases, I'd provide complete
implementations for each case. That really doesn't bother me at all, these
are foundational algorithms that we _should_ have _many_ different variants
of. But from an implementation standpoint you are free to collapse
internally as you have described. You could probably make this work:

template <typename R, // R denotes range
          typename F> // F models void(value_type_t<R>)
iterator_t<R> for_each(R& r, F f) { __for_each(begin(r), __limit_check(r),
f); }

Writing __limit_check() is an exercise left to the reader, but such a check
will only be of use for some algorithms. Others will be more efficient
coded more directly.

I want to point out the result of the above for_each(). Since we are not
dealing with an iterator range defined by a pair of iterators, we must
return the final position of the iterator. A for_each with the current
signature where the End iterator is of a different type, has an incorrect
signature. After calling it there is no way to continue from the new
position. That is, if I have an input stream and I want to progress to the
current end, then after there are more items in the stream I have no way to
continue from the current position.


> Although all the algorithms would funnel down to 1 or 2 underlying
> algorithms, we can't avoid all the dispatching boilerplate. (And of
> course some algorithms will need to further dispatch on the iterator
> category of r.begin().) It's a lot of dispatching code for every
> range-based algorithm that anybody ever writes.
>

Sure we can - we just provide a good set of transformations to make the
funneling down simple. I picked lower_bound in my example because using
distance() is a simple, existing, transformation. We need a handful of
others so you can go from however the range interface to however your
algorithm is defined in a single call.


>
> > As noted above, there are several transformations
> > available. i.e., lower_bound() would look like:
> >
> > template <typename R, // R denotes a range where position_t<R> models
> > ForwardIterator
> >           typename T> // T is value_type_t<R>
> > position<R> lower_bound(R& r, const T& x) {
> >     return lower_bound_n(begin(r), distance(r), x);
> > }
> >
> > The existing lower_bound() would be written as:
> >
> > template <typename I, // I models ForwardIterator
> >           typename T> // T is value_type_t<R>
> > I lower_bound(I f, I l, const T& x) {
> >     return lower_bound_n(f, distance(f, l), x);
> > }
> >
> > In my mind, this is a much simpler and complete approach.
>
> It's more complete in that I haven't handled counted algorithms yet. I
> only became aware of the problem 2 days ago, so gimme a minute. :-)
>

Tick, tick, tick...


>
> I disagree that it's simpler, though. It's certainly not simpler for
> people who are writing algorithms or reading the <algorithm> header. I
> don't think four separate range hierarchies are simpler than 1.
>

What four separate range hierarchies? I have a single (variant) definition
of a range, there are no hierarchies. There is no
"RandomAccessCountedRange".

>
> With Iterables, there is generally one range algorithm that dispatches
> to one iterator/sentinel algorithm. And we don't have to add that
> algorithm; it's the one that exists today (modulo the type of end). Even
> counted ranges can be handled by dispatching to that one for the cases
> where bundling the iterator and the count together incurs no extra
> overhead -- which I think is true for many/most (e.g. for_each,
> transform, accumulate). Those algorithms for which keeping the position
> and the count separate is a win will need to be handled specially, but
> (a) it's far less work than what you're suggesting, and (b) it's an
> optimization only; that is, code that works with Iterables but is
> oblivious to counted ranges will work with counted ranges and have the
> same algorithmic complexity.
>

That's a BS argument - InputIterators work as defined with the same
complexity our Iterables. I also disagree it is far less work, as noted
about your approach has serious implications to the interface to nearly all
the algorithms, I believe all will require at least one additional result
value. In either case the entire implementation of the STL library should
be revisited and recoded to the new concepts. I happen to think my approach
is far less work in this regard because the ideas presented already exist
in the current implementation.

Sean


> Eric
>
>
>
>
>
> > On Fri, Feb 28, 2014 at 10:03 AM, Eric Niebler <eric.niebler at gmail.com
> > <mailto:eric.niebler at gmail.com>> wrote:
> >
> >     On 02/28/2014 06:22 AM, Andrew Sutton wrote:
> >     >> I think the answer is that we have a ton of existing algorithms
> >     >> that work with "end" iterators, and we'd like to keep those.
> >     >
> >     > More generally, we have a ton of existing algorithms, and we'd like
> >     > to keep those :)
> >     >
> >     > I think that any proposal that tries to deprecate (whatever that
> >     > means) or replace the existing STL algorithms, modify their
> >     > semantics, or replace its design philosophy should be a
> non-starter.
> >
> >     Agreed. I'm not sure anybody believes that iterators can be built on
> top
> >     of ranges -- iterators are lower-level. Any attempt to build a range
> >     library where ranges are the primitives is throwing away every
> existing
> >     iterator-based algorithm.
> >
> >     > Eric's work shows that there are cases where the current begin/end
> >     > can be relaxed to make those algorithms more general without loss
> of
> >     > performance. I think we'd make progress by focusing on finding the
> >     > set of algorithms that the generalization benefits the most.
> >     >
> >     > I'm not sure that *every* STL algorithm can be reasonably weakened
> >     > from Iterators to Iterables. But then, I haven't done the
> >     > implementation work, so I can't say that with any great confidence.
> >
> >     Once we make it easy for people to create Iterables, they will. If
> the
> >     algorithms don't work with them, they'll complain, so we would need
> to
> >     make as many algorithms work with them as is possible. I've gone
> through
> >     all the C++03 algorithms in the STL and found the ones that can be
> made
> >     to work with the weaker concepts. The good news is that most of them
> >     work just fine with Iterables, and the change is trivial. I describe
> the
> >     results of my survey here:
> >
> >
> http://ericniebler.com/2014/02/21/introducing-iterables#iterables-and-the-stl-algorithms
> >
> >     Eric
> >     _______________________________________________
> >     Ranges mailing list
> >     Ranges at isocpp.open-std.org <mailto:Ranges at isocpp.open-std.org>
> >     http://www.open-std.org/mailman/listinfo/ranges
> >
> >
> >
> >
> > _______________________________________________
> > Ranges mailing list
> > Ranges at isocpp.open-std.org
> > http://www.open-std.org/mailman/listinfo/ranges
> >
> _______________________________________________
> Ranges mailing list
> Ranges at isocpp.open-std.org
> http://www.open-std.org/mailman/listinfo/ranges
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.open-std.org/pipermail/ranges/attachments/20140228/65dba633/attachment-0001.html 


More information about the Ranges mailing list