P1664R2
reconstructible_range - a concept for putting ranges back together

Published Proposal,

Authors:
Audience:
LEWG
Project:
ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++
Target:
C++23
Latest:
https://thephd.github.io/vendor/future_cxx/papers/d1664.html

Abstract

This paper proposes new concepts to the Standard Library for ranges called Reconstructible Ranges for the purpose of ensuring a range/view broken down into its two iterators can be "glued" back together using an ADL-found function taking a tag, the range’s iterator, and the range’s sentinel.

1. Revision History

1.1. Revision 2 - January 13th, 2020

1.2. Revision 1 - November 24th, 2019

1.3. Revision 0 - August, 5th, 2019

2. Motivation

Currently With Proposal
vector vec{1, 2, 3, 4, 5};
span s{vec.data(), 5};
span v = s | views::drop(1) | views::take(10)
           | views::drop(1) | views::take(10);
❌ - Compilation error
vector vec{1, 2, 3, 4, 5};
span s{vec.data(), 5};
auto v = s | views::drop(1) | views::take(10)
           | views::drop(1) | views::take(10);
⚠️ - Compiles, but decltype(v) is ranges::take_view<ranges::drop_view<ranges::take_view<ranges::drop_view<span<int, dynamic_extent>>>>>
vector vec{1, 2, 3, 4, 5};
span s{vec.data(), 5};
span v = s | views::drop(1) | views::take(10)
           | views::drop(1) | views::take(10);
✔️ - Compiles and works with no extra template instantiations
vector vec{1, 2, 3, 4, 5};
span s{vec.data(), 5};
auto v = s | views::drop(1) | views::take(10)
           | views::drop(1) | views::take(10);
✔️ - Compiles and works with no extra templates. decltype(v) is std::span<int>
std::u8string name = "𐌀𐌖𐌋𐌄𐌑𐌉·𐌌𐌄𐌕𐌄𐌋𐌉𐌑 𐑡𐑹𐑡 ·𐑚𐑻𐑯𐑸𐑛 ·𐑖𐑷";
char16_t conversion_buffer[432];

std::u8string_view name_view(name);
std::span<char16_t> output(conversion_buffer, 432);

auto encoding_result = std::text::transcode(input, output);
// compiler error here !!
std::u8string_view unprocessed_code_units 
	= encoding_result.input;
std::span<char16_t> unconsumed_output 
	= encoding_result.output;
❌ - Compilation error
std::u8string name = "𐌀𐌖𐌋𐌄𐌑𐌉·𐌌𐌄𐌕𐌄𐌋𐌉𐌑 𐑡𐑹𐑡 ·𐑚𐑻𐑯𐑸𐑛 ·𐑖𐑷";
char16_t conversion_buffer[432];

std::u8string_view name_view(name);
std::span<char16_t> output(conversion_buffer, 432);

auto encoding_result = std::text::transcode(input, output);
auto unprocessed_code_units = encoding_result.input;
auto unconsumed_output = encoding_result.output;
⚠️ - Compiles, but decltype(unprocessed_code_units) is ranges::subrange<std::u8string_view::iterator, std::u8string_view::iterator> and decltype(unconsumed_output) is ranges::subrange<std::span<char16_t, std::dynamic_extent>::iterator, std::span<char16_t, std::dynamic_extent>::iterator>
std::u8string name = "𐌀𐌖𐌋𐌄𐌑𐌉·𐌌𐌄𐌕𐌄𐌋𐌉𐌑 𐑡𐑹𐑡 ·𐑚𐑻𐑯𐑸𐑛 ·𐑖𐑷";
char16_t conversion_buffer[432];

std::u8string_view name_view(name);
std::span<char16_t> output(conversion_buffer, 432);

auto encoding_result = std::text::transcode(input, output);
// compiler error here !!
std::u8string_view unprocessed_code_units 
	= encoding_result.input;
std::span<char16_t> unconsumed_output 
	= encoding_result.output;
✔️ - Compiles and works, types match input.
std::u8string name = "𐌀𐌖𐌋𐌄𐌑𐌉·𐌌𐌄𐌕𐌄𐌋𐌉𐌑 𐑡𐑹𐑡 ·𐑚𐑻𐑯𐑸𐑛 ·𐑖𐑷";
char16_t conversion_buffer[432];

std::u8string_view name_view(name);
std::span<char16_t> output(conversion_buffer, 432);

auto encoding_result = std::text::transcode(input, output);
auto unprocessed_code_units = encoding_result.input;
auto unconsumed_output = encoding_result.output;
✔️ - Compiles and works, where decltype(unprocessed_code_units) is std::u8string_view and decltype(unconsumed_output) is std::span<char16_t>.

Currently in C++, there is no Generic ("with a capital G") way to take a range apart with its iterators and put it back together. That is, the following code is not guaranteed to work:

template <typename Range>
auto operate_on_and_return_updated_range (Range&& range) {
	using uRange = std::remove_cvref_t<Range>;
	if (std::ranges::empty(range)) {
		// ... the below errors
		return uRange(std::forward<Range>(range));
	}
	/* perform some work with the
	iterators or similar */
	auto first = std::ranges::begin(range);
	auto last = std::ranges::end(range);
	if (*first == u'\0xEF') {
		// ...
		std::advance(first, 3);
		// ...
	}
	// ... algorithm finished,
	// return the "updated" range!

	// ... but the below errors
	return uRange(std::move(first), std::move(last));
}

int main () {
	std::string_view meow_view = "나는 유리를 먹을 수 있어요. 그래도 아프지 않아요";
	// this line will error
	std::string_view sub_view
		= operate_on_and_return_updated_range(meow_view);
	return 0;
}

The current fix is to employ std::ranges::subrange<I, S, K> to return a generic subrange:

template <typename Range>
auto operate_on_and_return_updated_range (Range&& range) {
	using uRange = std::remove_cvref_t<Range>;
	using I = std::Iterator<uRange>;
	using S = std::Sentinel<uRange>;
	using Result = std::ranges::subrange<I, S>;
	if (std::ranges::empty(range)) {
		return uRange(std::forward<Range>(range));
	}
	// perform some work with the
	// iterators or similar
	auto first = std::ranges::begin(range);
	auto last = std::ranges::end(range);
	if (*first == u'\0xEF') {
		// ...
		std::advance(first, 3);
		// ...
	}
	// ... algorithm finished,
	// return the "updated" range!

	// now it works!
	return Result(std::move(first), std::move(last));
}

int main () {
	std::string_view meow_view = "나는 유리를 먹을 수 있어요. 그래도 아프지 않아요";
	auto sub_view
		= operate_on_and_return_updated_range(meow_view);
	// decltype(sub_view) ==
	//   std::ranges::subrange<std::string_view::iterator,std::string_view::iterator>
	//   which is nowhere close to ideal.
	return 0;
}

This makes it work with any two pair of iterators, but quickly becomes undesirable from an interface point of view. If a user passes in a std::span<T, Extent> or a std::basic_string_view<Char, Traits> that interface and information is entirely lost to the user of the above function. std::ranges::subrange<Iterator, Sentinel, Kind> does not -- and cannot/should not -- mimic the interface of the view it was created from other than what information comes from its iterators: it is the barebones idea of a pair-of-iterators/iterator-sentinel style of range. This is useful in the generic sense that if a library developer must work with iterators, they can always rely on creation of a std::ranges::subrange of the iterator and sentinel.

Unfortunately, this decreases usability for end users. Users who have, for example a std::string_view, would prefer to have the same type after calling an algorithm. There is little reason why the original type needs to be discarded if it supports being put back together from its iterators. This break-it-apart-and-then-generic-subrange approach also discards any range-specific storage optimizations and layout considerations, leaving us with the most bland kind of range similar to the "pair of iterators" model. Compilation time goes up as well: users must spawn a fresh std::ranges::subrange<I, S, K> for every different set of iterator/sentinel/kind triplet, or handle deeply nested templates in templates as the input types. This makes it impossible to compile interfaces as dynamic libraries without having to explicitly materialize or manually cajole a std::ranges::subrange into something more palatable for the regular world.

There is also a problem where there are a wide variety of ranges that could conceivably meet this criterion, but do not. The author of this paper was not the only one to see utility in such operations. [p1739r0] does much the same that this paper does, without the introduction of a concept to formalize the behavior it presents. In particular, it selects views which can realistically have their return types changed to match the input range and operations being performed (or a similarly powerful alternative) by asking whether they can be called with a function called reconstruct from a subrange of the iterators with the expressions acted upon. This paper does not depend on any other papers, but note that the changes from [p1739r0], [p1391r2] and [p1394r2] all follow down to the logical conclusion laid out here:

3. Design

The design is given in 2 concepts added to the standard:

template <class R,
	class It = ranges::iterator_t<remove_reference_t<R>>,
	class Sen = ranges::sentinel_t<remove_reference_t<R>>>
concept pair_reconstructible_range =
	ranges::range<R> &&
	safe_range<remove_reference_t<R>> &&
	requires (It first, Sen last) {
		reconstruct(
			in_place_type<remove_cvref_t<R>>,
			std::move(first),
			std::move(last)
		);
	};

template <class R,
	class It = ranges::iterator_t<remove_reference_t<R>>,
	class Sen = ranges::sentinel_t<remove_reference_t<R>>>
concept reconstructible_range =
	ranges::range<R> &&
	safe_range<remove_reference_t<R>> &&
	requires (ranges::subrange<It, Sen> first_last) {
		reconstruct(
			in_place_type<remove_cvref_t<R>>,
			std::move(first_last).begin(),
			std::move(first_last).end()
		);
	};

It is the formalization that a range can be reconstructed from its begin iterator and end iterator/sentinel. It also provides a concept for allowing a range to be put back together from a subrange of its iterator/sentinel pair. This allows a developer to propagate the input type’s properties after modifying its iterators for some underlying work, algorithm or other effect. This concept is also the basis of the idea behind [p1739r0].

Both concepts require that the type with any references removed model the concept safe_range. This ensures that the validity of the iterators is in fact independent of the lifetime of the range they originate from and that a "reconstructed" range does not depend on the original. We remove reference before performing this check, because all reference types that model Range also model safe_range and the intent of the proposed changes is narrower: (re)construction is assumed to be in constant time (this typically implies that R also models View, but it is sufficient to check safe_range<remove_reference_t<R>>). Note that this explicitly excludes types like std::vector<int> const & from being reconstructible.

3.1. Should this apply to all Ranges?

Not all ranges can meet this requirement. Some ranges contain state which cannot be trivially propagated into the iterators, or state that cannot be reconstructed from the iterator/sentinel pair itself. However, most of the common ranges representing unbounded views, empty views, iterations viewing some section of non-owned storage, or similar can all be reconstructed from their iterator/iterator or iterator/sentinel pair.

For example std::ranges::single_view contains a exposition *semiregular-box* template type (ranges.semi.wrap) which holds a value to iterate over. It would not be possible to reconstruct the exact same range (e.g., iterators pointing to the exact same object) with the semi-regular wrapper.

Finally, there are ranges which could be reconstructible by just the definition of a constructor that takes iterators or a subrange. While this was a benefit in that the majority of conforming ranges were reconstructible, it made a situation where someone could write a strange range. For example, consider this alternative implementation of rng | views::drop(1):

class pop_front_view {
private:
	int *begin_, *end_;

public:
	pop_front_view() = default;

	pop_front_view(int* begin, int* end)
		: begin_(begin==end?begin:begin+1), end_(end)
	{}

	int* begin() const { return begin_; }

	int* end() const { return end_; }
};

Reconstructing this range using the constructors is a surefire way to have a developer scratching at their head, wondering what is going on. Therefore, rather than require reconstruction through the constructor, we rely instead on an Customization Point Object design instead.

3.2. Applicability

There are many ranges to which this is applicable, but only a handful in the standard library need or satisfy this. If [p1391r2] and [p1394r2] are accepted, then the two most important view types -- std::span<T, Extent> and std::basic_string_view<Char, Traits> -- will model both concepts. std::ranges::subrange<Iterator, Sentinel, Kind> already fits this as well. By formalizing concepts in the standard, we can dependably and reliably assert that these properties continue to hold for these ranges. The ranges to which this would be helpfully applicable to in the current standard and proposals space are:

There are also upcoming ranges from [range-v3] and elsewhere that could model this concept:

And there are further range adaptor closure objects that could make use of this concept:

Note that these changes will greatly aid other algorithm writers who want to preserve the same input ranges. In the future, the standard may provide an ranges::reconstruct(...) algorithm for these types.

3.3. Two Concepts

By giving these ranges Iterator, Sentinel, and/or std::ranges::subrange<Iterator, Sentinel> reconstruction functions, we can enable a greater degree of interface fidelity without having to resort to std::ranges::subrange for all generic algorithms. There should be a preference for reconstruct(std::inplace_type<R>, Iterator, Sentinel) reconstruction, because one-argument functions can have extremely overloaded meanings. It also produces less compiler boilerplate to achieve the same result of reconstructing the range when one does not have to go through std::ranges::subrange<I, S, K>. However, it is important to attempt to move away from the iterator, sentinel model being deployed all the time: std::ranges::subrange offers a single type that can accurately represent the intent and can be fairly easy to constrain overload sets on (most of the time).

This paper includes two concepts that cover both reconstructible methods.

4. Impact

Originally, the impact of this feature was perceived to be small and likely not necessary to work into C++20. Indeed: this paper originally targeted C++23 with the intent of slowly working through existing ranges and range implementations and putting the concept and the manifestation of concepts in range libraries, particularly range-v3, over time.

This changed in the face of [p1739r0]. Hauswedell’s paper here makes it clear there are usability and API wins that are solved by this concept for APIs that are already in the working draft today, and that not having the concept has resulted in interface inconsistency and ad-hoc, one-off fixes to fit limited problem domains without any respite to routines which have a desire to preserve the input types into their algorithms. Since this paper’s concept is likely to change interfaces API return values in a beneficial but ultimately breaking manner, this paper’s consideration was brought up to be presented as a late C++20 paper for the purpose of fixing the interface as soon as possible.

Note that this is a separate concept. It is not to be added to the base Range concept, or added to any other concept. It is to be applied separately to the types which can reasonably support it for the benefit of algorithms and code which can enhance the quality of their implementation.

Unfortunately, the paper was removed from consideration due to new information during the C++ November 2019 Belfast Standardization Meeting. It is to be re-litigated at the C++ February 2020 Prague Standardization Meeting, maybe.

5. Proposed Changes

The following wording is relative to the latest C++ Draft paper.

5.1. Feature Test Macro

This paper results in a concept to help guide the further development of standard ranges and simplify their usages in generic contexts. There is one proposed feature test macro, __cpp_lib_reconstructible_range, which is to be input into the standard and then explicitly updated every time a reconstruct from a is added to reflect the new wording. We hope that by putting this in the standard early, most incoming ranges will be checked for compatibility with pair_reconstructible_range and reconstructible_range.

5.2. Intent

The intent of this wording is to provide greater generic coding guarantees and optimizations by allowing for a class of ranges and views that model the new exposition-only definitions of a reconstructible range:

If safe_range is changed to the safe_range concept name, then this entire proposal will rename all its uses of safe_range.

For ease of reading, the necessary portions of other proposal’s wording is duplicated here, with the changes necessary for the application of reconstructible range concepts. Such sections are clearly marked.

5.3. Proposed Library Wording

Add a feature test macro __cpp_lib_reconstructible_range.

Insert into §24.2 Header <ranges> Synopsis [ranges.syn] a new customization point object in the inline namespace:

namespace std::ranges {
  inline namespace unspecified {
 
  ...
    inline constexpr unspecified reconstruct = unspecified;
  }
  ...
}

Insert into §24.4.2 Ranges [range.range]'s after paragraph 7, one additional paragraph:

8 The concepts pair_reconstructible_range and reconstructible_range concepts describe the requirements on ranges that are efficiently constructible from values of their iterator and sentinel types.
template <class R, 
	class It = ranges::iterator_t<remove_reference_t<R>>,
	class Sen = ranges::sentinel_t<remove_reference_t<R>>>
concept pair_reconstructible_range =
	ranges::range<R> &&
	safe_range<remove_reference_t<R>> &&
	requires (It first, Sen last) {
		reconstruct(
			in_place_type<remove_cvref_t<R>>,
			std::move(first),
			std::move(last)
		);
	};

template <class R, 
	class It = ranges::iterator_t<remove_reference_t<R>>,
	class Sen = ranges::sentinel_t<remove_reference_t<R>>>
concept reconstructible_range =
	ranges::range<R> &&
	safe_range<remove_reference_t<R>> &&
	requires (ranges::subrange<It, Sen> first_last) {
		reconstruct(
			in_place_type<remove_cvref_t<R>>,
			std::move(first_last).begin(),
			std::move(first_last).end()
		);
	};

9 Let r be a range with type R.

9.1 — Let re be the result of reconstruct(in_place_type<remove_cvref_t<R>>, ranges::begin(r), ranges::end(r)) if such an expression is well-formed. r models pair_reconstructible_range if
ranges::begin(r) == ranges::begin(re) is true, and
ranges::end(r) == ranges::end(re) is true.
9.2 — Let sub_re be the result of reconstruct(in_place_type<remove_cvref_t<R>>, ranges::subrange(ranges::begin(r), ranges::end(r))), if such an expression is well-formed. Then sub_re models reconstructible_range if
ranges::begin(r) == ranges::begin(sub_re) is true, and
ranges::end(r) == ranges::end(sub_re) is true.

Insert a new sub-clause "§24.3.13 ranges::reconstruct [range.prim.recons]", after "§24.3.12 ranges::cdata [range.prim.cdata]":

24.3.12     ranges::reconstruct [range.prim.recons]

1 The name reconstruct denotes a customization point object.

2 The expression ranges::reconstruct(in_place_type<R>, I, S) for some type R and some sub-expressions I and S is expression-equivalent to:

(2.1) reconstruct(in_place_type<R>, std::move(I), std::move(S)) if it is a valid expression and R, I, and S model pair_reconstructible_range.
(2.2) Otherwise, reconstruct(in_place_type<R>, ranges::subrange<I, S>(std::move(I), std::move(S))) if it is a valid expression and R, I, and S model reconstructible_range.
(2.3) ranges::subrange<remove_cvref_t<decltype(I)>, remove_cvref_t<decltype(S)>>(std::move(I), std::move(S)) if it is a valid expression.
(2.4) Otherwise, ranges::reconstruct(std::in_place_type<R>, I, S) is ill-formed.

6. Acknowledgements

Thanks to Corentin Jabot, Christopher DiBella, and Hannes Hauswedell for pointing me to p1035 and p1739 to review both papers and combine some of their ideas in here. Thanks to Eric Niebler for prompting me to think of the generic, scalable solution to this problem rather than working on one-off fixes for individuals views.

Thank you to Oktal, Anointed of ADL, Blessed Among Us and Morwenn, the ever-watching Code Guardian for suggesting improvements to the current concept form.

References

Informative References

[N4835]
Richard Smith. Working Draft, Standard for Programming Language C++. 8 October 2019. URL: https://wg21.link/n4835
[P0009R9]
H. Carter Edwards, Bryce Adelstein Lelbach, Daniel Sunderland, David Hollman, Christian Trott, Mauro Bianco, Ben Sander, Athanasios Iliopoulos, John Michopoulos, Mark Hoemmen. mdspan: A Non-Owning Multidimensional Array Reference. 20 January 2019. URL: https://wg21.link/p0009r9
[P1255R4]
Steve Downey. A view of 0 or 1 elements: view::maybe. 17 June 2019. URL: https://wg21.link/p1255r4
[P1391R2]
Corentin Jabot. Range constructor for std::string_view. 17 June 2019. URL: https://wg21.link/p1391r2
[P1394R2]
Corentin Jabot, Casey Carter. Range constructor for std::span. 17 June 2019. URL: https://wg21.link/p1394r2
[P1739R0]
Hannes Hauswedell. Type erasure for forwarding ranges in combination with "subrange-y" view adaptors. 15 June 2019. URL: https://wg21.link/p1739r0
[RANGE-V3]
Eric Niebler; Casey Carter. range-v3. June 11th, 2019. URL: https://github.com/ericniebler/range-v3