ranges::rotate_copy
and ranges::reverse_copy
Document #: | P3709R1 [Latest] [Status] |
Date: | 2025-06-02 |
Project: | Programming Language C++ |
Audience: |
LEWG |
Reply-to: |
Ruslan Arutyunyan <ruslan.arutyunyan@intel.com> Alexey Kukanov <alexey.kukanov@intel.com> |
This paper proposes changing the return value type for parallel ranges::rotate_copy
and ranges::reverse_copy
in [P3179R9].
LEWG approved the “range-as-the-output” design aspect for our parallel range algorithms proposal [P3179R9]. It introduces the use of a range with its own bound as the output sequence, taking into account that the output size may be less than the input size. A common questions for such algorithms is which iterator to return for the input when the output size is not sufficient to hold the processed data?
For most cases the answer is pretty simple: we return the iterator
which points to the position right after the last processed element, or
one past the last. Consider std::ranges::copy
as an example:
::vector input{1,2,3,4,5};
std::vector<int> output(5); // output with sufficient size
std::vector<int> smaller_output(3); // output with insufficient size
std
auto res1 = std::ranges::copy(std::execution::par, input, output);
// after copy invocation res1.in == input.end() is true
auto res2 = std::ranges::copy(std::execution::par, input, smaller_output);
// after copy invocation res2.in == input.begin() + 3 is true
// res2.in points to the element equal to 4
It gives us consistent behavior with serial range algorithms when the output size is sufficient, and returns the stop point in the input when the output size is insufficient.
The proposed design in [P3179R9] follows the same logic
(returning the stop point for the input) for parallel ranges::rotate_copy
and ranges::reverse_copy
,
and both algorithms use the same return type as their serial range
counterparts. It’s consistent with the rest of the parallel range
algorithms that have an output. However, it leads to interesting
consequences:
reverse_copy
goes in the reverse
order, so the one past the last iterator for the input never equals to
last
, unless the output range is
empty.rotate_copy
goes over two
subranges within the input range, starting from
middle
. The one past the last
iterator for the input also never equals to
last
: it’s one of the iterators in
either [middle, last - 1]
or [first, middle]
.The proposed design in [P3179R9] allows users to know where the algorithm stops if the output size is not sufficient to hold all the elements from the input. The potential problem, however, comes from the combinations of three factors:
last
for both
reverse_copy
and
rotate_copy
, however parallel range
algorithms return a different iterator (as explained above) even if the
output size is sufficient.last
, if it
was calculated.To elaborate more on serial range algorithms: the current strategy
for the vast majority of them is returning as much calculated data as
possible. For example, ranges::for_each
returns the iterator equal to last
because that is what the algorithm calculated besides applying a
callable object element-wise. This is useful information because users
might not have such an iterator before the algorithm call; they might
only have a sentinel.
Furthermore, in the future we expect serial range algorithms with
“range-as-the-output” that support not only
random_access_iterator
. For such
algorithms it totally makes sense to return both the input stop
point and the input last
, if
the latter was calculated. For the vast majority of algorithms with the
output, the input last
either cannot
be calculated (when the output size is insufficient) or matches the stop
point. For reverse_copy
and
rotate_copy
this is not true. We
need to keep that in mind when designing these algorithms, because it
would be very unfortunate if serial “range-as-the-output” algorithms end
up returning something different from parallel ones. Thus, the design of
the parallel range algorithms needs to accept that they should return
more information for random_access_<iterator|range>
than seems necessary.
One of the design goals for [P3179R9] is to keep the same return
type for the proposed algorithms compared to existing range algorithms,
where possible. However, there is a clear indication that we need to do
something different for reverse_copy
and rotate_copy
.
rotate_copy
recommended designIn [P3179R9]
rotate_copy
returns the iterator
past the last copied element for the input. By design, it never returns
last
, and it returns
middle
for two scenarios:
The raised concern is that users might be surprised at runtime by a completely different return value after switching to parallel execution. Moreover, the actual reason for the different value is not an execution policy but the way how the output range is passed in, which might be viewed as a purely syntactical modification. Consider the following example:
iterator-as-the-output
|
range-as-the-output
|
---|---|
|
|
Please note, that in the table above we don’t use algorithm overloads
with execution policy. Instead, we show the future serial range
algorithms with “range-as-the-output”. That’s because we want to use a
non-common_range
,
non-random_access_range
, so that
calculating last
would make sense.
Parallel range algorithms require
random_access_range
, so the code
would fail to compile if written with execution policy.
While we could address the concern for
rotate_copy
if we jump to the end to
return last
when the output size is
sufficient, there would be two problems with that approach:
last
as the
iterator but still not return it, if the output size is insufficient.
See Introduction for more information.So, if we don’t want to surprise users with the different behavior of
parallel rotate_copy
at runtime, we
need to make the code fail to compile when the returned value for the
existing rotate_copy
is stored and
used. We think that users likely store the return value from range
algorithms as
auto
type,
so the names of the publicly accessible fields should also be different
for the result type of the “range-as-the-output”
rotate_copy
.
Since rotate_copy
“swaps” two
subranges - from middle
to
last
and from
first
to
middle
- we want to tell how far the
algorithm progressed for both of them, thus we propose to change the
result type to “alias-ed” ranges::in_in_out_result
.
Its first data member, in1
, contains
one past the last processed iterator for [middle, last)
,
which is either the stop point or equals to
last
if this subrange was fully
copied. Similarly, in2
contains one
past the last processed iterator for [first, middle)
,
which might equal to first
if the
processing has not reached this subrange, otherwise it is the stop point
up to and including middle
.
We could introduce some new structure because currently
in_in_out_result
is only used for
the algorithms with two input ranges. However, we don’t see a reason to
introduce yet another type since we think that
in_in_out_result
is perfectly
applicable for the discovered use case.
The updated signatures for parallel ranges::rotate_copy
are:
template <class In, class Out>
using rotate_copy_truncated_result = std::ranges::in_in_out_result<In, In, Out>;
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
<O> OutS>
random_access_iterator O, sized_sentinel_forrequires indirectly_copyable<I, O>
::rotate_copy_truncated_result<I, O>
ranges::rotate_copy(Ep&& exec, I first, I middle, S last, O result, OutS result_last);
ranges
template<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>
::rotate_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
ranges::rotate_copy(Ep&& exec, R&& r, iterator_t<R> middle, OutR&& result_r); ranges
In our opinion, the design above addresses all the concerns because:
in_out_result
the data member
for input is named in
, while in
in_in_out_result
the names for input
are in1
and
in2
, so we achieve the goal for the
code to fail to compile when one uses the output and switches to the
parallel overload, even if users store the result as
auto
type
(but see a caveat below).rotate_copy
returns
last
, calculated as an
iterator, and middle
as the stop
point, when the output size is sufficient,middle
to
last
and from
first
to
middle
) when the output size is less
than the input one.A possible implementation of the envisioned serial algorithm:
struct rotate_copy_fn
{
template<std::forward_iterator I, std::sentinel_for<I> S, std::forward_iterator O, std::sentinel_for<O> OutS>
requires std::indirectly_copyable<I, O>
constexpr std::ranges::rotate_copy_truncated_result<I, O>
operator()(I first, I middle, S last, O result, OutS result_last) const
{
while (middle != last && result != result_last)
{
*result = *middle;
++middle;
++result;
}
while (first != middle && result != result_last)
{
*result = *first;
++first;
++result;
}
return {std::move(middle), std::move(first), std::move(result)};
}
template<std::ranges::forward_range R, std::ranges::forward_range OutR>
requires std::indirectly_copyable<std::ranges::iterator_t<R>, std::ranges::iterator_t<OutR>>
constexpr std::ranges::rotate_copy_truncated_result<std::ranges::borrowed_iterator_t<R>, std::ranges::borrowed_iterator_t<OutR>>
operator()(R&& r, std::ranges::iterator_t<R> middle, OutR&& result_r) const
{
return (*this)(std::ranges::begin(r), std::move(middle),
::ranges::end(r), std::ranges::begin(result_r), std::ranges::end(result_r));
std}
};
inline constexpr rotate_copy_fn rotate_copy{};
Please note that we propose the following order for returned iterators:
in1
as a stop point in
[middle
,
last
].in2
as a stop point in
[first
,
middle
].Even though both belong to the same input range, those are just stop
points in two subranges, so we don’t want users to think of them as of a
valid std::ranges::subrange
.
Caveat: Since C++26 we can imagine a code that compiles for both serial range algorithms and the parallel ones even with the proposed design in this paper, because of structured binding with a pack ([P1061R10]).
Let’s compare:
Serial ranges::rotate_copy
|
Parallel ranges::rotate_copy
with the proposed design
|
---|---|
|
|
We can imagine such code if people want to only store the first or
the last data member of the returned object and ignore the rest. With
the code shown above it’s even more important that
in1
is a stop point in
[middle
,
last
] because, assuming that the
size of out
is sufficient for the
whole input, this code compiles and works at runtime as expected. The
same can be said if the structured binding is written as auto [...rest, out_last]
to get the iterator to the output.
reverse_copy
recommended designFor reverse_copy
the proposed
algorithm in [P3179R9] always returns the iterator to
the last copied element (in the reversed order) for the input. There is
no “dual meaning” for that returned iterator. In isolation it seems
perfectly fine, however two concerns come in mind:
last
), when
the output size is sufficient.ranges::reverse_copy
with “range-as-the-output” we would calculate
last
as the iterator but not return
that. See Introduction for more
information.Consider the following example:
iterator-as-the-output
reverse_copy
|
range-as-the-output
reverse_copy
|
---|---|
|
|
Essentially, the story is similar with
rotate_copy
. All the following
ruminations also apply for parallel
reverse_copy
:
in_in_out_result
std::ranges::subrange
The solution we see is also similar. We propose to return “alias-ed”
ranges::in_in_out_result
:
in1
is always the iterator equal
to last
.in2
represents the stop point,
as proposed in [P3179R9].The updated signatures for parallel ranges::reverse_copy
are:
template <class In, class Out>
using reverse_copy_truncated_result = std::ranges::in_in_out_result<In, In, Out>;
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
<O> OutS>
random_access_iterator O, sized_sentinel_forrequires indirectly_copyable<I, O>
::reverse_copy_truncated_result<I, O>
ranges::reverse_copy(Ep&& exec, I first, S last, O result, OutS result_last);
ranges
template<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>
::reverse_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
ranges::reverse_copy(Ep&& exec, R&& r, OutR&& result_r); ranges
A possible implementation of the envisioned serial algorithm:
struct reverse_copy_fn
{
template<std::bidirectional_iterator I, std::sentinel_for<I> S, std::forward_iterator O, std::sentinel_for<O> OutS>
requires std::indirectly_copyable<I, O>
constexpr std::ranges::reverse_copy_truncated_result<I, O>
operator()(I first, S last, O result, OutS result_last) const
{
auto last_iter = std::ranges::next(first, last);
auto pos = last_iter;
while (pos != first && result != result_last)
{
*result = *--pos;
++result;
}
return {last_iter, pos, result};
}
template<std::ranges::bidirectional_range R, std::ranges::forward_range OutR>
requires std::indirectly_copyable<std::ranges::iterator_t<R>, std::ranges::iterator_t<OutR>>
constexpr std::ranges::reverse_copy_truncated_result<std::ranges::borrowed_iterator_t<R>, std::ranges::borrowed_iterator_t<OutR>>
operator()(R&& r, OutR&& result_r) const
{
return (*this)(std::ranges::begin(r), std::ranges::end(r),
::ranges::begin(result_r), std::ranges::end(result_r));
std}
};
inline constexpr reverse_copy_fn reverse_copy{};
There are the following alternatives to the proposed design:
last
for future
serial range algorithms with “range-as-the-output”.reverse_copy
as proposed in
[P3179R9], return
last
for parallel
rotate_copy
when the output is
sufficient. The concerns are:
reverse_copy
the concerns
from the previous bullets apply.rotate_copy
becomes inconsistent
with other parallel range algorithms with “range-as-the-output” because
it can “jump” to last
instead of
returning the stop point.rotate_copy
calculates last
but does not return
it, so it likely means a different return type for serial
“range-as-the-output” rotate_copy
in
the future.We think these alternatives would be worse. If this proposal does not
have a consensus, authors’ preference is to remove parallel
reverse_copy
and
rotate_copy
from [P3179R9] and add them later, possibly
together with the serial “range-as-the-output” algorithms.
The diff shown below is against [P3179R9].
namespace ranges {
template<class I, class O>
using reverse_copy_result = in_out_result<I, O>;+ template<class I, class O>
+ using reverse_copy_truncated_result = in_in_out_result<I, I, O>;
template<bidirectional_iterator I, sentinel_for<I> S, weakly_incrementable O>
requires indirectly_copyable<I, O>
constexpr reverse_copy_result<I, O>
reverse_copy(I first, S last, O result);
template<bidirectional_range R, weakly_incrementable O>
requires indirectly_copyable<iterator_t<R>, O>
constexpr reverse_copy_result<borrowed_iterator_t<R>, O>
reverse_copy(R&& r, O result);
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
random_access_iterator O, sized_sentinel_for<O> OutS>
requires indirectly_copyable<I, O>- reverse_copy_result<I, O>
+ reverse_copy_truncated_result<I, O>
reverse_copy(Ep&& exec, I first, S last, O result, OutS result_last); // freestanding-deleted
template<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>- reverse_copy_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
+ reverse_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
reverse_copy(Ep&& exec, R&& r, OutR&& result_r); // freestanding-deleted
}
namespace ranges {
template<class I, class O>
using rotate_copy_result = in_out_result<I, O>;+ template<class I, class O>
+ using rotate_copy_truncated_result = in_in_out_result<I, I, O>;
template<forward_iterator I, sentinel_for<I> S, weakly_incrementable O>
requires indirectly_copyable<I, O>
constexpr rotate_copy_result<I, O>
rotate_copy(I first, I middle, S last, O result);
template<forward_range R, weakly_incrementable O>
requires indirectly_copyable<iterator_t<R>, O>
constexpr rotate_copy_result<borrowed_iterator_t<R>, O>
rotate_copy(R&& r, iterator_t<R> middle, O result);
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
random_access_iterator O, sized_sentinel_for<O> OutS>
requires indirectly_copyable<I, O>- ranges::rotate_copy_result<I, O>
+ ranges::rotate_copy_truncated_result<I, O>
ranges::rotate_copy(Ep&& exec, I first, I middle, S last, O result, OutS result_last); // freestanding-deleted
template<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>- ranges::rotate_copy_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
+ ranges::rotate_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
ranges::rotate_copy(Ep&& exec, R&& r, iterator_t<R> middle, OutR&& result_r); // freestanding-deleted }
template<bidirectional_iterator I, sentinel_for<I> S, weakly_incrementable O>
requires indirectly_copyable<I, O>
constexpr ranges::reverse_copy_result<I, O>
::reverse_copy(I first, S last, O result);
rangestemplate<bidirectional_range R, weakly_incrementable O>
requires indirectly_copyable<iterator_t<R>, O>
constexpr ranges::reverse_copy_result<borrowed_iterator_t<R>, O>
::reverse_copy(R&& r, O result); ranges
5
Let N
be
last - first
.
6
Preconditions: The ranges [first, last)
and [result, result + N)
do not overlap.
7
Effects: Copies the range [first, last)
to the range [result, result + N)
such that for every non-negative integer
i < N
the following assignment takes place: *(result + N - 1 - i) = *(first + i)
.
8 Returns:
result + N
for the overloads in namespace
std
.{last, result + N}
for the overloads in namespace
ranges
.9
Complexity: Exactly
N
assignments.
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
<O> OutS>
random_access_iterator O, sized_sentinel_forrequires indirectly_copyable<I, O>
::reverse_copy_truncated_result<I, O>
ranges::reverse_copy(Ep&& exec, I first, S last, O result, OutS result_last);
rangestemplate<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>
::reverse_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
ranges::reverse_copy(Ep&& exec, R&& r, OutR&& result_r); ranges
x
Let N
be
min(last - first
,
result_last - result
),
and let NEW_FIRST
be first + (last - first) - N
.
x
Preconditions: The ranges [first, last)
and [result, result + N)
do not overlap.
x
Effects: Copies the range [NEW_FIRST, last)
to the range [result, result + N)
such that for every non-negative integer
i < N
the following assignment takes place: *(result + N - 1 - i) = *(NEW_FIRST + i)
.
x
Returns: {
.last,
NEW_FIRST, result + N}
[ Note: While the return type for the
parallel and non-parallel algorithm overloads in the namespace
ranges
is the same
the semantics is different because the parallel range algorithm
overloads
result_last - result
can be insufficient to copy all data from the input. — end
note ]
x
Complexity: Exactly
N
assignments.
template<forward_iterator I, sentinel_for<I> S, weakly_incrementable O>
requires indirectly_copyable<I, O>
constexpr ranges::rotate_copy_result<I, O>
::rotate_copy(I first, I middle, S last, O result); ranges
6
Let N
be
last - first
.
7
Preconditions: [first, middle)
and [middle, last)
are valid ranges. The ranges [first, last)
and [result, result + N)
do not overlap.
8
Effects: Copies the range [first, last)
to the range [result, result + N)
such that for each non-negative integer
i < N
the following assignment takes place: *(result + i) = *(first + (i + (middle - first)) % N)
.
9 Returns:
result + N
for the overloads in namespace
std
.{last, result + N}
for the overload in namespace
ranges
.10
Complexity: Exactly
N
assignments.
template<execution-policy Ep, random_access_iterator I, sized_sentinel_for<I> S,
<O> OutS>
random_access_iterator O, sized_sentinel_forrequires indirectly_copyable<I, O>
::rotate_copy_truncated_result<I, O>
ranges::rotate_copy(Ep&& exec, I first, I middle, S last, O result, OutS result_last); ranges
x
Let M
be
last - first
and N
be
min(M
, result_last - result
).
x
Preconditions: [first, middle)
and [middle, last)
are valid ranges. The ranges [first, last)
and [result, result + N)
do not overlap.
x
Effects: Copies the range [first, last)
to the range [result, result + N)
such that for each non-negative integer
i < N
the following assignment takes place: *(result + i) = *(first + (i + (middle - first)) % M)
.
x
Returns:{first + (N + (middle - first)) % M, result + N}
.
[ Note: While the return type for the
parallel and non-parallel algorithm overloads in the namespace
ranges
is the same
the semantics is different because the parallel range algorithm
overloads
result_last - result
can be insufficient to copy all data from the input. — end
note ]
x
Complexity: Exactly
N
assignments.
template<forward_range R, weakly_incrementable O>
requires indirectly_copyable<iterator_t<R>, O>
constexpr ranges::rotate_copy_result<borrowed_iterator_t<R>, O>
::rotate_copy(R&& r, iterator_t<R> middle, O result); ranges
Effects: Equivalent to: return ranges::rotate_copy(ranges::begin(r), middle, ranges::end(r), std::move(result));
template<execution-policy Ep, sized-random-access-range R, sized-random-access-range OutR>
requires indirectly_copyable<iterator_t<R>, iterator_t<OutR>>
::rotate_copy_truncated_result<borrowed_iterator_t<R>, borrowed_iterator_t<OutR>>
ranges::rotate_copy(Ep&& exec, R&& r, iterator_t<R> middle, OutR&& result_r); ranges
Effects: Equivalent to: return ranges::rotate_copy(std::forward<Ep>(exec), ranges::begin(r), middle, ranges::end(r), ranges::begin(result_r), ranges::end(result_r));
reverse_copy
and
rotate_copy
return type.