| Document #: | P3826R0 |
| Date: | 2025-10-05 |
| Project: | Programming Language C++ |
| Audience: | SG1 Concurrency and Parallelism Working Group, LEWG Library Evolution Working Group, LWG Library Working Group |
| Reply-to: | Eric Niebler <eric.niebler@gmail.com> |
In the current Working Draft, 33 [exec] has sender algorithms that are customizable. While the sender/receiver concepts and the algorithms themselves have been stable for several years now, the customization mechanism has seen a fair bit of recent churn. [P3718R0] is the latest effort to shore up the mechanism. Unfortunately, there are gaps in its proposed resolution. This paper details those gaps.
The problems are fixable although the fixes are non-trivial. The time for elaborate fixes has passed. This paper proposes to remove the ability to customize sender algorithms for C++26. A future paper will propose to add the feature back post-’26.
The author feels that postponing the feature will be less disruptive and safer than trying to patch it at the last minute. Most common usages of sender/receiver will not be affected.
[P3718R0] identifies real problems with the status quo of sender algorithm customization. It proposes using information from the sender about where it will complete during “early” customization, which happens when a sender algorithm constructs and returns a sender; and it proposes using information from the receiver about where the operation will start during “late” customization, when the sender and the receiver are connected.
The problem with this separation of responsibilities is that many
senders do not know where they will complete until they know where they
will be started. A simple example is the
just()
sender; it completes inline wherever it is started. And the information
about where a sender will start is not known during early customization,
when the sender is being asked for this information.
For the expression then(sndr, fn)
for example, if the then CPO asks
sndr where it will complete,
sndr might not be able to answer, in
which case no “early” customization is performed. And during “late”
(connect-time)
customization, only the receiver’s information about where the operation
will start is used to find a customization. Presumably an algorithm like
then(sndr, fn)
would want to dispatch based on where the function
fn will execute, but for some
expressions that information cannot be determined with the API proposed
in P3718.
An illustrative example is:
namespace ex = std::execution;

auto sndr = ex::starts_on(gpu, ex::just()) | ex::then(fn);
std::this_thread::sync_wait(std::move(sndr));
… where gpu is a scheduler that
runs work (unsurprisingly) on a GPU.
fn will execute on the GPU, so a
GPU implementation of then should be
used. By the proposed resolution of P3718, algorithm customization
proceeds as follows:
During early customization, when starts_on(gpu, just()) | then(fn)
is executing, the then CPO asks the
starts_on(gpu, just())
sender where it will complete as if by:
auto&& tmp1 = ex::starts_on(gpu, ex::just());
auto dom1 = ex::get_domain(ex::get_env(tmp1));
The starts_on sender will in
turn ask the
just()
sender, as if by:
auto&& tmp2 = ex::just();
auto dom2 = ex::get_domain(ex::get_env(tmp2));
As discussed, the
just()
sender doesn’t know where it will complete until it knows where it will
be started, but that information is not yet available. As a result,
dom2 ends up as
default_domain, which is then
reported as the domain for the
starts_on sender. That’s incorrect.
The starts_on sender will complete
on the GPU.
The then CPO uses
default_domain to find an
implementation of the then
algorithm, which will find the default implementation. As a result, the
then CPO returns an ordinary
then sender.
When that then sender is
connected to sync_wait’s receiver,
late customization happens.
connect asks
sync_wait’s receiver where the
then sender will be started. It does
that with get_domain(get_env(rcvr)).
sync_wait starts operations on the
current thread, so the get_domain
query will return default_domain. As
with early customization, late customization will also not find a GPU
implementation.
The end result of all of this is that a default (which is effectively
a CPU) implementation will be used to evaluate the
then algorithm on the GPU. That is a
bad state of affairs.
OK, so there is a problem. What do we do? There are a number of different options.
Remove the std::execution additions

Although this is the safest option, I hope most agree that such a drastic
step is not warranted by this issue. Pulling the
sender abstraction and everything
that depends on it would result in the removal of:
The sender/receiver-related concepts and customization points, without which the ecosystem will have no shared async abstraction, and which will set back the adoption of structured concurrency three years.
The sender algorithms, which capture common async patterns and make them reusable,
execution::counting_scope
and execution::simple_counting_scope,
and related features for incremental adoption of structured
concurrency,
execution::parallel_scheduler
and all of its related APIs, and
execution::task
and execution::task_scheduler
(C++26 will still not have a standard coroutine task type
<heavy sigh>).
This option should only be considered if all the other options are determined to have unacceptable risk.
This option would keep all of the above library components with the exception of the customizable sender algorithms:
then, upon_error, upon_stopped,
let_value, let_error, let_stopped,
bulk, bulk_chunked, bulk_unchunked,
starts_on, continues_on, on,
when_all, when_all_with_variant,
stopped_as_optional, stopped_as_error,
into_variant,
sync_wait, and
affine_on.

This would leave users with no easy standard way to start work on a given execution context, or transition to another execution context, or to execute work in parallel, or to wait for work to finish.
In fact, without the bulk
algorithms, we leave no way for the
parallel_scheduler to execute work
in parallel!
While still delivering a standard async abstraction with minimal risk, the loss of the algorithms would make it just an abstraction. Like coroutines, adoption of senders as an async lingua franca will be hampered by lack of standard library support.
This is the option this paper proposes. We ship everything currently in the Working Draft but remove the ability to customize the algorithms. This gives us a free hand to design a better customization mechanism for C++29 – provided we have high confidence that those new customization hooks can be added without breaking existing behavior.
A fair question is: how can we have such certainty when we do not know what the customization hooks are yet?
To answer that question for myself, I implemented new customization hooks here that address the known issues. Using that design (described in Appendix A: The planned fix) as a polestar, this paper proposes wording to remove customization in such a way that will let us add it back later without breakage.
My experience implementing the solution gives me confidence that we can introduce that solution or one like it later without compatibility problems.
This option is not as reckless as it sounds. I describe the shape of a possible fix in Appendix A: The planned fix. It would not be the first time the Committee shipped a standard with known defects, and the DR process exists for just this purpose.
What gives me pause, however, is the fact that I have “fixed” this problem before only to find that my fix is broken, and not just once!
I have implemented my planned fix, and it seems to work, but it has not seen any real-world usage. In short, my confidence is not high enough to endorse this solution.
Should someone with sufficient interest come and vet my solution, I might change my mind. Shipping it as-is is certainly the least amount of work for everyone involved.
Removing algorithm customization is fairly straightforward in most
regards, but there are a few parts of std::execution
that need special care.
The parallel_scheduler goes to
great lengths to ensure that the
bulk family of algorithms –
bulk,
bulk_chunked, and
bulk_unchunked – are executed in
parallel when the user requests it and when the underlying execution
context supports it.
To that end, the
parallel_scheduler “provides a
customized implementation” of the
bulk_chunked and
bulk_unchunked algorithms, but
nothing is said about how those custom implementations are found or
under what circumstances users can be assured that the
parallel_scheduler will use them.
Arguably, this is under-specified in the current Working Draft and
should be addressed whether this paper is accepted or not.
We have to give users a guarantee that if X, Y,
and Z conditions are met, bulk[_[un]chunked]
will be run in parallel with absolute certainty.
One solution is to say that the
bulk algorithms are guaranteed to
execute in parallel when the immediate predecessor of the
bulk operation is known to complete
on the parallel_scheduler. In a
sender expression such as the following:
sndr | std::execution::bulk(std::par, 1024, fn)
If sndr’s attributes advertise a
completion scheduler of type
parallel_scheduler, then we can
guarantee that the bulk operation
will execute in parallel. Implementations can choose to parallelize
bulk under other circumstances, but
we require this one.
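To make the shape of that guarantee concrete, here is a minimal sketch, not proposed wording, of how an implementation or a test might detect the guaranteed-parallel case; guaranteed_parallel and pred are illustrative names:

namespace ex = std::execution;

// True when the immediate predecessor advertises parallel_scheduler as its
// value completion scheduler, which is the case the guarantee above covers.
template <ex::sender Pred>
bool guaranteed_parallel(const Pred& pred) {
  auto attrs = ex::get_env(pred);
  if constexpr (requires { ex::get_completion_scheduler<ex::set_value_t>(attrs); }) {
    using Sch = decltype(ex::get_completion_scheduler<ex::set_value_t>(attrs));
    return std::same_as<std::remove_cvref_t<Sch>, ex::parallel_scheduler>;
  } else {
    return false;
  }
}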
The implication of offering this guarantee is that we must preserve the guarantee going forward. Any new customization mechanism we might add must never result in parallel execution becoming serialized.
The reverse is not necessarily true though. I maintain that a future
change that parallelizes a bulk
algorithm that formerly executed serially on the
parallel_scheduler is an acceptable
change of behavior.
If SG1 or LEWG disagrees, there are ways to avoid even this behavior change.
Library issue #4336 describes the
poor interaction between
task_scheduler, a type-erased
scheduler, and the bulk family of
algorithms; namely, that the
task_scheduler always executes
bulk in serial, even when it is
wrapping a parallel_scheduler.
This is not a problem caused by the customization mechanism, but it is something that can be addressed as part of the customization removal process.
When we address that issue, we must avoid the
parallel_scheduler pitfall of
under-specifying the interaction with
bulk. As with
parallel_scheduler, users must have
a guarantee about the conditions under which
bulk is accelerated on a
task_scheduler.
Fortunately, the
parallel_scheduler has already given
us a way to punch the bulk_chunked
and bulk_unchunked algorithms
through a type-erased API boundary:
parallel_scheduler_backend
(33.16.3
[exec.sysctxrepl.psb]).
By specifying the behavior of
task_scheduler in terms of
parallel_scheduler_backend and
bulk_item_receiver_proxy, we can
give task_scheduler the ability to
parallelize bulk without having to
invent a new mechanism.
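As a rough sketch of the idea: the template parameters below stand in for the shared_ptr<parallel_scheduler_backend> that task_scheduler holds and for bulk_item_receiver_proxy, the function name is illustrative, and the surrounding operation-state and storage management is assumed, not shown.

// Push a chunked bulk operation through the type-erased boundary. Whatever
// backend the task_scheduler wraps decides how the iterations run, so a
// wrapped parallel_scheduler can still run them in parallel.
template <class BackendPtr, class BulkItemReceiverProxy>
void start_bulk_chunked(const BackendPtr& backend, std::size_t shape,
                        BulkItemReceiverProxy& rcvr,
                        std::span<std::byte> storage) noexcept {
  backend->schedule_bulk_chunked(shape, rcvr, storage);
}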
Options for the bulk algorithms

Few users will ever have a need to customize an algorithm like
then or
let_value. The
bulk algorithms are a different
story. Anybody with a custom thread pool will benefit from a custom
bulk implementation that can run in
parallel on the thread pool. The loss of algorithm customization is
particularly painful in this area. This section explores some options to
address these concerns and makes a recommendation.
Remove bulk, bulk_chunked, and bulk_unchunked

This option cuts the Gordian knot, but comes at a high cost. The
parallel_scheduler can hardly be
called “parallel” if it does not offer a way to execute work in
parallel, so cutting the bulk
algorithms probably means cutting
parallel_scheduler also.
In this option, we keep the bulk
algorithms and the
parallel_scheduler, and we say that
the bulk algorithms are executed in
parallel on the parallel_scheduler
(and on a task_scheduler that wraps
a parallel_scheduler), but we leave
the mechanism unspecified.
This option is essentially the status quo, except that as
discussed in The parallel
scheduler, this aspect of the
parallel_scheduler is currently
under-specified. The referenced section proposes a path forward.
A variant of this option is to specify an exposition-only mechanism
whereby bulk gets parallelized.
This option makes
parallel_scheduler and
task_scheduler “magic” with respect
to the bulk algorithms. End users
would have no standard mechanism to parallelize
bulk on their own third-party thread
pools in C++26.
This is the approach taken by the Proposed wording below.
Customization for the bulk* algorithms only

In this option, we reintroduce algorithm customization with a
special-purpose API just for the
bulk algorithms. For example, a
scheduler might have an optional sch.bulk_transform(sndr, env)
that turns a serial
bulk* sender
into one that executes in parallel on scheduler
sch. Whenever a
bulk* sender
is passed to
connect,
connect can
check the sender’s predecessor for a completion scheduler that defines
bulk_transform and use it if
found.
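A sketch of what such a hook could look like on a user’s scheduler; the name bulk_transform, its signature, and my_pool_scheduler are illustrative only, not proposed wording:

namespace ex = std::execution;

struct my_pool_scheduler {
  using scheduler_concept = ex::scheduler_t;
  // ... schedule(), operator==, and the rest of the scheduler surface ...

  // Hypothetical hook: connect() would call this when connecting a bulk*
  // sender whose predecessor completes on this scheduler. A real
  // implementation would return a sender that partitions the iterations
  // across the pool; returning the sender unchanged keeps the sketch minimal.
  template <ex::sender Sndr, class Env>
  ex::sender auto bulk_transform(Sndr&& sndr, const Env&) const {
    return std::forward<Sndr>(sndr);
  }
};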
The downside of this approach is that we will still have to support this API even when a more general algorithm customization mechanism is available. That doesn’t seem terribly onerous to me, but that is for SG1/LEWG to decide.
Without algorithm customization, manufacturers of special-purpose hardware accelerators will not be able to ship a scheduler that both:
works with any standard-conforming implementation of std::execution,
and
performs optimally on their hardware for all of the standard algorithms.
See Mitigating factors for some reasons why this is not as terrible as it sounds.
The loss of direct support for sender algorithm customization is a
blow to power users of std::execution,
but there are a few factors that mitigate the blow.
All of the senders returned from the standard algorithms are self-describing and can be unpacked into their constituent parts with structured bindings. A sufficiently motivated user can “customize” an algorithm by writing a recursive sender tree transformation, explicitly transforming senders before launching them.
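For illustration, here is a minimal sketch of such a transformation. It assumes a user-written my_then algorithm (hypothetical) and relies only on tag_of_t and the (tag, data, child) structured-binding protocol of the standard senders:

namespace ex = std::execution;

// Rebuild a sender tree, replacing each standard then node with my_then
// (a user-supplied algorithm, assumed to exist). Nodes with any other tag
// are returned unchanged; a complete transformation would also recurse
// into their children.
template <ex::sender Sndr>
ex::sender auto replace_then(Sndr&& sndr) {
  if constexpr (requires {
                  requires std::same_as<ex::tag_of_t<std::remove_cvref_t<Sndr>>,
                                        ex::then_t>;
                }) {
    auto&& [tag, fn, child] = sndr;  // the (tag, data, child) parts
    return my_then(replace_then(std::forward_like<Sndr>(child)),
                   std::forward_like<Sndr>(fn));
  } else {
    return std::forward<Sndr>(sndr);
  }
}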
The sender concepts and customization points make it possible for
users to write their own sender algorithms that interoperate with the
standard ones. If a user wants to change the behavior of the
then algorithm in some way, they
have the option of writing their own and using it instead. I expect
libraries of third-party algorithms to appear on GitHub in time, as they
tend to.
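As a sketch of how light that can be, here is a then-like adaptor built purely by composing standard algorithms. The name my_then is illustrative (it could serve as the my_then assumed in the earlier sketch), and it assumes fn returns a value rather than void:

namespace ex = std::execution;

// A user-written stand-in for then: invoke fn on the predecessor's values and
// forward its result, expressed with let_value + just.
inline constexpr auto my_then = []<ex::sender Sndr, class Fn>(Sndr&& sndr, Fn fn) {
  return ex::let_value(std::forward<Sndr>(sndr),
                       [fn = std::move(fn)](auto&&... vs) mutable {
                         return ex::just(fn(std::forward<decltype(vs)>(vs)...));
                       });
};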
Some execution contexts place extra-standard requirements on the code
that executes on them. For example, NVIDIA GPUs require
device-accelerated code to be annotated with NVIDIA’s proprietary
__device__ annotation. Standard
libraries are unlikely to ship implementations of std::execution
with such annotations. The consequence is that, rather than shipping
just a GPU scheduler with some algorithm customizations, a vendor like
NVIDIA is already committed to shipping its own complete implementation
of std::execution (in
a different namespace, of course).
For such vendors, the inability to customize the standard algorithms is a moot point. Since they implement the standard algorithms themselves, their implementations can do whatever they want.
The approach to removing sender algorithm customization is twofold:
Remove those components that facilitate algorithm customization and their uses where it is easy to do so.
In all other cases, turn normative mechanisms into non-normative ones so we can change them later. This results in smaller and safer wording changes and preserves the already agreed-upon semantics in a way that is easy to verify.
The steps for removing algorithm customization are detailed below.
Remove the type
default_domain (33.9.5
[exec.domain.default]).
Remove the functions:

transform_sender (33.9.6 [exec.snd.transform]),
transform_env (33.9.7 [exec.snd.transform.env]), and
apply_sender (33.9.8 [exec.snd.apply]).

Remove the query object
get_domain (33.5.5
[exec.get.domain]).
Remove the exposition-only helpers:

completion-domain (33.9.2 [exec.snd.expos]/8-9),
get-domain-early (33.9.2 [exec.snd.expos]/13), and
get-domain-late (33.9.2 [exec.snd.expos]/14).

Change the functions
get_completion_signatures
(33.9.9
[exec.getcomplsigs])
and connect
(33.9.10
[exec.connect]) to
operate on a sender determined as follows instead of passing the sender
through transform_sender:
If the sender has a tag with an exposition-only transform-sender
member function, pass the sender to this function with the receiver’s
environment and continue the operation on the resulting sender. This
preserves the behavior of calling
transform_sender with the
default_domain.
Otherwise, perform the operation on the passed-in sender.
For the following algorithms that are currently expressed in
terms of a sender transformation to a lowered form, move the lowering
from alg.transform_sender(sndr, env)
to alg.transform-sender(sndr, env).
starts_on (33.9.12.5 [exec.starts.on]),
continues_on (33.9.12.6 [exec.continues.on]),
on (33.9.12.8 [exec.on]),
bulk (33.9.12.11 [exec.bulk]),
when_all_with_variant (33.9.12.12 [exec.when.all]),
stopped_as_optional (33.9.12.14 [exec.stopped.opt]), and
stopped_as_error (33.9.12.15 [exec.stopped.err]).

For each sender adaptor algorithm in 33.9.12
[exec.adapt] that is
specified to be expression-equivalent to some
transform_sender invocation of the
form:
transform_sender(some-computed-domain(),make-sender(tag, {args...}, sndr));
Change the expression to:
make-sender(tag, {args...}, sndr);
For example, in 33.9.12.6 [exec.continues.on]/3, the following:
transform_sender(get-domain-early(sndr),make-sender(continues_on, sch, sndr))
would be changed to:
make-sender(continues_on, sch, sndr)
Additionally, if there is some caveat of the form “except that
sndr is evaluated only once,” that
caveat should be removed as appropriate.
Merge the schedule_from
(33.9.12.7
[exec.schedule.from])
and continues_on (33.9.12.6
[exec.continues.on])
algorithms into one algorithm called
continues_on. (Currently they are
separate so that they can be customized independently; by default
continues_on merely dispatches to
schedule_from.)
Change 33.9.13.1
[exec.sync.wait]
and 33.9.13.2
[exec.sync.wait.var]
to dispatch directly to their default implementations instead of
computing a domain and using
apply_sender to dispatch to an
implementation.
Fix a bug in the on(sndr, sch, closure)
algorithm where a write_env is
incorrectly changing the “current” scheduler before its child
continues_on actually transfers to
that scheduler. continues_on needs
to know the scheduler on which it will be started in order to find
customizations correctly in the future.
Tweak the wording of
parallel_scheduler (33.15
[exec.par.scheduler])
to indicate that it
(parallel_scheduler) is permitted to
run the bulk family of algorithms in
parallel in accordance with those algorithms’ semantics, rather than
suggesting that those algorithms are “customized” for
parallel_scheduler. The mechanism
for doing so remains non-normative; however, we specify the conditions under
which the parallel_scheduler is
guaranteed to run the bulk
algorithms in parallel. (This is currently under-specified.)
Respecify task_scheduler in
terms of parallel_scheduler_backend
so that the bulk algorithms can be
accelerated despite task_scheduler’s
type-erasure. This addresses LWG#4336. As with
parallel_scheduler, we specify the
conditions under which
task_scheduler is guaranteed to run
the bulk algorithms in
parallel.
From the scheduler concept,
remove the required expression:
{ auto(get_completion_scheduler<set_value_t>(get_env(schedule(std::forward<Sch>(sch))))) } -> same_as<remove_cvref_t<Sch>>;
Instead, add a semantic requirement that if the above
expression is well-formed, then it shall compare equal to
sch. Additionally, require that that
expression is well-formed for the
parallel_scheduler, the
task_scheduler, and
run_loop’s scheduler, but not
inline_scheduler. See inline_scheduler for the motivation behind
these changes, but in short: the
inline_scheduler does not know where
it completes in C++26 but will in C++29.
Optional, but recommended: Change the env<>::query
member function to accept optional additional arguments after the query
tag. This restores the original design of
env to that which was first proposed
in [P3325R1] and which was approved by LEWG
straw poll in St Louis. As described in Restoring algorithm
customization in C++29, when asking a sender for its completion
scheduler, the caller needs to pass extra information about where the
operation will be started, and that will require env<>::query
to accept extra arguments.
This is admittedly a lot of changes, but the first 9 changes represent a simplification from the status quo, and the other changes are either neutral in terms of specification or else correct an existing Library issue.
In the final accounting, the result of these changes will be a vastly simpler specification for [exec].
For C++29, we want the sender algorithms in std::execution to
be customizable, with different implementations suited for different
execution contexts. If we remove customization for C++26, how do we add
it back without breaking code?
Recall that many senders do not know where they will complete until they know where they will be started, and that information is not currently provided when the sender is queried for its completion scheduler. This is the shoal on which algorithm customization has foundered, because without accurate information about where operations are executing, it is impossible to pick the right algorithm implementation.
Once the problem is stated plainly, the fix (or at least a major part of it) is obvious:
When asking the sender where it will complete, tell it where it will start.
The implication of this is that so-called “early” customization, performed when constructing a sender, will not be coming back. The receiver’s execution environment is not known when constructing a sender. C++29 will bring back “late” customization only.
A paper targeting C++29 will propose that we extend the
get_completion_scheduler query to
support an optional environment argument. Given a sender
S and receiver
R, the query would look like:
// Pass the sender's attributes and the receiver's environment when computing
// the completion scheduler:
auto sch = get_completion_scheduler<set_value_t>(get_env(S), get_env(R));
It will not be possible in C++26 to pass the receiver’s environment in this way, making this a conforming extension since it would not change the meaning of any existing code.
This change will also make it possible to provide a completion
scheduler for the error channel in more cases. That is often not
possible today since many errors are reported inline on the context on
which the operation is started. The receiver’s environment knows where
the operation will be started, so by passing it to the get_completion_scheduler<set_error_t>
query, the error completion scheduler is knowable.
Note The paragraph above makes it sound like this
would be changing the behavior for the get_completion_scheduler<set_error_t>(get_env(sndr))
query. But that expression will behave as it always has. Only when
called with the receiver’s environment will any new behavior manifest;
hence, this change is a pure extension.
By the way, this extension to
get_completion_scheduler motivates
the change to env<>::query
described above in The removal
process. Although we could decide to defer that change until it is
needed in C++29, it seems best to me to make the change now.
There are sender expressions that complete on an indeterminate
scheduler based on runtime factors;
when_all is a good example. This is
the problem the get_domain query
solved. So long as all of when_all’s
child senders share a common domain tag – a property of the scheduler –
we know the domain on which the
when_all operation will complete,
even though we do not know which scheduler it will complete on. The
domain controls algorithm selection, not the scheduler
directly.
So the plan will be to bring back a
get_domain query in C++29.
Additionally, just as it is necessary to have three
get_completion_scheduler queries,
one each for the three different completion channels, it is necessary to
have three get_completion_domain
queries for the times when the completion scheduler is indeterminate but
the domain is known.
Note Above we say, “So long as all of
when_all’s child senders share a
common domain tag […]”. This sounds like we are adding a new requirement
to the when_all algorithm. However,
this requirement will be met for all existing uses of
when_all. Before C++29, all senders
will be in the “default” domain, so they trivially all share a common
domain.
Giving a non-default domain to a scheduler is the way to opt-in to
algorithm customization. Prior to C++29, there will be no
get_*domain
queries, hence the addition of those queries in C++29 will not affect
any existing schedulers. And the domain queries will be so-called
“forwarding” queries, meaning they will automatically be passed through
layers of sender adaptors. Users will not have to change their code in
order for domain information to be propagated. As a result, this change
is a pure extension.
Customizing connect

Since C++29 will support only late
(connect-time)
customization, customizing an algorithm effectively amounts to
customizing that algorithm’s
connect
operation. By default, connect(sndr, rcvr)
calls sndr.connect(rcvr),
but in C++29 there will be a way to do something different depending on
the sender’s attributes and the receiver’s environment.
connect
will compute two domains, the “starting” domain and the (value)
“completion” domain:
| Domain kind | Query |
|---|---|
| Starting domain | get_domain(get_env(rcvr)) |
| Completion domain | get_completion_domain<set_value_t>(get_env(sndr), get_env(rcvr)) |
How
connect will
use this information to select an algorithm implementation is currently
under design. (See Appendix A: The
planned fix for more information.) But at that point, it is only a
matter of mechanism. The key point is that
connect has
the information it needs to dispatch accurately, and that we can make
that addition without breaking existing code.
Parallelizing bulk

Once we have a general mechanism for customizing algorithms, we can
consider changing parallel_scheduler
and task_scheduler to use that
mechanism to find parallel implementations of the
bulk algorithms. In C++26, it is
unspecified precisely how those schedulers accelerate
bulk, and we can certainly leave it
that way for C++29. No change is often the safest change and always the
easiest.
If we wanted to switch to using the new algorithm dispatch mechanics
in C++29, I believe we can do so with minimal impact on existing code.
Any behavior change would be an improvement, accelerating
bulk operations that should
have been accelerated but were not.
Consider the following sender:
starts_on(parallel_scheduler(), just() | bulk(fn))
In C++26, we can offer no iron-clad standard guarantee that this
bulk operation will be accelerated
even though it is executing on the parallel scheduler. The predecessor
of bulk,
just(), does
not know where it will complete in C++26. There is no plumbing yet to
tell it that it will be started on the parallel scheduler. As a result,
it is QoI whether this bulk will
execute in parallel or not.
But suppose we add a get_completion_domain<set_value_t>
query to the parallel_scheduler such
that the query returns an instance of a new type:
parallel_domain. Now, when
connecting the bulk sender,
connect will
ask for the predecessor’s domain, passing also the receiver’s
environment. Now the
just()
sender is able to say where it completes: the domain where it starts,
get_domain(get_env(rcvr)).
This will return parallel_domain{}.
connect
would then use that information to find a parallel implementation of
bulk.
As a result, in C++29 we could guarantee that this usage of
bulk will be parallelized. For some
stdlib implementations, this would be a behavior change: what once
executed serially on a thread of the parallel scheduler now executes in
parallel on many threads. Can that break working code? Yes, but only
code that had already violated the preconditions of
bulk: that
fn can safely be called in
parallel.
I do not believe this should be considered a breaking change, since any code that breaks is already broken.
All of the above is true also for
task_scheduler, which merely adds an
indirection to the call to
connect.
After the changes suggested by this paper, the
task_scheduler accelerates
bulk in the same way as
parallel_scheduler.
Note If we assign
parallel_domain to the
parallel_scheduler, and we
also add a requirement to
when_all that all of its child
operations share a common domain (see Domains), does that have the potential to break
existing code? It would not. We would make
parallel_domain inherit from
default_domain so that
when_all will compute the common
domain as default_domain even if one
child completes in the
parallel_domain.
inline_scheduler

The suggestion above to extend the get_completion_scheduler<*>
query presents an intriguing possibility for the
inline_scheduler: the ability for it
to report the scheduler on which its scheduling operations complete!
Consider the sender schedule(inline_scheduler{}).
Ask it where it completes today and it will say, “I complete on the
inline_scheduler.”, which isn’t
terribly useful. However, if you ask it, “Where will you complete – and
by the way you will be started on the
parallel_scheduler?”, now that
sender can report that it will complete on the
parallel_scheduler.
The result is that code that uses the
inline_scheduler will no longer
cause the actual scheduler to be hidden.
This realization is the motivation behind the change to strike the
get_completion_scheduler<set_value_t>(get_env(schedule(sch)))
requirement from the scheduler
concept. We want that expression to be ill-formed for the
inline_scheduler. Instead, we want
the following query to be well-formed (in C++29):
get_completion_scheduler<set_value_t>(get_env(schedule(inline_scheduler())), get_env(rcvr))
That expression should be equivalent to get_scheduler(get_env(rcvr)),
which says that the sender of
inline_scheduler completes wherever
it is started.
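As a sketch of how the attributes of schedule(inline_scheduler{}) could answer that query under the extended-query design, where the type and member below are illustrative rather than proposed wording:

namespace ex = std::execution;

// Without an environment there is no completion-scheduler query at all; given
// the receiver's environment, the answer is simply "wherever I am started".
struct inline_sender_attrs {
  template <class Env>
  auto query(ex::get_completion_scheduler_t<ex::set_value_t>,
             const Env& env) const noexcept {
    return ex::get_scheduler(env);
  }
};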
Note The reason we do not want
inline_scheduler to have a (largely
meaningless) completion scheduler in C++26 is because we want it to have
a meaningful one in C++29. And it would be strange if asking for the
completion scheduler gave different answers depending on whether or not
an environment was passed to the query.
This follows the general
principle that if you query a sender’s metadata early (sans environment)
and then later query it again with an environment, the answer should not
change. If the sender does not know the answer with certainty without an
environment, better for the expression to be ill-formed rather than
returning potentially inaccurate information.
[ Editor's note: In 33.4 [execution.syn], make the following changes: ]
… as before … namespace std::execution { // [exec.queries], queriesstruct get_scheduler_t {struct get_domain_t {unspecified};unspecified}; struct get_delegation_scheduler_t {unspecified}; struct get_forward_progress_guarantee_t {unspecified}; template<class CPO> struct get_completion_scheduler_t {unspecified}; struct get_await_completion_adaptor_t {unspecified};inline constexpr get_scheduler_t get_scheduler{}; inline constexpr get_delegation_scheduler_t get_delegation_scheduler{}; enum class forward_progress_guarantee; inline constexpr get_forward_progress_guarantee_t get_forward_progress_guarantee{}; template<class CPO> constexpr get_completion_scheduler_t<CPO> get_completion_scheduler{}; inline constexpr get_await_completion_adaptor_t get_await_completion_adaptor{}; … as before … // [exec.env], class template env template<queryable... Envs> struct env;inline constexpr get_domain_t get_domain{};// [exec.domain.default], execution domains// [exec.sched], schedulers struct scheduler_t {}; … as before … template<sender Sndr> using tag_of_t =struct default_domain;see below;// [exec.snd.transform], sender transformationstemplate<class Domain, sender Sndr, queryable... Env>requires (sizeof...(Env) <= 1)constexpr sender decltype(auto) transform_sender(Domain dom, Sndr&& sndr, const Env&... env) noexcept(see below);// [exec.snd.transform.env], environment transformationstemplate<class Domain, sender Sndr, queryable Env>constexpr queryable decltype(auto) transform_env(Domain dom, Sndr&& sndr, Env&& env) noexcept;// [exec.snd.apply], sender algorithm applicationtemplate<class Domain, class Tag, sender Sndr, class... Args>constexpr decltype(auto) apply_sender(// [exec.connect], the connect sender algorithm struct connect_t; inline constexpr connect_t connect{}; … as before …Domain dom, Tag, Sndr&& sndr, Args&&... args) noexcept(see below);
[ Editor's note: Remove subsection 33.5.5 [exec.get.domain]. ]
[ Editor's note: In 33.6 [exec.sched], change paragraphs 1 and 5 and strike paragraph 6 as follows: ]
The
schedulerconcept defines the requirements of a scheduler type (33.3 [exec.async.ops]).scheduleis a customization point object that accepts a scheduler. A valid invocation ofscheduleis a schedule-expression.namespace std::execution { template<class Sch> concept scheduler = derived_from<typename remove_cvref_t<Sch>::scheduler_concept, scheduler_t> && queryable<Sch> && requires(Sch&& sch) { { schedule(std::forward<Sch>(sch)) } -> sender;{ auto(get_completion_scheduler<set_value_t>(get_env(schedule(std::forward<Sch>(sch))))) }} && equality_comparable<remove_cvref_t<Sch>> && copyable<remove_cvref_t<Sch>>; }-> same_as<remove_cvref_t<Sch>>;… as before …
- For a given scheduler expression
sch, if the expressionauto(get_completion_scheduler<set_value_t>(get_env(schedule(sch))))is well-formed, it shall have typeremove_cvref_t<Sch>and shall compare equal tosch.
- For a given scheduler expression
sch, if the expressionget_domain(sch)is well-formed, then the expressionget_domain(get_env(schedule(sch)))is also well-formed and has the same type.
[ Editor's note: In 33.9.1 [exec.snd.general], change paragraph 1 as follows: ]
Subclauses 33.9.11 [exec.factories] and 33.9.12 [exec.adapt] define
customizablealgorithms that return senders.Each algorithm has a default implementation.Letsndrbe the result of an invocation of such an algorithm or an object equal to the result (18.2 [concepts.equality]), and letSndrbedecltype((sndr)). Letrcvrbe a receiver of typeRcvrwith associated environment env of typeEnvsuch thatsender_to<Sndr, Rcvr>istrue.For the default implementation of the algorithm that producedConnectingsndr, csndrtorcvrand starting the resulting operation state (33.3 [exec.async.ops]) necessarily results in the potential evaluation (6.3 [basic.def.odr]) of a set of completion operations whose first argument is a subexpression equal torcvr. LetSigsbe a pack of completion signatures corresponding to this set of completion operations, and letCSbe the type of the expressionget_completion_signatures<Sndr, Env>(). ThenCSis a specialization of the class templatecompletion_signatures(33.10 [exec.cmplsig]), the set of whose template arguments isSigs. If none of the types inSigsare dependent on the typeEnv, then the expressionget_completion_signatures<Sndr>()is well-formed and its type isCS.If a user-provided implementation of the algorithm that producedsndris selected instead of the default:
(1.1) Any completion signature that is in the set of types denoted bycompletion_signatures_of_t<Sndr, Env>and that is not part ofSigsshall correspond to error or stopped completion operations, unless otherwise specified.
(1.2) If none of the types inSigsare dependent on the typeEnv, thencompletion_signatures_of_t<Sndr>andcompletion_signatures_of_t<Sndr, Env>shall denote the same type.
[ Editor's note: Change 33.9.2 [exec.snd.expos] paragraph 6 as follows: ]
- For a scheduler
sch,isSCHED-ATTRS(sch)an expressionequivalent too1whose type satisfiesqueryablesuch thato1.query(get_completion_scheduler<Tag>)is an expression with the same type and value asschMAKE-ENV(get_completion_scheduler<set_value_t>, sch)where.Tagis one ofset_value_torset_stopped_t, and such thato1.query(get_domain)is expression-equivalent tosch.query(get_domain)isSCHED-ENV(sch)an expressionequivalent too2whose type satisfiesqueryablesuch thato2.query(get_scheduler)is a prvalue with the same type and value assch, and such thato2.query(get_domain)is expression-equivalent tosch.query(get_domain).MAKE-ENV(get_scheduler, sch)
[ Editor's note: Remove
the prototype of the exposition-only
completion-domain
function just before 33.9.2
[exec.snd.expos]
paragraph 8, and with it remove paragraphs 8 and 9, which specify the
function’s behavior. ]
[ Editor's note: Remove
33.9.2
[exec.snd.expos]
paragraphs 13 and 14 and the prototypes for the
get-domain-early and
get-domain-late
functions. ]
[ Editor's note: Remove subsection 33.9.5 [exec.domain.default]. ]
[ Editor's note: Remove subsection 33.9.6 [exec.snd.transform]. ]
[ Editor's note: Remove subsection 33.9.7 [exec.snd.transform.env]. ]
[ Editor's note: Remove subsection 33.9.8 [exec.snd.apply]. ]
[ Editor's note: Change 33.9.9 [exec.getcomplsigs] as follows: ]
Let
exceptbe an rvalue subexpression of an unspecified class typeExceptsuch thatmove_constructible<isExcept> && derived_from<Except, exception>true. LetbeCHECKED-COMPLSIGS(e)eifeis a core constant expression whose type satisfiesvalid-completion-signatures; otherwise, it is the following expression:(e, throwexcept, completion_signatures())Let
be expression-equivalent toget-complsigs<Sndr, Env...>()remove_reference_t<Sndr>::template get_completion_signatures<Sndr, Env...>().LetLetNewSndrbeSndrifsizeof...(Env) == 0istrue; otherwise,decltype(wheres)sis the following expression:NewSndrbedecltype(tag_of_t<Sndr>().if that expression is well-formed, andtransform-sender(declval<Sndr>(), declval<Env>()...))Sndrotherwise.transform_sender(get-domain-late(declval<Sndr>(), declval<Env>()...),declval<Sndr>(),declval<Env>()...)Constraints:
sizeof...(Env) <= 1istrue.Effects: Equivalent to: … as before …
[ Editor's note: Change 33.9.10 [exec.connect] as follows: ]
connectconnects (33.3 [exec.async.ops]) a sender with a receiver.The name
connectdenotes a customization point object. For subexpressionssndrandrcvr, letSndrbedecltype((sndr))andRcvrbedecltype((rcvr)),; letnew_sndrbe the expressiontransform_sender(decltype(get-domain-late(sndr, get_env(rcvr))){}, sndr, get_env(rcvr))tag_of_t<Sndr>().if that expression is well-formed, andtransform-sender(sndr, get_env(rcvr))sndrotherwise; and letDSandDRbedecay_t<decltype((new_sndr))>anddecay_t<Rcvr>, respectively.Let
connect-awaitable-promisebe … as before …
[ Editor's note: Change 33.9.11.1 [exec.schedule] paragraph 4 as follows: ]
If the expression
get_completion_scheduler<set_value_t>(get_env(sch.schedule()))== sch
is ill-formed
or well-formed and does not
evaluates to
falsesch,
the behavior of calling schedule(sch)
is undefined.
[ Editor's note: From 33.9.12.1 [exec.adapt.general], strike paragraph (3.6) as follows: ]
Unless otherwise specified:
… as before …
(3.5) An adaptor whose child senders are all non-dependent (33.3 [exec.async.ops]) is itself non-dependent.
(3.6)
These requirements apply to any function that is selected by the implementation of the sender adaptor.(3.7) Recommended practice: Implementations should use the completion signatures of the adaptors to communicate type errors to users and to propagate any such type errors from child senders.
[ Editor's note: Change 33.9.12.5 [exec.starts.on] paragraph 3 as follows: ]
Otherwise, the expression
starts_on(sch, sndr)is expression-equivalent to:.make-sender(starts_on, sch, sndr)transform_sender(query-with-default(get_domain, sch, default_domain()),make-sender(starts_on, sch, sndr))
except thatschis evaluated only once.Let
out_sndrandenvbe subexpressions such thatOutSndrisdecltype((out_sndr)). Ifissender-for<OutSndr, starts_on_t>false, then theexpressionsexpressionstarts_on.transform_env(out_sndr, env)andstarts_on.transform_sendertransform-sender(out_sndr, env)areis ill-formed; otherwise it is equivalent to:auto&& [_, sch, sndr] = out_sndr; return let_value( schedule(sch), [sndr = std::forward_like<OutSndr>(sndr)]() mutable noexcept(is_nothrow_move_constructible_v<decay_t<OutSndr>>) { return std::move(sndr); });
- Let
out_sndrbe … as before …
[ Editor's note: Remove subsection 33.9.12.6 [exec.continues.on] ]
[ Editor's note: Change 33.9.12.7 [exec.schedule.from] to [exec.continues.on] and change it as follows: ]
33.9.12.
76execution::[execschedule_fromcontinues_on.schedule.from.continues.on]
schedule_fromcontinues_onschedules work dependent on the completion of a sender onto a scheduler’s associated execution resource.
[Note 1:schedule_fromis not meant to be used in user code; it is used in the implementation ofcontinues_on. — end note]The name
schedule_fromcontinues_ondenotes a customization point object. For some subexpressionsschandsndr, letSchbedecltype((sch))andSndrbedecltype((sndr)). IfSchdoes not satisfy scheduler, orSndrdoes not satisfysender,schedule_from(sch, sndr)continues_on(sndr, sch)is ill-formed.Otherwise, the expression
schedule_from(sch, sndr)continues_on(sndr, sch)is expression-equivalent to:make-sender(continues_on, sch, sndr)transform_sender(query-with-default(get_domain, sch, default_domain()),make-sender(schedule_from, sch, sndr))except that sch is evaluated only once.
The exposition-only class template
impls-for(33.9.1 [exec.snd.general]) is specialized forschedule_from_tcontinues_on_tas follows:namespace std::execution { template<> structimpls-for<schedule_from_tcontinues_on_t> :default-impls{ static constexpr autoget-attrs=see below; static constexpr autoget-state=see below; static constexpr autocomplete=see below; template<class Sndr, class... Env> static consteval voidcheck-types(); }; }The member
is initialized with a callable object equivalent to the following lambda:impls-for<schedule_from_tcontinues_on_t>::get-attrs[](const auto& data, const auto& child) noexcept -> decltype(auto) { returnJOIN-ENV(SCHED-ATTRS(data),FWD-ENV(get_env(child))); }The member
is initialized with a callable object equivalent to the following lambda:impls-for<schedule_from_tcontinues_on_t>::get-state… as before …
template<class Sndr, class... Env> static consteval voidcheck-types();… as before …
The member
is initialized with a callable object equivalent to the following lambda:impls-for<schedule_from_tcontinues_on_t>::complete… as before …
Let
out_sndrbe a subexpression denoting a sender returned fromschedule_from(sch, sndr)continues_on(sndr, sch)or one equal to such, and letOutSndrbe the typedecltype((out_sndr)). Letout_rcvrbe … as before …
[ Editor's note: Change 33.9.12.8 [exec.on] paragraphs 3-8 as follows: ]
Otherwise, if
decltype((sndr))satisfiessender, the expressionon(sch, sndr)is expression-equivalent to:.make-sender(on, sch, sndr)transform_sender(query-with-default(get_domain, sch, default_domain()),make-sender(on, sch, sndr))except that
schis evaluated only once.For subexpressions
sndr,sch, andclosure, if
(4.1)
decltype((sch))does not satisfyscheduler, or(4.2)
decltype((sndr))does not satisfysender, or(4.3)
closureis not a pipeable sender adaptor closure object ([exec.adapt.obj]), the expressionon(sndr, sch, closure)is ill-formed; otherwise, it is expression-equivalent to:.make-sender(on,product-type{sch, closure}, sndr)transform_sender(get-domain-early(sndr),make-sender(on,product-type{sch, closure}, sndr))except that
sndris evaluated only once.Let
out_sndrandenvbe subexpressions, letOutSndrbedecltype((out_sndr)), and letEnvbedecltype((env)). Ifissender-for<OutSndr, on_t>false, then theexpressionsexpressionon.transform_env(out_sndr, env)andon.transform_sendertransform-sender(out_sndr, env)areis ill-formed.Otherwise: Let
not-a-schedulerbe an unspecified empty class type.
The expression
on.transform_env(out_sndr, env)has effects equivalent to:auto&& [_, data, _] = out_sndr; if constexpr (scheduler<decltype(data)>) { returnJOIN-ENV(SCHED-ENV(std::forward_like<OutSndr>(data)),FWD-ENV(std::forward<Env>(env))); } else { return std::forward<Env>(env); }
The expression
on.has effects equivalent to:transform_sendertransform-sender(out_sndr, env)auto&& [_, data, child] = out_sndr; if constexpr (scheduler<decltype(data)>) { auto orig_sch =query-with-default(get_scheduler, env,not-a-scheduler()); if constexpr (same_as<decltype(orig_sch),not-a-scheduler>) { returnnot-a-sender{}; } else { return continues_on( starts_on(std::forward_like<OutSndr>(data), std::forward_like<OutSndr>(child)), std::move(orig_sch)); } } else { auto& [sch, closure] = data; auto orig_sch =query-with-default( get_completion_scheduler<set_value_t>, get_env(child),query-with-default(get_scheduler, env,not-a-scheduler())); if constexpr (same_as<decltype(orig_sch),not-a-scheduler>) { returnnot-a-sender{}; } else { returnwrite_envcontinues_on(continues_onwrite_env( std::forward_like<OutSndr>(closure)( continues_on( write_env(std::forward_like<OutSndr>(child),SCHED-ENV(orig_sch)), sch)),orig_sch),SCHED-ENV(sch)SCHED-ENV(sch)orig_sch); } }
[ Editor's note: Change 33.9.12.9 [exec.then] paragraph 3 as follows: ]
Otherwise, the expression
is expression-equivalent tothen-cpo(sndr, f):.make-sender(then-cpo, f, sndr)transform_sender(get-domain-early(sndr),make-sender(then-cpo, f, sndr))except that
sndris evaluated only once.
[ Editor's note: Change 33.9.12.10 [exec.let] paragraphs 2-4 as follows: ]
For
let_value,let_error, andlet_stopped, letset-cpobeset_value,set_error, andset_stopped, respectively. Let the expressionlet-cpobe one oflet_value,let_error, orlet_stopped. For a subexpressionsndr, letbe expression-equivalent to the first well-formed expression below:let-env(sndr)
- (2.1)
SCHED-ENV(get_completion_scheduler<decayed-typeof<set-cpo>>(get_env(sndr)))
- (2.2)
MAKE-ENV(get_domain, get_domain(get_env(sndr)))
- (2.3)
(void(sndr), env<>{})The names
let_value,let_error, andlet_stoppeddenote … as before …Otherwise, the expression
is expression-equivalent tolet-cpo(sndr, f):.make-sender(let-cpo, f, sndr)transform_sender(get-domain-early(sndr),make-sender(let-cpo, f, sndr))except that
sndris evaluated only once.
[ Editor's note: Change 33.9.12.11 [exec.bulk] paragraphs 3 and 4 and insert paragraphs 5 and 6 as follows: ]
Otherwise, the expression
is expression-equivalent to:bulk-algo(sndr, policy, shape, f)transform_sender(get-domain-early(sndr),make-sender(bulk-algo,product-type<see below, Shape, Func>{policy, shape, f}, sndr))
except thatThe first template argument ofsndris evaluated only once.product-typeisPolicyifPolicymodelscopy_constructible, andconst Policy&otherwise.Let
sndrandbe an expression such thatenvbe subexpressionsSndrisdecltype((sndr)). Ifissender-for<Sndr, bulk_t>false, then the expressionbulk.transform_sender(sndr, env)is ill-formed; otherwise, it is equivalent to:as-bulk-chunked(sndr)auto [_, data, child] = sndr; auto& [policy, shape, f] = data; auto new_f = [func = std::move(f)](Shape begin, Shape end, auto&&... vs) noexcept(noexcept(f(begin, vs...))) { while (begin != end) func(begin++, vs...); } return bulk_chunked(std::move(child), policy, shape, std::move(new_f));
[ Note: This causes thebulk(sndr, policy, shape, f)sender to be expressed in terms ofbulk_chunked(sndr, policy, shape, f)when it is connected to a receiverwhose execution domain does not customize. — end note ]bulk
Let
sndrandenvbe subexpressions, letSndrbedecltype((sndr)), and letschbe expression-equivalent toget_completion_scheduler<set_value_t>(get_env(sndr.. Ifget<2>()))issender-for<Sndr,decayed-typeof<bulk-algo>>false, the expressionis ill-formed; otherwise, it is expression-equivalent to:bulk-algo.transform-sender(sndr, env)
[ Editor's note: Change 33.9.12.12 [exec.when.all] as follows: ]
when_allandwhen_all_with_variantboth … as before …The names
when_allandwhen_all_with_variantdenote customization point objects. Letsndrsbe a pack of subexpressions,and letSndrsbe a pack of the typesdecltype((sndrs))..., and let. The expressionsCDbe the typecommon_type_t<decltype(. Letget-domain-early(sndrs))...>CD2beCDifCDis well-formed, anddefault_domainotherwisewhen_all(sndrs...)andwhen_all_with_variant(sndrs...)are ill-formed if any of the following istrue:The expression
when_all(sndrs...)is expression-equivalent to:.make-sender(when_all, {}, sndrs...)transform_sender(CD2(),make-sender(when_all, {}, sndrs...))The exposition-only class template
impls-for(33.9.1 [exec.snd.general]) is specialized forwhen_all_tas follows:namespace std::execution { template<> structimpls-for<when_all_t> :default-impls{static constexpr autostatic constexpr autoget-attrs=see below;get-env=see below; static constexpr autoget-state=see below; static constexpr autostart=see below; static constexpr autocomplete=see below; template<class Sndr, class... Env> static consteval voidcheck-types(); }; }… as before …
- Throws: Any exception thrown as a result of evaluating the Effects
, or an exception of an unspecified type derived from.exceptionwhenCDis ill-formed
The member
is initialized with a callable object equivalent to the following lambda expression:impls-for<when_all_t>::get-attrs[](auto&&, auto&&... child) noexcept { if constexpr (same_as<CD, default_domain>) { return env<>(); } else { returnMAKE-ENV(get_domain, CD()); } }… as before …
The expression
when_all_with_variant(sndrs...)is expression-equivalent to:.make-sender(when_all_with_variant, {}, sndrs...)transform_sender(CD2(),make-sender(when_all_with_variant, {}, sndrs...));Given subexpressions
sndrandenv, ifissender-for<decltype((sndr)), when_all_with_variant_t>false, then the expressionwhen_all_with_variant.is ill-formed; otherwise, it is equivalent to:transform_sendertransform-sender(sndr, env)auto&& [_, _, ...child] = sndr; return when_all(into_variant(std::forward_like<decltype((sndr))>(child))...);[Note 1: This causes the
when_all_with_variant(sndrs...)sender to becomewhen_all(into_variant(sndrs)...)when it is connected with a receiverwhose execution domain does not customize. — end note]when_all_with_variant
[ Editor's note: Change 33.9.12.13 [exec.into.variant] paragraph 3 as follows: ]
Otherwise, the expression
into_variant(sndr)is expression-equivalent to:.make-sender(into_variant, {}, sndr)transform_sender(get-domain-early(sndr),make-sender(into_variant, {}, sndr))except that
sndris only evaluated once.
[ Editor's note: Change 33.9.12.14 [exec.stopped.opt] paragraphs 2 and 4 as follows: ]
The name
stopped_as_optionaldenotes a pipeable sender adaptor object. For a subexpressionsndr, letSndrbedecltype((sndr)). The expressionstopped_as_optional(sndr)is expression-equivalent to:.make-sender(stopped_as_optional, {}, sndr)transform_sender(get-domain-early(sndr),make-sender(stopped_as_optional, {}, sndr))except that
sndris only evaluated once.The exposition-only class template
impls-for… as before …Let
sndrandenvbe subexpressions such thatSndrisdecltype((sndr))andEnvisdecltype((env)). Ifissender-for<Sndr, stopped_as_optional_t>falsethen the expressionstopped_as_optional.is ill-formed; otherwise, iftransform_sendertransform-sender(sndr, env)sender_in<ischild-type<Sndr>,FWD-ENV-T(Env)>false, the expressionstopped_as_optional.is equivalent totransform_sendertransform-sender(sndr, env); otherwise, it is equivalent to:not-a-sender()auto&& [_, _, child] = sndr; using V =single-sender-value-type<child-type<Sndr>,FWD-ENV-T(Env)>; return let_stopped( then(std::forward_like<Sndr>(child), []<class... Ts>(Ts&&... ts) noexcept(is_nothrow_constructible_v<V, Ts...>) { return optional<V>(in_place, std::forward<Ts>(ts)...); }), []() noexcept { return just(optional<V>()); });
[ Editor's note: Change 33.9.12.15 [exec.stopped.err] paragraphs 2 and 3 as follows: ]
The name
stopped_as_errordenotes a pipeable sender adaptor object. For some subexpressionssndranderr, letSndrbedecltype((sndr))and letErrbedecltype((err)). If the typeSndrdoes not satisfysenderor if the typeErrdoes not satisfymovable-value,stopped_as_error(sndr, err)is ill-formed. Otherwise, the expressionstopped_as_error(sndr)is expression-equivalent to:.make-sender(stopped_as_error, err, sndr)transform_sender(get-domain-early(sndr),make-sender(stopped_as_error, err, sndr))except that
sndris only evaluated once.Let
sndrandenvbe subexpressions such thatSndrisdecltype((sndr))andEnvisdecltype((env)). Ifissender-for<Sndr, stopped_as_error_t>falsethen the expressionstopped_as_error.is ill-formed; otherwise, it is equivalent to:transform_sendertransform-sender(sndr, env)auto&& [_, err, child] = sndr; using E = decltype(auto(err)); return let_stopped( std::forward_like<Sndr>(child), [err = std::forward_like<Sndr>(err)]() noexcept(is_nothrow_move_constructible_v<E>) { return just_error(std::move(err)); });
[ Editor's note: Change 33.9.12.16 [exec.associate] paragraph 10 as follows: ]
The name
associatedenotes a pipeable sender adaptor object. For subexpressionssndrandtoken:
(10.1) If
decltype((sndr))does not satisfysender, orremove_cvref_t<decltype((token))>does not satisfyscope_token, thenassociate(sndr, token)is ill-formed.(10.2) Otherwise, the expression
associate(sndr, token)is expression-equivalent to:.make-sender(associate,associate-data(token, sndr))transform_sender(get-domain-early(sndr),make-sender(associate,associate-data(token, sndr)))except that
sndris evaluated only once.
[ Editor's note: Change 33.9.13.1 [exec.sync.wait] paragraphs 4 and 9 as follows: ]
The name
this_thread::sync_waitdenotes a customization point object. For a subexpressionsndr, letSndrbedecltype((sndr)). The expressionthis_thread::sync_wait(sndr)is expression-equivalent tothe following, except thatsndris evaluated only once:sync_wait., whereapply(sndr)applyis the exposition-only member function specified below.apply_sender(get-domain-early(sndr), sync_wait, sndr)Mandates:
(4.1)
sender_in<Sndr,is true.sync-wait-env>(4.2) The type
is well-formed.sync-wait-result-type<Sndr>
- (4.3)
same_as<decltype(ise),sync-wait-result-type<Sndr>>true, whereeis theapply_senderexpression i>… as before …
For a subexpression
sndr, letSndrbedecltype((sndr)). Ifsender_to<Sndr,issync-wait-receiver<Sndr>>false, the expressionsync_wait.is ill-formed; otherwise, it is equivalent to:apply_senderapply(sndr)sync-wait-state<Sndr> state; auto op = connect(sndr,sync-wait-receiver<Sndr>{&state}); start(op); state.loop.run(); if (state.error) { rethrow_exception(std::move(state.error)); } return std::move(state.result);
[ Editor's note: Change Note 1 in 33.9.13.1 [exec.sync.wait] paragraph 10.1 as follows: ]
[Note 1: The
defaultimplementation ofsync_waitachieves forward progress guarantee delegation by providing arun_loopscheduler via theget_delegation_schedulerquery on thesync-wait-receiver’s environment. Therun_loopis driven by the current thread of execution. — end note]
[ Editor's note: Change 33.9.13.2 [exec.sync.wait.var] paragraphs 1 and 2 as follows: ]
The name
this_thread::sync_wait_with_variantdenotes a customization point object. For a subexpressionsndr, letSndrbedecltype(into_variant(sndr)). The expressionthis_thread::sync_wait_with_variant(sndr)is expression-equivalent tothe following, exceptsndris evaluated only once:sync_wait_with_variant., whereapply(sndr)applyis the exposition-only member function specified below.apply_sender(get-domain-early(sndr), sync_wait_with_variant, sndr)Mandates:
(1.1)
sender_in<Sndr,issync-wait-env>true.(1.2) The type
is well-formed.sync-wait-with-variant-result-type<Sndr>
- (1.3)
same_as<decltype(ise),sync-wait-with-variant-result-type<Sndr>>true, whereeis theapply_senderexpression i>The expression
sync_wait_with_variant.is equivalent to:apply_senderapply(sndr)using result_type =sync-wait-with-variant-result-type<Sndr>; if (auto opt_value = sync_wait(into_variant(sndr))) { return result_type(std::move(get<0>(*opt_value))); } return result_type(nullopt);
[ Editor's note: Change Note 1 in 33.9.13.1 [exec.sync.wait] paragraph 10.1 as follows: ]
[Note 1: The
defaultimplementation ofsync_wait_with_variantachieves forward progress guarantee delegation (6.10.2.3 [intro.progress]) by relying on the forward progress guarantee delegation provided bysync_wait. — end note]
[ Editor's note: Change 33.11.2 [exec.env] as follows: ]
namespace std::execution {
  template<queryable... Envs>
  struct env {
    Envs0 envs0;               // exposition only
    Envs1 envs1;               // exposition only
        ⋮
    Envsn-1 envsn-1;           // exposition only

    template<class QueryTag, class... Args>
      constexpr decltype(auto) query(QueryTag q, Args&&... args) const noexcept(see below);
  };

  template<class... Envs>
    env(Envs...) -> env<unwrap_reference_t<Envs>...>;
}

- The class template env is used to construct a queryable object from several queryable objects. Query invocations on the resulting object are resolved by attempting to query each subobject in lexical order.

… as before …

template<class QueryTag, class... Args>
  constexpr decltype(auto) query(QueryTag q, Args&&... args) const noexcept(see below);

Let has-query be the following exposition-only concept:

template<class Env, class QueryTag, class... Args>
concept has-query =            // exposition only
  requires (const Env& env, Args&&... args) {
    env.query(QueryTag(), std::forward<Args>(args)...);
  };

Let fe be the first element of envs0, envs1, … envsn-1 such that the expression fe.query(q, std::forward<Args>(args)...) is well-formed.

Constraints: (has-query<Envs, QueryTag, Args...> || ...) is true.

Effects: Equivalent to: return fe.query(q, std::forward<Args>(args)...);

Remarks: The expression in the noexcept clause is equivalent to noexcept(fe.query(q, std::forward<Args>(args)...)).
[ Editor's note: In 33.12.1.2 [exec.run.loop.types], add a new paragraph after paragraph 4 as follows: ]
- Let sch be an expression of type run-loop-scheduler. The expression schedule(sch) has type run-loop-sender and is not potentially-throwing if sch is not potentially-throwing.
- For type set-tag other than set_error_t, the expression get_completion_scheduler<set-tag>(get_env(schedule(sch))) == sch evaluates to true.
[ Editor's note: Change 33.13.3 [exec.affine.on] paragraph 3 as follows: ]
Otherwise, the expression
affine_on(sndr, sch)is expression-equivalent to:.make-sender(affine_on, sch, sndr)transform_sender(get-domain-early(sndr),make-sender(affine_on, sch, sndr))except that
sndris evaluated only once.
[ Editor's note: Change paragraph 3 of 33.13.4 [exec.inline.scheduler] as follows: ]
Let sndr be an expression of type
inline-sender, letrcvrbe an expression such thatreceiver_of<decltype((rcvr)), CS>istruewhereCSiscompletion_signatures<set_value_t()>, then:[ Editor's note: Move the text of (3.1) below into this paragraph. ](3.1) the expression
connect(sndr, rcvr)has typeand is potentially-throwing if and only ifinline-state<remove_cvref_t<decltype((rcvr))>>((void)sndr, auto(rcvr))is potentially-throwing, and.(3.2) the expression
get_completion_scheduler<set_value_t>(get_env(sndr))has typeinline_schedulerand is potentially-throwing if and only ifget_env(sndr)is potentially-throwing.
[ Editor's note: Change 33.13.5 [exec.task.scheduler] as follows: ]
namespace std::execution { class task_scheduler {classts-sender; // exposition onlytemplate<receiver R>class state; // exposition onlytemplate<class Sch>classpublic: using scheduler_concept = scheduler_t; template<class Sch, class Allocator = allocator<void>> requires (!same_as<task_scheduler, remove_cvref_t<Sch>>) && scheduler<Sch> explicit task_scheduler(Sch&& sch, Allocator alloc = {});backend-for; // exposition onlyts-sendersee belowschedule();template <class Sndr, class Env> // exposition onlyfriend bool operator==(const task_scheduler& lhs, const task_scheduler& rhs) noexcept; template<class Sch> requires (!same_as<task_scheduler, Sch>) && scheduler<Sch> friend bool operator==(const task_scheduler& lhs, const Sch& rhs) noexcept; private: shared_ptr<see belowbulk-transform(Sndr&& sndr, const Env& env);voidparallel_scheduler_backend>sch_; // exposition only// see [exec.sysctxrepl.psb]}; }
task_scheduleris a class that modelsscheduler(33.6 [exec.sched]). Given an objectsof typetask_scheduler, letbe theSCHED(s)sched_member of the object owned bys..sch_
- For an lvalue
rof type derived fromreceiver_proxy, letbe an object of a type that modelsWRAP-RCVR(r)receiverand whose completion handlers result in invoking the corresponding completion handlers ofr.template<class Sch> structbackend-for: parallel_scheduler_backend {// exposition onlyexplicitbackend-for(Sch sch) : sched_(std::move(sch)) {} void schedule(receiver_proxy& r, span<byte> s) noexcept override; void schedule_bulk_chunked(size_t shape, bulk_item_receiver_proxy& r, span<byte> s) noexcept override; void schedule_bulk_unchunked(size_t shape, bulk_item_receiver_proxy& r, span<byte> s) noexcept override; Schsched_;// exposition only};
- Let
sndrbe a sender whose only value completion signature isset_value_t()and for which the expressionget_completion_scheduler<set_value_t>(get_env(sndr)) ==issched_true.void schedule(receiver_proxy& r, span<byte> s) noexcept override;
- Effects: Constructs an operation state
oswithconnect(schedule(and callssched_),WRAP-RCVR(r))start(os).void schedule_bulk_chunked(size_t shape, bulk_item_receiver_proxy& r, span<byte> s) noexcept override;
- Effects: Let
chunk_sizebe an integer less than or equal toshape, letnum_chunksbe(shape + chunk_size - 1) / chunk_size, and letfnbe a function object such that for an integeri,fn(i)callsr.execute(i * chunk_size, m), wheremis the lesser of(i + 1) * chunk_sizeandshape. Constructs an operation stateosas if withconnect(bulk(sndr, par, num_chunks, fn),and callsWRAP-RCVR(r))start(os).void schedule_bulk_unchunked(size_t shape, bulk_item_receiver_proxy& r, span<byte> s) noexcept override;
- Effects: Let
fnbe a function object such that for an integeri,fn(i)is equivalent tor.execute(i, i + 1). Constructs an operation stateosas if withconnect(bulk(sndr, par, shape, fn),and callsWRAP-RCVR(r))start(os).template<class Sch, class Allocator = allocator<void>> requires(!same_as<task_scheduler, remove_cvref_t<Sch>>) && scheduler<Sch> explicit task_scheduler(Sch&& sch, Allocator alloc = {});
- Effects: Initialize
sch_withallocate_shared<.backend-for<remove_cvref_t<Sch>>>(alloc, std::forward<Sch>(sch))[ Editor's note: Paragraphs 3-7 are kept unmodified. Remove paragraphs 8-12 and add the following paragraphs: ]
see belowschedule();
Returns: a prvalue
sndrwhose typeSndrmodelssendersuch that:
(8.1)
get_completion_scheduler<set_value_t>(get_env(sndr))is equal to*this.(8.2) If a receiver
rcvris connected tosndrand the resulting operation state is started, calls, wheresch_->schedule(r, s)
(8.2.1)
ris a proxy forrcvrwith basesystem_context_replaceability::receiver_proxy(33.15 [exec.par.scheduler]) and(8.2.2)
sis a preallocated backend storage forr.template <class BulkSndr, class Env> // exposition onlysee belowbulk-transform(BulkSndr&& bulk_sndr, const Env& env);
Constraints:
sender_in<BulkSndr, Env>istrueand eitherorsender-for<BulkSndr, bulk_chunked_t>issender-for<BulkSndr, bulk_unchunked_t>true.Returns: a prvalue
sndrwhose type modelssendersuch that:
(10.1)
get_completion_scheduler<set_value_t>(get_env(sndr))is equal to*this.(10.2)
bulk_sndris connected to an unspecified receiver if a receiverrcvris connected tosndr. If the resulting operation state is started,
(10.2.1) If
bulk_sndrcompletes with valuesvals, letargsbe a pack of lvalue subexpressions designating objects decay-copied fromvals. Then
(10.2.1.1) If
bulk_sndris the result of callingbulk_chunked(child, policy, shape, f),is called wheresch_->schedule_bulk_chunked(shape, r, s)ris a bulk chunked proxy forrcvrwith callablefand argumentsargs, andsis a preallocated backend storage forr.(10.2.1.2) Otherwise,
bulk_sndris the result of callingbulk_unchunked(child, policy, shape, f). Callswheresch_->schedule_bulk_unchunked(shape, r, s)ris a bulk unchunked proxy forrcvrwith callablefand argumentsargs, andsis a preallocated backend storage forr.(10.2.2) All other completion operations are forwarded unchanged.
[ Editor's note: In 33.15 [exec.par.scheduler], add a new paragraph after paragraph 3, another before paragraph 10, and change paragraphs 10 and 11 as follows: ]
- The expression
get_forward_progress_guarantee(sch)returnsforward_progress_guarantee::parallel.?. The expression
get_completion_scheduler<set_value_t>(get_env(schedule(sch))) == schevaluates totrue.… as before …
?. Let
schbe a subexpression of typeparallel_scheduler. For subexpressionssndrandenv, iftag_of_t<Sndr>is neitherbulk_chunked_tnorbulk_unchunked_t, the expressionsch.is ill-formed; otherwise, letbulk-transform(sndr, env)child,pol,shape, andfbe subexpressions equal to the arguments used to createsndr.
When the tag type ofparallel_schedulerprovides a customized implementation of thebulk_chunkedalgorithm (33.9.12.11 [exec.bulk]). If a receiverrcvris connected to the sender returned bybulk_chunked(sndr, pol, shape, f)sndrisbulk_chunked_t, the expressionsch.returns a sender such that if it is connected to a receiverbulk-transform(sndr, env)rcvrand the resulting operation state is started, then:
(10.1) If
sndrchildcompletes with valuesvals, letargsbe a pack of lvalue subexpressions designatingvals, thenb.schedule_bulk_chunked(shape, r, s)is called, where(10.2) All other completion operations are forwarded unchanged.
[ Note: Customizing the behavior of
bulk_chunkedaffects thedefaultimplementation ofbulk. — end note ]
When the tag type ofparallel_schedulerprovides a customized implementation of thebulk_unchunkedalgorithm (33.9.12.11 [exec.bulk]). If a receiverrcvris connected to the sender returned bybulk_unchunked(sndr, pol, shape, f)sndrisbulk_unchunked_t, the expressionsch.returns a sender such that if it is connected to a receiverbulk-transform(sndr, env)rcvrand the resulting operation state is started, then:
Our willingness to remove algorithm customization depends on our confidence that we can add it back later without breaking code. The section “Restoring algorithm customization in C++29” describes how we would go about this. This appendix fleshes out some of the details.
A sender expression represents a task graph whose nodes are asynchronous operations. Every async operation is started on some execution context (the starting context) and completes on some execution context (the completing context). The two might be the same; what matters is the distinction between the two roles.
Note This is a simplification. Some senders like
when_all can complete on one of
several contexts. We solve that problem with domains as described
below.
Imagine we assign each execution resource a color. The mission then is to paint every node in the task graph with the colors of its starting and completing contexts. Once we know where each operation will start and complete, we can use that information to pick the right algorithm implementation.
With regard to customization, each color can be thought of as representing not an individual execution resource but rather a set of algorithm implementations. Two different execution resources might use the same set of algorithm implementations, in which case they have the same “color”. In fact, most execution resources will use the default set of algorithm implementations, and so share the same color.
That’s not always the case though. A thread pool would not want to
use the default implementation of
bulk for example – that would be
serial. The thread pool would have a different color corresponding to
its set of preferred algorithm implementations.
In std::execution
today, this notion of color is called a “domain”. A domain is a tag type
that is used to select a set of algorithm implementations. Schedulers,
which are stand-ins for execution resources, advertise their domain with
the get_domain query.
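To make the color idea concrete, here is a self-contained toy model — not the real std::execution API; the get_domain_t tag, the schedulers, and the domain types below are all invented for the sketch. Two thread-pool schedulers advertise the same domain and therefore share one set of algorithm implementations, while an inline scheduler advertises the default domain:

// Toy model of domains-as-colors (hypothetical names, not std::execution).
#include <type_traits>

struct get_domain_t {};                       // stand-in for the get_domain query tag
inline constexpr get_domain_t get_domain{};   // hypothetical query object

struct default_domain {};                     // the "default color"
struct thread_pool_domain {};                 // a thread pool's "color"

struct cpu_pool_scheduler {
  constexpr auto query(get_domain_t) const noexcept { return thread_pool_domain{}; }
};
struct io_pool_scheduler {
  constexpr auto query(get_domain_t) const noexcept { return thread_pool_domain{}; }
};
struct inline_scheduler {
  constexpr auto query(get_domain_t) const noexcept { return default_domain{}; }
};

// Two different execution resources, one color:
static_assert(std::is_same_v<decltype(cpu_pool_scheduler{}.query(get_domain)),
                             decltype(io_pool_scheduler{}.query(get_domain))>);
// The inline scheduler has the default color:
static_assert(std::is_same_v<decltype(inline_scheduler{}.query(get_domain)),
                             default_domain>);

int main() {}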
Completing the mission requires two things:
Identifying the starting and completing domain of every operation in the task graph, and
Using that information to select the preferred implementation for the algorithm that operation represents.
Let’s take these two separately.
So-called “early” customization, which determines the return type of
then(sndr, fn)
for example, is predicated on the assumption that senders know the domain on
which they will complete. As discussed above, that assumption does not hold. Many senders
only know where they will complete once they know where they will start,
which isn’t known until the sender is connected to a receiver.
So early customization is irreparably broken. There is no plan to add it back.
That leaves late customization, which is performed by the
connect
customization point. The receiver, which is an extension of the caller,
knows where the operation will start. If the sender is given this
information – that is, if the sender is told where it will start – it
can accurately report where it will complete. This is the key
insight.
When
connect
queries a sender’s attributes for its domain, it should pass the
receiver’s environment. That way a sender has all the information
available when computing its completion domain.
get_completion_domain

It is sometimes the case that a sender’s value and error completions can happen on different domains. For example, imagine trying to schedule work on a GPU. If it succeeds, you are in the GPU domain, and Bob’s your uncle. If scheduling fails, however, the error cannot be reported on the GPU because we failed to make it there!
So asking a sender for a singular completion domain is not flexible
enough. We have three separate queries for a sender’s completion
scheduler: get_completion_scheduler<set_[value|error|stopped]_t>.
Similarly, we should have three separate queries for a sender’s
completion domain: get_completion_domain<set_[value|error|stopped]_t>.
Note If we have the
get_completion_scheduler queries,
why do we need
get_completion_domain? We can ask
the completion scheduler for its domain, right? The answer is that a
sender like when_all(s1, s2)
doesn’t know what scheduler it will complete on. It completes on the
context of whichever sender, s1 or
s2, finishes last. But if
s1 and
s2 have the same completion
domain, it doesn’t matter that we do not know the completion
scheduler. The domain determines the preferred set of algorithm
implementations. Hence we need separate queries for the completion
domain. (Additionally, when_all must
require that all of its child senders share a common domain.)
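To illustrate the when_all point with code, here is a tiny self-contained sketch; the completion_domain trait and the sender/domain types are hypothetical stand-ins, not proposed API:

// Hypothetical sketch: a toy completion-domain trait, and a when_all-style
// computation that is only well-defined when every child reports the same domain.
#include <type_traits>

struct gpu_domain {};
struct default_domain {};

template <class Sndr> struct completion_domain;           // toy trait (not real API)

struct gpu_sender_a {};
struct gpu_sender_b {};
struct cpu_sender {};
template <> struct completion_domain<gpu_sender_a> { using type = gpu_domain; };
template <> struct completion_domain<gpu_sender_b> { using type = gpu_domain; };
template <> struct completion_domain<cpu_sender>   { using type = default_domain; };

// when_all's completion domain: defined only when the children share one domain.
template <class First, class... Rest>
struct when_all_completion_domain {
  static_assert((std::is_same_v<typename completion_domain<First>::type,
                                typename completion_domain<Rest>::type> && ...),
                "when_all children must share a completion domain");
  using type = typename completion_domain<First>::type;
};

static_assert(std::is_same_v<
    typename when_all_completion_domain<gpu_sender_a, gpu_sender_b>::type,
    gpu_domain>);
// when_all_completion_domain<gpu_sender_a, cpu_sender> would trip the static_assert:
// the children disagree, so no single completion domain can be reported.

int main() {}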
The addition of the completion domain queries creates a nice symmetry as shown in the table below (with additions in green):
| | Receiver | Sender |
|---|---|---|
| Query for scheduler | get_scheduler | get_completion_scheduler<set_value_t>, get_completion_scheduler<set_error_t>, get_completion_scheduler<set_stopped_t> |
| Query for domain | get_domain | get_completion_domain<set_value_t>, get_completion_domain<set_error_t>, get_completion_domain<set_stopped_t> |
For a sender sndr and an
environment env, we can get the
sender’s completion domain as follows:
auto completion_domain = get_completion_domain<set_value_t>(get_env(sndr), env);
A sender like
just() would
implement this query as follows:
template <class... Values>
class just_sender {
 private:
  struct attrs {
    template <class Env>
    auto query(get_completion_domain_t<set_value_t>, const Env& env) const noexcept {
      // just(...) completes where it starts. The domain of the environment is where
      // the sender will start, so return that.
      return get_domain(env);
    }
    //...
  };

 public:
  attrs get_env() const noexcept { return attrs{}; }
  //...
};
Note A query that accepts an additional argument is
novel in std::execution,
but the query system was designed to support this usage. See
33.2.2
[exec.queryable.concept].
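For illustration only, here is one plausible shape for such a query object; it is a hypothetical sketch that mirrors the get_completion_scheduler<Tag> family and forwards the extra environment argument to the sender attributes’ query member:

// Hypothetical sketch: a completion-domain query object parameterized by the
// completion tag, invocable with a sender's attributes plus the receiver's environment.
#include <type_traits>

struct set_value_t {};     // stand-ins for the execution completion tags
struct set_error_t {};
struct set_stopped_t {};

template <class CompletionTag>
struct get_completion_domain_t {
  template <class Attrs, class Env>
  constexpr decltype(auto) operator()(const Attrs& attrs, const Env& env) const
      noexcept(noexcept(attrs.query(*this, env))) {
    // Forward the query, including the extra environment argument, to the attributes.
    return attrs.query(*this, env);
  }
};

template <class CompletionTag>
inline constexpr get_completion_domain_t<CompletionTag> get_completion_domain{};

// Minimal usage with a toy attributes object:
struct gpu_domain {};
struct toy_attrs {
  template <class Env>
  constexpr gpu_domain query(get_completion_domain_t<set_value_t>, const Env&) const noexcept {
    return {};
  }
};
struct toy_env {};

static_assert(std::is_same_v<
    decltype(get_completion_domain<set_value_t>(toy_attrs{}, toy_env{})), gpu_domain>);

int main() {}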
connect

With the addition of the get_completion_domain<...>
queries that can accept the receiver’s environment,
connect can
now “paint” the operation with its starting and completing colors, aka
domains. When passed arguments sndr
and rcvr, the starting domain
is:
// Get the operation's starting domain: auto starting_domain = get_domain(get_env(rcvr));
To get the completion domain (when the operation completes successfully):
// Get the operation's completion domain for the value channel: auto completion_domain = get_completion_domain<set_value_t>(get_env(sndr), get_env(rcvr));
Now
connect has
all the information it needs to select the correct algorithm
implementation. Great!
But this presents the
connect
function with a dilemma: how does it use two domains to pick
one algorithm implementation?
Consider that the starting domain might want a say in how
start works, and the completing
domain might want a say in how
set_value works. So should we let
the starting domain customize start
and the completing domain customize
set_value?
No. start and
set_value are bookends around an
async operation; they must match. Often
set_value needs state that is set up
in start. Customizing the two
independently is madness.
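A self-contained toy makes the bookends point concrete; toy_operation_state below is invented for the sketch and is not an std::execution operation state:

// Toy illustration (not std::execution): the completion handler consumes state
// that start() set up, so the two sides cannot sensibly be customized independently.
#include <iostream>
#include <optional>
#include <string>

struct toy_operation_state {
  std::optional<std::string> staged;   // created by start(), consumed by set_value()

  void start() {                       // the opening bookend
    staged = "result staged by start()";
  }
  void set_value() {                   // the closing bookend
    std::cout << *staged << '\n';      // relies on what start() staged
  }
};

int main() {
  toy_operation_state op;
  op.start();      // a customized start() that skipped the staging step...
  op.set_value();  // ...would break the matching set_value().
}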
Note The following is more speculative than what has been described so far.
A possible solution I have been exploring is to bring back sender transforms. Each domain can apply its transform in turn. I do not yet have reason to believe the order matters, but it is important that when asked to transform a sender, a domain knows whether it is the “starting” domain or the “completing” domain.
Here is how a domain might customize
bulk when it is the completing
domain:
struct thread_pool_domain {
  template <sender-for<bulk_t> Sndr, class Env>
  auto transform_sender(set_value_t, Sndr&& sndr, const Env& env) const {
    //...
  }
};
Since it has set_value_t as its
first argument, this transform is only applied when
thread_pool_domain is an operation’s
completion domain. Had the first argument been
start_t, the transform would only be
used when thread_pool_domain is a
starting domain.
transform_sender
In this reimagined customization design, the
connect CPO
does a few things:
Determines the starting and completing domains,
Applies the completing domain’s transform (if any),
Applies the starting domain’s transform (if any) to the resulting sender,
Connects the twice-transformed sender to the receiver.
The first three steps do something different from connecting a
sender and receiver, so it makes sense to factor them out into their own
utility. I call it transform_sender
here, but it does not need to be normative since only
connect will
call it.
The new transform_sender looks
like this:
template <class Domain, class Tag, class Sndr, class Env>
concept has-sender-transform-for =            // exposition only
  requires (Sndr (*make_sndr)(), const Env env) {
    Domain().transform_sender(Tag(), make_sndr(), env);
  };

template <class Domain, class Tag>
constexpr auto transform-sender-recurse = overload-set{      // exposition only
  []<class Self, class Sndr, class Env>(this Self self, Sndr&& sndr, const Env& env) -> decltype(auto)
    requires has-sender-transform-for<Domain, Tag, Sndr, Env>
  {
    return self(Domain().transform_sender(Tag(), std::forward<Sndr>(sndr), env), env);
  },
  []<class Sndr, class Env>(Sndr&& sndr, const Env&) -> Sndr {
    return std::forward<Sndr>(sndr);
  }
};

template <class Sndr, class Env>
auto transform_sender(Sndr&& sndr, const Env& env) {
  auto starting_domain   = get_domain(env);
  auto completing_domain = get_completion_domain<set_value_t>(get_env(sndr), env);

  auto starting_transform   = transform-sender-recurse<decltype(starting_domain), start_t>;
  auto completing_transform = transform-sender-recurse<decltype(completing_domain), set_value_t>;

  return starting_transform(completing_transform(std::forward<Sndr>(sndr), env), env);
}
With this definition of
transform_sender, connect(sndr, rcvr)
is equivalent to transform_sender(sndr, get_env(rcvr)).connect(rcvr).
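To see the recipe end to end, here is a runnable toy under loose assumptions: strings stand in for senders, and the domain and tag types are invented for the sketch. It applies the completing domain’s transform first and the starting domain’s transform second, mirroring the steps above:

// Runnable toy of the two-pass recipe (hypothetical names, not proposed wording).
#include <iostream>
#include <string>
#include <utility>

struct set_value_t {};   // "completing side" tag (stand-in)
struct start_t {};       // "starting side" tag (stand-in)

// A toy domain that rewrites senders only when it is the completing domain.
struct gpu_domain {
  static std::string transform_sender(set_value_t, std::string sndr) {
    return "gpu_impl(" + std::move(sndr) + ")";   // tag the sender for a GPU implementation
  }
};

// The default domain leaves senders alone on either side.
struct default_domain {
  template <class Tag>
  static std::string transform_sender(Tag, std::string sndr) { return sndr; }
};

// Apply the completing domain's transform, then the starting domain's.
template <class StartDom, class CompleteDom>
std::string transform_sender(std::string sndr) {
  auto once = CompleteDom::transform_sender(set_value_t{}, std::move(sndr));
  return StartDom::transform_sender(start_t{}, std::move(once));
}

int main() {
  // Starting domain: default_domain; completing domain: gpu_domain.
  std::cout << transform_sender<default_domain, gpu_domain>("then(just(), fn)") << '\n';
  // prints: gpu_impl(then(just(), fn))
}

In this toy the default domain leaves the sender untouched, so the completing domain’s rewrite is what actually gets connected, which is the behavior the walkthrough below relies on.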
Let’s see how this new approach addresses the problems noted in the motivating example above. The troublesome code is:
namespace ex = std::execution;

auto sndr = ex::starts_on(gpu, ex::just()) | ex::then(fn);
std::this_thread::sync_wait(std::move(sndr));
The section “The problem with P3718” above
describes how the current design and the “fixed” one proposed in [P3718R0] go off the rails while
determining the domain in which the function
fn will execute, causing it to use a
CPU implementation instead of a GPU one.
In the new design, when the then
sender is being connected to
sync_wait’s receiver, the starting
domain will still be the
default_domain, but when asking the
sender where it will complete, the answer will be different. Let’s see
how:
When asked for its completion domain, the
then sender will ask the
starts_on sender where it will
complete, as if by:
auto&& tmp1 = ex::starts_on(gpu, ex::just());
auto dom1 = ex::get_completion_domain<ex::set_value_t>(ex::get_env(tmp1), ex::get_env(rcvr));
In turn, the starts_on sender
asks the
just()
sender where it will complete, telling it where it will start.
(This is the new bit.) It looks like:
auto&& tmp2 = ex::just();

// ask for the gpu scheduler's domain:
auto gpu-dom = ex::get_completion_domain<ex::set_value_t>(gpu);

// construct an env that reflects the fact that tmp2 will be started on the gpu:
auto env2 = ex::env{ex::prop{ex::get_scheduler, gpu},
                    ex::prop{ex::get_domain, gpu-dom},
                    ex::get_env(rcvr)};

// pass the new env when asking `just()` for its completion domain:
auto dom2 = ex::get_completion_domain<ex::set_value_t>(ex::get_env(tmp2), env2);
The
just()
sender, when asked where it will complete, will respond with the domain
on which it is started. That information is provided by the
env2 environment passed to
the query: get_domain(env2). That will return gpu-dom.
Having correctly determined that the
then sender will start on the
default domain and complete on the GPU domain,
connect can
select the right implementation for the
then algorithm. It does that by
calling:
return ex::transform_sender(sndr, ex::get_env(rcvr)).connect(rcvr);
The transform_sender call will
execute the following (simplified):
ex::default_domain().transform_sender(
    ex::start,
    gpu-dom.transform_sender(ex::set_value, sndr, ex::get_env(rcvr)),
    ex::get_env(rcvr))
The default_domain does not apply
any transformation to then senders,
so this expression reduces to:
gpu-dom.transform_sender(ex::set_value, sndr, ex::get_env(rcvr))
So, in the new customization scheme, the GPU domain gets a crack at
transforming the then sender before
it is connected to a receiver, as it should.