P2300R1: `std::execution`

1. Introduction

This paper proposes a self-contained design for a Standard C++ framework for managing asynchronous execution on generic execution contexts. It is based on the ideas in [P0443R14] and its companion papers.

1.1. Motivation

Today, C++ software is increasingly asynchronous and parallel, a trend that is likely to continue. Asynchrony and parallelism appear everywhere, from processor hardware interfaces, to networking, to file I/O, to GUIs, to accelerators. Every C++ domain and every platform needs to deal with asynchrony and parallelism, from scientific computing to video games to financial services, from the smallest mobile devices to your laptop to GPUs in the world’s fastest supercomputer.

While the C++ Standard Library has a rich set of concurrency primitives (std::atomic, std::mutex, std::counting_semaphore, etc.) and lower-level building blocks (std::thread, etc.), we lack the Standard vocabulary and framework for asynchrony and parallelism that C++ programmers desperately need. std::async/std::future/std::promise, C++11’s intended exposure for asynchrony, is inefficient, hard to use correctly, and severely lacking in genericity, making it unusable in many contexts. We introduced parallel algorithms to the C++ Standard Library in C++17, and while they are an excellent start, they are all inherently synchronous and not composable.

This paper proposes a Standard C++ model for asynchrony, based around three key abstractions: schedulers, senders, and receivers, and a set of customizable asynchronous algorithms.

1.2. Priorities

Be composable and generic, allowing users to write code that can be used with many different types of execution contexts.
Encapsulate common asynchronous patterns in customizable and reusable algorithms, so users don’t have to invent things themselves.
Make it easy to be correct by construction.
Support both lazy and eager execution in a way that does not compromise the efficiency of either and allows users to write code that is agnostic to eagerness.
Support the diversity of execution contexts and execution agents, because not all execution agents are created equal; some are less capable than others, but not less important.
Allow everything to be customized by an execution context, including transfer to other execution contexts, but don’t require that execution contexts customize everything.
Care about all reasonable use cases, domains and platforms.
Errors must be propagated, but error handling must not present a burden.
Support cancellation, which is not an error.
Have clear and concise answers for where things execute.
Be able to manage and terminate the lifetimes of objects asynchronously.

1.3. Examples

See § 4.12 User-facing sender factories, § 4.13 User-facing sender adaptors, and § 4.14 User-facing sender consumers for short explanations of the algorithms used in these code examples.

1.3.1. Hello world

using namespace std::execution;

scheduler auto sch = get_thread_pool().scheduler();                           // 1

sender auto begin = schedule(sch);                                            // 2
sender auto hi_again = then(begin, []{                                        // 3
    std::cout << "Hello world! Have an int.";                                 // 3
    return 13;                                                                // 3
});                                                                           // 3
sender auto add_42 = then(hi_again, [](int arg) { return arg + 42; });        // 4

auto [i] = this_thread::sync_wait(add_42).value();                            // 5

This example demonstrates the basics of schedulers, senders, and receivers:

1. First we need to get a scheduler from somewhere, such as a thread pool. A scheduler is a lightweight handle to an execution resource.
2. To start a chain of work on a scheduler, we call § 4.12.1 execution::schedule, which returns a sender that completes on the scheduler. A sender describes asynchronous work and sends a signal (value, error, or done) to some recipient(s) when that work completes.
3. We use sender algorithms to produce senders and compose asynchronous work. § 4.13.2 execution::then is a sender adaptor that takes an input sender and a std::invocable, and calls the std::invocable on the signal sent by the input sender. The sender returned by then sends the result of that invocation. In this case, the input sender came from schedule, so it’s void, meaning it won’t send us a value, so our std::invocable takes no parameters. But we return an int, which will be sent to the next recipient.
4. Now, we add another operation to the chain, again using § 4.13.2 execution::then. This time, we get sent a value - the int from the previous step. We add 42 to it, and then return the result.
5. Finally, we’re ready to submit the entire asynchronous pipeline and wait for its completion. Everything up until this point has been completely asynchronous; the work may not have even started yet. To ensure the work has started and then block pending its completion, we use § 4.14.2 this_thread::sync_wait, which returns a std::optional<std::tuple<...>> holding the value sent by the last sender, returns an empty std::optional if the last sender sent a done signal, or throws an exception if the last sender sent an error.
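The three possible outcomes of sync_wait described in the last step can be modeled in ordinary C++. The following is only a sketch: completion, done_signal, make_error, and wait_result are hypothetical names invented for illustration and are not part of this proposal.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <variant>

// Hypothetical stand-ins for the three completion signals of a sender of int.
struct done_signal {};
using completion = std::variant<int, std::exception_ptr, done_signal>;

// Convenience for building an error completion in examples.
inline completion make_error(const char* what) {
  return std::make_exception_ptr(std::runtime_error(what));
}

// Models the result convention described for sync_wait:
// value -> engaged optional, done -> empty optional, error -> rethrow.
inline std::optional<std::tuple<int>> wait_result(const completion& c) {
  if (const int* v = std::get_if<int>(&c)) return std::tuple<int>{*v};
  if (const auto* e = std::get_if<std::exception_ptr>(&c)) {
    assert(*e != nullptr);  // an error completion must carry an exception
    std::rethrow_exception(*e);
  }
  return std::nullopt;  // done: neither a value nor an error
}
```

With this model, `auto [i] = wait_result(completion{13 + 42}).value();` mirrors line 5 of the example, while a done completion would make the call to value() throw std::bad_optional_access.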

1.3.2. Asynchronous inclusive scan

using namespace std::execution;

sender auto async_inclusive_scan(scheduler auto sch,                          // 2
                                 std::span<const double> input,               // 1
                                 std::span<double> output,                    // 1
                                 double init,                                 // 1
                                 std::size_t tile_count)                      // 3
{
  std::size_t const tile_size = (input.size() + tile_count - 1) / tile_count;

  std::vector<double> partials(tile_count + 1);                               // 4
  partials[0] = init;                                                         // 4

  return transfer_just(sch, std::move(partials))                              // 5
       | bulk(tile_count,                                                     // 6
           [=](std::size_t i, std::vector<double>& partials) {                // 7
             auto start = i * tile_size;                                      // 8
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 8
             partials[i + 1] = *--std::inclusive_scan(begin(input) + start,   // 9
                                                      begin(input) + end,     // 9
                                                      begin(output) + start); // 9
           })                                                                 // 10
       | then(                                                                // 11
           [](std::vector<double>& partials) {
             std::inclusive_scan(begin(partials), end(partials),              // 12
                                 begin(partials));                            // 12
             return std::move(partials);                                      // 13
           })
       | bulk(tile_count,                                                     // 14
           [=](std::size_t i, std::vector<double>& partials) {                // 14
             auto start = i * tile_size;                                      // 14
             auto end   = std::min(input.size(), (i + 1) * tile_size);        // 14
             std::for_each(begin(output) + start, begin(output) + end,        // 14
               [&] (double& e) { e = partials[i] + e; }                       // 14
             );
           })
       | then(                                                                // 15
           [=](std::vector<double>& partials) {                               // 15
             return output;                                                   // 15
           });                                                                // 15
}

This example builds an asynchronous computation of an inclusive scan:

1. It scans a sequence of doubles (represented as the std::span<const double> input) and stores the result in another sequence of doubles (represented as std::span<double> output).
2. It takes a scheduler, which specifies what execution context the scan should be launched on.
3. It also takes a tile_count parameter that controls the number of execution agents that will be spawned.
4. First we need to allocate temporary storage needed for the algorithm, which we’ll do with a std::vector, partials. We need one double of temporary storage for each execution agent we create.
5. Next we’ll create our initial sender with § 4.12.3 execution::transfer_just. This sender will send the temporary storage, which we’ve moved into the sender. The sender has a completion scheduler of sch, which means the next item in the chain will use sch.
6. Senders and sender adaptors support composition via operator|, similar to C++ ranges. We’ll use operator| to attach the next piece of work, which will spawn tile_count execution agents using § 4.13.7 execution::bulk (see § 4.11 Most sender adaptors are pipeable for details).
7. Each agent will call a std::invocable, passing it two arguments. The first is the agent’s index (i) in the § 4.13.7 execution::bulk operation, in this case a unique integer in [0, tile_count). The second argument is what the input sender sent - the temporary storage.
8. We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index.
9. Then we do a sequential std::inclusive_scan over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage partials.
10. After all computation in that initial § 4.13.7 execution::bulk pass has completed, every one of the spawned execution agents will have written the sum of its elements into its slot in partials.
11. Now we need to scan all of the values in partials. We’ll do that with a single execution agent which will execute after the § 4.13.7 execution::bulk completes. We create that execution agent with § 4.13.2 execution::then.
12. § 4.13.2 execution::then takes an input sender and a std::invocable and calls the std::invocable with the value sent by the input sender. Inside our std::invocable, we call std::inclusive_scan on partials, which the input sender will send to us.
13. Then we return partials, which the next phase will need.
14. Finally we do another § 4.13.7 execution::bulk of the same shape as before. In this § 4.13.7 execution::bulk, we will use the scanned values in partials to integrate the sums from other tiles into our elements, completing the inclusive scan.
15. async_inclusive_scan returns a sender that sends the output std::span<double>. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which async_inclusive_scan returns, the computation may not have completed. In fact, it may not have even started.
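To make the three phases of the algorithm easier to follow, here is a purely sequential sketch in standard C++17. It is not the asynchronous version above: each loop over tiles stands in for one bulk pass, and the function name tiled_inclusive_scan and its vector-based signature are assumptions made for illustration. Unlike the listing above, this sketch also skips empty tiles explicitly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential sketch of the tiled scan; the loop body is what one agent does.
inline std::vector<double> tiled_inclusive_scan(const std::vector<double>& input,
                                                double init,
                                                std::size_t tile_count) {
  assert(tile_count > 0);
  const std::size_t n = input.size();
  const std::size_t tile_size = (n + tile_count - 1) / tile_count;
  std::vector<double> output(n);
  std::vector<double> partials(tile_count + 1, 0.0);
  partials[0] = init;

  // Phase 1: scan each tile independently and record the tile's sum.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const std::size_t first = std::min(n, i * tile_size);
    const std::size_t last = std::min(n, (i + 1) * tile_size);
    if (first == last) continue;  // empty tile: its partial stays 0
    std::inclusive_scan(input.begin() + first, input.begin() + last,
                        output.begin() + first);
    partials[i + 1] = output[last - 1];
  }

  // Phase 2: scan the per-tile sums so partials[i] is the offset for tile i.
  std::inclusive_scan(partials.begin(), partials.end(), partials.begin());

  // Phase 3: add each tile's offset to its elements.
  for (std::size_t i = 0; i < tile_count; ++i) {
    const std::size_t first = std::min(n, i * tile_size);
    const std::size_t last = std::min(n, (i + 1) * tile_size);
    std::for_each(output.begin() + first, output.begin() + last,
                  [&](double& e) { e += partials[i]; });
  }
  return output;
}
```

For the input {1, 2, 3, 4, 5} with init 10 and two tiles, phase 1 produces the per-tile scans {1, 3, 6} and {4, 9}, phase 2 turns partials into {10, 16, 25}, and phase 3 yields {11, 13, 16, 20, 25}.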

1.3.3. Asynchronous dynamically-sized read

using namespace std::execution;

sender_of<std::size_t> auto async_read(                                       // 1
    sender_of<std::span<std::byte>> auto buffer,                              // 1
    auto handle);                                                             // 1

struct dynamic_buffer {                                                       // 3
  std::unique_ptr<std::byte[]> data;                                          // 3
  std::size_t size;                                                           // 3
};                                                                            // 3

sender_of<dynamic_buffer> auto async_read_array(auto handle) {                // 2
  return just(dynamic_buffer{})                                               // 4
       | let_value([handle] (dynamic_buffer& buf) {                           // 5
           return just(std::as_writable_bytes(std::span(&buf.size, 1)))       // 6
                | async_read(handle)                                          // 7
                | then(                                                       // 8
                    [&] (std::size_t bytes_read) {                            // 9
                      assert(bytes_read == sizeof(buf.size));                 // 10
                      buf.data = std::make_unique<std::byte[]>(buf.size);     // 11
                      return std::span(buf.data.get(), buf.size);             // 12
                    })
                | async_read(handle)                                          // 13
                | then(
                    [&] (std::size_t bytes_read) {
                      assert(bytes_read == buf.size);                         // 14
                      return std::move(buf);                                  // 15
                    });
       });
}

This example demonstrates a common asynchronous I/O pattern - reading a payload of a dynamic size by first reading the size, then reading the number of bytes specified by the size:

1. async_read is a pipeable sender adaptor. It’s a customization point object, but this is what its call signature looks like. It takes a sender parameter which must send an input buffer in the form of a std::span<std::byte>, and a handle to an I/O context. It will asynchronously read into the input buffer, up to the size of the std::span. It returns a sender which will send the number of bytes read once the read completes.
2. async_read_array takes an I/O handle and reads a size from it, and then a buffer of that many bytes. It returns a sender that sends a dynamic_buffer object that owns the data that was sent.
3. dynamic_buffer is an aggregate struct that contains a std::unique_ptr<std::byte[]> and a size.
4. The first thing we do inside of async_read_array is create a sender that will send a new, empty dynamic_buffer object using § 4.12.2 execution::just. We can attach more work to the pipeline using operator| composition (see § 4.11 Most sender adaptors are pipeable for details).
5. We need the lifetime of this dynamic_buffer object to last for the entire pipeline. So, we use let_value, which takes an input sender and a std::invocable that must return a sender itself (see § 4.13.4 execution::let_* for details). let_value sends the value from the input sender to the std::invocable. Critically, the lifetime of the sent object will last until the sender returned by the std::invocable completes.
6. Inside of the let_value std::invocable, we have the rest of our logic. First, we want to initiate an async_read of the buffer size. To do that, we need to send a std::span pointing to buf.size. We can do that with § 4.12.2 execution::just.
7. We chain the async_read onto the § 4.12.2 execution::just sender with operator|.
8. Next, we pipe a std::invocable that will be invoked after the async_read completes using § 4.13.2 execution::then.
9. That std::invocable gets sent the number of bytes read.
10. We need to check that the number of bytes read is what we expected.
11. Now that we have read the size of the data, we can allocate storage for it.
12. We return a std::span<std::byte> to the storage for the data from the std::invocable. This will be sent to the next recipient in the pipeline.
13. And that recipient will be another async_read, which will read the data.
14. Once the data has been read, in another § 4.13.2 execution::then, we confirm that we read the right number of bytes.
15. Finally, we move out of and return our dynamic_buffer object. It will get sent by the sender returned by async_read_array. We can attach more things to that sender to use the data in the buffer.
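The read-the-size-then-read-the-payload pattern can also be seen in a blocking form, which may help when following the asynchronous pipeline above. This is a sketch under stated assumptions: memory_stream, make_stream, and read_array are hypothetical helpers over an in-memory byte vector, and the size prefix is assumed to be a native-endian std::size_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical in-memory "handle": a byte stream with a read cursor.
struct memory_stream {
  std::vector<std::byte> bytes;
  std::size_t pos = 0;

  // Copies up to n bytes into dst and returns the number of bytes copied.
  std::size_t read(std::byte* dst, std::size_t n) {
    const std::size_t count = std::min(n, bytes.size() - pos);
    if (count != 0) std::memcpy(dst, bytes.data() + pos, count);
    pos += count;
    return count;
  }
};

// Builds a stream holding a native-endian size prefix followed by payload.
inline memory_stream make_stream(const std::vector<std::byte>& payload) {
  memory_stream s;
  const std::size_t n = payload.size();
  s.bytes.resize(sizeof(n) + n);
  std::memcpy(s.bytes.data(), &n, sizeof(n));
  if (n != 0) std::memcpy(s.bytes.data() + sizeof(n), payload.data(), n);
  return s;
}

struct dynamic_buffer {
  std::unique_ptr<std::byte[]> data;
  std::size_t size = 0;
};

// Blocking analogue of async_read_array: read the size, then the payload.
inline dynamic_buffer read_array(memory_stream& in) {
  dynamic_buffer buf;
  std::size_t got =
      in.read(reinterpret_cast<std::byte*>(&buf.size), sizeof(buf.size));
  assert(got == sizeof(buf.size));
  buf.data = std::make_unique<std::byte[]>(buf.size);
  got = in.read(buf.data.get(), buf.size);
  assert(got == buf.size);
  (void)got;  // silence unused-variable warnings when assertions are disabled
  return buf;
}
```

In the blocking version buf is a local variable, so its lifetime is trivially long enough; the asynchronous version needs let_value precisely because no such enclosing stack frame survives across the two reads.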

1.4. What this proposal is not

This paper is not a patch on top of [P0443R14]; we are not asking to update the existing paper, we are asking to retire it in favor of this paper, which is already self-contained; any example code within this paper can be written in Standard C++, without the need to standardize any further facilities.

This paper is not an alternative design to [P0443R14]; rather, we have taken the design in the current executors paper, and applied targeted fixes to allow it to fulfill the promises of the sender/receiver model, as well as provide all the facilities we consider essential when writing user code using standard execution concepts; we have also applied the guidance of removing one-way executors from the paper entirely, and instead provided an algorithm based around senders that serves the same purpose.

1.5. Design changes from P0443

The executor concept has been removed and all of its proposed functionality is now based on schedulers and senders, as per SG1 direction.
Properties are not included in this paper. We see them as a possible future extension, if the committee gets more comfortable with them.
Users now have a choice between using a strictly lazy vs a possibly eager version of most sender algorithms.
Senders now advertise what scheduler, if any, their evaluation will complete on.
The places of execution of user code in P0443 weren’t precisely defined, whereas they are in this paper. See § 4.5 Senders can propagate completion schedulers.
P0443 did not propose a suite of sender algorithms necessary for writing sender code; this paper does. See § 4.12 User-facing sender factories, § 4.13 User-facing sender adaptors, and § 4.14 User-facing sender consumers.
P0443 did not specify the semantics of variously qualified connect overloads; this paper does. See § 4.7 Senders can be either multi-shot or single-shot.
Specific type erasure facilities are omitted, as per LEWG direction. Type erasure facilities can be built on top of this proposal, as discussed in § 5.9 Ranges-style CPOs vs tag_invoke.
A specific thread pool implementation is omitted, as per LEWG direction.

1.6. Prior art

This proposal builds upon and learns from years of prior art with asynchronous and parallel programming frameworks in C++.

Futures, as traditionally realized, require the dynamic allocation and management of a shared state, synchronization, and typically type-erasure of work and continuation. Many of these costs are inherent in the nature of "future" as a handle to work that is already scheduled for execution. These expenses rule out the future abstraction for many uses and make it a poor choice for a basis of a generic mechanism.

Coroutines suffer many of the same problems, but can avoid synchronizing when chaining dependent work because they typically start suspended. In many cases, coroutine frames require unavoidable dynamic allocation. Consequently, coroutines in embedded or heterogeneous environments require great attention to detail. Nor are coroutines good candidates for cancellation because the early and safe termination of coroutines requires unsatisfying solutions. On the one hand, exceptions are inefficient and disallowed in many environments. Alternatively, clumsy ad-hoc mechanisms, whereby co_yield returns a status code, hinder correctness. See [P1662R0] for a complete discussion.

Callbacks are the simplest, most powerful, and most efficient mechanism for creating chains of work, but suffer problems of their own. Callbacks must propagate either errors or values. This simple requirement yields many different interface possibilities, but the lack of a standard obstructs generic design. Additionally, few of these possibilities accommodate cancellation signals when the user requests upstream work to stop and clean up.

1.7. Field experience

This proposal draws heavily from our field experience with libunifex, Thrust, and Agency. It is also inspired by the needs of countless other C++ frameworks for asynchrony, parallelism, and concurrency.

Before this proposal is approved, we will present a new implementation of this proposal written from the specification and intended as a contribution to libc++. This implementation will demonstrate the viability of the design across the use cases and execution contexts that the committee has identified as essential.

2. Revision history

2.1. R1

The changes since R0 are as follows:

Added a new concept, sender_of.
Added a new scheduler query, this_thread::execute_may_block_caller.
Added a new scheduler query, get_forward_progress_guarantee.
Removed the unschedule adaptor.
Various fixes of typos and bugs.

2.2. R0

Initial revision.

3. Design - introduction

The following three sections describe the entirety of the proposed design.

§ 3 Design - introduction describes the conventions used through the rest of the design sections, as well as an example illustrating how we envision code will be written using this proposal.
§ 4 Design - user side describes all the functionality from the perspective we intend for users: it describes the various concepts they will interact with, and what their programming model is.
§ 5 Design - implementer side describes the machinery that allows for that programming model to function, and the information contained there is necessary for people implementing senders and sender algorithms (including the standard library ones) - but is not necessary to use senders productively.

3.1. Conventions

The following conventions are used throughout the design section:

The namespace proposed in this paper is the same as in [P0443R14]: std::execution; however, for brevity, the std:: part of this name is omitted. When you see execution::foo, treat that as std::execution::foo.
Universal references and explicit calls to std::move/std::forward are omitted in code samples and signatures for simplicity; assume universal references and perfect forwarding unless stated otherwise.
None of the names proposed here are names that we are particularly attached to; consider the names to be reasonable placeholders that can freely be changed, should the committee want to do so.

3.2. Queries and algorithms

A query is a std::invocable that takes some set of objects (usually one) as parameters and returns facts about those objects without modifying them. Queries are usually customization point objects, but in some cases may be functions.

An algorithm is a std::invocable that takes some set of objects as parameters and causes those objects to do something. Algorithms are usually customization point objects, but in some cases may be functions.

4. Design - user side

4.1. Execution contexts describe the place of execution

An execution context is a resource that represents the place where execution will happen. This could be a concrete resource - like a specific thread pool object, or a GPU - or a more abstract one, like the current thread of execution. Execution contexts don’t need to have a representation in code; they are simply a term describing certain properties of execution of a function.

4.2. Schedulers represent execution contexts

A scheduler is a lightweight handle that represents a strategy for scheduling work onto an execution context. Since execution contexts don’t necessarily manifest in C++ code, it’s not possible to program directly against their API. A scheduler is a solution to that problem: the scheduler concept is defined by a single sender algorithm, schedule, which returns a sender that will complete on an execution context determined by the scheduler. Logic that you want to run on that context can be placed in the receiver’s completion-signalling method.

execution::scheduler auto sch = get_thread_pool().scheduler();
execution::sender auto snd = execution::schedule(sch);
// snd is a sender (see below) describing the creation of a new execution resource
// on the execution context associated with sch

Note that a particular scheduler type may provide other kinds of scheduling operations which are supported by its associated execution context. It is not limited to scheduling purely using the execution::schedule API.
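As a rough illustration of the relationship between a scheduler, its execution context, and the sender returned by schedule, consider the toy model below. All names (manual_context, manual_scheduler, schedule_sender, toy_schedule) are hypothetical, and a plain callback stands in for the receiver's completion-signalling method; none of this is the proposed API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical execution context: a queue drained by whoever calls run().
class manual_context {
 public:
  void enqueue(std::function<void()> work) { queue_.push_back(std::move(work)); }
  void run() {
    while (!queue_.empty()) {
      auto work = std::move(queue_.front());
      queue_.pop_front();
      work();
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// The scheduler is a cheap, copyable handle to the context.
struct manual_scheduler {
  manual_context* ctx;
};

// Toy sender returned by toy_schedule: starting it hands the completion
// callback to the context instead of calling it inline.
struct schedule_sender {
  manual_context* ctx;
  void start(std::function<void()> on_value) const {
    assert(ctx != nullptr);
    ctx->enqueue(std::move(on_value));
  }
};

inline schedule_sender toy_schedule(manual_scheduler sch) { return {sch.ctx}; }
```

In this model, starting the sender does not run the callback; the callback runs only when the context itself executes its queue, which is what it means for the sender to complete on that context.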

Future papers will propose additional scheduler concepts that extend scheduler to add other capabilities. For example:

A time_scheduler concept that extends scheduler to support time-based scheduling. Such a concept might provide access to schedule_after(sched, duration), schedule_at(sched, time_point) and now(sched) APIs.
Concepts that extend scheduler to support opening, reading and writing files asynchronously.
Concepts that extend scheduler to support connecting, sending data and receiving data over the network asynchronously.

4.3. Senders describe work

A sender is an object that describes work. Senders are similar to futures in existing asynchrony designs, but unlike futures, the work that is being done to arrive at the values they will send is also directly described by the sender object itself. A sender is said to send some values if a receiver connected (see § 5.3 execution::connect) to that sender will eventually receive said values.

The primary defining sender algorithm is § 5.3 execution::connect; this function, however, is not a user-facing API; it is used to facilitate communication between senders and various sender algorithms, but end user code is not expected to invoke it directly.

The way user code is expected to interact with senders is by using sender algorithms. This paper proposes an initial set of such sender algorithms, which are described in § 4.4 Senders are composable through sender algorithms, § 4.12 User-facing sender factories, § 4.13 User-facing sender adaptors, and § 4.14 User-facing sender consumers. For example, here is how a user can create a new sender on a scheduler, attach a continuation to it, and then wait for execution of the continuation to complete:

execution::scheduler auto sch = get_thread_pool().scheduler();
execution::sender auto snd = execution::schedule(sch);
execution::sender auto cont = execution::then(snd, []{
    std::fstream file{ "result.txt" };
    file << compute_result;
});

this_thread::sync_wait(cont);
// at this point, cont has completed execution

4.4. Senders are composable through sender algorithms

Asynchronous programming often departs from traditional code structure and control flow that we are familiar with. A successful asynchronous framework must provide an intuitive story for composition of asynchronous work: expressing dependencies, passing objects, managing object lifetimes, etc.

The true power and utility of senders is in their composability. With senders, users can describe generic execution pipelines and graphs, and then run them on and across a variety of different schedulers. Senders are composed using sender algorithms:

sender factories, algorithms that take no senders and return a sender.
sender adaptors, algorithms that take (and potentially execution::connect) senders and return a sender.
sender consumers, algorithms that take (and potentially execution::connect) senders and do not return a sender.
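The three categories can be sketched with a deliberately simplified model in which a "sender" is any object with a run() member that performs the described work. This is an assumption made purely for illustration: the proposed design uses execution::connect and receivers instead, and toy_just, toy_then, and toy_sync_wait are hypothetical names.

```cpp
#include <cassert>
#include <utility>

// Factory: takes no sender, returns a sender of a fixed value.
template <class T>
struct just_sender {
  T value;
  T run() const { return value; }
};
template <class T>
just_sender<T> toy_just(T v) { return {std::move(v)}; }

// Adaptor: takes a sender and a function, returns a new sender.
template <class S, class F>
struct then_sender {
  S upstream;
  F fn;
  auto run() const { return fn(upstream.run()); }
};
template <class S, class F>
then_sender<S, F> toy_then(S s, F f) { return {std::move(s), std::move(f)}; }

// Consumer: takes a sender, runs it, and returns a non-sender result.
template <class S>
auto toy_sync_wait(const S& s) { return s.run(); }

// Usage: nothing runs until the consumer is called.
inline void category_demo() {
  int calls = 0;
  auto snd = toy_then(toy_just(13), [&calls](int x) { ++calls; return x + 42; });
  assert(calls == 0);                // building the chain ran no work
  assert(toy_sync_wait(snd) == 55);  // the consumer triggers the work
  assert(calls == 1);
}
```

The sketch is strictly lazy; it does not attempt to model the eager variants, completion schedulers, or the error and done signals.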

4.5. Senders can propagate completion schedulers

One of the goals of executors is to support a diverse set of execution contexts, including traditional thread pools, task and fiber frameworks (like HPX and Legion), and GPUs and other accelerators (managed by runtimes such as CUDA or SYCL). On many of these systems, not all execution agents are created equal and not all functions can be run on all execution agents. Having precise control over the execution context used for any given function call being submitted is important on such systems, and the users of standard execution facilities will expect to be able to express such requirements.

[P0443R14] was not always clear about the place of execution of any given piece of code. Precise control was present in the two-way execution API present in earlier executor designs, but it has so far been missing from the senders design. There has been a proposal ([P1897R3]) to provide a number of sender algorithms that would enforce certain rules on the places of execution of the work described by a sender, but we have found those sender algorithms to be insufficient for achieving the best performance on all platforms that are of interest to us. The implementation strategies that we are aware of result in one of the following situations:

1. trying to submit work to one execution context (such as a CPU thread pool) from another execution context (such as a GPU or a task framework), which assumes that all execution agents are as capable as a std::thread (which they aren’t).
2. forcibly interleaving two adjacent execution graph nodes that are both executing on one execution context (such as a GPU) with glue code that runs on another execution context (such as a CPU), which is prohibitively expensive for some execution contexts (such as CUDA or SYCL).
3. having to customise most or all sender algorithms to support an execution context, so that you can avoid the problems described in 1. and 2., which we believe is impractical and brittle based on months of field experience attempting this in Agency.

None of these implementation strategies are acceptable for many classes of parallel runtimes, such as task frameworks (like HPX) or accelerator runtimes (like CUDA or SYCL).

Therefore, in addition to the on sender algorithm from [P1897R3], we are proposing a way for senders to advertise what scheduler (and by extension what execution context) they will complete on. Any given sender may have completion schedulers for some or all of the signals (value, error, or done) it completes with (for more detail on the completion signals, see § 5.1 Receivers serve as glue between senders). When further work is attached to that sender by invoking sender algorithms, that work will also complete on an appropriate completion scheduler.

4.5.1. `execution::get_completion_scheduler`

get_completion_scheduler is a query that retrieves the completion scheduler for a specific completion signal from a sender. Calling get_completion_scheduler on a sender that does not have a completion scheduler for a given signal is ill-formed. If a sender advertises a completion scheduler for a signal in this way, that sender must ensure that it sends that signal on an execution agent belonging to an execution context represented by a scheduler returned from this function. See § 4.5 Senders can propagate completion schedulers for more details.

execution::scheduler auto cpu_sched = new_thread_scheduler{};
execution::scheduler auto gpu_sched = cuda::scheduler();

execution::sender auto snd0 = execution::schedule(cpu_sched);
execution::scheduler auto completion_sch0 =
  execution::get_completion_scheduler<execution::set_value_t>(snd0);
// completion_sch0 is equivalent to cpu_sched

execution::sender auto snd1 = execution::then(snd0, []{
    std::cout << "I am running on cpu_sched!\n";
});
execution::scheduler auto completion_sch1 =
  execution::get_completion_scheduler<execution::set_value_t>(snd1);
// completion_sch1 is equivalent to cpu_sched

execution::sender auto snd2 = execution::transfer(snd1, gpu_sched);
execution::sender auto snd3 = execution::then(snd2, []{
    std::cout << "I am running on gpu_sched!\n";
});
execution::scheduler auto completion_sch3 =
  execution::get_completion_scheduler<execution::set_value_t>(snd3);
// completion_sch3 is equivalent to gpu_sched
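The propagation rule illustrated above can be reduced to a small bookkeeping model: then keeps the completion scheduler of its input, while transfer replaces it. The sketch below tracks only that metadata and runs no work; named_scheduler, tracked_sender, and the toy_ functions are hypothetical names, not the proposed interfaces.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy scheduler identified by name; equality means "same execution context".
struct named_scheduler {
  std::string name;
  friend bool operator==(const named_scheduler& a, const named_scheduler& b) {
    return a.name == b.name;
  }
};

// Toy sender that only records where its value completion would happen.
struct tracked_sender {
  named_scheduler value_completes_on;
};

inline tracked_sender toy_schedule(named_scheduler sch) { return {std::move(sch)}; }

// then attaches work but leaves the completion scheduler unchanged.
template <class F>
tracked_sender toy_then(tracked_sender upstream, F) { return upstream; }

// transfer is the only operation in this model that changes it.
inline tracked_sender toy_transfer(tracked_sender, named_scheduler target) {
  return {std::move(target)};
}

inline named_scheduler toy_get_completion_scheduler(const tracked_sender& snd) {
  return snd.value_completes_on;
}

// Mirrors the cpu_sched / gpu_sched walk-through above.
inline void completion_scheduler_demo() {
  named_scheduler cpu{"cpu"}, gpu{"gpu"};
  tracked_sender snd1 = toy_then(toy_schedule(cpu), [] {});
  assert(toy_get_completion_scheduler(snd1) == cpu);
  tracked_sender snd3 = toy_then(toy_transfer(snd1, gpu), [] {});
  assert(toy_get_completion_scheduler(snd3) == gpu);
}
```

A real sender may advertise different completion schedulers for the value, error, and done signals, or none at all; this model collapses all of that into a single field.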

4.6. Execution context transitions are explicit

[P0443R14] does not contain any mechanisms for performing an execution context transition. The only sender algorithm that can create a sender that will move execution to a specific execution context is execution::schedule, which does not take an input sender. That means that there’s no way to construct sender chains that traverse different execution contexts. That capability is necessary to fulfill the promise of senders being able to replace two-way executors, which had it.

We propose that, for senders advertising their completion scheduler, all execution context transitions must be explicit; running user code anywhere but where the user defined it to run must be considered a bug.

The execution::transfer sender adaptor performs a transition from one execution context to another:

execution::scheduler auto sch1 = ...;
execution::scheduler auto sch2 = ...;

execution::sender auto snd1 = execution::schedule(sch1);
execution::sender auto then1 = execution::then(snd1, []{
    std::cout << "I am running on sch1!\n";
});

execution::sender auto snd2 = execution::transfer(then1, sch2);
execution::sender auto then2 = execution::then(snd2, []{
    std::cout << "I am running on sch2!\n";
});

this_thread::sync_wait(then2);

4.7. Senders can be either multi-shot or single-shot

Some senders may only support launching their operation a single time, while others may be repeatable and support being launched multiple times. Executing the operation may consume resources owned by the sender.

For example, a sender may contain a std::unique_ptr that it will be transferring ownership of to the operation-state returned by a call to execution::connect so that the operation has access to this resource. In such a sender, calling execution::connect consumes the sender such that after the call the input sender is no longer valid. Such a sender will also typically be move-only so that it can maintain unique ownership of that resource.

A single-shot sender can only be connected to a receiver at most once. Its implementation of execution::connect only has overloads for an rvalue-qualified sender. Callers must pass the sender as an rvalue to the call to execution::connect, indicating that the call consumes the sender.

A multi-shot sender can be connected to multiple receivers and can be launched multiple times. Multi-shot senders customise execution::connect to accept an lvalue reference to the sender. Callers can indicate that they want the sender to remain valid after the call to execution::connect by passing an lvalue reference to the sender to call these overloads. Multi-shot senders should also define overloads of execution::connect that accept rvalue-qualified senders, so that the sender can also be used in places where only a single-shot sender is required.

If the user of a sender does not require the sender to remain valid after connecting it to a receiver then it can pass an rvalue-reference to the sender to the call to execution::connect. Such usages should be able to accept either single-shot or multi-shot senders.

If the caller does wish for the sender to remain valid after the call then it can pass an lvalue-qualified sender to the call to execution::connect. Such usages will only accept multi-shot senders.

Algorithms that accept senders will typically either decay-copy an input sender and store it somewhere for later usage (for example as a data-member of the returned sender) or will immediately call execution::connect on the input sender, such as in this_thread::sync_wait or execution::start_detached.

Some multi-use sender algorithms may require that an input sender be copy-constructible but will only call execution::connect on an rvalue of each copy, which still results in effectively executing the operation multiple times. Other multi-use sender algorithms may require that the sender is move-constructible but will invoke execution::connect on an lvalue reference to the sender.

For a sender to be usable in both multi-use scenarios, it will generally be required to be both copy-constructible and lvalue-connectable.
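
The distinction can be sketched with ref-qualified member functions. The following is a toy model under simplifying assumptions, not the proposed API: `toy_single_shot`, `toy_multi_shot`, the `is_lvalue_connectable` trait and `connect_twice` are hypothetical names, and "connecting" here just hands back the value instead of producing an operation state.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Toy single-shot sender: owns a resource and gives it up when connected.
struct toy_single_shot {
    std::unique_ptr<int> resource;

    // Only rvalue-qualified: connecting consumes the sender.
    std::unique_ptr<int> connect() && { return std::move(resource); }
};

// Toy multi-shot sender: connecting an lvalue leaves the sender valid.
struct toy_multi_shot {
    int value;

    int connect() const& { return value; }  // sender stays usable
    int connect() && { return value; }      // also usable as single-shot
};

// Detects whether `s.connect()` is well-formed for an lvalue `s`.
template <class S, class = void>
struct is_lvalue_connectable : std::false_type {};

template <class S>
struct is_lvalue_connectable<
    S, std::void_t<decltype(std::declval<S&>().connect())>>
    : std::true_type {};

// A caller that needs the sender afterwards must require lvalue connection.
template <class S>
int connect_twice(S& sender) {
    static_assert(is_lvalue_connectable<S>::value,
                  "connect_twice requires a multi-shot sender");
    return sender.connect() + sender.connect();
}
```

With these definitions, `connect_twice` accepts a `toy_multi_shot` lvalue, whereas a `toy_single_shot` can only be connected through `std::move(s).connect()`.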

4.8. Senders are forkable

Any non-trivial program will eventually want to fork a chain of senders into independent streams of work, regardless of whether they are single-shot or multi-shot. For instance, an incoming event to a middleware system may be required to trigger events on more than one downstream system. This requires that we provide well defined mechanisms for making sure that connecting a sender multiple times is possible and correct.

The split sender adaptor facilitates connecting to a sender multiple times, regardless of whether it is single-shot or multi-shot:

auto some_algorithm(execution::sender auto&& input) {
    execution::sender auto multi_shot = split(input);
    // "multi_shot" is guaranteed to be multi-shot,
    // regardless of whether "input" was multi-shot or not

    return when_all(
      then(multi_shot, [] { std::cout << "First continuation\n"; }),
      then(multi_shot, [] { std::cout << "Second continuation\n"; })
    );
}

4.9. Senders are joinable

Just as it’s hard to write a complex program that will not eventually want to fork sender chains into independent streams, it’s also hard to write a program that does not eventually want to create join nodes, where multiple independent streams of execution are merged into a single one in an asynchronous fashion.

when_all is a sender adaptor that returns a sender that completes when the last of the input senders completes. It sends a pack of values, where the elements of said pack are the values sent by the input senders, in order. The sender returned by when_all has no completion schedulers.

transfer_when_all accepts an additional scheduler argument. It returns a sender whose value completion scheduler is the scheduler provided as an argument, but otherwise behaves the same as when_all. You can think of it as a composition of transfer(when_all(inputs...), scheduler), but one that allows for better efficiency through customization.
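
The "pack of values, in order" behaviour can be illustrated with a deliberately simplified model. In this sketch, a toy sender is assumed to be just a callable that returns a tuple of the values it would send, and the hypothetical `toy_when_all` concatenates those tuples; it runs its inputs one after another and involves no asynchrony, unlike the proposed algorithm.

```cpp
#include <string>
#include <tuple>

// Toy join: each "sender" is a callable returning a tuple of its values.
// The result is a callable whose tuple holds all values, in argument order.
template <class... Senders>
auto toy_when_all(Senders... senders) {
    return [senders...] {
        // tuple_cat preserves the positional order of its arguments.
        return std::tuple_cat(senders()...);
    };
}

// Two example inputs, mirroring "sends_1" and "sends_abc".
inline auto toy_sends_1() {
    return [] { return std::make_tuple(1); };
}

inline auto toy_sends_abc() {
    return [] { return std::make_tuple(std::string("abc")); };
}
```

Calling `toy_when_all(toy_sends_1(), toy_sends_abc())()` yields a `std::tuple<int, std::string>` holding the values of both inputs.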

4.10. Schedulers advertise their forward progress guarantees

To decide whether a scheduler (and its associated execution context) is sufficient for a specific task, it may be necessary to know what kind of forward progress guarantees it provides for the execution agents it creates. The C++ Standard defines the following forward progress guarantees:

concurrent, which requires that a thread makes progress eventually;
parallel, which requires that a thread makes progress once it executes a step; and
weakly parallel, which does not require that the thread makes progress.

This paper introduces a scheduler query function, get_forward_progress_guarantee, which returns one of the enumerators of a new enum type, forward_progress_guarantee. Each enumerator of forward_progress_guarantee corresponds to one of the aforementioned guarantees.
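
As a sketch of how such a query might be used to check whether a scheduler is sufficient for a task, consider the toy model below. Everything in it is an assumption made for illustration: the enumerator order (chosen weakest to strongest so that enumerators can be compared), the member-function spelling of the query, and the guarantees reported by the two hypothetical schedulers.

```cpp
// Hypothetical mirror of the enum described above. The order of the
// enumerators (weakest first) is a choice made for this sketch only.
enum class forward_progress_guarantee {
    weakly_parallel,
    parallel,
    concurrent
};

// Toy schedulers; the guarantees they report are assumptions.
struct toy_thread_pool_scheduler {
    forward_progress_guarantee get_forward_progress_guarantee() const {
        return forward_progress_guarantee::parallel;
    }
};

struct toy_new_thread_scheduler {
    forward_progress_guarantee get_forward_progress_guarantee() const {
        return forward_progress_guarantee::concurrent;
    }
};

// True if the scheduler's guarantee is at least as strong as `required`.
template <class Scheduler>
bool provides_at_least(const Scheduler& sch,
                       forward_progress_guarantee required) {
    return sch.get_forward_progress_guarantee() >= required;
}
```

An algorithm that needs its agents to make progress even when they block on each other could, for instance, check `provides_at_least(sch, forward_progress_guarantee::concurrent)` before choosing a strategy.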

4.11. Most sender adaptors are pipeable

To facilitate an intuitive syntax for composition, most sender adaptors are pipeable; they can be composed (piped) together with operator|. This mechanism is similar to the operator| composition that C++ range adaptors support and draws inspiration from piping in *nix shells. Pipeable sender adaptors take a sender as their first parameter and have no other sender parameters.

a | b will pass the sender a as the first argument to the pipeable sender adaptor b. Pipeable sender adaptors support partial application of the parameters after the first. For example, all of the following are equivalent:

execution::bulk(snd, N, [] (std::size_t i, auto d) {});
execution::bulk(N, [] (std::size_t i, auto d) {})(snd);
snd | execution::bulk(N, [] (std::size_t i, auto d) {});
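
One way the three spellings can be made equivalent is sketched below. This is a toy model, not the proposed wording: senders are assumed to be plain callables that return a value, and `toy::then`, `toy::then_closure` and `toy::operator|` are hypothetical names used only to show partial application and piping.

```cpp
#include <utility>

namespace toy {

// Full form: wraps "sender" `snd` so that `fn` is applied to its result.
template <class Sender, class Fn>
auto then(Sender snd, Fn fn) {
    return [snd = std::move(snd), fn = std::move(fn)]() mutable {
        return fn(snd());
    };
}

// Partially applied form: remembers `fn` and waits for the sender.
template <class Fn>
struct then_closure {
    Fn fn;

    template <class Sender>
    auto operator()(Sender snd) const {
        return then(std::move(snd), fn);
    }
};

template <class Fn>
then_closure<Fn> then(Fn fn) {
    return {std::move(fn)};
}

// a | b passes `a` as the first argument of the adaptor behind `b`.
template <class Sender, class Fn>
auto operator|(Sender snd, const then_closure<Fn>& closure) {
    return closure(std::move(snd));
}

}  // namespace toy
```

In this model, `toy::then(snd, f)`, `toy::then(f)(snd)` and `snd | toy::then(f)` all build the same wrapped callable, and closures can be chained with repeated `|`.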

Piping enables you to compose together senders with a linear syntax. Without it, you’d have to use either nested function call syntax, which would cause a syntactic inversion of the direction of control flow, or you’d have to introduce a temporary variable for each stage of the pipeline. Consider the following example where we want to execute first on a CPU thread pool, then on a CUDA GPU, then back on the CPU thread pool:

Function call (nested):

auto snd = execution::then(
             execution::transfer(
               execution::then(
                 execution::transfer(
                   execution::then(
                     execution::schedule(get_thread_pool_scheduler()),
                     []{ return 123; }),
                   cuda::new_stream_scheduler()),
                 [](int i){ return i * 5; }),
               get_thread_pool_scheduler()),
             [](int i){ return i - 5; });
auto [result] = this_thread::sync_wait(snd).value();
// result == 610

Function call (named temporaries):

auto snd0 = execution::schedule(get_thread_pool_scheduler());
auto snd1 = execution::then(snd0, []{ return 123; });
auto snd2 = execution::transfer(snd1, cuda::new_stream_scheduler());
auto snd3 = execution::then(snd2, [](int i){ return i * 5; });
auto snd4 = execution::transfer(snd3, get_thread_pool_scheduler());
auto snd5 = execution::then(snd4, [](int i){ return i - 5; });
auto [result] = *this_thread::sync_wait(snd5);
// result == 610

Pipe:

auto snd = execution::schedule(get_thread_pool_scheduler())
         | execution::then([]{ return 123; })
         | execution::transfer(cuda::new_stream_scheduler())
         | execution::then([](int i){ return i * 5; })
         | execution::transfer(get_thread_pool_scheduler())
         | execution::then([](int i){ return i - 5; });
auto [result] = this_thread::sync_wait(snd).value();
// result == 610

Certain sender adaptors are not pipeable, because using the pipeline syntax can result in confusion about the semantics of the adaptors involved. Specifically, the following sender adaptors are not pipeable:

execution::when_all and execution::when_all_with_variant: Since these sender adaptors take a variadic pack of senders, a partially applied form would be ambiguous with a non-partially-applied form with an arity of one less.
execution::on and execution::lazy_on: These sender adaptors change how the sender passed to them is executed, not what happens to its result, but allowing them in a pipeline makes them read as if they performed a function more similar to transfer.

Sender consumers could be made pipeable, but we have chosen not to do so. Since these are terminal nodes in a pipeline and nothing can be piped after them, we believe a pipe syntax would be confusing as well as unnecessary. We believe sender consumers read better with function call syntax.

4.12. User-facing sender factories

A sender factory is an algorithm that takes no senders as parameters and returns a sender.

4.12.1. `execution::schedule`

execution::sender auto schedule(
    execution::scheduler auto scheduler
);

Returns a sender describing the start of a task graph on the provided scheduler. See § 4.2 Schedulers represent execution contexts.

execution::scheduler auto sch1 = get_system_thread_pool().scheduler();

execution::sender auto snd1 = execution::schedule(sch1);
// snd1 describes the creation of a new task on the system thread pool

4.12.2. `execution::just`

execution::sender auto just(
    auto&&... values
);

Returns a sender with no completion schedulers, which sends the provided values. If a provided value is an lvalue reference, a copy is made inside the returned sender and a non-const lvalue reference to the copy is sent. If the provided value is an rvalue reference, it is moved into the returned sender and an rvalue reference to it is sent.

execution::sender auto snd1 = execution::just(3.14);
execution::sender auto then1 = execution::then(snd1, [] (double d) {
  std::cout << d << "\n";
});

execution::sender auto snd2 = execution::just(3.14, 42);
execution::sender auto then2 = execution::then(snd2, [] (double d, int i) {
  std::cout << d << ", " << i << "\n";
});

std::vector v3{1, 2, 3, 4, 5};
execution::sender auto snd3 = execution::just(v3);
execution::sender auto then3 = execution::then(snd3, [] (std::vector<int>& v3copy) {
  for (auto&& e : v3copy) { e *= 2; } return v3copy;
});
auto&& [v3copy] = this_thread::sync_wait(then3).value();
// v3 contains {1, 2, 3, 4, 5}; v3copy will contain {2, 4, 6, 8, 10}.

execution::sender auto snd4 = execution::just(std::vector{1, 2, 3, 4, 5});
execution::sender auto then4 = execution::then(snd4, [] (std::vector<int>&& v4) {
  for (auto&& e : v4) { e *= 2; }
  return std::move(v4);
});
auto&& [v4] = this_thread::sync_wait(then4).value();
// v4 contains {2, 4, 6, 8, 10}.

4.12.3. `execution::transfer_just`

execution::sender auto transfer_just(
    execution::scheduler auto scheduler,
    auto&&... values
);

Returns a sender whose value completion scheduler is the provided scheduler, which sends the provided values in the same manner as just.

execution::sender auto vals = execution::transfer_just(
    get_system_thread_pool().scheduler(),
    1, 2, 3
);
execution::sender auto snd = execution::then(vals, [](auto... args) {
    (std::cout << ... << args);
});
// when snd is executed, it will print "123"

This factory is included as it greatly simplifies lifting values into senders.

4.13. User-facing sender adaptors

A sender adaptor is an algorithm that takes one or more senders, which it may execution::connect, as parameters, and returns a sender, whose completion is related to the sender arguments it has received.

Many sender adaptors come in two versions: a strictly lazy one, which is never allowed to submit any work for execution prior to the returned sender being started later on, and a potentially eager one, which is allowed to submit work prior to the returned sender being started. Algorithms such as § 4.13.11 execution::ensure_started, § 4.14.1 execution::start_detached, and § 4.14.2 this_thread::sync_wait start senders; the implementations of non-lazy versions of the sender adaptors are allowed, but not guaranteed, to start senders.

The strictly lazy versions of the adaptors below (that is, all the versions whose names start with lazy_) are guaranteed to not start any input senders passed into them.

For a more implementer-centric description of starting senders, see § 5.5 Laziness is defined by sender adaptors.

4.13.1. `execution::transfer`

execution::sender auto transfer(
    execution::sender auto input,
    execution::scheduler auto scheduler
);

execution::sender auto lazy_transfer(
    execution::sender auto input,
    execution::scheduler auto scheduler
);

Returns a sender describing the transition from the execution agent of the input sender to the execution agent of the target scheduler. See § 4.6 Execution context transitions are explicit.

execution::scheduler auto cpu_sched = get_system_thread_pool().scheduler();
execution::scheduler auto gpu_sched = cuda::scheduler();

execution::sender auto cpu_task = execution::schedule(cpu_sched);
// cpu_task describes the creation of a new task on the system thread pool

execution::sender auto gpu_task = execution::transfer(cpu_task, gpu_sched);
// gpu_task describes the transition of the task graph described by cpu_task to the gpu

4.13.2. `execution::then`

execution::sender auto then(
    execution::sender auto input,
    std::invocable<values-sent-by(input)...> function
);

execution::sender auto lazy_then(
    execution::sender auto input,
    std::invocable<values-sent-by(input)...> function
);

then returns a sender describing the task graph described by the input sender, with an added node of invoking the provided function with the values sent by the input sender as arguments.

lazy_then is guaranteed to not begin executing function until the returned sender is started.

execution::sender auto input = get_input();
execution::sender auto snd = execution::then(input, [](auto... args) {
    (std::cout << ... << args);
});
// snd describes the work described by input
// followed by printing all of the values sent by input

This adaptor is included as it is necessary for writing any sender code that actually performs a useful function.

4.13.3. `execution::upon_*`

execution::sender auto upon_error(
    execution::sender auto input,
    std::invocable<errors-sent-by(input)...> function
);

execution::sender auto lazy_upon_error(
    execution::sender auto input,
    std::invocable<errors-sent-by(input)...> function
);

execution::sender auto upon_done(
    execution::sender auto input,
    std::invocable<> function
);

execution::sender auto lazy_upon_done(
    execution::sender auto input,
    std::invocable<> function
);

upon_error and upon_done are similar to then, but where then works with values sent by the input sender, upon_error works with errors, and upon_done is invoked when the "done" signal is sent.

4.13.4. `execution::let_*`

execution::sender auto let_value(
    execution::sender auto input,
    std::invocable<values-sent-by(input)...> function
);

execution::sender auto lazy_let_value(
    execution::sender auto input,
    std::invocable<values-sent-by(input)...> function
);

execution::sender auto let_error(
    execution::sender auto input,
    std::invocable<errors-sent-by(input)...> function
);

execution::sender auto lazy_let_error(
    execution::sender auto input,
    std::invocable<errors-sent-by(input)...> function
);

execution::sender auto let_done(
    execution::sender auto input,
    std::invocable<> function
);

execution::sender auto lazy_let_done(
    execution::sender auto input,
    std::invocable<> function
);

let_value is very similar to then: when it is started, it invokes the provided function with the values sent by the input sender as arguments. However, where the sender returned from then sends exactly what that function returns, let_value requires that the function return a sender, and the sender returned by let_value sends the values sent by the sender returned from the callback. This is similar to the notion of "future unwrapping" in future/promise-based frameworks.

lazy_let_value is guaranteed to not begin executing function until the returned sender is started.

let_error and let_done are similar to let_value, but where let_value works with values sent by the input sender, let_error works with errors, and let_done is invoked when the "done" signal is sent.
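
The difference between then and let_value can be made concrete with a small sketch. It assumes a toy model in which a sender is simply a callable returning the value it would send; `toy_just`, `toy_then` and `toy_let_value` are hypothetical helpers and involve no asynchrony.

```cpp
// Toy "sender": a callable that returns the value it would send.
inline auto toy_just(int v) {
    return [v] { return v; };
}

// then-like: sends exactly what `fn` returns.
template <class Sender, class Fn>
auto toy_then(Sender snd, Fn fn) {
    return [snd, fn] { return fn(snd()); };
}

// let_value-like: `fn` must return a toy sender; the result sends what
// that inner sender sends (the "unwrapping" step).
template <class Sender, class Fn>
auto toy_let_value(Sender snd, Fn fn) {
    return [snd, fn] {
        auto inner = fn(snd());
        return inner();
    };
}
```

If the callback returns a toy sender, `toy_then` produces a sender of senders that has to be invoked twice, whereas `toy_let_value` flattens it into a sender of the final value.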

4.13.5. `execution::on`

execution::sender auto on(
    execution::scheduler auto sched,
    execution::sender auto snd
);

execution::sender auto lazy_on(
    execution::scheduler auto sched,
    execution::sender auto snd
);

Returns a sender which, when started, will start the provided sender on an execution agent belonging to the execution context associated with the provided scheduler. This returned sender has no completion schedulers.

4.13.6. `execution::into_variant`

execution::sender auto into_variant(
    execution::sender auto snd
);

Returns a sender which sends a variant of tuples of all the possible sets of types sent by the input sender. Senders can send multiple sets of values depending on runtime conditions; this is a helper function that turns them into a single variant value.
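
To illustrate the shape of the resulting value, suppose a sender may send either an int or a (std::string, double) pair depending on runtime conditions. The sketch below models only the value type and its consumption; `toy_result`, `toy_complete` and `describe` are hypothetical names, and no actual sender is involved.

```cpp
#include <string>
#include <tuple>
#include <variant>

// Assumed result type after adapting such a sender with into_variant:
// one tuple alternative per possible set of sent values.
using toy_result =
    std::variant<std::tuple<int>, std::tuple<std::string, double>>;

// Stand-in for the adapted sender's completion.
inline toy_result toy_complete(bool first_alternative) {
    if (first_alternative) {
        return std::tuple<int>{42};
    }
    return std::tuple<std::string, double>{"pi", 3.14};
}

// The consumer receives one value and dispatches on the active alternative.
inline std::string describe(const toy_result& r) {
    struct visitor {
        std::string operator()(const std::tuple<int>& t) const {
            return "int: " + std::to_string(std::get<0>(t));
        }
        std::string operator()(
            const std::tuple<std::string, double>& t) const {
            return "string: " + std::get<0>(t);
        }
    };
    return std::visit(visitor{}, r);
}
```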

4.13.7. `execution::bulk`

execution::sender auto bulk(
    execution::sender auto input,
    std::integral auto size,
    std::invocable<decltype(size), values-sent-by(input)...> function
);

execution::sender auto lazy_bulk(
    execution::sender auto input,
    std::integral auto size,
    std::invocable<decltype(size), values-sent-by(input)...> function
);

Returns a sender describing the task of invoking the provided function with the values sent by the input sender for every index in the provided shape.

In this paper, only integral types satisfy the concept of a shape, but future papers will explore bulk shapes of different kinds in more detail.

lazy_bulk is guaranteed to not begin executing function until the returned sender is started.
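
For intuition, a sequential fallback for this behaviour might look like the sketch below. It is an assumption-laden toy: `toy_bulk` takes the values directly rather than from an input sender, runs on the calling thread, and `squares` is a hypothetical helper that exists only to exercise it.

```cpp
#include <cstddef>
#include <vector>

// Invokes fn(i, values...) for every index i in [0, shape).
template <class Shape, class Fn, class... Values>
void toy_bulk(Shape shape, Fn fn, Values&... values) {
    for (Shape i = 0; i < shape; ++i) {
        fn(i, values...);
    }
}

// Example use: fill a vector with the squares of its indices.
inline std::vector<int> squares(std::size_t n) {
    std::vector<int> out(n);
    toy_bulk(
        n,
        [](std::size_t i, std::vector<int>& v) {
            v[i] = static_cast<int>(i * i);
        },
        out);
    return out;
}
```

A scheduler for a parallel runtime would be expected to replace the loop with something that executes the indices concurrently.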

4.13.8. `execution::split`

execution::sender auto split(execution::sender auto sender);

execution::sender auto lazy_split(execution::sender auto sender);

If the provided sender is a multi-shot sender, returns that sender. Otherwise, returns a multi-shot sender which sends values equivalent to the values sent by the provided sender. See § 4.7 Senders can be either multi-shot or single-shot.

4.13.9. `execution::when_all`

execution::sender auto when_all(
    execution::sender auto ...inputs
);

execution::sender auto when_all_with_variant(
    execution::sender auto ...inputs
);

when_all returns a sender which completes once all of the input senders have completed. The values sent by this sender are the values sent by each of the inputs, in the order of the arguments passed to when_all.

when_all_with_variant does the same, but it adapts all the input senders using into_variant.

The returned sender has no completion schedulers.

when_all is a strictly lazy adaptor. It is guaranteed to not start any of the input senders until the returned sender is started.

See § 4.9 Senders are joinable.

execution::scheduler auto sched = get_thread_pool().scheduler();

execution::sender auto sends_1 = ...;
execution::sender auto sends_abc = ...;

execution::sender auto both = execution::transfer_when_all(sched,
    sends_1,
    sends_abc
);

execution::sender auto final = execution::then(both, [](auto... args){
    std::cout << std::format("the two args: {}, {}", args...);
});
// when final executes, it will print "the two args: 1, abc"

4.13.10. `execution::transfer_when_all`

execution::sender auto transfer_when_all(
    execution::scheduler auto sched,
    execution::sender auto ...inputs
);

execution::sender auto transfer_when_all_with_variant(
    execution::scheduler auto sched,
    execution::sender auto ...inputs
);

execution::sender auto lazy_transfer_when_all(
    execution::scheduler auto sched,
    execution::sender auto ...inputs
);

execution::sender auto lazy_transfer_when_all_with_variant(
    execution::scheduler auto sched,
    execution::sender auto ...inputs
);

Similar to § 4.13.9 execution::when_all, but returns a sender whose value completion scheduler is the provided scheduler.

See § 4.9 Senders are joinable.

4.13.11. `execution::ensure_started`

execution::sender auto ensure_started(
    execution::sender auto sender
);

Once ensure_started returns, it is known that the provided sender has been connected and start has been called on the resulting operation state (see § 5.2 Operation states represent work); in other words, the work described by the provided sender has been submitted for execution on the appropriate execution contexts. Returns a sender which completes when the provided sender completes and sends values equivalent to those of the provided sender.

4.14. User-facing sender consumers

A sender consumer is an algorithm that takes one or more senders, which it may execution::connect, as parameters, and does not return a sender.

4.14.1. `execution::start_detached`

void start_detached(
    execution::sender auto sender
);

Like ensure_started, but does not return a value; if the provided sender sends an error instead of a value, std::terminate is called.

4.14.2. `this_thread::sync_wait`

auto sync_wait(
    execution::sender auto sender
) requires (always-sends-same-values(sender))
    -> std::optional<std::tuple<values-sent-by(sender)>>;

this_thread::sync_wait is a sender consumer that submits the work described by the provided sender for execution, similarly to ensure_started, except that it blocks the current std::thread or thread of main until the work is completed, and returns an optional tuple of values that were sent by the provided sender on its completion of work. Where § 4.12.1 execution::schedule and § 4.12.3 execution::transfer_just are meant to enter the domain of senders, sync_wait is meant to exit the domain of senders, retrieving the result of the task graph.

If the provided sender sends an error instead of values, sync_wait throws that error as an exception, or rethrows the original exception if the error is of type std::exception_ptr.

If the provided sender sends the "done" signal instead of values, sync_wait returns an empty optional.
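
From the caller's perspective, the three outcomes described above can be handled as in the following sketch. It does not use the proposed function: `toy_signal`, `toy_sync_wait` and `value_or_fallback` are hypothetical stand-ins that only reproduce the described return and exception behaviour for a sender of a single int.

```cpp
#include <optional>
#include <stdexcept>
#include <tuple>

// Which completion signal the imagined sender ends with.
enum class toy_signal { value, error, done };

// Stand-in for sync_wait on a sender that would complete with `signal`.
inline std::optional<std::tuple<int>> toy_sync_wait(toy_signal signal) {
    switch (signal) {
    case toy_signal::value:
        return std::make_tuple(42);  // values arrive as an engaged optional
    case toy_signal::error:
        throw std::runtime_error("operation failed");  // errors are thrown
    case toy_signal::done:
        break;
    }
    return std::nullopt;  // "done" yields an empty optional
}

// Caller-side handling of all three outcomes.
inline int value_or_fallback(toy_signal signal, int fallback) {
    try {
        if (auto result = toy_sync_wait(signal)) {
            auto [value] = *result;
            return value;
        }
        return fallback;  // the work was stopped before producing a value
    } catch (const std::runtime_error&) {
        return -1;  // the error surfaced as an exception
    }
}
```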

For an explanation of the requires clause, see § 5.8 Most senders are typed. That section also explains another sender consumer, built on top of sync_wait: sync_wait_with_variant.

Note: This function is specified inside std::this_thread, and not inside execution. This is because sync_wait has to block the current execution agent, but determining what the current execution agent is is not reliable. Since the standard does not specify any functions on the current execution agent other than those in std::this_thread, this is the flavor of this function that is being proposed. If C++ ever obtains fibers, for instance, we expect that a variant of this function called std::this_fiber::sync_wait would be provided. We also expect that runtimes with execution agents that use different synchronization mechanisms than std::thread's will provide their own flavors of sync_wait as well (assuming their execution agents have the means to block in a non-deadlock manner).

4.15. `execution::execute`

In addition to the three categories of functions presented above, we also propose to include a convenience function for fire-and-forget eager one-way submission of an invocable to a scheduler, to fulfil the role of one-way executors from P0443.

void execution::execute(
    execution::scheduler auto sched,
    std::invocable auto fn
);

Submits the provided function for execution on the provided scheduler, as-if by:

auto snd = execution::schedule(sched);
auto work = execution::then(snd, fn);
execution::start_detached(work);

5. Design - implementer side

5.1. Receivers serve as glue between senders

A receiver is a callback that supports more than one channel. In fact, it supports three of them:

set_value, which is the moral equivalent of an operator() or a function call, which signals successful completion of the operation its execution depends on;
set_error, which signals that an error has happened during scheduling of the current work, executing the current work, or at some earlier point in the sender chain; and
set_done, which signals that the operation completed without succeeding (set_value) and without failing (set_error). This result is often used to indicate that the operation stopped early, typically because it was asked to do so because the result is no longer needed.

Exactly one of these channels must be successfully (i.e. without an exception being thrown) invoked on a receiver before it is destroyed; if a call to set_value failed with an exception, either set_error or set_done must be invoked on the same receiver. These requirements are known as the receiver contract.
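
The fallback from a throwing set_value to set_error can be sketched as follows. This is a toy model with member functions rather than the proposed customization points; `toy_receiver` and `toy_complete_with` are hypothetical names chosen for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Toy receiver that records which channel was signalled.
struct toy_receiver {
    std::string* log;
    bool throw_on_value;

    void set_value(int v) {
        if (throw_on_value) {
            throw std::runtime_error("set_value failed");
        }
        *log += "value:" + std::to_string(v);
    }
    void set_error(std::exception_ptr) noexcept { *log += "error"; }
    void set_done() noexcept { *log += "done"; }
};

// Completes `rcv` exactly once: if set_value throws, the exception is
// routed to set_error so that a channel is still successfully invoked.
template <class Receiver>
void toy_complete_with(Receiver& rcv, int v) {
    try {
        rcv.set_value(v);
    } catch (...) {
        rcv.set_error(std::current_exception());
    }
}
```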

While the receiver interface may look novel, it is in fact very similar to the interface of std::promise, which provides the first two signals as set_value and set_exception, and it’s possible to emulate the third channel with lifetime management of the promise.

Receivers are not a part of the end-user-facing API of this proposal; they are necessary to allow unrelated senders to communicate with each other, but the only users who will interact with receivers directly are authors of senders.

Receivers are what is passed as the second argument to § 5.3 execution::connect.

5.2. Operation states represent work

An operation state is an object that represents work. Unlike senders, it is not a chaining mechanism; instead, it is a concrete object that packages the work described by a full sender chain, ready to be executed. An operation state is neither movable nor copyable, and its interface consists of a single algorithm: start, which serves as the submission point of the work represented by a given operation state.

Operation states are not a part of the user-facing API of this proposal; they are necessary for implementing sender consumers like execution::ensure_started and this_thread::sync_wait, and the knowledge of them is necessary to implement senders, so the only users who will interact with operation states directly are authors of senders and authors of sender algorithms.

The return value of § 5.3 execution::connect must satisfy the operation state concept.

5.3. `execution::connect`

execution::connect is a customization point which connects senders with receivers, resulting in an operation state that will ensure that the receiver contract of the receiver passed to connect will be fulfilled.

execution::sender auto snd = /* some input sender */;
execution::receiver auto rcv = /* some receiver */;
execution::operation_state auto state = execution::connect(snd, rcv);

execution::start(state);
// at this point, it is guaranteed that the work represented by state has been submitted
// to an execution context, and that execution context will eventually fulfill the
// receiver contract of rcv

// operation states are not movable, and therefore this operation state object must be
// kept alive until the operation finishes
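
A minimal toy version of this connect/start sequence is sketched below. It uses member functions instead of the proposed customization points, completes synchronously inside start, and all names (`toy_just_sender`, `toy_just_operation`, `toy_store_receiver`, `toy_run`) are hypothetical.

```cpp
#include <type_traits>
#include <utility>

// Toy operation state: holds the value and the receiver; not movable.
template <class Receiver>
struct toy_just_operation {
    int value;
    Receiver receiver;

    toy_just_operation(int v, Receiver r)
        : value(v), receiver(std::move(r)) {}
    toy_just_operation(toy_just_operation&&) = delete;

    // Submission point: in this sketch the work completes inline.
    void start() noexcept { receiver.set_value(value); }
};

// Toy sender: connecting packages the work but does not run it.
struct toy_just_sender {
    int value;

    template <class Receiver>
    toy_just_operation<Receiver> connect(Receiver r) const {
        // Returned as a prvalue, so no move of the operation state occurs.
        return toy_just_operation<Receiver>(value, std::move(r));
    }
};

// Toy receiver that writes the value to a caller-provided location.
struct toy_store_receiver {
    int* out;
    void set_value(int v) noexcept { *out = v; }
};

inline int toy_run(toy_just_sender snd) {
    int result = 0;
    auto op = snd.connect(toy_store_receiver{&result});  // nothing runs yet
    op.start();  // op must stay alive until the receiver is completed
    return result;
}
```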

5.4. Sender algorithms are customizable

Senders being able to advertise what their completion schedulers are fulfills one of the promises of senders: that of being able to customize an implementation of a sender algorithm based on what scheduler any work it depends on will complete on.

The simple way to provide customizations for functions like then, that is for sender adaptors and sender consumers, is to follow the customization scheme that has been adopted for the C++20 ranges library; to do that, we would define the expression execution::then(sender, invocable) to be equivalent to:

sender.then(invocable), if that expression is well formed; otherwise
then(sender, invocable), performed in a context where this call always performs ADL, if that expression is well formed; otherwise
a default implementation of then, which returns a sender adaptor, and then define the exact semantics of said adaptor.

However, this definition is problematic. Imagine another sender adaptor, bulk, which is a structured abstraction for a loop over an index space. Its default implementation is just a for loop. However, for accelerator runtimes like CUDA, we would like sender algorithms like bulk to have specialized behavior, which invokes a kernel of more than one thread (with its size defined by the call to bulk); therefore, we would like to customize bulk for CUDA senders to achieve this. However, there’s no reason for CUDA kernels to necessarily customize the then sender adaptor, as the generic implementation is perfectly sufficient. This creates a problem, though; consider the following snippet:

execution::scheduler auto cuda_sch = cuda_scheduler{};

execution::sender auto initial = execution::schedule(cuda_sch);
// the type of initial is a type defined by the cuda_scheduler
// let’s call it cuda::schedule_sender<>

execution::sender auto next = execution::then(initial, []{ return 1; });
// the type of next is a standard-library implementation-defined sender adaptor
// that wraps the cuda sender
// let’s call it execution::then_sender_adaptor<cuda::schedule_sender<>>

execution::sender auto kernel_sender = execution::bulk(next, shape, [](int i){ ... });

How can we specialize the bulk sender adaptor for our wrapped schedule_sender? Well, here’s one possible approach, taking advantage of ADL (and the fact that the definition of "associated namespace" also recursively enumerates the associated namespaces of all template parameters of a type):

namespace cuda::for_adl_purposes {
template<typename... SentValues>
class schedule_sender {
    execution::operation_state auto connect(execution::receiver auto rcv);
    execution::scheduler auto get_completion_scheduler() const;
};

execution::sender auto bulk(
    execution::sender auto && input,
    execution::shape auto && shape,
    invocable<sender-values(input)> auto && fn)
{
    // return a cuda sender representing a bulk kernel launch
}
} // namespace cuda::for_adl_purposes

However, suppose the input sender is not just a then_sender_adaptor as in the example above, but another sender that overrides bulk itself, as a member function, because its author believes they know an optimization for bulk. The specialization above will then no longer be selected, because a member function of the first argument is a better match than the ADL-found overload.

This means that well-meant specialization of sender algorithms that are entirely scheduler-agnostic can have negative consequences. The scheduler-specific specialization - which is essential for good performance on platforms providing specialized ways to launch certain sender algorithms - would not be selected in such cases. But it’s really the scheduler that should control the behavior of sender algorithms when a non-default implementation exists, not the sender. Senders merely describe work; schedulers, however, are the handle to the runtime that will eventually execute said work, and should thus have the final say in how the work is going to be executed.

Therefore, we are proposing the following customization scheme (also modified to take § 5.9 Ranges-style CPOs vs tag_invoke into account): the expression execution::<sender-algorithm>(sender, args...), for any given sender algorithm that accepts a sender as its first argument, should be equivalent to:

tag_invoke(<sender-algorithm>, get_completion_scheduler<Signal>(sender), sender, args...), if that expression is well-formed; otherwise
tag_invoke(<sender-algorithm>, sender, args...), if that expression is well-formed; otherwise
a default implementation, if there exists a default implementation of the given sender algorithm.

where Signal is one of set_value, set_error, or set_done; for most sender algorithms, the completion scheduler for set_value would be used, but for some (like upon_error or let_done), one of the others would be used.

For sender algorithms which accept concepts other than sender as their first argument, we propose that the customization scheme remains as it has been in [P0443R14] so far, except it should also use tag_invoke.

5.5. Laziness is defined by sender adaptors

We distinguish two different guarantees about when work is submitted to an execution context:

strictly lazy submission, which means that there is a guarantee that no work is submitted to an execution context before a receiver is connected to a sender, and execution::start is called on the resulting operation state;
potentially eager submission, which means that work may be submitted to an execution context as soon as all the information necessary to perform it is provided.

If a sender adaptor requires potentially eager submission, strictly lazy submission is acceptable as an implementation, because it does fulfill the potentially eager guarantee. This is why the default implementations for the non-strictly-lazy sender adaptors are specified to dispatch to the strictly lazy ones; for an author of a specific sender, it is sufficient to specialize the strictly lazy version, to also achieve a specialization of the potentially eager one.

As has been described in § 4.13 User-facing sender adaptors, whether a sender adaptor is guaranteed to perform strictly lazy submission or not is defined by the adaptor used to perform it; the adaptors whose names begin with lazy_ provide the strictly lazy guarantee.
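
To make the strictly lazy guarantee concrete, here is a minimal sketch of a lazy chain. It is deliberately simplified and is not the proposed API: it uses plain member functions named connect, start, and set_value instead of customization point objects, it models only the value channel with a single int, and every type in it (just_sender, then_sender, store_receiver, and so on) is hypothetical.

```cpp
#include <cassert>
#include <utility>

namespace demo {
// Receiver (value channel only): stores the int it is sent.
struct store_receiver {
  int* out;
  void set_value(int v) { *out = v; }
};

// Operation state of just_sender: nothing happens until start() is called.
template <class R>
struct just_op {
  int value;
  R rcv;
  void start() { rcv.set_value(value); }
};

// Sender that describes "send this value"; it performs no work by itself.
struct just_sender {
  int value;
  template <class R>
  just_op<R> connect(R rcv) const {
    return {value, std::move(rcv)};
  }
};

// Receiver that applies fn to the incoming value and passes the result on.
template <class F, class R>
struct then_receiver {
  F fn;
  R rcv;
  void set_value(int v) { rcv.set_value(fn(v)); }
};

// Sender adaptor: describes "run snd, then apply fn".
template <class S, class F>
struct then_sender {
  S snd;
  F fn;
  template <class R>
  auto connect(R rcv) const {
    return snd.connect(then_receiver<F, R>{fn, std::move(rcv)});
  }
};

template <class S, class F>
then_sender<S, F> lazy_then(S snd, F fn) {
  return {std::move(snd), std::move(fn)};
}

struct trace {
  int calls_before_start;
  int calls_after_start;
  int result;
};

inline trace run_lazy_chain() {
  int calls = 0;
  int result = 0;
  auto snd = lazy_then(just_sender{20}, [&calls](int v) { ++calls; return v + 1; });
  assert(calls == 0);  // building the chain runs nothing
  auto op = snd.connect(store_receiver{&result});
  const int before = calls;  // connecting a receiver runs nothing either
  op.start();                // only now is the function invoked
  return {before, calls, result};
}
}  // namespace demo
```

Because the whole chain is a set of nested, statically typed calls that only run inside start, a compiler can see through it, which is the basis for the optimization opportunities discussed in the next section.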

5.6. Lazy senders provide optimization opportunities

Because lazy senders fundamentally describe work, instead of describing or representing the submission of said work to an execution context, and thanks to the flexibility of the customization of most sender algorithms, they provide an opportunity for fusing multiple algorithms in a sender chain together, into a single function that can later be submitted for execution by an execution context. There are two ways this can happen.

The first (and most common) way for such optimizations to happen is thanks to the structure of the implementation: because all the work is done within callbacks invoked on the completion of an earlier sender, recursively up to the original source of computation, the compiler is able to see a chain of work described using senders as a tree of tail calls, allowing for inlining and removal of most of the sender machinery. In fact, when work is not submitted to execution contexts outside of the current thread of execution, compilers are capable of removing the senders abstraction entirely, while still allowing for composition of functions across different parts of a program.

The second way for this to occur is when a sender algorithm is specialized for a specific set of arguments. For instance, we expect that, for senders which are known to have been started already, § 4.13.11 execution::ensure_started will be an identity transformation, because the sender algorithm will be specialized for such senders. Similarly, an implementation could recognize two subsequent lazy § 4.13.7 execution::bulks of compatible shapes, and merge them together into a single submission of a GPU kernel.

5.7. Execution context transitions are two-step

Because execution::transfer takes a sender as its first argument, it is not actually directly customizable by the target scheduler. This is by design: the target scheduler may not know how to transition from a scheduler such as a CUDA scheduler; transitioning away from a GPU in an efficient manner requires making runtime calls that are specific to the GPU in question, and the same is usually true for other kinds of accelerators too (or for schedulers running on remote systems). To avoid this problem, specialized schedulers like the ones mentioned here can still hook into the transition mechanism, and inject a sender which will perform a transition to the regular CPU execution context, so that any sender can be attached to it.

This, however, is a problem: because customization of sender algorithms must be controlled by the scheduler they will run on (see § 5.4 Sender algorithms are customizable), the type of the sender returned from transfer must be controllable by the target scheduler. Besides, the target scheduler may itself represent a specialized execution context, which requires additional work to be performed to transition to it. GPUs and remote node schedulers are once again good examples of such schedulers: executing code on their execution contexts requires making runtime API calls for work submission, and quite possibly for the data movement of the values being sent by the input sender passed into transfer.

To allow for such customization from both ends, we propose the inclusion of a secondary transitioning sender adaptor, called schedule_from. This adaptor is a form of schedule, but takes an additional, second argument: the input sender. This adaptor is not meant to be invoked manually by the end users; they are always supposed to invoke transfer, to ensure that both schedulers have a say in how the transitions are made. Any scheduler that specializes transfer(snd, sch) shall ensure that the return value of their customization is equivalent to schedule_from(sch, snd2), where snd2 is a successor of snd that sends values equivalent to those sent by snd.

The default implementation of transfer(snd, sched) is schedule_from(sched, snd).

5.8. Most senders are typed

All senders should advertise the types they will send when they complete. This is necessary for a number of features, and it is hard to write common sender adaptors such as execution::then in a way that is agnostic of whether an input sender is typed or not.

The mechanism for this advertisement is the same as in [P0443R14]; the way to query the types is through sender_traits::value_types<tuple_like, variant_like>.

sender_traits::value_types is a template that takes two arguments: one is a tuple-like template, the other is a variant-like template. The tuple-like argument is required to represent senders sending more than one value (such as when_all). The variant-like argument is required to represent senders that choose which specific values to send at runtime.

There’s a choice made in the specification of § 4.14.2 this_thread::sync_wait: it returns a tuple of values sent by the sender passed to it, wrapped in std::optional to handle the set_done signal. However, this assumes that those values can be represented as a tuple, like here:

execution::sender auto sends_1 = ...;
execution::sender auto sends_2 = ...;
execution::sender auto sends_3 = ...;

auto [a, b, c] = this_thread::sync_wait(
    execution::transfer_when_all(
        execution::get_completion_scheduler<execution::set_value_t>(sends_1),
        sends_1,
        sends_2,
        sends_3
    )).value();
// a == 1
// b == 2
// c == 3

This works well for senders that always send the same set of arguments. If we ignore the possibility of having a sender that sends different sets of arguments into a receiver, we can specify the "canonical" (i.e. required to be followed by all senders) form of value_types of a sender which sends Types... to be as follows:

template<template<typename ...> typename TupleLike>
using value_types = TupleLike<Types...>;

If senders could only ever send one specific set of values, this would probably need to be the required form of value_types for all senders; defining it otherwise would cause very weird results and should be considered a bug.

This matter is somewhat complicated by the fact that (1) set_value for receivers can be overloaded and accept different sets of arguments, and (2) senders are allowed to send multiple different sets of values, depending on runtime conditions, the data they consumed, and so on. To accommodate this, [P0443R14] also includes a second template parameter to value_types, one that represents a variant-like type. If we permit such senders, we would almost certainly need to require that the canonical form of value_types for all senders sending the different sets of arguments Types1..., Types2..., ..., TypesN... be as follows (to ensure consistency in how they are handled, and to avoid accidentally interpreting a user-provided variant as a sender-provided one):

template<
    template<typename ...> typename TupleLike,
    template<typename ...> typename VariantLike
>
using value_types = VariantLike<
    TupleLike<Types1...>,
    TupleLike<Types2...>,
    ...,
    TupleLike<TypesN...>
>;

This, however, introduces a couple of complications:

1. A just(1) sender would also need to follow this structure, so the correct type for storing the value sent by it would be std::variant<std::tuple<int>> or some such. This introduces a lot of compile time overhead for the simplest senders, and this overhead effectively exists in all places in the code where value_types is queried, regardless of the tuple-like and variant-like templates passed to it. Such overhead does exist if only the tuple-like parameter exists, but is made much worse by adding this second wrapping layer.
2. As a consequence of (1): because sync_wait needs to store the above type, it can no longer return just a std::tuple<int> for just(1); it has to return std::variant<std::tuple<int>>. C++ currently does not have an easy way to destructure this; it may get less awkward with pattern matching, but even then it seems extremely heavyweight to involve variants in this API, and for the purpose of generic code, the kind of the return type of sync_wait must be the same across all sender types.
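
The canonical two-level form can be illustrated with a small sketch. The two sender types below are hypothetical and carry only the value_types alias (they are not usable senders); type_list is likewise a stand-in introduced for the example. The sketch shows that even a sender that always sends a single int ends up wrapped in a one-alternative variant.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <variant>

namespace demo {
// Hypothetical sender that always sends one int, like just(1).
struct just_int_sender {
  template <template <class...> class Tuple, template <class...> class Variant>
  using value_types = Variant<Tuple<int>>;
};

// Hypothetical sender that sends either (int) or (double, std::string),
// depending on a runtime condition.
struct either_sender {
  template <template <class...> class Tuple, template <class...> class Variant>
  using value_types = Variant<Tuple<int>, Tuple<double, std::string>>;
};

// Any variadic templates can be plugged in, not only std::tuple and std::variant.
template <class... Ts>
struct type_list {};

static_assert(std::is_same_v<
    just_int_sender::value_types<std::tuple, std::variant>,
    std::variant<std::tuple<int>>>);

static_assert(std::is_same_v<
    either_sender::value_types<std::tuple, std::variant>,
    std::variant<std::tuple<int>, std::tuple<double, std::string>>>);

static_assert(std::is_same_v<
    either_sender::value_types<type_list, type_list>,
    type_list<type_list<int>, type_list<double, std::string>>>);
}  // namespace demo
```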

One possible solution to (2) above is to place a requirement on sync_wait that it can only accept senders which send only a single set of values, therefore removing the need for std::variant to appear in its API; because of this, we propose to expose both sync_wait, which is a simple, user-friendly version of the sender consumer, but requires that value_types have only one possible variant, and sync_wait_with_variant, which accepts any sender, but returns an optional whose value type is the variant of all the possible tuples sent by the input sender:

auto sync_wait_with_variant(
    execution::sender auto sender
) -> std::optional<std::variant<
        std::tuple<values₀-sent-by(sender)>,
        std::tuple<values₁-sent-by(sender)>,
        ...,
        std::tuple<values_n-sent-by(sender)>
    >>;

auto sync_wait(
    execution::sender auto sender
) requires (always-sends-same-values(sender))
    -> std::optional<std::tuple<values-sent-by(sender)>>;
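
The difference in ergonomics between the two return shapes can be sketched with standard types only. The functions fake_sync_wait and fake_sync_wait_with_variant below are hypothetical stand-ins that merely return values of the proposed shapes; they do not wait on any sender.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <tuple>
#include <variant>

namespace demo {
// Shape of the result for a sender that always sends a single int.
inline std::optional<std::tuple<int>> fake_sync_wait() {
  return std::tuple<int>{1};
}

// Shape of the result for a sender that sends (int) or (double, std::string).
using either = std::variant<std::tuple<int>, std::tuple<double, std::string>>;

inline std::optional<either> fake_sync_wait_with_variant(bool first) {
  if (first) return either{std::tuple<int>{1}};
  return either{std::tuple<double, std::string>{2.5, "two"}};
}

// The simple form destructures directly with a structured binding.
inline int use_simple() {
  auto [a] = fake_sync_wait().value();
  return a;
}

// The variant form needs a visitor before the tuple can be inspected.
inline std::string use_variant(bool first) {
  auto result = fake_sync_wait_with_variant(first).value();
  return std::visit(
      [](const auto& tup) {
        return std::apply(
            [](const auto&... vs) {
              return std::to_string(sizeof...(vs)) + " value(s)";
            },
            tup);
      },
      result);
}
}  // namespace demo
```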

5.9. Ranges-style CPOs vs `tag_invoke`

The contemporary technique for customization in the Standard Library is customization point objects. A customization point object looks for member functions and then for nonmember functions with the same name as the customization point, and calls those if they match. This is the technique used by the C++20 ranges library, and previous executors proposals ([P0443R14] and [P1897R3]) intended to use it as well. However, it has several unfortunate consequences:

It does not allow for easy propagation of customization points unknown to the adaptor to a wrapped object, which makes writing universal adapter types much harder - and this proposal uses quite a lot of those.
It effectively reserves names globally. Because neither member names nor ADL-found functions can be qualified with a namespace, every customization point object that uses the ranges scheme reserves the name for all types in all namespaces. This is unfortunate not only due to the sheer number of customization points already in the paper, but also due to the ones that we are envisioning in the future. It’s also a big problem for one of the operations being proposed already: sync_wait. We imagine that if, in the future, C++ were to gain fiber support, we would want to also have std::this_fiber::sync_wait, in addition to std::this_thread::sync_wait. However, because we would want the names to be the same in both cases, we would need to make the names of the customizations not match the names of the customization points. This is undesirable.

This paper proposes to instead use the mechanism described in [P1895R0]: tag_invoke; the wording for tag_invoke has been incorporated into the proposed specification in this paper.

In short, instead of using globally reserved names, tag_invoke uses the type of the customization point object itself as the mechanism to find customizations. It globally reserves only a single name - tag_invoke - which itself is used the same way that ranges-style customization points are used. All other customization points are defined in terms of tag_invoke. For example, the customization for std::this_thread::sync_wait(s) will call tag_invoke(std::this_thread::sync_wait, s), instead of attempting to invoke s.sync_wait(), and then sync_wait(s) if the member call is not valid.

Using tag_invoke has the following benefits:

It reserves only a single global name, instead of reserving a global name for every customization point object we define.

It is possible to propagate customizations to a subobject, because the information of which customization point is being resolved is in the type of an argument, and not in the name of the function:

// forward most customizations to a subobject
template<typename Tag, typename ...Args>
friend auto tag_invoke(Tag && tag, wrapper & self, Args &&... args) {
    return std::forward<Tag>(tag)(self.subobject, std::forward<Args>(args)...);
}

// but override one of them with a specific value
friend auto tag_invoke(specific_customization_point_t, wrapper & self) {
    return self.some_value;
}

It is possible to pass those as template arguments to types, because the information of which customization point is being resolved is in the type. Similarly to how [P0443R14] defines a polymorphic executor wrapper which accepts a list of properties it supports, we can imagine scheduler and sender wrappers that accept a list of queries and operations they support. That list can contain the types of the customization point objects, and the polymorphic wrappers can then specialize those customization points on themselves using tag_invoke, dispatching to manually constructed vtables containing pointers to specialized implementations for the wrapped objects. For an example of such a polymorphic wrapper, see unifex::any_unique (example).

6. Specification

Much of this wording follows the wording of [P0443R14].

§ 7 General utilities library [utilities] is meant to be a diff relative to the wording of the [utilities] clause of [N4885]. This diff applies changes from [P1895R0].

§ 8 Thread support library [thread] is meant to be a diff relative to the wording of the [thread] clause of [N4885]. This diff applies changes from [P2175R0].

§ 9 Execution control library [execution] is meant to be added as a new library clause to the working draft of C++.

7. General utilities library [utilities]

7.1. Function objects [function.objects]

7.1.1. Header `<functional>` synopsis [functional.syn]

At the end of this subclause, insert the following declarations into the synopsis within namespace std:

// [func.tag_invoke], tag_invoke
inline namespace unspecified {
  inline constexpr unspecified tag_invoke = unspecified;
}

template<auto& Tag>
  using tag_t = decay_t<decltype(Tag)>;

template<class Tag, class... Args>
  concept tag_invocable =
    invocable<decltype(tag_invoke), Tag, Args...>;

template<class Tag, class... Args>
  concept nothrow_tag_invocable =
    tag_invocable<Tag, Args...> &&
    is_nothrow_invocable_v<decltype(tag_invoke), Tag, Args...>;

template<class Tag, class... Args>
  using tag_invoke_result = invoke_result<decltype(tag_invoke), Tag, Args...>;

template<class Tag, class... Args>
  using tag_invoke_result_t = invoke_result_t<decltype(tag_invoke), Tag, Args...>;

7.1.2. `std::tag_invoke` [func.tag_invoke]

Insert this section as a new subclause, between Searchers [func.search] and Class template hash [unord.hash].

The name std::tag_invoke denotes a customization point object. For some subexpressions tag and args..., tag_invoke(tag, args...) is expression-equivalent to an unqualified call to tag_invoke(decay-copy(tag), args...) with overload resolution performed in a context that includes the declaration:
void tag_invoke();
and that does not include the std::tag_invoke name.

8. Thread support library [thread]

Note: The specification in this section is incomplete; it does not provide an API specification for the new types added into <stop_token>. For a less formal specification of the missing pieces, see the "Proposed Changes" section of [P2175R0]. A future revision of this paper will contain a full specification for the new types.

8.1. Stop tokens [thread.stoptoken]

8.1.1. Header `<stop_token>` synopsis [thread.stoptoken.syn]

At the beginning of this subclause, insert the following declarations into the synopsis within namespace std:

template<template<typename> class>
  struct check-type-alias-exists; // exposition-only

template<typename T>
  concept stoppable_token = see-below;

template<typename T, typename CB, typename Initializer = CB>
  concept stoppable_token_for = see-below;

template<typename T>
  concept unstoppable_token = see-below;

At the end of this subclause, insert the following declarations into the synopsis within namespace std:

// [stoptoken.never], class never_stop_token
class never_stop_token;

// [stoptoken.inplace], class in_place_stop_token
class in_place_stop_token;

// [stopsource.inplace], class in_place_stop_source
class in_place_stop_source;

// [stopcallback.inplace], class template in_place_stop_callback
template<typename Callback>
class in_place_stop_callback;

8.1.2. Stop token concepts [thread.stoptoken.concepts]

Insert this section as a new subclause between Header <stop_token> synopsis [thread.stoptoken.syn] and Class stop_token [stoptoken].

The stoppable_token concept checks for the basic interface of a “stop token” which is copyable and allows polling to see if stop has been requested and also whether a stop request is possible. It also requires an associated nested template-type-alias, T::callback_type<CB>, that identifies the stop-callback type to use to register a callback to be executed if a stop-request is ever made on a stoppable_token of type, T. The stoppable_token_for concept checks for a stop token type compatible with a given callback type. The unstoppable_token concept checks for a stop token type that does not allow stopping.
template<typename T>
  concept stoppable_token =
    copy_constructible<T> &&
    move_constructible<T> &&
    is_nothrow_copy_constructible_v<T> &&
    is_nothrow_move_constructible_v<T> &&
    equality_comparable<T> &&
    requires (const T& token) {
      { token.stop_requested() } noexcept -> boolean-testable;
      { token.stop_possible() } noexcept -> boolean-testable;
      typename check-type-alias-exists<T::template callback_type>;
    };

template<typename T, typename CB, typename Initializer = CB>
  concept stoppable_token_for =
    stoppable_token<T> &&
    invocable<CB> &&
    requires {
      typename T::template callback_type<CB>;
    } &&
    constructible_from<CB, Initializer> &&
    constructible_from<typename T::template callback_type<CB>, T, Initializer> &&
    constructible_from<typename T::template callback_type<CB>, T&, Initializer> &&
    constructible_from<typename T::template callback_type<CB>, const T, Initializer> &&
    constructible_from<typename T::template callback_type<CB>, const T&, Initializer>;

template<typename T>
  concept unstoppable_token =
    stoppable_token<T> &&
    requires {
      { T::stop_possible() } -> boolean-testable;
    } &&
    (!T::stop_possible());
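
As an informal illustration (not part of the proposed wording), the concepts above can be approximated in compilable C++20. The sketch makes several assumptions: std::convertible_to<bool> stands in for the exposition-only boolean-testable, stoppable_token_for is omitted, and the two token types are hypothetical and perform no real callback registration.

```cpp
#include <concepts>
#include <type_traits>

namespace demo {
template <template <class> class>
struct check_type_alias_exists;  // only named, never instantiated

template <class T>
concept stoppable_token =
    std::copy_constructible<T> &&
    std::move_constructible<T> &&
    std::is_nothrow_copy_constructible_v<T> &&
    std::is_nothrow_move_constructible_v<T> &&
    std::equality_comparable<T> &&
    requires(const T& token) {
      { token.stop_requested() } noexcept -> std::convertible_to<bool>;
      { token.stop_possible() } noexcept -> std::convertible_to<bool>;
      typename check_type_alias_exists<T::template callback_type>;
    };

template <class T>
concept unstoppable_token =
    stoppable_token<T> &&
    requires {
      { T::stop_possible() } -> std::convertible_to<bool>;
    } &&
    (!T::stop_possible());

// A token for which stopping is statically impossible.
struct never_stop_token_sketch {
  template <class CB>
  struct callback_type {
    template <class Init>
    explicit callback_type(never_stop_token_sketch, Init&&) noexcept {}
  };
  static constexpr bool stop_requested() noexcept { return false; }
  static constexpr bool stop_possible() noexcept { return false; }
  friend constexpr bool operator==(never_stop_token_sketch,
                                   never_stop_token_sketch) noexcept {
    return true;
  }
};

// A token whose answers depend on runtime state.
struct flag_token {
  const bool* flag = nullptr;
  template <class CB>
  struct callback_type {
    template <class Init>
    explicit callback_type(flag_token, Init&&) {}  // registration elided
  };
  bool stop_requested() const noexcept { return flag != nullptr && *flag; }
  bool stop_possible() const noexcept { return flag != nullptr; }
  friend bool operator==(const flag_token& a, const flag_token& b) noexcept {
    return a.flag == b.flag;
  }
};

struct not_a_token {};

static_assert(stoppable_token<never_stop_token_sketch>);
static_assert(unstoppable_token<never_stop_token_sketch>);
static_assert(stoppable_token<flag_token>);
static_assert(!unstoppable_token<flag_token>);  // stop_possible() is not static
static_assert(!stoppable_token<not_a_token>);
}  // namespace demo
```
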
Let t and u be distinct objects of type T. The type T models stoppable_token only if:

All copies of a stoppable_token reference the same logical shared stop state and shall report values consistent with each other.

If t.stop_possible() evaluates to false then, if u references the same logical shared stop state, u.stop_possible() shall also subsequently evaluate to false and u.stop_requested() shall also subsequently evaluate to false.

If t.stop_requested() evaluates to true then, if u references the same logical shared stop state, u.stop_requested() shall also subsequently evaluate to true and u.stop_possible() shall also subsequently evaluate to true.

Given a callback-type, CB, and a callback-initializer argument, init, of type Initializer then constructing an instance, cb, of type T::callback_type<CB>, passing t as the first argument and init as the second argument to the constructor, shall, if t.stop_possible() is true, construct an instance, callback, of type CB, direct-initialized with init, and register callback with t’s shared stop state such that callback will be invoked with an empty argument list if a stop request is made on the shared stop state.

If t.stop_requested() is true at the time callback is registered then callback may be invoked immediately inline inside the call to cb’s constructor.

If callback is invoked then, if u references the same shared stop state as t, an evaluation of u.stop_requested() will be true if the beginning of the invocation of callback strongly-happens-before the evaluation of u.stop_requested().

If t.stop_possible() evaluates to false then the construction of cb is not required to construct and initialize callback.

Construction of a T::callback_type<CB> instance shall only throw exceptions thrown by the initialization of the CB instance from the value of type Initializer.

Destruction of the T::callback_type<CB> object, cb, removes callback from the shared stop state such that callback will not be invoked after the destructor returns.

If callback is currently being invoked on another thread then the destructor of cb will block until the invocation of callback returns such that the return from the invocation of callback strongly-happens-before the destruction of callback.

Destruction of a callback cb shall not block on the completion of the invocation of some other callback registered with the same shared stop state.

9. Execution control library [execution]

This Clause describes components supporting execution of function objects [function.objects].
The following subclauses describe the requirements, concepts, and components for execution control primitives as summarized in Table 1.

Table 1: Execution control library summary **[tab:execution.summary]**
	Subclause	Header
[execution.schedulers]	Schedulers	`<execution>`
[execution.receivers]	Receivers
[execution.op_state]	Operation states
[execution.senders]	Senders
[execution.execute]	One-way execution

9.1. Header `<execution>` synopsis [execution.syn]

namespace std::execution {
  // [execution.helpers], helper concepts
  template<class T>
    concept moveable-value = see-below; // exposition only

  // [execution.schedulers], schedulers
  template<class S>
    concept scheduler = see-below;

  // [execution.schedulers.queries], scheduler queries
  enum class forward_progress_guarantee;
  inline namespace unspecified {
    struct get_forward_progress_guarantee_t;
    inline constexpr get_forward_progress_guarantee_t get_forward_progress_guarantee{};
  }
}

namespace std::this_thread {
  inline namespace unspecified {
    struct execute_may_block_caller_t;
    inline constexpr execute_may_block_caller_t execute_may_block_caller{};
  }
}

namespace std::execution {
  // [execution.receivers], receivers
  template<class T, class E = exception_ptr>
    concept receiver = see-below;

  template<class T, class... An>
    concept receiver_of = see-below;

  inline namespace unspecified {
    struct set_value_t;
    inline constexpr set_value_t set_value{};
    struct set_error_t;
    inline constexpr set_error_t set_error{};
    struct set_done_t;
    inline constexpr set_done_t set_done{};
  }

  // [execution.receivers.queries], receiver queries
  inline namespace unspecified {
    struct get_scheduler_t;
    inline constexpr get_scheduler_t get_scheduler{};
    struct get_allocator_t;
    inline constexpr get_allocator_t get_allocator{};
    struct get_stop_token_t;
    inline constexpr get_stop_token_t get_stop_token{};
  }

  // [execution.op_state], operation states
  template<class O>
    concept operation_state = see-below;

  inline namespace unspecified {
    struct start_t;
    inline constexpr start_t start{};
  }

  // [execution.senders], senders
  template<class S>
    concept sender = see-below;

  template<class S, class R>
    concept sender_to = see-below;

  template<class S>
    concept has-sender-types = see-below; // exposition only

  template<class S>
    concept typed_sender = see-below;

  template<class... Ts>
    struct type-list = see-below; // exposition only

  template<class S, class ...Ts>
    concept sender_of = see-below;

  // [execution.senders.traits], sender traits
  inline namespace unspecified {
    struct sender_base {};
  }

  template<class S>
    struct sender_traits;

  inline namespace unspecified {
    // [execution.senders.connect], the connect sender algorithm
    struct connect_t;
    inline constexpr connect_t connect{};

    // [execution.senders.queries], sender queries
    template<class CPO>
    struct get_completion_scheduler_t;
    template<class CPO>
    inline constexpr get_completion_scheduler_t<CPO> get_completion_scheduler{};

    // [execution.senders.factories], sender factories
    struct schedule_t;
    inline constexpr schedule_t schedule{};
    template<class... Ts>
      struct just-sender; // exposition only
    template<moveable-value... Ts>
      just-sender<remove_cvref_t<Ts>...> just(Ts &&...);
    struct transfer_just_t;
    inline constexpr transfer_just_t transfer_just{};

    // [execution.senders.adaptors], sender adaptors
    struct on_t;
    inline constexpr on_t on{};
    struct lazy_on_t;
    inline constexpr lazy_on_t lazy_on{};
    struct transfer_t;
    inline constexpr transfer_t transfer{};
    struct lazy_transfer_t;
    inline constexpr lazy_transfer_t lazy_transfer{};
    struct schedule_from_t;
    inline constexpr schedule_from_t schedule_from{};
    struct lazy_schedule_from_t;
    inline constexpr lazy_schedule_from_t lazy_schedule_from{};

    struct then_t;
    inline constexpr then_t then{};
    struct lazy_then_t;
    inline constexpr lazy_then_t lazy_then{};
    struct upon_error_t;
    inline constexpr upon_error_t upon_error{};
    struct lazy_upon_error_t;
    inline constexpr lazy_upon_error_t lazy_upon_error{};
    struct upon_done_t;
    inline constexpr upon_done_t upon_done{};
    struct lazy_upon_done_t;
    inline constexpr lazy_upon_done_t lazy_upon_done{};

    struct let_value_t;
    inline constexpr let_value_t let_value{};
    struct lazy_let_value_t;
    inline constexpr lazy_let_value_t lazy_let_value{};
    struct let_error_t;
    inline constexpr let_error_t let_error{};
    struct lazy_let_error_t;
    inline constexpr lazy_let_error_t lazy_let_error{};
    struct let_done_t;
    inline constexpr let_done_t let_done{};
    struct lazy_let_done_t;
    inline constexpr lazy_let_done_t lazy_let_done{};

    struct bulk_t;
    inline constexpr bulk_t bulk{};
    struct lazy_bulk_t;
    inline constexpr lazy_bulk_t lazy_bulk{};

    struct split_t;
    inline constexpr split_t split{};
    struct lazy_split_t;
    inline constexpr lazy_split_t lazy_split{};
    struct when_all_t;
    inline constexpr when_all_t when_all{};
    struct when_all_with_variant_t;
    inline constexpr when_all_with_variant_t when_all_with_variant{};
    struct transfer_when_all_t;
    inline constexpr transfer_when_all_t transfer_when_all{};
    struct lazy_transfer_when_all_t;
    inline constexpr lazy_transfer_when_all_t lazy_transfer_when_all{};
    struct transfer_when_all_with_variant_t;
    inline constexpr transfer_when_all_with_variant_t
      transfer_when_all_with_variant{};
    struct lazy_transfer_when_all_with_variant_t;
    inline constexpr lazy_transfer_when_all_with_variant_t
      lazy_transfer_when_all_with_variant{};

    template<typed_sender S>
      using into-variant-type = see-below; // exposition-only
    template<typed_sender S>
      see-below into_variant(S &&);

    // [execution.senders.consumers], sender consumers
    struct ensure_started_t;
    inline constexpr ensure_started_t ensure_started{};

    struct start_detached_t;
    inline constexpr start_detached_t start_detached{};
  }
}

namespace std::this_thread {
  inline namespace unspecified {
    template<typed_sender S>
      using sync-wait-type = see-below; // exposition-only
    template<typed_sender S>
      using sync-wait-with-variant-type = see-below; // exposition-only

    struct sync_wait_t;
    inline constexpr sync_wait_t sync_wait{};
    struct sync_wait_with_variant_t;
    inline constexpr sync_wait_with_variant_t sync_wait_with_variant{};
  }
}

namespace std::execution {
  inline namespace unspecified {
    // [execution.execute], one-way execution
    struct execute_t;
    inline constexpr execute_t execute{};
  }
}

9.2. Helper concepts [execution.helpers]

template<class T>
concept moveable-value = // exposition only
  move_constructible<remove_cvref_t<T>> &&
  constructible_from<remove_cvref_t<T>, T>;

9.3. Schedulers [execution.schedulers]

The scheduler concept defines the requirements of a type that allows for scheduling of work on its associated execution context.

template<class S>
  concept scheduler =
    copy_constructible<remove_cvref_t<S>> &&
    equality_comparable<remove_cvref_t<S>> &&
    requires(S&& s) {
      execution::schedule((S&&)s);
    };

None of a scheduler’s copy constructor, destructor, equality comparison, or swap member functions shall exit via an exception.
None of these member functions, nor a scheduler type’s schedule function, shall introduce data races as a result of concurrent invocations of those functions from different threads.
For any two (possibly const) values s1 and s2 of some scheduler type S, s1 == s2 shall return true only if both s1 and s2 are handles to the same associated execution context.
A scheduler type’s destructor shall not block pending completion of any receivers connected to the sender objects returned from schedule. [Note: The ability to wait for completion of submitted function objects may be provided by the associated execution context of the scheduler. —end note]

9.3.1. Scheduler queries [execution.schedulers.queries]

9.3.1.1. `execution::get_forward_progress_guarantee` [execution.schedulers.queries.get_forward_progress_guarantee]

enum class forward_progress_guarantee {
    concurrent,
    parallel,
    weakly_parallel
};

execution::get_forward_progress_guarantee is used to ask a scheduler about the forward progress guarantees of execution agents created by that scheduler.
The name execution::get_forward_progress_guarantee denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::scheduler, execution::get_forward_progress_guarantee is ill-formed. Otherwise, execution::get_forward_progress_guarantee(s) is expression equivalent to:
1. tag_invoke(execution::get_forward_progress_guarantee, as_const(s)), if this expression is well formed and its type is execution::forward_progress_guarantee, and is noexcept.
2. Otherwise, execution::forward_progress_guarantee::weakly_parallel.
If execution::get_forward_progress_guarantee(s) for some scheduler s returns execution::forward_progress_guarantee::concurrent, all execution agents created by that scheduler shall provide the concurrent forward progress guarantee. If it returns execution::forward_progress_guarantee::parallel, all execution agents created by that scheduler shall provide at least the parallel forward progress guarantee.

9.3.1.2. `this_thread::execute_may_block_caller` [execution.schedulers.queries.execute_may_block_caller]

this_thread::execute_may_block_caller is used to ask a scheduler s whether a call execution::execute(s, f) with any invocable f may block the thread where such a call occurs.
The name this_thread::execute_may_block_caller denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::scheduler, this_thread::execute_may_block_caller is ill-formed. Otherwise, this_thread::execute_may_block_caller(s) is expression equivalent to:
1. tag_invoke(this_thread::execute_may_block_caller, as_const(s)), if this expression is well formed and its type is bool, and is noexcept.
2. Otherwise, true.
If this_thread::execute_may_block_caller(s) for some scheduler s returns false, no execution::execute(s, f) call with some invocable f shall block the calling thread.

9.4. Receivers [execution.receivers]

A receiver represents the continuation of an asynchronous operation. An asynchronous operation may complete with a (possibly empty) set of values, an error, or it may be cancelled. A receiver has three principal operations corresponding to the three ways an asynchronous operation may complete: set_value, set_error, and set_done. These are collectively known as a receiver’s completion-signal operations.

The receiver concept defines the requirements for a receiver type with an unknown set of value types. The receiver_of concept defines the requirements for a receiver type with a known set of value types, whose error type is std::exception_ptr.

template<class T, class E = exception_ptr>
concept receiver =
  move_constructible<remove_cvref_t<T>> &&
  constructible_from<remove_cvref_t<T>, T> &&
  requires(remove_cvref_t<T>&& t, E&& e) {
    { execution::set_done(std::move(t)) } noexcept;
    { execution::set_error(std::move(t), (E&&) e) } noexcept;
  };

template<class T, class... An>
concept receiver_of =
  receiver<T> &&
  requires(remove_cvref_t<T>&& t, An&&... an) {
    execution::set_value(std::move(t), (An&&) an...);
  };

The receiver’s completion-signal operations have semantic requirements that are collectively known as the receiver contract, described below:
1. None of a receiver’s completion-signal operations shall be invoked before execution::start has been called on the operation state object that was returned by execution::connect to connect that receiver to a sender.
2. Once execution::start has been called on the operation state object, exactly one of the receiver’s completion-signal operations shall complete non-exceptionally before the receiver is destroyed.
3. If execution::set_value exits with an exception, it is still valid to call execution::set_error or execution::set_done on the receiver, but it is no longer valid to call execution::set_value on the receiver.
Once one of a receiver’s completion-signal operations has completed non-exceptionally, the receiver contract has been satisfied.

9.4.1. `execution::set_value` [execution.receivers.set_value]

execution::set_value is used to send a value completion signal to a receiver.
The name execution::set_value denotes a customization point object. The expression execution::set_value(R, Vs...) for some subexpressions R and Vs... is expression-equivalent to:
1. tag_invoke(execution::set_value, R, Vs...), if that expression is valid. If the function selected by tag_invoke does not send the value(s) Vs... to the receiver R’s value channel, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::set_value(R, Vs...) is ill-formed.

9.4.2. `execution::set_error` [execution.receivers.set_error]

execution::set_error is used to send an error signal to a receiver.
The name execution::set_error denotes a customization point object. The expression execution::set_error(R, E) for some subexpressions R and E is expression-equivalent to:
1. tag_invoke(execution::set_error, R, E), if that expression is valid. If the function selected by tag_invoke does not send the error E to the receiver R’s error channel, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::set_error(R, E) is ill-formed.

9.4.3. `execution::set_done` [execution.receivers.set_done]

execution::set_done is used to send a done signal to a receiver.
The name execution::set_done denotes a customization point object. The expression execution::set_done(R) for some subexpression R is expression-equivalent to:
1. tag_invoke(execution::set_done, R), if that expression is valid. If the function selected by tag_invoke does not signal the receiver R’s done channel, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::set_done(R) is ill-formed.

9.4.4. Receiver queries [execution.receivers.queries]

9.4.4.1. `execution::get_scheduler` [execution.receivers.queries.get_scheduler]

execution::get_scheduler is used to ask a receiver object for a suggested scheduler to be used by a sender it is connected to when it needs to launch additional work. [Note: the presence of this query on a receiver does not bind a sender to use its result. --end note]
The name execution::get_scheduler denotes a customization point object. For some subexpression r, let R be decltype((r)). If R does not satisfy execution::receiver, execution::get_scheduler is ill-formed. Otherwise, execution::get_scheduler(r) is expression equivalent to:
1. tag_invoke(execution::get_scheduler, as_const(r)), if this expression is well formed and satisfies execution::scheduler, and is noexcept.
2. Otherwise, execution::get_scheduler(r) is ill-formed.

9.4.4.2. `execution::get_allocator` [execution.receivers.queries.get_allocator]

execution::get_allocator is used to ask a receiver object for a suggested allocator to be used by a sender it is connected to when it needs to allocate memory. [Note: the presence of this query on a receiver does not bind a sender to use its result. --end note]
The name execution::get_allocator denotes a customization point object. For some subexpression r, let R be decltype((r)). If R does not satisfy execution::receiver, execution::get_allocator is ill-formed. Otherwise, execution::get_allocator(r) is expression equivalent to:
1. tag_invoke(execution::get_allocator, as_const(r)), if this expression is well formed and models Allocator, and is noexcept.
2. Otherwise, execution::get_allocator(r) is ill-formed.

9.4.4.3. `execution::get_stop_token` [execution.receivers.queries.get_stop_token]

execution::get_stop_token is used to ask a receiver object for an associated stop token of that receiver. A sender connected with that receiver can use this stop token to check whether a stop request has been made. [Note: such a stop token being signalled does not bind the sender to actually cancel any work. --end note]
The name execution::get_stop_token denotes a customization point object. For some subexpression r, let R be decltype((r)). If R does not satisfy execution::receiver, execution::get_stop_token is ill-formed. Otherwise, execution::get_stop_token(r) is expression equivalent to:
1. tag_invoke(execution::get_stop_token, as_const(r)), if this expression is well formed and satisfies stoppable_token, and is noexcept.
2. Otherwise, never_stop_token{}.
Let r be a receiver, s be a sender, and op_state be an operation state resulting from an execution::connect(s, r) call. Let token be a stop token resulting from an execution::get_stop_token(r) call. token must remain valid at least until a call to a receiver completion-signal function of r returns successfully. [Note: this means that, unless it knows about further guarantees provided by the receiver r, the implementation of op_state should not use token after it makes a call to a receiver completion-signal function of r. This also implies that stop callbacks registered on token by the implementation of op_state or s must be destroyed before such a call to a receiver completion-signal function of r. --end note]

9.5. Operation states [execution.op_state]

The operation_state concept defines the requirements for an operation state type, which allows for starting the execution of work.

template<class O>
  concept operation_state =
    destructible<O> &&
    is_object_v<O> &&
    requires (O& o) {
      { execution::start(o) } noexcept;
    };

9.5.1. `execution::start` [execution.op_state.start]

execution::start is used to start work represented by an operation state object.
The name execution::start denotes a customization point object. The expression execution::start(O) for some lvalue subexpression O is expression-equivalent to:
1. tag_invoke(execution::start, O), if that expression is valid. If the function selected by tag_invoke does not start the work represented by the operation state O, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::start(O) is ill-formed.
The caller of execution::start(O) must guarantee that the lifetime of the operation state object O extends at least until one of the receiver completion-signal functions of a receiver R passed into the execution::connect call that produced O is ready to successfully return. [Note: this allows for the receiver to manage the lifetime of the operation state object, if destroying it is the last operation it performs in its completion-signal functions. --end note]

9.6. Senders [execution.senders]

A sender describes a potentially asynchronous operation. A sender’s responsibility is to fulfill the receiver contract of a connected receiver by delivering one of the receiver completion-signals.

The sender concept defines the requirements for a sender type. The sender_to concept defines the requirements for a sender type capable of being connected with a specific receiver type.

template<class S>
  concept sender =
    move_constructible<remove_cvref_t<S>> &&
    !requires {
      typename sender_traits<remove_cvref_t<S>>::__unspecialized; // exposition only
    };

template<class S, class R>
  concept sender_to =
    sender<S> &&
    receiver<R> &&
    requires (S&& s, R&& r) {
      execution::connect((S&&) s, (R&&) r);
    };

A sender is typed if it declares what types it sends through a connected receiver’s channels.

The typed_sender concept defines the requirements for a typed sender type.

template<class S>
  concept has-sender-types = // exposition only
    requires {
      typename has-value-types<S::template value_types>;
      typename has-error-types<S::template error_types>;
      typename bool_constant<S::sends_done>;
    };

template<class S>
  concept typed_sender =
    sender<S> &&
    has-sender-types<sender_traits<remove_cvref_t<S>>>;

The sender_of concept defines the requirements for a typed sender type that on successful completion sends the specified set of value types.

template<class... Ts>
  struct type-list {};

template<class S, class... Ts>
  concept sender_of =
    typed_sender<S> &&
    same_as<
      type-list<Ts...>,
      typename sender_traits<S>::template value_types<type-list, type_identity_t>
    >;

9.6.1. Sender traits [execution.senders.traits]

The class sender_base is used as a base class to tag sender types which do not expose member templates value_types, error_types, and a static member constant expression sends_done.
The class template sender_traits is used to query a sender type for facts associated with the signals it sends.

The primary class template sender_traits<S> is defined as if inheriting from an implementation-defined class template sender-traits-base<S> defined as follows:

If has-sender-types<S> is true, then sender-traits-base<S> is equivalent to:

template<class S>
  struct sender-traits-base {
    template<template<class...> class Tuple, template<class...> class Variant>
      using value_types = typename S::template value_types<Tuple, Variant>;

    template<template<class...> class Variant>
      using error_types = typename S::template error_types<Variant>;

    static constexpr bool sends_done = S::sends_done;
  };

Otherwise, if derived_from<S, sender_base> is true, then sender-traits-base<S> is equivalent to
```
template<class S>
  struct sender-traits-base {};
```

Otherwise, sender-traits-base<S> is equivalent to

template<class S>
  struct sender-traits-base {
    using __unspecialized = void; // exposition only
  };

If sender_traits<S>::value_types<Tuple, Variant> for some sender type S is well formed, it shall be a type Variant<Tuple<Args0...>, Tuple<Args1...>, ..., Tuple<ArgsN...>>, where the type packs Args0 through ArgsN are the packs of types the sender S passes as arguments to execution::set_value after a receiver object. If such sender S invokes execution::set_value(r, args...) for some receiver r, where decltype(args)... is not one of the type packs Args0 through ArgsN, the program is ill-formed with no diagnostic required.
If sender_traits<S>::error_types<Variant> for some sender type S is well formed, it shall be a type Variant<E0, E1, ..., EN>, where the types E0 through EN are the types the sender S passes as arguments to execution::set_error after a receiver object. If such sender S invokes execution::set_error(r, e) for some receiver r, where decltype(e) is not one of the types E0 through EN, the program is ill-formed with no diagnostic required.
If sender_traits<S>::sends_done is well formed and false, and such sender S invokes execution::set_done(r) for some receiver r, the program is ill-formed with no diagnostic required.
Users may specialize sender_traits on program-defined types.
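
To illustrate how a consumer might use these member templates, the sketch below declares a sender-like type and instantiates its traits with std::tuple and std::variant. The type toy_sender and the completion signatures attributed to it are assumptions for illustration; only the nested declarations matter here.

```cpp
#include <exception>
#include <string>
#include <tuple>
#include <type_traits>
#include <variant>

namespace toy_traits {  // hypothetical namespace for this sketch

// Assumed to complete with set_value(r, int), with
// set_value(r, std::string, double), or with
// set_error(r, std::exception_ptr); it never calls set_done.
struct toy_sender {
  template <template <class...> class Tuple, template <class...> class Variant>
  using value_types = Variant<Tuple<int>, Tuple<std::string, double>>;

  template <template <class...> class Variant>
  using error_types = Variant<std::exception_ptr>;

  static constexpr bool sends_done = false;
};

// A consumer chooses the Tuple and Variant templates to get storage types.
using values = toy_sender::value_types<std::tuple, std::variant>;
using errors = toy_sender::error_types<std::variant>;

// One Tuple alternative per way of completing with set_value.
static_assert(std::is_same_v<
              values,
              std::variant<std::tuple<int>, std::tuple<std::string, double>>>);
static_assert(std::is_same_v<errors, std::variant<std::exception_ptr>>);
static_assert(!toy_sender::sends_done);

}  // namespace toy_traits
```

Passing the templates in, instead of fixing std::tuple and std::variant, lets a consumer substitute its own type lists when it only needs to inspect the types.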

9.6.2. `execution::connect` [execution.senders.connect]

execution::connect is used to connect a sender with a receiver, producing an operation state object that represents the work that needs to be performed to satisfy the receiver contract of the receiver with values that are the result of the operations described by the sender.
The name execution::connect denotes a customization point object. For some subexpressions s and r, let S be decltype((s)) and R be decltype((r)). If R does not satisfy execution::receiver or S does not satisfy execution::sender, execution::connect(s, r) is ill-formed. Otherwise, the expression execution::connect(s, r) is expression-equivalent to:
1. tag_invoke(execution::connect, s, r), if that expression is valid and its type satisfies execution::operation_state. If the function selected by tag_invoke does not return an operation state for which execution::start starts work described by s, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::connect(s, r) is ill-formed.
Standard sender types shall always expose an rvalue-qualified overload of a customization of execution::connect. Standard sender types shall only expose an lvalue-qualified overload of a customization of execution::connect if they are copyable.

9.6.3. Sender queries [execution.senders.queries]

9.6.3.1. `execution::get_completion_scheduler` [execution.senders.queries.get_completion_scheduler]

execution::get_completion_scheduler is used to ask a sender object for the completion scheduler for one of its signals.
The name execution::get_completion_scheduler denotes a customization point object template. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::sender, execution::get_completion_scheduler is ill-formed. If the template argument CPO in execution::get_completion_scheduler<CPO> is not one of execution::set_value_t, execution::set_error_t, or execution::set_done_t, execution::get_completion_scheduler<CPO> is ill-formed. Otherwise, execution::get_completion_scheduler<CPO>(s) is expression-equivalent to:
1. tag_invoke(execution::get_completion_scheduler<CPO>, as_const(s)), if this expression is well formed and satisfies execution::scheduler, and is noexcept.
2. Otherwise, execution::get_completion_scheduler<CPO>(s) is ill-formed.
If, for some sender s and customization point object CPO, execution::get_completion_scheduler<decltype(CPO)>(s) is well-formed and results in a scheduler sch, and the sender s invokes CPO(r, args...), for some receiver r which has been connected to s, with additional arguments args..., on an execution agent which does not belong to the associated execution context of sch, the behavior is undefined.

9.6.4. Sender factories [execution.senders.factories]

9.6.4.1. General [execution.senders.factories.general]

Subclause [execution.senders.factories] defines sender factories, which are utilities that return senders without accepting senders as arguments.

9.6.4.2. `execution::schedule` [execution.senders.schedule]

execution::schedule is used to obtain a sender associated with a scheduler, which can be used to describe work to be started on that scheduler’s associated execution context.
The name execution::schedule denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::scheduler, execution::schedule is ill-formed. Otherwise, the expression execution::schedule(s) is expression-equivalent to:
1. tag_invoke(execution::schedule, s), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender whose set_value completion scheduler is equivalent to s, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::schedule(s) is ill-formed.

9.6.4.3. `execution::just` [execution.senders.just]

execution::just is used to create a sender that propagates a set of values to a connected receiver.

template<class... Ts>
struct just-sender // exposition only
{
  std::tuple<Ts...> vs_;

  template<template<class...> class Tuple, template<class...> class Variant>
  using value_types = Variant<Tuple<Ts...>>;

  template<template<class...> class Variant>
  using error_types = Variant<>;

  static constexpr bool sends_done = false;

  template<class R>
  struct operation_state {
    std::tuple<Ts...> vs_;
    R r_;

    void tag_invoke(execution::start_t)
      noexcept(noexcept(
        execution::set_value(declval<R>(), declval<Ts>()...)
      )) {
      try {
        apply([&](Ts &... values_) {
          execution::set_value(move(r_), move(values_)...);
        }, vs_);
      }
      catch (...) {
        execution::set_error(move(r_), current_exception());
      }
    }
  };

  template<receiver R>
    requires receiver_of<R, Ts...> && (copyable<Ts> && ...)
  auto tag_invoke(execution::connect_t, R && r) const & {
    return operation_state<R>{ vs_, std::forward<R>(r) };
  }

  template<receiver R>
    requires receiver_of<R, Ts...>
  auto tag_invoke(execution::connect_t, R && r) && {
    return operation_state<R>{ std::move(vs_), std::forward<R>(r) };
  }
};

template<moveable-value... Ts>
  just-sender<remove_cvref_t<Ts>...> just(Ts &&... ts) noexcept(see-below);

Effects: Initializes vs_ with make_tuple(forward<Ts>(ts)...).

Remarks: The expression in the noexcept-specifier is equivalent to

(is_nothrow_constructible_v<remove_cvref_t<Ts>, Ts> && ...)
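
The life cycle of such a sender (create, connect, start) can be sketched without the customization point machinery. In the sketch below, member functions connect, start, set_value, and set_error stand in for the corresponding customization points; the namespace toy_just, sum_receiver, and demo are hypothetical, and only the rvalue connect is shown.

```cpp
#include <cassert>
#include <exception>
#include <tuple>
#include <type_traits>
#include <utility>

namespace toy_just {  // hypothetical; member functions replace the CPOs

template <class R, class... Ts>
struct just_operation {
  std::tuple<Ts...> vs_;
  R r_;

  // Counterpart of execution::start: hand the stored values to the receiver.
  void start() noexcept {
    try {
      std::apply(
          [this](Ts&... vs) { std::move(r_).set_value(std::move(vs)...); },
          vs_);
    } catch (...) {
      std::move(r_).set_error(std::current_exception());
    }
  }
};

template <class... Ts>
struct just_sender {
  std::tuple<Ts...> vs_;

  // Counterpart of execution::connect for an rvalue sender.
  template <class R>
  just_operation<R, Ts...> connect(R r) && {
    return {std::move(vs_), std::move(r)};
  }
};

// Stores decayed copies of the arguments in the sender.
template <class... Ts>
just_sender<std::decay_t<Ts>...> just(Ts&&... ts) {
  return {std::tuple<std::decay_t<Ts>...>(std::forward<Ts>(ts)...)};
}

// Receives two ints and writes their sum to a caller-owned location.
struct sum_receiver {
  int* out;
  void set_value(int a, int b) && { *out = a + b; }
  void set_error(std::exception_ptr) && noexcept { *out = -1; }
};

inline int demo() {
  int result = 0;
  auto op = just(40, 2).connect(sum_receiver{&result});
  assert(result == 0);  // connecting alone delivers nothing
  op.start();           // the values reach the receiver here
  return result;
}

}  // namespace toy_just
```

In this sketch nothing is delivered until start is called on the operation state; the values are simply stored in the sender and then moved into the operation state by connect.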

9.6.4.4. `execution::transfer_just` [execution.senders.transfer_just]

execution::transfer_just is used to create a sender that propagates a set of values to a connected receiver on an execution agent belonging to the associated execution context of a specified scheduler.
The name execution::transfer_just denotes a customization point object. For some subexpressions s and vs..., let S be decltype((s)) and Vs... be decltype((vs)). If S does not satisfy execution::scheduler, or any type V in Vs does not satisfy moveable-value, execution::transfer_just(s, vs...) is ill-formed. Otherwise, execution::transfer_just(s, vs...) is expression-equivalent to:
1. tag_invoke(execution::transfer_just, s, vs...), if that expression is valid and its type satisfies execution::typed_sender. If the function selected by tag_invoke does not return a sender whose set_value completion scheduler is equivalent to s and sends values equivalent to vs... to a receiver connected to it, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::transfer(execution::just(vs...), s).

9.6.5. Sender adaptors [execution.senders.adaptors]

9.6.5.1. General [execution.senders.adaptors.general]

Subclause [execution.senders.adaptors] defines sender adaptors, which are utilities that transform one or more senders into a sender with custom behaviors. When they accept a single sender argument, they can be chained to create sender chains.
The bitwise OR operator is overloaded for the purpose of creating sender chains. The adaptors also support function call syntax with equivalent semantics.
Most sender adaptors have two versions, a potentially eager version, and a strictly lazy version. For such sender adaptors, adaptor is the potentially eager version, and lazy_adaptor is the strictly lazy version.
A strictly lazy version of a sender adaptor is required to not begin executing any functions which would observe or modify any of the arguments of the adaptor before the returned sender is connected with a receiver using execution::connect, and execution::start is called on the resulting operation state. This requirement applies to any function that is selected by the implementation of the sender adaptor.
Unless otherwise specified, all sender adaptors which accept a single sender argument return sender objects that propagate sender queries to that single sender argument. This requirement applies to any function that is selected by the implementation of the sender adaptor.
Unless otherwise specified, whenever a strictly lazy sender adaptor constructs a receiver it passes to another sender’s connect, that receiver shall propagate receiver queries to a receiver accepted as an argument of execution::connect. This requirement applies to any sender returned from a function that is selected by the implementation of a strictly lazy sender adaptor.

9.6.5.2. Sender adaptor closure objects [execution.senders.adaptor.objects]

A pipeable sender adaptor closure object is a function object that accepts one or more sender arguments and returns a sender. For a sender adaptor closure object C and an expression S such that decltype((S)) models sender, the following expressions are equivalent and yield a sender:
```
C(S)
S | C
```
Given an additional pipeable sender adaptor closure object D, the expression C | D is well-formed and produces another pipeable sender adaptor closure object such that the following two expressions are equivalent:
```
S | C | D
S | (C | D)
```
A pipeable sender adaptor object is a customization point object that accepts a sender as its first argument and returns a sender.
If a pipeable sender adaptor object accepts only one argument, then it is a pipeable sender adaptor closure object.
If a pipeable sender adaptor object accepts more than one argument, then the following expressions are equivalent:
```
adaptor(sender, args...)
adaptor(args...)(sender)
sender | adaptor(args...)
```
In that case, adaptor(args...) is a pipeable sender adaptor closure object.
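
The equivalences in this subclause can be checked on a deliberately simplified model. In the sketch below the stand-in sender merely carries the value it would send, and the adaptor applies its function eagerly, so this shows the call and pipe syntax only, not asynchronous behavior. All names in namespace toy_pipe are hypothetical.

```cpp
namespace toy_pipe {  // hypothetical names throughout

// Stand-in for a sender: it simply carries the value it would send.
struct sender { int value; };

// A closure object wraps a callable that maps a sender to a sender.
template <class F>
struct closure {
  F f;
  constexpr sender operator()(sender s) const { return f(s); }
};
template <class F>
closure(F) -> closure<F>;

// S | C is C(S).
template <class F>
constexpr sender operator|(sender s, const closure<F>& c) { return c(s); }

// C | D is a closure that applies C first and D second.
template <class F, class G>
constexpr auto operator|(closure<F> c, closure<G> d) {
  return closure{[c, d](sender s) { return d(c(s)); }};
}

// Adaptor object with two call forms: then(s, f) and then(f).
struct then_t {
  template <class Fn>
  constexpr sender operator()(sender s, Fn fn) const {
    return sender{fn(s.value)};
  }
  template <class Fn>
  constexpr auto operator()(Fn fn) const {
    return closure{[fn](sender s) { return sender{fn(s.value)}; }};
  }
};
inline constexpr then_t then{};

constexpr auto add_one = [](int x) { return x + 1; };
constexpr auto twice = [](int x) { return x * 2; };
constexpr sender s{20};

// adaptor(sender, args...), adaptor(args...)(sender), sender | adaptor(args...)
static_assert(then(s, add_one).value == 21);
static_assert(then(add_one)(s).value == 21);
static_assert((s | then(add_one)).value == 21);

// S | C | D and S | (C | D)
static_assert((s | then(add_one) | then(twice)).value == 42);
static_assert((s | (then(add_one) | then(twice))).value == 42);

}  // namespace toy_pipe
```

The partial application then(f) is what makes the pipe form possible: it captures the extra arguments and leaves a one-argument callable that still expects the sender.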

9.6.5.3. `execution::on` [execution.senders.adaptors.on]

execution::on and execution::lazy_on are used to adapt a sender into a sender that will start the input sender on an execution agent belonging to a specific execution context.
The name execution::on denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::sender, execution::on is ill-formed. Otherwise, the expression execution::on(sch, s) is expression-equivalent to:
1. tag_invoke(execution::on, sch, s), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, lazy_on(sch, s).
If the function selected above does not return a sender which starts s on an execution agent of the associated execution context of sch, the program is ill-formed with no diagnostic required.
The name execution::lazy_on denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::sender, execution::lazy_on is ill-formed. Otherwise, the expression execution::lazy_on(sch, s) is expression-equivalent to:
1. tag_invoke(execution::lazy_on, sch, s), if that expression is valid and its type satisfies execution::sender. If the function selected above does not return a sender which starts s on an execution agent of the associated execution context of sch when started, the program is ill-formed with no diagnostic required.
2. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it results in an operation state op_state. When execution::start is called on op_state, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r) is called, it calls execution::connect(s, out_r), which results in op_state2. It calls execution::start(op_state2). If any of these throws an exception, it calls execution::set_error on out_r, passing current_exception() as the second argument.
    2. When execution::set_error(r, e) is called, it calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, it calls execution::set_done(out_r).
  2. Calls execution::schedule(sch), which results in s3. It then calls execution::connect(s3, r), resulting in op_state3, and then it calls execution::start(op_state3). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
Any receiver r created by an implementation of on and lazy_on shall implement the get_scheduler receiver query. The scheduler returned from the query for all such receivers should be equivalent to the sch argument passed into the on or lazy_on call.
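
The default behavior of lazy_on can be sketched on a single-threaded stand-in for an execution context. The sketch uses member functions in place of the customization points, handles only the value channel, omits the get_scheduler query, and stores movable operation states in std::optional. All of these are simplifications, and every name in namespace toy_on is hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>

namespace toy_on {  // hypothetical; member functions replace the CPOs
// Single-threaded stand-in for an execution context: work runs in drain().
struct manual_context {
  std::deque<std::function<void()>> queue;
  void drain() {
    while (!queue.empty()) {
      auto task = std::move(queue.front());
      queue.pop_front();
      task();
    }
  }
};
// Operation state of the schedule-sender: start() enqueues set_value.
template <class R>
struct schedule_op {
  manual_context* ctx;
  R r;
  void start() { ctx->queue.push_back([this] { std::move(r).set_value(); }); }
};
struct schedule_sender {
  manual_context* ctx;
  template <class R>
  schedule_op<R> connect(R r) && { return {ctx, std::move(r)}; }
};
struct scheduler {
  manual_context* ctx;
  schedule_sender schedule() const { return {ctx}; }
};
// Operation state of the sender returned by lazy_on(sch, s).
template <class S, class OutR>
struct on_op {
  // Step 1: the receiver r; its set_value connects s to out_r and starts it.
  struct hop_receiver {
    on_op* self;
    void set_value() && {
      self->inner.emplace(std::move(self->s).connect(std::move(self->out_r)));
      self->inner->start();
    }
  };
  scheduler sch;
  S s;
  OutR out_r;
  std::optional<schedule_op<hop_receiver>> hop;
  std::optional<decltype(std::declval<S>().connect(std::declval<OutR>()))> inner;
  // Step 2: schedule(sch), connect the result to r, start that operation.
  void start() {
    hop.emplace(sch.schedule().connect(hop_receiver{this}));
    hop->start();
  }
};
template <class S>
struct on_sender {
  scheduler sch;
  S s;
  template <class OutR>
  on_op<S, OutR> connect(OutR out_r) && {
    return {sch, std::move(s), std::move(out_r)};
  }
};
template <class S>
on_sender<S> lazy_on(scheduler sch, S s) { return {sch, std::move(s)}; }

// Input sender for the demo: logs when it starts, then completes.
template <class R>
struct log_op {
  std::string* log;
  R r;
  void start() { *log += "start;"; std::move(r).set_value(); }
};
struct log_sender {
  std::string* log;
  template <class R>
  log_op<R> connect(R r) && { return {log, std::move(r)}; }
};
struct log_receiver {
  std::string* log;
  void set_value() && { *log += "value;"; }
};

inline std::string demo() {
  manual_context ctx;
  std::string log;
  auto op = lazy_on(scheduler{&ctx}, log_sender{&log}).connect(log_receiver{&log});
  op.start();
  assert(log.empty());  // only the hop is queued; s has not started
  ctx.drain();          // the context runs the hop, which starts s
  return log;
}
}  // namespace toy_on
```

Starting the outer operation only enqueues the hop onto the context. The input sender is connected and started later, from within the context, which is the property the default implementation is meant to provide.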

9.6.5.4. `execution::transfer` [execution.senders.adaptors.transfer]

execution::transfer and execution::lazy_transfer are used to adapt a sender into a sender with a different associated set_value completion scheduler. [Note: it results in a transition between different execution contexts when executed. --end note]
The name execution::transfer denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::sender, execution::transfer is ill-formed. Otherwise, the expression execution::transfer(s, sch) is expression-equivalent to:
1. tag_invoke(execution::transfer, get_completion_scheduler<set_value_t>(s), s, sch), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::transfer, s, sch), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, schedule_from(sch, s).
If the function selected above does not return a sender which is a result of a call to execution::schedule_from(sch, s2), where s2 is a sender which sends values equivalent to those sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_transfer denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::sender, execution::lazy_transfer is ill-formed. Otherwise, the expression execution::lazy_transfer(s, sch) is expression-equivalent to:
1. tag_invoke(execution::lazy_transfer, get_completion_scheduler<set_value_t>(s), s, sch), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_transfer, s, sch), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_schedule_from(sch, s).
If the function selected above does not return a sender which is a result of a call to execution::lazy_schedule_from(sch, s2), where s2 is a sender which sends values equivalent to those sent by s, the program is ill-formed with no diagnostic required.
Senders returned from execution::transfer and execution::lazy_transfer shall not propagate the sender queries get_completion_scheduler<CPO> to an input sender. They shall return a scheduler equivalent to the sch argument from those queries.

9.6.5.5. `execution::schedule_from` [execution.senders.adaptors.schedule_from]

execution::schedule_from and execution::lazy_schedule_from are used to schedule work dependent on the completion of a sender onto a scheduler’s associated execution context. [Note: schedule_from and lazy_schedule_from are not meant to be used in user code; they are used in the implementation of transfer and lazy_transfer. --end note]
The name execution::schedule_from denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::typed_sender, execution::schedule_from is ill-formed. Otherwise, the expression execution::schedule_from(sch, s) is expression-equivalent to:
1. tag_invoke(execution::schedule_from, sch, s), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which completes on an execution agent belonging to the associated execution context of sch and sends signals equivalent to those sent by s, the program is ill-formed with no diagnostic required.
2. Otherwise, lazy_schedule_from(sch, s).
The name execution::lazy_schedule_from denotes a customization point object. For some subexpressions sch and s, let Sch be decltype((sch)) and S be decltype((s)). If Sch does not satisfy execution::scheduler, or S does not satisfy execution::typed_sender, execution::lazy_schedule_from is ill-formed. Otherwise, the expression execution::lazy_schedule_from(sch, s) is expression-equivalent to:
1. tag_invoke(execution::lazy_schedule_from, sch, s), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which completes on an execution agent belonging to the associated execution context of sch and sends signals equivalent to those sent by s, the program is ill-formed with no diagnostic required.
2. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r.
  2. Calls execution::connect(s, r), which results in an operation state op_state2. If any of these throws an exception, calls execution::set_error on out_r, passing current_exception() as the second argument.
  3. When a receiver completion-signal Signal(r, args...) is called, it constructs a receiver r2:
    1. When execution::set_value(r2) is called, it calls Signal(out_r, args...).
    2. When execution::set_error(r2, e) is called, it calls execution::set_error(out_r, e).
    3. When execution::set_done(r2) is called, it calls execution::set_done(out_r).
    It then calls execution::schedule(sch), resulting in a sender s3. It then calls execution::connect(s3, r2), resulting in an operation state op_state3. It then calls execution::start(op_state3). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
  4. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
Senders returned from execution::schedule_from and execution::lazy_schedule_from shall not propagate the sender queries get_completion_scheduler<CPO> to an input sender. They shall return a scheduler equivalent to the sch argument from those queries.
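The default lazy_schedule_from behaviour can be illustrated with a toy model. This is only a sketch under stated assumptions: completion signals are modelled as plain member functions rather than the proposed customization point objects, and run_queue, queue_scheduler, recording_receiver, just_int, schedule_from_receiver and schedule_from_sender are hypothetical names that do not appear in this paper. The combination of execution::schedule, execution::connect and execution::start on r2 is collapsed into pushing a task onto a queue.

```cpp
#include <cassert>
#include <deque>
#include <exception>
#include <functional>
#include <string>
#include <utility>

// Toy model, NOT the proposed API.
struct run_queue {                         // hypothetical execution context
  std::deque<std::function<void()>> tasks;
  void drain() {
    while (!tasks.empty()) {
      auto task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};
struct queue_scheduler { run_queue* q; };  // hypothetical sch

struct recording_receiver {                // hypothetical out_r
  std::string* log;
  void set_value(int v) { *log += "value:" + std::to_string(v); }
  void set_error(std::exception_ptr) { *log += "error"; }
  void set_done() { *log += "done"; }
};

struct just_int {                          // hypothetical input sender s
  int value;
  template <class R>
  struct op_state {
    R r;
    int value;
    void start() { r.set_value(value); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), value}; }
};

// Receiver r: parks each signal of s in a queued task (the role of r2),
// which replays the signal on out_r once the scheduler runs it.
template <class OutR>
struct schedule_from_receiver {
  queue_scheduler sch;
  OutR out_r;
  template <class Replay>
  void reschedule(Replay replay) {
    try {
      sch.q->tasks.push_back([out = out_r, replay]() mutable { replay(out); });
    } catch (...) {
      out_r.set_error(std::current_exception());
    }
  }
  void set_value(int v) { reschedule([v](OutR& out) { out.set_value(v); }); }
  void set_error(std::exception_ptr e) { reschedule([e](OutR& out) { out.set_error(e); }); }
  void set_done() { reschedule([](OutR& out) { out.set_done(); }); }
};

template <class S>
struct schedule_from_sender {              // the sender s2
  queue_scheduler sch;
  S s;
  template <class OutR>
  auto connect(OutR out_r) const {
    return s.connect(schedule_from_receiver<OutR>{sch, std::move(out_r)});
  }
};

std::string run_schedule_from() {
  std::string log;
  run_queue q;
  schedule_from_sender<just_int> s2{queue_scheduler{&q}, just_int{5}};
  auto op = s2.connect(recording_receiver{&log});
  op.start();           // s completes inline, but its signal is only queued
  assert(log.empty());  // out_r has not been signalled yet
  q.drain();            // the scheduler's context delivers the signal
  return log;
}
```

In this sketch out_r is only signalled from inside q.drain(), which stands in for an execution agent belonging to the execution context of sch.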

9.6.5.6. `execution::then` [execution.senders.adaptors.then]

execution::then and execution::lazy_then are used to attach an invocable as a continuation for the successful completion of the input sender.
The name execution::then denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::then is ill-formed. Otherwise, the expression execution::then(s, f) is expression-equivalent to:
1. tag_invoke(execution::then, get_completion_scheduler<set_value_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::then, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_then(s, f).
If the function selected above does not return a sender which invokes f with the result of the set_value signal of s, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_then denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_then is ill-formed. Otherwise, the expression execution::lazy_then(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_then, get_completion_scheduler<set_value_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_then, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls invoke(f, args...) and passes the result v to execution::set_value(out_r, v). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
    2. When execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, calls execution::set_done(out_r).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f with the result of the set_value signal of s, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
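The default lazy_then behaviour can be sketched with a toy model. The names recording_receiver, just_int, then_receiver, then_sender and toy_lazy_then are hypothetical, and the completion signals are modelled as plain member functions instead of the proposed customization point objects. For brevity the sketch returns op_state2 directly instead of wrapping it in a separate op_state.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Toy model, NOT the proposed API.
struct recording_receiver {              // hypothetical out_r
  std::string* log;
  void set_value(int v) { *log += "value:" + std::to_string(v); }
  void set_error(std::exception_ptr) { *log += "error"; }
  void set_done() { *log += "done"; }
};

struct just_int {                        // hypothetical input sender s
  int value;
  template <class R>
  struct op_state {
    R r;
    int value;
    void start() { r.set_value(value); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), value}; }
};

// The receiver r constructed by the default lazy_then.
template <class F, class OutR>
struct then_receiver {
  F f;
  OutR out_r;
  template <class... Args>
  void set_value(Args&&... args) {
    try {
      out_r.set_value(std::invoke(f, std::forward<Args>(args)...));
    } catch (...) {
      out_r.set_error(std::current_exception());
    }
  }
  void set_error(std::exception_ptr e) { out_r.set_error(e); }
  void set_done() { out_r.set_done(); }
};

template <class S, class F>
struct then_sender {                     // the sender s2
  S s;
  F f;
  template <class OutR>
  auto connect(OutR out_r) const {
    return s.connect(then_receiver<F, OutR>{f, std::move(out_r)});
  }
};

template <class S, class F>
then_sender<S, F> toy_lazy_then(S s, F f) {
  return {std::move(s), std::move(f)};
}

std::string run_then(bool make_f_throw) {
  std::string log;
  auto s2 = toy_lazy_then(just_int{20}, [make_f_throw](int x) {
    if (make_f_throw) throw std::runtime_error("f failed");
    return x + 1;
  });
  auto op = s2.connect(recording_receiver{&log});
  op.start();
  return log;
}

void then_example() {
  assert(run_then(false) == "value:21");  // f's result becomes the value
  assert(run_then(true) == "error");      // f's exception becomes an error
}
```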

9.6.5.7. `execution::upon_error` [execution.senders.adaptors.upon_error]

execution::upon_error and execution::lazy_upon_error are used to attach an invocable as a continuation for the error completion (set_error) of the input sender.
The name execution::upon_error denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::upon_error is ill-formed. Otherwise, the expression execution::upon_error(s, f) is expression-equivalent to:
1. tag_invoke(execution::upon_error, get_completion_scheduler<set_error_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::upon_error, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_upon_error(s, f).
If the function selected above does not return a sender which invokes f with the result of the set_error signal of s, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_upon_error denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_upon_error is ill-formed. Otherwise, the expression execution::lazy_upon_error(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_upon_error, get_completion_scheduler<set_error_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_upon_error, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls execution::set_value(out_r, args...).
    2. When execution::set_error(r, e) is called, calls invoke(f, e) and passes the result v to execution::set_value(out_r, v). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
    3. When execution::set_done(r) is called, calls execution::set_done(out_r).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f with the result of the set_error signal of s, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
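A toy model of the default lazy_upon_error, under the same kind of assumptions: failing_sender, recording_receiver, upon_error_receiver and upon_error_sender are hypothetical names, signals are plain member functions, and the error type is fixed to std::exception_ptr for simplicity. It shows the error channel being turned into a value while the other two channels pass through.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Toy model, NOT the proposed API.
struct recording_receiver {              // hypothetical out_r
  std::string* log;
  void set_value(int v) { *log += "value:" + std::to_string(v); }
  void set_error(std::exception_ptr) { *log += "error"; }
  void set_done() { *log += "done"; }
};

struct failing_sender {                  // hypothetical s that completes with set_error
  template <class R>
  struct op_state {
    R r;
    void start() {
      r.set_error(std::make_exception_ptr(std::runtime_error("boom")));
    }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r)}; }
};

// The receiver r constructed by the default lazy_upon_error.
template <class F, class OutR>
struct upon_error_receiver {
  F f;
  OutR out_r;
  template <class... Args>
  void set_value(Args&&... args) { out_r.set_value(std::forward<Args>(args)...); }
  void set_error(std::exception_ptr e) {
    try {
      out_r.set_value(std::invoke(f, e));  // f's result is sent as a value
    } catch (...) {
      out_r.set_error(std::current_exception());
    }
  }
  void set_done() { out_r.set_done(); }
};

template <class S, class F>
struct upon_error_sender {               // the sender s2
  S s;
  F f;
  template <class OutR>
  auto connect(OutR out_r) const {
    return s.connect(upon_error_receiver<F, OutR>{f, std::move(out_r)});
  }
};

std::string run_upon_error() {
  std::string log;
  auto f = [](std::exception_ptr) { return -1; };  // recover with a fallback value
  upon_error_sender<failing_sender, decltype(f)> s2{failing_sender{}, f};
  auto op = s2.connect(recording_receiver{&log});
  op.start();
  return log;
}

void upon_error_example() {
  assert(run_upon_error() == "value:-1");
}
```

The default lazy_upon_done has the same shape, with the invocation of f moved from the set_error handler to the set_done handler.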

9.6.5.8. `execution::upon_done` [execution.senders.adaptors.upon_done]

execution::upon_done and execution::lazy_upon_done are used to attach an invocable as a continuation for the done completion (set_done) of the input sender.
The name execution::upon_done denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::upon_done is ill-formed. Otherwise, the expression execution::upon_done(s, f) is expression-equivalent to:
1. tag_invoke(execution::upon_done, get_completion_scheduler<set_done_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::upon_done, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_upon_done(s, f).
If the function selected above does not return a sender which invokes f when the set_done signal of s is called, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_upon_done denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_upon_done is ill-formed. Otherwise, the expression execution::lazy_upon_done(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_upon_done, get_completion_scheduler<set_done_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_upon_done, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls execution::set_value(out_r, args...).
    2. When execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, calls invoke(f) and passes the result v to execution::set_value(out_r, v). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f when the set_done signal of s is called, passing the return value as the value to any connected receivers, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.

9.6.5.9. `execution::let_value` [execution.senders.adaptors.let_value]

execution::let_value and execution::lazy_let_value are used to insert continuations creating more work dependent on the results of their input senders into a sender chain.
The name execution::let_value denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::let_value is ill-formed. Otherwise, the expression execution::let_value(s, f) is expression-equivalent to:
1. tag_invoke(execution::let_value, get_completion_scheduler<set_value_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::let_value, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_let_value(s, f).
If the function selected above does not return a sender which invokes f when set_value is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_let_value denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_let_value is ill-formed. Otherwise, the expression execution::lazy_let_value(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_let_value, get_completion_scheduler<set_value_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_let_value, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, copies args... into op_state2 as args2..., then calls invoke(f, args2...), resulting in a sender s3. It then calls execution::connect(s3, out_r), resulting in an operation state op_state3. op_state3 is saved as a part of op_state2. It then calls execution::start(op_state3). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
    2. When execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, calls execution::set_done(out_r).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f when set_value is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
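The distinguishing feature of the default lazy_let_value is that the values and the operation state of the sender returned by f are kept alive inside the outer operation state. The following is a minimal sketch of that storage layout, assuming a toy model restricted to a single int value; recording_receiver, just_int, let_value_op, let_value_sender and toy_lazy_let_value are hypothetical names, and signals are plain member functions rather than the proposed customization point objects. It relies on C++17 guaranteed copy elision to keep the operation state at a fixed address.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

// Toy model, NOT the proposed API.
struct recording_receiver {              // hypothetical out_r
  std::string* log;
  void set_value(int v) { *log += "value:" + std::to_string(v); }
  void set_error(std::exception_ptr) { *log += "error"; }
  void set_done() { *log += "done"; }
};

struct just_int {                        // hypothetical sender of one int
  int value;
  template <class R>
  struct op_state {
    R r;
    int value;
    void start() { r.set_value(value); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), value}; }
};

// Operation state: owns op_state2 (op2) plus storage for args2 and op_state3.
template <class S, class F, class OutR>
struct let_value_op {
  struct receiver {                      // the receiver r
    let_value_op* self;
    void set_value(int v) {
      try {
        self->args2.emplace(v);                        // copy args... into the state
        auto s3 = std::invoke(self->f, *self->args2);  // f returns the sender s3
        self->op3.emplace(s3.connect(self->out_r));    // op_state3 lives in the state
        self->op3->start();
      } catch (...) {
        self->out_r.set_error(std::current_exception());
      }
    }
    void set_error(std::exception_ptr e) { self->out_r.set_error(e); }
    void set_done() { self->out_r.set_done(); }
  };
  using sender3 = std::invoke_result_t<F&, int&>;
  using op3_t = decltype(std::declval<sender3&>().connect(std::declval<OutR>()));
  using op2_t = decltype(std::declval<const S&>().connect(std::declval<receiver>()));

  F f;
  OutR out_r;
  std::optional<int> args2;
  std::optional<op3_t> op3;
  op2_t op2;

  let_value_op(const S& s, F f_, OutR out)
      : f(std::move(f_)), out_r(std::move(out)), op2(s.connect(receiver{this})) {}
  let_value_op(const let_value_op&) = delete;  // r points at this object, so pin it
  void start() { op2.start(); }
};

template <class S, class F>
struct let_value_sender {                // the sender s2
  S s;
  F f;
  template <class OutR>
  let_value_op<S, F, OutR> connect(OutR out_r) const {
    return let_value_op<S, F, OutR>(s, f, std::move(out_r));  // prvalue, never moved
  }
};

template <class S, class F>
let_value_sender<S, F> toy_lazy_let_value(S s, F f) {
  return {std::move(s), std::move(f)};
}

std::string run_let_value() {
  std::string log;
  auto s2 = toy_lazy_let_value(just_int{21}, [](int& x) { return just_int{x * 2}; });
  auto op = s2.connect(recording_receiver{&log});
  op.start();
  return log;
}

void let_value_example() {
  assert(run_let_value() == "value:42");
}
```

Because f receives a reference to the copy stored in the operation state, the sender it returns may safely refer to that value until the whole operation has completed.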

9.6.5.10. `execution::let_error` [execution.senders.adaptors.let_error]

execution::let_error and execution::lazy_let_error are used to insert continuations creating more work dependent on the results of their input senders into a sender chain.
The name execution::let_error denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::let_error is ill-formed. Otherwise, the expression execution::let_error(s, f) is expression-equivalent to:
1. tag_invoke(execution::let_error, get_completion_scheduler<set_error_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::let_error, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_let_error(s, f).
If the function selected above does not return a sender which invokes f when set_error is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_let_error denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_let_error is ill-formed. Otherwise, the expression execution::lazy_let_error(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_let_error, get_completion_scheduler<set_error_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_let_error, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls execution::set_value(out_r, args...).
    2. When execution::set_error(r, e) is called, copies e into op_state2 as e2, then calls invoke(f, e2), resulting in a sender s3. It then calls execution::connect(s3, out_r), resulting in an operation state op_state3. op_state3 is saved as a part of op_state2. It then calls execution::start(op_state3). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
    3. When execution::set_done(r) is called, calls execution::set_done(out_r).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f when set_error is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.

9.6.5.11. `execution::let_done` [execution.senders.adaptors.let_done]

execution::let_done and execution::lazy_let_done are used to insert continuations creating more work dependent on the results of their input senders into a sender chain.
The name execution::let_done denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::let_done is ill-formed. Otherwise, the expression execution::let_done(s, f) is expression-equivalent to:
1. tag_invoke(execution::let_done, get_completion_scheduler<set_done_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::let_done, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_let_done(s, f).
If the function selected above does not return a sender which invokes f when set_done is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.
The name execution::lazy_let_done denotes a customization point object. For some subexpressions s and f, let S be decltype((s)). If S does not satisfy execution::sender, execution::lazy_let_done is ill-formed. Otherwise, the expression execution::lazy_let_done(s, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_let_done, get_completion_scheduler<set_done_t>(s), s, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_let_done, s, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls execution::set_value(out_r, args...).
    2. When execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, calls invoke(f), resulting in a sender s3. It then calls execution::connect(s3, out_r), resulting in an operation state op_state3. op_state3 is saved as a part of op_state2. It then calls execution::start(op_state3). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f when set_done is called, makes its completion dependent on the completion of a sender returned by f, and propagates the other completion-signals sent by s, the program is ill-formed with no diagnostic required.

9.6.5.12. `execution::bulk` [execution.senders.adaptors.bulk]

execution::bulk and execution::lazy_bulk are used to run a task repeatedly for every index in an index space.
The name execution::bulk denotes a customization point object. For some subexpressions s, shape, and f, let S be decltype((s)), Shape be decltype((shape)), and F be decltype((f)). If S does not satisfy execution::sender or Shape does not satisfy integral, execution::bulk is ill-formed. Otherwise, the expression execution::bulk(s, shape, f) is expression-equivalent to:
1. tag_invoke(execution::bulk, get_completion_scheduler<set_value_t>(s), s, shape, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::bulk, s, shape, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_bulk(s, shape, f).
The name execution::lazy_bulk denotes a customization point object. For some subexpressions s, shape, and f, let S be decltype((s)), Shape be decltype((shape)), and F be decltype((f)). If S does not satisfy execution::sender or Shape does not satisfy integral, execution::lazy_bulk is ill-formed. Otherwise, the expression execution::lazy_bulk(s, shape, f) is expression-equivalent to:
1. tag_invoke(execution::lazy_bulk, get_completion_scheduler<set_value_t>(s), s, shape, f), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_bulk, s, shape, f), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2. When s2 is connected with some receiver out_r, it:
  1. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, calls f(i, args...) for each i of type Shape from 0 to shape, then calls execution::set_value(out_r, args...). If any of these throws an exception, it catches it and calls execution::set_error(out_r, current_exception()).
    2. When execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
    3. When execution::set_done(r) is called, calls execution::set_done(out_r).
  2. Calls execution::connect(s, r), which results in an operation state op_state2.
  3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
If the function selected above does not return a sender which invokes f(i, args...) for each i of type Shape from 0 to shape when the input sender sends values args..., or does not propagate the values of the signals sent by the input sender to a connected receiver, the program is ill-formed with no diagnostic required.
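The default lazy_bulk behaviour can be sketched as follows. This is a toy model only: sum_receiver, just_vector, bulk_receiver, bulk_sender and toy_lazy_bulk are hypothetical names, signals are plain member functions, and "from 0 to shape" is interpreted here as the half-open range [0, shape), which is an assumption of this sketch.

```cpp
#include <cassert>
#include <exception>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Toy model, NOT the proposed API.
struct sum_receiver {                    // hypothetical out_r: records the sum it receives
  int* sum;
  void set_value(std::vector<int>& v) { *sum = std::accumulate(v.begin(), v.end(), 0); }
  void set_error(std::exception_ptr) { *sum = -1; }
  void set_done() { *sum = -2; }
};

struct just_vector {                     // hypothetical s: sends an lvalue vector
  std::vector<int> data;
  template <class R>
  struct op_state {
    R r;
    std::vector<int> data;
    void start() { r.set_value(data); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), data}; }
};

// The receiver r constructed by the default lazy_bulk.
template <class Shape, class F, class OutR>
struct bulk_receiver {
  static_assert(std::is_integral_v<Shape>, "Shape must be an integral type");
  Shape shape;
  F f;
  OutR out_r;
  template <class... Args>
  void set_value(Args&&... args) {
    try {
      // Assumption: indices cover the half-open range [0, shape).
      for (Shape i = 0; i < shape; ++i) f(i, args...);
      out_r.set_value(std::forward<Args>(args)...);  // then forward the same values
    } catch (...) {
      out_r.set_error(std::current_exception());
    }
  }
  void set_error(std::exception_ptr e) { out_r.set_error(e); }
  void set_done() { out_r.set_done(); }
};

template <class S, class Shape, class F>
struct bulk_sender {                     // the sender s2
  S s;
  Shape shape;
  F f;
  template <class OutR>
  auto connect(OutR out_r) const {
    return s.connect(bulk_receiver<Shape, F, OutR>{shape, f, std::move(out_r)});
  }
};

template <class S, class Shape, class F>
bulk_sender<S, Shape, F> toy_lazy_bulk(S s, Shape shape, F f) {
  return {std::move(s), shape, std::move(f)};
}

int run_bulk() {
  int sum = 0;
  auto s2 = toy_lazy_bulk(just_vector{{1, 2, 3}}, 3,
                          [](int i, std::vector<int>& v) { v[i] *= 10; });
  auto op = s2.connect(sum_receiver{&sum});
  op.start();
  return sum;
}

void bulk_example() {
  assert(run_bulk() == 60);  // 10 + 20 + 30 after f ran for i = 0, 1, 2
}
```

The sketch runs the iterations sequentially on the calling thread; a scheduler-specific customization would typically distribute them instead.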

9.6.5.13. `execution::split` [execution.senders.adaptors.split]

execution::split and execution::lazy_split are used to adapt an arbitrary sender into a sender that can be connected multiple times.
The name execution::split denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::typed_sender, execution::split is ill-formed. Otherwise, the expression execution::split(s) is expression-equivalent to:
1. tag_invoke(execution::split, get_completion_scheduler<set_value_t>(s), s), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::split, s), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, lazy_split(s).
If the function selected above does not return a sender which sends references to values sent by s, propagating the other channels, the program is ill-formed with no diagnostic required.
The name execution::lazy_split denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::typed_sender, execution::lazy_split is ill-formed. Otherwise, the expression execution::lazy_split(s) is expression-equivalent to:
1. tag_invoke(execution::lazy_split, get_completion_scheduler<set_value_t>(s), s), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::lazy_split, s), if that expression is valid and its type satisfies execution::sender.
3. Otherwise, constructs a sender s2, which:
  1. Creates an object sh_state. The lifetime of sh_state shall last for at least as long as the lifetime of the last operation state object returned from execution::connect(s, some_r) for some receiver some_r.
  2. Constructs a receiver r:
    1. When execution::set_value(r, args...) is called, saves the expressions args... as subobjects of sh_state.
    2. When execution::set_error(r, e) is called, saves the expression e as a subobject of sh_state.
    3. When execution::set_done(r) is called, saves this fact in sh_state.
  3. Calls execution::connect(s, r), resulting in an operation state op_state2. op_state2 is saved as a subobject of sh_state.
  4. When s2 is connected with a receiver out_r, it returns an operation state object op_state. When execution::start(op_state) is called, it calls execution::start(op_state2), if this is the first time this expression would be evaluated. When both execution::start(op_state) and Signal(r, args...) have been called, calls Signal(out_r, args2...), where args2... is a pack of lvalues referencing the subobjects of sh_state that have been saved by the original call to Signal(r, args...).
If the function selected above does not return a sender which sends references to values sent by s, propagating the other channels, the program is ill-formed with no diagnostic required.
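The shared-state mechanism of the default lazy_split can be sketched in a single-threaded toy model. The names counting_sender, collecting_receiver and split_sender are hypothetical, signals are plain member functions, and the input sender completes synchronously inside start, so the condition "both execution::start(op_state) and Signal(r, args...) have been called" holds as soon as the first start returns. A real implementation would also need synchronization and a list of waiting receivers.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Toy model, NOT the proposed API.
struct counting_sender {                 // hypothetical s: counts how often its work runs
  int* runs;
  template <class R>
  struct op_state {
    R r;
    int* runs;
    void start() { ++*runs; r.set_value(7); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), runs}; }
};

struct collecting_receiver {             // hypothetical out_r: receives an lvalue reference
  std::vector<int>* seen;
  void set_value(int& v) { seen->push_back(v); }
  void set_error(std::exception_ptr) {}
  void set_done() {}
};

template <class S>
struct split_sender {                    // the sender s2
  struct sh_state;
  struct receiver {                      // the receiver r: saves the signal in sh_state
    sh_state* st;
    void set_value(int v) { st->value = v; }
    void set_error(std::exception_ptr e) { st->error = e; }
    void set_done() { st->done = true; }
  };
  using op2_t = decltype(std::declval<const S&>().connect(std::declval<receiver>()));
  struct sh_state {
    std::optional<int> value;
    std::exception_ptr error;
    bool done = false;
    bool started = false;
    op2_t op2;                           // op_state2 is a subobject of sh_state
    explicit sh_state(const S& s) : op2(s.connect(receiver{this})) {}
  };

  std::shared_ptr<sh_state> state;       // kept alive by every operation state
  explicit split_sender(const S& s) : state(std::make_shared<sh_state>(s)) {}

  template <class OutR>
  struct op_state {
    std::shared_ptr<sh_state> st;
    OutR out_r;
    void start() {
      if (!st->started) {                // start op_state2 only the first time
        st->started = true;
        st->op2.start();
      }
      if (st->value) out_r.set_value(*st->value);  // lvalue into sh_state
      else if (st->error) out_r.set_error(st->error);
      else if (st->done) out_r.set_done();
    }
  };
  template <class OutR>
  op_state<OutR> connect(OutR out_r) const { return {state, std::move(out_r)}; }
};

int run_split(std::vector<int>& seen) {
  int runs = 0;
  split_sender<counting_sender> s2(counting_sender{&runs});
  auto op_a = s2.connect(collecting_receiver{&seen});
  auto op_b = s2.connect(collecting_receiver{&seen});
  op_a.start();
  op_b.start();
  return runs;
}

void split_example() {
  std::vector<int> seen;
  assert(run_split(seen) == 1);                 // the input sender ran once
  assert((seen == std::vector<int>{7, 7}));     // both receivers saw its value
}
```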

9.6.5.14. `execution::when_all` [execution.senders.adaptors.when_all]

execution::when_all is used to join multiple sender chains and create a sender whose execution is dependent on all of the input senders that only send a single set of values. execution::when_all_with_variant is used to join multiple sender chains and create a sender whose execution is dependent on all of the input senders, which may have one or more sets of sent values.
The name execution::when_all denotes a customization point object. For some subexpressions s..., let S be decltype((s)). If any type S_i in S... does not satisfy execution::typed_sender, or the number of the arguments sender_traits<S_i>::value_types passes into the Variant template parameter is not 1, execution::when_all is ill-formed. Otherwise, the expression execution::when_all(s...) is expression-equivalent to:
1. tag_invoke(execution::when_all, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends a concatenation of values sent by s... when they all complete with set_value, the program is ill-formed with no diagnostic required.
2. Otherwise, constructs a sender s. When s is connected with some receiver out_r, it:
  1. For each sender s_i in s..., constructs a receiver r_i:
    1. If execution::set_value(r_i, t_i...) is called for every r_i, execution::set_value(out_r, t_0..., t_1..., ..., t_n...) is called, where n is sizeof...(s) - 1.
    2. Otherwise, if execution::set_error(r_i, e) is called for any r_i, execution::set_error(out_r, e) is called.
    3. Otherwise, if execution::set_done(r_i) is called for any r_i, execution::set_done(out_r) is called.
  2. For each sender s_i in s..., calls execution::connect(s_i, r_i), resulting in operation states op_state_i.
  3. Returns an operation state op_state that contains each operation state op_state_i. When execution::start(op_state) is called, calls execution::start(op_state_i) for each op_state_i.
The name execution::when_all_with_variant denotes a customization point object. For some subexpressions s..., let S be decltype((s)). If any type S_i in S... does not satisfy execution::typed_sender, execution::when_all_with_variant is ill-formed. Otherwise, the expression execution::when_all_with_variant(s...) is expression-equivalent to:
1. tag_invoke(execution::when_all_with_variant, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends the types into-variant-type<S>... when they all complete with set_value, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::when_all(execution::into_variant(s)...).
Adaptors defined in this subclause are strictly lazy.
Senders returned from adaptors defined in this subclause shall not expose the sender queries get_completion_scheduler<CPO>.
tag_invoke expressions used in the definitions of the sender adaptors in this subclause shall not consider member functions of their first non-tag arguments.
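The default when_all behaviour can be sketched for exactly two input senders that each send one int. This is a toy, single-threaded model: pair_receiver, just_int, just_done, when_all_receiver, when_all_op, when_all_sender and run_when_all are hypothetical names, and signals are plain member functions. The sketch waits for both inputs to complete before signalling out_r and applies the priority given above: all values, otherwise an error, otherwise done.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <string>
#include <utility>

// Toy model, NOT the proposed API.
struct pair_receiver {                   // hypothetical out_r
  std::string* log;
  void set_value(int a, int b) { *log = std::to_string(a) + "," + std::to_string(b); }
  void set_error(std::exception_ptr) { *log = "error"; }
  void set_done() { *log = "done"; }
};

struct just_int {                        // hypothetical sender of one int
  int value;
  template <class R>
  struct op_state {
    R r;
    int value;
    void start() { r.set_value(value); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r), value}; }
};

struct just_done {                       // hypothetical sender completing with set_done
  template <class R>
  struct op_state {
    R r;
    void start() { r.set_done(); }
  };
  template <class R>
  op_state<R> connect(R r) const { return {std::move(r)}; }
};

template <class Op, int I>
struct when_all_receiver {               // the receiver r_i
  Op* op;
  void set_value(int v) { op->values[I] = v; op->arrive(); }
  void set_error(std::exception_ptr e) { op->error = e; op->arrive(); }
  void set_done() { op->done = true; op->arrive(); }
};

template <class S0, class S1, class OutR>
struct when_all_op {                     // op_state holding every op_state_i
  using r0 = when_all_receiver<when_all_op, 0>;
  using r1 = when_all_receiver<when_all_op, 1>;
  OutR out_r;
  std::optional<int> values[2];
  std::exception_ptr error;
  bool done = false;
  int pending = 2;
  decltype(std::declval<const S0&>().connect(std::declval<r0>())) op0;
  decltype(std::declval<const S1&>().connect(std::declval<r1>())) op1;

  when_all_op(const S0& s0, const S1& s1, OutR out)
      : out_r(std::move(out)), op0(s0.connect(r0{this})), op1(s1.connect(r1{this})) {}
  when_all_op(const when_all_op&) = delete;  // receivers point at this object

  void arrive() {
    if (--pending != 0) return;          // wait until every input has completed
    if (error) out_r.set_error(error);
    else if (done) out_r.set_done();
    else out_r.set_value(*values[0], *values[1]);  // concatenation of the values
  }
  void start() { op0.start(); op1.start(); }
};

template <class S0, class S1>
struct when_all_sender {
  S0 s0;
  S1 s1;
  template <class OutR>
  when_all_op<S0, S1, OutR> connect(OutR out_r) const {
    return when_all_op<S0, S1, OutR>(s0, s1, std::move(out_r));
  }
};

template <class S0, class S1>
std::string run_when_all(S0 s0, S1 s1) {
  std::string log;
  when_all_sender<S0, S1> s{std::move(s0), std::move(s1)};
  auto op = s.connect(pair_receiver{&log});
  op.start();
  return log;
}

void when_all_example() {
  assert(run_when_all(just_int{1}, just_int{2}) == "1,2");
  assert(run_when_all(just_int{1}, just_done{}) == "done");
}
```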

9.6.5.15. `execution::transfer_when_all` [execution.senders.adaptors.transfer_when_all]

execution::transfer_when_all and execution::lazy_transfer_when_all are used to join multiple sender chains and create a sender whose execution is dependent on all of the input senders that only send a single set of values each, while also making sure that they complete on the specified scheduler. execution::transfer_when_all_with_variant and execution::lazy_transfer_when_all_with_variant are used to join multiple sender chains and create a sender whose execution is dependent on all of the input senders, which may have one or more sets of sent values. [Note: this can allow for better customization of the adaptor. --end note]
The name execution::transfer_when_all denotes a customization point object. For some subexpressions sch and s..., let Sch be decltype(sch) and S be decltype((s)). If Sch does not satisfy scheduler, or any type S_i in S... does not satisfy execution::typed_sender, or the number of the arguments sender_traits<S_i>::value_types passes into the Variant template parameter is not 1, execution::transfer_when_all is ill-formed. Otherwise, the expression execution::transfer_when_all(sch, s...) is expression-equivalent to:
1. tag_invoke(execution::transfer_when_all, sch, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends a concatenation of values sent by s... when they all complete with set_value, or does not send its completion signals, other than ones resulting from a scheduling error, on an execution agent belonging to the associated execution context of sch, the program is ill-formed with no diagnostic required.
2. Otherwise, transfer(when_all(s...), sch).
The name execution::lazy_transfer_when_all denotes a customization point object. For some subexpressions sch and s..., let Sch be decltype(sch) and S be decltype((s)). If Sch does not satisfy scheduler, or any type S_i in S... does not satisfy execution::typed_sender, or the number of the arguments sender_traits<S_i>::value_types passes into the Variant template parameter is not 1, execution::lazy_transfer_when_all is ill-formed. Otherwise, the expression execution::lazy_transfer_when_all(sch, s...) is expression-equivalent to:
1. tag_invoke(execution::lazy_transfer_when_all, sch, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends a concatenation of values sent by s... when they all complete with set_value, or does not send its completion signals, other than ones resulting from a scheduling error, on an execution agent belonging to the associated execution context of sch, the program is ill-formed with no diagnostic required.
2. Otherwise, lazy_transfer(when_all(s...), sch).
The name execution::transfer_when_all_with_variant denotes a customization point object. For some subexpressions sch and s..., let Sch be decltype(sch) and S be decltype((s)). If Sch does not satisfy scheduler, or any type S_i in S... does not satisfy execution::typed_sender, execution::transfer_when_all_with_variant is ill-formed. Otherwise, the expression execution::transfer_when_all_with_variant(sch, s...) is expression-equivalent to:
1. tag_invoke(execution::transfer_when_all_with_variant, sch, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends the types into-variant-type<S>... when they all complete with set_value, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::transfer_when_all(sch, execution::into_variant(s)...).
The name execution::lazy_transfer_when_all_with_variant denotes a customization point object. For some subexpressions sch and s..., let Sch be decltype(sch) and S be decltype((s)). If Sch does not satisfy scheduler, or any type S_i in S... does not satisfy execution::typed_sender, execution::lazy_transfer_when_all_with_variant is ill-formed. Otherwise, the expression execution::lazy_transfer_when_all_with_variant(sch, s...) is expression-equivalent to:
1. tag_invoke(execution::lazy_transfer_when_all_with_variant, sch, s...), if that expression is valid and its type satisfies execution::sender. If the function selected by tag_invoke does not return a sender which sends the types into-variant-type<S>... when they all complete with set_value, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::lazy_transfer_when_all(sch, execution::into_variant(s)...).
Senders returned from execution::transfer_when_all and execution::lazy_transfer_when_all shall not propagate the sender queries get_completion_scheduler<CPO> to input senders. They shall return a scheduler equivalent to the sch argument from those queries.

9.6.5.16. `execution::into_variant` [execution.senders.adaptors.into_variant]

execution::into_variant can be used to turn a typed sender which sends multiple sets of values into a sender which sends a variant of all of those sets of values.

The template into-variant-type is used to compute the type sent by a sender returned from execution::into_variant.

template<typed_sender S>
  using into-variant-type =
    typename execution::sender_traits<remove_cvref_t<S>>
      ::template value_types<tuple, variant>;

template<typed_sender S>
  see-below into_variant(S && s);

Effects: Returns a sender s2. When s2 is connected with some receiver out_r, it:
1. Constructs a receiver r:
  1. If execution::set_value(r, ts...) is called, calls execution::set_value(out_r, into-variant-type<S>(make_tuple(ts...))).
  2. If execution::set_error(r, e) is called, calls execution::set_error(out_r, e).
  3. If execution::set_done(r) is called, calls execution::set_done(out_r).
2. Calls execution::connect(s, r), resulting in an operation state op_state2.
3. Returns an operation state op_state that contains op_state2. When execution::start(op_state) is called, calls execution::start(op_state2).
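The receiver behaviour described above can be illustrated with a small, library-free sketch. Everything in it is an assumption made for illustration: toy_sender, into_variant_receiver, recording_receiver and collect are hypothetical names, receivers are modelled with plain member functions rather than the execution::set_value customization points, and value_types is read directly from the sender instead of through sender_traits. It shows only how each set of values is packed into a tuple and then into the variant computed by into-variant-type.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <variant>

// Hypothetical stand-in for a typed sender that can complete with either
// set_value(int) or set_value(std::string, double).
struct toy_sender {
  template <template <class...> class Tuple, template <class...> class Variant>
  using value_types = Variant<Tuple<int>, Tuple<std::string, double>>;
};

// Mirrors into-variant-type: one tuple per value signature, wrapped in a variant.
template <class S>
using into_variant_type =
    typename S::template value_types<std::tuple, std::variant>;

// Toy version of the receiver r: packs the values and forwards them to out_r.
template <class S, class OutR>
struct into_variant_receiver {
  OutR out_r;

  template <class... Ts>
  void set_value(Ts&&... ts) {
    out_r.set_value(
        into_variant_type<S>(std::make_tuple(std::forward<Ts>(ts)...)));
  }
  template <class E>
  void set_error(E&& e) { out_r.set_error(std::forward<E>(e)); }
  void set_done() { out_r.set_done(); }
};

// Hypothetical downstream receiver that records the variant it is sent.
struct recording_receiver {
  into_variant_type<toy_sender>* slot;
  void set_value(into_variant_type<toy_sender> v) { *slot = std::move(v); }
};

// Helper for the sketch: delivers one value completion and returns what
// the downstream receiver observed.
template <class... Ts>
into_variant_type<toy_sender> collect(Ts&&... ts) {
  into_variant_type<toy_sender> result;
  into_variant_receiver<toy_sender, recording_receiver> r{
      recording_receiver{&result}};
  r.set_value(std::forward<Ts>(ts)...);
  return result;
}
```

In this sketch, collect(42) yields a variant holding tuple<int>, while collect(std::string("pi"), 3.5) yields one holding tuple<std::string, double>; the downstream receiver sees a single value type either way.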

9.6.5.17. `execution::ensure_started` [execution.senders.adaptors.ensure_started]

execution::ensure_started is used to eagerly start the execution of a sender, while also providing a way to attach further work to execute once it has completed.
The name execution::ensure_started denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::typed_sender, execution::ensure_started is ill-formed. Otherwise, the expression execution::ensure_started(s) is expression-equivalent to:
1. tag_invoke(execution::ensure_started, execution::get_completion_scheduler<execution::set_value_t>(s), s), if that expression is valid and its type satisfies execution::sender.
2. Otherwise, tag_invoke(execution::ensure_started, s), if that expression is valid and its type satisfies execution::sender.
3. Otherwise:
  1. Constructs a receiver r.
  2. Calls execution::connect(s, r), resulting in operation state op_state, and then calls execution::start(op_state). If any of these throws an exception, it catches it and calls execution::set_error(r, current_exception()).
  3. Constructs a sender s2. When s2 is connected with some receiver out_r, it results in an operation state op_state2. Once both execution::start(op_state2) and one of the receiver completion-signals has been called on r:
    1. If execution::set_value(r, ts...) has been called, calls execution::set_value(out_r, ts...).
    2. If execution::set_error(r, e) has been called, calls execution::set_error(out_r, e).
    3. If execution::set_done(r) has been called, calls execution::set_done(out_r).
If the function selected above does not eagerly start the sender s and return a sender which propagates the signals sent by s once started, the program is ill-formed with no diagnostic required.
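The eager-start behaviour of the default implementation can be sketched without the library. The names below (shared_state, started_sender, toy_ensure_started, int_receiver, start_with) are hypothetical, the work completes synchronously inside the call, and the set_done path is omitted. It is a simplified, single-threaded model and not the proposed implementation: it only shows that the work runs when toy_ensure_started is called, and that the stored completion is replayed once a receiver is attached and started.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical shared state, filled in by the internal receiver r.
template <class T>
struct shared_state {
  std::optional<T> value;    // recorded on the set_value path
  std::exception_ptr error;  // recorded on the set_error path
};

// Toy version of the returned sender s2. start_with models connecting s2 to
// out_r and starting the resulting operation state in one step.
template <class T>
struct started_sender {
  std::shared_ptr<shared_state<T>> state;

  template <class OutR>
  void start_with(OutR& out_r) const {
    if (state->error) {
      out_r.set_error(state->error);
    } else {
      out_r.set_value(*state->value);
    }
  }
};

// Runs `work` immediately and records how it completed.
template <class F>
auto toy_ensure_started(F work) {
  using T = decltype(work());
  auto state = std::make_shared<shared_state<T>>();
  try {
    state->value.emplace(work());  // eager: happens now, not at start
  } catch (...) {
    state->error = std::current_exception();
  }
  return started_sender<T>{state};
}

// Hypothetical downstream receiver used to observe the replayed completion.
struct int_receiver {
  int* out;
  bool* failed;
  void set_value(int v) { *out = v; }
  void set_error(std::exception_ptr) { *failed = true; }
};
```

A real implementation also has to handle the case where the operation is still running when the returned sender is started, which is why the wording requires both start and a completion signal before anything is forwarded to out_r.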

9.6.6. Sender consumers [execution.senders.consumers]

9.6.6.1. `execution::start_detached` [execution.senders.consumers.start_detached]

execution::start_detached is used to eagerly start a sender without the caller needing to manage the lifetimes of any objects.
The name execution::start_detached denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::sender, execution::start_detached is ill-formed. Otherwise, the expression execution::start_detached(s) is expression-equivalent to:
1. tag_invoke(execution::start_detached, execution::get_completion_scheduler<execution::set_value_t>(s), s), if that expression is valid and its type is void.
2. Otherwise, tag_invoke(execution::start_detached, s), if that expression is valid and its type is void.
3. Otherwise:
  1. Constructs a receiver r:
    1. When set_value(r, ts...) is called, it does nothing.
    2. When set_error(r, e) is called, it calls std::terminate.
    3. When set_done(r) is called, it does nothing.
  2. Calls execution::connect(s, r), resulting in an operation state op_state, then calls execution::start(op_state).
If the function selected above does not eagerly start the sender s after connecting it with a receiver which ignores the set_value and set_done signals and calls std::terminate on the set_error signal, the program is ill-formed with no diagnostic required.
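The default behaviour can be sketched as follows. The sketch assumes a hypothetical toy_sender that completes inline, receivers with plain member functions, and a heap-allocated holder that the receiver deletes on completion. None of these names come from the proposal; they only illustrate how the caller can avoid managing any lifetimes.

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Type-erased owner of the operation state, so the receiver can destroy it.
struct holder_base {
  virtual ~holder_base() = default;
};

// Toy version of the receiver r: ignores values and the done signal,
// terminates on error, and frees the operation state when it completes.
struct detached_receiver {
  holder_base* self;
  void set_value() noexcept { delete self; }
  void set_error(std::exception_ptr) noexcept { std::terminate(); }
  void set_done() noexcept { delete self; }
};

// Hypothetical sender: completes inline with the outcome of calling f.
template <class F>
struct toy_sender {
  F f;

  template <class R>
  struct op_state {
    F f;
    R r;
    void start() {
      try {
        f();
      } catch (...) {
        r.set_error(std::current_exception());
        return;
      }
      r.set_value();  // may destroy *this; nothing is touched afterwards
    }
  };

  template <class R>
  op_state<R> connect(R r) {
    return {std::move(f), std::move(r)};
  }
};

template <class F>
toy_sender<F> make_toy_sender(F f) {
  return {std::move(f)};
}

// Owns the operation state produced by connecting s with the receiver.
template <class S>
struct holder : holder_base {
  decltype(std::declval<S&>().connect(std::declval<detached_receiver>())) op;
  explicit holder(S s) : op(s.connect(detached_receiver{this})) {}
};

// connect, then start; the allocation is released by the receiver.
template <class S>
void toy_start_detached(S s) {
  auto* h = new holder<S>(std::move(s));
  h->op.start();
}
```

Heap allocation is one possible way to keep the operation state alive until completion; the wording above does not mandate it.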

9.6.6.2. `this_thread::sync_wait` [execution.senders.consumers.sync_wait]

this_thread::sync_wait and this_thread::sync_wait_with_variant are used to block the current thread until the sender passed as an argument has completed, and to obtain the values (if any) it completed with.

The templates sync-wait-type and sync-wait-with-variant-type are used to determine the return types of this_thread::sync_wait and this_thread::sync_wait_with_variant.

template<typed_sender S>
  using sync-wait-type = optional<
    typename execution::sender_traits<remove_cvref_t<S>>
      ::template value_types<tuple, type_identity_t>>;

template<typed_sender S>
  using sync-wait-with-variant-type = optional<into-variant-type<S>>;
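As an illustration of what these aliases compute, the sketch below applies them to a hypothetical sender with a single value signature (int, std::string). It assumes value_types is read directly from the sender rather than through sender_traits, and it spells out identity_t locally so that the example does not depend on C++20's std::type_identity_t.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <tuple>
#include <type_traits>
#include <variant>

// Local equivalent of std::type_identity_t (assumption: kept C++17-friendly).
template <class T>
using identity_t = T;

// Hypothetical typed sender with exactly one value signature.
struct single_sig_sender {
  template <template <class...> class Tuple, template <class...> class Variant>
  using value_types = Variant<Tuple<int, std::string>>;
};

// Mirrors sync-wait-type: the identity "variant" unwraps the single tuple.
template <class S>
using sync_wait_type =
    std::optional<typename S::template value_types<std::tuple, identity_t>>;

// Mirrors sync-wait-with-variant-type: keeps the variant of tuples.
template <class S>
using sync_wait_with_variant_type =
    std::optional<typename S::template value_types<std::tuple, std::variant>>;

static_assert(
    std::is_same_v<sync_wait_type<single_sig_sender>,
                   std::optional<std::tuple<int, std::string>>>);
static_assert(
    std::is_same_v<sync_wait_with_variant_type<single_sig_sender>,
                   std::optional<std::variant<std::tuple<int, std::string>>>>);
```

Because the identity alias accepts exactly one argument, a sender with more than one value signature cannot be used with the first alias, which matches the restriction placed on this_thread::sync_wait.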

The name this_thread::sync_wait denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::typed_sender, or the number of arguments that sender_traits<S>::value_types passes to the Variant template parameter is not 1, this_thread::sync_wait is ill-formed. Otherwise, the expression this_thread::sync_wait(s) is expression-equivalent to:
1. tag_invoke(this_thread::sync_wait, execution::get_completion_scheduler<execution::set_value_t>(s), s), if this expression is valid and its type is sync-wait-type<S>.
2. Otherwise, tag_invoke(this_thread::sync_wait, s), if this expression is valid and its type is sync-wait-type<S>.
3. Otherwise:
  1. Constructs a receiver r.
  2. Calls execution::connect(s, r), resulting in an operation state op_state, then calls execution::start(op_state).
  3. Blocks the current thread until a receiver completion-signal of r is called. When it is:
    1. If execution::set_value(r, ts...) has been called, returns sync-wait-type<S>(make_tuple(ts...)).
    2. If execution::set_error(r, e) has been called, if remove_cvref_t<decltype(e)> is exception_ptr, calls std::rethrow_exception(e). Otherwise, throws e.
    3. If execution::set_done(r) has been called, returns sync-wait-type<S>(nullopt).
The name this_thread::sync_wait_with_variant denotes a customization point object. For some subexpression s, let S be decltype((s)). If S does not satisfy execution::typed_sender, this_thread::sync_wait_with_variant is ill-formed. Otherwise, the expression this_thread::sync_wait_with_variant(s) is expression-equivalent to:
1. tag_invoke(this_thread::sync_wait_with_variant, execution::get_completion_scheduler<execution::set_value_t>(s), s), if this expression is valid and its type is sync-wait-with-variant-type<S>.
2. Otherwise, tag_invoke(this_thread::sync_wait_with_variant, s), if this expression is valid and its type is sync-wait-with-variant-type<S>.
3. Otherwise, this_thread::sync_wait(execution::into_variant(s)).
Any receiver r created by an implementation of sync_wait or sync_wait_with_variant shall implement the get_scheduler receiver query. For the receiver created by the default implementation, the query shall return an implementation-defined scheduler that is driven by the waiting thread, such that scheduled tasks run on the thread of the caller.
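The default sync_wait behaviour, including a scheduler driven by the waiting thread, can be modelled with a single-threaded sketch. All names here (run_loop, sync_receiver, toy_sender, toy_sync_wait, start_with) are hypothetical. The sender is restricted to one int value, and "blocking" is modelled by draining a task queue on the calling thread instead of waiting on a synchronization primitive.

```cpp
#include <cassert>
#include <deque>
#include <exception>
#include <functional>
#include <optional>
#include <stdexcept>
#include <tuple>
#include <utility>

// Hypothetical run loop; stands in for the scheduler driven by the waiter.
struct run_loop {
  std::deque<std::function<void()>> tasks;
  void post(std::function<void()> t) { tasks.push_back(std::move(t)); }
  bool run_one() {
    if (tasks.empty()) return false;
    auto t = std::move(tasks.front());
    tasks.pop_front();
    t();
    return true;
  }
};

// Toy version of the receiver r created by sync_wait.
struct sync_receiver {
  std::optional<std::tuple<int>>* value;
  std::exception_ptr* error;
  bool* completed;
  run_loop* loop;  // what a get_scheduler query on r would expose

  void set_value(int v) { value->emplace(v); *completed = true; }
  void set_error(std::exception_ptr e) { *error = e; *completed = true; }
  void set_done() { *completed = true; }
};

// Hypothetical sender: schedules its work on the receiver's loop.
struct toy_sender {
  std::function<int()> work;  // throwing models an error completion
  bool cancelled = false;     // true models a set_done completion

  void start_with(sync_receiver r) const {  // models connect + start
    auto job = work;
    bool cancel = cancelled;
    r.loop->post([job, cancel, r]() mutable {
      if (cancel) { r.set_done(); return; }
      int v = 0;
      try {
        v = job();
      } catch (...) {
        r.set_error(std::current_exception());
        return;
      }
      r.set_value(v);
    });
  }
};

std::optional<std::tuple<int>> toy_sync_wait(const toy_sender& s) {
  run_loop loop;
  std::optional<std::tuple<int>> value;
  std::exception_ptr error;
  bool completed = false;
  s.start_with(sync_receiver{&value, &error, &completed, &loop});
  // Drive the loop on the calling thread until a completion signal arrives.
  while (!completed && loop.run_one()) {}
  if (error) std::rethrow_exception(error);
  return value;  // engaged after set_value, nullopt after set_done
}
```

The three outcomes map onto the return value as in the wording above: a value completion yields an engaged optional, a done completion yields nullopt, and an error completion is rethrown to the caller.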

9.7. `execution::execute` [execution.execute]

execution::execute is used to create fire-and-forget tasks on a specified scheduler.
The name execution::execute denotes a customization point object. For some subexpressions sch and f, let Sch be decltype((sch)) and F be decltype((f)). If Sch does not satisfy execution::scheduler or F does not satisfy invocable, execution::execute is ill-formed. Otherwise, the expression execution::execute(sch, f) is expression-equivalent to:
1. tag_invoke(execution::execute, sch, f), if that expression is valid and its type is void. If the function selected by tag_invoke does not invoke the function f on an execution agent belonging to the associated execution context of sch, or if it does not call std::terminate if an error occurs after control is returned to the caller, the program is ill-formed with no diagnostic required.
2. Otherwise, execution::start_detached(execution::then(execution::schedule(sch), f)).
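The fire-and-forget shape of the default can be sketched with a hypothetical scheduler whose execution context is a manually drained queue. queue_context, queue_scheduler and toy_execute are illustrative names only. The sketch collapses start_detached(then(schedule(sch), f)) into a single enqueue and, as an assumption of this model, treats any exception thrown by f as an error that terminates the program.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical execution context: tasks run only when drain() is called.
struct queue_context {
  std::vector<std::function<void()>> pending;
  void drain() {
    auto batch = std::move(pending);
    pending.clear();
    for (auto& task : batch) task();
  }
};

// Hypothetical scheduler handle referring to the context above.
struct queue_scheduler {
  queue_context* ctx;
};

// Models the default: schedule on sch, then run f, with nobody waiting.
template <class F>
void toy_execute(queue_scheduler sch, F f) {
  sch.ctx->pending.push_back([f = std::move(f)]() mutable {
    try {
      f();
    } catch (...) {
      std::terminate();  // no receiver is left to report the error to
    }
  });
}
```

The call returns before f has run; f is invoked later, on whatever drives the context, which is what distinguishes execute from calling f directly.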