1. Changes
1.1. R7
- 
     Change the title from "System execution context" to "Parallel scheduler" 
- 
     Incorporate feedback from SG1 Hagenberg, Austria, February 2025 (rename system_scheduler to parallel_scheduler) 
- 
     Tweak the replaceability API to be consistent with [P3481R1], now at R2 (support bulk_chunked and bulk_unchunked) 
1.2. R6
- 
     Incorporate feedback from SG1 and LEWG given in Wrocław, Poland, November 2024. 
- 
     Remove the wording about delegatee schedulers 
- 
     system_scheduler set_error std :: exception_ptr 
- 
     Remove system_scheduler query ( get_completion_scheduler_t ) set_stopped_t 
- 
     Add noexcept system_scheduler 
- 
     Add lvalue connect 
- 
     Add sender_concept completion_signatures get_env 
- 
     Mandate replaceability 
- 
     Define replaceability mechanism (both link-time and runtime) 
- 
     Define replaceability API 
- 
     Add Specification section 
1.3. R5
- 
     Streamline the paper 
- 
     Make replaceability availability implementation-defined 
- 
     Make replaceability API implementation-defined 
- 
     Replace the use of system_context with get_system_scheduler() 
- 
     Update user-facing API 
- 
     Relax the lifetime guarantees to allow using system scheduler outside of main () 
1.4. R4
- 
     Add more design considerations & goals. 
- 
     Add comparison of different replaceability options 
- 
     Add motivation for replaceability ABI standardization 
- 
     Add the example of the ABI for replacement 
- 
     Strengthen the lifetime guarantees. 
1.5. R3
- 
     Remove execute_all and execute_chunk 
- 
     Add design discussion about the approach we should take for customization and the extent to which the context should be implementation-defined. 
- 
     Add design discussion for an explicit system_context 
- 
     Add design discussion about priorities. 
1.6. R2
- 
     Significant redesign to fit in [P2300R10] model. 
- 
     Strictly limit to parallel progress without control over the level of parallelism. 
- 
     Remove direct support for task groups, delegating that to async_scope 
1.7. R1
- 
     Minor modifications 
1.8. R0
- 
     First revision 
2. Introduction
[P2300R10] describes a rounded set of primitives for asynchronous and parallel execution that give a firm grounding for the future. However, the paper lacks a standard execution context and scheduler. It has been broadly accepted that we need some sort of standard scheduler.
As part of [P3109R0], system context was voted as a must-have for the initial release of senders/receivers. It provides a convenient and scalable way of spawning concurrent work for the users of senders/receivers.
As noted in [P2079R1], an earlier revision of this paper, the 
One of the biggest problems with local thread pools is that they lead to CPU oversubscription. This introduces a performance problem for complex systems that are composed from many independent parts.
Another problem that system context is aiming to solve is the composability of components that may rely on different parallel engines. An application might have multiple parts, possibly in different binaries; different parts of the application may not know of each other. Thus, different parts of the application might use different parallel engines. This can create several problems:
- 
     oversubscription because of different thread pools 
- 
     problems with nested parallel loops (one parallel loop is called from the other) 
- 
     problems related to interaction between different parallel engines 
- 
     etc. 
To solve these problems we propose a parallel execution context that:
- 
     can be shared between multiple parts of the application 
- 
     does not suffer from oversubscription 
- 
     can integrate with the OS scheduler 
- 
     can be replaced by the user to compose well with other parallel runtimes 
In this paper, this parallel execution context is called the system context. Users can obtain a scheduler from this system context.
Note: SG1 has expressed a desire to use the name "parallel scheduler" instead of "system execution scheduler", so that "system execution scheduler" remains available for a future concurrent-forward-progress execution context. For simplicity, the paper uses the term "system context" to refer to the context of the parallel scheduler.
2.1. Design overview
The system context is a parallel execution context of undefined size, supporting explicitly parallel forward progress.
The execution resources of the system context are envisioned to be shared across all binaries in the same process. The system scheduler works best with CPU-intensive workloads; thus, limiting oversubscription is a key goal.
By default, the system context should be able to use the OS scheduler, if the OS has one. On systems where the OS scheduler is not available, the system context will have a generic implementation that acts like a thread pool.
To enable users to hand-tune the performance of their applications, and to fulfill the composability requirements, the system context should be replaceable. The user should be able to replace the default implementation of the system context with a custom one that fits their needs.
Other key concerns of this design are:
- 
     Extensibility: being able to extend the design to work with new additions to the senders/receivers framework. 
- 
     Lifetime: as system context is a global resource, we need to pay attention to the lifetime of this resource. 
- 
     Performance: as we envision this to be used in many cases to spawn concurrent work, performance considerations are important. 
3. Examples
As a simple parallel scheduler, we can use it locally and sync_wait on the work:
    using namespace std::execution;

    scheduler auto sch = get_parallel_scheduler();

    sender auto begin = schedule(sch);
    sender auto hi = then(begin, [] {
      std::cout << "Hello world! Have an int.";
      return 13;
    });
    sender auto add_42 = then(hi, [](int arg) { return arg + 42; });

    auto [i] = std::this_thread::sync_wait(add_42).value();
We can structure the same thing using 
    using namespace std::execution;

    scheduler auto sch = get_parallel_scheduler();

    sender auto hi = then(just(), [] {
      std::cout << "Hello world! Have an int.";
      return 13;
    });
    sender auto add_42 = then(hi, [](int arg) { return arg + 42; });

    auto [i] = std::this_thread::sync_wait(on(sch, add_42)).value();
The parallel scheduler customizes 
    using namespace std::execution;

    auto bar() {
      return let_value(
        read_env(get_scheduler),  // Fetch scheduler from receiver.
        [](auto current_sched) {
          return bulk(
            current_sched.schedule(),
            1,  // Only 1 bulk task as a lazy way of making cout safe
            [](auto idx) {
              std::cout << "Index: " << idx << "\n";
            });
        });
    }

    void foo() {
      // bar() produces no value, so there is nothing to bind from the result.
      std::this_thread::sync_wait(
        on(
          get_parallel_scheduler(),  // Start bar on the system's parallel scheduler
          bar()))                    // and propagate it through the receivers
        .value();
    }
Use 
    using namespace std::execution;

    int result = 0;

    {
      async_scope scope;
      scheduler auto sch = get_parallel_scheduler();

      sender auto work = then(just(), [&] {
        int val = 13;

        auto print_sender = then(just(), [val] {
          std::cout << "Hello world! Have an int with value: " << val << "\n";
        });
        // spawn the print sender on sch to make sure it
        // completes before shutdown
        scope.spawn(on(sch, std::move(print_sender)));

        result = val;
      });

      scope.spawn(on(sch, std::move(work)));

      // This is custom code for a single-threaded context that we have replaced.
      // We need to drive it in main.
      // It is not directly sender-aware, like any pre-existing work loop, but
      // does provide an exit operation. We may call this from a callback chained
      // after the scope becomes empty.
      // We use a temporary terminal_scope here to separate the shut down
      // operation and block for it at the end of main, knowing it will complete.
      async_scope terminal_scope;
      terminal_scope.spawn(scope.on_empty() | then([&] { my_os::exit(sch); }));
      my_os::drive(sch);
      std::this_thread::sync_wait(terminal_scope.on_empty());
    }

    // The scope ensured that all work is safely joined, so result contains 13
    std::cout << "Result: " << result << "\n";

    // and destruction of the context is now safe
To change the implementation of system scheduler at link-time, one might do it the following way (very simplistic example):
    namespace std::execution::system_context_replaceability {
      extern __attribute__((__weak__))
      std::shared_ptr<parallel_scheduler> query_parallel_scheduler_backend() {
        return std::make_shared<my_parallel_scheduler_impl>();
      }
    }
4. Design
4.1. User facing API
    parallel_scheduler get_parallel_scheduler();

    class parallel_scheduler {  // exposition only
    public:
      parallel_scheduler() = delete;
      ~parallel_scheduler();

      parallel_scheduler(const parallel_scheduler&) noexcept;
      parallel_scheduler(parallel_scheduler&&) noexcept;
      parallel_scheduler& operator=(const parallel_scheduler&) noexcept;
      parallel_scheduler& operator=(parallel_scheduler&&) noexcept;

      bool operator==(const parallel_scheduler&) const noexcept;

      forward_progress_guarantee query(get_forward_progress_guarantee_t) const noexcept;

      impl-defined-parallel_sender schedule() const noexcept;
      // customization for bulk
    };

    class impl-defined-parallel_sender {  // exposition only
    public:
      using sender_concept = sender_t;
      using completion_signatures = execution::completion_signatures<
        set_value_t(),
        set_stopped_t(),
        set_error_t(exception_ptr)>;

      impl-defined-environment get_env() const noexcept;

      parallel_scheduler query(get_completion_scheduler_t<set_value_t>) const noexcept;

      template <receiver R>
        requires receiver_of<R>
      impl-defined-operation_state connect(R&&) &
        noexcept(std::is_nothrow_constructible_v<std::remove_cvref_t<R>, R>);

      template <receiver R>
        requires receiver_of<R>
      impl-defined-operation_state connect(R&&) &&
        noexcept(std::is_nothrow_constructible_v<std::remove_cvref_t<R>, R>);
    };
- 
     get_parallel_scheduler () 
- 
     two objects returned by get_parallel_scheduler () 
- 
     if Sch get_parallel_scheduler () - 
       Sch 
- 
       Sch scheduler 
- 
       Sch get_forward_progress_guarantee parallel 
- 
       Sch schedule sender 
- 
       schedule Sch 
- 
       Sch bulk bulk - 
         when execution :: set_value ( r , args ...) receiver i Shape 0 sh sh bulk f ( i , args ...) 
 
- 
     if sch get_parallel_scheduler () - 
       the lifetime of sch 
- 
       sch 
- 
       if sch2 get_parallel_scheduler () sch == sch2 true if and only if they share the same backend implementation.
 
- 
     if snd schedule get_parallel_scheduler () Snd - 
       Snd 
- 
       Snd sender 
- 
       Snd get_completion_scheduler 
- 
       connect snd receiver start () 
- 
       if snd receiver get_stop_token stop_token start set_stopped 
- 
       if snd receiver set_error ( ep ) ep std :: exception_ptr 
 
- 
     The bulk get_parallel_scheduler () - 
       the corresponding sender has the same properties as the sender returned by schedule () 
- 
       the functor given to bulk 
 
4.2. Replaceability API
    namespace std::execution::system_context_replaceability {
      struct parallel_scheduler;

      // Called by the frontend.
      // Users might replace this function.
      shared_ptr<parallel_scheduler> query_parallel_scheduler_backend();

      // Implemented by the frontend.
      struct receiver {
        virtual ~receiver() = default;

      protected:
        // exposition only
        virtual bool unspecified-query-env(unspecified-id, void*) noexcept = 0;

      public:
        receiver(const receiver&) = delete;
        receiver(receiver&&) = delete;
        receiver& operator=(const receiver&) = delete;
        receiver& operator=(receiver&&) = delete;

        virtual void set_value() noexcept = 0;
        virtual void set_error(std::exception_ptr) noexcept = 0;
        virtual void set_stopped() noexcept = 0;

        template <class-type P>  // class-type is defined in [execution.syn]
        std::optional<P> try_query() noexcept;
      };

      // Implemented by the frontend.
      struct bulk_item_receiver : receiver {
        virtual void start(uint32_t start, uint32_t end) noexcept = 0;
      };

      struct storage {
        void* data;
        uint32_t size;
      };

      // Implemented by the backend.
      struct parallel_scheduler {
        virtual ~parallel_scheduler() = default;

        virtual void schedule(receiver*, storage) noexcept = 0;
        virtual void schedule_bulk_chunked(uint32_t, bulk_item_receiver*, storage) noexcept = 0;
        virtual void schedule_bulk_unchunked(uint32_t, bulk_item_receiver*, storage) noexcept = 0;
      };
    }
- 
     Note: for the current exposition we call backend the part of the application that implements a (possibly custom) system context, and frontend the part of the application around execution :: parallel_scheduler 
- 
     query_parallel_scheduler_backend parallel_scheduler - 
       Note: it is expected for users to want to have link-time replacements of this function. 
 
- 
     Note: receiver bulk_item_receiver parallel_scheduler 
- 
     receiver - 
       if, on the frontend side, the sender obtained from a system scheduler is connected to a receiver that has an environment for which get_stop_token inplace_stop_token receiver :: try_query < inplace_stop_token > () 
- 
       Note: depending on the implementation of the frontend, not all the environment properties may be available to the backend. 
 
- 
     if sch parallel_scheduler query_parallel_scheduler_backend - 
       if sch . schedule () r receiver * s storage 
- 
       at least one of set_value set_error set_stopped r - 
         set_value 
- 
         if set_value set_error 
- 
         set_stopped r 
 
- 
       the storage s s . data s . size - 
         the storage represented by s receiver r 
- 
         the implementation may use the storage represented by s 
 
- 
       Note: if the receiver r set_stopped 
 
- 
     if sch . schedule_bulk_chunked () n uint32_t r bulk_item_receiver * s storage - 
       at least one of set_value set_error set_stopped r - 
         set_value 
- 
         if set_value set_error 
- 
         set_stopped r 
 
- 
       the storage s s . data s . size - 
         the storage represented by s receiver r 
- 
         the implementation may use the storage represented by s 
 
- 
       Note: if the receiver r set_stopped 
- 
       if set_value r start r n n schedule_bulk_chunked - 
         the start r begin end begin end 0 begin end n 
- 
         the start 
- 
         the start set_value () 
 
- 
       schedule_bulk_unchunked schedule_bulk_chunked start ( i , i + 1 ) i [ 0 , n ) 
- 
       if in the process of calling start r set_error r r n start 
- 
       if in the process of calling start r set_stopped r r n start 
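To make the backend contract for bulk scheduling more tangible, the following is a minimal sketch of a backend that runs everything inline on the calling thread. It is not the reference implementation. The types are local stand-ins that mirror the replaceability API listed in this section, with the environment-query hook omitted; `bulk_sketch`, `inline_backend`, `recording_receiver`, `demo_bulk` and the chunk size of 4 are hypothetical names and values chosen for illustration. The sketch also ignores error and stop handling during `start`.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <vector>

namespace bulk_sketch {

// Local stand-ins mirroring the replaceability API (assumption: simplified,
// without the exposition-only environment query).
struct receiver {
  virtual ~receiver() = default;
  virtual void set_value() noexcept = 0;
  virtual void set_error(std::exception_ptr) noexcept = 0;
  virtual void set_stopped() noexcept = 0;
};
struct bulk_item_receiver : receiver {
  virtual void start(std::uint32_t begin, std::uint32_t end) noexcept = 0;
};
struct storage {
  void* data;
  std::uint32_t size;
};
struct parallel_scheduler {
  virtual ~parallel_scheduler() = default;
  virtual void schedule(receiver*, storage) noexcept = 0;
  virtual void schedule_bulk_chunked(std::uint32_t, bulk_item_receiver*, storage) noexcept = 0;
  virtual void schedule_bulk_unchunked(std::uint32_t, bulk_item_receiver*, storage) noexcept = 0;
};

// Hypothetical backend: executes all work inline on the calling thread.
struct inline_backend final : parallel_scheduler {
  static constexpr std::uint32_t chunk = 4;  // arbitrary chunk size

  void schedule(receiver* r, storage) noexcept override { r->set_value(); }

  void schedule_bulk_chunked(std::uint32_t n, bulk_item_receiver* r, storage) noexcept override {
    // Cover [0, n) with disjoint half-open ranges, then complete exactly once.
    for (std::uint32_t b = 0; b < n;) {
      std::uint32_t e = (n - b < chunk) ? n : b + chunk;
      r->start(b, e);
      b = e;
    }
    r->set_value();
  }

  void schedule_bulk_unchunked(std::uint32_t n, bulk_item_receiver* r, storage) noexcept override {
    // Same contract, but every range holds exactly one index: start(i, i + 1).
    for (std::uint32_t i = 0; i < n; ++i) r->start(i, i + 1);
    r->set_value();
  }
};

// Test receiver that records how often each index was visited.
struct recording_receiver final : bulk_item_receiver {
  std::vector<int> hits;
  int values = 0, errors = 0, stops = 0;

  explicit recording_receiver(std::uint32_t n) : hits(n, 0) {}

  void start(std::uint32_t begin, std::uint32_t end) noexcept override {
    assert(begin <= end && end <= hits.size());
    for (std::uint32_t i = begin; i < end; ++i) ++hits[i];
  }
  void set_value() noexcept override { ++values; }
  void set_error(std::exception_ptr) noexcept override { ++errors; }
  void set_stopped() noexcept override { ++stops; }
};

// Returns true if every index in [0, n) was visited exactly once and
// exactly one set_value() (and no other completion) was delivered.
inline bool demo_bulk(std::uint32_t n, bool chunked) {
  inline_backend backend;
  recording_receiver r{n};
  if (chunked) backend.schedule_bulk_chunked(n, &r, storage{nullptr, 0});
  else backend.schedule_bulk_unchunked(n, &r, storage{nullptr, 0});
  for (int h : r.hits)
    if (h != 1) return false;
  return r.values == 1 && r.errors == 0 && r.stops == 0;
}

}  // namespace bulk_sketch
```

A real backend would distribute the ranges across worker threads and would call `set_value` only after the last `start` call has returned; the inline version merely shows the shape of the calls.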
 
5. Design discussion and decisions
5.1. To drive or not to drive
On single-threaded systems (e.g., freestanding implementations) or on systems in which the main thread has special significance (e.g., to run the Qt main loop), it’s important to allow scheduling work on the main thread. For this, we need the main thread to drive work execution.
The earlier version of this paper, [P2079R2], included 
We can simplify this discussion to a single function:
    void drive(system_context& ctx, sender auto snd);
Let’s assume we have a single-threaded environment, and a means of customizing the system context for this environment.
We know we need a way to donate 
- 
     define our drive 
- 
     or allow the customization to define a custom drive 
With a standard 
    system_context ctx;
    auto snd = on(ctx, doWork());
    drive(ctx, std::move(snd));
Without drive, we rely on an 
    system_context ctx;
    async_scope scope;
    auto snd = on(ctx, doWork());
    scope.spawn(std::move(snd));
    custom_drive_operation(ctx);
Neither of the two variants is very portable.
The first variant requires applications that don’t care about drive-ability to call 
We envision a new paper that adds support for a main scheduler similar to the system scheduler. For hosted implementations, the main scheduler would typically be different from the system scheduler. On freestanding implementations, on the other hand, the main scheduler and the system scheduler can share the same underlying implementation, and both of them can execute work on the main thread; in this mode, the main scheduler is required to be driven, so that the system scheduler can execute work.
Keeping these two topics in separate papers allows each to make progress independently.
5.2. Freestanding implementations
This paper paid attention to freestanding implementations, but doesn’t make any wording proposals for them. We express a strong desire for the system scheduler to work on freestanding implementations, but leave the details to a different paper.
We envision that a follow-up specification will ensure that the system scheduler works in freestanding implementations by sharing the implementation with the main scheduler, which is driven by the main thread.
5.3. Making system context replaceable
TODO: update section to remove the possibility for run-time replaceability.
The system context aims to allow people to implement an application that is dependent only on parallel forward progress and to port it to a wide range of systems. As long as an application does not rely on concurrency, and restricts itself to only the system context, we should be able to scale from single threaded systems to highly parallel systems.
In the extreme, this might mean porting to an embedded system with a very specific idea of an execution context. Such a system might not have multi-threading support at all, and thus the system context not only runs with a single thread, but actually runs on the system’s only thread. We might build the context on top of a UI thread, or we might want to swap out the system-provided implementation with one from a vendor (like Intel) with experience writing optimized threading runtimes.
The latter is also important for the composability of existing code with the system context. For example, if somebody already uses Intel Threading Building Blocks (oneTBB) and wants to start using the system context as well, they will likely want to replace the system context implementation with oneTBB, so that a single thread pool and work scheduler sits underneath both.
We should allow customization of the system context to cover this full range of cases.
To achieve this, we see three options:
- 
     Link-time replaceability. This could be achieved using weak symbols, or by choosing a runtime library to pull in using build options. 
- 
     Run-time replaceability. This could be achieved by subclassing and requiring certain calls to be made early in the process. 
- 
     Compile-time replaceability. This could be achieved by importing different headers, by macro definitions on the command line or various other mechanisms. 
Link-time replaceability has the following characteristics:
- 
     Pro: we have precedent in the standard: this is similar to replacing operator new 
- 
     Pro: more predictable, in that it can be guaranteed to be application-global. 
- 
     Pro: some of the type erasure and indirection can be removed in practice with link-time optimization. 
- 
     Con: it requires defining the ABI and thus, in some cases, would require some type erasure and some inefficiency. 
- 
     Con: harder to get right with shared libraries (e.g., DLLs might have different replaced versions of the system scheduler). 
- 
     Con: the replacement might depend on the order of linking. 
Run-time replaceability has the following characteristics:
- 
     Pro: we have precedent in the standard: this is similar to std::set_terminate() 
- 
     Pro: easier to achieve consistent behavior on applications with shared libraries (e.g., Windows has the same version of C++ standard library in DLL). 
- 
     Pro: a program can have multiple implementations of system scheduler. 
- 
     Con: race conditions between replacing the system scheduler and using it to spawn work (for buggy implementations). 
- 
     Con: implies going over an ABI, and cannot be optimized at link-time. 
- 
     Con: the default implementation may allocate resources for the system scheduler at startup, only to be replaced at the start of main (this is mainly a QOI issue). 
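The comparison with `std::set_terminate()` can be illustrated with a small sketch of a run-time replaceable backend factory. This is an illustration under stated assumptions, not the API proposed by this paper: `rt_sketch`, `backend`, `set_backend_factory`, `query_backend` and the two backend types are hypothetical names, and the factory creates a fresh backend on every query instead of caching a singleton.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

namespace rt_sketch {

// Hypothetical, heavily reduced backend interface.
struct backend {
  virtual ~backend() = default;
  virtual int id() const noexcept = 0;
};
struct default_backend final : backend {
  int id() const noexcept override { return 0; }
};
struct custom_backend final : backend {
  int id() const noexcept override { return 1; }
};

using backend_factory = std::shared_ptr<backend> (*)();

inline std::shared_ptr<backend> make_default() { return std::make_shared<default_backend>(); }
inline std::shared_ptr<backend> make_custom() { return std::make_shared<custom_backend>(); }

// Process-wide factory slot, initialised with the default factory.
inline std::atomic<backend_factory>& current_factory() {
  static std::atomic<backend_factory> slot{&make_default};
  return slot;
}

// Same shape as std::set_terminate: install a new factory, return the old one.
inline backend_factory set_backend_factory(backend_factory f) noexcept {
  assert(f != nullptr);
  return current_factory().exchange(f);
}

// What a frontend would call to obtain the backend it schedules work on.
inline std::shared_ptr<backend> query_backend() {
  return current_factory().load()();
}

}  // namespace rt_sketch
```

Because the frontend holds a `shared_ptr`, a backend obtained before the replacement stays alive and usable afterwards, which is one way the race between replacing and using the scheduler listed above could be made benign.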
Compile-time replaceability has the following characteristics:
- 
     Pro: users can do this with a typedef that is used everywhere and can be switched in one place. 
- 
     Con: potential problems with ODR violations. 
- 
     Con: doesn’t support shareability across different binaries of the same process 
The paper considers compile-time replaceability as not being a viable option because it easily breaks one of the fundamental design principles of a system context, i.e. having one, shared, application-wide execution context, which avoids oversubscription.
Replaceability is also part of the [P2900R8] proposal for the contract-violation handler.
That paper proposes that whether the handler is replaceable be implementation-defined.
If an implementation chooses to support replaceability, it shall be done similar to replacing the global 
The replaceability topic is highly controversial. Some people think that link-time replaceability is the way to go; Adobe supports this position. Others think that run-time replaceability is the way to go; Bloomberg supports this latter position.
The feedback we received from Microsoft is that they will likely not support replaceability on their platforms. They would prefer that we offer implementations an option to not implement replaceability. Moreover, for systems where replaceability is supported, they would prefer the replaceability mechanism to be implementation-defined.
The authors disagree with the idea that replaceability is not needed for Windows platforms (or other platforms that provide an OS scheduler). The OS scheduler is optimized for certain workloads, and it’s not the best choice for all workloads. Thus, not providing replaceability has the following drawbacks:
- 
     it limits the ability to hand-tune the performance of the application (when system scheduler is used); 
- 
     it limits the parallel_scheduler 
- 
     it limits the parallel_scheduler 
- 
     it limits the composability of the system context with other parallel runtimes (while avoiding oversubscription). 
The poll taken in Wrocław, Poland, November 2024, showed that the majority of the participants support mandating replaceability, as well as specifying the mechanism of replaceability and the API for replaceability.
In accordance with the feedback, and the beliefs of the authors, the paper proposes the following:
- 
     mandate replaceability 
- 
     make the replaceability mechanism both link-time and runtime 
- 
     define a replaceability API 
5.4. Replaceability details
TODO: update this section to match the absence of run-time replaceability
To replace the system scheduler, the user needs to do the following:
- 
     Implement a system scheduler behind the std :: system_context_replaceability :: parallel_scheduler 
- 
     If runtime replaceability is desired, call std :: set_system_context_backend_factory 
- 
     If link-time replaceability is desired, follow the implementation instructions to replace the std :: system_context_replaceability :: query_system_context 
The above points raise a few questions:
- 
     Can the system scheduler be replaced multiple times? 
- 
     Can the system scheduler be replaced after work is scheduled/started? 
- 
     Can the system scheduler be replaced outside of main()? 
- 
     Can the system scheduler be replaced both at runtime and at link-time? 
A quick answer to all of the above questions is "yes".
Replacing the system scheduler multiple times can be achieved with runtime replaceability.
Any call to 
Changing the system scheduler can be done after work is scheduled/started. The old work will continue to execute on the previous system scheduler. This implies that, for brief periods of time, multiple system scheduler backends may be active at the same time (possibly leading to oversubscription).
The system scheduler can be replaced outside of 
While it is possible in theory to replace the system scheduler both at runtime and at link-time, it is not recommended in practice. The paper leaves it to the implementation to decide what happens when both mechanisms are used. Implementations might choose to disable runtime replaceability if link-time replaceability is used. Alternatively, implementations might choose to allow users to start the program with a link-time replaced system context, and then replace it at runtime with a different system context.
5.5. Extensibility
The 
Whatever the replaceability mechanism is, we need to ensure that new features can be added to the system context in a backwards-compatible manner.
There are two levels in which we can extend the system context:
- 
     Add more types of schedulers, besides the system scheduler. 
- 
     Add more features to the existing scheduler. 
The first type of extensibility can easily be solved by adding new getters for the new types of schedulers. Different types of schedulers should be able to be replaced separately; e.g., one should be able to replace the I/O scheduler without replacing the system scheduler. The discussed replaceability mechanisms support this.
To extend existing schedulers, one can pass properties to them via the receiver. For example, adding priorities to the system scheduler can be done by adding a priority property to the type-erased receiver object. The list of properties that can be passed to the backend is open-ended; however, both the frontend and the backend must know about the supported properties.
5.6. Shareability
One of the motivations of this paper is to stop the proliferation of local thread pools, which can lead to CPU oversubscription. If multiple binaries are used in the same process, we don’t want each binary to have its own implementation of system context. Instead, we would want to share the same underlying implementation.
The paper mandates shareability, but leaves the details of shareability to be implementation-defined (they are different for each backend).
5.7. Performance
To support shareability and replaceability, system context calls may need to go across binary boundaries, over the defined API. A common approach for this is to have COM-like objects. However, the problem with that approach is that it requires memory allocation, which might be a costly operation. This becomes problematic if we aim to encourage programmers to use the system context for spawning work in a concurrent system.
While there are some costs associated with implementing all the goals stated here, we want the implementation of the system context to be as efficient as possible. For example, a good implementation should avoid memory allocation for the common case in which the default implementation is utilized for a platform.
This paper cannot recommend the specific implementation techniques that should be used to maximize performance; these are considered Quality of Implementation (QOI) details.
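As one illustration of how allocation can be avoided, the `storage` argument of the replaceability API lets the frontend hand the backend a preallocated buffer for its per-operation state. The following is a minimal sketch of that idea, not a recommended implementation technique: `storage_sketch`, `task_node`, `place_task` and `destroy_task` are hypothetical names, and `storage` is a local stand-in for the struct in the replaceability API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

namespace storage_sketch {

// Local stand-in for the storage struct of the replaceability API.
struct storage {
  void* data;
  std::uint32_t size;
};

// Hypothetical per-operation state a backend might need (e.g. a queue node).
struct task_node {
  task_node* next = nullptr;
  int payload = 0;
};

struct placed_task {
  task_node* node;
  bool on_heap;
};

// Construct a task_node inside the caller-provided storage when it fits
// (taking alignment into account); otherwise fall back to the heap.
inline placed_task place_task(storage s, int payload) {
  void* p = s.data;
  std::size_t space = s.size;
  if (p != nullptr && std::align(alignof(task_node), sizeof(task_node), p, space)) {
    return {::new (p) task_node{nullptr, payload}, false};
  }
  return {new task_node{nullptr, payload}, true};
}

// Destroy a node created by place_task, releasing heap memory if it was used.
inline void destroy_task(placed_task t) {
  assert(t.node != nullptr);
  if (t.on_heap) delete t.node;
  else t.node->~task_node();
}

// The frontend provides a suitably sized and aligned buffer: no heap use.
inline bool demo_uses_preallocated() {
  alignas(task_node) unsigned char buf[sizeof(task_node)];
  placed_task t = place_task(storage{buf, static_cast<std::uint32_t>(sizeof(buf))}, 7);
  bool ok = !t.on_heap && t.node->payload == 7 && static_cast<void*>(t.node) == buf;
  destroy_task(t);
  return ok;
}

// The buffer is too small: the backend falls back to the heap.
inline bool demo_falls_back_to_heap() {
  unsigned char tiny[1];
  placed_task t = place_task(storage{tiny, 1}, 9);
  bool ok = t.on_heap && t.node->payload == 9;
  destroy_task(t);
  return ok;
}

}  // namespace storage_sketch
```

With this split, the frontend can size its operation state for the default backend, so the common case stays allocation-free, while a replacement backend with larger needs still works at the cost of an allocation.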
5.8. Lifetime
Underneath the system scheduler, there is a singleton of some sort. We need to specify the lifetime of this object and everything that derives from it.
Revision R4 of the paper mandates that the lifetime of any 
We received feedback that this was too strict.
First, there are many applications where the C++ part does not have a 
Revision R5 of the paper relaxes the lifetime requirements of the system scheduler. The system scheduler can now be used in any part of a C++ program.
5.9. Need for the system_context 
Our goal is to expose a global shared context to avoid oversubscription of threads in the system and to efficiently share a system thread pool.
Underneath the system_context, there is a singleton of some sort. The question is how we expose the singleton. We have a few obvious options:
- 
     Explicit context objects, as we’ve described in R2, R3 and R4 of this paper, where a system_context 
- 
     A global get_system_context () system_context 
- 
     A global get_parallel_scheduler () 
In R4 and earlier revisions, we opted for an explicit context object. The reasoning was that providing explicit contexts makes it easier to understand the lifetime of the schedulers. However, adding this extra class does not affect how one would reason about the lifetime of the schedulers or the work scheduled on them. Therefore, introducing an artificial scope object becomes an unnecessary burden.
There were also arguments made for adding 
Thus, the paper simply proposes a 
5.10. Backend environment
For custom backend implementations, it is often necessary to access properties of the environment that is connected to the sender that triggers a schedule operation. Because the backend is behind a type-erased boundary, the environment that can be passed to the backend also needs to be type-erased.
We’ve added the possibility to encode environment properties in the receiver object.
The backend can query the receiver for environment properties by using the 
The backend can query the receiver object for the entire lifetime of the asynchronous operation (until one of the completion signals is called).
One implication of using types to query properties is that the exchange of environment properties needs to use concrete types, and cannot use concepts.
If, for example, the backend needs to obtain a stop token, it needs to ask for a specific stop token type (like 
At this point, the paper requires that only 
The authors envision that the list of supported properties will grow over time.
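The type-keyed query described in this section can be sketched as follows. This is a simplified illustration, not the specified mechanism: the paper leaves the identifier passed across the boundary unspecified, while the sketch assumes `std::type_index` for it. `env_sketch`, `fake_stop_token`, `priority`, `frontend_receiver` and `backend_sees_stop_request` are hypothetical names; `fake_stop_token` stands in for a concrete stop token type, and queried types are assumed to be default-constructible and copyable without throwing.

```cpp
#include <cassert>
#include <optional>
#include <typeindex>
#include <typeinfo>

namespace env_sketch {

// Stand-in for a concrete stop token type (hypothetical).
struct fake_stop_token {
  bool stop_requested = false;
};

// A property the frontend in this sketch does not support (hypothetical).
struct priority {
  int level = 0;
};

class receiver {
public:
  virtual ~receiver() = default;

  // Typed convenience wrapper around the type-erased hook.
  template <class P>
  std::optional<P> try_query() noexcept {
    P out{};
    if (query_env(std::type_index(typeid(P)), &out)) return out;
    return std::nullopt;
  }

protected:
  // Type-erased hook: if the property identified by `id` is supported,
  // writes it to `dest` (which points to an object of that type) and
  // returns true; otherwise returns false.
  virtual bool query_env(std::type_index id, void* dest) noexcept = 0;
};

// Frontend-side receiver that exposes only a stop token to the backend.
class frontend_receiver final : public receiver {
public:
  explicit frontend_receiver(fake_stop_token t) : token_(t) {}

private:
  bool query_env(std::type_index id, void* dest) noexcept override {
    assert(dest != nullptr);
    if (id == std::type_index(typeid(fake_stop_token))) {
      *static_cast<fake_stop_token*>(dest) = token_;
      return true;
    }
    return false;
  }

  fake_stop_token token_;
};

// Backend-side code: it must ask for one concrete type, not a concept.
inline bool backend_sees_stop_request(receiver& r) {
  std::optional<fake_stop_token> tok = r.try_query<fake_stop_token>();
  return tok.has_value() && tok->stop_requested;
}

}  // namespace env_sketch
```

The sketch also shows why concrete types are required: the backend can only name the exact type it asks for, and a property the frontend does not know about (here `priority`) simply yields an empty optional.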
5.11. Priorities
It’s broadly accepted that we need some form of priorities to tweak the behavior of the system context. This paper does not include priorities, though early drafts of R2 did. We had different designs in flight for how to achieve priorities and decided they could be added later in either approach.
The first approach is to expand one or more of the APIs.
The obvious way to do this would be to add a priority-taking version of 
    implementation-defined-parallel_scheduler get_scheduler();
    implementation-defined-parallel_scheduler get_scheduler(priority_t priority);
This approach would offer priorities at scheduler granularity and apply to large sections of a program at once.
The other approach, which matches the receiver query approach taken elsewhere in [P2300R10] is to add a 
In either case, we can add priorities in a separate paper. It is thus not urgent that we answer this question, but we include the discussion point to explain why priorities were removed from the paper.
5.12. Reference implementation
The authors prepared a reference implementation in stdexec
A few key points of the implementation:
- 
     The implementation is divided into two parts: "frontend" and "backend". The frontend part implements the API defined in this paper and calls the backend for the actual implementation. The backend provides the actual implementation of the system context. 
- 
     Uses the name system_scheduler instead of parallel_scheduler 
- 
     Allows link-time replaceability for system_scheduler 
- 
     Allows run-time replaceability for system_scheduler 
- 
     Defines a replaceability API between the frontend and backend parts. This way, one can easily extend this interface when new features need to be added to system context. 
- 
     Uses preallocated storage on the frontend side, so that the default implementation doesn’t need to allocate memory on the heap when adding new work to system_scheduler 
- 
     Uses a Phoenix singleton pattern to ensure that the system scheduler is alive when needed. 
- 
     As the default implementation is created outside of the frontend part, it can be shared between multiple binaries in the same process. 
- 
     uses a static_thread_pool libdispatch 
(At the time of writing this paper revision, not all of these features are merged on the mainline.)
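The lifetime point above ("alive when needed") can be sketched with a simplified revive-on-demand shared instance. This is not the classic Phoenix singleton, which re-creates the object in static storage during program shutdown, and it is not taken from the reference implementation. The names toy_backend and acquire_backend are hypothetical, and the sketch does not handle calls made during static destruction.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical backend type; 'generation' only makes re-creation observable.
struct toy_backend {
    int generation;
};

// Returns the live backend, creating a new one if the previous one is gone.
std::shared_ptr<toy_backend> acquire_backend() {
    static std::mutex guard;
    static std::weak_ptr<toy_backend> current;
    static int generations = 0;

    std::lock_guard<std::mutex> lock(guard);
    if (auto alive = current.lock()) {
        return alive;  // still in use somewhere: share it
    }
    auto fresh = std::make_shared<toy_backend>(toy_backend{++generations});
    current = fresh;   // remember it without extending its lifetime
    return fresh;
}
```

The design choice illustrated here is that users never observe a dead backend: each call either shares the instance that is still alive or obtains a freshly created one.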
5.13. Addressing received feedback
5.13.1. Allow for system context to borrow threads
Early feedback on the paper from Sean Parent suggested a need for the system context to support a configuration where it carries no threads of its own and takes over the main thread. While in [P2079R2] we proposed execute_chunk execute_all drive() As we discussed previously, a separate paper is supposed to take care of the drive-ability aspect.
5.13.2. Allow implementations to use Grand Central Dispatch and Windows Thread Pool
In the current form of the paper, we allow implementations to define the best choice for implementing the system context for a particular system. This includes using Grand Central Dispatch on Apple platforms and Windows Thread Pool on Windows.
In addition, we propose that implementations allow the system context implementation to be replaced. This means that users should be able to write their own system context implementations, whether these depend on OS facilities or on vendor-specific (for example, Intel) solutions for parallelism.
5.13.3. Priorities and elastic pools
Feedback from Sean Parent:
There is so much in that proposal that is not specified. What requirements are placed on the system scheduler? Most system schedulers support priorities and are elastic (i.e., blocking in the system thread pool will spin up additional threads to some limit).
The lack of details in the specification is intentional, allowing implementers to make the best compromises for each platform. As different platforms have different needs, constraints, and optimization goals, the authors believe that it is in the best interest of the users to leave some of these details as Quality of Implementation (QOI) details.
5.13.4. Implementation-defined may make things less portable
Some feedback gathered during discussions on this paper suggested that leaving many aspects of the paper implementation-defined would reduce the portability of the system context.
While it is true that people who want to replace the system scheduler will have a harder time doing so, this will not affect the users of the system scheduler. They will still be able to use the system context and the system scheduler without knowing their implementation details.
We have a precedent in the C++ standard for this approach: the global allocator.
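The analogy with the global allocator can be made concrete with a small sketch: a program replaces the global allocation function, and ordinary user code that calls new keeps working unchanged. The counter g_allocations and the helper make_value are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter that makes the replacement observable.
std::atomic<std::size_t> g_allocations{0};

// Program-provided replacement for the global allocation function.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching replacements for the deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Ordinary user code: it neither knows nor cares which allocator is active.
int* make_value() { return new int(42); }
```

The user-facing code (make_value) is identical with or without the replacement. Only the party that supplies the replacement needs to know the details of the replaced interface.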
5.13.5. Replaceability is not needed (at least on Windows)
Microsoft provided feedback that they will likely not support replaceability on their platforms. They would prefer that we offer implementations an option to not implement replaceability. Moreover, for systems where replaceability is supported, they would prefer that the replaceability mechanism be implementation-defined.
The authors disagree with the idea that replaceability is not needed for Windows platforms (or other platforms that provide an OS scheduler). The OS scheduler is optimized for certain workloads, and it is not the best choice for all workloads. Not providing replaceability options therefore has the following drawbacks:
- 
     it limits the ability to hand-tune the performance of the application (when system scheduler is used); 
- 
     it limits the parallel_scheduler 
- 
     it limits the parallel_scheduler 
- 
     it limits the composability of the system context with other parallel runtimes (while avoiding oversubscription). 
The poll taken in Wrocław, Poland, November 2024, showed that the majority of the participants support mandating replaceability, as well as specifying the mechanism of replaceability and the API for replaceability.
6. Specification
6.1. Header <version>
   To the 
#define __cpp_lib_syncbuf 201803L // also in <syncstream>
#define __cpp_lib_parallel_scheduler 2025XXL // also in <execution>
#define __cpp_lib_text_encoding 202306L // also in <text_encoding>
6.2. 8.2 Header <execution>
   To the 
namespace std::execution {
  // [exec.get_parallel_scheduler]
  parallel_scheduler get_parallel_scheduler();

  // [exec.parallel_scheduler]
  class parallel_scheduler { unspecified };

  // [exec.parallel_scheduler]
  class impl-defined-parallel_sender { unspecified };
}

// [exec.sysctxrepl]
namespace std::execution::system_context_replaceability {
  struct parallel_scheduler;

  shared_ptr<parallel_scheduler> query_parallel_scheduler_backend();

  struct receiver {
    virtual ~receiver() = default;
    receiver(const receiver&) = delete;
    receiver(receiver&&) = delete;
    receiver& operator=(const receiver&) = delete;
    receiver& operator=(receiver&&) = delete;

  protected:
    // exposition only
    virtual bool unspecified-query-env(unspecified-id, void*) noexcept = 0;

  public:
    virtual void set_value() noexcept = 0;
    virtual void set_error(std::exception_ptr) noexcept = 0;
    virtual void set_stopped() noexcept = 0;

    template <class-type P>
    std::optional<P> try_query() noexcept;
  };

  struct bulk_item_receiver : receiver {
    virtual void start(uint32_t, uint32_t) noexcept = 0;
  };

  struct storage {
    void* data;
    uint32_t size;
  };

  struct parallel_scheduler {
    virtual ~parallel_scheduler() = default;
    virtual void schedule(receiver*, storage) noexcept = 0;
    virtual void schedule_bulk_chunked(uint32_t, bulk_item_receiver*, storage) noexcept = 0;
    virtual void schedule_bulk_unchunked(uint32_t, bulk_item_receiver*, storage) noexcept = 0;
  };
}
6.3. System context
Add the following as a new subsection at the end of 33 [exec]:
33.N.1 
- 
      get_parallel_scheduler 
- 
      Returns: An instance of a execution :: parallel_scheduler - 
        The intended use for system scheduler is to have the parallel execution context shared between applications. 
 
- 
        
33.N.2 
- 
      parallel_scheduler scheduler - 
        Users might alter the behavior of the system scheduler by replacing the backend implementation, as discussed in [exec.sysctxrepl]. 
 
- 
        
- 
      An instance of the parallel_scheduler execution :: system_context_replaceability :: parallel_scheduler - 
        If the user does not specify a custom backend, a default is provided by the implementation. 
- 
        If no system scheduler backend is available when an instance of the class parallel_scheduler terminate () 
 
- 
        
- 
      Two objects sch1 sch2 execution :: parallel_scheduler 
- 
      If sch execution :: parallel_scheduler get_forward_progress_guarantee ( sch ) execution :: parallel 
- 
      Let snd execution :: schedule () parallel_scheduler snd impl - defined - parallel_sender recv snd schedule ( r , s ) r s - 
        r execution :: system_context_replaceability :: receiver - 
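          when the backend calls r->set_value(), recv is completed with set_value 
- 
          when the backend calls r->set_error() with an exception pointer ep, recv is completed with set_error carrying ep 
- 
          when the backend calls r->set_stopped(), recv is completed with set_stopped 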
 
- 
          
- 
        s execution :: system_context_replaceability :: storage r -> set_value () r -> set_error () r -> set_stopped () 
 
- 
        
- 
      Implementations shall provide customizations for the execution :: bulk_chunked () snd execution :: bulk_chunked ( sndr , shape , f ) recv snd schedule_bulk_chunked ( n , r , s , e ) - 
        r execution :: system_context_replaceability :: bulk_item_receiver - 
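          when the backend calls r->set_value(), recv is completed with set_value 
- 
          when the backend calls r->set_error() with an exception pointer ep, recv is completed with set_error carrying ep 
- 
          when the backend calls r->set_stopped(), recv is completed with set_stopped 
- 
          when the backend calls r->start() with arguments b and e, f(b, e) is called 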
 
- 
          
- 
        s execution :: system_context_replaceability :: storage r -> set_value () r -> set_error () r -> set_stopped () 
- 
        Customizing the behavior of bulk_chunked bulk 
 
- 
        
- 
      Implementations shall provide customizations for the execution :: bulk_unchunked () snd execution :: bulk_unchunked ( sndr , shape , f ) recv snd schedule_bulk_unchunked ( n , r , s , e ) - 
        r execution :: system_context_replaceability :: bulk_item_receiver - 
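          when the backend calls r->set_value(), recv is completed with set_value 
- 
          when the backend calls r->set_error() with an exception pointer ep, recv is completed with set_error carrying ep 
- 
          when the backend calls r->set_stopped(), recv is completed with set_stopped 
- 
          when the backend calls r->start() for index i, f(i) is called 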
 
- 
          
- 
        s execution :: system_context_replaceability :: storage r -> set_value () r -> set_error () r -> set_stopped () 
 
- 
        
33.N.3 
- 
      Facilities in execution :: system_context_replaceability we use the term backend to refer to the actual implementation of the system scheduler, which is hidden behind a type-erased boundary. At the same time, we refer to the frontend as the part of the system context that is exposed to the users, i.e., the execution :: parallel_scheduler - 
        query_parallel_scheduler_backend() 
- 
        receiver receiver 
- 
        bulk_item_receiver bulk_item_receiver 
- 
        storage 
- 
        parallel_scheduler 
 
 query_parallel_scheduler_backend() 
- 
        
- 
      Returns: a non-null shared pointer to an object that implements the parallel_scheduler 
- 
      Remarks: - 
        A C++ program may provide replacements for query_parallel_scheduler_backend () 
 
 std::optional<P> receiver::try_query() noexcept 
- 
        
- 
      Returns: an optional object that contains the property of type P - 
        If, on the frontend side, the sender obtained from a system scheduler is connected to a receiver that has an environment for which get_stop_token inplace_stop_token try_query < inplace_stop_token > () 
- 
        It is unspecified for which other properties P try_query 
 
 struct parallel_scheduler 
- 
        
- 
      parallel_scheduler 
 virtual void parallel_scheduler::schedule(receiver* r, storage s) noexcept = 0 
- 
      Effects: undefined behavior unless the following are met by implementations: - 
        it schedules new work on a thread belonging to the execution context represented by this 
- 
        eventually, one of the following methods is called on the given object r: set_value(), set_error(), or set_stopped() 
- 
        if no error occurs, and the work is not cancelled, r->set_value() is called 
- 
        if an error occurs, r->set_error() is called with an exception_ptr describing the error 
- 
        if the work is cancelled, r->set_stopped() is called 
- 
        The canonical way of cancelling the work is to store a stop token inside the type-erased receiver r, from which the backend can obtain it by calling r->try_query<inplace_stop_token>() 
 
- 
        
- 
      Remarks: - 
        The caller guarantees that, until r -> set_value () r -> set_error () r -> set_stopped () - 
          the this parallel_scheduler 
- 
          parameter objects r s 
- 
          the memory of size s . size s . data s . size 
 
- 
          
 
 virtual void parallel_scheduler::schedule_bulk_chunked(uint32_t n, bulk_item_receiver* r, storage s) noexcept = 0 
- 
        
- 
      Effects: same as for schedule(r, s) - 
        if r -> start ( b , e ) 0 b e n 
- 
        all calls to r -> start ( b , e ) b e 
- 
        if r -> set_value () i 0 n r -> start ( b , e ) b i e 
- 
        all calls to r -> start () r -> set_value () r -> set_error () r -> set_stopped () 
- 
        all calls to r -> start () this 
 
- 
        
- 
      Remarks: same as for schedule ( r , s ) 
 virtual void parallel_scheduler::schedule_bulk_unchunked(uint32_t n, bulk_item_receiver* r, storage s) noexcept = 0 
- 
      Effects: same as for schedule_bulk_chunked(n, r, s), except that, for every call r->start(b, e), e == b + 1 
- 
      Remarks: same as for schedule ( r , s ) 
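To make the backend obligations above more tangible, the following non-normative sketch implements a toy backend against a local copy of the interface. Everything lives in a hypothetical namespace sketch because the proposed header is assumed not to be available. The types inline_backend and counting_receiver are illustrative names, and the default constructor added to receiver is an assumption made so that derived receivers can be constructed. The toy backend runs work inline on the calling thread, whereas a real backend would hand it to worker threads.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <vector>

namespace sketch {  // hypothetical stand-in for the proposed namespace

struct receiver {
    receiver() = default;  // assumption: added so derived types can be built
    virtual ~receiver() = default;
    receiver(const receiver&) = delete;
    receiver& operator=(const receiver&) = delete;
    virtual void set_value() noexcept = 0;
    virtual void set_error(std::exception_ptr) noexcept = 0;
    virtual void set_stopped() noexcept = 0;
};

struct bulk_item_receiver : receiver {
    virtual void start(std::uint32_t, std::uint32_t) noexcept = 0;
};

struct storage { void* data; std::uint32_t size; };

struct parallel_scheduler {
    virtual ~parallel_scheduler() = default;
    virtual void schedule(receiver*, storage) noexcept = 0;
    virtual void schedule_bulk_chunked(std::uint32_t, bulk_item_receiver*, storage) noexcept = 0;
    virtual void schedule_bulk_unchunked(std::uint32_t, bulk_item_receiver*, storage) noexcept = 0;
};

// Toy backend: completes everything inline, always with set_value().
struct inline_backend final : parallel_scheduler {
    void schedule(receiver* r, storage) noexcept override { r->set_value(); }

    // Covers [0, n) with at most two non-empty chunks, then completes.
    void schedule_bulk_chunked(std::uint32_t n, bulk_item_receiver* r, storage) noexcept override {
        const std::uint32_t mid = n / 2;
        if (mid > 0) r->start(0, mid);
        if (mid < n) r->start(mid, n);
        r->set_value();
    }

    // One start() call per index, so e == b + 1 for every call.
    void schedule_bulk_unchunked(std::uint32_t n, bulk_item_receiver* r, storage) noexcept override {
        for (std::uint32_t i = 0; i < n; ++i) r->start(i, i + 1);
        r->set_value();
    }
};

// Receiver that records which indices were visited and how it completed.
struct counting_receiver final : bulk_item_receiver {
    std::vector<int> hits;
    int values = 0;
    int max_chunk = 0;

    explicit counting_receiver(std::uint32_t n) : hits(n, 0) {}

    void start(std::uint32_t b, std::uint32_t e) noexcept override {
        const int len = static_cast<int>(e - b);
        if (len > max_chunk) max_chunk = len;
        for (std::uint32_t i = b; i < e; ++i) ++hits[i];
    }
    void set_value() noexcept override { ++values; }
    void set_error(std::exception_ptr) noexcept override {}
    void set_stopped() noexcept override {}
};

}  // namespace sketch
```

In this sketch every index in [0, n) is visited exactly once and exactly one completion function is called after all start() calls, which mirrors the requirements listed for schedule_bulk_chunked and schedule_bulk_unchunked.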
7. Polls
7.1. SG1, Wrocław, Poland, 2024
SG1 provided the following feedback, through a poll:
- 
     Forward P2079R5 to LEWG for C++26 with changes: - 
       Remove the wording about delegatee schedulers 
- 
       Add wording about set_error 
       | SF | F | N | A | SA |
       | 3 | 2 | 1 | 0 | 0 |
       Unanimous consent 
- 
       
7.2. LEWG, Wrocław, Poland, 2024
LEWG provided the following feedback, through polls:
- 
     POLL: We support mandating (by specifying in the standard) replaceability, as described in: “P2079R5 System execution context”
     | SF | F | N | A | SA |
     | 10 | 5 | 1 | 0 | 1 |
     Attendance: 15 IP, 7 online
     # of Authors: 2
     Author’s Position: 2x SF
     Outcome: Consensus in favor
     SA: I think using this central point is equivalent to using global and is bad software engineering practice and I wouldn’t like to see it in the standard. 
- 
     POLL: We support mandating the specific mechanism(s) (link time/runtime/both) of replaceability as described in: “P2079R5 System execution context”
     | SF | F | N | A | SA |
     | 5 | 6 | 3 | 0 | 2 |
     Attendance: 15 IP, 7 online
     # of Authors: 2
     Author’s Position: 1x SF 1x N
     Outcome: Consensus in favor
     SA: I don’t believe it’s actually possible to describe this in terms of the abstract machine, we’ve never done that. 
- 
     POLL: We support specifying an API for the replaceability mechanism(s), as described in: “P2079R5 System execution context”
     | SF | F | N | A | SA |
     | 7 | 6 | 2 | 0 | 1 |
     Attendance: 15 (IP), 7 (R)
     # of Authors: 2
     Author’s Position: 2x SF
     Outcome: Consensus in favor 
- 
     POLL: For lifetime: “P2079R5 System execution context” (approval poll - allowed to vote multiple times): - 
       Not specified in the standard (valid to use the system scheduler everywhere) | 11 | 
- 
       Allow lifetime to start before main() (allow use of scheduler before entering main but not after main returns) | 1 | 
- 
       Constrain lifetime to only be inside of main() | 8 | 
       Attendance: 15 (IP), 7 (R)
       Outcome: Slight preference towards 1 (if authors advocate for 3, they need to add rationale for it) 
- 
       
In addition to these polls, LEWG provided the following action items:
- 
     Add noexcept to move, copy constructor, move/copy assignment operators (special member functions). 
- 
     Strike system_scheduler query(get_completion_scheduler_t) overload. 
- 
     Add Lvalue connectability. 
- 
     Add normative recommendation / non-normative note (if normative - should not be in a note) for sharability in implementations, to be decided later. 
7.3. SG1, Hagenberg, Austria, 2025
SG1 provided the following feedback, through a forwarding poll:
- 
     Forward P2079R6 to LEWG with the following changes towards C++26:
     A. Move the name get_system_scheduler
     B. Rename the current get_system_scheduler to get_parallel_scheduler
     C. Remove run-time replaceability.
     | SF | F | N | A | SA |
     | 5 | 5 | 0 | 1 | 0 |
     Consensus
     Against: LB: I’m a bit concerned about some of the lifetime implications of the API changes. I’d rather see it back in SG1 again before it goes to LEWG.