| Document #: | P4215R0 |
| Date: | 2026-05-12 |
| Project: | Programming Language C++ |
| Audience: | SG1 |
| Reply-to: | Lucian Radu Teodorescu (Garmin) <lucteo@lucteo.ro> |
C++26 senders provide a vocabulary for composing asynchronous work, but they do not by themselves provide sender-native equivalents of synchronization primitives that impose non-local concurrency constraints. Programs still need to serialize unrelated operations, bound concurrency across independently submitted work, wait for readiness, and coordinate completion or phase boundaries.
This paper explores a family of sender-based primitives, called gates, that control admission of work according to such non-local constraints. The design is motivated by the ordering relations expressed by existing synchronization primitives, but aims to preserve sender composition, safety invariants, progress, non-blocking waiting, and compatibility with structured lifetime management.
The paper is exploratory. It asks SG1 whether this model is a useful direction for C++ concurrency, and where further design effort should be focused.
The sender model gives C++ a way to describe asynchronous work as values. Sender algorithms allow programs to compose work into execution graphs, and completion signals make the flow of values, errors, and cancellation explicit. This is a significant improvement over ad-hoc asynchronous APIs.
Most sender composition is local: the work items being related are present in the same expression, or are connected by an execution graph that can be inspected from one place. For example, a program can express that two operations run concurrently with when_all, that one operation starts after another with let_value, or that work transfers to a scheduler with starts_on or continues_on.
Some concurrency constraints are not local in this sense. Consider an application in which several subsystems can write to the same file. The subsystem that initiates one write may not know about the subsystem that initiates another write. Nevertheless, the program must impose a non-local constraint: two writes to that file must not execute concurrently. Similar constraints occur when independent request paths share a database connection pool, when many operations depend on a one-time initialization step, or when a set of workers must rendezvous at a phase boundary.
In synchronous code, we routinely express these constraints with synchronization objects. A mutex is not local to one call graph; it is a shared object through which otherwise independent pieces of code participate in the same serialization constraint. A semaphore bounds concurrent access to a resource. A condition variable coordinates progress based on shared state. A latch or barrier establishes ordering between groups of work.
The question for sender-based C++ is not whether these constraints still exist. They do. The question is how to express them without falling back to blocking threads, manual signaling protocols, or unstructured lifetime management.
This paper approaches the question from the constraints themselves. We first look at classic synchronization primitives as ways to impose ordering or concurrency relations between work items. We then explore sender-native objects, called gates, that express similar non-local constraints while composing with sender algorithms, cancellation, completion signaling, and asynchronous scopes.
Serial execution imposes a total order on the execution of work items. If a and b are two work items, then either a < b or b < a.
On the other hand, concurrent execution imposes a strict partial ordering over the execution of work items. Thus, we have three possibilities for executing two work items a and b:
- a < b;
- b < a;
- neither a < b nor b < a (concurrent execution).

This is a run-time view of the execution. At program specification time, it is useful to consider another relation:
- a < b or b < a (mutual exclusion).

We denote this last relation with a < > b. For convenience, we also denote with a || b the absence of any imposed ordering relation between a and b – this does not guarantee that they will run concurrently; they may or may not execute concurrently.
In this document we always use the “<” symbol to denote ordering between work items; we do not use it as the C++ less-than operator.
Typically, any constraint of the form a < b means a happens-before b. However, the model may be relaxed to consider other types of relations.
For more details please see [Teodorescu26].
We call a concurrency constraint local when the work items being constrained are composed together directly. For example:
auto s = std::execution::when_all(read_header(), read_body())
       | std::execution::let_value([](auto header, auto body) {
           return parse_message(header, body);
         });

The relationship between read_header, read_body, and parse_message is local to the sender expression. The expression itself describes which work can overlap and which work depends on earlier completion. In this fragment, we have: read_header || read_body, read_header < parse_message, and read_body < parse_message.
We call a concurrency constraint non-local when the constrained work items are not necessarily composed together at the same point in the program. The constraint is instead mediated by a shared object, policy, or resource. For example:
void subsystem_a() {
  write_log_record(...);
}
void subsystem_b() {
  write_log_record(...);
}

The two calls may be initiated by unrelated parts of the program, but they may still need to participate in the same serialization constraint if they write to the same file. The constraint is non-local because it is not visible in either call site alone.
While we can easily encode local constraints with sender algorithms, encoding non-local constraints is not trivial. Reasoning about non-local constraints is also harder, and we do not, in general, have guarantees that progress composes.
At its core, non-local concurrency control is the repeated application of a partition function over the set of work items. In other words, we need to decide what can start execution at any given time, and what needs to be postponed.
The important observation is that most programs do not need a single global concurrency policy. Concurrency constraints are usually sparse. A log file, a database connection pool, a readiness condition, or a phase boundary constrains only the work items that use that object. Work outside that object is unrelated.
Thus, we can model each such object as a small admission controller for one non-local constraint. The controller partitions associated work into work that may run now and work that must wait. In synchronous code, this role is played by mutexes, semaphores, condition variables, latches, and barriers. In sender-based asynchronous code, we want corresponding facilities that impose the same kinds of constraints without blocking execution resources and without exposing manual release or notification protocols when sender structure can express them.
This paper calls these sender-native admission controllers gates. A gate is shared by otherwise independent sender expressions, and it imposes a named non-local concurrency constraint on the work associated with it.
Some of these abstractions have a long history in practice. Task queues, strands, serializers, asynchronous semaphores, rate limiters, and similar facilities are widely used in asynchronous systems. This paper is not primarily trying to claim novelty for each individual primitive. Instead, it tries to identify the underlying concurrency relations that such facilities impose, and to describe a common sender-native model that can extend from familiar cases to less-established non-local constraints.
A more detailed treatment of this model can be found in [Teodorescu26].
This paper uses the terminology from [P4214R0]: correctness is decomposed into safety and liveness, and for the class of concurrency facilities discussed here, liveness is treated through progress. A useful concurrency primitive should therefore preserve both safety and progress.
Thus, throughout this paper, the safety and progress discussion for each gate asks two questions: does the gate preserve the safety of the submitted work, and does it preserve its progress?
The rest of this paper follows a simple method.
First, we identify the relation imposed by existing synchronization primitives. For example, a mutex imposes an ordering constraint between critical sections; a semaphore imposes a cardinality constraint on concurrently executing work; a barrier imposes ordering between phases.
Second, we ask how the same kind of relation could be expressed for sender-based asynchronous work. The resulting abstraction should not block an execution resource while waiting. It should compose with sender algorithms and completion signals. It should avoid manual release or notification protocols when release can instead be structurally tied to sender completion.
Third, we evaluate whether the abstraction preserves safety and progress compositionally. If the submitted work is safe and makes progress when executed under the intended constraint, the primitive should not introduce hidden races, lost notifications, or progress dependencies unrelated to the submitted work.
The following sections apply this method to the gate vocabulary introduced above.
Here we analyze synchronization primitives in the C++ standard from a pre-senders perspective.
Let us look at the following example using an std::mutex:

std::mutex m;
void usage() {
  head();
  {
    std::lock_guard<std::mutex> guard{m};
    w();
  }
  tail();
}

We execute work w in a region protected by mutex m, with head being executed before w and tail being executed after it.
Thus, we have the following ordering constraints:
- head < w (Local 1)
- w < tail (Local 2)
- head < tail (Local 3)

In addition to these, the mutex imposes an extra set of ordering constraints:
- if w1 and w2 are protected by m, then w1 < > w2 (Mutex)
The constraints imposed by a mutex.
Similar to the previous example, we can look at an example that uses shared mutexes:

std::shared_mutex m;
void usage_read() {
  head();
  {
    std::shared_lock guard{m};
    w();
  }
  tail();
}
void usage_write() {
  head();
  {
    std::unique_lock guard{m};
    w();
  }
  tail();
}

A shared mutex can be used in two modes: for reading data (shared mode) or for writing data (unique mode). The Local ordering constraints discussed for mutexes also apply to shared mutexes, regardless of the mode in which they are used. The core ordering constraint from the mutex morphs into:
- if w1 and w2 are protected by m, either of which is used in write mode, then w1 < > w2 (SharedMutex)

As expected, multiple read operations may execute concurrently, but a read cannot happen concurrently with a write, and neither can two writes.
The constraints imposed by a shared_mutex.
Timed and recursive mutexes do not introduce new ordering constraints compared to the basic mutex. Thus, they are not interesting for our analysis.
try_lock facilities

The try_lock family of functions allows the user to execute a work item only if other work items on the mutex are not executing at that time. Compared to our previous example, w might not execute. When w is executed, then all (Local) ordering constraints need to be followed; otherwise, (Local 1) and (Local 2) make no sense.
A counting_semaphore behaves like a mutex, but allows more than one concurrent access to the same resource. Also, unlike a mutex, the semaphore allows unstructured access to the resource; the resource does not have ownership like in the mutex case. Semaphores can be used in signaling mechanisms in which mutexes are harder to use (e.g., cross-thread handoff). Conceptually, a mutex protects a resource while a semaphore coordinates progress.

A binary_semaphore is a semaphore that cannot have more than one concurrent access to the resource. From that perspective it is similar to a mutex, but it allows the unstructured access patterns of a counting_semaphore.
The useful constraint provided by a semaphore is the concurrency bound. The problematic part, for our purposes, is not bounded concurrency itself, but the unstructured manual acquire/release protocol: a release can be forgotten, performed on only one completion path, or disconnected from the work whose completion should free capacity.
Let us take the following example as being representative for a semaphore:
std::counting_semaphore s(10);
void usage(int index) {
  head();
  s.acquire();
  w(index);
  s.release();
  tail();
}

To discuss ordering constraints, let W = { w1, w2, …, wN } be the set of work items protected by semaphore s with count K.
Like in the previous cases, the Local ordering constraints also apply here. On top of those, we have something specific to semaphores:
- w_i || w_j, ∀ i, j ∈ W0, where W0 ⊆ W is any set of work items executing concurrently, and the cardinality of W0 is not greater than K (Semaphore)

In other words, we cannot have more than K work items executing concurrently. If more work items are ready to be executed, their execution needs to be delayed until one of the K currently executing work items completes. This is a concurrency cardinality constraint.
The constraints imposed by a counting_semaphore.
A condition variable blocks a thread of execution until a condition is met. It is always paired with a mutex that protects access to the data underlying the condition.
The following is an illustrative example of a condition variable:
std::mutex m;
std::condition_variable cv;
void condition_check(int index) {
  head();
  std::unique_lock lock(m);
  cv.wait(lock, [] { return c(); });
  tail();
}
void update_condition(int index) {
  head2();
  {
    std::unique_lock lock(m);
    w();
  }
  cv.notify_one();
  tail2();
}

Let us try to extract the ordering constraints for this example. First, both parts (checking the condition and updating the condition) follow the Local ordering constraints. The Mutex ordering constraint also applies to checking and updating the condition. In addition to those, we have extra ordering constraints for condition variables:
- the evaluation of c() that allows cv.wait to complete observes an update performed by some w, so that w < c (CondVar)
The constraints imposed by a condition_variable.
A latch allows one or more threads of execution to block until a fixed number of arrivals has occurred. Once the expected number of arrivals has been reached, the latch stays open and cannot be reused.
Let us take the following example as being representative for a latch:
std::latch l(K);
void worker(int index) {
  head(index);
  w(index);
  l.count_down();
  tail(index);
}
void coordinator() {
  head_c();
  l.wait();
  tail_c();
}

To discuss ordering constraints, let W = { w1, w2, …, wK } be the set of work items whose arrivals cause latch l to reach zero.
Like in the previous cases, the Local ordering constraints apply for each worker and for the coordinator. On top of those, we have something specific to latches:
- if w ∈ W, then w < tail_c (Latch)

In other words, all work items that contribute to opening the latch need to complete before the work that follows l.wait() can execute. The latch does not impose ordering constraints between the work items in W themselves; they can execute concurrently.
The constraints imposed by a latch.
A barrier allows a fixed number of threads of execution to repeatedly rendezvous at the end of a phase. Once all participants arrive at the barrier, the phase is completed and the participants can continue with the next phase.
Let us take the following example as being representative for a barrier:
std::barrier b(K);
void participant(int index) {
  head1(index);
  w1(index);
  b.arrive_and_wait();
  tail1(index);
  head2(index);
  w2(index);
  b.arrive_and_wait();
  tail2(index);
}

To discuss ordering constraints, let W1 = { w1_1, w1_2, …, w1_K } be the set of work items executed before the first barrier phase is completed, and let W2 = { w2_1, w2_2, …, w2_K } be the set of work items executed after the first barrier phase is completed.
Like in the previous cases, the Local ordering constraints apply for each participant. On top of those, we have something specific to barriers:
- if w1 ∈ W1 and w2 ∈ W2, then w1 < w2 (Barrier)

In other words, a barrier imposes an ordering constraint between phases: all work items before a barrier phase need to complete before any work item after that barrier phase can execute. The barrier does not impose ordering constraints between work items in the same phase; they can execute concurrently.
The constraints imposed by a barrier.
The synchronization primitives above allow programs to impose ordering and concurrency constraints between work items that are not locally related in the call graph. Mutexes serialize critical sections, semaphores bound concurrency, condition variables coordinate progress, and latches and barriers establish cross-thread phase boundaries.
The sender model gives C++ a vocabulary for structured asynchronous work, but it does not by itself provide replacements for these non-local constraints. This paper therefore considers primitives that can express such constraints while remaining compatible with structured concurrency.
We use the following goals to evaluate candidate primitives:
| Proposed primitive | Alternative names | Corresponding legacy primitive |
|---|---|---|
| serial_gate | task_queue, serializer, strand, async_mutex | mutex |
| read_write_gate | rw_task_queue, rw_serializer, async_shared_mutex | shared_mutex |
| capacity_gate | n_serializer, bottleneck, work_limiter, concurrency_limiter, throttler, async_semaphore | counting_semaphore |
| readiness_gate | condition_signal, condition, state_signal, state_condition, async_condition_variable | condition_variable |
| completion_gate | countdown_signal, arrival_gate, join_counter, async_latch | latch |
| phase_gate | phase_signal, phase_join, rendezvous, phase_boundary, async_barrier | barrier |
We prefer using the word gate in these names, as it suggests admission into a region of work, and it avoids implying blocking threads or legacy synchronization mechanics. It also resonates with the idea of adding (concurrency) constraints.
serial_gate
Similar to a mutex, but for the structured concurrency world, a serial_gate ensures that only one work item is executed at a given time.
The serial_gate abstraction needs to maintain the (Mutex) constraint. In addition to that, we want to maintain as much as possible from the (Local) constraints. However, satisfying all of them is not ideal.
If we were to satisfy all of the local constraints, tail would be forced to wait until w is complete. Because there are no other instructions between head, the protected execution of w, and tail, this wait implies either blocking the thread or executing other work. Blocking is not acceptable for an asynchronous primitive. Executing unrelated work is also not ideal, as it can increase the latency of tail: if the current thread starts executing unrelated work, it may need to wait until that work completes before it can execute tail.

The best option is to split tail into two work items: what needs to be executed after head (Local 3), and what needs to be executed after w (Local 2). We call these work items tail_h and tail_w respectively.
With this split, we have a new set of local constraints:
- head < w (NewLocal 1)
- w < tail_w (NewLocal 2)
- head < tail_h (NewLocal 3)

The following image shows what the constraints look like for serial_gate:
The constraints imposed by a serial_gate.
Based on the ideas from [P3955R0], an API might look like:

struct serial_gate {
  std::execution::enter_scope_sender acquire();
};

Users will be able to use it like:
serial_gate g;
std::execution::counting_scope scope;
void usage() {
  head();
  auto s = std::execution::within(g.acquire(), w_sender())
         | std::execution::then([] { tail_w(); });
  std::execution::spawn(std::move(s), scope.get_token());
  tail_h();
}

This example assumes that the spawned sender does not complete with set_error; otherwise the error must be handled before calling spawn.
Inspired by the API of counting_scope, we can have the following API:

struct serial_gate {
  struct token; // models std::execution::scope_token
  token get_token() noexcept;
};

This can be used the following way:
serial_gate g;
std::execution::counting_scope scope;
void usage() {
  head();
  auto s = std::execution::associate(w_sender(), g.get_token())
         | std::execution::then([] { tail_w(); });
  std::execution::spawn(std::move(s), scope.get_token());
  tail_h();
}
void usage2() { // without tail_w
  head();
  std::execution::spawn(w_sender(), g.get_token());
  tail_h();
}

For the purposes of this paper, we are not interested in exploring the API design alternatives further. We simply assume that the first, acquire()-based design is the preferred one.
The following examples use the acquire() API from the previous section. For brevity, they assume that spawned senders either do not complete with set_error or handle errors before being passed to spawn, and that the enclosing program eventually joins the counting_scope.
A serial_gate can be used to protect state that is accessed by asynchronous operations. Unlike a mutex, entering the protected region suspends the operation until it can make progress; it does not block the current thread of execution.
struct cache {
  void insert(record);
};
serial_gate cache_gate;
cache c;
std::execution::counting_scope scope;
void on_record(record r) {
  auto update
    = std::execution::just(std::move(r))
    | std::execution::then([&](record r) {
        c.insert(std::move(r));
      });
  auto protected_update =
    std::execution::within(cache_gate.acquire(), std::move(update));
  std::execution::spawn(std::move(protected_update), scope.get_token());
}

In this example all calls to cache::insert are serialized by cache_gate, even if multiple records are received concurrently.
Some asynchronous resources allow many callers, but require only one operation to be active at a time. For example, a connection might require that requests are sent and responses are consumed serially, even though callers can enqueue work concurrently.
serial_gate connection_gate;
connection conn;
std::execution::counting_scope scope;
void submit_request(request req) {
  prepare(req); // does not need exclusive access to conn
  auto transaction
    = send_request(conn, std::move(req))
    | std::execution::let_value([&] {
        return read_response(conn);
      })
    | std::execution::then([](response resp) {
        process_response(std::move(resp));
      });
  auto protected_transaction =
    std::execution::within(connection_gate.acquire(), std::move(transaction));
  std::execution::spawn(std::move(protected_transaction), scope.get_token());
  continue_after_submit(); // does not wait for the transaction to finish
}

The serial_gate constrains only the transaction that uses conn. Work before submitting the transaction and work after submission are not forced to wait for the protected operation to complete.
For work that does not need a continuation after the protected region, the protected sender can be spawned directly into an enclosing scope:
serial_gate gate;
std::execution::counting_scope scope;
void usage() {
  head();
  auto protected_work =
    std::execution::within(gate.acquire(), w_sender());
  std::execution::spawn(std::move(protected_work), scope.get_token());
  tail_h();
}

This form is appropriate when the only required continuation after w_sender() is the release of the gate itself.
A serial_gate can be implemented so that it preserves both safety and progress compositionally, assuming that the code submitted to the gate also preserves safety and progress.

For safety, the implementation must maintain the following invariant: at most one work item admitted through the gate executes its protected work at any given time.
This is the sender-based equivalent of the (Mutex) constraint described earlier. If each protected work item preserves the invariants of the state it accesses when executed in isolation, then serializing those work items is sufficient to prevent concurrent interference between them. The gate does not need to know which invariants are being protected; it only needs to ensure that no two protected work items can observe or mutate that state concurrently through the gate.
The use of within(g.acquire(), snd) is important for this argument. The sender returned by acquire() admits the operation to the gate, and the corresponding exit sender releases the gate after snd completes. Thus, the release of the gate is tied to the completion of the protected sender, including non-value completion paths. Users are not required to manually signal the gate after the work is complete, so there is no separate release protocol that can be forgotten or executed in the wrong order.
For progress, a serial_gate should avoid blocking an execution resource while an operation is waiting to enter the gate. If the gate is already occupied, the waiting operation is suspended and resumed only when it can make progress. When the current protected operation completes, the implementation selects another waiting operation, if any, and admits it. Under the usual assumptions that admitted work eventually completes and that the scheduler used to resume waiting operations provides progress, the gate itself does not introduce a deadlock or lost wakeup.
In particular, a submitted operation that is waiting to enter the gate should eventually be admitted, provided that the operation is not cancelled or abandoned, the gate remains alive, previously admitted work eventually completes, and the execution agents needed by the implementation continue to make progress. This eventual-admission property is part of what makes the gate useful as a composable concurrency primitive: submitting work to the gate should not create an indefinite wait that is independent of the submitted work and the scheduler.
This is stronger than an asynchronous interface to a semaphore-like primitive with separate acquire and release operations. A user of serial_gate is not given an operation that manually releases the gate, so the user cannot forget to release it, release it on only one completion path, or make progress depend on an unrelated notification. Release is structurally connected to completion of the admitted sender.
This progress guarantee is conditional. A serial_gate cannot make an operation complete if the protected work never completes, or if the enclosing lifetime mechanism cancels or abandons the operation. Users can also still create progress failures by using the gate in a context that blocks an execution resource needed by the operation, for example by calling sync_wait on a scheduler thread that must also run the operation needed to release the gate. It also cannot guarantee fairness unless the specification requires a particular admission policy. These are not failures of the abstraction; they are the same limits that apply to sender-based asynchronous composition in general. The important property is that the gate itself does not expose a manual signaling protocol that lets users break progress independently of the submitted work.
Fairness matters here only for waiter-level progress, not for system-level progress. Even if admitted work keeps completing, a particular waiting operation may still be bypassed indefinitely in an open system where new operations continue to arrive. Therefore, whether a particular waiting operation is eventually admitted depends on the gate’s admission policy and any anti-starvation guarantees that policy provides.
We do not plan to require a strict fairness policy, so implementations can optimize admission. Under such a relaxed policy, starvation is possible, and per-waiter eventual admission is not guaranteed. This reflects the scope of the progress contract: the gate guarantees progress only under its stated assumptions, and a relaxed admission policy simply does not include per-waiter eventual admission among them.
Therefore, if the calling code maintains the lifetime of the gate and of the associated operations, and if the protected work items themselves maintain safety and make progress, a serial_gate can maintain the non-local serialization constraint without breaking safety or progress.
The destruction rules for all gate objects are discussed in Destruction. They are compatible with the safety and progress goals.
read_write_gate
Similar to a shared_mutex, but for structured concurrency, a read_write_gate allows multiple shared operations to execute concurrently, while exclusive operations execute alone.
The read_write_gate follows the same split-tail model described for serial_gate, so the (NewLocal) constraints apply to each admitted operation.
The non-local constraint is (SharedMutex): read operations may overlap with other read operations, while write operations are ordered with respect to all other operations admitted through the same gate.
The constraints imposed by a read_write_gate.
Following the serial_gate API, the primitive can expose separate entry senders for shared and exclusive admission:

struct read_write_gate {
  std::execution::enter_scope_sender acquire_read();
  std::execution::enter_scope_sender acquire_write();
};

The use of the gate is otherwise the same as for serial_gate:
read_write_gate g;
std::execution::counting_scope scope;
void read_usage() {
  head_read();
  auto s = std::execution::within(g.acquire_read(), read_sender())
         | std::execution::then([] { tail_read_w(); });
  std::execution::spawn(std::move(s), scope.get_token());
  tail_read_h();
}
void write_usage() {
  head_write();
  auto s = std::execution::within(g.acquire_write(), write_sender())
         | std::execution::then([] { tail_write_w(); });
  std::execution::spawn(std::move(s), scope.get_token());
  tail_write_h();
}

A cache often allows many concurrent lookups, but requires updates to be exclusive.
read_write_gate cache_gate;
cache c;
auto lookup(key k) {
  auto read
    = std::execution::just(std::move(k))
    | std::execution::then([&](key k) {
        return c.lookup(k);
      });
  return std::execution::within(cache_gate.acquire_read(), std::move(read))
       | std::execution::then([](record r) {
           use_record(std::move(r));
         });
}
auto insert(record r) {
  auto write
    = std::execution::just(std::move(r))
    | std::execution::then([&](record r) {
        c.insert(std::move(r));
      });
  return std::execution::within(cache_gate.acquire_write(), std::move(write));
}

Multiple calls to lookup can execute concurrently. Calls to insert are serialized with respect to all other cache operations admitted by the gate.
The same pattern applies when many operations need a stable view of shared configuration, but configuration reloads must be exclusive.
read_write_gate config_gate;
configuration config;
auto serve(request req) {
  auto read_config
    = std::execution::just(std::move(req))
    | std::execution::then([&](request req) {
        return handle_with_config(std::move(req), config);
      });
  return std::execution::within(config_gate.acquire_read(), std::move(read_config));
}
auto reload(configuration next) {
  auto update_config
    = std::execution::just(std::move(next))
    | std::execution::then([&](configuration next) {
        config = std::move(next);
      });
  return std::execution::within(config_gate.acquire_write(), std::move(update_config));
}

Requests may read the current configuration concurrently. A reload is admitted only when no request is using the configuration through the gate, and no new request is admitted in read mode while the reload is active.
The safety and progress argument is the same as for serial_gate. The implementation maintains a different admission invariant: either any number of read operations is admitted and no write operation is admitted, or exactly one write operation is admitted and no read operation is admitted. As long as the submitted work preserves safety and progress, and the calling code maintains the lifetime of the gate and associated operations, a read_write_gate can maintain the read-write constraint without exposing a manual signaling protocol that lets users break progress independently of the submitted work.
The same eventual-admission requirement applies: an operation submitted to the gate should eventually be admitted when its admission mode is compatible with the operations ahead of it, assuming the operation is not cancelled or abandoned, the gate remains alive, previously admitted work eventually completes, and the scheduler provides progress. If a fairness policy is not specified, the guarantee may need to be phrased in terms of the chosen admission policy; for example, a writer should not be indefinitely bypassed by later readers unless the specification explicitly permits that behavior.
capacity_gate
Similar to a
counting_semaphore
, but for structured concurrency, a
capacity_gate
allows at most K operations to execute their protected work
concurrently.
The
capacity_gate
follows the same split-tail model described for
serial_gate
, so the (NewLocal) constraints apply to each admitted
operation.
The non-local constraint is (Semaphore): among all operations admitted through the same gate, at most K protected work items may execute concurrently.
The constraints imposed by a capacity_gate with
capacity 2.
Following the
serial_gate
API, the primitive can expose one entry sender. The maximum concurrency
is a property of the gate object.
struct capacity_gate {
explicit capacity_gate(size_t max_concurrency);
std::execution::enter_scope_sender acquire();
};
The use of the gate is otherwise the same as for
serial_gate
:
capacity_gate g{2};
std::execution::counting_scope scope;
void usage() {
head();
auto s = std::execution::within(g.acquire(), w_sender())
| std::execution::then([] { tail_w(); });
std::execution::spawn(std::move(s), scope.get_token());
tail_h();
}
An application may need to limit the number of concurrent requests sent to an external service, while still allowing callers to construct sender expressions independently.
capacity_gate service_gate{8};
service client;
auto fetch(resource_id id) {
auto request
= std::execution::just(id)
| std::execution::let_value([&](resource_id id) {
return client.async_fetch(id);
});
return std::execution::within(service_gate.acquire(), std::move(request));
}
At most eight calls to
client.async_fetch
are active through
service_gate
at any time. The caller of
fetch
can still compose the returned sender with other work before deciding
how to start it.
A
capacity_gate
can also bound work that is expensive even if the scheduler has more
execution resources available.
capacity_gate compression_gate{4};
auto compress_file(path input, path output) {
auto work
= std::execution::just(std::move(input), std::move(output))
| std::execution::then([](path input, path output) {
compress(input, output);
});
return std::execution::within(compression_gate.acquire(), std::move(work));
}
Here the gate expresses a program-level concurrency limit, not a scheduling policy. The scheduler may have more than four worker threads, but no more than four compression operations admitted through the gate execute concurrently.
The safety and progress argument is the same as for
serial_gate
. The implementation maintains a different admission invariant: no more
than K operations are admitted at once. As long as the
submitted work preserves safety and progress, and the calling code
maintains the lifetime of the gate and associated operations, a
capacity_gate
can maintain the bounded-concurrency constraint without exposing a
manual signaling protocol that lets users break progress independently
of the submitted work.
The same eventual-admission requirement applies: an operation submitted to the gate should eventually be admitted when capacity becomes available, assuming the operation is not cancelled or abandoned, the gate remains alive, previously admitted work eventually completes, and the scheduler provides progress.
The same considerations on fairness and starvation apply here. To ensure progress under all circumstances, users of a capacity_gate must provide stronger guarantees themselves.
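One way to avoid starvation is first-in-first-out admission. As a hypothetical illustration (the names and synchronous form below are ours, not the proposed sender API), FIFO admission for a capacity gate can be sketched as:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Sketch of FIFO admission for a capacity gate with capacity K.
// Waiters are queued in submission order, so an earlier waiter cannot be
// bypassed indefinitely by later submissions.
struct fifo_capacity_gate {
  std::size_t capacity;
  std::size_t admitted = 0;
  std::deque<int> waiters;  // queued operation ids, in submission order

  explicit fifo_capacity_gate(std::size_t k) : capacity(k) {}

  // Returns true if admitted immediately; otherwise the operation waits.
  bool submit(int id) {
    if (admitted < capacity && waiters.empty()) {
      ++admitted;
      return true;
    }
    waiters.push_back(id);
    return false;
  }

  // Completing one admitted operation admits the oldest waiter, if any.
  // Returns the id of the newly admitted waiter, or -1 if none waited.
  int complete_one() {
    --admitted;
    if (waiters.empty()) return -1;
    int next = waiters.front();
    waiters.pop_front();
    ++admitted;
    return next;
  }
};
```

The invariant `admitted <= capacity` holds throughout, and eventual admission follows from the queue order as long as admitted work eventually completes.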
Similar to the classic
try_lock
facility, we can bring speculative execution to gates. Sometimes an
operation is useful only if it can start immediately. If the non-local
constraint would force the operation to wait, the program may prefer to
skip that operation and continue with other work.
For sender-based gates, this can be expressed by adding
try_
versions of the corresponding acquire operations. A
try_
acquire operation attempts to enter the gate without enqueueing the
operation as a waiter. If the operation can be admitted immediately, it
completes successfully and produces the exit sender used by
within
. If the operation cannot be admitted immediately, it completes with
set_stopped()
.
For example, a
serial_gate
might provide:
struct serial_gate {
std::execution::enter_scope_sender acquire();
std::execution::enter_scope_sender try_acquire();
};
and a
read_write_gate
might provide:
struct read_write_gate {
std::execution::enter_scope_sender acquire_read();
std::execution::enter_scope_sender acquire_write();
std::execution::enter_scope_sender try_acquire_read();
std::execution::enter_scope_sender try_acquire_write();
};
This allows callers to express speculative work directly:
serial_gate cache_update_gate;
auto opportunistic_refresh() {
return std::execution::within(
cache_update_gate.try_acquire(),
refresh_cache_snapshot());
}
If the gate is free,
refresh_cache_snapshot()
executes under the gate. If the gate is occupied, the returned sender
completes with
set_stopped()
and no refresh is performed.
The same idea applies to
capacity_gate
:
capacity_gate upload_gate{8};
auto maybe_upload(chunk c) {
return std::execution::within(
upload_gate.try_acquire(),
upload_chunk(std::move(c)));
}
Here an upload is started only if capacity is available immediately. If all capacity is already in use, the operation is stopped rather than queued.
The important property is that speculative execution does not weaken
the gate’s safety invariant. A successful
try_
acquire admits the operation exactly like the corresponding
non-speculative acquire. An unsuccessful
try_
acquire does not admit the operation and therefore does not execute the
protected work. Thus, the only additional behavior is the possibility of
set_stopped()
before the protected work starts.
Speculative acquisition is not a replacement for the
eventual-admission guarantees discussed above. A normal acquire
operation submits work to the gate and expects eventual admission or
eventual resolution according to the gate’s semantics. A
try_
acquire operation is explicitly different: it asks whether the work can
be admitted now, and otherwise declines to wait.
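As a hypothetical illustration of this distinction, the speculative admission step for a capacity gate can be sketched as a lock-free counter decrement; the names and synchronous form are ours, not the proposed API:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch of speculative admission: a try-acquire either takes one unit of
// capacity immediately or reports failure without enqueueing as a waiter.
struct speculative_capacity {
  std::atomic<std::size_t> available;

  explicit speculative_capacity(std::size_t k) : available(k) {}

  // Mirrors try_acquire(): success admits exactly like a normal acquire;
  // failure corresponds to completing with set_stopped().
  bool try_admit() {
    std::size_t n = available.load();
    while (n != 0) {
      if (available.compare_exchange_weak(n, n - 1)) return true;
    }
    return false;  // would complete with set_stopped(); never waits
  }

  void release() { ++available; }
};
```

Note that the failure path leaves the gate's state untouched, which is why an unsuccessful try-acquire cannot weaken the safety invariant.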
readiness_gate
Similar to a
condition_variable
, but for structured concurrency, a
readiness_gate
delays work until some readiness condition has been established.
The
readiness_gate
follows the same split-tail model described for
serial_gate
, so the (NewLocal) constraints apply to the operations
involved.
The non-local constraint is (CondVar): assuming the readiness condition is initially false, work that depends on the condition can start only after some operation has made the condition true.
Unlike a condition variable, the gate should not expose a protocol in which one operation waits and an unrelated operation must remember to notify it. The readiness transition is represented as sender work, and waiting operations are resumed when the gate observes that transition.
The constraints imposed by a readiness_gate.
One possible API separates the operation that waits for readiness from the operation that establishes readiness:
struct readiness_gate {
std::execution::enter_scope_sender wait();
std::execution::sender auto set_ready();
std::execution::sender auto close();
};
The
wait()
sender completes when the gate is ready and admits the dependent work.
The
set_ready()
sender establishes readiness and resumes operations waiting on the gate.
The
close()
sender completes waiting operations with
set_stopped()
if readiness has not been established.
readiness_gate g;
std::execution::counting_scope scope;
void wait_usage() {
head_wait();
auto s = std::execution::within(g.wait(), when_ready_sender())
| std::execution::then([] { tail_wait_w(); });
std::execution::spawn(std::move(s), scope.get_token());
tail_wait_h();
}
void update_usage() {
head_update();
auto s = g.set_ready()
| std::execution::then([] { tail_update_w(); });
std::execution::spawn(std::move(s), scope.get_token());
tail_update_h();
}
The API above treats readiness as an explicit transition: some
operation decides that the condition is true and calls
set_ready()
. This is simpler than a
condition_variable
, but it does not model the traditional condition-variable pattern in
which the waiting operation re-checks a predicate protected by the same
synchronization mechanism. In that pattern, notification is only a hint;
the predicate is the source of truth.
An alternative design is to make the predicate part of the gate:
template <class Predicate>
struct readiness_gate {
explicit readiness_gate(Predicate pred);
std::execution::enter_scope_sender wait();
std::execution::sender auto update();
std::execution::sender auto close();
};
In this design,
wait()
admits dependent work only when the predicate evaluates to
true
. The
update()
operation is used after code has modified the state observed by the
predicate; it causes the gate to re-check the predicate and admit
waiters if the condition is now satisfied. If the gate is closed before
the predicate becomes true, waiters complete with
set_stopped()
.
For example:
configuration config;
readiness_gate config_loaded{[&] {
return config.has_value();
}};
auto load_configuration(path p) {
return async_read_config(std::move(p))
| std::execution::then([&](configuration loaded) {
config = std::move(loaded);
})
| std::execution::let_value([&] {
return config_loaded.update();
});
}
auto handle_request(request req) {
auto use_config
= std::execution::just(std::move(req))
| std::execution::then([&](request req) {
return handle_with_config(std::move(req), config);
});
return std::execution::within(config_loaded.wait(), std::move(use_config));
}
This alternative is closer to
condition_variable
: the predicate, not the update operation itself, determines whether
dependent work may proceed. It avoids admitting work after an incorrect
set_ready()
call, and it can naturally handle cases in which multiple updates are
needed before readiness is established.
The cost is that the gate now has to own or reference the predicate
and define where and how the predicate is evaluated. If the predicate
reads shared state, the design must also specify how that state is
protected from concurrent access. This may require combining
readiness_gate
with another gate, or making the readiness gate itself responsible for
the state being checked. That makes the abstraction heavier than the
explicit-transition API.
The explicit
set_ready()
design is appropriate when readiness is a one-shot fact established by a
specific operation, such as successful initialization. The
predicate-owned design is appropriate when readiness is derived from
shared state and the gate must prevent users from separating the
readiness signal from the condition that justifies it.
One common use is to delay work until asynchronous initialization has completed.
readiness_gate initialized;
service svc;
auto start_service(configuration cfg) {
auto start
= std::execution::just(std::move(cfg))
| std::execution::let_value([&](configuration cfg) {
return svc.async_start(std::move(cfg));
})
| std::execution::let_value([&] {
return initialized.set_ready();
});
return start;
}
auto query_service(query q) {
auto query
= std::execution::just(std::move(q))
| std::execution::let_value([&](query q) {
return svc.async_query(std::move(q));
});
return std::execution::within(initialized.wait(), std::move(query));
}
Calls to
query_service
may be created before the service has started, but their queries are not
admitted until
start_service
establishes readiness.
A component may also need to delay requests until the first configuration value has been loaded.
readiness_gate config_loaded;
configuration config;
auto load_configuration(path p) {
auto load
= async_read_config(std::move(p))
| std::execution::then([&](configuration loaded) {
config = std::move(loaded);
})
| std::execution::let_value([&] {
return config_loaded.set_ready();
});
return load;
}
auto handle_request(request req) {
auto use_config
= std::execution::just(std::move(req))
| std::execution::then([&](request req) {
return handle_with_config(std::move(req), config);
});
return std::execution::within(config_loaded.wait(), std::move(use_config));
}
The gate expresses that request handling depends on the initial configuration becoming available. Once readiness is established, subsequent requests can proceed without a blocking wait.
The safety and progress argument follows the same ideas as for
serial_gate
. The implementation maintains a different admission invariant:
dependent work is not admitted until readiness has been established. As
long as the work that establishes readiness and the work that depends on
readiness preserve safety and progress, and the calling code maintains
the lifetime of the gate and associated operations, a
readiness_gate
can maintain the readiness constraint without exposing a
condition-variable-style manual notification protocol.
The eventual-admission requirement becomes an eventual-resolution
requirement for
readiness_gate
: an operation submitted to the gate should eventually either be
admitted after readiness is established, or complete with
set_stopped()
if the gate is closed before readiness is established. This assumes that
the operation is not otherwise cancelled or abandoned, the gate remains
alive until it is made ready or closed, and the scheduler provides
progress. Before readiness is established or the gate is closed, waiting
is not a progress failure; it is the constraint expressed by the
gate.
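As a hypothetical illustration, this eventual-resolution contract can be sketched as a small state machine; the names and synchronous form are ours, not the proposed API:

```cpp
#include <cassert>

// Sketch of readiness_gate resolution. Waiters are admissible once
// readiness is established, and are stopped if the gate is closed first;
// before either event, waiting is the constraint the gate expresses.
struct readiness_state {
  bool ready = false;
  bool closed = false;

  void set_ready() { if (!closed) ready = true; }  // readiness is one-shot
  void close()     { closed = true; }

  bool admits_waiters() const { return ready; }
  bool stops_waiters()  const { return closed && !ready; }
};
```

Every waiter is eventually resolved one way or the other, provided the gate is eventually made ready or closed.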
The presence of
close()
also has implications for the destruction of
readiness_gate
objects; see Destruction for more details on
the requirements for destruction.
completion_gate
Similar to a
latch
, but for structured concurrency, a
completion_gate
delays dependent work until a fixed number of submitted operations have
completed.
The
completion_gate
follows the same split-tail model described for
serial_gate
, so the (NewLocal) constraints apply to the operations
involved.
The non-local constraint is (Latch): dependent work admitted
through the gate can start only after all operations that contribute to
opening the gate have completed. As with
latch
, the contributing operations are not ordered with respect to each other
by the gate.
The constraints imposed by a completion_gate.
One possible API separates arrivals from waiting for completion:
struct completion_gate {
explicit completion_gate(size_t expected);
std::execution::sender auto arrive();
std::execution::enter_scope_sender wait();
std::execution::sender auto close();
};
The
arrive()
sender records one completion. The
wait()
sender completes when the expected number of arrivals has been recorded
and admits the dependent work. The
close()
sender completes waiting operations with
set_stopped()
if the expected number of arrivals has not been reached. In typical use,
arrive()
is structurally attached to the completion of the contributing sender,
as in the examples below.
completion_gate g{2};
std::execution::counting_scope scope;
void worker1() {
head1();
auto s
= w1_sender()
| std::execution::let_value([&] {
return g.arrive();
})
| std::execution::then([] { tail1_w(); });
std::execution::spawn(std::move(s), scope.get_token());
tail1_h();
}
void coordinator() {
head_c();
auto s = std::execution::within(g.wait(), after_completion_sender())
| std::execution::then([] { tail_c_w(); });
std::execution::spawn(std::move(s), scope.get_token());
tail_c_h();
}
Several independent startup tasks may need to complete before the rest of a service can begin accepting work.
completion_gate startup_done{3};
auto load_index() {
return async_load_index()
| std::execution::let_value([&] {
return startup_done.arrive();
});
}
auto connect_database() {
return async_connect_database()
| std::execution::let_value([&] {
return startup_done.arrive();
});
}
auto warm_cache() {
return async_warm_cache()
| std::execution::let_value([&] {
return startup_done.arrive();
});
}
auto accept_requests() {
return std::execution::within(startup_done.wait(), start_accepting_requests());
}
The three startup tasks can run concurrently.
accept_requests
is admitted only after all three arrivals have been recorded.
A
completion_gate
can also express that a continuation depends on a fixed batch of
operations, without imposing order between the operations in the
batch.
completion_gate batch_done{files.size()};
auto process_file(path p) {
return async_process_file(std::move(p))
| std::execution::let_value([&] {
return batch_done.arrive();
});
}
auto write_summary() {
return std::execution::within(batch_done.wait(), async_write_summary());
}
Each file-processing operation contributes one arrival. The summary is not written until all file-processing operations in the batch have completed.
The safety and progress argument follows the same ideas as for
serial_gate
. The implementation maintains a different admission invariant:
dependent work is not admitted until the required number of arrivals has
been recorded. As long as the arriving work and the dependent work
preserve safety and progress, and the calling code maintains the
lifetime of the gate and associated operations, a
completion_gate
can maintain the completion constraint without exposing a manual
wait/notify protocol.
The eventual-admission requirement becomes an eventual-resolution
requirement for
completion_gate
: an operation submitted to the gate should eventually either be
admitted after the expected number of arrivals has been recorded, or
complete with
set_stopped()
if the gate is closed before that happens. This assumes that the
operation is not otherwise cancelled or abandoned, the gate remains
alive until it is opened or closed, and the scheduler provides progress.
Before the expected number of arrivals has been recorded or the gate is
closed, waiting is not a progress failure; it is the constraint
expressed by the gate.
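As a hypothetical illustration, the completion bookkeeping can be sketched as a countdown; the names and synchronous form are ours, not the proposed API:

```cpp
#include <cassert>
#include <cstddef>

// Sketch of completion_gate bookkeeping: arrivals count down from the
// expected number, and dependent work becomes admissible only when the
// count reaches zero, or waiters are stopped when the gate is closed.
struct completion_state {
  std::size_t remaining;
  bool closed = false;

  explicit completion_state(std::size_t expected) : remaining(expected) {}

  void arrive() { if (remaining != 0) --remaining; }
  void close()  { closed = true; }

  bool admits_waiters() const { return remaining == 0 && !closed; }
  bool stops_waiters()  const { return closed && remaining != 0; }
};
```

As with latch, arrivals are unordered with respect to each other; only the transition to zero matters for admission.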
The presence of
close()
also has implications for the destruction of
completion_gate
objects; see Destruction for more details on
the requirements for destruction.
phase_gate
Similar to a
barrier
, but for structured concurrency, a
phase_gate
coordinates a fixed set of participants through repeated phase
boundaries.
The
phase_gate
follows the same split-tail ideas as the previous gates, but the
important relation is between phases. The non-local constraint is
(Barrier): all work in one phase must complete before any work
in the next phase can execute. Work within the same phase is not ordered
by the gate and may execute concurrently.
The constraints imposed by a phase_gate.
The central operation for a phase gate is an asynchronous
arrive_and_wait()
operation. It records the participant’s arrival at the current phase and
completes only when all participants for that phase have arrived.
struct phase_gate {
explicit phase_gate(size_t expected);
std::execution::enter_scope_sender arrive_and_wait();
std::execution::sender auto close();
};
The
arrive_and_wait()
operation admits the continuation after the phase boundary. The
close()
sender completes waiting operations with
set_stopped()
if the phase cannot complete.
phase_gate g{2};
std::execution::counting_scope scope;
void participant() {
auto s
= phase1_sender()
| std::execution::let_value([&] {
return std::execution::within(g.arrive_and_wait(), phase2_sender());
});
std::execution::spawn(std::move(s), scope.get_token());
}
An iterative algorithm may have multiple participants that compute a step independently, then exchange or observe the results only after all participants have completed the step.
phase_gate iteration_done{workers.size()};
auto worker(worker_state& state) {
return compute_step(state)
| std::execution::let_value([&] {
return std::execution::within(
iteration_done.arrive_and_wait(),
exchange_boundaries(state));
})
| std::execution::let_value([&] {
return std::execution::within(
iteration_done.arrive_and_wait(),
compute_next_step(state));
});
}
The first
arrive_and_wait()
ensures that no participant exchanges boundaries before all participants
have completed
compute_step
. The second one ensures that no participant starts the next step before
all participants have completed the boundary exchange.
A group of participants may also need to move through pipeline stages together, while allowing concurrency inside each stage.
phase_gate stage_done{participants};
auto participant(input_chunk input) {
return parse_to_local_storage(std::move(input))
| std::execution::let_value([&] {
return std::execution::within(
stage_done.arrive_and_wait(),
validate_local_storage());
})
| std::execution::let_value([&] {
return std::execution::within(
stage_done.arrive_and_wait(),
publish_local_storage());
});
}
All participants complete parsing before any participant validates, and all participants complete validation before any participant publishes.
The safety and progress argument follows the same ideas as for
serial_gate
. The implementation maintains a different admission invariant:
continuations after a phase boundary are not admitted until all expected
participants have arrived at that boundary. As long as each
participant’s phase work preserves safety and progress, and the calling
code maintains the lifetime of the gate and associated operations, a
phase_gate
can maintain the phase-ordering constraint without exposing a blocking
barrier wait.
The eventual-admission requirement becomes an eventual-resolution
requirement for
phase_gate
: an operation submitted to the gate should eventually either be
admitted after all expected participants have arrived at the current
phase, or complete with
set_stopped()
if the gate is closed before that happens. This assumes that the
operation is not otherwise cancelled or abandoned, the gate remains
alive until the phase is completed or closed, and the scheduler provides
progress. Before all expected participants arrive or the gate is closed,
waiting is not a progress failure; it is the constraint expressed by the
gate.
The presence of
close()
also has implications for the destruction of
phase_gate
objects; see Destruction for more details on
the requirements for destruction.
All gate objects have lifetime requirements. A gate owns the state needed to remember waiting operations, admission order, and the operations currently admitted through the gate. Destroying that state while operations are still associated with it would leave those operations without a well-defined synchronization object.
For this reason, destroying a gate while it has outstanding
associated work is a program error. The implementation should call
std::terminate()
in this case. Outstanding work includes operations that have been
submitted to the gate but not yet admitted, operations currently
admitted whose exit sender has not completed, and operations waiting for
the gate to resolve a dependency.
This rule is intentionally strict. A gate destructor should not try to block until outstanding work completes, because blocking may consume an execution resource needed by that work to make progress. It also should not silently abandon operations, because that would make progress depend on object lifetime in a way that is difficult to reason about. Programs that need to shut down a gate must first arrange for all associated operations to complete, be cancelled, or otherwise be resolved by the gate’s explicit API, and only then destroy the gate.
For gates with an explicit closing operation, such as
readiness_gate::close()
, closing the gate is separate from destroying it. Closing resolves
operations according to the gate’s semantics; destruction still requires
that no outstanding associated work remains.
The gates above are derived from existing synchronization primitives. Once admission is expressed as sender composition, other useful non-local constraints become possible.
keyed_serial_gate
serializes operations that have the same key, while allowing operations with different keys to proceed concurrently.
budget_gate
admits operations while a resource budget is available; unlike
capacity_gate
, different operations may consume different amounts.
priority_gate
admits pending operations according to priority.
deadline_gate
admits, orders, or stops pending operations according to deadlines.
rate_gate
admits operations according to a time-based rate rather than a concurrency bound.
throttle_gate
spaces out admitted operations, for example by ensuring a minimum interval between starts.
backpressure_gate
admits upstream work only when downstream capacity is available.
coalescing_gate
combines compatible pending operations before admission.
latest_only_gate
retains only the most recent pending operation and stops or discards older pending operations that have been superseded.
The APIs above are intentionally phrased in terms of the scope vocabulary from [P3955R0]. In that model, entering an asynchronous scope is itself a sender operation, and leaving the scope is represented by an exit sender that is structurally connected to the completion of the work executed inside the scope.
That is exactly the shape needed by gates. A gate’s
acquire()
operation is an enter-scope sender: it waits until the gate can admit
the operation, and then produces the exit sender that releases the gate.
The expression
std::execution::within(g.acquire(), snd)
therefore means: enter the gate, run
snd
, and release the gate after
snd
completes. This is what prevents the gate from degenerating into a
manual acquire/release protocol. The release operation is not a separate
user action; it is part of the structure of the sender expression.
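As a hypothetical synchronous analogue of this shape (our names, not the proposed API), entering produces the exit action and the composition runs it after the work, so release is structural rather than a separate user step:

```cpp
#include <cassert>
#include <functional>

// Toy gate whose enter() returns the matching exit action.
struct toy_gate {
  bool held = false;
  std::function<void()> enter() {
    held = true;
    return [this] { held = false; };
  }
};

// Synchronous analogue of within(g.acquire(), snd): enter, run the work,
// then run the exit action. (A real implementation would also release on
// the error and stopped completion paths.)
template <class Work>
void within_sync(toy_gate& g, Work work) {
  auto exit = g.enter();
  work();
  exit();  // released by structure, not by a manual user call
}
```

The user never sees the exit action; forgetting to release is impossible by construction.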
The same observation applies directly to gates that admit work into a
protected region, such as
read_write_gate::acquire_read()
,
read_write_gate::acquire_write()
, and
capacity_gate::acquire()
. For dependency and rendezvous gates, such as
completion_gate::wait()
and
phase_gate::arrive_and_wait()
, the enter-scope shape is still useful, but the exit sender may be
trivial: the important operation is delaying admission of the dependent
work until the gate’s condition is satisfied.
Thus, this paper can be seen as exploring concrete asynchronous scopes whose purpose is not object lifetime, but non-local concurrency control.
counting_scope
The
counting_scope
facility from [P3149R11] solves a different but
related problem. A
counting_scope
tracks the lifetime of asynchronous work that has been associated with
the scope. It gives a program a way to spawn non-sequential work and
later close, stop, and join that work.
This is already a primitive for non-local concurrency. Work
associated with a
counting_scope
is not necessarily lexically nested in the caller that started it, but
the scope still imposes non-local lifetime constraints: the scope cannot be joined until all associated work has completed, and once the scope is closed no new work can be associated with it.
The gates proposed in this paper are complementary. A gate controls
when work may execute; a
counting_scope
controls how long spawned work remains associated with an
enclosing lifetime. In the examples above, these roles are often
combined:
auto protected_work =
std::execution::within(g.acquire(), work());
std::execution::spawn(std::move(protected_work), scope.get_token());
Here the gate imposes the serialization, read-write, capacity,
readiness, completion, or phase constraint. The
counting_scope
tracks the lifetime of the spawned operation. Neither facility subsumes
the other.
There is also a design connection through
scope_token
. One possible API for gates is to expose a token that models the same
token concept used by
counting_scope
; the token’s
wrap
operation would apply the gate constraint to the associated sender. This
paper does not rely on that spelling, but the relationship suggests that
gates and
counting_scope
may share customization and specification machinery.
The following facilities are also adjacent to this design space:
when_all
,
let_value
,
starts_on
, and
continues_on
express local sender composition. Gates add shared state that allows
different sender expressions to participate in the same non-local
constraint.
This paper is exploratory, and the design of the facilities above still needs work; it does not yet ask SG1 to standardize any particular gate. Instead, it asks whether this is a useful direction for sender-based concurrency, and where further work should be focused.
In particular, we would like SG1 feedback on the following questions:
Is the relationship between gates and
counting_scope
from [P3149R11] clear and useful? Should further effort focus on
serial_gate
,
read_write_gate
,
capacity_gate
,
readiness_gate
,
completion_gate
, or
phase_gate?
The main question is whether SG1 agrees that C++ needs sender-native primitives for these non-local constraints. If so, the next question is where the author should spend effort: formalizing one or two concrete gates, refining the common gate model, exploring integration with scope tokens, or developing more examples and implementation experience.