Document number: P2643R1
Date: 2023-05-18
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers
Audience: Concurrency
Improving C++ concurrency features
Revisions
r0 - (Kona Drafts)
- Added discussion of pros/cons of returning pair<T. bool> vs. optional<T>.
- Added discussion of timed waits and freestanding, given that <chrono> is not part of freestanding.
- Added wording for barrier::try_wait_forandbarrier::try_wait_until.
- Added motivating examples.
- Fixed barrier::try_wait_for/_untilsignatures; they were incorrectly accepting anarrival_token&&, but since these can be called in a loop consuming thearrival_tokenis incorrect.
- Removed discussion of ‘hinted’ wait mechanism. The design surface area of this proposal is such that it should be a separate paper, brought forward an interested party.
- Removed fallible untimed try_wait.
r1 - (Varna Draft)
- Removed timed waiting for freestanding, to reflect Kona guidance, and added discussion.
- Removed pros/cons discussion of returning optional<T>vspair<bool, T>vsT, reflecting Kona guidance.
- Re-added fallible untimed try_waitwith rationale, reflecting Kona guidance.
Introduction
P1135R6 introduced serval new concurrency primitives to the C++20 concurrency library:
- <atomic>: added the class- atomic_flag, the- waitand- notify_one/_allto class template- atomic<>, and free function versions of these.
- <semaphore>: added class template- counting_semaphore<>and class- binary_semaphore.
- <barrier>,- <latch>: added class template- barrier<>and class- latch.
Though each element included was long coming, and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made:
- Return last observed value from atomic::wait; this value is lost otherwise.
- Add timed versions of atomic::waitand other concurrency primitves likebarrierandlatch, to make it easier to implement concurrency primitives on top ofatomicthat expose timed waiting facilities themselves, e.g., like<semaphore>.
- Avoid spurious polling in atomic::waitby accepting a predicate.
This proposal proposes extensions to address these shortcomings. This branch demonstrates its implementability in libstdc++.
The design of the features above is mostly orthogonal, and this section explores them independently.
Return last observed value on success
Desing: ::wait APIs change the return type signature from returnvoidT wait(...);
Example 0:
| Before | After | 
| atomic<int> a(42);
 a.wait(42);
 auto o = a.load();
 assert(o != 42); // MAY FAIL! | atomic<int> a(42);
 auto o = a.wait(42);
 
 assert(o != 42); // OK! | 
The atomic<T>::wait method guarantees that the thread is unblocked only if the value changed.
Before this paper, the new atomic<T> value is not returned to the caller. This has the following two shortcomings:
- That value is lost forever: after the thread is unblocked, the value might change before the unblocked thread calls atomic<T>::load(ABA Problem).
- Applications need the new value often: which forces them to call atomic<T>::load, even thoughatomic::wait<T>most likely already loaded the value, to test that it did change.
After this paper, the value returned by wait is returned to the caller, eliminating the need for the subsequent load.
This is an ABI breaking change that could be resolved by picking a different to-be-determined name for these APIs.
Fallible and timed versions of wait APIs
The design of the fallible timed versions of wait APIs adds three new APIs. For atomic these are
template <class T>
optional<T> atomic<T>::try_wait(T, memory_order);    
template <class T, class Rep, class Period>
optional<T> atomic<T>::try_wait_for(T, duration<Rep, Period> const&, memory_order);
    
template <class T, class Clock, class Duration>
optional<T> atomic<T>::try_wait_until(T, time_point<Clock, Duration> const&, memory_order);
They are non-blocking, i.e., they eventually return to the caller in a finite-set of steps, even if the value did not change. This enables the application to “do something else” before attempting to wait again.
On failure, i.e., if the value did not change, they return nullopt and the operation has no effects (it does not synchronize). On success, they return an optional<T> containing the last observed value, which is guaranteed to be different from the one the call site waited on.
The untimed try_wait overload attempts to wait for a finite unspecified duration. The implementation may pick a different duration every time, enabling it decide for how long the operation should attempt to wait based on dynamic information, like the load of the system.
Since <chrono> and <optional> are not freestanding, these APIs will not be available in freestanding implementations. We could attempt to improve this by:
- Supporting the subset of optionalAPIs that do not throw exceptions in freestanding.
- Supporting the <chrono>durations in freestanding.
Example 1: the atomic variable t tracks how many tasks need to still be processed, and the application prints every second - if it is still waiting - how many tasks remain:
| Before | After | 
| atomic<int> t;
 int old = 1;
 auto b = clock::now();
 while (old != 0) {
  old = t.load();
  if (auto e = clock::now(); (e - b) > 1s) {
  cout << old;
     b = e;
  }
 } | atomic<int> t;
 auto old = 1;
 
 while (old != 0) {
  auto o = t.try_wait_for(old, 1s);
   old = o.has_value()? o.value() : old;
  cout << old;
 }
 
 
 | 
Before this proposal, applications need to re-implement atomic<T>::wait logic, and like in this example, may accidentally call atomic<T>::load in a loop without any back-off.
After this proposal, try_wait_for expresses the application’s intent to wait for a certain duration of time, enabling the implementation to wait efficiently.
For barrier and latch they look like:
template <class CF>
bool barrier<CF>::try_wait(arrival_token& tok) const;
template <class CF, class Rep, class Period>
bool barrier<CF>::try_wait_for(arrival_token& tok, duration<Rep, Period> const& rel_time) const;
template <class CF, class Clock, class Duration>
bool barrier<CF>::try_wait_until(arrival_token& tok, time_point<Clock, Duration> const& abs_time) const;
template <class Rep, class Period>
bool latch::try_wait_for(duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool latch::try_wait_until(time_point<Clock, Duration> const& abs_time) const;
Example 2: using a barrier which tracks the completion of task_count tasks:
std::barrier b(task_count);
auto n = process_thread_local_tasks(); 
auto t = b.arrive(n);
    
while (!b.try_wait_for(t, 1ms)) {
   auto s = steal_and_process_tasks();
   b.arrive(s);
}
Predicate versions
There is no wording for the predicate versions proposed in this revision of this paper.
When the program is waiting for a condition different from “not equal to”, there is an added re-try loop around the wait operation in the program. This loop causes each call to wait to be performed as if it were the first call to wait, oblivious to the fact that the program has already been waiting for some time. This leads to re-executing the short-term polling strategy.
Taking a predicate instead of a value allows us to push the program-defined condition inside of atomic::wait, delete the outer loop, and allows the implementation to track time spent.
At least two implementations currently implement atomic::wait in terms of a wait taking a predicate.
The authors see two APIs that are both equvialent in terms of implementation complexity and usability, but have an opposite API depending on how the semantics of the predicate are defined.
Both options have this signature T atomic<T>::wait_with_predicate(Predicate&& p) and the predicate takes a T and returns a bool, but the predicate returns:
- Option 1: trueif we synchronizes
- Understood as the wait succeds
 
- Option 2: falseif we synchronizes
- Understood as “the value changed” (value compared not-equal - like atomic::wait(T)waits untilTchanges)
 
Feedback needed: We should pick an option before working on wording.
Example 3: same example as Example 1 using Option 1 above (true if we synchronize):
| Before | After | 
| atomic<int> t;
 int old = 1;
 auto b = clock::now();
 while (old != 0) {
  old = t.load();
  auto e = clock::now();
  if ((e - b) > 1s) {
  cout << old;
     b = e;
  }
 } | atomic<int> t;
 int old = t.load();
 auto p = [&](int v) {
  old = v;
  return v == 0;
 };
 
 while(!t.try_wait_with_predicate_for(p, 1s).has_value()) {
  cout << old;
 }
 
 | 
In Example 1, the “After this paper” implementation has the issue that try_wait_for(old, 1s) returns as soon as old changes. That is, while the goal of the application is to wait for the value becoming zero, and it could wait up to 1s for that, if after 0.1s the value changes from 10 to 9 this API will return.
The “After this paper” implementation in Example 2 fixes that using the fallibe timed _with_predicate version. The predicate is called every time the value changes, but it doesn’t “succeed” until the value is 0.
Wording
Return last observed value from atomic::wait
To [atomics.ref.generic.general]:
namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    voidT wait(T, memory_order = memory_order::seq_cst) const noexcept;
  };
}
UNRESOLVED QUESTION: all atomic_ref types are missing volatile wait overloads. Should that be fixed as part of this work?
To [atomics.ref.ops]:
voidT wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: orderis neithermemory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares its value representation for equality against that ofold.
- If they compare unequal, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
 
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object *ptr.
To [atomics.ref.int]:
namespace std {
  template<> struct atomic_ref<integral> {
    voidT wait(integral, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.ref.float]:
namespace std {
  template<> struct atomic_ref<floating-point> {
    voidT wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.ref.pointer]:
namespace std {
  template<class T> struct atomic_ref<T*> {
    voidT* wait(T*, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.types.generic.general]:
namespace std {
  template<class T> struct atomic {
    voidT wait(T, memory_order = memory_order::seq_cst) const volatile noexcept;
    voidT wait(T, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.types.operations]:
voidT wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;
voidT wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: order is neither memory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares its value representation for equality against that ofold.
- If they compare unequal, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
 
- Remarks: This function is an atomic waiting operation (atomics.wait).
To [atomics.types.int]:
namespace std {
  template<> struct atomic<integral> {
    voidT wait(integral, memory_order = memory_order::seq_cst) const volatile noexcept;
    voidT wait(integral, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.types.float]:
namespace std {
  template<> struct atomic<floating-point> {
    voidT wait(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;
    voidT wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [atomics.types.pointer]:
namespace std {
  template<class T> struct atomic<T*> {
    voidT* wait(T*, memory_order = memory_order::seq_cst) const volatile noexcept;
    voidT* wait(T*, memory_order = memory_order::seq_cst) const noexcept;
  };
}
To [util.smartptr.atomic.shared]:
namespace std {
  template<class T> struct atomic<shared_ptr<T>> {
    voidshared_ptr<T> wait(shared_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
  };
}
and
voidshared_ptr<T> wait(shared_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: orderis neithermemory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares it toold.
- If the two are not equivalent, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
 
- Remarks: Two shared_ptrobjects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).
To [util.smartptr.atomic.weak]:
namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    voidweak_ptr<T> wait(weak_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
  };
}
voidweak_ptr<T> wait(weak_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
- Preconditions: order is neither memory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares it toold.
- If the two are not equivalent, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously.
 
- Remarks: Two weak_ptrobjects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).
No changes to [atomics.nonmembers] are needed.
No changes to [atomic.flag]'s wait APIs are needed.
Fallible and timed-versions of ::wait APIs
To [atomics.ref.generic.general]:
namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T> try_wait_for(
      T, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
To [atomics.ref.ops]:
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
    chrono::duration<Rep, Period> const& rel_time,
    memory_order order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
    chrono::time_point<Clock, Duration> const& abs_time,
    memory_order order = memory_order::seq_cst
) const noexcept;
- Preconditions: orderis neithermemory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares its value representation for equality against that ofold.
- If they compare unequal, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously or the timeout expired. If it is unblocked by the timeout there is no effect and it returns nullopt.
 
 The timeout expires (thread.req.timing) when the current time is afterabs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).
 
 An implementation should ensure thattry_wait_forandtry_wait_untildo not consistently returnnulloptin the absence of contending atomic operations.
 
 The timeout fortry_waitis finite but otherwise unspecified.
 
- Throws: Timeout-related exceptions (thread.req.timing).
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object *ptr.
To [atomics.ref.int]:
namespace std {
  template<> struct atomic_ref<integral> {
    optional<integral> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
To [atomics.ref.float]:
namespace std {
  template<> struct atomic_ref<floating-point> {
    optional<floating-point> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
To [atomics.ref.pointer]:
namespace std {
  template<class T> struct atomic_ref<T*> {
    optional<T*> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
To [atomics.types.generic.general]:
namespace std {
  template<class T> struct atomic {
    optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
     optional<T> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
     
  };
}
To [atomics.types.operations]:
optional<T> try_wait(memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
                         chrono::duration<Rep, Period> const& rel_time,
                         memory_order order = memory_order::seq_cst
                        ) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
                         chrono::duration<Rep, Period> const& rel_time,
                         memory_order order = memory_order::seq_cst
                        ) const volatile noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
                           chrono::time_point<Clock, Duration> const& abs_time,
                           memory_order order = memory_order::seq_cst
                          ) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
                           chrono::time_point<Clock, Duration> const& abs_time,
                           memory_order order = memory_order::seq_cst
                          ) const volatile noexcept;
EDITORIAL: analogous to atomic_ref. Intentionally left out from the current revision of this paper.
To [atomics.types.int]:
namespace std {
  template<> struct atomic<integral> {
    optional<integral> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}
To [atomics.types.float]:
namespace std {
  template<> struct atomic<floating-point> {
    optional<floating-point> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}
To [atomics.types.pointer]:
namespace std {
  template<class T> struct atomic<T*> {
    optional<T*> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}
To [util.smartptr.atomic.shared]:
namespace std {
  template<class T> struct atomic<shared_ptr<T>> {
    optional<shared_ptr<T>> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<shared_ptr<T>> try_wait_for(
      shared_ptr<T>l, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<shared_ptr<T>> try_wait_until(
      shared_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
EDITORIAL: analogous to the try_wait_for/_until APIS of atomic_ref, with shared_ptr/weak_ptr tweaks. Intentionally left out of the current revision of this paper.
To [util.smartptr.atomic.weak]:
namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    optional<weak_ptr<T>> try_wait(memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<weak_ptr<T>> try_wait_for(
      weak_ptr<T>l, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<weak_ptr<T>> try_wait_until(
      weak_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}
EDITORIAL: analogous to the try_wait_for/_until APIS of atomic_ref, with shared_ptr/weak_ptr tweaks. Intentionally left out of the current revision of this paper.
EDITORIAL: No changes to [atomics.nonmembers] are needed.
To [atomic.flag]:
namespace std {
  struct atomic_flag {
    bool try_wait(memory_order = memory_order::seq_cst) const noexcept;
    bool try_wait(memory_order = memory_order::seq_cst) const noexcept volatile;
    template <class Rep, class Period>
    bool try_wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const noexcept;
    template <class Rep, class Period>
    bool try_wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const volatile noexcept;
    template <class Clock, class Duration>
    bool try_wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
     template <class Clock, class Duration>
     bool try_wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}
bool atomic_flag_try_wait(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const volatile noexcept;
bool atomic_flag_try_wait_for(const atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time) noexcept;
bool atomic_flag_try_wait_for(const volatile atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time) noexcept;
bool atomic_flag_try_wait_for_explicit(const atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag_try_wait_for_explicit(const volatile atomic_flag* object, bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag::try_wait_for(bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait_for(bool old, chrono::duration<Rep, Period> const& rel_time, memory_order order = memory_order::seq_cst) const volatile noexcept;
atomic_flag_try_wait_until(const atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time) noexcept;
bool atomic_flag_try_wait_until(const volatile atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time) noexcept;
bool atomic_flag_try_wait_until_explicit(const atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag_try_wait_until_explicit(const volatile atomic_flag* object, bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag::try_wait_until(bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait_until(bool old, chrono::time_point<Clock, Duration> const& abs_time, memory_order order = memory_order::seq_cst) const volatile noexcept;
For atomic_flag_try_wait, atomic_flag_try_wait_for, and atomic_flag_try_wait_until, let order be memory_order::seq_cst. Let flag be object for the non-member functions, and this for the member functions.
- Preconditions: orderis neithermemory_order::releasenormemory_order::acq_rel.
- Effects: Repeatedly performs the following steps, in order:
- Evaluates load(order)and compares its value representation for equality against that ofold.
- If they compare unequal, returns the result of the evaluation of load(order)in the previous step.
- Blocks until it is unblocked by an atomic notifying operation or is unblocked spuriously or the timeout expired. If it is unblocked by the timeout there is no effect and it returns nullopt.
 
 The timeout expires (thread.req.timing) when the current time is afterabs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).
 
 An implementation should ensure thattry_wait_forandtry_wait_untildo not consistently returnnulloptin the absence of contending atomic operations.
 
 The timeout fortry_waitis finite but otherwise unspecified.
 
- Throws: Timeout-related exceptions (thread.req.timing).
- Remarks: This function is an atomic waiting operation (atomics.wait) on atomic object *ptr.
To [atomics.wait]:
- [Note 2: The following functions are atomic waiting operations:
 2.1atomic<T>::wait,atomic<T>::wait_for,atomic::<T>::wait_until,
 2.2atomic_flag::wait,atomic_flag::wait_for,atomic_flag::wait_until,
 2.3atomic_wait and,atomic_wait_explicit,atomic_wait_for,atomic_wait_for_explicit,atomic_wait_until,atomic_wait_until_explicit,
 2.4atomic_flag_wait and,atomic_flag_wait_explicit,atomic_flag_wait_for,atomic_flag_wait_for_explicit,atomic_flag_wait_until,atomic_flag_wait_until_explicit,and
 2.5atomic_ref<T>::wait,atomic_ref<T>::wait_for,atomic_ref<T>::wait_until.
 — end note]
UNRESOLVED QUESTION: do we need to change something else for the non-member versions of try_wait_for and try_wait_until operations?
To [thread.barrier]:
namespace std {
  template <class Completion Function>
  class barrier {
  
  public:
    bool try_wait(arrival_token& tok) const;
    template <class Rep, class Period>
    bool try_wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool try_wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}
UNRESOLVED QUESTION: arrival_token& vs arrival_token const& ?
Note: try_wait_for/try_wait_until do not take ownership of the arrival_token because if they fail, the user still needs the token to be able to call them again.
bool try_wait(arrival_token& tok) const;
template <class Rep, class Period>
bool try_wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
- Preconditions: arrivalis associated with the phase synchronization point for the current phase or the immediately preceding phase of the same barrier object.
- Effects: Blocks at the synchronization point associated with arrivaluntil the phase completion step of the synchronization point’s phase is run or the timeout expired. If it is unblocked by the timeout there is no effect and it returnsfalse; otherwise, it returnstrue.
 
 The timeout expires (thread.req.timing) when the current time is afterabs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).
 
 The timeout fortry_waitis finite but otherwise unspecified.
 
 An implementation must ensure thattry_wait_forandtry_wait_untildo not consistently returnfalseafter the phase completion step associated witharrivalhas run.
 
 [Note: Ifarrivalis associated with the synchronization point for a previous phase, the call returns immediately. — end note]
- Throws: system_errorwhen an exception is required (thread.req.exception) or timeout-related exceptions (thread.req.timing).
- Error conditions: Error conditions: Any of the error conditions allowed for mutex types (thread.mutex.requirements.mutex).
To thread.latch:
namespace std {
  class latch {
  public:
    bool try_wait() const;
    template <class Rep, class Period>
    bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}
bool try_wait() const;
template <class Rep, class Period>
bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
EDITORIAL: semantics intentionally omitted from the current revision of this paper but analogous to barrier.
Document number: P2643R1
Date: 2023-05-18
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers
Audience: Concurrency
Improving C++ concurrency features
Revisions
r0 - (Kona Drafts)
barrier::try_wait_forandbarrier::try_wait_until.barrier::try_wait_for/_untilsignatures; they were incorrectly accepting anarrival_token&&, but since these can be called in a loop consuming thearrival_tokenis incorrect.try_wait.r1 - (Varna Draft)
optional<T>vspair<bool, T>vsT, reflecting Kona guidance.try_waitwith rationale, reflecting Kona guidance.Introduction
P1135R6 introduced serval new concurrency primitives to the C++20 concurrency library:
<atomic>: added the classatomic_flag, thewaitandnotify_one/_allto class templateatomic<>, and free function versions of these.<semaphore>: added class templatecounting_semaphore<>and classbinary_semaphore.<barrier>,<latch>: added class templatebarrier<>and classlatch.Though each element included was long coming, and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made:
atomic::wait; this value is lost otherwise.atomic::waitand other concurrency primitves likebarrierandlatch, to make it easier to implement concurrency primitives on top ofatomicthat expose timed waiting facilities themselves, e.g., like<semaphore>.atomic::waitby accepting a predicate.This proposal proposes extensions to address these shortcomings. This branch demonstrates its implementability in libstdc++.
Design
The design of the features above is mostly orthogonal, and this section explores them independently.
Return last observed value on success
Desing:
::waitAPIs change the return type signature from returningvoidTwait(...);Example 0:
atomic<int> a(42);a.wait(42);auto o = a.load();assert(o != 42); // MAY FAIL!atomic<int> a(42);auto o = a.wait(42);assert(o != 42); // OK!The
atomic<T>::waitmethod guarantees that the thread is unblocked only if the value changed.Before this paper, the new
atomic<T>value is not returned to the caller. This has the following two shortcomings:atomic<T>::load(ABA Problem).atomic<T>::load, even thoughatomic::wait<T>most likely already loaded the value, to test that it did change.After this paper, the value returned by
waitis returned to the caller, eliminating the need for the subsequent load.This is an ABI breaking change that could be resolved by picking a different to-be-determined name for these APIs.
Fallible and timed versions of wait APIs
The design of the fallible timed versions of wait APIs adds three new APIs. For
atomicthese areThey are non-blocking, i.e., they eventually return to the caller in a finite-set of steps, even if the value did not change. This enables the application to “do something else” before attempting to wait again.
On failure, i.e., if the value did not change, they return
nulloptand the operation has no effects (it does not synchronize). On success, they return anoptional<T>containing the last observed value, which is guaranteed to be different from the one the call site waited on.The untimed
try_waitoverload attempts to wait for a finite unspecified duration. The implementation may pick a different duration every time, enabling it decide for how long the operation should attempt to wait based on dynamic information, like the load of the system.Since <chrono> and <optional> are not freestanding, these APIs will not be available in freestanding implementations. We could attempt to improve this by:
optionalAPIs that do not throw exceptions in freestanding.<chrono>durations in freestanding.Example 1: the atomic variable
ttracks how many tasks need to still be processed, and the application prints every second - if it is still waiting - how many tasks remain:atomic<int> t;int old = 1;auto b = clock::now();while (old != 0) {old = t.load();if (auto e = clock::now(); (e - b) > 1s) {cout << old;b = e;}}atomic<int> t;auto old = 1;while (old != 0) {auto o = t.try_wait_for(old, 1s);old = o.has_value()? o.value() : old;cout << old;}Before this proposal, applications need to re-implement
atomic<T>::waitlogic, and like in this example, may accidentally callatomic<T>::loadin a loop without any back-off.After this proposal,
try_wait_forexpresses the application’s intent to wait for a certain duration of time, enabling the implementation to wait efficiently.For
barrierandlatchthey look like:Example 2: using a
barrierwhich tracks the completion oftask_counttasks:Predicate versions
There is no wording for the predicate versions proposed in this revision of this paper.
When the program is waiting for a condition different from “not equal to”, there is an added re-try loop around the
waitoperation in the program. This loop causes each call towaitto be performed as if it were the first call towait, oblivious to the fact that the program has already been waiting for some time. This leads to re-executing the short-term polling strategy.Taking a predicate instead of a value allows us to push the program-defined condition inside of
atomic::wait, delete the outer loop, and allows the implementation to track time spent.At least two implementations currently implement
atomic::waitin terms of a wait taking a predicate.The authors see two APIs that are both equvialent in terms of implementation complexity and usability, but have an opposite API depending on how the semantics of the predicate are defined.
Both options have this signature
T atomic<T>::wait_with_predicate(Predicate&& p)and the predicate takes aTand returns abool, but the predicate returns:trueif we synchronizesfalseif we synchronizesatomic::wait(T)waits untilTchanges)Feedback needed: We should pick an option before working on wording.
Example 3: same example as Example 1 using Option 1 above (
trueif we synchronize):atomic<int> t;int old = 1;auto b = clock::now();while (old != 0) {old = t.load();auto e = clock::now();if ((e - b) > 1s) {cout << old;b = e;}}atomic<int> t;int old = t.load();auto p = [&](int v) {old = v;return v == 0;};while(!t.try_wait_with_predicate_for(p, 1s).has_value()) {cout << old;}In Example 1, the “After this paper” implementation has the issue that
try_wait_for(old, 1s)returns as soon asoldchanges. That is, while the goal of the application is to wait for the value becoming zero, and it could wait up to 1s for that, if after 0.1s the value changes from 10 to 9 this API will return.The “After this paper” implementation in Example 2 fixes that using the fallibe timed
_with_predicateversion. The predicate is called every time the value changes, but it doesn’t “succeed” until the value is 0.Wording
Return last observed value from
atomic::waitTo [atomics.ref.generic.general]:
To [atomics.ref.ops]:
orderis neithermemory_order::releasenormemory_order::acq_rel.load(order)and compares its value representation for equality against that ofold.load(order)in the previous step.*ptr.To [atomics.ref.int]:
To [atomics.ref.float]:
To [atomics.ref.pointer]:
To [atomics.types.generic.general]:
To [atomics.types.operations]:
memory_order::releasenormemory_order::acq_rel.load(order)and compares its value representation for equality against that ofold.load(order)in the previous step.To [atomics.types.int]:
To [atomics.types.float]:
To [atomics.types.pointer]:
To [util.smartptr.atomic.shared]:
and
orderis neithermemory_order::releasenormemory_order::acq_rel.load(order)and compares it toold.load(order)in the previous step.shared_ptrobjects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).To [util.smartptr.atomic.weak]:
memory_order::releasenormemory_order::acq_rel.load(order)and compares it toold.load(order)in the previous step.weak_ptrobjects are equivalent if they store the same pointer and either share ownership or are both empty. This function is an atomic waiting operation (atomics.wait).No changes to [atomics.nonmembers] are needed.
No changes to [atomic.flag]'s
waitAPIs are needed.Fallible and timed-versions of
::waitAPIsTo [atomics.ref.generic.general]:
To [atomics.ref.ops]:
orderis neithermemory_order::releasenormemory_order::acq_rel.load(order)and compares its value representation for equality against that ofold.load(order)in the previous step.nullopt.The timeout expires (thread.req.timing) when the current time is after
abs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).An implementation should ensure that
try_wait_forandtry_wait_untildo not consistently returnnulloptin the absence of contending atomic operations.The timeout for
try_waitis finite but otherwise unspecified.*ptr.To [atomics.ref.int]:
To [atomics.ref.float]:
To [atomics.ref.pointer]:
To [atomics.types.generic.general]:
To [atomics.types.operations]:
To [atomics.types.int]:
To [atomics.types.float]:
To [atomics.types.pointer]:
To [util.smartptr.atomic.shared]:
To [util.smartptr.atomic.weak]:
To [atomic.flag]:
For
atomic_flag_try_wait,atomic_flag_try_wait_for, andatomic_flag_try_wait_until, letorderbememory_order::seq_cst. Letflagbeobjectfor the non-member functions, andthisfor the member functions.orderis neithermemory_order::releasenormemory_order::acq_rel.load(order)and compares its value representation for equality against that ofold.load(order)in the previous step.nullopt.The timeout expires (thread.req.timing) when the current time is after
abs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).An implementation should ensure that
try_wait_forandtry_wait_untildo not consistently returnnulloptin the absence of contending atomic operations.The timeout for
try_waitis finite but otherwise unspecified.*ptr.To [atomics.wait]:
2.1
atomic<T>::wait,atomic<T>::wait_for,atomic::<T>::wait_until,2.2
atomic_flag::wait,atomic_flag::wait_for,atomic_flag::wait_until,2.3
atomic_waitand,atomic_wait_explicit,atomic_wait_for,atomic_wait_for_explicit,atomic_wait_until,atomic_wait_until_explicit,2.4
atomic_flag_waitand,atomic_flag_wait_explicit,atomic_flag_wait_for,atomic_flag_wait_for_explicit,atomic_flag_wait_until,atomic_flag_wait_until_explicit,and2.5
atomic_ref<T>::wait,atomic_ref<T>::wait_for,atomic_ref<T>::wait_until.— end note]
To [thread.barrier]:
arrivalis associated with the phase synchronization point for the current phase or the immediately preceding phase of the same barrier object.arrivaluntil the phase completion step of the synchronization point’s phase is run or the timeout expired. If it is unblocked by the timeout there is no effect and it returnsfalse; otherwise, it returnstrue.The timeout expires (thread.req.timing) when the current time is after
abs_time(fortry_wait_until) or when at leastrel_timehas passed from the start of the function (fortry_wait_for).The timeout for
try_waitis finite but otherwise unspecified.An implementation must ensure that
try_wait_forandtry_wait_untildo not consistently returnfalseafter the phase completion step associated witharrivalhas run.[Note: If
arrivalis associated with the synchronization point for a previous phase, the call returns immediately. — end note]system_errorwhen an exception is required (thread.req.exception) or timeout-related exceptions (thread.req.timing).To thread.latch: