Document Number:	N4071
Date:	2014-06-19
Revises:	N3989
Editor:	Jared Hoberock NVIDIA Corporation jhoberock@nvidia.com

Execution policies

2.1

In general

[parallel.execpol.general]

This clause describes classes that represent execution policies. An execution policy is an object that expresses the requirements on the ordering of functions invoked as a consequence of the invocation of a standard algorithm. Execution policies afford standard algorithms the discretion to execute in parallel.

[ Example:

std::vector<int> v = ...

// standard sequential sort
std::sort(v.begin(), v.end());

using namespace std::experimental::parallel;

// explicitly sequential sort
sort(seq, v.begin(), v.end());

// permitting parallel execution
sort(par, v.begin(), v.end());

// permitting vectorization as well
sort(par_vec, v.begin(), v.end());

// sort with dynamically-selected execution
size_t threshold = ...
execution_policy exec = seq;
if (v.size() > threshold)
{
  exec = par;
}

sort(exec, v.begin(), v.end());

— end example ]

[ Note: Because different parallel architectures may require idiosyncratic parameters for efficient execution, implementations of the Standard Library may provide, as extensions, execution policies in addition to those described in this Technical Specification. — end note ]

2.2

Header `<experimental/execution_policy>` synopsis

[parallel.execpol.synopsis]

namespace std {
namespace experimental {
namespace parallel {
inline namespace v1 {
  // 2.3, Execution policy type trait
  template<class T> struct is_execution_policy;
  template<class T> constexpr bool is_execution_policy_v = is_execution_policy<T>::value;

  // 2.4, Sequential execution policy
  class sequential_execution_policy;

  // 2.5, Parallel execution policy
  class parallel_execution_policy;

  // 2.6, Parallel+Vector execution policy
  class parallel_vector_execution_policy;

  // 2.7, Dynamic execution policy
  class execution_policy;
}
}
}
}

2.3

Execution policy type trait

[parallel.execpol.type]


namespace std {
namespace experimental {
namespace parallel {
  template<class T> struct is_execution_policy { see below };

}
}
}

is_execution_policy can be used to detect execution policies for the purpose of excluding function signatures from overload resolution where they would otherwise make it ambiguous.

is_execution_policy<T> shall be a UnaryTypeTrait with a BaseCharacteristic of true_type if T is the type of a standard or implementation-defined execution policy, otherwise false_type.

The behavior of a program that adds specializations for is_execution_policy is undefined.
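The following is a minimal sketch of the ambiguity the trait guards against. It assumes hypothetical stand-in types and functions in a namespace demo rather than the real <experimental/execution_policy> declarations. The sequential form inclusive_scan(first, last, result, binary_op) and the policy form inclusive_scan(exec, first, last, result) both take four arguments. Without a constraint, the policy template would also be viable for a sequential call whose last and result iterators have the same type (ExecutionPolicy would be deduced as an iterator type), and overload resolution could prefer neither. Constraining on the trait removes the policy template from consideration.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the TS declarations, for illustration only.
namespace demo {

struct sequential_execution_policy {};
struct parallel_execution_policy {};

// The trait: false by default, true for each execution policy type.
template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_execution_policy> : std::true_type {};
template <> struct is_execution_policy<parallel_execution_policy> : std::true_type {};

// Lets the caller observe which overload was selected.
enum class chosen { sequential, policy };

// Sequential form: inclusive_scan(first, last, result, binary_op).
template <class InputIt, class OutputIt, class BinaryOp>
chosen inclusive_scan(InputIt first, InputIt last, OutputIt result, BinaryOp op) {
  if (first == last) return chosen::sequential;
  auto sum = *first;
  *result = sum;
  while (++first != last) {
    sum = op(sum, *first);
    *++result = sum;
  }
  return chosen::sequential;
}

// Policy form: inclusive_scan(exec, first, last, result). Without the
// enable_if constraint, a sequential call such as
// inclusive_scan(v.begin(), v.end(), out.begin(), op) would also match this
// template, with ExecutionPolicy deduced as an iterator type.
template <class ExecutionPolicy, class InputIt, class OutputIt,
          class = std::enable_if_t<
              is_execution_policy<std::decay_t<ExecutionPolicy>>::value>>
chosen inclusive_scan(ExecutionPolicy&&, InputIt first, InputIt last, OutputIt result) {
  // This sketch simply runs sequentially, a fallback the TS permits.
  demo::inclusive_scan(first, last, result, std::plus<>{});
  return chosen::policy;
}

}  // namespace demo
```

With v holding 1, 2, 3, 4, a call with a multiplying function object selects the sequential form and writes 1, 2, 6, 24, while a call whose first argument is parallel_execution_policy{} selects the policy form and writes 1, 3, 6, 10.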

2.4

Sequential execution policy

[parallel.execpol.seq]


namespace std {
namespace experimental {
namespace parallel {

  class sequential_execution_policy{ unspecified };

}
}
}

The class sequential_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized.

2.5

Parallel execution policy

[parallel.execpol.par]


namespace std {
namespace experimental {
namespace parallel {

  class parallel_execution_policy{ unspecified };

}
}
}

The class parallel_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized.

2.6

Parallel+Vector execution policy

[parallel.execpol.vec]


namespace std {
namespace experimental {
namespace parallel {

  class parallel_vector_execution_policy{ unspecified };

}
}
}

The class parallel_vector_execution_policy is an execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be vectorized and parallelized.
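Because each policy is a distinct empty type, ordinary overload resolution can select an implementation at compile time. The sketch below assumes hypothetical stand-ins in a namespace demo: the three policy types, constexpr objects named after those in the TS, and a describe function. Only the type of the argument matters, since the objects carry no state.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the TS policy types and objects.
namespace demo {

struct sequential_execution_policy {};
struct parallel_execution_policy {};
struct parallel_vector_execution_policy {};

constexpr sequential_execution_policy seq{};
constexpr parallel_execution_policy par{};
constexpr parallel_vector_execution_policy par_vec{};

// Each empty policy type selects its own overload at compile time.
inline std::string describe(const sequential_execution_policy&) {
  return "sequential";
}
inline std::string describe(const parallel_execution_policy&) {
  return "may be parallelized";
}
inline std::string describe(const parallel_vector_execution_policy&) {
  return "may be vectorized and parallelized";
}

}  // namespace demo
```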

2.7

Dynamic execution policy

[parallel.execpol.dynamic]


namespace std {
namespace experimental {
namespace parallel {

  class execution_policy
  {
    public:
      // 2.7.1, execution_policy construct/assign
      template<class T> execution_policy(const T& exec);
      template<class T> execution_policy& operator=(const T& exec);

      // 2.7.2, execution_policy object access
      const type_info& type() const noexcept;
      template<class T> T* get() noexcept;
      template<class T> const T* get() const noexcept;
  };

}
}
}

The class execution_policy is a container for execution policy objects. execution_policy allows dynamic control over standard algorithm execution.

[ Example:

std::vector<float> sort_me = ...
size_t threshold = ...

using namespace std::experimental::parallel;
execution_policy exec = seq;

if(sort_me.size() > threshold)
{
  exec = par;
}

sort(exec, sort_me.begin(), sort_me.end());

— end example ]

Objects of type execution_policy shall be constructible and assignable from objects of type T for which is_execution_policy<T>::value is true.

2.7.1

`execution_policy` construct/assign

[parallel.execpol.con]

template<class T> execution_policy(const T& exec);

Effects:: Constructs an execution_policy object with a copy of exec's state.
Requires:: is_execution_policy<T>::value is true.
Remarks:: This constructor shall not participate in overload resolution unless is_execution_policy<T>::value is true.

template<class T> execution_policy& operator=(const T& exec);

Effects:: Assigns a copy of exec's state to *this.
Returns:: *this.
Requires:: is_execution_policy<T>::value is true.
Remarks:: This operator shall not participate in overload resolution unless is_execution_policy<T>::value is true.

2.7.2

`execution_policy` object access

[parallel.execpol.access]


          const type_info& type() const noexcept;

Returns:: typeid(T), such that T is the type of the execution policy object contained by *this.


          template<class T> T* get() noexcept;
          template<class T> const T* get() const noexcept;

Returns:: If type() == typeid(T), a pointer to the stored execution policy object; otherwise a null pointer.
Requires:: is_execution_policy<T>::value is true.
Remarks:: This function shall not participate in overload resolution unless is_execution_policy<T>::value is true.
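A dynamic policy container of this kind can be built with type erasure. The sketch below is one possible design, not the implementation the TS specifies; the namespace demo, the concept_t and model helpers, and the which function are hypothetical. Construction, assignment, and get<T>() are constrained on the trait, copying the container copies the stored policy, and get<T>() returns a null pointer when T does not match type().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <typeinfo>

// Hypothetical stand-ins; not the TS's actual declarations.
namespace demo {

struct sequential_execution_policy {};
struct parallel_execution_policy {};

template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_execution_policy> : std::true_type {};
template <> struct is_execution_policy<parallel_execution_policy> : std::true_type {};

// Used as a defaulted template parameter to remove non-policy overloads.
template <class T>
using if_policy = std::enable_if_t<is_execution_policy<T>::value, int>;

class execution_policy {
 public:
  template <class T, if_policy<T> = 0>
  execution_policy(const T& exec) : self_(std::make_unique<model<T>>(exec)) {}

  execution_policy(const execution_policy& other) : self_(other.self_->clone()) {}

  template <class T, if_policy<T> = 0>
  execution_policy& operator=(const T& exec) {
    self_ = std::make_unique<model<T>>(exec);
    return *this;
  }

  execution_policy& operator=(const execution_policy& other) {
    self_ = other.self_->clone();  // clone first, so self-assignment is safe
    return *this;
  }

  const std::type_info& type() const noexcept { return self_->type(); }

  template <class T, if_policy<T> = 0>
  T* get() noexcept {
    return type() == typeid(T) ? &static_cast<model<T>&>(*self_).value : nullptr;
  }

  template <class T, if_policy<T> = 0>
  const T* get() const noexcept {
    return type() == typeid(T) ? &static_cast<const model<T>&>(*self_).value : nullptr;
  }

 private:
  // Type erasure: an abstract interface plus one model per stored policy type.
  struct concept_t {
    virtual ~concept_t() = default;
    virtual const std::type_info& type() const noexcept = 0;
    virtual std::unique_ptr<concept_t> clone() const = 0;
  };

  template <class T>
  struct model final : concept_t {
    explicit model(const T& v) : value(v) {}
    const std::type_info& type() const noexcept override { return typeid(T); }
    std::unique_ptr<concept_t> clone() const override {
      return std::make_unique<model>(value);
    }
    T value;
  };

  std::unique_ptr<concept_t> self_;
};

// Hypothetical algorithm that behaves as if invoked with the contained policy.
inline std::string which(const execution_policy& exec) {
  if (exec.get<parallel_execution_policy>() != nullptr) return "par";
  return "seq";
}

}  // namespace demo
```

Assigning a parallel_execution_policy to a container that previously held a sequential_execution_policy switches which from "seq" to "par", and constructing the container from a non-policy type such as int does not compile.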

2.8

Execution policy objects

[parallel.execpol.objects]


namespace std {
namespace experimental {
namespace parallel {

  constexpr sequential_execution_policy      seq{};
  constexpr parallel_execution_policy        par{};
  constexpr parallel_vector_execution_policy par_vec{};

}
}
}

The header <experimental/execution_policy> declares a global object associated with each type of execution policy defined by this Technical Specification.

Parallel algorithms

[parallel.alg]

4.1

In general

[parallel.alg.general]

This clause describes components that C++ programs may use to perform operations on containers and other sequences in parallel.

4.1.1

Effect of execution policies on algorithm execution

[parallel.alg.general.exec]

Parallel algorithms have template parameters named ExecutionPolicy which describe the manner in which the execution of these algorithms may be parallelized and the manner in which they apply user-provided function objects.

The applications of function objects in parallel algorithms invoked with an execution policy object of type sequential_execution_policy execute in sequential order in the calling thread.

The applications of function objects in parallel algorithms invoked with an execution policy object of type parallel_execution_policy are permitted to execute in an unordered fashion in unspecified threads, and indeterminately sequenced within each thread. [ Note: It is the caller's responsibility to ensure correctness, for example that the invocation does not introduce data races or deadlocks. — end note ]

[ Example:

using namespace std::experimental::parallel;
int a[] = {0,1};
std::vector<int> v;
for_each(par, std::begin(a), std::end(a), [&](int i) {
  v.push_back(i*2+1);
});

The program above has a data race because of the unsynchronized access to the container v. — end example ]

[ Example:

using namespace std::experimental::parallel;
std::atomic<int> x{0};
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [&](int n) {
  x.fetch_add(1, std::memory_order_relaxed);
  // spin wait for another iteration to change the value of x
  while (x.load(std::memory_order_relaxed) == 1) { }
});

The above example depends on the order of execution of the iterations, and therefore has undefined behavior (it may deadlock). — end example ]

[ Example:

using namespace std::experimental::parallel;
int x=0;
std::mutex m;
int a[] = {1,2};
for_each(par, std::begin(a), std::end(a), [&](int) {
  m.lock();
  ++x;
  m.unlock();
});

The above example synchronizes access to object x ensuring that it is incremented correctly. — end example ]
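To make the contrast between these examples concrete, here is a minimal sketch of a for_each that splits a random-access range into contiguous chunks and applies the function object on separate std::thread workers, so applications in different chunks run in no specified order relative to each other. The helper demo::parallel_for_each, its num_threads parameter, and the two usage functions are hypothetical. The usage functions show two race-free patterns: an atomic accumulator instead of unsynchronized access, and writing each result to its own pre-sized element instead of calling push_back. On some toolchains, programs using std::thread must be linked with -pthread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <iterator>
#include <thread>
#include <vector>

namespace demo {

// Hypothetical helper: applies f to each element of [first, last) by giving
// contiguous chunks to separate std::thread workers. Elements in different
// chunks are processed in no particular order relative to each other.
template <class RandomIt, class Function>
void parallel_for_each(RandomIt first, RandomIt last, Function f,
                       unsigned num_threads = 4) {
  using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
  const diff_t n = last - first;
  if (n <= 0) return;
  const diff_t workers_wanted = std::max<diff_t>(1, std::min<diff_t>(num_threads, n));
  const diff_t chunk = (n + workers_wanted - 1) / workers_wanted;
  std::vector<std::thread> workers;
  for (diff_t begin = 0; begin < n; begin += chunk) {
    const diff_t end = std::min(n, begin + chunk);
    // Each worker receives its own copy of f, so f must be copyable.
    workers.emplace_back([=]() mutable {
      for (diff_t i = begin; i < end; ++i) f(first[i]);
    });
  }
  for (std::thread& w : workers) w.join();
}

// Race-free summation: every application updates an atomic object.
inline long atomic_sum(const std::vector<int>& a) {
  std::atomic<long> sum{0};
  parallel_for_each(a.begin(), a.end(), [&sum](int x) {
    sum.fetch_add(x, std::memory_order_relaxed);
  });
  return sum.load();
}

// Race-free variant of the push_back example: v is sized up front and each
// application writes a different element, so no two threads share an object.
inline std::vector<int> odd_numbers(int count) {
  std::vector<int> indices(count), v(count);
  for (int i = 0; i < count; ++i) indices[i] = i;
  parallel_for_each(indices.begin(), indices.end(), [&v](int i) { v[i] = i * 2 + 1; });
  return v;
}

}  // namespace demo
```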

The applications of function objects in parallel algorithms invoked with an execution policy of type parallel_vector_execution_policy are permitted to execute in an unordered fashion in unspecified threads, and unsequenced within each thread. [ Note: This means that multiple function object invocations may be interleaved on a single thread. — end note ] [ Note: As a consequence, function objects governed by the parallel_vector_execution_policy policy must not synchronize with each other. Specifically, they must not acquire locks. — end note ]

[ Example:

using namespace std::experimental::parallel;
int x=0;
std::mutex m;
int a[] = {1,2};
for_each(par_vec, std::begin(a), std::end(a), [&](int) {
  m.lock();
  ++x;
  m.unlock();
});

The above program is invalid because the applications of the function object are not guaranteed to run on different threads. — end example ]

[ Note: The application of the function object may result in two consecutive calls to m.lock() on the same thread, which may deadlock. — end note ]

[ Note: The semantics of the parallel_execution_policy or the parallel_vector_execution_policy invocation allow the implementation to fall back to sequential execution if the system cannot parallelize an algorithm invocation due to lack of resources. — end note ]

A parallel algorithm invoked with an execution policy object of type parallel_execution_policy or parallel_vector_execution_policy may apply iterator member functions of a stronger category than its specification requires, if such iterators exist. In this case, the applications of these member functions are subject to the provisions above for parallel_execution_policy and parallel_vector_execution_policy, respectively.

[ Note: For example, an algorithm whose specification requires InputIterator but receives a concrete iterator of the category RandomAccessIterator may use operator[]. In this case, it is the algorithm caller's responsibility to ensure operator[] is race-free. — end note ]
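The note can be illustrated with tag dispatch on the iterator category. In this sketch the names demo::count_if and demo::count_if_impl are hypothetical. The generic path uses only ++, != and *, while the random-access path uses operator[]. A parallel implementation could use the same dispatch to hand disjoint index ranges to different threads, which is where the caller's obligation to keep operator[] race-free applies.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

namespace demo {

// Generic path: only ++, != and * are used, so any input iterator works.
template <class InputIt, class Pred>
std::ptrdiff_t count_if_impl(InputIt first, InputIt last, Pred pred,
                             std::input_iterator_tag) {
  std::ptrdiff_t count = 0;
  for (; first != last; ++first)
    if (pred(*first)) ++count;
  return count;
}

// Random-access path: uses operator[], which the iterator's stronger category
// provides even though count_if's specification only asks for input iterators.
template <class RandomIt, class Pred>
std::ptrdiff_t count_if_impl(RandomIt first, RandomIt last, Pred pred,
                             std::random_access_iterator_tag) {
  using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
  std::ptrdiff_t count = 0;
  const diff_t n = last - first;
  for (diff_t i = 0; i < n; ++i)
    if (pred(first[i])) ++count;
  return count;
}

// Selects a path from the iterator's category at compile time.
template <class InputIt, class Pred>
std::ptrdiff_t count_if(InputIt first, InputIt last, Pred pred) {
  return count_if_impl(first, last, pred,
                       typename std::iterator_traits<InputIt>::iterator_category{});
}

}  // namespace demo
```

A std::vector iterator takes the operator[] path, while a std::list iterator (bidirectional) falls back to the generic path; both produce the same count.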

Algorithms invoked with an execution policy object of type execution_policy execute internally as if invoked with the contained execution policy object.

The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined.

4.1.2

`ExecutionPolicy` algorithm overloads

[parallel.alg.overloads]

Parallel algorithms coexist alongside their sequential counterparts as overloads distinguished by a formal template parameter named ExecutionPolicy. ExecutionPolicy is the first template parameter and corresponds to the parallel algorithm's first function parameter, whose type is ExecutionPolicy&&. The Parallel Algorithms Library provides overloads for each of the algorithms named in Table 1, corresponding to the algorithms with the same name in the C++ Standard Algorithms Library. For each algorithm in Table 1, if there are overloads for corresponding algorithms with the same name in the C++ Standard Algorithms Library, the overloads shall have an additional template type parameter named ExecutionPolicy, which shall be the first template parameter. In addition, each such overload shall have the new function parameter as the first function parameter of type ExecutionPolicy&&.

Unless otherwise specified, the semantics of ExecutionPolicy algorithm overloads are identical to those of the corresponding overloads without an ExecutionPolicy parameter.

Parallel algorithms shall not participate in overload resolution unless is_execution_policy<decay_t<ExecutionPolicy>>::value is true.
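The decay_t in this requirement matters because ExecutionPolicy is deduced through a forwarding reference. Passing a named policy object such as seq, which is a const lvalue, deduces ExecutionPolicy as const sequential_execution_policy&, and the trait is false for that reference type. A minimal demonstration, assuming hypothetical stand-ins and check functions in a namespace demo:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins, for illustration only.
namespace demo {

struct sequential_execution_policy {};
template <class T> struct is_execution_policy : std::false_type {};
template <> struct is_execution_policy<sequential_execution_policy> : std::true_type {};

constexpr sequential_execution_policy seq{};

// Applies the trait to ExecutionPolicy exactly as deduced for a forwarding reference.
template <class ExecutionPolicy>
constexpr bool undecayed_check(ExecutionPolicy&&) {
  return is_execution_policy<ExecutionPolicy>::value;
}

// Applies the trait after stripping references and cv-qualifiers.
template <class ExecutionPolicy>
constexpr bool decayed_check(ExecutionPolicy&&) {
  return is_execution_policy<std::decay_t<ExecutionPolicy>>::value;
}

}  // namespace demo
```

With demo::seq as the argument, undecayed_check returns false and decayed_check returns true; with a prvalue sequential_execution_policy{}, ExecutionPolicy is deduced as the plain class type and both return true.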

The algorithms listed in Table 1 shall have ExecutionPolicy overloads.

Table 1 — Table of parallel algorithms
`adjacent_difference`	`adjacent_find`	`all_of`	`any_of`
`copy`	`copy_if`	`copy_n`	`count`
`count_if`	`equal`	`exclusive_scan`	`fill`
`fill_n`	`find`	`find_end`	`find_first_of`
`find_if`	`find_if_not`	`for_each`	`for_each_n`
`generate`	`generate_n`	`includes`	`inclusive_scan`
`inner_product`	`inplace_merge`	`is_heap`	`is_heap_until`
`is_partitioned`	`is_sorted`	`is_sorted_until`	`lexicographical_compare`
`max_element`	`merge`	`min_element`	`minmax_element`
`mismatch`	`move`	`none_of`	`nth_element`
`partial_sort`	`partial_sort_copy`	`partition`	`partition_copy`
`reduce`	`remove`	`remove_copy`	`remove_copy_if`
`remove_if`	`replace`	`replace_copy`	`replace_copy_if`
`replace_if`	`reverse`	`reverse_copy`	`rotate`
`rotate_copy`	`search`	`search_n`	`set_difference`
`set_intersection`	`set_symmetric_difference`	`set_union`	`sort`
`stable_partition`	`stable_sort`	`swap_ranges`	`transform`
`uninitialized_copy`	`uninitialized_copy_n`	`uninitialized_fill`	`uninitialized_fill_n`
`unique`	`unique_copy`

[ Note: Not all algorithms in the Standard Library have counterparts in Table 1. — end note ]

4.2

Definitions

[parallel.alg.defns]

Define GENERALIZED_SUM(op, a1, ..., aN) as follows:

- a1 when N is 1
- op(GENERALIZED_SUM(op, b1, ..., bK), GENERALIZED_SUM(op, bM, ..., bN)) where
  - b1, ..., bN may be any permutation of a1, ..., aN and
  - 1 < K+1 = M ≤ N

Define GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aN) as follows:

- a1 when N is 1
- op(GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aK), GENERALIZED_NONCOMMUTATIVE_SUM(op, aM, ..., aN)) where 1 < K+1 = M ≤ N.
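Two evaluations that the GENERALIZED_NONCOMMUTATIVE_SUM definition permits are the all-left grouping (K = N - 1 at every level) and the all-right grouping (K = 1 at every level). The hypothetical helpers below compute each for a non-empty sequence. They agree when op is associative and can differ otherwise, which is the source of the non-determinism noted for reduce and the scans.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace demo {

// K = N - 1 at every level: op(op(op(a1, a2), a3), ...), a left fold.
template <class T, class Op>
T left_grouped(const std::vector<T>& a, Op op) {
  T acc = a.front();  // precondition: a is non-empty (N >= 1)
  for (std::size_t i = 1; i < a.size(); ++i) acc = op(acc, a[i]);
  return acc;
}

// K = 1 at every level: op(a1, op(a2, op(a3, ...))), a right fold.
template <class T, class Op>
T right_grouped(const std::vector<T>& a, Op op) {
  T acc = a.back();  // precondition: a is non-empty (N >= 1)
  for (std::size_t i = a.size() - 1; i-- > 0;) acc = op(a[i], acc);
  return acc;
}

}  // namespace demo
```

For the values 8, 4, 2, std::plus<int> gives 14 with either helper, while std::minus<int> gives (8 - 4) - 2 = 2 from left_grouped and 8 - (4 - 2) = 6 from right_grouped. GENERALIZED_SUM additionally allows the operands to be permuted, so left_grouped applied to 2, 4, 8 with std::minus<int>, which yields -10, is also a valid GENERALIZED_SUM evaluation for 8, 4, 2.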

4.3

Non-Numeric Parallel Algorithms

[parallel.alg.ops]

4.3.1

Header `<experimental/algorithm>` synopsis

[parallel.alg.ops.synopsis]

namespace std {
namespace experimental {
namespace parallel {
inline namespace v1 {
  template<class ExecutionPolicy,
           class InputIterator, class Function>
    void for_each(ExecutionPolicy&& exec,
                  InputIterator first, InputIterator last,
                  Function f);
  template<class InputIterator, class Size, class Function>
    InputIterator for_each_n(InputIterator first, Size n,
                             Function f);
}
}
}
}

4.3.2

For each

[parallel.alg.foreach]


          template<class ExecutionPolicy,
                   class InputIterator, class Function>
            void for_each(ExecutionPolicy&& exec,
                          InputIterator first, InputIterator last,
                          Function f);

Effects:: Applies f to the result of dereferencing every iterator in the range [first,last). [ Note: If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. — end note ]
Complexity:: Applies f exactly last - first times.
Remarks:: If f returns a result, the result is ignored.
Notes:: Unlike its sequential form, the parallel overload of for_each does not return a copy of its Function parameter, since parallelization may not permit efficient state accumulation.
Requires:: Unlike its sequential form, which requires only MoveConstructible, the parallel overload of for_each requires Function to meet the requirements of CopyConstructible.


          template<class InputIterator, class Size, class Function>
            InputIterator for_each_n(InputIterator first, Size n,
                                     Function f);

Requires:: Function shall meet the requirements of MoveConstructible. [ Note: Function need not meet the requirements of CopyConstructible. — end note ]
Effects:: Applies f to the result of dereferencing every iterator in the range [first,first + n), starting from first and proceeding to first + n - 1. [ Note: If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. — end note ]
Returns:: first + n for non-negative values of n and first for negative values.
Remarks:: If f returns a result, the result is ignored.
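The Effects and Returns clauses of the sequential for_each_n amount to the loop below, where "first + n" means the iterator obtained by incrementing first n times. This is a sketch under the hypothetical name demo::for_each_n (C++17 later added a std::for_each_n to <algorithm>). Because the loop condition is i < n, a negative n applies f zero times and returns first unchanged.

```cpp
#include <cassert>
#include <list>
#include <vector>

namespace demo {

// Applies f to the first n elements in order and returns the iterator one
// past the last element visited. For n <= 0 the loop body never runs, so
// first is returned unchanged.
template <class InputIt, class Size, class Function>
InputIt for_each_n(InputIt first, Size n, Function f) {
  for (Size i = 0; i < n; ++i, ++first) {
    f(*first);  // any result of f is ignored
  }
  return first;
}

}  // namespace demo
```

For example, with a std::list<int> holding 10, 20, 30 and a function object taking int& that doubles its argument, calling demo::for_each_n with n = 2 changes the list to 20, 40, 30 and returns an iterator to 30.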


          template<class ExecutionPolicy,
                   class InputIterator, class Size, class Function>
            InputIterator for_each_n(ExecutionPolicy&& exec,
                                     InputIterator first, Size n,
                                     Function f);

Effects:: Applies f to the result of dereferencing every iterator in the range [first,first + n), starting from first and proceeding to first + n - 1. [ Note: If the type of first satisfies the requirements of a mutable iterator, f may apply nonconstant functions through the dereferenced iterator. — end note ]
Returns:: first + n for non-negative values of n and first for negative values.
Remarks:: If f returns a result, the result is ignored.
Notes:: Unlike its sequential form, which requires only MoveConstructible, the parallel overload of for_each_n requires Function to meet the requirements of CopyConstructible.

4.4

Numeric Parallel Algorithms

[parallel.alg.numeric]

4.4.1

Header `<experimental/numeric>` synopsis

[parallel.alg.numeric.synopsis]

namespace std {
namespace experimental {
namespace parallel {
inline namespace v1 {
  template<class InputIterator>
    typename iterator_traits<InputIterator>::value_type
      reduce(InputIterator first, InputIterator last);
  template<class InputIterator, class T>
    T reduce(InputIterator first, InputIterator last, T init);
  template<class InputIterator, class T, class BinaryOperation>
    T reduce(InputIterator first, InputIterator last, T init,
             BinaryOperation binary_op);

  
  template<class InputIterator, class OutputIterator>
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result);
  
  template<class InputIterator, class OutputIterator,
           class T>
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init);
  template<class InputIterator, class OutputIterator,
           class T, class BinaryOperation>
    OutputIterator
      exclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     T init, BinaryOperation binary_op);

  template<class InputIterator, class OutputIterator>
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result);
  template<class InputIterator, class OutputIterator,
           class BinaryOperation>
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op);
  template<class InputIterator, class OutputIterator,
           class BinaryOperation, class T>
    OutputIterator
      inclusive_scan(InputIterator first, InputIterator last,
                     OutputIterator result,
                     BinaryOperation binary_op, T init);
}
}
}
}

4.4.2

Reduce

[parallel.alg.reduce]


          template<class InputIterator>
            typename iterator_traits<InputIterator>::value_type
              reduce(InputIterator first, InputIterator last);

Effects:: Same as reduce(first, last, typename iterator_traits<InputIterator>::value_type{}).
Returns:: reduce(first, last, typename iterator_traits<InputIterator>::value_type{})
Requires:: typename iterator_traits<InputIterator>::value_type{} shall be a valid expression. The operator+ function associated with iterator_traits<InputIterator>::value_type shall not invalidate iterators or subranges, nor modify elements in the range [first,last).
Complexity:: O(last - first) applications of operator+.
Notes:: The primary difference between reduce and accumulate is that the behavior of reduce may be non-deterministic for non-associative or non-commutative operator+.


          template<class InputIterator, class T>
            T reduce(InputIterator first, InputIterator last, T init);

Effects:: Same as reduce(first, last, init, plus<>()).
Returns:: reduce(first, last, init, plus<>())
Requires:: The operator+ function associated with T shall not invalidate iterators or subranges, nor modify elements in the range [first,last).
Complexity:: O(last - first) applications of operator+.
Notes:: The primary difference between reduce and accumulate is that the behavior of reduce may be non-deterministic for non-associative or non-commutative operator+.


          template<class InputIterator, class T, class BinaryOperation>
            T reduce(InputIterator first, InputIterator last, T init,
                     BinaryOperation binary_op);

Returns:: GENERALIZED_SUM(binary_op, init, *first, ..., *(first + (last - first) - 1)).
Requires:: binary_op shall not invalidate iterators or subranges, nor modify elements in the range [first,last).
Complexity:: O(last - first) applications of binary_op.
Notes:: The primary difference between reduce and accumulate is that the behavior of reduce may be non-deterministic for non-associative or non-commutative binary_op.
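The sketch below shows why the result of reduce can depend on grouping even for addition of double, whose rounding makes it non-associative. The names demo::pairwise_reduce and demo::pairwise_tree are hypothetical. The tree shape is one of the groupings GENERALIZED_SUM permits and resembles how work might be split across threads, but the TS does not prescribe it.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

namespace demo {

// Pairwise (tree) combination of [first, last); precondition: last - first >= 1.
// The two recursive halves are independent, which is what would let a parallel
// implementation evaluate them concurrently; this sketch runs them in sequence.
template <class RandomIt, class BinaryOp>
typename std::iterator_traits<RandomIt>::value_type
pairwise_tree(RandomIt first, RandomIt last, BinaryOp op) {
  const auto n = last - first;
  if (n == 1) return *first;
  const RandomIt mid = first + n / 2;
  return op(pairwise_tree(first, mid, op), pairwise_tree(mid, last, op));
}

// One grouping that GENERALIZED_SUM(op, init, a1, ..., aN) permits:
// op(init, tree(a1, ..., aN)).
template <class RandomIt, class T, class BinaryOp>
T pairwise_reduce(RandomIt first, RandomIt last, T init, BinaryOp op) {
  if (first == last) return init;
  return op(init, pairwise_tree(first, last, op));
}

}  // namespace demo
```

For the values 1.0, 1e20, -1e20 with init 0.0, std::accumulate evaluates ((0.0 + 1.0) + 1e20) + -1e20 and returns 0.0, because the 1.0 is lost to rounding when added to 1e20. pairwise_reduce evaluates 0.0 + (1.0 + (1e20 + -1e20)) and returns 1.0. Both are results that reduce is permitted to produce.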

4.4.3

Exclusive scan

[parallel.alg.exclusive.scan]


          template<class InputIterator, class OutputIterator,
                   class T>
            OutputIterator
              exclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             T init);

Effects:: Same as exclusive_scan(first, last, result, init, plus<>()).
Returns:: exclusive_scan(first, last, result, init, plus<>())
Requires:: The operator+ function associated with iterator_traits<InputIterator>::value_type shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)).
Complexity:: O(last - first) applications of operator+.
Notes:: The primary difference between exclusive_scan and inclusive_scan is that exclusive_scan excludes the ith input element from the ith sum. If the operator+ function is not mathematically associative, the behavior of exclusive_scan may be non-deterministic.


          template<class InputIterator, class OutputIterator,
                   class T, class BinaryOperation>
            OutputIterator
              exclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             T init, BinaryOperation binary_op);

Effects:: Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result) - 1)).
Returns:: The end of the resulting range beginning at result.
Requires:: binary_op shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)).
Complexity:: O(last - first) applications of binary_op.
Notes:: The primary difference between exclusive_scan and inclusive_scan is that exclusive_scan excludes the ith input element from the ith sum. If binary_op is not mathematically associative, the behavior of exclusive_scan may be non-deterministic.
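A left-to-right evaluation is one grouping that GENERALIZED_NONCOMMUTATIVE_SUM allows, so a sequential loop is a valid, if unparallelized, model of these Effects. The sketch below uses the hypothetical name demo::exclusive_scan. The ith output is binary_op applied over init and the first i inputs, so the ith input itself is excluded; the first output is init.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace demo {

// Sequential sketch: writes init, op(init, a0), op(op(init, a0), a1), ...
// and returns the end of the output range.
template <class InputIt, class OutputIt, class T, class BinaryOp>
OutputIt exclusive_scan(InputIt first, InputIt last, OutputIt result,
                        T init, BinaryOp op) {
  T sum = init;
  for (; first != last; ++first, ++result) {
    T next = op(sum, *first);  // read the input before writing the output
    *result = sum;             // the ith output excludes the ith input
    sum = next;
  }
  return result;
}

}  // namespace demo
```

For inputs 1, 2, 3, 4 with init 0 and std::plus<int>, this writes 0, 1, 3, 6; with init 1 and std::multiplies<int>, it writes 1, 1, 2, 6.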

4.4.4

Inclusive scan

[parallel.alg.inclusive.scan]


          template<class InputIterator, class OutputIterator>
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result);

Effects:: Same as inclusive_scan(first, last, result, plus<>()).
Returns:: inclusive_scan(first, last, result, plus<>())
Requires:: The operator+ function associated with iterator_traits<InputIterator>::value_type shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)).
Complexity:: O(last - first) applications of operator+.
Notes:: The primary difference between exclusive_scan and inclusive_scan is that exclusive_scan excludes the ith input element from the ith sum. If the operator+ function is not mathematically associative, the behavior of inclusive_scan may be non-deterministic.


          template<class InputIterator, class OutputIterator,
                   class BinaryOperation>
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             BinaryOperation binary_op);
          
          template<class InputIterator, class OutputIterator,
                   class BinaryOperation, class T>
            OutputIterator
              inclusive_scan(InputIterator first, InputIterator last,
                             OutputIterator result,
                             BinaryOperation binary_op, T init);

Effects:: Assigns through each iterator i in [result,result + (last - first)) the value of GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, *first, ..., *(first + (i - result))) or GENERALIZED_NONCOMMUTATIVE_SUM(binary_op, init, *first, ..., *(first + (i - result))) if init is provided.
Returns:: The end of the resulting range beginning at result.
Requires:: binary_op shall not invalidate iterators or subranges, nor modify elements in the ranges [first,last) or [result,result + (last - first)).
Complexity:: O(last - first) applications of binary_op.
Notes:: The primary difference between exclusive_scan and inclusive_scan is that inclusive_scan includes the ith input element in the ith sum. If binary_op is not mathematically associative, the behavior of inclusive_scan may be non-deterministic.
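As with exclusive_scan, a left-to-right loop is one permitted evaluation. The sketch below models the overload that takes an initial value, under the hypothetical name demo::inclusive_scan. The ith output is binary_op applied over init and inputs 0 through i, so the ith input is included.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

namespace demo {

// Sequential sketch: writes op(init, a0), op(op(init, a0), a1), ...
// and returns the end of the output range.
template <class InputIt, class OutputIt, class BinaryOp, class T>
OutputIt inclusive_scan(InputIt first, InputIt last, OutputIt result,
                        BinaryOp op, T init) {
  for (; first != last; ++first, ++result) {
    init = op(init, *first);  // the ith output includes the ith input
    *result = init;
  }
  return result;
}

}  // namespace demo
```

With init 0 and std::plus<int>, the output matches std::partial_sum (1, 3, 6, 10 for inputs 1, 2, 3, 4); with init 100 it is 101, 103, 106, 110.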

Working Draft, Technical Specification for C++ Extensions for Parallelism
