Document #: | P3371R1 |

Date: | 2024-09-11 |

Project: | Programming Language C++ LEWG |

Reply-to: |
Mark Hoemmen <mhoemmen@nvidia.com> Ilya Burylov <burylov@gmail.com> |

- 1 Authors
- 2 Revision history
- 3 Abstract
- 4 Discussion and proposed changes
- 4.1 Support both overwriting and updating rank-k and rank-2k updates
- 4.2 Change rank-1 and rank-2 updates to be consistent with rank-k and rank-2k
- 4.3 Constrain alpha in Hermitian
rank-1 and rank-k updates to be noncomplex
- 4.3.1 Scaling factor alpha needs to be noncomplex, else update may be non-Hermitian
- 4.3.2 Nothing wrong with rank-2 or rank-2k updates
- 4.3.3 Nothing wrong with scaling factor beta
- 4.3.4 Nothing wrong with Hermitian matrix-vector and matrix-matrix products
- 4.3.5 What if
`Scalar`

is noncomplex but`conj`

is ADL-findable?

- 4.4 Triangular matrix products, unit
diagonals, and scaling factors
- 4.4.1 BLAS applies alpha after unit diagonal; linalg applies it before
- 4.4.2 Triangular solve algorithms not affected
- 4.4.3 Triangular matrix-vector product work-around
- 4.4.4 Triangular matrix-matrix product example
- 4.4.5 LAPACK never calls
`xTRMM`

with the implicit unit diagonal option and`alpha`

not equal to one - 4.4.6 Fixes would not break backwards compatibility

- 4.5 Triangular solves, unit diagonals, and scaling factors

- 5 Ordering with respect to other proposals and LWG issues
- 6 Acknowledgments
- 7 Wording
- 7.1 New exposition-only concepts for possibly-packed input and output matrices
- 7.2 New exposition-only concept for noncomplex numbers
- 7.3 Rank-1 update functions in synopsis
- 7.4 Rank-2 update functions in synopsis
- 7.5 Rank-k update functions in synopsis
- 7.6 Rank-2k update functions in synopsis
- 7.7 Specification of nonsymmetric rank-1 update functions
- 7.8 Specification of symmetric and Hermitian rank-1 update functions
- 7.9 Specification of symmetric and Hermitian rank-2 update functions
- 7.10 Specification of rank-k update functions
- 7.11 Specification of rank-2k update functions

- Mark Hoemmen (mhoemmen@nvidia.com) (NVIDIA)
- Ilya Burylov (burylov@gmail.com) (NVIDIA)

Revision 0 to be submitted 2024-08-15

Revision 1 to be submitted 2024-09-15

Main wording changes from R0:

Instead of just changing the symmetric and Hermitian rank-k and rank-2k updates to have both overwriting and updating overloads, change

*all*the update functions in this way: rank-1 (general, symmetric, and Hermitian), rank-2, rank-k, and rank-2kFor all the new updating overloads, specify that

`C`

(or`A`

) may alias`E`

For the new symmetric and Hermitian updating overloads, specify that the functions access the new

`E`

parameter in the same way (e.g., with respect to the lower or upper triangle) as the`C`

(or`A`

) parameterAdd exposition-only concept

to constrain a scaling factor to be noncomplex, as needed for Hermitian rank-1 and rank-k functions`noncomplex`

Add Ilya Burylov as coauthor

Change title and abstract to express the wording changes

Add nonwording section explaining why we change rank-1 and rank-2 updates to be consistent with rank-k and rank-2k updates

Add nonwording sections explaining why we don’t change

`hermitian_matrix_vector_product`

,`hermitian_matrix_product`

,`triangular_matrix_product`

, or the triangular solvesAdd nonwording section explaining why we constrain some scaling factors to be noncomplex at compile time, instead of taking a run-time approach

Reorganize and expand nonwording sections

The [linalg] functions `matrix_rank_1_update`

,
`matrix_rank_1_update_c`

,
`symmetric_rank_1_update`

,
`hermitian_rank_1_update`

,
`symmetric_matrix_rank_k_update`

,
`hermitian_matrix_rank_k_update`

,
`symmetric_matrix_rank_2k_update`

, and
`hermitian_matrix_rank_2k_update`

currently have behavior
inconsistent with their corresponding BLAS (Basic Linear Algebra
Subroutines) routines. Also, the behavior of the rank-k and rank-2k
updates is inconsistent with that of `matrix_product`

, even
though in mathematical terms they are special cases of a matrix-matrix
product. We propose three fixes.

Add “updating” overloads to the rank-1, rank-2, rank-k, and rank-2k update functions. The new overloads are analogous to the updating overloads of

`matrix_product`

. For example,`symmetric_matrix_rank_k_update(A, scaled(beta, C), C, upper_triangle)`

will perform*C*:=*β**C*+*A**A*^{T}.Change the behavior of the existing rank-1, rank-2, rank-k, and rank-2k update functions to be “overwriting” instead of “unconditionally updating.” For example,

`symmetric_matrix_rank_k_update(A, C, upper_triangle)`

will perform*C*=*A**A*^{T}instead of*C*:=*C*+*A**A*^{T}.For

`hermitian_rank_1_update`

and`hermitian_rank_k_update`

, constrain the`Scalar`

template parameter (if any) to be noncomplex. This ensures that the update will be mathematically Hermitian. (A constraint is not needed for the rank-2 and rank-2k update functions.)

Items (2) and (3) are breaking changes to the current Working Draft. Thus, we must finish this before finalization of C++26.

Each function in any section whose label begins with “linalg.algs” generally corresponds to one or more routines or functions in the original BLAS (Basic Linear Algebra Subroutines). Every computation that the BLAS can do, a function in the C++ Standard Library should be able to do.

One `std::linalg`

user
reported
an exception to this rule. The BLAS routine `DSYRK`

(Double-precision SYmmetric Rank-K update) computes *C* := *β**C* + *α**A**A*^{T},
but the corresponding `std::linalg`

function
`symmetric_matrix_rank_k_update`

only computes *C* := *C* + *α**A**A*^{T}.
That is, `std::linalg`

currently has no way to express this
BLAS operation with a general *β* scaling factor. This issue applies
to all of the symmetric and Hermitian rank-k and rank-2k update
functions.

`symmetric_matrix_rank_k_update`

: computes*C*:=*C*+*α**A**A*^{T}`hermitian_matrix_rank_k_update`

: computes*C*:=*C*+*α**A**A*^{H}`symmetric_matrix_rank_2k_update`

: computes*C*:=*C*+*α**A**B*^{H}+*α**B**A*^{H}`hermitian_matrix_rank_2k_update`

: computes*C*:=*C*+*α**A**B*^{H}+*ᾱ**B**A*^{H}, where*ᾱ*denotes the complex conjugate of*α*

These functions implement special cases of matrix-matrix products.
The `matrix_product`

function in `std::linalg`

implements the general case of matrix-matrix products. This function
corresponds to the BLAS’s `SGEMM`

, `DGEMM`

,
`CGEMM`

, and `ZGEMM`

, which compute *C* := *β**C* + *α**A**B*,
where *α* and *β* are scaling factors. The
`matrix_product`

function has two kinds of overloads:

*overwriting*(*C*=*A**B*) and*updating*(*C*=*E*+*A**B*).

The updating overloads handle the general *α* and *β* case by
`matrix_product(scaled(alpha, A), B, scaled(beta, C), C)`

.
The specification explicitly permits the input
`scaled(beta, C)`

to alias the output `C`

(**[linalg.algs.blas3.gemm]** 10: “*Remarks*:
`C`

may alias `E`

”). The `std::linalg`

library provides overwriting and updating overloads so that it can do
everything that the BLAS does, just in a more idiomatically C++ way.
Please see
P1673R13
Section 10.3 (“Function argument aliasing and zero scalar
multipliers”) for a more detailed explanation.

The problem with the current symmetric and Hermitian rank-k and
rank-2k functions is that they have the same *calling syntax* as
the overwriting version of `matrix_product`

, but
*semantics* that differ from both the overwriting and the
updating versions of `matrix_product`

. For example,

`(alpha, A, C); hermitian_matrix_rank_k_update`

updates *C* with *C* + *α**A**A*^{H},
but

`(scaled(alpha, A), conjugate_transposed(A), C); matrix_product`

overwrites *C* with *α**A**A*^{H}.
The current rank-k and rank-2k overloads are not overwriting, so we
can’t just fix this problem by introducing an “updating” overload for
each function.

Incidentally, the fact that these functions have “update” in their
name is not relevant, because that naming choice is original to the
BLAS. The BLAS calls its corresponding `xSYRK`

,
`xHERK`

, `xSYR2K`

, and `xHER2K`

routines “{Symmetric, Hermitian} rank {one, two} update,” even though
setting *β* = 0 makes these
routines “overwriting” in the sense of `std::linalg`

.

We propose to fix this by making the functions work just like
`matrix_vector_product`

or `matrix_product`

. This
entails three changes.

Add two new exposition-only concepts

and`possibly-packed-in-matrix`

for constraining input and output parameters of the changed or new symmetric and Hermitian update functions.`possibly-packed-out-matrix`

Add “updating” overloads of the symmetric and Hermitian rank-k and rank-2k update functions.

The updating overloads take a new input matrix parameter

`E`

, analogous to the updating overloads of`matrix_product`

, and make`C`

an output parameter instead of an in/out parameter. For example,`symmetric_matrix_rank_k_update(A, E, C, upper_triangle)`

computes*C*=*E*+*A**A*^{T}.Explicitly permit

`C`

and`E`

to alias, thus permitting the desired case where`E`

is`scaled(beta, C)`

.The updating overloads take

`E`

as a, and take`possibly-packed-in-matrix`

`C`

as a(instead of a`possibly-packed-out-matrix`

).`possibly-packed-inout-matrix`

`E`

must be accessed as a symmetric or Hermitian matrix (depending on the function name) and such accesses must use the same triangle as`C`

. (The existing [linalg.general] 4 wording for symmetric and Hermitian behavior does not cover`E`

.)

Change the behavior of the existing symmetric and Hermitian rank-k and rank-2k overloads to be overwriting instead of updating.

For example,

`symmetric_matrix_rank_k_update(A, C, upper_triangle)`

will compute*C*=*A**A*^{T}instead of*C*:=*C*+*A**A*^{T}.Change

`C`

from ato a`possibly-packed-inout-matrix`

.`possibly-packed-out-matrix`

Items (2) and (3) are breaking changes to the current Working Draft.
This needs to be so that we can provide the overwriting behavior *C* := *α**A**A*^{T}
or *C* := *α**A**A*^{H}
that the corresponding BLAS routines already provide. Thus, we must
finish this before finalization of C++26.

Both sets of overloads still only write to the specified triangle
(lower or upper) of the output matrix `C`

. As a result, the
new updating overloads only read from that triangle of the input matrix
`E`

. Therefore, even though `E`

may be a different
matrix than `C`

, the updating overloads do not need an
additional `Triangle t_E`

parameter for `E`

. The
`symmetric_*`

functions interpret `E`

as symmetric
in the same way that they interpret `C`

as symmetric, and the
`hermitian_*`

functions interpret `E`

as Hermitian
in the same way that they interpret `C`

as Hermitian.
Nevertheless, we do need new wording to explain how the functions may
interpret and access `E`

.

Rank-1 and rank-2 updates currently unconditionally update and do not take a

*β*scaling factor.We propose making all the rank-1 and rank-2 update functions consistent with the proposed change to the rank-k and rank-2k updates. This means both changing the meaning of the current overloads to be overwriting, and adding new overloads that are updating. This includes general (nonsymmetric), symmetric, and Hermitian rank-1 update functions, as well as symmetric and Hermitian rank-2 update functions.

As a result, the exposition-only concept

is no longer needed. We propose removing it.`possibly-packed-inout-matrix`

The rank-k and rank-2k update functions have the following rank-1 and
rank-2 analogs, where *A*
denotes a symmetric or Hermitian matrix (depending on the function’s
name) and *x* and *y* denote vectors.

`symmetric_matrix_rank_1_update`

: computes*A*:=*A*+*α**x**x*^{T}`hermetian_matrix_rank_1_update`

: computes*A*:=*A*+*α**x**x*^{H}`symmetric_matrix_rank_2_update`

: computes*A*:=*A*+*α**x**y*^{T}+*α**y**x*^{T}`hermitian_matrix_rank_2_update`

: computes*A*:=*A*+*α**x**y*^{H}+*ᾱ**x**y*^{H}

These functions *unconditionally* update the matrix *A*. They do not have an overwriting
option. In this, they follow the “general” (not necessarily symmetric or
Hermitian) rank-1 update functions.

`matrix_rank_1_update`

: computes*A*:=*A*+*x**y*^{T}`matrix_rank_1_update_c`

: computes*A*:=*A*+*x**y*^{H}

These six rank-1 and rank-2 update functions map to BLAS routines as follows.

`matrix_rank_1_update`

:`xGER`

`matrix_rank_1_update`

:`xGERC`

`symmetric_matrix_rank_1_update`

:`xSYR`

,`xSPR`

`hermitian_matrix_rank_1_update`

:`xHER`

,`xHPR`

`hermitian_matrix_rank_1_update`

:`xSYR2`

,`xSPR2`

`hermitian_matrix_rank_1_update`

:`xHER2`

,`xHPR2`

The Reference BLAS and the BLAS Standard (see Chapter 2, pp. 64 - 68)
differ here. The Reference BLAS and the original 1988 BLAS 2 paper
specify all of the rank-1 and rank-2 update routines listed above as
unconditionally updating, and not taking a *β* scaling factor. However, the
(2002) BLAS Standard specifies all of these rank-1 and rank-2 update
functions as taking a *β*
scaling factor. We consider the latter to express our design intent. It
is also consistent with the corresponding rank-k and rank-2k update
functions in the BLAS, which all take a *β* scaling factor and thus can do
either overwriting (with zero *β*) or updating (with nonzero *β*). These routines include
`xSYRK`

, `xHERK`

, `xSYR2K`

, and
`xHER2K`

. One could also include the general matrix-matrix
product `xGEMM`

among these, as `xGEMM`

also takes
a *β* scaling factor.

Section
10.3 of P1673R13 explains the three ways that the std::linalg design
translates Fortran `INTENT(INOUT)`

arguments into a C++
idiom.

Provide in-place and not-in-place overloads for triangular solve and triangular multiply.

“Else, if the BLAS function unconditionally updates (like

`xGER`

), we retain read-and-write behavior for that argument.”“Else, if the BLAS function uses a scalar beta argument to decide whether to read the output argument as well as write to it (like

`xGEMM`

), we provide two versions: a write-only version (as if`beta`

is zero), and a read-and-write version (as if`beta`

is nonzero).”

Our design goal was for functions with vector or matrix output to
imitate `std::transform`

as much as possible. This favors Way
(3) as the default approach, which turns `INTENT(INOUT)`

arguments on the Fortran side into separate input and output parameters
on the C++ side. Way (2) is really an awkward special case. The BLAS
Standard effectively eliminates this special case on the Fortran side by
making the rank-1 and rank-2 updates work just like the rank-k and
rank-2k updates, with a *β*
scaling factor. This makes it natural to eliminate the Way (2) special
case on the C++ side as well.

These changes make the exposition-only concept
* possibly-packed-inout-matrix* superfluous. We
propose removing it.

Note that this would not eliminate all uses of the exposition-only
concept * inout-matrix*. The in-place triangular
matrix product functions

`triangular_matrix_left_product`

and
`triangular_matrix_right_product`

, and the in-place
triangular linear system solve functions
`triangular_matrix_matrix_left_solve`

and
`triangular_matrix_matrix_right_solve`

would still use
`inout-matrix`

The C++ Working Draft already has `Scalar alpha`

overloads
of `hermitian_rank_k_update`

. The `Scalar`

type
currently can be complex. However, if `alpha`

has nonzero
imaginary part, then *α**A**A*^{H}
may no longer be a Hermitian matrix, even though *A**A*^{H} is
mathematically always Hermitian. For example, if *A* is the identity matrix (with ones
on the diagonal and zeros elsewhere) and *α* = *i*, then *α**A**A*^{H}
is the diagonal matrix whose diagonal elements are all *i*. While that matrix is symmetric,
it is not Hermitian, because all elements on the diagonal of a Hermitian
matrix must have nonzero imaginary part. The rank-1 update function
`hermitian_rank_1_update`

has the analogous issue.

The BLAS solves this problem by having the Hermitian rank-1 update
routines `xHER`

and rank-k update routines `xHERK`

take the scaling factor *α* as a
noncomplex number. This suggests a fix: For all
`hermitian_rank_1_update`

and
`hermitian_rank_k_update`

overloads that take
`Scalar alpha`

, constrain `Scalar`

so that it is
noncomplex. We can avoid introducing new undefined behavior (or “valid
but unspecified” elements of the output matrix) by making “noncomplex” a
constraint on the `Scalar`

type of `alpha`

.
“Noncomplex” should follow the definition of “noncomplex” used by
* conj-if-needed*: either an arithmetic type, or

`conj(E)`

is not ADL-findable for an expression
`E`

of type `Scalar`

.This issue does *not* arise with the rank-2 or rank-2k
updates. In the BLAS, the rank-2 updates `xHER2`

and the
rank-2k updates `xHER2K`

all take `alpha`

as a
complex number. The matrix *α**A**B*^{H} + *ᾱ**B**A*^{H}
is Hermitian by construction, so there’s no need to impose a
precondition on the value of *α*.

Both the current wording and the proposed changes to all these update
functions behave correctly with respect to `beta`

.

For the new updating overloads of
`hermitian_rank_1_update`

and
`hermitian_rank_k_update`

, [linalg] expresses a
`beta`

scaling factor by letting users supply
`scaled(beta, C)`

as the argument for `E`

. The
wording only requires that `E`

be Hermitian. If
`E`

is `scaled(beta, C)`

, this concerns only the
product of `beta`

and `C`

. It would be incorrect
to constrain `beta`

or `C`

separately. For
example, if *β* = − *i*
and *C* is the matrix whose
elements are all *i*, then *C* is not Hermitian but *β**C* (and therefore
`scaled(beta, C)`

) is Hermitian.

This issue does *not* arise with the rank-2k updates. For
example, the BLAS routine `xHER2K`

takes `beta`

as
a real number. The previous paragraph’s reasoning for `beta`

applies here as well.

This issue also does not arise with the rank-2 updates. In the
Reference BLAS, the rank-2 update routines `xHER2`

do not
have a way to supply `beta`

, while in the BLAS Standard,
`xHER2`

*does* take `beta`

. The BLAS
Standard says that “*α* is a
complex scalar and and [sic] *β*
is a real scalar.” The Fortran 77 and C bindings specify the type of
`beta`

as real (`<rtype>`

resp.
`RSCALAR_IN`

), but the Fortran 95 binding lists both
`alpha`

and `beta`

as
`COMPLEX(<wp>)`

. The type of `beta`

in the
Fortran 95 is likely a typo, considering the wording.

In our view, the current behavior of
`hermitian_matrix_vector_product`

and
`hermitian_matrix_product`

is correct with respect to the
`alpha`

scaling factor. These functions do not need extra
overloads that take `Scalar alpha`

. Users who want to supply
`alpha`

with nonzero imaginary part should *not* scale
the matrix `A`

(as in `scaled(alpha, A)`

).
Instead, they should scale the input vector `x`

, as in the
following.

```
auto alpha = std::complex{0.0, 1.0};
(A, upper_triangle, scaled(alpha, x), y); hermitian_matrix_vector_product
```

In Chapter 2 of the BLAS Standard, both `xHEMV`

and
`xHEMM`

take the scaling factors *α* and *β* as complex numbers
(`COMPLEX<wp>`

, where `<wp>`

represents the current working precision). The BLAS permits
`xHEMV`

or `xHEMM`

to be called with
`alpha`

whose imaginary part is nonzero. The matrix that the
BLAS assumes to be Hermitian is *A*, not *α**A*. Even if *A* is Hermitian, *α**A* might not necessarily be
Hermitian. For example, if *A*
is the identity matrix (diagonal all ones) and *α* is *i*, then *α**A* is not Hermitian but
skew-Hermitian.

The current [linalg] wording requires that the input matrix be
Hermitian. This excludes using `scaled(alpha, A)`

as the
input matrix, where `alpha`

has nonzero imaginary part. For
example, the following gives mathematically incorrect results.

```
auto alpha = std::complex{0.0, 1.0};
(scaled(alpha, A), upper_triangle, x, y); hermitian_matrix_vector_product
```

Note that the behavior of this is still well defined, at least after applying the fix proposed in LWG4136 for diagonal elements with nonzero imaginary part. It does not violate a precondition. Therefore, the Standard has no way to tell the user that they did something wrong.

The current wording permits introducing the scaling factor
`alpha`

through the input vector.

```
auto alpha = std::complex{0.0, 1.0};
(A, upper_triangle, scaled(alpha, x), y); hermitian_matrix_vector_product
```

This is fine as long as *α**A**x* equals *A**α**x*, that is, as
long as *α* commutes with the
elements of A. This issue would only be of concern if those
multiplications might be noncommutative. We’re already well outside the
world of “types that do ordinary arithmetic with
`std::complex`

.” This scenario would only arise given a
user-defined complex type, like
`user_complex<user_noncommutative>`

in the example
below, whose real parts have noncommutative multiplication.

```
template<class T>
class user_complex {
public:
(T re, T im) : re_(re), im_(im) {}
user_complex
// ... overloaded arithmetic operators ...
friend T real(user_complex<T> z) { return z.re_; }
friend T imag(user_complex<T> z) { return z.im_; }
friend user_complex<T> conj(user_complex<T> z) {
return {real(z), -imag(z)};
}
private:
T re_, im_;};
auto alpha = user_complex<user_noncommutative>{something, something_else};
(N, upper_triangle, scaled(alpha, x), y); hermitian_matrix_vector_product
```

The [linalg] library was designed to support element types with
noncommutative multiplication. On the other hand, generally, if we speak
of Hermitian matrices or even of inner products (which are used to
define Hermitian matrices), we’re working in a vector space. This means
that multiplication of the matrix’s elements is commutative. Anything
more general than that is far beyond what the BLAS can do. Thus, we
think restricting use of `alpha`

with nonzero imaginary part
to `scaled(alpha, x)`

is not so onerous.

Many users may not like the status quo of needing to scale
`x`

instead of `A`

. First, it differs from the
BLAS, which puts `alpha`

before `A`

in its
`xHEMV`

and `xHEMM`

function arguments. Second,
users would get no compile-time feedback and likely no run-time feedback
if they scale `A`

instead of `x`

. Their code would
compile and produce correct results for almost all matrix-vector or
matrix-matrix products, *except* for the Hermitian case, and
*only* when the scaling factor has a nonzero imaginary part.
However, we still think the status quo is the least bad choice. We
explain why by discussing the following alternatives.

Treat

`scaled(alpha, A)`

as a special case: expect`A`

to be Hermitian and permit`alpha`

to have nonzero imaginary partForbid

`scaled(alpha, A)`

at compile time, so that users must scale`x`

insteadAdd overloads that take

`alpha`

, analogous to the rank-1 and rank-k update functions

The first choice is mathematically incorrect, as we will explain below. The second is not incorrect, but could only catch some errors. The third is likewise not incorrect, but would add a lot of overloads for an uncommon use case, and would still not prevent users from scaling the matrix in mathematically incorrect ways.

“Treat `scaled(alpha, A)`

as a special case” actually
means three special cases, given some nested accessor type
`Acc`

and a scaling factor `alpha`

of type
`Scalar`

.

`scaled(alpha, A)`

, whose accessor type is`scaled_accessor<Scalar, Acc>`

`conjugated(scaled(alpha, A))`

, whose accessor type is`conjugated_accessor<scaled_accessor<Scalar, Acc>>`

`scaled(alpha, conjugated(A))`

, whose accessor type is`scaled_accessor<Scalar, conjugated_accessor<Acc>>`

One could replace `conjugated`

with
`conjugate_transposed`

(which we expect to be more common in
practice) without changing the accessor type.

This approach violates the fundamental [linalg] principle that “…
each `mdspan`

parameter of a function behaves as itself and
is not otherwise ‘modified’ by other parameters” (Section 10.2.5,
P1673R13). The behavior of [linalg] is agnostic of specific accessor
types, even though implementations likely have optimizations for
specific accessor types. [linalg] takes this approach for consistency,
even where it results in different behavior from the BLAS (see Section
10.5.2 of P1673R13). The application of this principle here is “the
input parameter `A`

is always Hermitian.”

In this case, the consistency matters for mathematical correctness.
What if `scaled(alpha, A)`

is Hermitian, but `A`

itself is not? An example is *α* = − *i* and *A* is the 2 x 2 matrix whose elements
are all *i*. If we treat
`scaled_accessor`

as a special case, then
`hermitian_matrix_vector_product`

will compute different
results.

Another problem is that users are permitted to define their own
accessor types, including nested accessors. Arbitrary user accessors
might have `scaled_accessor`

somewhere in that nesting, or
they might have the *effect* of `scaled_accessor`

without being called that. Thus, we can’t detect all possible ways that
the matrix might be scaled.

Hermitian matrix-matrix and matrix-vector product functions could
instead *forbid* scaling the matrix at compile time, at least for
the three accessor type cases listed in the previous option.

`scaled_accessor<Scalar, Acc>`

`conjugated_accessor<scaled_accessor<Scalar, Acc>>`

`scaled_accessor<Scalar, conjugated_accessor<Acc>>`

Doing this would certainly encourage correct behavior for the most
common cases. However, as mentioned above, users are permitted to define
their own accessor types, including nested accessors. Thus, we can’t
detect all ways that an arbitrary, possibly user-defined accessor might
scale the matrix. Furthermore, scaling the matrix might still be
mathematically correct. A scaling factor with nonzero imaginary part
might even *make* the matrix Hermitian. Applying *i* as a scaling factor twice would
give a perfectly valid scaling factor of − 1.

One could imagine adding overloads that take `alpha`

,
analogous to the rank-1 and rank-k update overloads that take
`alpha`

. Users could then write

`(alpha, A, upper_triangle, x, y); hermitian_matrix_vector_product`

instead of

`(A, upper_triangle, scaled(alpha, x), y); hermitian_matrix_vector_product`

We do not support this approach. First, it would introduce many
overloads, without adding to what the existing overloads can accomplish.
(Users can already introduce the scaling factor `alpha`

through `x`

.) Contrast this with the rank-1 and rank-k
Hermitian update functions, where not having `alpha`

overloads might break simple cases. Here are two examples.

If the matrix and vector element types and the scaling factor

*α*= 2 are all integers, then the update*C*:=*C*− 2*x**x*^{H}can be computed using integer types and arithmetic with an`alpha`

overload. However, without an`alpha`

overload, the user would need to use`scaled(sqrt(2), x)`

as the input vector, thus forcing use of floating-point arithmetic that may give inexact results.The update

*C*:=*C*−*x**x*^{H}would require resorting to complex arithmetic, as the only way to express −*x**x*^{H}with the same scaling factor for both vectors is (*i**x*)(*i**x*)^{H}.

Second, `alpha`

overloads would not prevent users from
*also* supplying `scaled(gamma, A)`

as the matrix for
some other scaling factor `gamma`

. Thus, instead of solving
the problem, the overloads would introduce more possibilities for
errors.

`Scalar`

is noncomplex but `conj`

is ADL-findable?Our proposed change defines a “noncomplex number” at compile time. We
say that complex numbers have `conj`

that is findable by ADL,
and noncomplex numbers are either arithmetic types or do not have an
ADL-findable `conj`

. We choose this definition because it is
the same one that we use to define the behavior of
`conjugated_accessor`

(and also `conjugated`

, if
P3050 is adopted). It also is the C++ analog of what the BLAS does,
namely specify the type of the `alpha`

argument as real
instead of complex.

This definition is conservative, because it excludes complex numbers
with zero imaginary part. For `conjugated_accessor`

and
`conjugated`

, this does not matter; the class and function
behave the same from the user’s perspective. The exposition-only
function * conj-if-needed* specifically exists so
that

`conjugated_accessor`

and `conjugated`

do not
change their input `mdspan`

’s `value_type`

.
However, for the rank-1 and rank-k Hermitian update functions affected
by this proposal, constraining `Scalar alpha`

at compile time
to be noncomplex prevents users from calling those functions with a
“complex” number `alpha`

whose imaginary part is zero.This matters if the user defines a number type `Real`

that
is meant to represent noncomplex numbers, but nevertheless has an
ADL-findable `conj`

, thus making it a “complex” number type
from the perspective of [linalg] functions. There are two ways users
might define `conj(Real)`

.

*Imitating*`std::complex`

: Users might define a complex number type`UserComplex`

whose real and imaginary parts have type`Real`

, and then imitate the behavior of`std::conj(double)`

by defining`UserComplex conj(Real x)`

to return a`UserComplex`

number with real part`x`

and imaginary part zero.*Type-preserving*:`Real conj(Real x)`

returns`x`

.

Option (1) would be an unfortunate choice. [linalg] defines
* conj-if-needed* specifically to fix the problem
that

`std::conj(double)`

returns
`std::complex<double>`

instead of `double`

.
However, Option (2) would be a reasonable thing for users to do,
especially if they have designed custom number types without [linalg] in
mind. One could accommodate such users by relaxing the constraint on
`Scalar`

and taking one of the following two approaches.Adding a precondition that

`imag-if-needed`

`(alpha)`

equals`Scalar{}`

Imitating LWG 4136, by defining the scaling factor to be

`real-if-needed`

`(alpha)`

instead of`alpha`

We did not take Approach (1), because adding a precondition decreases
safety by adding undefined behavior. It also forces users to add
run-time checks. Defining those checks correctly for generic, possibly
but not necessarily complex number types would be challenging. We did
not take Approach (2) because its behavior would deviate from the BLAS,
which requires the scaling factor `alpha`

to be noncomplex at
compile time.

In BLAS, triangular matrix-vector and matrix-matrix products apply

`alpha`

scaling to the implicit unit diagonal. In [linalg], the scaling factor`alpha`

is not applied to the implicit unit diagonal. This is because the library does not interpret`scaled(alpha, A)`

differently than any other`mdspan`

.Users of triangular matrix-vector products can recover BLAS functionality by scaling the input vector instead of the input matrix, so this only matters for triangular matrix-matrix products.

All calls of the BLAS’s triangular matrix-matrix product routine

`xTRMM`

in LAPACK (other than in testing routines) use`alpha`

equal to one.Straightforward approaches for fixing this issue would not break backwards compatibility.

Therefore, we do not consider fixing this a high-priority issue, and we do not propose a fix for it in this paper.

The `triangular_matrix_vector_product`

and
`triangular_matrix_product`

algorithms have an
`implicit_unit_diagonal`

option. This makes the algorithm not
access the diagonal of the matrix, and compute as if the diagonal were
all ones. The option corresponds to the BLAS’s “Unit” flag. BLAS
routines that take both a “Unit” flag and an `alpha`

scaling
factor apply “Unit” *before* scaling by `alpha`

, so
that the matrix is treated as if it has a diagonal of all
`alpha`

values. In contrast, [linalg] follows the general
principle that `scaled(alpha, A)`

should be treated like any
other kind of `mdspan`

. As a result, algorithms interpret
`implicit_unit_diagonal`

as applied to the matrix
*after* scaling by `alpha`

, so that the matrix still
has a diagonal of all ones.

The triangular solve algorithms in std::linalg are not affected,
because their BLAS analogs either do not take an `alpha`

argument (as with `xTRSV`

), or the `alpha`

argument does not affect the triangular matrix (with `xTRSM`

,
`alpha`

affects the right-hand sides `B`

, not the
triangular matrix `A`

).

This issue only reduces functionality of
`triangular_matrix_product`

. Users of
`triangular_matrix_vector_product`

who wish to replicate the
original BLAS functionality can scale the input matrix (by supplying
`scaled(alpha, x)`

instead of `x`

as the input
argument) instead of the triangular matrix.

The following example computes *A* := 2*A**B* where
*A* is a lower triangular
matrix, but it makes the diagonal of *A* all ones on the input (right-hand)
side.

`(scaled(2.0, A), lower_triangle, implicit_unit_diagonal, B, A); triangular_matrix_product`

Contrast with the analogous BLAS routine `DTRMM`

, which
has the effect of making the diagonal elements all `2.0`

.

`'Left side', 'Lower triangular', 'No transpose', 'Unit diagonal', m, n, 2.0, A, lda, B, ldb) dtrmm(`

If we want to use [linalg] to express what the BLAS call expresses, we need to perform a separate scaling.

```
(A, lower_triangle, implicit_unit_diagonal, B, A);
triangular_matrix_product(2.0, A); scale
```

This is counterintuitive, and may also affect performance.
Performance of `scale`

is typically bound by memory bandwidth
and/or latency, but if the work done by `scale`

could be
fused with the work done by the `triangular_matrix_product`

,
then `scale`

’s memory operations could be “hidden” in the
cost of the matrix product.

`xTRMM`

with the implicit unit diagonal option and
`alpha`

not equal to oneHow much might users care about this missing [linalg] feature? P1673R13 explains that the BLAS was codesigned with LAPACK and that every reference BLAS routine is used by some LAPACK routine. “The BLAS does not aim to provide a complete set of mathematical operations. Every function in the BLAS exists because some LINPACK or LAPACK algorithm needs it” (Section 10.6.1). Therefore, to judge the urgency of adding new functionality to [linalg], we can ask whether the functionality would be needed by a C++ re-implementation of LAPACK. We think not much, because the highest-priority target audience of the BLAS is LAPACK developers, and LAPACK routines (other than testing routines) never use a scaling factor alpha other than one.

We survey calls to `xTRMM`

in the latest version of LAPACK
as of the publication date of R1 of this proposal, LAPACK 3.12.0. It
suffices to survey `DTRMM`

, the double-precision real case,
since for all the routines of interest, the complex case follows the
same pattern. (We did survey `ZTRMM`

, the double-precision
complex case, just in case.) LAPACK has 24 routines that call
`DTRMM`

directly. They fall into five categories.

Test routines:

`DCHK3`

,`DCHKE`

,`DLARHS`

Routines relating to QR factorization or using the result of a QR factorization (especially with block Householder reflectors):

`DGELQT3`

,`DLARFB`

,`DGEQRT3`

,`DLARFB_GETT`

,`DLARZB`

,`DORM22`

Routines relating to computing an inverse of a triangular matrix or of a matrix that has been factored into triangular matrices:

`DLAUUM`

,`DTRITRI`

,`DTFTRI`

,`DPFTRI`

Routines relating to solving eigenvalue (or generalized eigenvalue) problems:

`DLAHR2`

,`DSYGST`

,`DGEHRD`

,`DSYGV`

,`DSYGV_2STAGE`

,`DSYGVD`

,`DSYGVX`

(note that`DLAQR5`

depends on`DTRMM`

via`EXTERNAL`

declaration, but doesn’t actually call it)Routines relating to symmetric indefinite factorizations:

`DSYT01_AA`

,`DSYTRI2X`

,`DSYTRI_3X`

The only routines that call `DTRMM`

with
`alpha`

equal to anything other than one or negative one are
the testing routines. Some calls in `DGELQT3`

and
`DLARFB_GETT`

use negative one, but these calls never specify
an implicit unit diagonal (they use the explicit diagonal option). The
only routine that might possibly call `DTRMM`

with both
negative one as alpha and the implicit unit diagonal is
`DTFTRI`

. (This routine “computes the inverse of a triangular
matrix A stored in RFP [Rectangular Full Packed] format.” RFP format was
introduced to LAPACK in the late 2000’s, well after the BLAS Standard
was published. See
LAPACK
Working Note 199, which was published in 2008.) `DTFTRI`

passes its caller’s `diag`

argument (which specifies either
implicit unit diagonal or explicit diagonal) to `DTRMM`

. The
only two LAPACK routines that call `DTFTRI`

are
`DERRRFP`

(a testing routine) and `DPFTRI`

.
`DPFTRI`

only ever calls `DTFTRI`

with
`diag`

*not* specifying the implicit unit diagonal
option. Therefore, LAPACK never needs both `alpha`

not equal
to one and the implicit unit diagonal option, so adding the ability to
“scale the implicit diagonal” in [linalg] is a low-priority feature.

We can think of two ways to fix this issue. First, we could add an
`alpha`

scaling parameter, analogous to the symmetric and
Hermitian rank-1 and rank-k update functions. Second, we could add a new
kind of `Diagonal`

template parameter type that expresses a
“diagonal value.” For example, `implicit_diagonal_t{alpha}`

(or a function form, `implicit_diagonal(alpha)`

) would tell
the algorithm not to access the diagonal elements, but instead to assume
that their value is `alpha`

. Both of these solutions would
let users specify the diagonal’s scaling factor separately from the
scaling factor for the rest of the matrix. Those two scaling factors
could differ, which is new functionality not offered by the BLAS. More
importantly, both of these solutions could be added later, after C++26,
without breaking backwards compatibility.

In BLAS, triangular solves with possibly multiple right-hand sides (

`xTRSM`

) apply`alpha`

scaling to the implicit unit diagonal. In [linalg], the scaling factor`alpha`

is not applied to the implicit unit diagonal. This is because the library does not interpret`scaled(alpha, A)`

differently than any other`mdspan`

.Users of triangular solves would need a separate

`scale`

call to recover BLAS functionality.LAPACK sometimes calls

`xTRSM`

with`alpha`

not equal to one.Straightforward approaches for fixing this issue would not break backwards compatibility.

Therefore, we do not consider fixing this a high-priority issue, and we do not propose a fix for it in this paper.

Triangular solves have a similar issue to the one explained in the
previous section. The BLAS routine `xTRSM`

applies
`alpha`

“after” the implicit unit diagonal, while std::linalg
applies `alpha`

“before.” (`xTRSV`

does not take
an `alpha`

scaling factor.) As a result, the BLAS solves with
a different matrix than std::linalg.

In mathematical terms, `xTRSM`

solves the equation *α*(*A*+*I*)*X* = *B*
for *X*, where *A* is the user’s input matrix
(without implicit unit diagonal) and *I* is the identity matrix (with ones
on the diagonal and zeros everywhere else).
`triangular_matrix_matrix_left_solve`

solves the equation
(*α**A*+*I*)*Y* = *B*
for *Y*. The two results *X* and *Y* are not equal in general.

Users could work around this problem by first scaling the matrix
*A* by *α*, and then solving for *Y*. In the common case where the
“other triangle” of *A* holds
another triangular matrix, users could not call
`scale(alpha, A)`

. They would instead need to iterate over
the elements of *A* manually.
Users might also need to “unscale” the matrix after the solve. Another
option would be to copy the matrix *A* before scaling.

```
for (size_t i = 0; i < A.extent(0); ++i) {
for (size_t j = i + 1; j < A.extent(1); ++j) {
[i, j] *= alpha;
A}
}
(A, lower_triangle, implicit_unit_diagonal, B, Y);
triangular_matrix_matrix_left_solvefor (size_t i = 0; i < A.extent(0); ++i) {
for (size_t j = i + 1; j < A.extent(1); ++j) {
[i, j] /= alpha;
A}
}
```

Users cannot solve this problem by scaling *B* (either with
`scaled(1.0 / alpha, B)`

or with
`scale(1.0 / alpha, B)`

). Transforming *X* into *Y* or vice versa is mathematically
nontrivial in general, and may introduce new failure conditions. This
issue occurs with both the in-place and out-of-place triangular
solves.

The common case in LAPACK is calling `xTRSM`

with
`alpha`

equal to one, but other values of `alpha`

occur. For example, `xTRTRI`

calls `xTRSM`

with
`alpha`

equal to − 1. Thus,
we cannot dismiss this issue, as we could with `xTRMM`

.

As with triangular matrix products above, we can think of two ways to
fix this issue. First, we could add an `alpha`

scaling
parameter, analogous to the symmetric and Hermitian rank-1 and rank-k
update functions. Second, we could add a new kind of
`Diagonal`

template parameter type that expresses a “diagonal
value.” For example, `implicit_diagonal_t{alpha}`

(or a
function form, `implicit_diagonal(alpha)`

) would tell the
algorithm not to access the diagonal elements, but instead to assume
that their value is `alpha`

. Both of these solutions would
let users specify the diagonal’s scaling factor separately from the
scaling factor for the rest of the matrix. Those two scaling factors
could differ, which is new functionality not offered by the BLAS. More
importantly, both of these solutions could be added later, after C++26,
without breaking backwards compatibility.

We currently have two other `std::linalg`

fix papers in
review.

P3222: Fix C++26 by adding

`transposed`

special cases for P2642 layouts (forwarded by LEWG to LWG on 2024-08-27 pending electronic poll results)P3050: “Fix C++26 by optimizing

`linalg::conjugated`

for noncomplex value types” (forwarded by LEWG to LWG on 2024-09-03 pending electronic poll results)

LEWG was aware of these two papers and this pending paper P3371 in
its 2024-09-03 review of P3050R2. All three of these papers increment
the value of the `__cpp_lib_linalg`

macro. While this
technically causes a conflict between the papers, advice from LEWG on
2024-09-03 was not to introduce special wording changes to avoid this
conflict.

We also have two outstanding LWG issues.

LWG4136 specifies the behavior of Hermitian algorithms on diagonal matrix elements with nonzero imaginary part. (As the BLAS Standard specifies and the Reference BLAS implements, the Hermitian algorithms do not access the imaginary parts of diagonal elements, and assume they are zero.) In our view, P3371 does not conflict with LWG4136.

LWG4137, “Fix Mandates, Preconditions, and Complexity elements of [linalg] algorithms,” affects several sections touched by this proposal, including [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k]. We consider P3371 rebased atop the wording changes proposed by LWG4137. While the wording changes may conflict in a formal (“diff”) sense, it is our view that they do not conflict in a mathematical or specification sense.

Many thanks (with permission) to Raffaele Solcà (CSCS Swiss National
Supercomputing Centre, `raffaele.solca@cscs.ch`

) for pointing
out some of the issues fixed by this paper, as well as the issues
leading to LWG4137.

Text in blockquotes is not proposed wording, but rather instructions for generating proposed wording. The � character is used to denote a placeholder section number which the editor shall determine.

In

[version.syn], for the following definition,

`#define __cpp_lib_linalg YYYYMML // also in <linalg>`

adjust the placeholder value

`YYYYMML`

as needed so as to denote this proposal’s date of adoption.

Add the following lines to the header

`<linalg>`

synopsis[linalg.syn], just after the declaration of the exposition-only conceptand before the declaration of the exposition-only concept`inout-matrix`

.`possibly-packed-inout-matrix`

[Editorial Note:This addition is not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
template<class T>
concept
```*possibly-packed-in-matrix* = *see below*; // *exposition only*
template<class T>
concept *possibly-packed-out-matrix* = *see below*; // *exposition only*

Then, remove the declaration of the exposition-only concept

from the header`possibly-packed-inout-matrix`

`<linalg>`

synopsis[linalg.syn].

Then, add the following lines to

[linalg.helpers.concepts], just after the definition of the exposition-only variable templateand just before the definition of the exposition-only concept`is-layout_blas_packed`

.`possibly-packed-inout-matrix`

[Editorial Note:This addition is not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
template<class T>
concept
```*possibly-packed-in-matrix* =
*is-mdspan*<T> && T::rank() == 2 &&
(T::is_always_unique() || *is-layout-blas-packed*<typename T::layout_type>);
template<class T>
concept *possibly-packed-out-matrix* =
*is-mdspan*<T> && T::rank() == 2 &&
<typename T::reference, typename T::element_type> &&
is_assignable_v(T::is_always_unique() || *is-layout-blas-packed*<typename T::layout_type>);

Then, remove the definition of the exposition-only concept

from`possibly-packed-inout-matrix`

[linalg.helpers.concepts].

In the Header

`<linalg>`

synopsis [linalg.syn], at the end of the section started by the following comment:

`// [linalg.helpers.concepts], linear algebra argument concepts`

,add the following declaration of the exposition-only concept

.`noncomplex`

[Editorial Note:This addition is not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
template<class T>
concept
```*noncomplex* = *see below*; // exposition only

In [linalg.helpers.concepts], change paragraph 3 to read as follows (new content in green).

Unless explicitly permitted, any * inout-vector*,

`inout-matrix`

`inout-object`

`out-vector`

`out-matrix`

`out-object`

`possibly-packed-out-matrix`

`possibly-packed-inout-matrix`

`mdspan`

parameter of the function.Append the following to the end of [linalg.helpers.concepts].

[Editorial Note:These additions are not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
template<class T>
concept
```*noncomplex* = *see below*;

4
A type `T`

models * noncomplex* if

`T`

is a linear algebra value type, and either(4.1)
`T`

is not an arithmetic type, or

(4.2) the
expression `conj(E)`

is not valid, with overload resolution
performed in a context that includes the declaration
`template<class T> T conj(const T&) = delete;`

.

In the header

`<linalg>`

synopsis[linalg.syn], replace all the declarations of all the`matrix_rank_1_update`

,`matrix_rank_1_update_c`

,`symmetric_matrix_rank_1_update`

, and`hermitian_matrix_rank_1_update`

overloads to read as follows.[Editorial Note:There are three changes here. First, the existing overloads become “overwriting” overloads. Second, new “updating” overloads are added. Third, the`hermitian_rank_1_update`

functions that take an`alpha`

parameter now constrain`alpha`

to be.`noncomplex`

Changes do not use red or green highlighting, becuase the authors could not convince Markdown to format the code correctly.

– end note]

```
// [linalg.algs.blas2.rank1], nonsymmetric rank-1 matrix update
// overwriting nonsymmetric rank-1 matrix update
template<
```*in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, OutMat A
template<*in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update_c(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, OutMat A
// updating nonsymmetric rank-1 matrix update
template<*in-vector* InVec1, *in-vector* InVec2, *in-matrix* InMat, *out-matrix* OutMat>
void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-matrix* InMat, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, InMat E, OutMat A
template<*in-vector* InVec1, *in-vector* InVec2, *in-matrix* InMat, *out-matrix* OutMat>
void matrix_rank_1_update_c(InVec1 x, InVec2 y, InMat E, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *in-matrix* InMat, *out-matrix* OutMat>
void matrix_rank_1_update_c(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, InMat E, OutMat A
// [linalg.algs.blas2.symherrank1], symmetric or Hermitian rank-1 matrix update
// overwriting symmetric rank-1 matrix update
template<class Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
class Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
);
Scalar alpha, InVec x, OutMat A, Triangle ttemplate<*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
);
InVec x, OutMat A, Triangle t
// updating symmetric rank-1 matrix update
template<class Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
class Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
);
Scalar alpha, InVec x, InMat E, OutMat A, Triangle ttemplate<*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
);
InVec x, InMat E, OutMat A, Triangle t
// overwriting Hermitian rank-1 matrix update
template<*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
);
Scalar alpha, InVec x, OutMat A, Triangle ttemplate<*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
);
InVec x, OutMat A, Triangle t
// updating Hermitian rank-1 matrix update
template<*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
);
Scalar alpha, InVec x, InMat E, OutMat A, Triangle ttemplate<*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
); InVec x, InMat E, OutMat A, Triangle t

In the header

`<linalg>`

synopsis[linalg.syn], replace all the declarations of all the`symmetric_matrix_rank_2_update`

and`hermitian_matrix_rank_2_update`

overloads to read as follows.[Editorial Note:These additions are not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
// [linalg.algs.blas2.rank2], symmetric and Hermitian rank-2 matrix updates
// overwriting symmetric rank-2 matrix update
template<
```*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, OutMat A, Triangle t
// updating symmetric rank-2 matrix update
template<*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t
// overwriting Hermitian rank-2 matrix update
template<*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
);
InVec1 x, InVec2 y, OutMat A, Triangle t
// updating Hermitian rank-2 matrix update
template<*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
); InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t

In the header

`<linalg>`

synopsis[linalg.syn], replace all the declarations of all the`symmetric_matrix_rank_k_update`

and`hermitian_matrix_rank_k_update`

overloads to read as follows.[Editorial Note:These additions are not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
// [linalg.algs.blas3.rankk], rank-k update of a symmetric or Hermitian matrix
// overwriting symmetric rank-k matrix update
template<class Scalar,
```*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
);
Scalar alpha, InMat A, OutMat C, Triangle ttemplate<class ExecutionPolicy, class Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
ExecutionPolicytemplate<*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
);
InMat A, OutMat C, Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec, InMat A, OutMat C, Triangle t);
ExecutionPolicy
// updating symmetric rank-k matrix update
template<class Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
Scalar alpha,);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<class ExecutionPolicy, class Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec, Scalar alpha,
ExecutionPolicy);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec,
ExecutionPolicy);
InMat1 A, InMat2 E, OutMat C, Triangle t
// overwriting rank-k Hermitian matrix update
template<*noncomplex* Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
);
Scalar alpha, InMat A, OutMat C, Triangle ttemplate<class ExecutionPolicy, *noncomplex* Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec, Scalar alpha, InMat A, OutMat C, Triangle t);
ExecutionPolicytemplate<*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
);
InMat A, OutMat C, Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec, InMat A, OutMat C, Triangle t);
ExecutionPolicy
// updating rank-k Hermitian matrix update
template<*noncomplex* Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
Scalar alpha,);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<class ExecutionPolicy, *noncomplex* Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec, Scalar alpha,
ExecutionPolicy);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
);
InMat1 A, InMat2 E, OutMat C, Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec,
ExecutionPolicy); InMat1 A, InMat2 E, OutMat C, Triangle t

In the header

`<linalg>`

synopsis[linalg.syn], replace all the declarations of all the`symmetric_matrix_rank_2k_update`

and`hermitian_matrix_rank_2k_update`

overloads to read as follows.[Editorial Note:These additions are not shown in green, becuase the authors could not convince Markdown to format the code correctly.– end note]

```
// [linalg.algs.blas3.rank2k], rank-2k update of a symmetric or Hermitian matrix
// overwriting symmetric rank-2k matrix update
template<
```*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
template<class ExecutionPolicy,
*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
);
InMat1 A, InMat2 B, OutMat C, Triangle t
// updating symmetric rank-2k matrix update
template<*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
template<class ExecutionPolicy,
*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2k_update(ExecutionPolicy&& exec,
);
InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t
// overwriting Hermitian rank-2k matrix update
template<*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, OutMat C, Triangle t);
template<class ExecutionPolicy,
*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
);
InMat1 A, InMat2 B, OutMat C, Triangle t
// updating Hermitian rank-2k matrix update
template<*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2k_update(InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t);
template<class ExecutionPolicy,
*in-matrix* InMat1, *in-matrix* InMat2,
*possibly-packed-in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2k_update(ExecutionPolicy&& exec,
); InMat1 A, InMat2 B, InMat3 E, OutMat C, Triangle t

Replace the entire contents of [linalg.algs.blas2.rank1] with the following.

1 The following elements apply to all functions in [linalg.algs.blas2.rank1].

2
*Mandates*:

(2.1)
`possibly-multipliable`

`<OutMat, InVec2, InVec1>()`

is `true`

, and

(2.2)
`possibly-addable`

`(A, E, A)`

is
`true`

for those overloads that take an `E`

parameter.

3
*Preconditions*:

(3.1)
`multipliable(A, y, x)`

is `true`

, and

(3.2)
`addable`

`(A, E, A)`

is `true`

for those overloads that take an `E`

parameter.

4
*Complexity*: *O*(
`x.extent(0)`

× `y.extent(0)`

).

`template<`*in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update(InVec1 x, InVec2 y, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);

5 These functions perform a overwriting nonsymmetric nonconjugated rank-1 update.

*[Note:* These functions correspond to the BLAS functions
`xGER`

(for real element types) and `xGERU`

(for
complex element types)[bib]. *– end note]*

6
*Effects*: Computes *A* = *x**y*^{T}.

`template<`*in-vector* InVec1, *in-vector* InVec2, *in-matrix* InMat, *out-matrix* OutMat>
void matrix_rank_1_update(InVec1 x, InVec2 y, InMat E, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *in-matrix* InMat, *out-matrix* OutMat>
void matrix_rank_1_update(ExecutionPolicy&& exec, InVec1 x, InVec2 y, InMat E, OutMat A);

7 These functions perform an updating nonsymmetric nonconjugated rank-1 update.

*[Note:* These functions correspond to the BLAS functions
`xGER`

(for real element types) and `xGERU`

(for
complex element types)[bib]. *– end note]*

8
*Effects*: Computes *A* = *E* + *x**y*^{T}.

9
*Remarks*: `A`

may alias `E`

.

`template<`*in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update_c(InVec1 x, InVec2 y, OutMat A);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2, *out-matrix* OutMat>
void matrix_rank_1_update_c(ExecutionPolicy&& exec, InVec1 x, InVec2 y, OutMat A);

10 These functions perform a overwriting nonsymmetric conjugated rank-1 update.

*[Note:* These functions correspond to the BLAS functions
`xGER`

(for real element types) and `xGERC`

(for
complex element types)[bib]. *– end note]*

11
*Effects*:

(11.1) For
the overloads without an `ExecutionPolicy`

argument,
equivalent to:

`(x, conjugated(y), A); matrix_rank_1_update`

(11.2) otherwise, equivalent to:

`(std::forward<ExecutionPolicy>(exec), x, conjugated(y), A); matrix_rank_1_update`

Replace the entire contents of [linalg.algs.blas2.symherrank1] with the following.

1
*[Note:* These functions correspond to the BLAS functions
`xSYR`

, `xSPR`

, `xHER`

, and
`xHPR`

[bib]. They have overloads taking a scaling factor
`alpha`

, because it would be impossible to express the update
*A* = *A**x**x*^{T}
otherwise. *– end note]*

2 The following elements apply to all functions in [linalg.algs.blas2.symherrank1].

3
For any function `F`

in this section that takes a parameter
named `t`

, an `InMat`

template parameter, and a
function parameter `InMat E`

, `t`

applies to
accesses done through the parameter `E`

. `F`

will
only access the triangle of `E`

specified by `t`

.
For accesses of diagonal elements `E[i, i]`

, `F`

will use the value
`real-if-needed`

`(E[i, i])`

if the name
of `F`

starts with `hermitian`

. For accesses
`E[i, j]`

outside the triangle specified by `t`

,
`F`

will use the value

(3.1)
`conj-if-needed`

`(E[j, i])`

if the name
of `F`

starts with `hermitian`

, or

(3.2)
`E[j, i]`

if the name of `F`

starts with
`symmetric`

.

4
*Mandates*:

(4.1) If
`OutMat`

has `layout_blas_packed`

layout, then the
layout’s `Triangle`

template argument has the same type as
the function’s `Triangle`

template argument;

(4.2) If
the function has an `InMat`

template parameter and
`InMat`

has `layout_blas_packed`

layout, then the
layout’s `Triangle`

template argument has the same type as
the function’s `Triangle`

template argument;

(4.3)
`compatible-static-extents`

`<decltype(A), decltype(A)>(0, 1)`

is `true`

;

(4.4)
`compatible-static-extents`

`<decltype(A), decltype(x)>(0, 0)`

is `true`

; and

(4.5)
`possibly-addable`

`<decltype(A), decltype(E), decltype(A)>`

is `true`

for those overloads that take an `E`

parameter.

5
*Preconditions*:

(5.1)
`A.extent(0)`

equals `A.extent(1)`

,

(5.2)
`A.extent(0)`

equals `x.extent(0)`

, and

(5.3)
`addable`

`(A, E, A)`

is `true`

for those overloads that take an `E`

parameter.

6
*Complexity*: *O*(
`x.extent(0)`

× `x.extent(0)`

).

`template<class Scalar, `*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
class Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
); Scalar alpha, InVec x, OutMat A, Triangle t

7
These functions perform an overwriting symmetric rank-1 update of the
symmetric matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

8
*Effects*: Computes *A* = *α**x**x*^{T},
where the scalar *α* is
`alpha`

.

`template<`*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);

9
These functions perform an overwriting symmetric rank-1 update of the
symmetric matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

10
*Effects*: Computes *A* = *x**x*^{T}.

`template<class Scalar, `*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
class Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
); Scalar alpha, InVec x, InMat E, OutMat A, Triangle t

11 These
functions perform an updating symmetric rank-1 update of the symmetric
matrix `A`

using the symmetric matrix `E`

, taking
into account the `Triangle`

parameter that applies to
`A`

and `E`

([linalg.general]).

12
*Effects*: Computes *A* = *E* + *α**x**x*^{T},
where the scalar *α* is
`alpha`

.

`template<`*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec,
); InVec x, InMat E, OutMat A, Triangle t

13 These
functions perform an updating symmetric rank-1 update of the symmetric
matrix `A`

using the symmetric matrix `E`

, taking
into account the `Triangle`

parameter that applies to
`A`

and `E`

([linalg.general]).

14
*Effects*: Computes *A* = *E* + *x**x*^{T}.

`template<`*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
); Scalar alpha, InVec x, OutMat A, Triangle t

15 These
functions perform an overwriting Hermitian rank-1 update of the
Hermitian matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

16
*Effects*: Computes *A* = *α**x**x*^{H},
where the scalar *α* is
`alpha`

.

`template<`*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(InVec x, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, InVec x, OutMat A, Triangle t);

17 These
functions perform an overwriting Hermitian rank-1 update of the
Hermitian matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

18
*Effects*: Computes *A* = *x**x*^{T}.

`template<`*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(Scalar alpha, InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*noncomplex* Scalar, *in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
); Scalar alpha, InVec x, InMat E, OutMat A, Triangle t

19 These
functions perform an updating Hermitian rank-1 update of the Hermitian
matrix `A`

using the Hermitian matrix `E`

, taking
into account the `Triangle`

parameter that applies to
`A`

and `E`

([linalg.general]).

20
*Effects*: Computes *A* = *E* + *α**x**x*^{H},
where the scalar *α* is
`alpha`

.

`template<`*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(InVec x, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy,
*in-vector* InVec, *possibly-packed-in-matrix* InMat, *possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec,
); InVec x, InMat E, OutMat A, Triangle t

21 These
functions perform an updating Hermitian rank-1 update of the Hermitian
matrix `A`

using the Hermitian matrix `E`

, taking
into account the `Triangle`

parameter that applies to
`A`

and `E`

([linalg.general]).

22
*Effects*: Computes *A* = *E* + *x**x*^{T}.

Replace the entire contents of [linalg.algs.blas2.rank2] with the following.

1
*[Note:* These functions correspond to the BLAS functions
`xSYR2`

, `xSPR2`

, `xHER2`

, and
`xHPR2`

[bib]. *– end note]*

2 The following elements apply to all functions in [linalg.algs.blas2.rank2].

3
For any function `F`

in this section that takes a parameter
named `t`

, an `InMat`

template parameter, and a
function parameter `InMat E`

, `t`

applies to
accesses done through the parameter `E`

. `F`

will
only access the triangle of `E`

specified by `t`

.
For accesses of diagonal elements `E[i, i]`

, `F`

will use the value
`real-if-needed`

`(E[i, i])`

if the name
of `F`

starts with `hermitian`

. For accesses
`E[i, j]`

outside the triangle specified by `t`

,
`F`

will use the value

(3.1)
`conj-if-needed`

`(E[j, i])`

if the name
of `F`

starts with `hermitian`

, or

(3.2)
`E[j, i]`

if the name of `F`

starts with
`symmetric`

.

4
*Mandates*:

(4.1) If
`OutMat`

has `layout_blas_packed`

layout, then the
layout’s `Triangle`

template argument has the same type as
the function’s `Triangle`

template argument;

(4.2) If
the function has an `InMat`

template parameter and
`InMat`

has `layout_blas_packed`

layout, then the
layout’s `Triangle`

template argument has the same type as
the function’s `Triangle`

template argument;

(4.3)
`compatible-static-extents`

`<decltype(A), decltype(A)>(0, 1)`

is `true`

;

(4.4)
`possibly-multipliable`

`<decltype(A), decltype(x), decltype(y)>()`

is `true`

; and

(4.5)
`possibly-addable`

`<decltype(A), decltype(E), decltype(A)>`

is `true`

for those overloads that take an `E`

parameter.

5
*Preconditions*:

(5.1)
`A.extent(0)`

equals `A.extent(1)`

,

(5.2)
`multipliable`

`(A, x, y)`

is
`true`

, and

(5.3)
`addable`

`(A, E, A)`

is `true`

for those overloads that take an `E`

parameter.

6
*Complexity*: *O*(
`x.extent(0)`

× `y.extent(0)`

).

`template<`*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
); InVec1 x, InVec2 y, OutMat A, Triangle t

7
These functions perform an overwriting symmetric rank-2 update of the
symmetric matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

8
Effects: Computes *A* = *x**y*^{T} + *y**x*^{T}.

`template<`*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec,
); InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t

9
These functions perform an updating symmetric rank-2 update of the
symmetric matrix `A`

using the symmetric matrix
`E`

, taking into account the `Triangle`

parameter
that applies to `A`

and `E`

([linalg.general]).

10
Effects: Computes *A* = *E* + *x**y*^{T} + *y**x*^{T}.

`template<`*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
); InVec1 x, InVec2 y, OutMat A, Triangle t

11 These
functions perform an overwriting Hermitian rank-2 update of the
Hermitian matrix `A`

, taking into account the
`Triangle`

parameter that applies to `A`

([linalg.general]).

12
Effects: Computes *A* = *x**y*^{H} + *y**x*^{H}.

`template<`*in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t);
template<class ExecutionPolicy, *in-vector* InVec1, *in-vector* InVec2,
*possibly-packed-in-matrix* InMat,
*possibly-packed-out-matrix* OutMat, class Triangle>
void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec,
); InVec1 x, InVec2 y, InMat E, OutMat A, Triangle t

13 These
functions perform an updating Hermitian rank-2 update of the Hermitian
matrix `A`

using the Hermitian matrix `E`

, taking
into account the `Triangle`

parameter that applies to
`A`

and `E`

([linalg.general]).

14
Effects: Computes *A* = *E* + *x**y*^{H} + *y**x*^{H}.

Replace the entire contents of [linalg.algs.blas3.rankk] with the following.

*[Note:* These functions correspond to the BLAS functions
`xSYRK`

and `xHERK`

. *– end note]*

1 The following elements apply to all functions in [linalg.algs.blas3.rankk].

2
For any function `F`

in this section that takes a parameter
named `t`

, an `InMat2`

template parameter, and a
function parameter `InMat2 E`

, `t`

applies to
accesses done through the parameter `E`

. `F`

will
only access the triangle of `E`

specified by `t`

.
For accesses of diagonal elements `E[i, i]`

, `F`

will use the value
`real-if-needed`

`(E[i, i])`

if the name
of `F`

starts with `hermitian`

. For accesses
`E[i, j]`

outside the triangle specified by `t`

,
`F`

will use the value

(2.1)
`conj-if-needed`

`(E[j, i])`

if the name
of `F`

starts with `hermitian`

, or

(2.2)
`E[j, i]`

if the name of `F`

starts with
`symmetric`

.

3
*Mandates:*

(3.1) If

`OutMat`

has`layout_blas_packed`

layout, then the layout’s`Triangle`

template argument has the same type as the function’s`Triangle`

template argument.(3.2) If the function takes an

`InMat2`

template parameter and if`InMat2`

has`layout_blas_packed`

layout, then the layout’s`Triangle`

template argument has the same type as the function’s`Triangle`

template argument.(3.3)

`possibly-multipliable`

`<decltype(A), decltype(transposed(A)), decltype(C)>`

is`true`

.(3.4)

`possibly-addable`

`<decltype(C), decltype(E), decltype(C)>`

is`true`

for those overloads that take an`E`

parameter.

4
*Preconditions:*

(4.1)

`multipliable`

`(A, transposed(A), C)`

is`true`

.*[Note:*This implies that`C`

is square.*– end note]*(4.2)

`addable`

`(C, E, C)`

is`true`

for those overloads that take an`E`

parameter.

5
*Complexity:* *O*(
`A.extent(0)`

⋅
`A.extent(1)`

⋅
`A.extent(0)`

).

6
*Remarks:* `C`

may alias `E`

for those
overloads that take an `E`

parameter.

```
template<class Scalar,
```*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
Scalar alpha,
InMat A,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
class Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec,
ExecutionPolicy
Scalar alpha,
InMat A,
OutMat C,); Triangle t

5
*Effects:* Computes *C* = *α**A**A*^{T},
where the scalar *α* is
`alpha`

.

`template<`*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
InMat A,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec,
ExecutionPolicy
InMat A,
OutMat C,); Triangle t

6
*Effects:* Computes *C* = *A**A*^{T}.

`template<`*noncomplex* Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
Scalar alpha,
InMat A,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*noncomplex* Scalar,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec,
ExecutionPolicy
Scalar alpha,
InMat A,
OutMat C,); Triangle t

7
*Effects:* Computes *C* = *α**A**A*^{H},
where the scalar *α* is
`alpha`

.

`template<`*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
InMat A,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec,
ExecutionPolicy
InMat A,
OutMat C,); Triangle t

8
*Effects:* Computes *C* = *A**A*^{H}.

```
template<class Scalar,
```*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
Scalar alpha,
InMat1 A,
InMat2 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
class Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec,
ExecutionPolicy
Scalar alpha,
InMat1 A,
InMat2 E,
OutMat C,); Triangle t

9
*Effects:* Computes *C* = *E* + *α**A**A*^{T},
where the scalar *α* is
`alpha`

.

`template<`*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
InMat1 A,
InMat2 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 E,
OutMat C,); Triangle t

10
*Effects:* Computes *C* = *E* + *A**A*^{T}.

`template<`*noncomplex* Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
Scalar alpha,
InMat1 A,
InMat2 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*noncomplex* Scalar,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec,
ExecutionPolicy
Scalar alpha,
InMat1 A,
InMat2 E,
OutMat C,); Triangle t

11
*Effects:* Computes *C* = *E* + *α**A**A*^{H},
where the scalar *α* is
`alpha`

.

`template<`*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
InMat1 A,
InMat2 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*possibly-packed-in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 E,
OutMat C,); Triangle t

12
*Effects:* Computes *C* = *E* + *A**A*^{H}.

Replace the entire contents of [linalg.algs.blas3.rank2k] with the following.

*[Note:* These functions correspond to the BLAS functions
`xSYR2K`

and `xHER2K`

. *– end note]*

1 The following elements apply to all functions in [linalg.algs.blas3.rank2k].

2
For any function `F`

in this section that takes a parameter
named `t`

, an `InMat3`

template parameter, and a
function parameter `InMat3 E`

, `t`

applies to
accesses done through the parameter `E`

. `F`

will
only access the triangle of `E`

specified by `t`

.
For accesses of diagonal elements `E[i, i]`

, `F`

will use the value
`real-if-needed`

`(E[i, i])`

if the name
of `F`

starts with `hermitian`

. For accesses
`E[i, j]`

outside the triangle specified by `t`

,
`F`

will use the value

(2.1)
`conj-if-needed`

`(E[j, i])`

if the name
of `F`

starts with `hermitian`

, or

(2.2)
`E[j, i]`

if the name of `F`

starts with
`symmetric`

.

3
*Mandates:*

(3.1) If

`OutMat`

has`layout_blas_packed`

layout, then the layout’s`Triangle`

template argument has the same type as the function’s`Triangle`

template argument;(3.2) If the function takes an

`InMat3`

template parameter and if`InMat3`

has`layout_blas_packed`

layout, then the layout’s`Triangle`

template argument has the same type as the function’s`Triangle`

template argument.(3.3)

`possibly-multipliable`

`<decltype(A), decltype(transposed(B)), decltype(C)>`

is`true`

.(3.4)

`possibly-multipliable`

`<decltype(B), decltype(transposed(A)), decltype(C)>`

is`true`

.(3.5)

`possibly-addable`

`<decltype(C), decltype(E), decltype(C)>`

is`true`

for those overloads that take an`E`

parameter.

4
*Preconditions:*

(4.1)

`multipliable`

`(A, transposed(B), C)`

is`true`

.(4.2)

`multipliable`

`(B, transposed(A), C)`

is`true`

.*[Note:*This and the previous imply that`C`

is square.*– end note]*(4.3)

`addable`

`(C, E, C)`

is`true`

for those overloads that take an`E`

parameter.

4
*Complexity:* *O*(
`A.extent(0)`

⋅
`A.extent(1)`

⋅
`B.extent(0)`

)

5
*Remarks:* `C`

may alias `E`

for those
overloads that take an `E`

parameter.

`template<`*in-matrix* InMat1,
*in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_2k_update(
InMat1 A,
InMat2 B,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_2k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 B,
OutMat C,); Triangle t

5
*Effects:* Computes *C* = *A**B*^{T} + *B**A*^{T}.

`template<`*in-matrix* InMat1,
*in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_2k_update(
InMat1 A,
InMat2 B,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*in-matrix* InMat2,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_2k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 B,
OutMat C,); Triangle t

6
*Effects:* Computes *C* = *A**B*^{H} + *B**A*^{H}.

`template<`*in-matrix* InMat1,
*in-matrix* InMat2,
*in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_2k_update(
InMat1 A,
InMat2 B,
InMat3 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*in-matrix* InMat2,
*in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void symmetric_matrix_rank_2k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 B,
InMat3 E,
OutMat C,); Triangle t

7
*Effects:* Computes *C* = *E* + *A**B*^{T} + *B**A*^{T}.

`template<`*in-matrix* InMat1,
*in-matrix* InMat2,
*in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_2k_update(
InMat1 A,
InMat2 B,
InMat3 E,
OutMat C,);
Triangle ttemplate<class ExecutionPolicy,
*in-matrix* InMat1,
*in-matrix* InMat2,
*in-matrix* InMat3,
*possibly-packed-out-matrix* OutMat,
class Triangle>
void hermitian_matrix_rank_2k_update(
&& exec,
ExecutionPolicy
InMat1 A,
InMat2 B,
InMat3 E,
OutMat C,); Triangle t

8
*Effects:* Computes *C* = *E* + *A**B*^{H} + *B**A*^{H}.