1. Authors and contributors
1.1. Authors
- Mark Hoemmen (mhoemmen@stellarscience.com) (Stellar Science)
- Daisy Hollman (cpp@dsh.fyi) (Google)
- Christian Trott (crtrott@sandia.gov) (Sandia National Laboratories)
- Daniel Sunderland (dansunderland@gmail.com)
- Nevin Liber (nliber@anl.gov) (Argonne National Laboratory)
- Alicia Klinvex (alicia.klinvex@unnpp.gov) (Naval Nuclear Laboratory)
- Li-Ta Lo (ollie@lanl.gov) (Los Alamos National Laboratory)
- Damien Lebrun-Grandie (lebrungrandt@ornl.gov) (Oak Ridge National Laboratory)
- Graham Lopez (glopez@nvidia.com) (NVIDIA)
- Peter Caday (peter.caday@intel.com) (Intel)
- Sarah Knepper (sarah.knepper@intel.com) (Intel)
- Piotr Luszczek (luszczek@icl.utk.edu) (University of Tennessee)
- Timothy Costa (tcosta@nvidia.com) (NVIDIA)
1.2. Contributors
- Chip Freitag (chip.freitag@amd.com) (AMD)
- Bryce Adelstein Lelbach (brycelelbach@gmail.com) (NVIDIA)
- Srinath Vadlamani (Srinath.Vadlamani@arm.com) (ARM)
- Rene Vanoostrum (Rene.Vanoostrum@amd.com) (AMD)
2. Revision history
- Revision 0 (pre-Cologne) submitted 2019-06-17
     - Received feedback in Cologne from SG6, LEWGI, and (???).
- Revision 1 (pre-Belfast) to be submitted 2019-10-07
     - Account for Cologne 2019 feedback
          - Make interface more consistent with existing Standard algorithms
          - Change dot, dotc, vector_norm2, and vector_abs_sum to imitate reduce, so that they return their result instead of taking an output argument; users may control the result type via an optional init argument
     - Minor changes to "expression template" classes, based on implementation experience
     - Briefly address LEWGI request of exploring concepts for input arguments.
     - Lazy ranges style API was NOT explored.
- Revision 2 (pre-Prague) to be submitted 2020-01-13
     - Add "Future work" section.
     - Remove "Options and votes" section (which were addressed in SG6, SG14, and LEWGI).
     - Remove basic_mdarray.
     - Remove batched linear algebra operations.
     - Remove over- and underflow requirement for vector_norm2.
     - Mandate any extent compatibility checks that can be done at compile time.
     - Add missing functions {symmetric,hermitian}_matrix_rank_k_update and triangular_matrix_{left,right}_product.
     - Remove packed_view.
     - Fix wording for {conjugate,transpose,conjugate_transpose}_view, in particular for how transpose_view interacts with layout_blas_packed's Triangle and StorageOrder.
     - Remove second template parameter T from accessor_conjugate.
     - Make scaled_scalar and conjugated_scalar exposition-only.
     - Add in-place overloads of triangular_matrix_matrix_{left,right}_solve, triangular_matrix_{left,right}_product, and triangular_matrix_vector_solve.
     - Add alpha overloads of {symmetric,hermitian}_matrix_rank_{1,k}_update.
     - Add Cholesky factorization and solve examples.
- Revision 3 (electronic) to be submitted 2021-04-15
     - Per LEWG request, add a section on our investigation of constraining template parameters with concepts, in the manner of P1813R0 with the numeric algorithms. We concluded that we disagree with the approach of P1813R0, and that the Standard’s current GENERALIZED_SUM approach better expresses numeric algorithms' behavior.
     - Update references to the current revision of P0009 (mdspan).
     - Per LEWG request, introduce the std::linalg namespace, and put all proposed facilities in it.
     - Per LEWG request, drop the linalg_ prefix: rename linalg_add to add, linalg_copy to copy, and linalg_swap to swap_elements.
     - Per LEWG request, do not use the _view suffix: rename conjugate_view to conjugated, conjugate_transpose_view to conjugate_transposed, scaled_view to scaled, and transpose_view to transposed.
     - Revise the wording "then implementations will use T ..." regarding how implementations use the type T for intermediate results.
     - Revise the Note on vector_norm2 concerning in_vector_t::element_type and the result type T of vector_norm2.
     - Define the return types of the dot, dotc, vector_norm2, and vector_abs_sum overloads declared to return auto.
     - Remove the explicitly stated constraint on add and copy that the in_object*_t, inout_object*_t, and out_object*_t requirements already express.
     - Add vector_sum_of_squares, as a building block for vector_norm2.
     - Add matrix_frob_norm, matrix_one_norm, and matrix_inf_norm.
     - Address LEWG request for us to investigate support for GPU memory. See section "Explicit support for asynchronous return of scalar values."
     - Add ExecutionPolicy overloads of triangular_matrix_vector_solve, triangular_matrix_left_product, triangular_matrix_right_product, triangular_matrix_matrix_left_solve, and triangular_matrix_matrix_right_solve.
- Revision 4 (electronic), to be submitted 2021-08-15
     - Update authors' contact info.
     - Rebase atop P2299R3, which in turn sits atop P0009R12. Make any needed fixes due to these changes. (P1673R3 was based on P0009R10, without P2299.) Update P0009 references to point to the latest version (R12).
     - Fix requirements for {symmetric,hermitian}_matrix_{left,right}_product.
     - Change SemiRegular<Scalar> to semiregular<Scalar>.
     - Clarify the requirements on the Real template parameter.
     - In [linalg.algs.reqs], clarify that "unique layout" for output matrix, vector, or object types means that is_always_unique() returns true.
     - Change file format from Markdown to Bikeshed.
     - Impose requirements on the types on which algorithms compute, and on the algorithms themselves (e.g., what rearrangements are permitted). Add a section explaining how we came up with the requirements. Lift the requirements into a new higher-level section that applies to the entire contents of [linalg], not just to [linalg.algs].
     - Add "Overview of contents" section.
     - In the last review, LEWG had asked us to consider using exposition-only concepts and requires clauses to express the algorithms' requirements.
     - Add more examples.
3. Purpose of this paper
This paper proposes a C++ Standard Library dense linear algebra interface based on the dense Basic Linear Algebra Subroutines (BLAS). This corresponds to a subset of the BLAS Standard. Our proposal implements the following classes of algorithms on arrays that represent matrices and vectors:
- Elementwise vector sums
- Multiplying all elements of a vector or matrix by a scalar
- 2-norms and 1-norms of vectors
- Vector-vector, matrix-vector, and matrix-matrix products (contractions)
- Low-rank updates of a matrix
- Triangular solves with one or more "right-hand side" vectors
- Generating and applying plane (Givens) rotations
Our algorithms work with most of the matrix storage formats that the BLAS Standard supports:
- "General" dense matrices, in column-major or row-major format
- Symmetric or Hermitian (for complex numbers only) dense matrices, stored either as general dense matrices, or in a packed format
- Dense triangular matrices, stored either as general dense matrices or in a packed format
Our proposal also has the following distinctive characteristics:
- It uses free functions, not arithmetic operator overloading.
- The interface is designed in the spirit of the C++ Standard Library’s algorithms.
- It uses mdspan (P0009) to represent matrices and vectors.
- The interface permits optimizations for matrices and vectors with small compile-time dimensions; the standard BLAS interface does not.
- Each of our proposed operations supports all element types for which that operation makes sense, unlike the BLAS, which only supports four element types.
- Our operations permit "mixed-precision" computation with matrices and vectors that have different element types. This subsumes most functionality of the Mixed-Precision BLAS specification (Chapter 4 of the BLAS Standard).
- Like the C++ Standard Library’s algorithms, our operations take an optional execution policy argument. This is a hook to support parallel execution and hierarchical parallelism (through the proposed executor extensions to execution policies; see P1019R2).
- Unlike the BLAS, our proposal can be expanded to support "batched" operations (see P1417R0) with almost no interface differences. This will support machine learning and other applications that need to do many small matrix or vector operations at once.
Here are some examples of what this proposal offers.
In these examples, we ignore namespace qualification for names, like mdspan, that this proposal or P0009 provides.

    constexpr size_t N = 40;
    std::vector<double> x_vec(N);
    mdspan x(x_vec.data(), N);
    for (size_t i = 0; i < N; ++i) {
      x(i) = double(i);
    }
    linalg::scale(2.0, x); // x = 2.0 * x
    linalg::scale(std::execution::par_unseq, 3.0, x);
    for (size_t i = 0; i < N; ++i) {
      assert(x(i) == 6.0 * double(i));
    }
Here is a matrix-vector product example.
It illustrates the use of scaled to express updates like y = 0.5 * y + 2.0 * A * x without allocating temporaries.

    constexpr size_t N = 40;
    constexpr size_t M = 20;
    std::vector<double> A_vec(N * M);
    std::vector<double> x_vec(M);
    std::array<double, N> y_vec;
    mdspan A(A_vec.data(), N, M);
    mdspan x(x_vec.data(), M);
    mdspan<double, extents<N>> y(y_vec.data());
    for (size_t i = 0; i < A.extent(0); ++i) {
      for (size_t j = 0; j < A.extent(1); ++j) {
        A(i, j) = 100.0 * i + j;
      }
    }
    for (size_t j = 0; j < x.extent(0); ++j) {
      x(j) = 1.0 * j;
    }
    for (size_t i = 0; i < y.extent(0); ++i) {
      y(i) = -1.0 * i;
    }
    linalg::matrix_vector_product(A, x, y); // y = A * x
    // y = 0.5 * y + 2 * A * x
    linalg::matrix_vector_product(std::execution::par,
      linalg::scaled(2.0, A), x,
      linalg::scaled(0.5, y), y);
This example illustrates the ability to perform mixed-precision computations,
and the ability to compute on subviews of a matrix or vector
by using submdspan.

    constexpr size_t M = 40;
    std::vector<float> A_vec(M * 8 * 4);
    std::vector<double> x_vec(M * 4);
    std::vector<double> y_vec(M * 8);
    mdspan<float, extents<dynamic_extent, 8, 4>> A(A_vec.data(), M);
    mdspan<double, extents<4, dynamic_extent>> x(x_vec.data(), M);
    mdspan<double, extents<dynamic_extent, 8>> y(y_vec.data(), M);
    for (size_t m = 0; m < A.extent(0); ++m) {
      for (size_t i = 0; i < A.extent(1); ++i) {
        for (size_t j = 0; j < A.extent(2); ++j) {
          A(m, i, j) = 1000.0 * m + 100.0 * i + j;
        }
      }
    }
    for (size_t i = 0; i < x.extent(0); ++i) {
      for (size_t m = 0; m < x.extent(1); ++m) {
        x(i, m) = 33. * i + 0.33 * m;
      }
    }
    for (size_t m = 0; m < y.extent(0); ++m) {
      for (size_t i = 0; i < y.extent(1); ++i) {
        y(m, i) = 33. * m + 0.33 * i;
      }
    }
    for (size_t m = 0; m < M; ++m) {
      auto A_m = submdspan(A, m, full_extent, full_extent);
      auto x_m = submdspan(x, full_extent, m);
      auto y_m = submdspan(y, m, full_extent);
      // y_m = A_m * x_m
      linalg::matrix_vector_product(A_m, x_m, y_m);
    }
4. Overview of contents
Section 5 explains how this proposal is interoperable with other linear algebra proposals currently under WG21 review. In particular, we believe this proposal is complementary to P1385R6, and the authors of P1385R6 have expressed the same view.
Section 6 motivates considering any dense linear algebra proposal for the C++ Standard Library.
Section 7 shows why we chose the BLAS as a starting point for our proposed library. The BLAS is an existing standard with decades of use, a rich set of functions, and many optimized implementations.
Section 8 lists what we consider general criteria for including algorithms in the C++ Standard Library. We rely on these criteria to justify the algorithms in this proposal.
Section 9 describes BLAS notation and conventions in C++ terms. Understanding this will give readers context for algorithms, and show how our proposed algorithms expand on BLAS functionality.
Section 10 lists functionality that we intentionally exclude from our proposal. We imitate the BLAS in aiming to be a set of "performance primitives" on which external libraries or applications may build a more complete linear algebra solution.
Section 11 elaborates on our design justification. This section explains
- why we use mdspan to represent matrices and vectors;
- how we translate the BLAS' Fortran-centric idioms into C++;
- how the BLAS' different "matrix types" map to different algorithms, rather than different mdspan types;
- how we express quality-of-implementation recommendations about avoiding undue overflow and underflow; and
- how we impose requirements on algorithms' behavior and on the various value types that algorithms encounter.
Section 12 lists future work, that is, ways future proposals could build on this one.
Section 13 gives the data structures and utilities from other proposals on which we depend.
In particular, we rely heavily on mdspan (P0009).
Section 14 credits funding agencies and contributors.
Section 15 is our bibliography.
Section 16 is where readers will find the normative wording we propose.
Finally, Section 17 gives some more elaborate examples of linear algebra algorithms that use our proposal.
The examples show how to compose our proposed algorithms into larger linear algebra computations.
Here is a table that maps between Reference BLAS function name, and algorithm or function name in our proposal. The mapping is not always one to one. "N/A" in the "BLAS name(s)" field means that the function is not in the BLAS.
| BLAS name(s) | Our name(s) |
|---|---|
| xLARTG | givens_rotation_setup |
| xROT | givens_rotation_apply |
| xSWAP | swap_elements |
| xSCAL | scale, scaled |
| xCOPY | copy |
| xAXPY | add, scaled |
| xDOT, xDOTU | dot |
| xDOTC | dotc |
| N/A | vector_sum_of_squares |
| xNRM2 | vector_norm2 |
| xASUM | vector_abs_sum |
| xIAMAX | idx_abs_max |
| N/A | matrix_frob_norm |
| N/A | matrix_one_norm |
| N/A | matrix_inf_norm |
| xGEMV | matrix_vector_product |
| xSYMV | symmetric_matrix_vector_product |
| xHEMV | hermitian_matrix_vector_product |
| xTRMV | triangular_matrix_vector_product |
| xTRSV | triangular_matrix_vector_solve |
| xGER, xGERU | matrix_rank_1_update |
| xGERC | matrix_rank_1_update_c |
| xSYR | symmetric_matrix_rank_1_update |
| xHER | hermitian_matrix_rank_1_update |
| xSYR2 | symmetric_matrix_rank_2_update |
| xHER2 | hermitian_matrix_rank_2_update |
| xGEMM | matrix_product |
| xSYMM | symmetric_matrix_left_product, symmetric_matrix_right_product |
| xHEMM | hermitian_matrix_left_product, hermitian_matrix_right_product |
| xTRMM | triangular_matrix_left_product, triangular_matrix_right_product |
| xSYRK | symmetric_matrix_rank_k_update |
| xHERK | hermitian_matrix_rank_k_update |
| xSYR2K | symmetric_matrix_rank_2k_update |
| xHER2K | hermitian_matrix_rank_2k_update |
| xTRSM | triangular_matrix_matrix_left_solve, triangular_matrix_matrix_right_solve |
5. Interoperable with other linear algebra proposals
We believe this proposal is complementary to P1385R6, a proposal for a C++ Standard linear algebra library that introduces matrix and vector classes with overloaded arithmetic operators. The P1385 authors and we have expressed together in a joint paper, P1891, that P1673 and P1385 "are orthogonal. They are not competing papers; ... there is no overlap of functionality."
Many of us have had experience using and implementing matrix and vector operator arithmetic libraries like what P1385 proposes. We designed P1673 in part as a natural foundation or implementation layer for such libraries. Our view is that a free function interface like P1673’s clearly separates algorithms from data structures, and more naturally allows for a richer set of operations such as what the BLAS provides. P1674 explains why we think our proposal is a minimal C++ "respelling" of the BLAS.
A natural extension of the present proposal would include accepting P1385’s matrix and vector objects as input for the algorithms proposed here. A straightforward way to do that would be for P1385’s matrix and vector objects to make views of their data available as mdspan.
6. Why include dense linear algebra in the C++ Standard Library?
- "Direction for ISO C++" (P0939R4) explicitly calls out "Linear Algebra" as a potential priority for C++23.
- C++ applications in "important application areas" (see P0939R4) have depended on linear algebra for a long time.
- Linear algebra is like sort: obvious algorithms perform poorly, and the fastest implementations call for hardware-specific tuning.
- Dense linear algebra is core functionality for most of linear algebra, and can also serve as a building block for tensor operations.
- The C++ Standard Library includes plenty of "mathematical functions." Linear algebra operations like matrix-matrix multiply are at least as broadly useful.
- The set of linear algebra operations in this proposal are derived from a well-established, standard set of algorithms that has changed very little in decades. It is one of the strongest possible examples of standardizing existing practice that anyone could bring to C++.
- This proposal follows in the footsteps of many recent successful incorporations of existing standards into C++, including the UTC and TAI standard definitions from the International Telecommunication Union, the time zone database standard from the Internet Assigned Numbers Authority, and the ongoing effort to integrate the ISO Unicode standard.
Linear algebra has had wide use in C++ applications for nearly three decades (see P1417R0 for a historical survey). For much of that time, many third-party C++ libraries for linear algebra have been available. Many different subject areas depend on linear algebra, including machine learning, data mining, web search, statistics, computer graphics, medical imaging, geolocation and mapping, engineering, and physics-based simulations.
"Directions for ISO C++" (P0939R4) not only lists "Linear Algebra" explicitly as a potential C++23 priority, it also offers the following in support of adding linear algebra to the C++ Standard Library.
- P0939R4 calls out "Support for demanding applications" in "important application areas, such as medical, finance, automotive, and games (e.g., key libraries...)" as an "area of general concern" that "we should not ignore." All of these areas depend on linear algebra.
- "Is my proposal essential for some important application domain?" Many large and small private companies, science and engineering laboratories, and academics in many different fields all depend on linear algebra.
- "We need better support for modern hardware": Modern hardware spends many of its cycles in linear algebra. For decades, hardware vendors, some represented at WG21 meetings, have provided and continue to provide features specifically to accelerate linear algebra operations. Some of them even implement specific linear algebra operations directly in hardware. Examples include NVIDIA’s Tensor Cores and Cerebras' Wafer Scale Engine. Several large computer system vendors offer optimized linear algebra libraries based on or closely resembling the BLAS; these include AMD’s BLIS, ARM’s Performance Libraries, Cray’s LibSci, Intel’s Math Kernel Library (MKL), IBM’s Engineering and Scientific Subroutine Library (ESSL), and NVIDIA’s cuBLAS.
Obvious algorithms for some linear algebra operations like dense matrix-matrix multiply are asymptotically slower than less-obvious algorithms. (Please refer to a survey one of us coauthored, "Communication lower bounds and optimal algorithms for numerical linear algebra.") Furthermore, writing the fastest dense matrix-matrix multiply depends on details of a specific computer architecture. This makes such operations comparable to sort: the straightforward implementation is slow, and the fast implementations are architecture-specific.
Dense linear algebra is the core component of most algorithms and applications that use linear algebra, and the component that is most widely shared over different application areas. For example, tensor computations end up spending most of their time in optimized dense linear algebra functions. Sparse matrix computations get best performance when they spend as much time as possible in dense linear algebra.
The C++ Standard Library includes many "mathematical special functions" ([sf.cmath]), like incomplete elliptic integrals, Bessel functions, and other polynomials and functions named after various mathematicians. Any of them comes with its own theory and set of applications for which robust and accurate implementations are indispensable. We think that linear algebra operations are at least as broadly useful, and in many cases significantly more so.
7. Why base a C++ linear algebra library on the BLAS?
- The BLAS is a standard that codifies decades of existing practice.
- The BLAS separates "performance primitives" for hardware experts to tune, from mathematical operations that rely on those primitives for good performance.
- Benchmarks reward hardware and system vendors for providing optimized BLAS implementations.
- Writing a fast BLAS implementation for common element types is nontrivial, but well understood.
- Optimized third-party BLAS implementations with liberal software licenses exist.
- Building a C++ interface on top of the BLAS is a straightforward exercise, but has pitfalls for unaware developers.
Linear algebra has had a cross-language standard, the Basic Linear Algebra Subroutines (BLAS), since 2002. The Standard came out of a standardization process that started in 1995 and held meetings three times a year until 1999. Participants in the process came from industry, academia, and government research laboratories. The dense linear algebra subset of the BLAS codifies forty years of evolving practice, and has existed in recognizable form since 1990 (see P1417R0).
The BLAS interface was specifically designed as the distillation of the "computer science" / performance-oriented parts of linear algebra algorithms. It cleanly separates operations most critical for performance, from operations whose implementation takes expertise in mathematics and rounding-error analysis. This gives vendors opportunities to add value, without asking for expertise outside the typical required skill set of a Standard Library implementer.
Well-established benchmarks such as the LINPACK benchmark reward computer hardware vendors for optimizing their BLAS implementations. Thus, many vendors provide an optimized BLAS library for their computer architectures. Writing fast BLAS-like operations is not trivial, and depends on computer architecture. However, it is a well-understood problem whose solutions could be parameterized for a variety of computer architectures. See, for example, Goto and van de Geijn 2008. There are optimized third-party BLAS implementations for common architectures, like ATLAS and GotoBLAS. A (slow but correct) reference implementation of the BLAS exists and it has a liberal software license for easy reuse.
We have experience in the exercise of wrapping a C or Fortran BLAS implementation for use in portable C++ libraries. We describe this exercise in detail in our paper "Evolving a Standard C++ Linear Algebra Library from the BLAS" (P1674). It is straightforward for vendors, but has pitfalls for developers. For example, Fortran’s application binary interface (ABI) differs across platforms in ways that can cause run-time errors (even incorrect results, not just crashing). Historical examples of vendors' C BLAS implementations have also had ABI issues that required work-arounds. This dependence on ABI details makes availability in a standard C++ library valuable.
8. Criteria for including algorithms
We include algorithms in our proposal based on the following criteria, ordered by decreasing importance. Many of our algorithms satisfy multiple criteria.
1. Getting the desired asymptotic run time is nontrivial.
2. Opportunity for vendors to provide hardware-specific optimizations.
3. Opportunity for vendors to provide quality-of-implementation improvements, especially relating to accuracy or reproducibility with respect to floating-point rounding error.
4. User convenience (familiar name, or tedious to implement).
Regarding (1), "nontrivial" means "at least for novices to the field."
Dense matrix-matrix multiply is a good example.
Getting close to the asymptotic lower bound on the number of memory reads and writes
matters a lot for performance, and calls for a nonintuitive loop reordering.
An analogy to the current C++ Standard Library is sort.
Regarding (2), a good example is copying multidimensional arrays.
The Kokkos library spends about 2500 lines of code on multidimensional array copy,
yet still relies on system libraries for low-level optimizations.
An analogy to the current C++ Standard Library is copy.
Regarding (3), accurate floating-point summation is nontrivial.
Well-meaning compiler optimizations might defeat even simple techniques,
like compensated summation.
The most obvious way to compute a vector’s Euclidean norm
(square root of sum of squares) can cause overflow or underflow,
even when the exact answer is much smaller than the overflow threshold,
or larger than the underflow threshold.
Some users care deeply about sums, even parallel sums,
that always get the same answer, despite rounding error.
This can help debugging, for example.
It is possible to make floating-point sums completely independent
of parallel evaluation order.
See e.g., the ReproBLAS effort.
Naming these algorithms gives vendors a hook for providing such accuracy and reproducibility improvements.
Regarding (4), the C++ Standard Library is not entirely minimalist.
It includes some functions mainly because they are familiar and convenient, not because they are difficult to implement.
The BLAS exists mainly for the first two reasons. It includes functions that were nontrivial for compilers to optimize in its time, like scaled elementwise vector sums, as well as functions that generally require human effort to optimize, like matrix-matrix multiply.
9. Notation and conventions
9.1. The BLAS uses Fortran terms
The BLAS' "native" language is Fortran. It has a C binding as well, but the BLAS Standard and documentation use Fortran terms. Where applicable, we will call out relevant Fortran terms and highlight possibly confusing differences with corresponding C++ ideas. Our paper P1674R0 ("Evolving a Standard C++ Linear Algebra Library from the BLAS") goes into more detail on these issues.
9.2. We call "subroutines" functions
Like Fortran, the BLAS distinguishes between functions that return a value, and subroutines that do not return a value. In what follows, we will refer to both as "BLAS functions" or "functions."
9.3. Element types and BLAS function name prefix
The BLAS implements functionality for four different matrix, vector, or scalar element types:
- REAL (corresponds to C++ float)
- DOUBLE PRECISION (corresponds to C++ double)
- COMPLEX (corresponds to C++ complex<float>)
- DOUBLE COMPLEX (corresponds to C++ complex<double>)
The BLAS' Fortran 77 binding uses a function name prefix to distinguish functions based on element type:
- S for REAL
- D for DOUBLE PRECISION
- C for COMPLEX
- Z for DOUBLE COMPLEX
For example, the four BLAS functions SAXPY, DAXPY, CAXPY, and ZAXPY all perform the vector update Y = Y + ALPHA*X, one for each of the four element types.
The convention is to refer to all of these functions together as xAXPY, where x stands in for the element type prefix.
Not all BLAS functions exist for all four data types. These come in three categories:
- The BLAS provides only real-arithmetic (S and D) versions of the function.
- The complex-arithmetic versions perform a slightly different mathematical operation than the real-arithmetic versions, so they have a different base name.
- The complex-arithmetic versions offer a choice between nonconjugated or conjugated operations.
As an example of the second category, the real-arithmetic functions for symmetric matrices (e.g., xSYMV) correspond to complex-arithmetic functions for Hermitian matrices (e.g., xHEMV), since the Hermitian property is the natural complex analog of symmetry.
Examples of the third category include the following:
- nonconjugated (xDOTU) and conjugated (xDOTC) dot products, and
- nonconjugated (xGERU) and conjugated (xGERC) rank-1 updates of a general matrix.
The conjugate transpose and the (nonconjugated) transpose are the same operation in real arithmetic (if one considers real arithmetic embedded in complex arithmetic), but differ in complex arithmetic. Different applications have different reasons to want either. The C++ Standard includes complex numbers, so a Standard linear algebra library needs to respect the mathematical structures that go along with complex numbers.
10. What we exclude from the design
10.1. Most functions not in the Reference BLAS
The BLAS Standard includes functionality that appears neither in the Reference BLAS library, nor in the classic BLAS "level" 1, 2, and 3 papers. (For history of the BLAS "levels" and a bibliography, see P1417R0. For a paper describing functions not in the Reference BLAS, see "An updated set of basic linear algebra subprograms (BLAS)," listed in "Other references" below.) For example, the BLAS Standard has
- several new dense functions, like a fused vector update and dot product;
- sparse linear algebra functions, like sparse matrix-vector multiply and an interface for constructing sparse matrices; and
- extended- and mixed-precision dense functions (though we subsume some of their functionality; see below).
Our proposal only includes core Reference BLAS functionality, for the following reasons:
- Vendors who implement a new component of the C++ Standard Library will want to see and test against an existing reference implementation.
- Many applications that use sparse linear algebra also use dense, but not vice versa.
- The Sparse BLAS interface is a stateful interface that is not consistent with the dense BLAS, and would need more extensive redesign to translate into a modern C++ idiom. See discussion in P1417R0.
- Our proposal subsumes some dense mixed-precision functionality (see below).
We have included vector sum-of-squares and matrix norms as exceptions, for the same reason that we include vector 2-norm: to expose hooks for quality-of-implementation improvements that avoid underflow or overflow when computing with floating-point values.
10.2. LAPACK or related functionality
The LAPACK Fortran library implements solvers for the following classes of mathematical problems:
- linear systems,
- linear least-squares problems, and
- eigenvalue and singular value problems.
It also provides matrix factorizations and related linear algebra operations. LAPACK deliberately relies on the BLAS for good performance; in fact, LAPACK and the BLAS were designed together. See history presented in P1417R0.
Several actively maintained C++ libraries provide slices of LAPACK functionality.
P1417R0 gives some history of C++ linear algebra libraries. The authors of this proposal have designed, written, and maintained LAPACK wrappers in C++. Some authors have LAPACK founders as PhD advisors. Nevertheless, we have excluded LAPACK-like functionality from this proposal, for the following reasons:
- LAPACK is a Fortran library, unlike the BLAS, which is a multilanguage standard.
- We intend to support more general element types, beyond the four that LAPACK supports. It’s much more straightforward to make a C++ BLAS work for general element types, than to make LAPACK algorithms work generically.
First, unlike the BLAS, LAPACK is a Fortran library, not a standard. LAPACK was developed concurrently with the "level 3" BLAS functions, and the two projects share contributors. Nevertheless, only the BLAS and not LAPACK got standardized. Some vendors supply LAPACK implementations with some optimized functions, but most implementations likely depend heavily on "reference" LAPACK. There have been a few efforts by LAPACK contributors to develop C++ LAPACK bindings, from Lapack++ in pre-templates C++ circa 1993, to the recent "C++ API for BLAS and LAPACK". (The latter shares coauthors with this proposal.) However, these are still just C++ bindings to a Fortran library. This means that if vendors had to supply C++ functionality equivalent to LAPACK, they would either need to start with a Fortran compiler, or would need to invest a lot of effort in a C++ reimplementation. Mechanical translation from Fortran to C++ introduces risk, because many LAPACK functions depend critically on details of floating-point arithmetic behavior.
Second, we intend to permit use of matrix or vector element types other than just the four types that the BLAS and LAPACK support. This includes "short" floating-point types, fixed-point types, integers, and user-defined arithmetic types. Doing this is easier for BLAS-like operations than for the much more complicated numerical algorithms in LAPACK. LAPACK strives for a "generic" design (see Jack Dongarra interview summary in P1417R0), but only supports two real floating-point types and two complex floating-point types. Directly translating LAPACK source code into a "generic" version could lead to pitfalls. Many LAPACK algorithms only make sense for number systems that aim to approximate real numbers (or their complex extensions). Some LAPACK functions output error bounds that rely on properties of floating-point arithmetic.
For these reasons, we have left LAPACK-like functionality for future work. It would be natural for a future LAPACK-like C++ library to build on our proposal.
10.3. Extended-precision BLAS
Our interface subsumes some functionality of the Mixed-Precision BLAS
specification (Chapter 4 of the BLAS Standard).  For example, users
may multiply two 16-bit floating-point matrices (assuming that a
16-bit floating-point type exists) and accumulate into a 32-bit
floating-point matrix, just by providing a 32-bit floating-point
matrix as output.  Users may specify the precision of a dot product
result.  If it is greater than the input vectors' element type
precisions (e.g., a double result from float input vectors), then implementations may accumulate the dot product in that higher precision.
However, we do not include the "Extended-Precision BLAS" in this proposal. The BLAS Standard lets callers decide at run time whether to use extended precision floating-point arithmetic for internal evaluations. We could support this feature at a later time. Implementations of our interface also have the freedom to use more accurate evaluation methods than typical BLAS implementations. For example, it is possible to make floating-point sums completely independent of parallel evaluation order.
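The idea that the output type determines accumulation precision can be sketched in plain C++ (this is not the proposed interface; dot_in is a hypothetical helper used only for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Sketch: a dot product whose accumulator has the caller-chosen Result type,
// so that a double result from float inputs accumulates in double precision.
template <class Result, class T, std::size_t N>
Result dot_in(const T (&x)[N], const T (&y)[N]) {
  Result sum{};  // value-initialized accumulator of the output type
  for (std::size_t i = 0; i < N; ++i) {
    sum += static_cast<Result>(x[i]) * static_cast<Result>(y[i]);
  }
  return sum;
}
```

With x = {1.0e8f, 1.0f, -1.0e8f} and y all ones, dot_in&lt;double&gt; returns 1.0, while dot_in&lt;float&gt; returns 0.0f: float accumulation absorbs the small middle term into the large partial sum.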
10.4. Arithmetic operators and associated expression templates
Our proposal omits arithmetic operators on matrices and vectors. We do so for the following reasons:
- 
     We propose a low-level, minimal interface. 
- 
Arithmetic operators are ambiguous for linear algebra objects. For example, operator * on two matrices could reasonably mean either the elementwise product (as valarray defines it) or the matrix-matrix product.
- 
     Arithmetic operators require defining the element type of the vector or matrix returned by an expression. Functions let users specify this explicitly, and even let users use different output types for the same input types in different expressions. 
- 
     Arithmetic operators may require allocation of temporary matrix or vector storage. This prevents use of nonowning data structures. 
- 
     Arithmetic operators strongly suggest expression templates. These introduce problems such as dangling references and aliasing. 
Our goal is to propose a low-level interface. Other libraries, such as that proposed by P1385R6, could use our interface to implement overloaded arithmetic for matrices and vectors. A constrained, function-based, BLAS-like interface builds incrementally on the many years of BLAS experience.
Arithmetic operators on matrices and vectors would require the
library, not necessarily the user, to specify the element type of an
expression’s result.  This gets tricky if the terms have mixed element
types.  For example, what should the element type of the result of the
vector sum x + y be, if x has element type complex&lt;float&gt; and y has element type double?
P1385R6 lets users customize the return type
of such arithmetic expressions.  However, different algorithms may
call for the same expression with the same inputs to have different
output types.  For example, iterative refinement of a linear system Ax=b may compute the residual b - Ax in higher precision than that of the matrix and right-hand side, even though the same expression elsewhere uses the working precision.
Arithmetic operators on matrices or vectors may also need to allocate
temporary storage.  Users may not want that.  When LAPACK’s developers
switched from Fortran 77 to a subset of Fortran 90, their users
rejected the option of letting LAPACK functions allocate temporary
storage on their own.  Users wanted to control memory allocation.
Also, allocating storage precludes use of nonowning input data
structures like mdspan.
Arithmetic expressions on matrices or vectors strongly suggest
expression templates, as a way to avoid allocation of temporaries and
to fuse computational kernels.  They do not require expression
templates.  For example, implementations of valarray's overloaded arithmetic operators may use expression templates, but the Standard does not require them to do so.
Expression templates work well, but have issues.  Our papers P1417R0 and
"Evolving a Standard C++ Linear Algebra Library from the BLAS" (P1674R0) give more detail on these
issues.  A particularly troublesome one is that C++ auto makes it easy for users to capture an expression object that holds references to temporaries which no longer exist.
Our proposal includes only limited, nonallocating expression-template-like functionality -- views that scale, conjugate, or transpose -- which does not share these dangling-reference hazards.
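The dangling-reference hazard can be sketched with a toy expression-template class (illustrative only; vec and sum_expr are hypothetical types, not part of this proposal):

```cpp
#include <cassert>
#include <cstddef>

// vec owns its data; sum_expr defers elementwise addition until indexed,
// and holds references to its operands.
struct vec {
  double data[2];
  double operator[](std::size_t i) const { return data[i]; }
};

struct sum_expr {
  const vec& a;
  const vec& b;
  double operator[](std::size_t i) const { return a[i] + b[i]; }
};

sum_expr operator+(const vec& a, const vec& b) { return {a, b}; }
```

Given vec x{{1.0, 2.0}} and vec y{{3.0, 4.0}}, "auto e = x + y;" makes e a sum_expr holding references, not a result, so e[0] == 4.0 and e[1] == 6.0 while x and y live. The hazard: "auto d = x + vec{{5.0, 6.0}};" would leave d holding a reference to a destroyed temporary, so reading d[0] would be undefined behavior.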
10.5. Banded matrix layouts
This proposal omits banded matrix types. It would be easy to add the required layouts and specializations of algorithms later. The packed and unpacked symmetric and triangular layouts in this proposal cover the major concerns that would arise in the banded case, like nonstrided and nonunique layouts, and matrix types that forbid access to some multi-indices in the Cartesian product of extents.
10.6. Tensors
We exclude tensors from this proposal, for the following reasons.
First, tensor libraries naturally build on optimized dense linear
algebra libraries like the BLAS, so a linear algebra library is a good
first step.  Second, mdspan already provides a natural low-level representation of dense tensors, so this proposal lays groundwork for future tensor libraries.
10.7. Explicit support for asynchronous return of scalar values
After we presented revision 2 of this paper,
LEWG asked us to consider support for discrete graphics processing units (GPUs).
GPUs have two features of interest here.
First, they might have memory that is not accessible from ordinary C++ code,
but could be accessed in a standard algorithm
(or one of our proposed algorithms)
with the right implementation-specific ExecutionPolicy.  Second, it might be profitable to launch work on a GPU asynchronously, so that results become available only some time after the algorithm is invoked.
Nothing in principle prevents mdspan from viewing memory that ordinary C++ code cannot access, given an implementation-specific accessor and execution policy.
Our proposal excludes algorithms that might write to their output arguments
at some time after the algorithm returns.
First, LEWG insisted that our proposed algorithms that compute a scalar result,
like dot and vector_norm2, return that result in the manner of reduce, rather than writing it to an output argument.  An algorithm that returns a scalar by value has no way to defer computing that result until after the call returns.
11. Design justification
We take a step-wise approach. We begin with core BLAS dense linear algebra functionality. We then deviate from that only as much as necessary to make the algorithms behave as much like the existing C++ Standard Library algorithms as is reasonable. Future work or collaboration with other proposals could implement a higher-level interface.
We propose to build the initial interface on top of mdspan (P0009).
Please refer to our papers "Evolving a Standard C++ Linear Algebra Library from the BLAS" (P1674R0) and "Historical lessons for C++ linear algebra library standardization" (P1417R0). They will give details and references for many of the points that we summarize here.
11.1. We do not require using the BLAS library
Our proposal is based on the BLAS interface, and it would be natural for implementers to use an existing C or Fortran BLAS library. However, we do not require an underlying BLAS C interface. Vendors should have the freedom to decide whether they want to rely on an existing BLAS library. They may also want to write a "pure" C++ implementation that does not depend on an external library. They will, in any case, need a "generic" C++ implementation for matrix and vector element types other than the four that the BLAS supports.
11.2. Why use mdspan 
   11.2.1. View of a multidimensional array
The BLAS operates on what C++ programmers might call views of multidimensional arrays. Users of the BLAS can store their data in whatever data structures they like, and handle their allocation and lifetime as they see fit, as long as the data have a BLAS-compatible memory layout.
The corresponding C++ data structure is mdspan, a nonowning view of a multidimensional array.
11.2.2. Ease of use
The mdspan class is easy to use: its template parameters have reasonable defaults for the common cases, while still letting users control the layout of and access to their data.
11.2.3. BLAS and mdspan 
   The BLAS is low level; it imposes no mathematical meaning on multidimensional arrays.
This gives users the freedom to develop mathematical libraries with the semantics they want.
Similarly, mdspan is just a view of an array of numbers; it does not by itself impose any mathematical interpretation on that array.
We mention this because "matrix," "vector," and "tensor"
are mathematical ideas that mean more than just arrays of numbers.
This is more than just a theoretical concern.
Some BLAS functions operate on "triangular," "symmetric," or "Hermitian" matrices,
but they do not assert that a matrix has any of these mathematical properties.
Rather, they read only one side of the matrix (the lower or upper triangle),
and compute as if the rest of the matrix satisfies the mathematical property.
A key feature of the BLAS and libraries that build on it, like LAPACK,
is that they can operate on the matrix’s data in place.
These operations change both the matrix’s mathematical properties and its representation in memory.
For example, one might have an N x N array representing a matrix that is symmetric in theory,
but computed and stored in a way that might not result in exactly symmetric data.
In order to solve linear systems with this matrix,
one might give the array to LAPACK’s dpotrf function, which computes the Cholesky factorization in place, reading only one triangle of the array (lower or upper, as specified) and overwriting it with the factor.
The point of this example is that a "symmetric matrix class"
is the wrong way to model this situation.
There’s an N x N array, whose mathematical interpretation changes
with each in-place operation performed on it.
The low-level mdspan-based interface, in which the functions applied to the array determine its current interpretation, models this usage well.
11.2.4. Hook for future expansion
The accessor policy of mdspan is a hook for future expansion, for example to support data that live in memory spaces not directly accessible from ordinary C++ code.
11.2.5. Generic enough to replace a "multidimensional array concept"
LEWGI requested in the 2019 Cologne meeting that we explore using a
concept instead of mdspan as the parameter type of our algorithms.
We investigated this option, and rejected it, for the following reasons.
- 
Our proposal uses enough features of mdspan (extents, layout mapping, and accessor) that any sufficiently general multidimensional array concept would end up looking much like mdspan itself.
- 
This proposal could support most multidimensional array types, if the array types just made themselves convertible to mdspan.
- 
     We could always generalize our algorithms later. 
- 
     Any multidimensional array concept would need revision in the light of P2128R5. 
This proposal refers to almost all of mdspan’s interface, including its extents, layout mapping, and accessor policy.
Suppose that a general customization point existed for getting an mdspan that views a given multidimensional array object.  Such a customization point would presume that the array’s data can be viewed as a subset of a contiguous span of memory.
Requiring that a multidimensional array refer to a subset of a contiguous span of memory
would exclude multidimensional array types that have a noncontiguous backing store,
such as a container whose storage is split across multiple allocations.
Finally, any multidimensional array concept would need revision
in the light of P2128R5,
which has been sent by EWG to electronic polling, targeting CWG.
P2128 proposes letting operator[] take multiple arguments, so that mdspan and other multidimensional array types could use it instead of operator().
After further discussion at the 2019 Belfast meeting,
LEWGI accepted our position that having our algorithms take mdspan arguments directly suffices, without a more general multidimensional array concept.
11.3. Function argument aliasing and zero scalar multipliers
Summary:
- 
     The BLAS Standard forbids aliasing any input (read-only) argument with any output (write-only or read-and-write) argument. 
- 
The BLAS uses INTENT(INOUT) (read-and-write) arguments to express "updates" to a vector or matrix. By contrast, C++ Standard algorithms like transform take separate input and output ranges, but may allow the output range to equal an input range.
- 
     The BLAS uses the values of scalar multiplier arguments ("alpha" or "beta") of vectors or matrices at run time, to decide whether to treat the vectors or matrices as write only. This matters both for performance and semantically, assuming IEEE floating-point arithmetic. 
- 
We decide separately, based on the category of BLAS function, how to translate INTENT(INOUT) arguments into our interface:
a. For triangular solve and triangular multiply, in-place behavior
      is essential for computing matrix factorizations in place,
      without requiring extra storage proportional to the input
      matrix’s dimensions.  However, in-place functions cannot be
      parallelized for arbitrary execution policies.  Thus, we have
      both not-in-place and in-place overloads, and only the
not-in-place overloads take an optional ExecutionPolicy&& argument.
b. Else, if the BLAS function unconditionally updates (like xGER’s rank-1 update of a matrix), we retain read-and-write behavior for that argument.
c. Else, if the BLAS function uses a scalar beta argument to decide whether to read the output argument as well as write to it (like xGEMM), we provide two versions: a write-only version (as if beta is zero), and a read-and-write version (as if beta is nonzero).
For a detailed analysis, see "Evolving a Standard C++ Linear Algebra Library from the BLAS" (P1674R0).
11.4. Support for different matrix layouts
Summary:
- 
     The dense BLAS supports several different dense matrix "types." Type is a mixture of "storage format" (e.g., packed, banded) and "mathematical property" (e.g., symmetric, Hermitian, triangular). 
- 
     Some "types" can be expressed as custom mdspan 
- 
Thus, a C++ BLAS wrapper cannot distinguish matrix "type" simply by overloading on mdspan specializations.
For more details, including a list and description of the matrix "types" that the dense BLAS supports, see our paper "Evolving a Standard C++ Linear Algebra Library from the BLAS" (P1674R0).
A C++ linear algebra library has a few possibilities for distinguishing the matrix "type":
- 
     It could imitate the BLAS, by introducing different function names, if the layouts and accessors do not sufficiently describe the arguments. 
- 
It could introduce a hierarchy of higher-level classes for representing linear algebra objects, use mdspan underneath for storage, and write algorithms that take those higher-level classes.
- 
It could use the layout and accessor types in mdspan to distinguish the matrix "types."
We have chosen Approach 1.  Our view is that a BLAS-like interface
should be as low-level as possible.  Approach 2 is more like a "Matlab
in C++"; a library that implements this could build on our proposal’s
lower-level library.  Approach 3 sounds attractive.  However, most
BLAS matrix "types" do not have a natural representation as layouts.
Trying to hack them in would pollute mdspan’s layout mapping extension point with operation-specific details.
11.5. Over- and underflow wording for vector 2-norm
SG6 recommended to us at Belfast 2019 to change the special overflow /
underflow wording for vector_norm2.
Previous versions of this paper asked implementations to compute
vector 2-norms "without undue overflow or underflow at intermediate
stages of the computation."  "Undue" imitates existing C++ Standard
wording for hypot.  That wording, in turn, traces back to the C Standard, which says the following about the math library functions (C17, Annex F):
Whether or when library functions raise an undeserved "underflow" floating-point exception is unspecified. Otherwise, as implied by F.7.6, the <math.h> functions do not raise spurious floating-point exceptions (detectable by the user) [including the "overflow" exception discussed in paragraph 6], other than the "inexact" floating-point exception.
However, these requirements are for math library functions like hypot, not for general algorithms such as our proposed vector_norm2.
The BLAS Standard says of several operations, including vector 2-norm: "Here are the exceptional routines where we ask for particularly careful implementations to avoid unnecessary over/underflows, that could make the output unnecessarily inaccurate or unreliable" (p. 35).
The BLAS Standard does not define phrases like "unnecessary
over/underflows."  The likely intent is to avoid naïve implementations
that simply add up the squares of the vector elements.  These would
overflow even if the norm in exact arithmetic is significantly less
than the overflow threshold.  The POSIX Standard (IEEE Std
1003.1-2017) analogously says that hypot() "takes precautions against overflow during intermediate steps of the computation."
The phrase "precautions against overflow" is too vague for us to translate into a requirement. The authors likely meant to exclude naïve implementations, but not require implementations to know whether a result computed in exact arithmetic would overflow or underflow. The latter is a special case of computing floating-point sums exactly, which is costly for vectors of arbitrary length. While it would be a useful feature, it is difficult enough that we do not want to require it, especially since the BLAS Standard itself does not. The implementation of vector 2-norms in the Reference BLAS included with LAPACK 3.10.0 partitions the running sum of squares into three different accumulators: one for big values (that might cause the sum to overflow without rescaling), one for small values (that might cause the sum to underflow without rescaling), and one for the remaining "medium" values. (See Anderson 2017.) Earlier implementations merely rescaled by the current maximum absolute value of all the vector entries seen thus far. (See Blue 1978.) Implementations could also just compute the sum of squares in a straightforward loop, then check floating-point status flags for underflow or overflow, and recompute if needed.
For all of the functions listed on p. 35 of the BLAS Standard as needing "particularly careful implementations," except vector norm, the BLAS Standard has an "Advice to implementors" section with extra accuracy requirements. The BLAS Standard does have an "Advice to implementors" section for matrix norms (see Section 2.8.7, p. 69), which have similar over- and underflow concerns as vector norms. However, the Standard merely states that "[h]igh-quality implementations of these routines should be accurate" and should document their accuracy, and gives examples of "accurate implementations" in LAPACK.
The BLAS Standard never defines what "Advice to implementors" means. However, the BLAS Standard shares coauthors and audience with the Message Passing Interface (MPI) Standard, which defines "Advice to implementors" as "primarily commentary to implementors" and permissible to skip (see e.g., MPI 3.0, Section 2.1, p. 9). We thus interpret "Advice to implementors" in the BLAS Standard as a nonbinding quality of implementation (QoI) recommendation.
11.6. Constraining matrix and vector element types and scalars
11.6.1. Introduction
The BLAS only accepts four different types of scalars and matrix and vector elements.
In C++ terms, these correspond to float, double, complex&lt;float&gt;, and complex&lt;double&gt;.
"Constraining matrix and vector element types and scalars" means defining how these types must behave in order for our algorithms to make sense. This includes both syntactic and semantic constraints. We have three goals:
- 
     to help implementers implement our algorithms correctly; 
- 
     to give implementers the freedom to make quality of implementation (QoI) enhancements, for both performance and accuracy; and 
- 
     to help users understand what types they may use with our algorithms. 
The whole point of the BLAS was to identify key operations for vendors to optimize. Thus, performance is a major concern. "Accuracy" here refers either to rounding error or to approximation error (for matrix or vector element types where either makes sense).
11.6.2. Value type constraints do not suffice to describe algorithm behavior
LEWG’s 2020 review of P1673R2 asked us to investigate conceptification of its algorithms.
"Conceptification" here refers to an effort like that of P1813R0
("A Concept Design for the Numeric Algorithms"),
to come up with concepts that could be used to constrain the template parameters
of numeric algorithms like reduce and transform_reduce.
We agree that it is useful to set constraints that make it possible to reason about correctness of algorithms. However, we do not think constraints on value types suffice for this purpose. First, requirements like associativity are too strict to be useful for practical types. Second, what we really want to do is describe the behavior of algorithms, regardless of value types' semantics. "The algorithm may reorder sums" means something different than "addition on the terms in the sum is associative."
11.6.3. Associativity is too strict
P1813R0 requires associative addition for many algorithms, such as reduce.  However, many practical arithmetic systems have nonassociative addition.  These include:
- 
     systems with rounding; 
- 
     systems with an "infinity": e.g., if 10 is Inf, 3 + 8 - 7 could be either Inf or 4; and 
- 
     saturating arithmetic: e.g., if 10 saturates, 3 + 8 - 7 could be either 3 or 4. 
Note that the latter two arithmetic systems have nothing to do with rounding error.
With saturating integer arithmetic, parenthesizing a sum in different ways might give results
that differ by as much as the saturation threshold.
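The saturating-arithmetic example above can be made concrete (sat_add is a hypothetical helper with saturation threshold 10):

```cpp
#include <algorithm>
#include <cassert>

// Saturating addition with threshold 10: sums whose exact value exceeds 10
// clamp to 10. The result of a chained sum then depends on parenthesization,
// with no rounding error involved.
int sat_add(int a, int b) {
  return std::min(a + b, 10);  // clamp at the saturation threshold
}
```

Evaluating 3 + 8 - 7 as (3 + 8) + (-7) gives sat_add(sat_add(3, 8), -7): the inner sum saturates to 10, so the result is 3. Evaluating it as 3 + (8 + (-7)) gives sat_add(3, sat_add(8, -7)) == 4, matching the bullet above.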
It’s true that many non-associative arithmetic systems behave
"associatively enough" that users don’t fear parallelizing sums.
However, a concept with an exact property (like "commutative semigroup")
isn’t the right match for "close enough,"
just like operator== isn’t the right way to express "nearly equal" for floating-point values.
The problem is a mismatch between the requirement we want to express — that the algorithm may reorder or reparenthesize sums — and a constraint on the value types themselves.
We think this is important both for adding new algorithms (like those in this proposal)
and for defining the behavior of existing algorithms with respect to different ExecutionPolicy arguments.
11.6.4. Generalizing associativity helps little
Suppose we accept that associativity and related properties are not useful
for describing our proposed algorithms.
Could there be a generalization of associativity that would be useful?
P1813R0’s most general concept is a magma: a set M with a binary operation that is closed over M.  We see two problems with using a magma to constrain our algorithms.
- 
     A magma only assumes one set, that is, one type. This does not accurately describe what the algorithms do, and it excludes useful features like mixed precision and types that use expression templates. 
- 
     A magma is too specific, because algorithms are useful even if the binary operation is not closed. 
First, even for simple linear algebra operations that "only" use plus and times, there is no one "set M" over which plus and times operate. There are actually three operations: plus, times, and assignment. Each operation may have completely heterogeneous input(s) and output. The sets (types) that may occur vary from algorithm to algorithm, depending on the input type(s), and the algebraic expression(s) that the algorithm is allowed to use. We might need several different concepts to cover all the expressions that algorithms use, and the concepts would end up being less useful to users than the expressions themselves.
For instance, consider the Level 1 BLAS "AXPY" function.
This computes z = y + alpha*x elementwise.  The element types of x, y, and z, and the type of alpha, might all differ.  The subexpression alpha*x(i) may have yet another type, as may the sum y(i) + alpha*x(i).
We could try to describe this with a concept that expresses a sum type. The sum type would include all the types that might show up in the expression. However, we do not think this would improve clarity over just the expression itself. Furthermore, different algorithms may need different expressions, so we would need multiple concepts, one for each expression. Why not just use the expressions to describe what the algorithms can do?
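The heterogeneous types in the z = y + alpha*x expression can be sketched in plain C++ (axpy_mixed is a hypothetical helper, not the proposed interface):

```cpp
#include <cstddef>

// z[i] = y[i] + alpha * x[i], with four distinct types in play: alpha is
// double, x has float elements, y has int elements, and z has double
// elements. There is no single set M over which "plus" and "times" operate.
void axpy_mixed(double alpha, const float* x, const int* y,
                double* z, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // alpha * x[i] has type double; int + double has type double;
    // the assignment stores a double.
    z[i] = y[i] + alpha * x[i];
  }
}
```

For example, with alpha = 0.5, x = {2.0f, 4.0f}, and y = {1, 2}, the result z is {2.0, 4.0}; each subexpression has its own well-defined type even though no two operands share one.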
Second, the magma concept is not helpful even if we only had one set M, because our algorithms would still be useful even if binary operations were not closed over that set. For example, consider a hypothetical user-defined rational number type, where plus and times throw if representing the result of the operation would take more than a given fixed amount of memory. Programmers might handle this exception by falling back to different algorithms. Neither plus nor times on this type would satisfy the magma requirement, but the algorithms would still be useful for such a type. One could consider the magma requirement satisfied in a purely syntactic sense, because of the return type of plus and times. However, saying that would not accurately express the type’s behavior.
This point returns us to the concerns we expressed earlier about assuming associativity. "Approximately associative" or "usually associative" are not useful concepts without further refinement. The way to refine these concepts usefully is to describe the behavior of a type fully, e.g., the way that IEEE 754 describes the behavior of floating-point numbers. However, algorithms rarely depend on all the properties in a specification like IEEE 754. The problem, again, is that we need to describe what algorithms do -- e.g., that they can rearrange terms in a sum -- not how the types that go into the algorithms behave.
In summary:
- 
     Many useful types have nonassociative or even non-closed arithmetic. 
- 
     Lack of (e.g.,) associativity is not just a rounding error issue. 
- 
     It can be useful to let algorithms do things like reparenthesize sums or products, even for types that are not associative. 
- 
     Permission for an algorithm to reparenthesize sums is not the same as a concept constraining the terms in the sum. 
- 
     We can and do use existing Standard language, like GENERALIZED_SUM, for expressing permissions that algorithms have. 
In the sections that follow, we will describe a different way to constrain the matrix and vector element types and scalars in our algorithms. We will start by categorizing the different quality of implementation (QoI) enhancements that implementers might like to make. These enhancements call for changing algorithms in different ways. We will distinguish textbook from non-textbook ways of changing algorithms, explain that we only permit non-textbook changes for floating-point types, then develop constraints on types that permit textbook changes.
11.6.5. Categories of QoI enhancements
An important goal of constraining our algorithms is to give implementers the freedom to make QoI enhancements. We categorize QoI enhancements in three ways:
- 
     those that depend entirely on the computer architecture; 
- 
     those that might have architecture-dependent parameters, but could otherwise be written in an architecture-independent way; and 
- 
     those that diverge from a textbook description of the algorithm, and depend on element types having properties more specific than what that description requires. 
An example of Category (1) would be special hardware instructions that perform matrix-matrix multiplications on small, fixed-size blocks. The hardware might only support a few types, such as integers, fixed-point reals, or floating-point types. Implementations might use these instructions for the entire algorithm, if the problem sizes and element types match the instruction’s requirements. They might also use these instructions to solve subproblems. In either case, these instructions might reorder sums or create temporary values.
Examples of Category (2) include blocking to increase cache or translation lookaside buffer (TLB) reuse, or using SIMD instructions (given the Parallelism TS’s inclusion of SIMD). Many of these optimizations relate to memory locality or parallelism. For an overview, see (Goto 2008) or Section 2.6 of (Demmel 1997). All such optimizations reorder sums and create temporary values.
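A Category (2) optimization can be sketched as a cache-blocked matrix multiply (illustrative only; the tile size BS stands in for an architecture-dependent parameter):

```cpp
#include <cstddef>

constexpr std::size_t N = 8;   // matrix dimension (fixed for the sketch)
constexpr std::size_t BS = 4;  // tile size: an architecture-dependent parameter

// Blocked matrix multiply C = A * B. The tiled loop order reorders the sums
// that update each C(i,j) relative to the textbook triple loop, but computes
// the same entries; the reordering is what improves cache and TLB reuse.
void matmul_blocked(const double A[N][N], const double B[N][N],
                    double C[N][N]) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j) C[i][j] = 0.0;
  for (std::size_t ii = 0; ii < N; ii += BS)
    for (std::size_t kk = 0; kk < N; kk += BS)
      for (std::size_t jj = 0; jj < N; jj += BS)
        for (std::size_t i = ii; i < ii + BS; ++i)
          for (std::size_t k = kk; k < kk + BS; ++k)
            for (std::size_t j = jj; j < jj + BS; ++j)
              C[i][j] += A[i][k] * B[k][j];
}
```

Multiplying by the identity matrix reproduces the input exactly, which makes the reordering easy to sanity-check; with general floating-point data, the blocked and textbook orders may round differently.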
Examples of Category (3) include Strassen’s algorithm for matrix multiplication. The textbook formulation of matrix multiplication only uses additions and multiplies, but Strassen’s algorithm also performs subtractions. A common feature of Category (3) enhancements is that their implementation diverges from a "textbook description of the algorithm" in ways beyond just reordering sums. As a "textbook," we recommend either (Strang 2016), or the concise mathematical description of operations in the BLAS Standard. In the next section, we will list properties of textbook descriptions, and explain some ways in which QoI enhancements might fail to adhere to those properties.
11.6.6. Properties of textbook algorithm descriptions
"Textbook descriptions" of the algorithms we propose tend to have the following properties. For each property, we give an example of a "non-textbook" algorithm, and how it assumes something extra about the matrix’s element type.
a. They compute floating-point sums straightforwardly (possibly reordered, or with temporary intermediate values), rather than using any of several algorithms that improve accuracy (e.g., compensated summation) or even make the result independent of evaluation order (see Demmel 2013). All such non-straightforward algorithms depend on properties of floating-point arithmetic. We will define below what "possibly reordered, or with temporary intermediate values" means.
b. They use only those arithmetic operations on the matrix and vector element types that the textbook description of the algorithm requires, even if using other kinds of arithmetic operations would improve performance or give an asymptotically faster algorithm.
c. They use exact algorithms (not considering rounding error), rather than approximations (that would not be exact even if computing with real numbers).
d. They do not use parallel algorithms that would give an asymptotically faster parallelization in the theoretical limit of infinitely many available parallel processing units, at the cost of likely unacceptable rounding error in floating-point arithmetic.
As an example of (b), the textbook matrix multiplication algorithm only adds or multiplies the matrices' elements. In contrast, Strassen’s algorithm for matrix-matrix multiply subtracts as well as adds and multiplies the matrices' elements. Use of subtraction assumes that arbitrary elements have an additive inverse, but the textbook matrix multiplication algorithm makes sense even for element types that lack additive inverses for all elements. Also, use of subtractions changes floating-point rounding behavior in a way that was only fully understood recently (see Demmel 2007).
As an example of (c), the textbook substitution algorithm for solving triangular linear systems is exact. In contrast, one can approximate triangular solve with a stationary iteration. (See, e.g., Section 5 of (Chow 2015). That paper concerns the sparse matrix case; we cite it merely as an example of an approximate algorithm, not as a recommendation for dense triangular solve.) Approximation only makes sense for element types that have enough precision for the approximation to be accurate. If the approximation checks convergence, then the algorithm also requires less-than comparison of absolute values of differences.
Multiplication by the reciprocal of a number, rather than division by that number, could fit into (b) or (c). As an example of (c), implementations for hardware where floating-point division is slow compared with multiplication could use an approximate reciprocal multiplication to implement division.
As an example of (d), the textbook substitution algorithm for solving triangular linear systems has data dependencies that limit its theoretical parallelism. In contrast, one can solve a triangular linear system by building all powers of the matrix in parallel, then solving the linear system as with a Krylov subspace method. This approach is exact for real numbers, but commits too much rounding error to be useful in practice for all but the smallest linear systems. In fact, the algorithm requires that the matrix’s element type have precision exponential in the matrix’s dimension.
Many of these non-textbook algorithms rely on properties of floating-point arithmetic. Strassen’s algorithm makes sense for unsigned integer types, but it could lead to unwarranted and unexpected overflow for signed integer types. Thus, we think it best to limit implementers to textbook algorithms, unless all matrix and vector element types are floating-point types. We always forbid non-textbook algorithms of type (d). If all matrix and vector element types are floating-point types, we permit non-textbook algorithms of Types (a), (b), and (c), under two conditions:
- 
     they satisfy the complexity requirements; and 
- 
     they result in a logarithmically stable algorithm, in the sense of (Demmel 2007). 
We believe that Condition (2) is a reasonable interpretation of Section 2.7 of the BLAS Standard.
This says that "no particular computational order is mandated by the function specifications.
In other words, any algorithm that produces results 'close enough'
to the usual algorithms presented in a standard book on matrix computations
is acceptable."
Examples of what the BLAS Standard considers "acceptable"
include Strassen’s algorithm and other methods that deviate from the textbook computational order.
"Textbook algorithms" includes optimizations commonly found in BLAS implementations. This includes any available hardware acceleration, as well as the locality and parallelism optimizations we describe below. Thus, we think restricting generic implementations to textbook algorithms will not overly limit implementers.
The set of floating-point types currently has three members: float, double, and long double.
11.6.7. Reordering sums and creating temporaries
Even textbook descriptions of linear algebra algorithms presume the freedom to reorder sums and create temporary values. Optimizations for memory locality and parallelism depend on this. This freedom imposes requirements on algorithms' matrix and vector element types.
We could get this freedom either by limiting our proposal to the Standard’s current arithmetic types, or by forbidding reordering and temporaries for types other than arithmetic types. However, doing so would unnecessarily prevent straightforward optimizations for small and fast types that act just like arithmetic types. This includes so-called "short floats" such as bfloat16 or binary16, extended-precision floating-point numbers, and fixed-point reals. Some of these types may be implementation defined, and others may be user-specified. We intend to permit implementers to optimize for these types as well. This motivates us to describe our algorithms' type requirements in a generic way.
11.6.7.1. Special case: Only one element type
We find it easier to think about type requirements by starting with the assumption that all element and scalar types in algorithms are the same. One can then generalize to input element type(s) that might differ from the output element type and/or scalar result type.
Optimizations for memory locality and parallelism
both create temporary values, and change the order of sums.
For example, reorganizing matrix data to reduce stride involves
making a temporary copy of a subset of the matrix,
and accumulating partial sums into the temporary copy.
Thus, both kinds of optimizations impose
a common set of requirements and assumptions on types.
Let value_type be the value type of the output mdspan.  In their optimizations, implementations of our algorithms may:
- 
create arbitrarily many objects of type value_type, value-initializing them or copy-constructing them from existing elements;
- 
     perform sums in any order; or 
- 
replace any value with the sum of that value and a value-initialized value_type object.
Assumption (1) implies that the output value type is default constructible and copy constructible.
Regarding Assumption (2): The freedom to compute sums in any order is not necessarily a type constraint. Rather, it’s a right that the algorithm claims, regardless of whether the type’s addition is associative or commutative. For example, floating-point sums are not associative, yet both parallelization and customary linear algebra optimizations rely on reordering sums. See the above "Value type constraints do not suffice to describe algorithm behavior" section for a more detailed explanation.
Regarding Assumption (3), we do not actually say that value-initialization produces a two-sided additive identity. What matters is what the algorithm’s implementation may do, not whether the type actually behaves in this way.
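The three assumptions above can be illustrated with a sketch (not part of the proposed wording; the names are ours): an implementation may split a sum into blocks, accumulate each block into a value-initialized temporary (Assumptions (1) and (3)), and then combine the partial results in a different order than a left-to-right loop would (Assumption (2)).

```cpp
#include <array>
#include <cstddef>

// Illustrative only: a blocked sum that exercises exactly the freedoms
// the text describes. It creates extra objects of type value_type,
// value-initializes them as starting values, and reorders the additions.
template <class value_type, std::size_t N>
value_type blocked_sum(const std::array<value_type, N>& v) {
  static_assert(N > 0, "sketch assumes a nonempty input");
  constexpr std::size_t block = 4;
  constexpr std::size_t num_blocks = (N + block - 1) / block;

  value_type partial[num_blocks] = {}; // value-initialized temporaries
  for (std::size_t i = 0; i < N; ++i) {
    partial[i / block] = partial[i / block] + v[i]; // per-block sums
  }
  value_type result{}; // value-initialized "starting value" (Assumption (3))
  for (std::size_t b = 0; b < num_blocks; ++b) {
    result = result + partial[b]; // combine partial sums (Assumption (2))
  }
  return result;
}
```

For exact types such as integers this computes the same result as a sequential loop; for floating-point types it may differ by rounding, which is precisely why the algorithm claims the right to reorder rather than promising any particular order.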
11.6.7.2. General case: Multiple input element types
An important feature of P1673 is the ability to compute with mixed matrix or vector element types. For instance, in the operation z = y + alpha*x, the vectors x, y, and z may each have a different element type.
11.6.7.2.1. Accumulate into output value type
Generic algorithms would use the output mdspan’s value type as the type in which to accumulate intermediate terms of sums.
11.6.7.2.2. Proxy references or expression templates
- 
     Proxy references: The input and/or output mdspan might have an accessor whose reference type is not element_type &. For example, an mdspan with value type value_type might have reference type atomic_ref < value_type >. 
- 
     Expression templates: The element types themselves might have arithmetic operations that defer the actual computation until the expression is assigned. These "expression template" types typically hold some kind of reference or pointer to their input arguments. 
Neither proxy references nor expression template types are necessarily the same as the corresponding mdspan’s value type, so the algorithms' type requirements must account for both.
The z = y + alpha*x example above shows that some of the algorithms we propose have multiple terms in a sum on the right-hand side of the expression that defines the algorithm. If algorithms have permission to rearrange the order of sums, then they need to be able to break up such expressions into separate terms, even if some of those expressions are expression templates.
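A minimal sketch (our names, not the proposal's interface) shows why this matters: a deferred "alpha*x" term holds a reference to its input and produces values only on demand, so an algorithm computing z = y + alpha*x must be able to evaluate each term of the sum separately, element by element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical expression-template-style term representing alpha * x.
// It defers the multiplication until an element is requested.
struct scaled_vector {
  double alpha;
  const std::vector<double>& x; // holds a reference to its input
  double operator[](std::size_t i) const { return alpha * x[i]; }
  std::size_t size() const { return x.size(); }
};

// z = y + alpha*x, with the deferred term pulled apart elementwise.
// The algorithm never needs the whole right-hand side as one object.
void add(const std::vector<double>& y, const scaled_vector& ax,
         std::vector<double>& z) {
  for (std::size_t i = 0; i < z.size(); ++i) {
    z[i] = y[i] + ax[i]; // each term of the sum evaluated per element
  }
}
```

Because the algorithm accesses terms one element at a time, it retains the freedom to process them in any order, or in parallel, exactly as the text describes.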
11.6.8. "Textbook" algorithm description in semiring terms
As we explain in the "Value type constraints do not suffice to describe algorithm behavior" section above, we deliberately constrain matrix and vector element types to require associative addition. This means that we do not, for instance, define concepts like "ring" or "group." We cannot even speak of a single set of values that would permit defining things like a "ring" or "group." This is because our algorithms must handle mixed value types, expression templates, and proxy references. However, it may still be helpful to use mathematical language to explain what we mean by "a textbook description of the algorithm."
Most of the algorithms we propose only depend on addition and multiplication. We describe these algorithms in terms of one or more mathematical expressions on elements of a semiring with possibly noncommutative multiplication. The only difference between a semiring and a ring is that a semiring does not require all elements to have an additive inverse. That is, addition is allowed, but not subtraction. Implementers may apply any mathematical transformation to the expressions that would give the same result for any semiring.
11.6.8.1. Why a semiring?
We use a semiring because
- 
     we generally want to reorder terms in sums, but we do not want to reorder terms in products; and 
- 
     we do not want to assume that subtraction works. 
The first is because linear algebra computations are useful for matrix or vector element types with noncommutative multiplication, such as quaternions or matrices. The second is because linear algebra operations might be useful for signed integers, where a formulation using subtraction risks unexpected undefined behavior from overflow.
11.6.8.2. Semirings and testing
It’s important that implementers be able to test our proposed algorithms for custom element types, not just the built-in arithmetic types. We don’t want to require hypothetical "exact real arithmetic" types that take particular expertise to implement. Instead, we propose testing with simple classes built out of unsigned integers. This section is not part of our Standard Library proposal, but we include it to give guidance to implementers and to show that it’s feasible to test our proposal.
11.6.8.3. Commutative multiplication
C++ unsigned integers implement commutative rings. (Rings always have commutative addition; a "commutative ring" has commutative multiplication as well.) We may transform (say) a * (b + c) into a * b + a * c, or reorder the terms of a sum, and get bitwise identical results, because unsigned integer arithmetic is exact modulo a power of two.
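This makes unsigned integers convenient for testing: any ring transformation of an expression yields bitwise-identical results, even when intermediate values wrap around, because unsigned overflow is well defined (arithmetic modulo 2^N).

```cpp
#include <cstdint>

// Distributivity holds exactly for unsigned integers, including under
// wrap-around, so a test suite can compare an optimized evaluation
// against the "textbook" expression and demand bitwise equality.
bool same_after_transform(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
  return a * (b + c) == a * b + a * c;
}
```

A test that applies the implementation's transformations to unsigned-integer inputs can therefore check for exact equality, with no floating-point tolerance needed.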
11.6.8.4. Noncommutative multiplication
The semiring of 2x2 matrices with element type a commutative semiring is itself a semiring, but with noncommutative multiplication. This is a good way to build a noncommutative semiring for testing.
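A minimal sketch of this testing idea (our names, not part of the proposal): 2x2 matrices of unsigned integers multiply associatively but not commutatively, so they exercise the "reorder sums but not products" rule with exact arithmetic.

```cpp
#include <array>
#include <cstdint>

// A 2x2 matrix over the commutative semiring of unsigned integers.
// Stored row-major: e[0] e[1] / e[2] e[3].
struct mat2 {
  std::array<std::uint32_t, 4> e;
};

// Matrix multiplication: associative, but NOT commutative in general,
// which makes mat2 a convenient noncommutative semiring for testing.
mat2 operator*(const mat2& A, const mat2& B) {
  return {{A.e[0] * B.e[0] + A.e[1] * B.e[2],
           A.e[0] * B.e[1] + A.e[1] * B.e[3],
           A.e[2] * B.e[0] + A.e[3] * B.e[2],
           A.e[2] * B.e[1] + A.e[3] * B.e[3]}};
}
```

For example, with A the matrix having a single 1 in its upper right corner and B the matrix with a single 1 in its lower left corner, A*B and B*A differ, while (A*B)*C and A*(B*C) always agree.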
11.6.9. Summary
- 
     Constraining the matrix and vector element types and scalar types in our functions gives implementers the freedom to make QoI enhancements without risking correctness; see the "Value type constraints do not suffice to describe algorithm behavior" section above. 
- 
     We think describing algorithms' behavior and implementation freedom is more useful than mathematical concepts like "ring." For example, we permit implementations to reorder sums, but this does not mean that they assume sums are associative. 
- 
     We categorize different ways that implementers might like to change algorithms, list categories we exclude and categories we permit, and use the permitted categories to derive constraints on the types of matrix and vector elements and scalar results. 
- 
     We explain how a semiring is a good way to talk about implementation freedom, even though we do not think it is a good way to constrain types. We then use the semiring description to explain how implementers can test generic algorithms. 
12. Future work
Summary:
- 
     Consider generalizing function parameters to take any type that implements a get_mdspan customization point, such as mdarray. 
- 
     Add batched linear algebra overloads. 
12.1. Generalize function parameters
Our functions differ from the C++ Standard algorithms in that they take a concrete type, mdspan, with template parameters, rather than any type that satisfies a concept. In a future proposal, we may consider generalizing our functions' template parameters to permit any type besides mdspan that satisfies the appropriate requirements.
Previous versions of this proposal included function overloads that took basic_mdarray directly.
12.2. Batched linear algebra
We plan to write a separate proposal that will add "batched" versions of linear algebra functions to this proposal. "Batched" linear algebra functions solve many independent problems all at once, in a single function call. For discussion, see Section 6.2 of our background paper P1417R0. Batched interfaces have the following advantages:
- 
     They expose more parallelism and vectorization opportunities for many small linear algebra operations. 
- 
     They are useful for many different fields, including machine learning. 
- 
     Hardware vendors currently offer both hardware features and optimized software libraries to support batched linear algebra. 
- 
     There is an ongoing interface standardization effort, in which we participate. 
With few exceptions, the extension of this proposal to support batched operations will not require new functions or interface changes. Only the requirements on functions will change. Output arguments can have an additional rank; if so, then the leftmost extent will refer to the batch dimension. Input arguments may also have an additional rank to match; if they do not, the function will use ("broadcast") the same input argument for all the output arguments in the batch.
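The broadcast convention above can be sketched in plain code (this is illustrative only, not the proposal's mdspan-based interface): the output carries an extra leftmost batch extent, and an input without that extent is reused for every problem in the batch.

```cpp
#include <cstddef>
#include <vector>

// Batched y[b] = alpha * x for every batch index b.
// x lacks the batch extent, so it is "broadcast": the same input
// is used for all the output problems in the batch.
void batched_scale(double alpha,
                   const std::vector<double>& x,             // shared input
                   std::vector<std::vector<double>>& y) {    // y[b]: batch b
  for (std::size_t b = 0; b < y.size(); ++b) {   // leftmost extent = batch
    for (std::size_t i = 0; i < y[b].size(); ++i) {
      y[b][i] = alpha * x[i];                    // same x for every b
    }
  }
}
```

In the actual proposal, both the batched and broadcast cases would be expressed through mdspan extents rather than nested vectors; the point here is only the indexing convention.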
13. Data structures and utilities borrowed from other proposals
13.1. mdspan 
   This proposal depends on P0009R12 and its follow-on paper P2299R3, which together propose adding multidimensional arrays to the C++ Standard Library. P0009’s main class is mdspan, a "view" (in the sense of span) of a multidimensional array. mdspan has two main customization points:
- 
     Layout: the Layout policy of mdspan expresses how to map a multidimensional index to a storage offset. 
- 
     Accessor: the Accessor policy of mdspan governs how, given a pointer and an offset, the mdspan refers to an element. For example, an Accessor could make an mdspan’s element access return atomic_ref < value_type >. 
13.2. New mdspan layouts 
   Our proposal uses the layout mapping policy of mdspan to represent different matrix and vector data layouts. Layout mappings as described by P0009R12 have three basic properties:
- 
     Unique: each multidimensional index maps to a distinct offset; 
- 
     Contiguous: the set of offsets produced by the mapping is a contiguous range; and 
- 
     Strided: the offset is a linear combination of the indices, with a constant stride for each extent. 
P0009R12 includes three different layouts -- layout_left, layout_right, and layout_stride -- all of which are unique and strided.
This proposal includes the following additional layouts:
- 
     layout_blas_general: a generalization of layout_left and layout_right that permits a stride larger than the corresponding extent, like the BLAS' "leading dimension" (e.g., LDA) arguments; and 
- 
     layout_blas_packed: the layout used by the BLAS' packed matrix "types," which store only the upper or lower triangle of a matrix. 
These layouts have "tag" template parameters that control their properties; see below.
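As a simplified sketch (not the proposed wording), the index mapping that a column-major layout_blas_general could compute is like layout_left, but with a leading dimension that may exceed the number of rows, matching the BLAS' LDA convention:

```cpp
#include <cstddef>

// Plausible simplified mapping for a column-major "general" layout:
// consecutive elements of a column are contiguous, and moving to the
// next column skips lda elements, where lda >= number of rows.
std::size_t column_major_general_offset(std::size_t i, std::size_t j,
                                        std::size_t lda) {
  return i + j * lda;
}
```

When lda equals the number of rows, this reduces to the ordinary layout_left (column-major contiguous) mapping; a larger lda lets the mdspan view a submatrix of a bigger allocation.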
We do not include layouts for unpacked "types," such as Symmetric (SY), Hermitian (HE), and Triangular (TR). P1674 explains our reasoning. In summary: their actual layout -- the arrangement of matrix elements in memory -- is the same as General. The only differences are constraints on what entries of the matrix algorithms may access, and assumptions about the matrix’s mathematical properties. Trying to express those constraints or assumptions as "layouts" or "accessors" violates the spirit (and sometimes the law) of mdspan.
The packed matrix "types" do describe actual arrangements of matrix elements in memory that are not the same as in General. This is why we provide layout_blas_packed.
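As a simplified sketch (not the proposed wording) of what a column-major, upper-triangle packed mapping could compute: column j stores its j+1 entries (rows 0 through j) immediately after the previous columns, so the j*(j+1)/2 term counts the packed elements of all earlier columns. This mirrors the BLAS/LAPACK upper packed (e.g., TP, SP, HP) storage convention with 0-based indices.

```cpp
#include <cstddef>

// Plausible simplified packed mapping: upper triangle, packed by columns.
// Precondition: i <= j (only the upper triangle is stored).
std::size_t packed_upper_column_major_offset(std::size_t i, std::size_t j) {
  return i + j * (j + 1) / 2; // skip the j*(j+1)/2 entries of columns 0..j-1
}
```

Note that this mapping is unique over the stored triangle but not strided, which is one reason packed layouts need their own layout mapping policy rather than layout_stride.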
Algorithms cannot be written generically if they permit output arguments with nonunique layouts. Nonunique output arguments require specialization of the algorithm to the layout, since there’s no way to know generically at compile time what indices map to the same matrix element. Thus, we impose the following rule: any output argument must have a unique layout (is_always_unique() equals true), unless otherwise specified.
Some of our functions explicitly require outputs with specific nonunique layouts. This includes low-rank updates to symmetric or Hermitian matrices.
14. Acknowledgments
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
Special thanks to Bob Steagall and Guy Davidson for boldly leading the charge to add linear algebra to the C++ Standard Library, and for many fruitful discussions. Thanks also to Andrew Lumsdaine for his pioneering efforts and history lessons. In addition, I very much appreciate feedback from Davis Herring on constraints wording.
15. References
15.1. References by coauthors
- 
     G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight, and O. Schwartz, "Communication lower bounds and optimal algorithms for numerical linear algebra," Acta Numerica, Vol. 23, May 2014, pp. 1-155. 
- 
     C. Trott, D. S. Hollman, D. Lebrun-Grandie, M. Hoemmen, D. Sunderland, H. C. Edwards, B. A. Lelbach, M. Bianco, B. Sander, A. Iliopoulos, J. Michopoulos, and N. Liber, "mdspan: A Non-Owning Multidimensional Array Reference," P0009R12. 
- 
     G. Davidson, M. Hoemmen, D. S. Hollman, B. Steagall, and C. Trott, P1891R0, Oct. 2019. 
- 
     B. A. Lelbach, " mdspan 
- 
     M. Hoemmen, D. S. Hollman, and C. Trott, "Evolving a Standard C++ Linear Algebra Library from the BLAS," P1674R0, Jun. 2019. 
- 
     M. Hoemmen, J. Badwaik, M. Brucher, A. Iliopoulos, and J. Michopoulos, "Historical lessons for C++ linear algebra library standardization," P1417R0, Jan. 2019. 
- 
     M. Hoemmen, D. S. Hollman, C. Jabot, I. Muerte, and C. Trott, "Multidimensional subscript operator," P2128R5, Apr. 2021. 
- 
     D. S. Hollman, C. Trott, M. Hoemmen, and D. Sunderland, "mdarray: An Owning Multidimensional Array Analog of mdspan," P1684R0. 
- 
     D. S. Hollman, C. Kohlhoff, B. A. Lelbach, J. Hoberock, G. Brown, and M. Dominiak, "A General Property Customization Mechanism," P1393R0, Jan. 2019. 
15.2. Other references
- 
     E. Anderson, "Algorithm 978: Safe Scaling in the Level 1 BLAS," ACM Transactions on Mathematical Software, Vol. 44, pp. 1-28, 2017. 
- 
     "Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard," International Journal of High Performance Applications and Supercomputing, Vol. 16, No. 1, Spring 2002. 
- 
     L. S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R. C. Whaley, "An updated set of basic linear algebra subprograms (BLAS)," ACM Transactions on Mathematical Software, Vol. 28, No. 2, Jun. 2002, pp. 135-151. 
- 
     J. L. Blue, "A Portable Fortran Program to Find the Euclidean Norm of a Vector," ACM Transactions on Mathematical Software, Vol. 4, pp. 15-23, 1978. 
- 
     E. Chow and A. Patel, "Fine-Grained Parallel Incomplete LU Factorization", SIAM J. Sci. Comput., 37(2), C169–C193, 2015. 
- 
     G. Davidson and B. Steagall, "A proposal to add linear algebra support to the C++ standard library," P1385R6, Mar. 2020. 
- 
     B. Dawes, H. Hinnant, B. Stroustrup, D. Vandevoorde, and M. Wong, "Direction for ISO C++," P0939R4, Oct. 2019. 
- 
     J. Demmel, "Applied Numerical Linear Algebra," Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997, ISBN 0-89871-389-7. 
- 
     J. Demmel, I. Dumitriu, and O. Holtz, "Fast linear algebra is stable," Numerische Mathematik 108 (59-91), 2007. 
- 
     J. Demmel and H. D. Nguyen, "Fast Reproducible Floating-Point Summation," 2013 IEEE 21st Symposium on Computer Arithmetic, 2013, pp. 163-172, doi: 10.1109/ARITH.2013.9. 
- 
     J. Dongarra, R. Pozo, and D. Walker, "LAPACK++: A Design Overview of Object-Oriented Extensions for High Performance Linear Algebra," in Proceedings of Supercomputing '93, IEEE Computer Society Press, 1993, pp. 162-171. 
- 
     M. Gates, P. Luszczek, A. Abdelfattah, J. Kurzak, J. Dongarra, K. Arturov, C. Cecka, and C. Freitag, "C++ API for BLAS and LAPACK," SLATE Working Notes, Innovative Computing Laboratory, University of Tennessee Knoxville, Feb. 2018. 
- 
     K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix multiplication," ACM Transactions on Mathematical Software (TOMS), Vol. 34, No. 3, May 2008. 
- 
     J. Hoberock, "Integrating Executors with Parallel Algorithms," P1019R2, Jan. 2019. 
- 
     N. A. Josuttis, "The C++ Standard Library: A Tutorial and Reference," Addison-Wesley, 1999. 
- 
     M. Kretz, "Data-Parallel Vector Types & Operations," P0214R9, Mar. 2018. 
- 
     G. Strang, "Introduction to Linear Algebra," 5th Edition, Wellesley - Cambridge Press, 2016, ISBN 978-0-9802327-7-6, x+574 pages. 
- 
     D. Vandevoorde and N. A. Josuttis, "C++ Templates: The Complete Guide," Addison-Wesley Professional, 2003. 
16. Wording
Text in blockquotes is not proposed wording, but rather instructions for generating proposed wording. The � character is used to denote a placeholder section number which the editor shall determine. First, apply all wording from P0009R12, and then from P2299R3. (This proposal is a "rebase" atop the changes proposed by P0009R12 and P2299R3. P2299R3’s changes are relative to those proposed by P0009R12.) At the end of Table � ("Numerics library summary") in [numerics.general], add the following: [linalg], Linear algebra, < linalg >. At the end of [numerics], add all the material that follows.
16.1. Header < linalg > synopsis 
namespace std :: linalg { // [linalg.tags.order], storage order tags struct column_major_t ; inline constexpr column_major_t column_major ; struct row_major_t ; inline constexpr row_major_t row_major ; // [linalg.tags.triangle], triangle tags struct upper_triangle_t ; inline constexpr upper_triangle_t upper_triangle ; struct lower_triangle_t ; inline constexpr lower_triangle_t lower_triangle ; // [linalg.tags.diagonal], diagonal tags struct implicit_unit_diagonal_t ; inline constexpr implicit_unit_diagonal_t implicit_unit_diagonal ; struct explicit_diagonal_t ; inline constexpr explicit_diagonal_t explicit_diagonal ; // [linalg.layouts.general], class template layout_blas_general template < class StorageOrder > class layout_blas_general ; // [linalg.layouts.packed], class template layout_blas_packed template < class Triangle , class StorageOrder > class layout_blas_packed ; // [linalg.scaled.accessor_scaled], class template accessor_scaled template < class ScalingFactor , class Accessor > class accessor_scaled ; // [linalg.scaled.scaled], scaled in-place transformation template < class ScalingFactor , class ElementType , class Extents , class Layout , class Accessor > /* see-below */ scaled ( const ScalingFactor & s , const mdspan < ElementType , Extents , Layout , Accessor >& a ); // [linalg.conj.accessor_conjugate], class template accessor_conjugate template < class Accessor > class accessor_conjugate ; // [linalg.conj.conjugated], conjugated in-place transformation template < class ElementType , class Extents , class Layout , class Accessor > /* see-below */ conjugated ( mdspan < ElementType , Extents , Layout , Accessor > a ); // [linalg.transp.layout_transpose], class template layout_transpose template < class Layout > class layout_transpose ; // [linalg.transp.transposed], transposed in-place transformation template < class ElementType , class Extents , class Layout , class Accessor > /* see-below */ transposed ( mdspan < ElementType , Extents , Layout , 
Accessor > a ); // [linalg.conj_transp], // conjugated transposed in-place transformation template < class ElementType , class Extents , class Layout , class Accessor > /* see-below */ conjugate_transposed ( mdspan < ElementType , Extents , Layout , Accessor > a ); // [linalg.algs.blas1.givens.lartg], compute Givens rotation template < class Real > void givens_rotation_setup ( const Real a , const Real b , Real & c , Real & s , Real & r ); template < class Real > void givens_rotation_setup ( const complex < Real >& a , const complex < Real >& b , Real & c , complex < Real >& s , complex < Real >& r ); // [linalg.algs.blas1.givens.rot], apply computed Givens rotation template < class inout_vector_1_t , class inout_vector_2_t , class Real > void givens_rotation_apply ( inout_vector_1_t x , inout_vector_2_t y , const Real c , const Real s ); template < class ExecutionPolicy , class inout_vector_1_t , class inout_vector_2_t , class Real > void givens_rotation_apply ( ExecutionPolicy && exec , inout_vector_1_t x , inout_vector_2_t y , const Real c , const Real s ); template < class inout_vector_1_t , class inout_vector_2_t , class Real > void givens_rotation_apply ( inout_vector_1_t x , inout_vector_2_t y , const Real c , const complex < Real > s ); template < class ExecutionPolicy , class inout_vector_1_t , class inout_vector_2_t , class Real > void givens_rotation_apply ( ExecutionPolicy && exec , inout_vector_1_t x , inout_vector_2_t y , const Real c , const complex < Real > s ); // [linalg.algs.blas1.swap], swap elements template < class inout_object_1_t , class inout_object_2_t > void swap_elements ( inout_object_1_t x , inout_object_2_t y ); template < class ExecutionPolicy , class inout_object_1_t , class inout_object_2_t > void swap_elements ( ExecutionPolicy && exec , inout_object_1_t x , inout_object_2_t y ); // [linalg.algs.blas1.scal], multiply elements by scalar template < class Scalar , class inout_object_t > void scale ( const Scalar alpha , 
inout_object_t obj ); template < class ExecutionPolicy , class Scalar , class inout_object_t > void scale ( ExecutionPolicy && exec , const Scalar alpha , inout_object_t obj ); // [linalg.algs.blas1.copy], copy elements template < class in_object_t , class out_object_t > void copy ( in_object_t x , out_object_t y ); template < class ExecutionPolicy , class in_object_t , class out_object_t > void copy ( ExecutionPolicy && exec , in_object_t x , out_object_t y ); // [linalg.algs.blas1.add], add elementwise template < class in_object_1_t , class in_object_2_t , class out_object_t > void add ( in_object_1_t x , in_object_2_t y , out_object_t z ); template < class ExecutionPolicy , class in_object_1_t , class in_object_2_t , class out_object_t > void add ( ExecutionPolicy && exec , in_object_1_t x , in_object_2_t y , out_object_t z ); // [linalg.algs.blas1.dot], // dot product of two vectors // [linalg.algs.blas1.dot.dotu], // nonconjugated dot product of two vectors template < class in_vector_1_t , class in_vector_2_t , class T > T dot ( in_vector_1_t v1 , in_vector_2_t v2 , T init ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class T > T dot ( ExecutionPolicy && exec , in_vector_1_t v1 , in_vector_2_t v2 , T init ); template < class in_vector_1_t , class in_vector_2_t > auto dot ( in_vector_1_t v1 , in_vector_2_t v2 ) -> /* see-below */ ; template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t > auto dot ( ExecutionPolicy && exec , in_vector_1_t v1 , in_vector_2_t v2 ) -> /* see-below */ ; // [linalg.algs.blas1.dot.dotc], // conjugated dot product of two vectors template < class in_vector_1_t , class in_vector_2_t , class T > T dotc ( in_vector_1_t v1 , in_vector_2_t v2 , T init ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class T > T dotc ( ExecutionPolicy && exec , in_vector_1_t v1 , in_vector_2_t v2 , T init ); template < class in_vector_1_t , class in_vector_2_t > auto 
dotc ( in_vector_1_t v1 , in_vector_2_t v2 ) -> /* see-below */ ; template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t > auto dotc ( ExecutionPolicy && exec , in_vector_1_t v1 , in_vector_2_t v2 ) -> /* see-below */ ; // [linalg.algs.blas1.ssq], // Scaled sum of squares of a vector’s elements template < class T > struct sum_of_squares_result { T scaling_factor ; T scaled_sum_of_squares ; }; template < class in_vector_t , class T > sum_of_squares_result < T > vector_sum_of_squares ( in_vector_t v , sum_of_squares_result < T > init ); template < class ExecutionPolicy , class in_vector_t , class T > sum_of_squares_result < T > vector_sum_of_squares ( ExecutionPolicy && exec , in_vector_t v , sum_of_squares_result < T > init ); // [linalg.algs.blas1.nrm2], // Euclidean norm of a vector template < class in_vector_t , class T > T vector_norm2 ( in_vector_t v , T init ); template < class ExecutionPolicy , class in_vector_t , class T > T vector_norm2 ( ExecutionPolicy && exec , in_vector_t v , T init ); template < class in_vector_t > auto vector_norm2 ( in_vector_t v ) -> /* see-below */ ; template < class ExecutionPolicy , class in_vector_t > auto vector_norm2 ( ExecutionPolicy && exec , in_vector_t v ) -> /* see-below */ ; // [linalg.algs.blas1.asum], // sum of absolute values of vector elements template < class in_vector_t , class T > T vector_abs_sum ( in_vector_t v , T init ); template < class ExecutionPolicy , class in_vector_t , class T > T vector_abs_sum ( ExecutionPolicy && exec , in_vector_t v , T init ); template < class in_vector_t > auto vector_abs_sum ( in_vector_t v ) -> /* see-below */ ; template < class ExecutionPolicy , class in_vector_t > auto vector_abs_sum ( ExecutionPolicy && exec , in_vector_t v ) -> /* see-below */ ; // [linalg.algs.blas1.iamax], // index of maximum absolute value of vector elements template < class in_vector_t > extents <>:: size_type idx_abs_max ( in_vector_t v ); template < class ExecutionPolicy , class in_vector_t > extents <>:: size_type idx_abs_max ( ExecutionPolicy 
&& exec , in_vector_t v ); // [linalg.algs.blas1.matfrobnorm], // Frobenius norm of a matrix template < class in_matrix_t , class T > T matrix_frob_norm ( in_matrix_t A , T init ); template < class ExecutionPolicy , class in_matrix_t , class T > T matrix_frob_norm ( ExecutionPolicy && exec , in_matrix_t A , T init ); template < class in_matrix_t > auto matrix_frob_norm ( in_matrix_t A ) -> /* see-below */ ; template < class ExecutionPolicy , class in_matrix_t > auto matrix_frob_norm ( ExecutionPolicy && exec , in_matrix_t A ) -> /* see-below */ ; // [linalg.algs.blas1.matonenorm], // One norm of a matrix template < class in_matrix_t , class T > T matrix_one_norm ( in_matrix_t A , T init ); template < class ExecutionPolicy , class in_matrix_t , class T > T matrix_one_norm ( ExecutionPolicy && exec , in_matrix_t A , T init ); template < class in_matrix_t > auto matrix_one_norm ( in_matrix_t A ) -> /* see-below */ ; template < class ExecutionPolicy , class in_matrix_t > auto matrix_one_norm ( ExecutionPolicy && exec , in_matrix_t A ) -> /* see-below */ ; // [linalg.algs.blas1.matinfnorm], // Infinity norm of a matrix template < class in_matrix_t , class T > T matrix_inf_norm ( in_matrix_t A , T init ); template < class ExecutionPolicy , class in_matrix_t , class T > T matrix_inf_norm ( ExecutionPolicy && exec , in_matrix_t A , T init ); template < class in_matrix_t > auto matrix_inf_norm ( in_matrix_t A ) -> /* see-below */ ; template < class ExecutionPolicy , class in_matrix_t > auto matrix_inf_norm ( ExecutionPolicy && exec , in_matrix_t A ) -> /* see-below */ ; // [linalg.algs.blas2.gemv], // general matrix-vector product template < class in_vector_t , class in_matrix_t , class out_vector_t > void matrix_vector_product ( in_matrix_t A , in_vector_t x , out_vector_t y ); template < class ExecutionPolicy , class in_vector_t , class in_matrix_t , class out_vector_t > void matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , in_vector_t x , out_vector_t y 
); template < class in_vector_1_t , class in_matrix_t , class in_vector_2_t , class out_vector_t > void matrix_vector_product ( in_matrix_t A , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); template < class ExecutionPolicy , class in_vector_1_t , class in_matrix_t , class in_vector_2_t , class out_vector_t > void matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); // [linalg.algs.blas2.symv], // symmetric matrix-vector product template < class in_matrix_t , class Triangle , class in_vector_t , class out_vector_t > void symmetric_matrix_vector_product ( in_matrix_t A , Triangle t , in_vector_t x , out_vector_t y ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class in_vector_t , class out_vector_t > void symmetric_matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , in_vector_t x , out_vector_t y ); template < class in_matrix_t , class Triangle , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void symmetric_matrix_vector_product ( in_matrix_t A , Triangle t , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void symmetric_matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); // [linalg.algs.blas2.hemv], // Hermitian matrix-vector product template < class in_matrix_t , class Triangle , class in_vector_t , class out_vector_t > void hermitian_matrix_vector_product ( in_matrix_t A , Triangle t , in_vector_t x , out_vector_t y ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class in_vector_t , class out_vector_t > void hermitian_matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , in_vector_t x , out_vector_t y ); template < class in_matrix_t , class 
Triangle , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void hermitian_matrix_vector_product ( in_matrix_t A , Triangle t , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void hermitian_matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); // [linalg.algs.blas2.trmv], // Triangular matrix-vector product // [linalg.algs.blas2.trmv.ov], // Overwriting triangular matrix-vector product template < class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_t , class out_vector_t > void triangular_matrix_vector_product ( in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_t x , out_vector_t y ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_t , class out_vector_t > void triangular_matrix_vector_product ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_t x , out_vector_t y ); // [linalg.algs.blas2.trmv.in-place], // In-place triangular matrix-vector product template < class in_matrix_t , class Triangle , class DiagonalStorage , class inout_vector_t > void triangular_matrix_vector_product ( in_matrix_t A , Triangle t , DiagonalStorage d , inout_vector_t y ); // [linalg.algs.blas2.trmv.up], // Updating triangular matrix-vector product template < class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void triangular_matrix_vector_product ( in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_1_t , class in_vector_2_t , class out_vector_t > void triangular_matrix_vector_product ( 
ExecutionPolicy && exec , in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_1_t x , in_vector_2_t y , out_vector_t z ); // [linalg.algs.blas2.trsv], // Solve a triangular linear system // [linalg.algs.blas2.trsv.not-in-place], // Solve a triangular linear system, not in place template < class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_t , class out_vector_t > void triangular_matrix_vector_solve ( in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_t b , out_vector_t x ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class DiagonalStorage , class in_vector_t , class out_vector_t > void triangular_matrix_vector_solve ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , DiagonalStorage d , in_vector_t b , out_vector_t x ); // [linalg.algs.blas2.trsv.in-place], // Solve a triangular linear system, in place template < class in_matrix_t , class Triangle , class DiagonalStorage , class inout_vector_t > void triangular_matrix_vector_solve ( in_matrix_t A , Triangle t , DiagonalStorage d , inout_vector_t b ); template < class ExecutionPolicy , class in_matrix_t , class Triangle , class DiagonalStorage , class inout_vector_t > void triangular_matrix_vector_solve ( ExecutionPolicy && exec , in_matrix_t A , Triangle t , DiagonalStorage d , inout_vector_t b ); // [linalg.algs.blas2.rank1.geru], // nonconjugated rank-1 matrix update template < class in_vector_1_t , class in_vector_2_t , class inout_matrix_t > void matrix_rank_1_update ( in_vector_1_t x , in_vector_2_t y , inout_matrix_t A ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class inout_matrix_t > void matrix_rank_1_update ( ExecutionPolicy && exec , in_vector_1_t x , in_vector_2_t y , inout_matrix_t A ); // [linalg.algs.blas2.rank1.gerc], // conjugated rank-1 matrix update template < class in_vector_1_t , class in_vector_2_t , class inout_matrix_t > void matrix_rank_1_update_c ( in_vector_1_t x , in_vector_2_t 
y , inout_matrix_t A ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class inout_matrix_t > void matrix_rank_1_update_c ( ExecutionPolicy && exec , in_vector_1_t x , in_vector_2_t y , inout_matrix_t A ); // [linalg.algs.blas2.rank1.syr], // symmetric rank-1 matrix update template < class in_vector_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_1_update ( in_vector_t x , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class in_vector_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_1_update ( ExecutionPolicy && exec , in_vector_t x , inout_matrix_t A , Triangle t ); template < class T , class in_vector_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_1_update ( T alpha , in_vector_t x , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class T , class in_vector_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_1_update ( ExecutionPolicy && exec , T alpha , in_vector_t x , inout_matrix_t A , Triangle t ); // [linalg.algs.blas2.rank1.her], // Hermitian rank-1 matrix update template < class in_vector_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_1_update ( in_vector_t x , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class in_vector_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_1_update ( ExecutionPolicy && exec , in_vector_t x , inout_matrix_t A , Triangle t ); template < class T , class in_vector_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_1_update ( T alpha , in_vector_t x , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class T , class in_vector_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_1_update ( ExecutionPolicy && exec , T alpha , in_vector_t x , inout_matrix_t A , Triangle t ); // [linalg.algs.blas2.rank2.syr2], // symmetric rank-2 matrix update template < 
class in_vector_1_t , class in_vector_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2_update ( in_vector_1_t x , in_vector_2_t y , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2_update ( ExecutionPolicy && exec , in_vector_1_t x , in_vector_2_t y , inout_matrix_t A , Triangle t ); // [linalg.algs.blas2.rank2.her2], // Hermitian rank-2 matrix update template < class in_vector_1_t , class in_vector_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2_update ( in_vector_1_t x , in_vector_2_t y , inout_matrix_t A , Triangle t ); template < class ExecutionPolicy , class in_vector_1_t , class in_vector_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2_update ( ExecutionPolicy && exec , in_vector_1_t x , in_vector_2_t y , inout_matrix_t A , Triangle t ); // [linalg.algs.blas3.gemm], // general matrix-matrix product template < class in_matrix_1_t , class in_matrix_2_t , class out_matrix_t > void matrix_product ( in_matrix_1_t A , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class out_matrix_t > void matrix_product ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , out_matrix_t C ); template < class in_matrix_1_t , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void matrix_product ( in_matrix_1_t A , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void matrix_product ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.symm], // symmetric matrix-matrix product // [linalg.algs.blas3.symm.ov.left], // overwriting symmetric matrix-matrix left product template < class in_matrix_1_t 
, class Triangle , class in_matrix_2_t , class out_matrix_t > void symmetric_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void symmetric_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); // [linalg.algs.blas3.symm.ov.right], // overwriting symmetric matrix-matrix right product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void symmetric_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void symmetric_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); // [linalg.algs.blas3.symm.up.left], // updating symmetric matrix-matrix left product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void symmetric_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void symmetric_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.symm.up.right], // updating symmetric matrix-matrix right product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void symmetric_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class 
in_matrix_3_t , class out_matrix_t > void symmetric_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.hemm], // Hermitian matrix-matrix product // [linalg.algs.blas3.hemm.ov.left], // overwriting Hermitian matrix-matrix left product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); // [linalg.algs.blas3.hemm.ov.right], // overwriting Hermitian matrix-matrix right product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); // [linalg.algs.blas3.hemm.up.left], // updating Hermitian matrix-matrix left product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.hemm.up.right], // updating Hermitian 
matrix-matrix right product template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.trmm], // triangular matrix-matrix product // [linalg.algs.blas3.trmm.ov.left], // overwriting triangular matrix-matrix left product template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); // [linalg.algs.blas3.trmm.ov.right], // overwriting triangular matrix-matrix right product template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage 
d , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); // [linalg.algs.blas3.trmm.up.left], // updating triangular matrix-matrix left product template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.algs.blas3.trmm.up.right], // updating triangular matrix-matrix right product template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class 
in_matrix_3_t , class out_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); // [linalg.alg.blas3.rank-k.syrk], // rank-k symmetric matrix update template < class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( ExecutionPolicy && exec , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( ExecutionPolicy && exec , T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); // [linalg.alg.blas3.rank-k.herk], // rank-k Hermitian matrix update template < class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( ExecutionPolicy && exec , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( ExecutionPolicy && exec , T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); // [linalg.alg.blas3.rank2k.syr2k], // rank-2k symmetric matrix update 
template < class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2k_update ( in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2k_update ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); // [linalg.alg.blas3.rank2k.her2k], // rank-2k Hermitian matrix update template < class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2k_update ( in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2k_update ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); // [linalg.alg.blas3.trsm], // solve multiple triangular linear systems // [linalg.alg.blas3.trsm.left], // solve multiple triangular linear systems // with triangular matrix on the left template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_left_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_left_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_left_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , 
class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_left_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); // [linalg.alg.blas3.trsm.right], // solve multiple triangular linear systems // with triangular matrix on the right template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_right_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_right_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_right_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_right_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); } 
16.2. Requirements [linalg.reqs]
16.2.1. Value and reference requirements [linalg.reqs.val]
This clause lists the minimum requirements for all algorithms and classes in [linalg], and for the following types:
- 
     the value_type and reference types of any input or output mdspan 
- 
     the Scalar template parameter (if any) 
- 
     the T template parameter of any algorithm that takes a T init argument 
- 
     the template parameter of sum_of_squares_result 
In this clause, we refer to these types as linear algebra value types. These type requirements are parameterized by the algorithm or method using the type. Each algorithm or method using the type has one or more associated mathematical expressions that define the algorithm’s or method’s behavior. For each algorithm or method, its mathematical expression(s) are either explicitly stated as such, or implicitly stated in the algorithm’s or method’s description. The requirements below refer to those mathematical expression(s).
[Note: This notion of parameterizing requirements on a mathematical expression generalizes GENERALIZED_SUM. --end note]
All of the following requirements presume that the algorithm’s asymptotic complexity requirements, if any, are satisfied.
1. Any linear algebra value type meets the requirements of semiregular. 
2. The algorithm or method may perform read-only access on any element of any input or output mdspan arbitrarily many times. 
3. The algorithm or method may make arbitrarily many objects of any linear algebra value type, value-initializing or direct-initializing them with any existing object of that type. 
4. The algorithm or method may assign arbitrarily many times to any reference resulting from access to a valid output mdspan. 
5. If the algorithm’s or method’s mathematical expression uses division and possibly also addition, subtraction, and multiplication, then the algorithm or method evaluates the mathematical expression using a sequence of evaluations of the operators *, *=, /, /=, +, +=, binary -, unary -, -=, and =. It may evaluate a / b only if the mathematical expression calls for dividing a value a by a value b. 
6. Otherwise, if the algorithm’s or method’s mathematical expression uses subtraction and possibly also addition and multiplication, then the algorithm or method evaluates the mathematical expression using a sequence of evaluations of the operators *, *=, +, +=, binary -, unary -, -=, and =. 
7. Otherwise, if the algorithm’s or method’s mathematical expression uses multiplication and possibly also addition, then the algorithm or method evaluates the mathematical expression using a sequence of evaluations of the operators *, *=, +, +=, and =. 
8. Otherwise, if the algorithm’s or method’s mathematical expression uses addition, then the algorithm or method evaluates the mathematical expression using a sequence of evaluations of the operators +, +=, and =. 
9. If the algorithm’s or method’s mathematical expression includes any of the following subexpressions: (a) conj(z) for some value z, (b) abs(x) for some value x, or (c) sqrt(x) for some value x, but otherwise conforms to case (5), (6), (7), or (8), then the relevant case (5), (6), (7), or (8) applies. In addition, any of the above subexpressions that appear in the algorithm’s or method’s mathematical expression shall be well formed, except that if the mathematical expression includes conj(z) and z does not have type complex<R> for some type R, then conj(z) is interpreted as z. 
10. If the algorithm or method has an output mdspan, then the result of evaluating the mathematical expression shall be assignable to the output mdspan’s value_type. 
11. The algorithm or method may reorder addends and partial sums in its mathematical expression arbitrarily. [Note: Factors in each product are not reordered; multiplication is not necessarily commutative. --end note] 
12. The algorithm or method may replace any value with the sum of that value and a value-initialized object of any input or output mdspan’s value_type. 
13. If the algorithm or method takes a T init argument, then these requirements also apply to T. 
16.2.2. Requirements for algorithms and methods on floating-point values [linalg.reqs.flpt]
For all algorithms and classes in [linalg], suppose that
- 
     all input and output mdspan have a floating-point value_type; 
- 
     any Scalar template argument has a floating-point type; and 
- 
     any argument corresponding to a T init parameter has a floating-point type. 
Then, algorithms and classes' methods may do the following:
- 
     compute floating-point sums in any way that improves their accuracy for arbitrary input; 
- 
     perform arithmetic operations other than those in the algorithm’s or method’s mathematical expression, in order to improve performance or accuracy; and 
- 
     use approximations (that might not be exact even if computing with real numbers), instead of computations that would be exact if it were possible to compute without rounding error; 
as long as
- 
     the algorithm or method satisfies the complexity requirements; and 
- 
     the algorithm or method is logarithmically stable, in the sense of (Demmel 2007). 
[Note: Strassen’s algorithm for matrix-matrix multiply is an example of a logarithmically stable algorithm. --end note]
16.3. Tag classes [linalg.tags]
16.3.1. Storage order tags [linalg.tags.order]
struct column_major_t { }; inline constexpr column_major_t column_major = { }; struct row_major_t { }; inline constexpr row_major_t row_major = { }; 
16.3.2. Triangle tags [linalg.tags.triangle]
Some linear algebra algorithms distinguish between the "upper triangle," "lower triangle," and "diagonal" of a matrix.
- 
     The upper triangle of a matrix A is the set of all elements A(i,j) of A with i ≤ j, that is, with row index less than or equal to the column index. 
- 
     The lower triangle of A is the set of all elements A(i,j) of A with i ≥ j. 
- 
     The diagonal is the set of all elements A(i,i) of A. It belongs to both the upper triangle and the lower triangle. 
struct upper_triangle_t { }; inline constexpr upper_triangle_t upper_triangle = { }; struct lower_triangle_t { }; inline constexpr lower_triangle_t lower_triangle = { }; 
These tag classes specify whether algorithms and other users of a
matrix (represented as an mdspan) should access the matrix’s upper triangle or lower triangle.
16.3.3. Diagonal tags [linalg.tags.diagonal]
struct implicit_unit_diagonal_t { }; inline constexpr implicit_unit_diagonal_t implicit_unit_diagonal = { }; struct explicit_diagonal_t { }; inline constexpr explicit_diagonal_t explicit_diagonal = { }; 
These tag classes specify what algorithms and other users of a matrix should assume about the diagonal entries of the matrix, and whether algorithms and users of the matrix should access those diagonal entries explicitly.
The implicit_unit_diagonal_t tag indicates two things: 
- 
     the function will never access the (i, i) element of the matrix, and 
- 
     the matrix has a diagonal of ones (a "unit diagonal"). 
The explicit_diagonal_t tag indicates that algorithms and other users of the matrix access the matrix’s diagonal entries directly. 
16.4. Layouts for general and packed matrix types [linalg.layouts]
16.4.1. layout_blas_general 
   
[Note:
- 
     Unlike layout_left and layout_right, layout_blas_general<StorageOrder> permits a stride in one extent that is larger than the corresponding extent, so a layout_blas_general<StorageOrder> matrix can view a contiguous submatrix of a larger matrix. 
- 
     Unlike layout_stride, it requires the stride in the other extent to be one. 
BLAS functions call the possibly nonunit stride of the matrix the
"leading dimension" of that matrix.  For example, a BLAS function
argument corresponding to the leading dimension of the matrix A is called LDA.
--end note]
template < class StorageOrder > class layout_blas_general { public : template < class Extents > struct mapping { private : Extents extents_ ; // exposition only const typename Extents :: size_type stride_ {}; // exposition only public : constexpr mapping ( const Extents & e , const typename Extents :: size_type s ); template < class OtherExtents > constexpr mapping ( const mapping < OtherExtents >& e ) noexcept ; typename Extents :: size_type operator () ( typename Extents :: size_type i , typename Extents :: size_type j ) const ; constexpr typename Extents :: size_type required_span_size () const noexcept ; typename Extents :: size_type stride ( typename Extents :: size_type r ) const noexcept ; template < class OtherExtents > bool operator == ( const mapping < OtherExtents >& m ) const noexcept ; template < class OtherExtents > bool operator != ( const mapping < OtherExtents >& m ) const noexcept ; Extents extents () const noexcept ; static constexpr bool is_always_unique (); static constexpr bool is_always_contiguous (); static constexpr bool is_always_strided (); constexpr bool is_unique () const noexcept ; constexpr bool is_contiguous () const noexcept ; constexpr bool is_strided () const noexcept ; }; }; 
- 
     Constraints: - 
       StorageOrder is either column_major_t or row_major_t. 
- 
       Extents is a specialization of extents. 
- 
       Extents::rank() equals 2. 
constexpr mapping ( const Extents & e , const typename Extents :: size_type s ); 
- 
     Requires: - 
       If StorageOrder is column_major_t, then s is greater than or equal to e.extent(0). 
- 
       Otherwise, if StorageOrder is row_major_t, then s is greater than or equal to e.extent(1). 
- 
     Effects: Initializes extents_ with e, and stride_ with s. 
[Note:
The BLAS Standard requires that the stride be one if the corresponding matrix dimension is zero. We do not impose this requirement here, because it is specific to the BLAS. If an implementation dispatches to a BLAS function, then the implementation must impose the requirement at run time.
--end note]
template < class OtherExtents > constexpr mapping ( const mapping < OtherExtents >& e ) noexcept ; 
- 
     Constraints: - 
       OtherExtents is a specialization of extents. 
- 
       OtherExtents::rank() equals 2. 
- 
     Effects: Initializes extents_ with e.extents_, and stride_ with e.stride_. 
typename Extents :: size_type operator () ( typename Extents :: size_type i , typename Extents :: size_type j ) const ; 
- 
     Requires: - 
       0 ≤ i < extent(0) 
- 
       0 ≤ j < extent(1) 
- 
     Returns: - 
       If StorageOrder is column_major_t, then i + stride(1) * j; 
- 
       else, if StorageOrder is row_major_t, then stride(0) * i + j. 
template < class OtherExtents > bool operator == ( const mapping < OtherExtents >& m ) const ; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Returns: true if and only if, for all r with 0 ≤ r < rank(), m.extent(r) equals extent(r) and m.stride(r) equals stride(r). 
template < class OtherExtents > bool operator != ( const mapping < OtherExtents >& m ) const ; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Returns: true if and only if there exists r with 0 ≤ r < rank() such that m.extent(r) does not equal extent(r) or m.stride(r) does not equal stride(r). 
typename Extents :: size_type stride ( typename Extents :: size_type r ) const noexcept ; 
- 
     Returns: - 
       If StorageOrder is column_major_t, then stride_ if r equals 1, otherwise 1; 
- 
       else, if StorageOrder is row_major_t, then stride_ if r equals 0, otherwise 1. 
constexpr typename Extents :: size_type required_span_size () const noexcept ; 
- 
     Returns: stride(1) * extent(1) if StorageOrder is column_major_t; otherwise, stride(0) * extent(0). 
Extents extents () const noexcept ; 
- 
     Effects: Equivalent to return extents_ ; 
static constexpr bool is_always_unique (); 
- 
     Returns: true.
static constexpr bool is_always_contiguous (); 
- 
     Returns: false.
static constexpr bool is_always_strided (); 
- 
     Returns: true.
constexpr bool is_unique () const noexcept ; 
- 
     Returns: true.
constexpr bool is_contiguous () const noexcept ; 
- 
     Returns: - 
       If StorageOrder is column_major_t, then true if stride(1) equals extent(0), otherwise false; 
- 
       else, if StorageOrder is row_major_t, then true if stride(0) equals extent(1), otherwise false. 
constexpr bool is_strided () const noexcept ; 
- 
     Returns: true.
16.4.2. layout_blas_packed 
   
A Triangle template argument of upper_triangle_t indicates that only the matrix’s upper triangle is stored; lower_triangle_t indicates that only the lower triangle is stored.
A StorageOrder template argument of column_major_t indicates that the stored triangle’s columns are packed contiguously one after another; row_major_t indicates that its rows are packed contiguously one after another.
[Note:
If StorageOrder is column_major_t, this corresponds to the BLAS’ packed matrix storage format.
--end note]
template < class Triangle , class StorageOrder > class layout_blas_packed { public : template < class Extents > struct mapping { private : Extents extents_ ; // exposition only public : constexpr mapping ( const Extents & e ); template < class OtherExtents > constexpr mapping ( const mapping < OtherExtents >& e ) noexcept ; typename Extents :: size_type operator () ( typename Extents :: size_type i , typename Extents :: size_type j ) const ; template < class OtherExtents > bool operator == ( const mapping < OtherExtents >& m ) const noexcept ; template < class OtherExtents > bool operator != ( const mapping < OtherExtents >& m ) const noexcept ; constexpr typename Extents :: size_type stride ( typename Extents :: size_type r ) const noexcept ; constexpr typename Extents :: size_type required_span_size () const noexcept ; constexpr Extents extents () const noexcept ; static constexpr bool is_always_unique (); static constexpr bool is_always_contiguous (); static constexpr bool is_always_strided (); constexpr bool is_unique () const noexcept ; constexpr bool is_contiguous () const noexcept ; constexpr bool is_strided () const noexcept ; }; }; 
- 
     Constraints: - 
       Triangle is either upper_triangle_t or lower_triangle_t. 
- 
       StorageOrder is either column_major_t or row_major_t. 
- 
       Extents is a specialization of extents. 
- 
       Extents::rank() equals 2. 
constexpr mapping ( const Extents & e ); 
- 
     Requires: e.extent(0) equals e.extent(1). 
- 
     Effects: Initializes extents_ with e. 
template < class OtherExtents > constexpr mapping ( const mapping < OtherExtents >& e ); 
- 
     Constraints: - 
       OtherExtents is a specialization of extents. 
- 
       OtherExtents::rank() equals 2. 
- 
     Effects: Initializes extents_ with e.extents_. 
typename Extents :: size_type operator () ( typename Extents :: size_type i , typename Extents :: size_type j ) const ; 
- 
     Requires: - 
       0 ≤ i < extent(0) 
- 
       0 ≤ j < extent(1) 
- 
     Returns: Let N equal extent(0). - 
       If StorageOrder is column_major_t: - 
         if Triangle is upper_triangle_t, then i + j(j+1)/2 if i ≤ j, else j + i(i+1)/2; 
- 
         else, if Triangle is lower_triangle_t, then i + Nj - j(j+1)/2 if i ≥ j, else j + Ni - i(i+1)/2; 
- 
       else, if StorageOrder is row_major_t: - 
         if Triangle is upper_triangle_t, then j + Ni - i(i+1)/2 if j ≥ i, else i + Nj - j(j+1)/2; 
- 
         else, if Triangle is lower_triangle_t, then j + i(i+1)/2 if j ≤ i, else i + j(j+1)/2. 
template < class OtherExtents > bool operator == ( const mapping < OtherExtents >& m ) const ; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Returns: true if and only if, for all r with 0 ≤ r < rank(), m.extent(r) equals extent(r). 
template < class OtherExtents > bool operator != ( const mapping < OtherExtents >& m ) const ; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Returns: true if and only if there exists r with 0 ≤ r < rank() such that m.extent(r) does not equal extent(r). 
constexpr typename Extents :: size_type stride ( typename Extents :: size_type r ) const noexcept ; 
- 
     Returns: 1 if extent(0) is less than 2. 
constexpr typename Extents :: size_type required_span_size () const noexcept ; 
- 
     Returns: extent(0) * (extent(0) + 1) / 2 
constexpr Extents extents () const noexcept ; 
- 
     Effects: Equivalent to return extents_ ; 
static constexpr bool is_always_unique (); 
- 
     Returns: false.
static constexpr bool is_always_contiguous (); 
- 
     Returns: true.
static constexpr bool is_always_strided (); 
- 
     Returns: false.
constexpr bool is_unique () const noexcept ; 
- 
Returns: true if extent(0) is less than 2, otherwise false.
constexpr bool is_contiguous () const noexcept ; 
- 
     Returns: true.
constexpr bool is_strided () const noexcept ; 
- 
Returns: true if extent(0) is less than 2, otherwise false.
16.5. Scaled in-place transformation [linalg.scaled]
The scaled function takes a scaling factor alpha and an mdspan x, and returns a read-only view of x whose elements appear multiplied by alpha. It does not modify x and does not allocate storage.
[Example:
// z = alpha * x + y void z_equals_alpha_times_x_plus_y ( mdspan < double , extents < dynamic_extent >> z , const double alpha , mdspan < double , extents < dynamic_extent >> x , mdspan < double , extents < dynamic_extent >> y ) { add ( scaled ( alpha , x ), y , z ); } // w = alpha * x + beta * y void w_equals_alpha_times_x_plus_beta_times_y ( mdspan < double , extents < dynamic_extent >> w , const double alpha , mdspan < double , extents < dynamic_extent >> x , const double beta , mdspan < double , extents < dynamic_extent >> y ) { add ( scaled ( alpha , x ), scaled ( beta , y ), w ); } 
--end example]
[Note:
An implementation could dispatch to a function in the BLAS library, by
noticing that the first argument has an accessor_scaled accessor policy, and extracting its scaling factor.
--end note]
16.5.1. Class template accessor_scaled 
The class template accessor_scaled is an mdspan accessor policy whose reference type represents the product of a fixed scaling factor and its nested accessor’s reference.
The exposition-only class template scaled_scalar represents a read-only value that is the product of a fixed scaling factor and the value of a reference.
template < class ScalingFactor , class Reference > class scaled_scalar { // exposition only private : const ScalingFactor scaling_factor ; Reference value ; using result_type = decltype ( scaling_factor * value ); public : scaled_scalar ( const ScalingFactor & s , Reference v ); operator result_type () const ; }; 
- 
     Requires: ScalingFactor and Reference shall be Cpp17CopyConstructible. 
- 
     Constraints: The expression scaling_factor * value shall be well formed. 
scaled_scalar ( const ScalingFactor & s , Reference v ); 
- 
     Effects: Initializes scaling_factor with s, and value with v. 
operator result_type () const ; 
- 
     Requires: This operator may be invoked arbitrarily many times. 
- 
     Effects: Equivalent to return scaling_factor * value ; 
The class template accessor_scaled is specified as follows:
template < class ScalingFactor , class Accessor > class accessor_scaled { public : using element_type = Accessor :: element_type ; using pointer = Accessor :: pointer ; using reference = scaled_scalar < ScalingFactor , Accessor :: reference > ; using offset_policy = accessor_scaled < ScalingFactor , Accessor :: offset_policy > ; accessor_scaled ( const ScalingFactor & s , Accessor a ); reference access ( pointer p , extents <>:: size_type i ) const noexcept ; offset_policy :: pointer offset ( pointer p , extents <>:: size_type i ) const noexcept ; element_type * decay ( pointer p ) const noexcept ; ScalingFactor scaling_factor () const ; private : const ScalingFactor scaling_factor_ ; // exposition only Accessor accessor ; // exposition only }; 
- 
     Requires: - 
       ScalingFactor Accessor 
- 
       Accessor mdspan 
 
- 
       
accessor_scaled ( const ScalingFactor & s , Accessor a ); 
- 
     Effects: Initializes scaling_factor_ s accessor a 
reference access ( pointer p , extents <>:: size_type i ) const noexcept ; 
- 
     Effects: Equivalent to return reference ( scaling_factor_ , accessor . access ( p , i )); 
offset_policy :: pointer offset ( pointer p , extents <>:: size_type i ) const noexcept ; 
- 
     Effects: Equivalent to return accessor . offset ( p , i ); 
element_type * decay ( pointer p ) const noexcept ; 
- 
     Effects: Equivalent to return accessor . decay ( p ); 
ScalingFactor scaling_factor () const ; 
- 
     Effects: Equivalent to return scaling_factor_ ; 
16.5.2. scaled 
   The scaled function takes a scaling factor alpha and an mdspan x, and returns a read-only mdspan with the same domain as x, whose elements are the product of alpha and the corresponding elements of x. 
template<class ScalingFactor, class ElementType, class Extents, class Layout, class Accessor>
/* see below */
scaled(const ScalingFactor& alpha, const mdspan<ElementType, Extents, Layout, Accessor>& x);
Let R denote the return type mdspan<ReturnElementType, Extents, Layout, ReturnAccessor>, where: 
- 
     ReturnElementType is either ElementType or const ElementType; and 
- 
     ReturnAccessor is: - 
       if Accessor is accessor_scaled<NestedScalingFactor, NestedAccessor> for some NestedScalingFactor and NestedAccessor, then either accessor_scaled<ProductScalingFactor, NestedAccessor> or accessor_scaled<ScalingFactor, Accessor>, where ProductScalingFactor is decltype(alpha * x.accessor().scaling_factor()); 
- 
       else, accessor_scaled<ScalingFactor, Accessor>. 
- 
     Effects: - 
       If Accessor is accessor_scaled<NestedScalingFactor, NestedAccessor> and ReturnAccessor is accessor_scaled<ProductScalingFactor, NestedAccessor>, equivalent to return R(x.data(), x.mapping(), ReturnAccessor(product_alpha, x.accessor().nested_accessor())); where product_alpha is alpha * x.accessor().scaling_factor(); 
- 
       else, equivalent to return R(x.data(), x.mapping(), ReturnAccessor(alpha, x.accessor())); 
- 
     Remarks: The elements of the returned mdspan are read only. 
[Note:
The point of ReturnAccessor is to give implementations the freedom to collapse consecutive applications of scaled into a single accessor_scaled, rather than nesting one accessor_scaled inside another.
The point of ProductScalingFactor is to give the collapsed accessor a scaling factor type that can represent the run-time product alpha * x.accessor().scaling_factor().
--end note]
[Example:
void test_scaled(mdspan<double, extents<10>> x)
{
  auto x_scaled = scaled(5.0, x);
  for (int i = 0; i < x.extent(0); ++i) {
    assert(x_scaled(i) == 5.0 * x(i));
  }
}
--end example]
16.6. Conjugated in-place transformation [linalg.conj]
The conjugated function returns a conjugated view of an mdspan: for complex element types, accessing an element of the view yields the complex conjugate of the corresponding element of the input.
[Note:
An implementation could dispatch to a function in the BLAS library, by
noticing that the first argument has an accessor_conjugate accessor type.
--end note]
16.6.1. Class template accessor_conjugate 
   The class template accessor_conjugate is an mdspan accessor policy whose reference type represents the complex conjugate of an element of the wrapped mdspan. 
The exposition-only class template conjugated_scalar represents that conjugate as a read-only value: 
template<class Reference, class ElementType>
class conjugated_scalar { // exposition only
public:
  conjugated_scalar(Reference v);
  operator ElementType() const;
private:
  Reference val;
};
- 
     Requires: Reference shall be Cpp17CopyConstructible. 
- 
     Constraints: The expression conj(val) shall be well formed and convertible to ElementType, where ElementType is complex<R> for some type R. 
conjugated_scalar(Reference v); 
- 
     Effects: Initializes val with v. 
operator ElementType() const; 
- 
     Effects: Equivalent to return conj(val); 
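A freestanding model of the conjugated_scalar idea, again with an ordinary reference standing in for an accessor's reference type (the _model name is an assumption of this sketch, not part of the wording):

```cpp
#include <complex>

// Minimal model of conjugated_scalar: a read-only proxy that yields
// conj(val) upon conversion to the element type.
template <class Reference, class ElementType>
class conjugated_scalar_model {
    Reference val;
public:
    conjugated_scalar_model(Reference v) : val(v) {}
    operator ElementType() const {
        using std::conj;  // permit ADL for user-defined number types
        return conj(val);
    }
};
```

As with scaled_scalar, the conjugation is computed lazily at each read rather than stored.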
template<class Accessor>
class accessor_conjugate {
private:
  Accessor acc; // exposition only
public:
  using element_type = typename Accessor::element_type;
  using pointer = typename Accessor::pointer;
  using reference = /* see below */;
  using offset_policy = /* see below */;

  accessor_conjugate(Accessor a);

  reference access(pointer p, extents<>::size_type i) const
    noexcept(noexcept(reference(acc.access(p, i))));
  typename offset_policy::pointer
    offset(pointer p, extents<>::size_type i) const
      noexcept(noexcept(acc.offset(p, i)));
  element_type* decay(pointer p) const
    noexcept(noexcept(acc.decay(p)));

  Accessor nested_accessor() const;
};
- 
     Requires: - 
       Accessor shall be Cpp17CopyConstructible. 
- 
       Accessor shall meet the mdspan accessor policy requirements. 
using reference = /* see below */; 
If element_type is complex<R> for some type R, then reference is conjugated_scalar<typename Accessor::reference, element_type>; otherwise, it is typename Accessor::reference. 
using offset_policy = /* see below */; 
If element_type is complex<R> for some type R, then offset_policy is accessor_conjugate<typename Accessor::offset_policy>; otherwise, it is typename Accessor::offset_policy. 
accessor_conjugate(Accessor a); 
- 
     Effects: Initializes acc with a. 
reference access(pointer p, extents<>::size_type i) const noexcept(noexcept(reference(acc.access(p, i)))); 
- 
     Effects: Equivalent to return reference(acc.access(p, i)); 
typename offset_policy::pointer offset(pointer p, extents<>::size_type i) const noexcept(noexcept(acc.offset(p, i))); 
- 
     Effects: Equivalent to return acc.offset(p, i); 
element_type* decay(pointer p) const noexcept(noexcept(acc.decay(p))); 
- 
     Effects: Equivalent to return acc.decay(p); 
Accessor nested_accessor() const; 
- 
     Effects: Equivalent to return acc; 
16.6.2. conjugated 
template<class ElementType, class Extents, class Layout, class Accessor>
/* see below */
conjugated(mdspan<ElementType, Extents, Layout, Accessor> a);
Let R denote the return type mdspan<ReturnElementType, Extents, Layout, ReturnAccessor>, where: 
- 
     ReturnElementType is either ElementType or const ElementType; and 
- 
     ReturnAccessor is: - 
       if Accessor is accessor_conjugate<NestedAccessor> for some NestedAccessor, then either NestedAccessor or accessor_conjugate<Accessor>; 
- 
       else, if ElementType is complex<U> or const complex<U> for some type U, then accessor_conjugate<Accessor>; 
- 
       else, either accessor_conjugate<Accessor> or Accessor. 
- 
     Effects: - 
       If Accessor is accessor_conjugate<NestedAccessor> and ReturnAccessor is NestedAccessor, equivalent to return R(a.data(), a.mapping(), a.accessor().nested_accessor()); 
- 
       else, if ReturnAccessor is accessor_conjugate<Accessor>, equivalent to return R(a.data(), a.mapping(), accessor_conjugate<Accessor>(a.accessor())); 
- 
       else, equivalent to return R(a.data(), a.mapping(), a.accessor()); 
- 
     Remarks: The elements of the returned mdspan are read only. 
[Note:
The point of ReturnAccessor is to give implementations the freedom to undo nesting of accessor_conjugate (conjugating twice is the identity) and to skip the wrapper entirely for non-complex element types, for which conjugation is also the identity.
--end note]
[Example:
void test_conjugated_complex(mdspan<complex<double>, extents<10>> a)
{
  auto a_conj = conjugated(a);
  for (int i = 0; i < a.extent(0); ++i) {
    assert(a_conj(i) == conj(a(i)));
  }
  auto a_conj_conj = conjugated(a_conj);
  for (int i = 0; i < a.extent(0); ++i) {
    assert(a_conj_conj(i) == a(i));
  }
}

void test_conjugated_real(mdspan<double, extents<10>> a)
{
  auto a_conj = conjugated(a);
  for (int i = 0; i < a.extent(0); ++i) {
    assert(a_conj(i) == a(i));
  }
  auto a_conj_conj = conjugated(a_conj);
  for (int i = 0; i < a.extent(0); ++i) {
    assert(a_conj_conj(i) == a(i));
  }
}
--end example]
16.7. Transpose in-place transformation [linalg.transp]
The transposed function returns a transposed view of a rank-2 mdspan, in which the order of the two indices (rows and columns) is reversed.
[Note:
An implementation could dispatch to a function in the BLAS library, by
noticing that the first argument has a layout_transpose layout, and passing the corresponding transpose option to the BLAS.
--end note]
16.7.1. layout_transpose 
   
template<class InputExtents>
using transpose_extents_t = /* see below */; // exposition only
For InputExtents a specialization of extents, transpose_extents_t<InputExtents> names an extents type OutputExtents such that: 
- 
     InputExtents::static_extent(InputExtents::rank() - 1) equals OutputExtents::static_extent(OutputExtents::rank() - 2); 
- 
     InputExtents::static_extent(InputExtents::rank() - 2) equals OutputExtents::static_extent(OutputExtents::rank() - 1); and 
- 
     InputExtents::static_extent(r) equals OutputExtents::static_extent(r) for all other r less than InputExtents::rank() - 2. 
- 
     Requires: InputExtents shall be a specialization of extents. 
- 
     Constraints: InputExtents::rank() is at least 2. 
template<class InputExtents>
transpose_extents_t<InputExtents>
transpose_extents(const InputExtents in); // exposition only
- 
     Constraints: InputExtents::rank() is at least 2. 
- 
     Returns: An extents object out such that: - 
       out.extent(in.rank() - 1) equals in.extent(in.rank() - 2); 
- 
       out.extent(in.rank() - 2) equals in.extent(in.rank() - 1); and 
- 
       out.extent(r) equals in.extent(r) for all other r less than in.rank() - 2. 
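For the common rank-2 case, transpose_extents amounts to swapping the two extents. A sketch with std::array standing in for a run-time extents object (the _model name is illustrative, not part of the proposal):

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Model of transpose_extents for a rank-2 extents object:
// swap the last two (here, the only two) extents.
std::array<std::size_t, 2> transpose_extents_model(std::array<std::size_t, 2> in) {
    std::swap(in[0], in[1]);
    return in;
}
```

layout_transpose below applies the same swap at the type level (to static extents) via transpose_extents_t, and to indices via its mapping's call operator.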
template<class Layout>
class layout_transpose {
public:
  template<class Extents>
  struct mapping {
  private:
    using nested_mapping_type =
      typename Layout::template mapping<transpose_extents_t<Extents>>; // exposition only
    nested_mapping_type nested_mapping_; // exposition only
  public:
    mapping(const nested_mapping_type& map);

    extents<>::size_type operator()(
      extents<>::size_type i,
      extents<>::size_type j) const noexcept(noexcept(nested_mapping_(j, i)));

    nested_mapping_type nested_mapping() const;

    template<class OtherExtents>
    bool operator==(const mapping<OtherExtents>& m) const;
    template<class OtherExtents>
    bool operator!=(const mapping<OtherExtents>& m) const;

    Extents extents() const noexcept;
    typename Extents::size_type required_span_size() const
      noexcept(noexcept(nested_mapping_.required_span_size()));

    bool is_unique() const noexcept(noexcept(nested_mapping_.is_unique()));
    bool is_contiguous() const noexcept(noexcept(nested_mapping_.is_contiguous()));
    bool is_strided() const noexcept(noexcept(nested_mapping_.is_strided()));

    static constexpr bool is_always_unique();
    static constexpr bool is_always_contiguous();
    static constexpr bool is_always_strided();

    typename Extents::size_type stride(typename Extents::size_type r) const
      noexcept(noexcept(nested_mapping_.stride(r)));
  };
};
- 
     Requires: Layout shall meet the mdspan layout mapping policy requirements. 
- 
     Constraints: For all specializations E of extents with E::rank() equal to 2, typename Layout::template mapping<E>::is_always_unique() is true.
mapping(const nested_mapping_type& map); 
- 
     Effects: Initializes nested_mapping_ with map. 
extents<>::size_type operator()(extents<>::size_type i, extents<>::size_type j) const noexcept(noexcept(nested_mapping_(j, i))); 
- 
     Effects: Equivalent to return nested_mapping_(j, i); 
nested_mapping_type nested_mapping() const; 
- 
     Effects: Equivalent to return nested_mapping_; 
template<class OtherExtents> bool operator==(const mapping<OtherExtents>& m) const; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Effects: Equivalent to return nested_mapping_ == m.nested_mapping_; 
template<class OtherExtents> bool operator!=(const mapping<OtherExtents>& m) const; 
- 
     Constraints: OtherExtents::rank() equals rank(). 
- 
     Effects: Equivalent to return nested_mapping_ != m.nested_mapping_; 
Extents extents() const noexcept; 
- 
     Effects: Equivalent to return transpose_extents(nested_mapping_.extents()); 
typename Extents::size_type required_span_size() const noexcept(noexcept(nested_mapping_.required_span_size())); 
- 
     Effects: Equivalent to return nested_mapping_.required_span_size(); 
bool is_unique() const noexcept(noexcept(nested_mapping_.is_unique())); 
- 
     Effects: Equivalent to return nested_mapping_.is_unique(); 
bool is_contiguous() const noexcept(noexcept(nested_mapping_.is_contiguous())); 
- 
     Effects: Equivalent to return nested_mapping_.is_contiguous(); 
bool is_strided() const noexcept(noexcept(nested_mapping_.is_strided())); 
- 
     Effects: Equivalent to return nested_mapping_.is_strided(); 
static constexpr bool is_always_unique(); 
- 
     Effects: Equivalent to return nested_mapping_type::is_always_unique(); 
static constexpr bool is_always_contiguous(); 
- 
     Effects: Equivalent to return nested_mapping_type::is_always_contiguous(); 
static constexpr bool is_always_strided(); 
- 
     Effects: Equivalent to return nested_mapping_type::is_always_strided(); 
typename Extents::size_type stride(typename Extents::size_type r) const noexcept(noexcept(nested_mapping_.stride(r))); 
- 
     Constraints: is_always_strided() is true.
- 
     Effects: Equivalent to return nested_mapping_.stride(s); where s is 1 if r is 0, and 0 if r is 1. 
16.7.2. transposed 
   The transposed function returns a transposed view of a rank-2 mdspan, in which the two extents (and, for strided layouts, the two strides) are swapped. 
template<class ElementType, class Extents, class Layout, class Accessor>
/* see below */
transposed(mdspan<ElementType, Extents, Layout, Accessor> a);
Let ReturnExtents denote transpose_extents_t<Extents>, and let R denote the return type mdspan<ReturnElementType, ReturnExtents, ReturnLayout, Accessor>, where: 
- 
     ReturnElementType is either ElementType or const ElementType; and 
- 
     ReturnLayout is: - 
       if Layout is layout_blas_packed<Triangle, StorageOrder>, then layout_blas_packed<OppositeTriangle, OppositeStorageOrder>, where: - 
         OppositeTriangle is conditional_t<is_same_v<Triangle, upper_triangle_t>, lower_triangle_t, upper_triangle_t>; and 
- 
         OppositeStorageOrder is conditional_t<is_same_v<StorageOrder, column_major_t>, row_major_t, column_major_t>; 
- 
       else, if Layout is layout_transpose<NestedLayout> for some NestedLayout, then either NestedLayout or layout_transpose<Layout>; 
- 
       else, layout_transpose<Layout>. 
- 
     Effects: - 
       If Layout is layout_blas_packed<Triangle, StorageOrder>, equivalent to return R(a.data(), ReturnMapping(a.mapping().extents()), a.accessor()); where ReturnMapping is typename ReturnLayout::template mapping<ReturnExtents>; 
- 
       else, if Layout is layout_transpose<NestedLayout> and ReturnLayout is NestedLayout, equivalent to return R(a.data(), a.mapping().nested_mapping(), a.accessor()); 
- 
       else, equivalent to return R(a.data(), ReturnMapping(a.mapping()), a.accessor()); where ReturnMapping is typename layout_transpose<Layout>::template mapping<ReturnExtents>. 
- 
     Remarks: The elements of the returned mdspan are the same elements (not copies) as those of a. 
[Note:
Implementations may optimize applying transposed twice in a row: the definition of ReturnLayout above permits returning a view that uses the original nested layout.
--end note]
[Example:
void test_transposed(mdspan<double, extents<3, 4>> a)
{
  const auto num_rows = a.extent(0);
  const auto num_cols = a.extent(1);

  auto a_t = transposed(a);
  assert(num_rows == a_t.extent(1));
  assert(num_cols == a_t.extent(0));
  assert(a.stride(0) == a_t.stride(1));
  assert(a.stride(1) == a_t.stride(0));
  for (extents<>::size_type row = 0; row < num_rows; ++row) {
    for (extents<>::size_type col = 0; col < num_cols; ++col) {
      assert(a(row, col) == a_t(col, row));
    }
  }

  auto a_t_t = transposed(a_t);
  assert(num_rows == a_t_t.extent(0));
  assert(num_cols == a_t_t.extent(1));
  assert(a.stride(0) == a_t_t.stride(0));
  assert(a.stride(1) == a_t_t.stride(1));
  for (extents<>::size_type row = 0; row < num_rows; ++row) {
    for (extents<>::size_type col = 0; col < num_cols; ++col) {
      assert(a(row, col) == a_t_t(row, col));
    }
  }
}
--end example]
16.8. Conjugate transpose transform [linalg.conj_transp]
The conjugate_transposed function returns a conjugated, transposed view of a rank-2 mdspan. 
template<class ElementType, class Extents, class Layout, class Accessor>
/* see below */
conjugate_transposed(mdspan<ElementType, Extents, Layout, Accessor> a);
- 
     Effects: Equivalent to return conjugated(transposed(a)); 
- 
     Remarks: The elements of the returned mdspan are read only. 
[Example:
void test_conjugate_transposed(mdspan<complex<double>, extents<3, 4>> a)
{
  const auto num_rows = a.extent(0);
  const auto num_cols = a.extent(1);

  auto a_ct = conjugate_transposed(a);
  assert(num_rows == a_ct.extent(1));
  assert(num_cols == a_ct.extent(0));
  assert(a.stride(0) == a_ct.stride(1));
  assert(a.stride(1) == a_ct.stride(0));
  for (extents<>::size_type row = 0; row < num_rows; ++row) {
    for (extents<>::size_type col = 0; col < num_cols; ++col) {
      assert(a(row, col) == conj(a_ct(col, row)));
    }
  }

  auto a_ct_ct = conjugate_transposed(a_ct);
  assert(num_rows == a_ct_ct.extent(0));
  assert(num_cols == a_ct_ct.extent(1));
  assert(a.stride(0) == a_ct_ct.stride(0));
  assert(a.stride(1) == a_ct_ct.stride(1));
  for (extents<>::size_type row = 0; row < num_rows; ++row) {
    for (extents<>::size_type col = 0; col < num_cols; ++col) {
      assert(a(row, col) == a_ct_ct(row, col));
      assert(conj(a_ct(col, row)) == a_ct_ct(row, col));
    }
  }
}
--end example]
16.9. Algorithms [linalg.algs]
16.9.1. Requirements based on template parameter name [linalg.algs.reqs]
Throughout this Clause, where the template parameters are not
constrained, the names of template parameters are used to express type
requirements.  In the requirements below, we use * in a template parameter name to denote any sequence of zero or more characters. 
- 
     Algorithms that have a template parameter named ExecutionPolicy are parallel algorithms. 
- 
     Real is any of the types float, double, or long double; complex<Real> denotes the corresponding specialization of complex. 
- 
     in_vector*_t is a rank-1 mdspan with a possibly const element type. If the algorithm accesses the object, it will do so in read-only fashion. 
- 
     inout_vector*_t is a rank-1 mdspan with a non-const element type and a unique layout (layout_type::is_always_unique() equals true).
- 
     out_vector*_t is a rank-1 mdspan with a non-const element type and a unique layout (layout_type::is_always_unique() equals true). If the algorithm accesses the object, it will do so in write-only fashion.
- 
     in_matrix*_t is a rank-2 mdspan with a possibly const element type. 
- 
     inout_matrix*_t is a rank-2 mdspan with a non-const element type. 
- 
     out_matrix*_t is a rank-2 mdspan with a non-const element type. 
- 
     in_object*_t is a rank-1 or rank-2 mdspan with a possibly const element type and a unique layout (layout_type::is_always_unique() equals true). If the algorithm accesses the object, it will do so in read-only fashion.
- 
     inout_object*_t is a rank-1 or rank-2 mdspan with a non-const element type and a unique layout (layout_type::is_always_unique() equals true).
- 
     out_object*_t is a rank-1 or rank-2 mdspan with a non-const element type and a unique layout (layout_type::is_always_unique() equals true).
- 
     Triangle is either upper_triangle_t or lower_triangle_t. 
- 
     DiagonalStorage is either implicit_unit_diagonal_t or explicit_diagonal_t. 
- 
     Template parameters named in_*_t may deduce a const mdspan or a reference to a const mdspan. 
- 
     Template parameters named inout_*_t or out_*_t may also deduce a const mdspan or a reference to a const mdspan; constness of the mdspan itself does not prevent writing to its (non-const) elements. 
16.9.2. BLAS 1 functions [linalg.algs.blas1]
[Note:
The BLAS developed in three "levels": 1, 2, and 3. BLAS 1 includes vector-vector operations, BLAS 2 matrix-vector operations, and BLAS 3 matrix-matrix operations. The level coincides with the number of nested loops in a naïve sequential implementation of the operation. Increasing level also comes with increasing potential for data reuse. The BLAS traditionally lists computing a Givens rotation among the BLAS 1 operations, even though it only operates on scalars.
--end note]
Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in the maximum product of extents of any mdspan parameter. 
16.9.2.1. Givens rotations [linalg.algs.blas1.givens]
16.9.2.1.1. Compute Givens rotation [linalg.algs.blas1.givens.lartg]
template<class Real>
void givens_rotation_setup(const Real a,
                           const Real b,
                           Real& c,
                           Real& s,
                           Real& r);

template<class Real>
void givens_rotation_setup(const complex<Real>& a,
                           const complex<Real>& b,
                           Real& c,
                           complex<Real>& s,
                           complex<Real>& r);
This function computes the plane (Givens) rotation represented by the
two values c and s, such that the mathematical equation

[ c         s ]   [ a ]   [ r ]
[             ] * [   ] = [   ]
[ -conj(s)  c ]   [ b ]   [ 0 ]

holds, where c is always a real scalar and c*c + abs(s)*abs(s) equals one. 
[Note: This function corresponds to the LAPACK function xLARTG. --end note]
[Note: The corresponding BLAS function xROTG overwrites its input a with the output r; this function instead writes r to a separate output parameter. --end note]
- 
     Effects: Assigns to c and s the cosine and sine, respectively, of the plane (Givens) rotation determined by the input values a and b. Assigns to r the Euclidean norm of the two-component vector formed by a and b. 
- 
     Throws: Nothing. 
16.9.2.1.2. Apply a computed Givens rotation to vectors [linalg.algs.blas1.givens.rot]
template<class inout_vector_1_t, class inout_vector_2_t, class Real>
void givens_rotation_apply(inout_vector_1_t x, inout_vector_2_t y,
                           const Real c, const Real s);

template<class ExecutionPolicy, class inout_vector_1_t, class inout_vector_2_t, class Real>
void givens_rotation_apply(ExecutionPolicy&& exec,
                           inout_vector_1_t x, inout_vector_2_t y,
                           const Real c, const Real s);

template<class inout_vector_1_t, class inout_vector_2_t, class Real>
void givens_rotation_apply(inout_vector_1_t x, inout_vector_2_t y,
                           const Real c, const complex<Real> s);

template<class ExecutionPolicy, class inout_vector_1_t, class inout_vector_2_t, class Real>
void givens_rotation_apply(ExecutionPolicy&& exec,
                           inout_vector_1_t x, inout_vector_2_t y,
                           const Real c, const complex<Real> s);
[Note: These functions correspond to the BLAS function xROT. --end note]
For the overloads whose last argument s has type Real, c and s form a plane (Givens) rotation of real data. 
For the overloads whose last argument s has type const complex<Real>, c and s form a plane (Givens) rotation of complex data. 
- 
     Requires: x.extent(0) equals y.extent(0). 
- 
     Mandates: If neither x.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then x.static_extent(0) equals y.static_extent(0). 
- 
     Effects: Applies the plane (Givens) rotation specified by c and s to the input vectors x and y, as if each pair of corresponding elements x(i), y(i) were a two-component vector multiplied by the rotation. 
16.9.2.2. Swap matrix or vector elements [linalg.algs.blas1.swap]
template<class inout_object_1_t, class inout_object_2_t>
void swap_elements(inout_object_1_t x, inout_object_2_t y);

template<class ExecutionPolicy, class inout_object_1_t, class inout_object_2_t>
void swap_elements(ExecutionPolicy&& exec, inout_object_1_t x, inout_object_2_t y);
[Note: These functions correspond to the BLAS function xSWAP. --end note]
- 
     Requires: For all r in 0, 1, ..., x.rank() - 1, x.extent(r) equals y.extent(r). 
- 
     Constraints: - 
       x.rank() equals y.rank(). 
- 
       x.rank() is no more than 2. 
- 
       For all i... in the domain of x and y, the expression x(i...) = y(i...) is well formed. 
- 
     Mandates: For all r in 0, 1, ..., x.rank() - 1, if neither x.static_extent(r) nor y.static_extent(r) equals dynamic_extent, then x.static_extent(r) equals y.static_extent(r). 
- 
     Effects: Swap all corresponding elements of the objects x and y. 
16.9.2.3. Multiply the elements of an object in place by a scalar [linalg.algs.blas1.scal]
template<class Scalar, class inout_object_t>
void scale(const Scalar alpha, inout_object_t obj);

template<class ExecutionPolicy, class Scalar, class inout_object_t>
void scale(ExecutionPolicy&& exec, const Scalar alpha, inout_object_t obj);
[Note: These functions correspond to the BLAS function xSCAL. --end note]
For all i... in the domain of obj, the expression obj(i...) *= alpha shall be well formed. 
- 
     Constraints: obj.rank() is no more than 2. 
- 
     Effects: Replace each element of obj with the product of alpha and that element. 
16.9.2.4. Copy elements of one matrix or vector into another [linalg.algs.blas1.copy]
template<class in_object_t, class out_object_t>
void copy(in_object_t x, out_object_t y);

template<class ExecutionPolicy, class in_object_t, class out_object_t>
void copy(ExecutionPolicy&& exec, in_object_t x, out_object_t y);
[Note: These functions correspond to the BLAS function xCOPY. --end note]
- 
     Requires: For all r in 0, 1, ..., x.rank() - 1, x.extent(r) equals y.extent(r). 
- 
     Constraints: - 
       x.rank() equals y.rank(). 
- 
       For all i... in the domain of x and y, the expression y(i...) = x(i...) is well formed. 
- 
     Mandates: For all r in 0, 1, ..., x.rank() - 1, if neither x.static_extent(r) nor y.static_extent(r) equals dynamic_extent, then x.static_extent(r) equals y.static_extent(r). 
- 
     Effects: Overwrite each element of y with the corresponding element of x. 
16.9.2.5. Add vectors or matrices elementwise [linalg.algs.blas1.add]
template<class in_object_1_t, class in_object_2_t, class out_object_t>
void add(in_object_1_t x, in_object_2_t y, out_object_t z);

template<class ExecutionPolicy, class in_object_1_t, class in_object_2_t, class out_object_t>
void add(ExecutionPolicy&& exec, in_object_1_t x, in_object_2_t y, out_object_t z);
[Note: These functions correspond to the BLAS function xAXPY. --end note]
For all i... in the domain of z, the expression z(i...) = x(i...) + y(i...) shall be well formed. 
- 
     Requires: For all r in 0, 1, ..., x.rank() - 1: - 
       x.extent(r) equals z.extent(r). 
- 
       y.extent(r) equals z.extent(r). 
- 
     Constraints: x.rank() equals y.rank(), which equals z.rank(). 
- 
     Mandates: For all r in 0, 1, ..., x.rank() - 1: - 
       if neither x.static_extent(r) nor z.static_extent(r) equals dynamic_extent, then x.static_extent(r) equals z.static_extent(r); 
- 
       if neither y.static_extent(r) nor z.static_extent(r) equals dynamic_extent, then y.static_extent(r) equals z.static_extent(r); 
- 
       if neither x.static_extent(r) nor y.static_extent(r) equals dynamic_extent, then x.static_extent(r) equals y.static_extent(r). 
- 
     Effects: Compute the elementwise sum z = x + y. 
16.9.2.6. Dot product of two vectors [linalg.algs.blas1.dot]
16.9.2.6.1. Nonconjugated dot product of two vectors [linalg.algs.blas1.dot.dotu]
[Note: The functions in this section correspond to the BLAS
functions xDOT (for real element types) and xDOTU (for complex element types). --end note]
Nonconjugated dot product with specified result type
template<class in_vector_1_t, class in_vector_2_t, class T>
T dot(in_vector_1_t v1, in_vector_2_t v2, T init);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class T>
T dot(ExecutionPolicy&& exec, in_vector_1_t v1, in_vector_2_t v2, T init);
For all i in the domain of v1 and v2, the expression init + v1(i) * v2(i) shall be well formed. 
- 
     Requires: v1.extent(0) equals v2.extent(0). 
- 
     Mandates: If neither v1.static_extent(0) nor v2.static_extent(0) equals dynamic_extent, then v1.static_extent(0) equals v2.static_extent(0). 
- 
     Effects: Let N be v1.extent(0). If N is zero, returns init; else, returns GENERALIZED_SUM(plus<>(), init, v1(0) * v2(0), ..., v1(N-1) * v2(N-1)). 
- 
     Remarks: If in_vector_t::element_type and T are both floating-point or complex types, and T has higher precision than in_vector_type::element_type, then implementations will use T's precision or greater for intermediate terms in the sum. 
[Note: Like reduce, dot applies the binary plus operator in an unspecified order, so the result need not be deterministic. --end note]
[Note: Users can get the conjugated (xDOTC) behavior by calling dotc instead, or by wrapping the second vector argument in conjugated. --end note]
Nonconjugated dot product with default result type
template<class in_vector_1_t, class in_vector_2_t>
auto dot(in_vector_1_t v1, in_vector_2_t v2) -> /* see below */;

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t>
auto dot(ExecutionPolicy&& exec, in_vector_1_t v1, in_vector_2_t v2) -> /* see below */;
- 
     Effects: Let T be decltype(v1(0) * v2(0)). Then, the two-parameter overload is equivalent to return dot(v1, v2, T{}); and the three-parameter overload is equivalent to return dot(exec, v1, v2, T{});. 
16.9.2.6.2. Conjugated dot product of two vectors [linalg.algs.blas1.dot.dotc]
[Note:
The functions in this section correspond to the BLAS functions CDOTC and ZDOTC.
--end note]
Conjugated dot product with specified result type
template<class in_vector_1_t, class in_vector_2_t, class T>
T dotc(in_vector_1_t v1, in_vector_2_t v2, T init);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class T>
T dotc(ExecutionPolicy&& exec, in_vector_1_t v1, in_vector_2_t v2, T init);
- 
     Effects: The three-parameter overload is equivalent to return dot(v1, conjugated(v2), init); and the four-parameter overload is equivalent to return dot(exec, v1, conjugated(v2), init);. 
Conjugated dot product with default result type
template<class in_vector_1_t, class in_vector_2_t>
auto dotc(in_vector_1_t v1, in_vector_2_t v2) -> /* see below */;

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t>
auto dotc(ExecutionPolicy&& exec, in_vector_1_t v1, in_vector_2_t v2) -> /* see below */;
- 
     Effects: If in_vector_2_t::element_type is complex<R> for some R, then let T be decltype(v1(0) * conj(v2(0))); else, let T be decltype(v1(0) * v2(0)). Then, the two-parameter overload is equivalent to return dotc(v1, v2, T{}); and the three-parameter overload is equivalent to return dotc(exec, v1, v2, T{});. 
16.9.2.7. Scaled sum of squares of a vector’s elements [linalg.algs.blas1.ssq]
template<class T>
struct sum_of_squares_result {
  T scaling_factor;
  T scaled_sum_of_squares;
};

template<class in_vector_t, class T>
sum_of_squares_result<T> vector_sum_of_squares(
  in_vector_t v,
  sum_of_squares_result<T> init);

template<class ExecutionPolicy, class in_vector_t, class T>
sum_of_squares_result<T> vector_sum_of_squares(
  ExecutionPolicy&& exec,
  in_vector_t v,
  sum_of_squares_result<T> init);
[Note: These functions correspond to the LAPACK function xLASSQ. --end note]
- 
     Requires: - 
       T shall be Cpp17CopyConstructible. 
- 
       abs(v(0)) shall be convertible to T. 
- 
     Constraints: For all i in the domain of v, and for f and ssq of type T, the expression ssq = ssq + (abs(v(i)) / f) * (abs(v(i)) / f) shall be well formed. 
- 
     Effects: Returns two values: - 
       scaling_factor: the maximum of init.scaling_factor and abs(v(i)) for all i in the domain of v; and 
- 
       scaled_sum_of_squares: a value such that scaling_factor * scaling_factor * scaled_sum_of_squares equals the sum of squares of abs(v(i)) for all i in the domain of v, plus init.scaling_factor * init.scaling_factor * init.scaled_sum_of_squares. 
- 
     Remarks: If in_vector_t::element_type is complex<R> for some R, and T is a floating-point type, then: - 
       if T has higher precision than in_vector_type::element_type, implementations will use T's precision or greater for intermediate terms in the sum; and 
- 
       any guarantees regarding overflow and underflow of vector_sum_of_squares are implementation-defined. 
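The (scaling_factor, scaled_sum_of_squares) pair can be maintained with the classic xLASSQ-style rescaling update, which avoids overflow when squaring large elements. A sketch for double, with std::vector standing in for a rank-1 mdspan and assuming the conventional starting state of scaling factor 0 and scaled sum 1 (the _model names are illustrative):

```cpp
#include <cmath>
#include <vector>

struct ssq_result { double scaling_factor; double scaled_sum_of_squares; };

// xLASSQ-style update: maintain (scale, ssq) so that
// scale*scale*ssq equals the running sum of squares of |v(i)|
// (plus the contribution encoded in init), rescaling whenever
// a larger |v(i)| appears so that no square overflows.
ssq_result sum_of_squares_model(const std::vector<double>& v, ssq_result init) {
    double scale = init.scaling_factor;
    double ssq = init.scaled_sum_of_squares;
    for (double x : v) {
        const double ax = std::abs(x);
        if (ax == 0.0) continue;
        if (scale < ax) {
            // New largest element: fold the old sum into the new scale.
            ssq = 1.0 + ssq * (scale / ax) * (scale / ax);
            scale = ax;
        } else {
            ssq += (ax / scale) * (ax / scale);
        }
    }
    return {scale, ssq};
}
```

The Euclidean norm of the next subsection is then scale * sqrt(ssq), computed without ever forming the possibly overflowing raw sum of squares.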
16.9.2.8. Euclidean norm of a vector [linalg.algs.blas1.nrm2]
16.9.2.8.1. Euclidean norm with specified result type
template<class in_vector_t, class T>
T vector_norm2(in_vector_t v, T init);

template<class ExecutionPolicy, class in_vector_t, class T>
T vector_norm2(ExecutionPolicy&& exec, in_vector_t v, T init);
[Note: These functions correspond to the BLAS function xNRM2. --end note]
For all i in the domain of v, the expression init + abs(v(i)) * abs(v(i)) shall be well formed. 
- 
     Requires: init + abs(v(0)) * abs(v(0)) shall be convertible to T. 
- 
     Effects: Returns the Euclidean norm (also called 2-norm) of the vector v. 
- 
     Remarks: If in_vector_t::element_type and T are both floating-point or complex types, then: - 
       if T has higher precision than in_vector_type::element_type, implementations will use T's precision or greater for intermediate terms in the sum; and 
- 
       any guarantees regarding overflow and underflow of vector_norm2 are implementation-defined. 
[Note: A suggested implementation of this function for floating-point types T would use the scaled_sum_of_squares result from vector_sum_of_squares. --end note]
16.9.2.8.2. Euclidean norm with default result type
template<class in_vector_t>
auto vector_norm2(in_vector_t v) -> /* see below */;

template<class ExecutionPolicy, class in_vector_t>
auto vector_norm2(ExecutionPolicy&& exec, in_vector_t v) -> /* see below */;
- 
     Effects: Let T be decltype(abs(v(0)) * abs(v(0))). Then, the one-parameter overload is equivalent to return vector_norm2(v, T{}); and the two-parameter overload is equivalent to return vector_norm2(exec, v, T{});. 
16.9.2.9. Sum of absolute values of vector elements [linalg.algs.blas1.asum]
16.9.2.9.1. Sum of absolute values with specified result type
template<class in_vector_t, class T>
T vector_abs_sum(in_vector_t v, T init);

template<class ExecutionPolicy, class in_vector_t, class T>
T vector_abs_sum(ExecutionPolicy&& exec, in_vector_t v, T init);
[Note: This function corresponds to the BLAS functions xASUM. --end note]
For all i in the domain of v, the expression init + abs(v(i)) shall be well formed. 
- 
     Requires: For all i in the domain of v, init + abs(v(i)) shall be convertible to T. 
- 
     Effects: Let N be v.extent(0). Then: - 
       If N is zero, returns init; 
- 
       else, if in_vector_t::element_type is complex<R> for some R, returns GENERALIZED_SUM(plus<>(), init, abs(real(v(0))) + abs(imag(v(0))), ..., abs(real(v(N-1))) + abs(imag(v(N-1)))); 
- 
       else, returns GENERALIZED_SUM(plus<>(), init, abs(v(0)), ..., abs(v(N-1))). 
- 
     Remarks: If in_vector_t::element_type and T are both floating-point or complex types, and T has higher precision than in_vector_type::element_type, then implementations will use T's precision or greater for intermediate terms in the sum. 
16.9.2.9.2. Sum of absolute values with default result type
template<class in_vector_t>
auto vector_abs_sum(in_vector_t v) -> /* see below */;

template<class ExecutionPolicy, class in_vector_t>
auto vector_abs_sum(ExecutionPolicy&& exec, in_vector_t v) -> /* see below */;
- 
     Effects: Let T be decltype(abs(v(0))). Then, the one-parameter overload is equivalent to return vector_abs_sum(v, T{}); and the two-parameter overload is equivalent to return vector_abs_sum(exec, v, T{});. 
16.9.2.10. Index of maximum absolute value of vector elements [linalg.algs.blas1.iamax]
template<class in_vector_t>
extents<>::size_type idx_abs_max(in_vector_t v);

template<class ExecutionPolicy, class in_vector_t>
extents<>::size_type idx_abs_max(ExecutionPolicy&& exec, in_vector_t v);
[Note: These functions correspond to the BLAS function IxAMAX. --end note]
- 
     Requires: For all i in the domain of v, let abs_value_type be decltype(abs(v(i))); two values of type abs_value_type shall be comparable with <. 
- 
     Effects: Returns the index (in the domain of v) of the first element of v having the largest absolute value. If v has zero elements, returns numeric_limits<extents<>::size_type>::max(). 
16.9.2.11. Frobenius norm of a matrix [linalg.algs.blas1.matfrobnorm]
16.9.2.11.1. Frobenius norm with specified result type
template<class in_matrix_t, class T>
T matrix_frob_norm(in_matrix_t A, T init);

template<class ExecutionPolicy, class in_matrix_t, class T>
T matrix_frob_norm(ExecutionPolicy&& exec, in_matrix_t A, T init);
- 
     Effects: Returns the Frobenius norm of the matrix A, that is, the square root of the sum of init and the squares of the absolute values of the elements of A. 
- 
     Remarks: If in_matrix_t::element_type is complex<R> for some R, and T is a floating-point or complex type, then: - 
       if T has higher precision than in_matrix_type::element_type, implementations will use T's precision or greater for intermediate terms in the sum; and 
- 
       any guarantees regarding overflow and underflow of matrix_frob_norm are implementation-defined. 
16.9.2.11.2. Frobenius norm with default result type
template<class in_matrix_t>
auto matrix_frob_norm(in_matrix_t A) -> /* see below */;

template<class ExecutionPolicy, class in_matrix_t>
auto matrix_frob_norm(ExecutionPolicy&& exec, in_matrix_t A) -> /* see below */;
- 
     Effects: Let T be decltype(abs(A(0, 0)) * abs(A(0, 0))). Then, the one-parameter overload is equivalent to return matrix_frob_norm(A, T{}); and the two-parameter overload is equivalent to return matrix_frob_norm(exec, A, T{});. 
16.9.2.12. One norm of a matrix [linalg.algs.blas1.matonenorm]
16.9.2.12.1. One norm with specified result type
template<class in_matrix_t, class T>
T matrix_one_norm(in_matrix_t A, T init);

template<class ExecutionPolicy, class in_matrix_t, class T>
T matrix_one_norm(ExecutionPolicy&& exec, in_matrix_t A, T init);
- 
     Requires: abs(A(0, 0)) shall be convertible to T. 
- 
     Effects: - 
       If A.extent(1) is zero, returns init; 
- 
       else, returns the sum of init and the one norm of the matrix A. The one norm of the matrix A is the maximum over all columns of A, of the sum of the absolute values of the elements of the column. 
- 
     Remarks: If in_matrix_t::element_type is complex<R> for some R, and T has higher precision than in_matrix_type::element_type, then implementations will use T's precision or greater for intermediate terms in the sum. 
16.9.2.12.2. One norm with default result type
template<class in_matrix_t>
auto matrix_one_norm(in_matrix_t A) -> /* see below */;

template<class ExecutionPolicy, class in_matrix_t>
auto matrix_one_norm(ExecutionPolicy&& exec, in_matrix_t A) -> /* see below */;
- 
     Effects: Let T be decltype(abs(A(0, 0))). Then, the one-parameter overload is equivalent to return matrix_one_norm(A, T{}); and the two-parameter overload is equivalent to return matrix_one_norm(exec, A, T{});. 
16.9.2.13. Infinity norm of a matrix [linalg.algs.blas1.matinfnorm]
16.9.2.13.1. Infinity norm with specified result type
template < class in_matrix_t , class T > T matrix_inf_norm ( in_matrix_t A , T init ); template < class ExecutionPolicy , class in_matrix_t , class T > T matrix_inf_norm ( ExecutionPolicy && exec , in_matrix_t A , T init ); 
- Requires: abs(A(0,0)) shall be convertible to T.
- Effects:
  - If A.extent(0) is zero, returns init;
  - else, returns the sum of init and the infinity norm of the matrix A. [Note: The infinity norm of the matrix A is the maximum over all rows of A, of the sum of the absolute values of the elements of the row. --end note]
- Remarks: If in_matrix_t::element_type is complex<R> for some R, then R shall be convertible to T. Otherwise, if T differs from in_matrix_t::element_type, then in_matrix_t::element_type shall be convertible to T.
16.9.2.13.2. Infinity norm with default result type
template<class in_matrix_t>
auto matrix_inf_norm(in_matrix_t A) -> /* see-below */;

template<class ExecutionPolicy, class in_matrix_t>
auto matrix_inf_norm(ExecutionPolicy&& exec, in_matrix_t A) -> /* see-below */;
- Effects: Let T be decltype(abs(A(0,0))). Then, the one-parameter overload is equivalent to matrix_inf_norm(A, T{});, and the two-parameter overload is equivalent to matrix_inf_norm(exec, A, T{});.
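Dual to the one norm, the infinity norm is the maximum row sum of absolute values. A sketch on a plain nested vector (illustration only, not the proposed API; the name `inf_norm` is ours):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative sketch (not the proposed API): the infinity norm is the
// maximum over all rows of the sum of absolute values of the row's
// elements.
double inf_norm(const std::vector<std::vector<double>>& A) {
  double result = 0.0;
  for (const auto& row : A) {
    double rowsum = 0.0;
    for (double a : row) rowsum += std::abs(a);
    result = std::max(result, rowsum);
  }
  return result;
}
```

For the matrix with rows (1, -2) and (3, 4), the row sums are 3 and 7, so the infinity norm is 7.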
16.9.3. BLAS 2 functions [linalg.algs.blas2]
16.9.3.1. General matrix-vector product [linalg.algs.blas2.gemv]
[Note: These functions correspond to the BLAS function xGEMV. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(1) equals x.extent(0),
  - A.extent(0) equals y.extent(0), and
  - y.extent(0) equals z.extent(0) (for overloads that take a vector z).
- Constraints: For all functions in this section:
  - in_matrix_t has unique layout; and
  - A.rank() equals 2, x.rank() equals 1, y.rank() equals 1, and z.rank() equals 1.
- Mandates:
  - If neither A.static_extent(1) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal x.static_extent(0);
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0); and
  - if neither y.static_extent(0) nor z.static_extent(0) equals dynamic_extent, then y.static_extent(0) shall equal z.static_extent(0).
- Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in x.extent(0) times A.extent(1).
16.9.3.1.1. Overwriting matrix-vector product
template<class in_vector_t, class in_matrix_t, class out_vector_t>
void matrix_vector_product(in_matrix_t A, in_vector_t x, out_vector_t y);

template<class ExecutionPolicy, class in_vector_t, class in_matrix_t, class out_vector_t>
void matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, in_vector_t x, out_vector_t y);
- Effects: Assigns to the elements of y the product of the matrix A with the vector x.
[Example:

constexpr extents<>::size_type num_rows = 5;
constexpr extents<>::size_type num_cols = 6;

// y = 3.0 * A * x
void scaled_matvec_1(
  mdspan<double, extents<num_rows, num_cols>> A,
  mdspan<double, extents<num_cols>> x,
  mdspan<double, extents<num_rows>> y)
{
  matrix_vector_product(scaled(3.0, A), x, y);
}

// y = 3.0 * A * x + 2.0 * y
void scaled_matvec_2(
  mdspan<double, extents<num_rows, num_cols>> A,
  mdspan<double, extents<num_cols>> x,
  mdspan<double, extents<num_rows>> y)
{
  matrix_vector_product(scaled(3.0, A), x, scaled(2.0, y), y);
}

// z = 7.0 times the transpose of A, times y
void scaled_transposed_matvec(
  mdspan<double, extents<num_rows, num_cols>> A,
  mdspan<double, extents<num_rows>> y,
  mdspan<double, extents<num_cols>> z)
{
  matrix_vector_product(scaled(7.0, transposed(A)), y, z);
}

--end example]
16.9.3.1.2. Updating matrix-vector product
template<class in_vector_1_t, class in_matrix_t, class in_vector_2_t, class out_vector_t>
void matrix_vector_product(in_matrix_t A, in_vector_1_t x, in_vector_2_t y, out_vector_t z);

template<class ExecutionPolicy, class in_vector_1_t, class in_matrix_t, class in_vector_2_t, class out_vector_t>
void matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, in_vector_1_t x, in_vector_2_t y, out_vector_t z);
- Effects: Assigns to the elements of z the sum of the vector y and the product of the matrix A with the vector x.
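The updating product z = y + A x can be sketched with plain vectors; this illustrates the specified semantics only (the name `gemv_update` is ours, and this sketch assumes z does not alias its inputs).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): updating matrix-vector
// product z = y + A*x on a row-major matrix.
std::vector<double> gemv_update(const std::vector<std::vector<double>>& A,
                                const std::vector<double>& x,
                                const std::vector<double>& y) {
  std::vector<double> z(y);  // start from y, then accumulate A*x
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      z[i] += A[i][j] * x[j];
  return z;
}
```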
16.9.3.2. Symmetric matrix-vector product [linalg.algs.blas2.symv]
[Note: These functions correspond to the BLAS functions xSYMV and xSPMV. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(0) equals A.extent(1),
  - A.extent(1) equals x.extent(0),
  - A.extent(0) equals y.extent(0), and
  - y.extent(0) equals z.extent(0) (for overloads that take a vector z).
- Constraints:
  - in_matrix_t either has unique layout, or layout_blas_packed layout;
  - if in_matrix_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument; and
  - A.rank() equals 2, x.rank() equals 1, y.rank() equals 1, and z.rank() equals 1.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1);
  - if neither A.static_extent(1) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal x.static_extent(0);
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0); and
  - if neither y.static_extent(0) nor z.static_extent(0) equals dynamic_extent, then y.static_extent(0) shall equal z.static_extent(0).
- Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in x.extent(0) times A.extent(1).
- Remarks: The functions will only access the triangle of A specified by the Triangle argument t, and will assume for indices i,j outside that triangle that A(j,i) equals A(i,j).
16.9.3.2.1. Overwriting symmetric matrix-vector product
template<class in_matrix_t, class Triangle, class in_vector_t, class out_vector_t>
void symmetric_matrix_vector_product(in_matrix_t A, Triangle t, in_vector_t x, out_vector_t y);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class in_vector_t, class out_vector_t>
void symmetric_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, in_vector_t x, out_vector_t y);
- Effects: Assigns to the elements of y the product of the matrix A with the vector x.
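The triangle-access rule above can be made concrete: with lower_triangle, only entries A(i,j) with j <= i are read, and A(j,i) = A(i,j) is assumed for the rest. A sketch on plain vectors (illustration only; the name `symv_lower` and the container choice are ours):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): symmetric matrix-vector
// product y = A*x that reads only the lower triangle of A (entries
// A[i][j] with j <= i), supplying the mirrored element A(j,i) = A(i,j)
// for the upper triangle.
std::vector<double> symv_lower(const std::vector<std::vector<double>>& A,
                               const std::vector<double>& x) {
  std::vector<double> y(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j <= i; ++j) {
      y[i] += A[i][j] * x[j];
      if (i != j) y[j] += A[i][j] * x[i];  // contribution of A(j,i)
    }
  return y;
}
```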
16.9.3.2.2. Updating symmetric matrix-vector product
template<class in_matrix_t, class Triangle, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void symmetric_matrix_vector_product(in_matrix_t A, Triangle t, in_vector_1_t x, in_vector_2_t y, out_vector_t z);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void symmetric_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, in_vector_1_t x, in_vector_2_t y, out_vector_t z);
- Effects: Assigns to the elements of z the sum of the vector y and the product of the matrix A with the vector x.
16.9.3.3. Hermitian matrix-vector product [linalg.algs.blas2.hemv]
[Note: These functions correspond to the BLAS functions xHEMV and xHPMV. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(0) equals A.extent(1),
  - A.extent(1) equals x.extent(0),
  - A.extent(0) equals y.extent(0), and
  - y.extent(0) equals z.extent(0) (for overloads that take a vector z).
- Constraints:
  - in_matrix_t either has unique layout, or layout_blas_packed layout;
  - if in_matrix_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument; and
  - A.rank() equals 2, x.rank() equals 1, y.rank() equals 1, and z.rank() equals 1.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1);
  - if neither A.static_extent(1) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal x.static_extent(0);
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0); and
  - if neither y.static_extent(0) nor z.static_extent(0) equals dynamic_extent, then y.static_extent(0) shall equal z.static_extent(0).
- Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in x.extent(0) times A.extent(1).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t.
  - If in_matrix_t::element_type is complex<RA> for some RA, then the functions will assume for indices i,j outside that triangle that A(j,i) equals conj(A(i,j)); otherwise, they will assume that A(j,i) equals A(i,j).
16.9.3.3.1. Overwriting Hermitian matrix-vector product
template<class in_matrix_t, class Triangle, class in_vector_t, class out_vector_t>
void hermitian_matrix_vector_product(in_matrix_t A, Triangle t, in_vector_t x, out_vector_t y);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class in_vector_t, class out_vector_t>
void hermitian_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, in_vector_t x, out_vector_t y);
- Effects: Assigns to the elements of y the product of the matrix A with the vector x.
16.9.3.3.2. Updating Hermitian matrix-vector product
template<class in_matrix_t, class Triangle, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void hermitian_matrix_vector_product(in_matrix_t A, Triangle t, in_vector_1_t x, in_vector_2_t y, out_vector_t z);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void hermitian_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, in_vector_1_t x, in_vector_2_t y, out_vector_t z);
- Effects: Assigns to the elements of z the sum of the vector y and the product of the matrix A with the vector x.
16.9.3.4. Triangular matrix-vector product [linalg.algs.blas2.trmv]
[Note: These functions correspond to the BLAS functions xTRMV and xTPMV. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(0) equals A.extent(1),
  - A.extent(0) equals y.extent(0),
  - A.extent(1) equals x.extent(0), and
  - y.extent(0) equals z.extent(0) (for overloads that take a vector z).
- Constraints:
  - in_matrix_t either has unique layout, or layout_blas_packed layout;
  - if in_matrix_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument;
  - A.rank() equals 2;
  - y.rank() equals 1;
  - x.rank() equals 1; and
  - z.rank() equals 1.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1);
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0);
  - if neither A.static_extent(1) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal x.static_extent(0); and
  - if neither y.static_extent(0) nor z.static_extent(0) equals dynamic_extent, then y.static_extent(0) shall equal z.static_extent(0).
- Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in x.extent(0) times A.extent(1).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t.
  - If the DiagonalStorage template argument has type implicit_unit_diagonal_t, then the functions will not access the diagonal of A, and will assume that the diagonal elements of A all equal one ("one" meaning the multiplicative identity of element_type).
16.9.3.4.1. Overwriting triangular matrix-vector product [linalg.algs.blas2.trmv.ov]
template<class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_t, class out_vector_t>
void triangular_matrix_vector_product(in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_t x, out_vector_t y);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_t, class out_vector_t>
void triangular_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_t x, out_vector_t y);
- Effects: Assigns to the elements of y the product of the matrix A with the vector x.
16.9.3.4.2. In-place triangular matrix-vector product [linalg.algs.blas2.trmv.in-place]
template<class in_matrix_t, class Triangle, class DiagonalStorage, class inout_vector_t>
void triangular_matrix_vector_product(in_matrix_t A, Triangle t, DiagonalStorage d, inout_vector_t y);
- Requires: A.extent(1) equals y.extent(0).
- Mandates: If neither A.static_extent(1) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal y.static_extent(0).
- Effects: Overwrites y with the product of the matrix A with the vector y (on input).
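For an upper triangular A, the in-place product y := A y is possible because row i depends only on y[j] for j >= i, so traversing rows top to bottom never reads an already-overwritten element. A sketch (illustration only, not the proposed API; the helper name is ours, and it returns the updated vector for convenience):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): in-place product
// y := A*y for an upper triangular A with an explicit diagonal.
// Row i reads only y[j] with j >= i, none of which has been
// overwritten yet when processing rows in ascending order.
std::vector<double> trmv_upper_in_place(
    const std::vector<std::vector<double>>& A, std::vector<double> y) {
  const std::size_t n = y.size();
  for (std::size_t i = 0; i < n; ++i) {
    double acc = 0.0;
    for (std::size_t j = i; j < n; ++j)
      acc += A[i][j] * y[j];
    y[i] = acc;  // safe: later rows do not read y[i]
  }
  return y;
}
```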
16.9.3.4.3. Updating triangular matrix-vector product [linalg.algs.blas2.trmv.up]
template<class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void triangular_matrix_vector_product(in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_1_t x, in_vector_2_t y, out_vector_t z);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_1_t, class in_vector_2_t, class out_vector_t>
void triangular_matrix_vector_product(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_1_t x, in_vector_2_t y, out_vector_t z);
- Effects: Assigns to the elements of z the sum of the vector y and the product of the matrix A with the vector x.
16.9.3.5. Solve a triangular linear system [linalg.algs.blas2.trsv]
[Note: These functions correspond to the BLAS functions xTRSV and xTPSV. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(0) equals A.extent(1), and
  - A.extent(1) equals b.extent(0).
- Constraints:
  - A.rank() equals 2;
  - b.rank() equals 1;
  - in_matrix_t either has unique layout, or layout_blas_packed layout; and
  - if in_matrix_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1); and
  - if neither A.static_extent(1) nor b.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal b.static_extent(0).
- Complexity: All algorithms in this Clause with mdspan parameters perform a count of mdspan array accesses and arithmetic operations that is linear in x.extent(0) times A.extent(1).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t.
  - If the DiagonalStorage template argument has type implicit_unit_diagonal_t, then the functions will not access the diagonal of A, and will assume that the diagonal elements of A all equal one ("one" meaning the multiplicative identity of element_type).
16.9.3.5.1. Not-in-place triangular solve [linalg.algs.blas2.trsv.not-in-place]
template<class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_t, class out_vector_t>
void triangular_matrix_vector_solve(in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_t b, out_vector_t x);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class DiagonalStorage, class in_vector_t, class out_vector_t>
void triangular_matrix_vector_solve(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, DiagonalStorage d, in_vector_t b, out_vector_t x);
- Requires: A.extent(0) equals x.extent(0).
- Constraints: x.rank() equals 1 and b.rank() equals 1.
- Mandates: If neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0).
- Effects: Assigns to the elements of x the result of solving the triangular linear system Ax = b.
16.9.3.5.2. In-place triangular solve [linalg.algs.blas2.trsv.in-place]
template<class in_matrix_t, class Triangle, class DiagonalStorage, class inout_vector_t>
void triangular_matrix_vector_solve(in_matrix_t A, Triangle t, DiagonalStorage d, inout_vector_t b);

template<class ExecutionPolicy, class in_matrix_t, class Triangle, class DiagonalStorage, class inout_vector_t>
void triangular_matrix_vector_solve(ExecutionPolicy&& exec, in_matrix_t A, Triangle t, DiagonalStorage d, inout_vector_t b);
[Note:
Performing triangular solve in place hinders parallelization.
However, other ExecutionPolicy-specific optimizations, such as vectorization, are still possible.
--end note]
- Requires: A.extent(0) equals b.extent(0).
- Mandates: If neither A.static_extent(0) nor b.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal b.static_extent(0).
- Effects: Overwrites b with the result of solving the triangular linear system Ax = b for x.
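The in-place solve for a lower triangular system with an implicit unit diagonal reduces to forward substitution that never reads the diagonal. A sketch (illustration only, not the proposed API; the helper name is ours, and it returns the solution for convenience):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): solve L*x = b in place,
// where L is lower triangular with an implicit unit diagonal.  The
// diagonal of L is never accessed; each diagonal element is assumed
// to equal one, so no division is needed.
std::vector<double> trsv_unit_lower_in_place(
    const std::vector<std::vector<double>>& L, std::vector<double> b) {
  for (std::size_t i = 0; i < b.size(); ++i)
    for (std::size_t j = 0; j < i; ++j)
      b[i] -= L[i][j] * b[j];  // forward substitution
  return b;  // b now holds x
}
```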
16.9.3.6. Rank-1 (outer product) update of a matrix [linalg.algs.blas2.rank1]
16.9.3.6.1. Nonsymmetric nonconjugated rank-1 update [linalg.algs.blas2.rank1.geru]
template<class in_vector_1_t, class in_vector_2_t, class inout_matrix_t>
void matrix_rank_1_update(in_vector_1_t x, in_vector_2_t y, inout_matrix_t A);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class inout_matrix_t>
void matrix_rank_1_update(ExecutionPolicy&& exec, in_vector_1_t x, in_vector_2_t y, inout_matrix_t A);
[Note: This function corresponds to the BLAS functions xGER (for real element types) and xGERU (for complex element types). --end note]
- Requires:
  - A.extent(0) equals x.extent(0), and
  - A.extent(1) equals y.extent(0).
- Constraints: A.rank() equals 2, x.rank() equals 1, and y.rank() equals 1.
- Mandates:
  - If neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0); and
  - if neither A.static_extent(1) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal y.static_extent(0).
- Effects: Assigns to A on output the sum of A on input, and the outer product of x and y.
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in x.extent(0) times y.extent(0).
[Note: Users can get xGERC behavior by using matrix_rank_1_update_c below, or by giving matrix_rank_1_update the result of conjugated applied to the second vector argument. --end note]
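The rank-1 update A += x yᵀ is a simple doubly nested loop; the sketch below illustrates the Effects wording on plain containers (not the proposed API; the helper name is ours, and it returns the updated matrix for convenience).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): nonsymmetric,
// nonconjugated rank-1 update A := A + outer(x, y).
std::vector<std::vector<double>> rank_1_update(
    std::vector<std::vector<double>> A,
    const std::vector<double>& x, const std::vector<double>& y) {
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j < y.size(); ++j)
      A[i][j] += x[i] * y[j];  // outer product of x and y
  return A;
}
```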
16.9.3.6.2. Nonsymmetric conjugated rank-1 update [linalg.algs.blas2.rank1.gerc]
template<class in_vector_1_t, class in_vector_2_t, class inout_matrix_t>
void matrix_rank_1_update_c(in_vector_1_t x, in_vector_2_t y, inout_matrix_t A);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class inout_matrix_t>
void matrix_rank_1_update_c(ExecutionPolicy&& exec, in_vector_1_t x, in_vector_2_t y, inout_matrix_t A);
[Note: This function corresponds to the BLAS function xGERC. --end note]
- Effects: Equivalent to matrix_rank_1_update(x, conjugated(y), A);.
16.9.3.6.3. Rank-1 update of a Symmetric matrix [linalg.algs.blas2.rank1.syr]
template<class in_vector_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_1_update(in_vector_t x, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class in_vector_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, in_vector_t x, inout_matrix_t A, Triangle t);

template<class T, class in_vector_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_1_update(T alpha, in_vector_t x, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class T, class in_vector_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_1_update(ExecutionPolicy&& exec, T alpha, in_vector_t x, inout_matrix_t A, Triangle t);
[Note:
These functions correspond to the BLAS functions xSYR and xSPR.
They have overloads taking a scaling factor alpha, because it would be impossible to express the update A = A - x x^T otherwise.
--end note]
- Requires:
  - A.extent(0) equals A.extent(1), and
  - A.extent(0) equals x.extent(0).
- Constraints:
  - A.rank() equals 2 and x.rank() equals 1;
  - A either has unique layout, or layout_blas_packed layout; and
  - if A has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1); and
  - if neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0).
- Effects:
  - Overloads without an alpha parameter assign to A on output the sum of A on input, and the outer product of x and x.
  - Overloads with an alpha parameter assign to A on output the sum of A on input, and alpha times the outer product of x and x.
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in x.extent(0) times x.extent(0).
- Remarks: The functions will only access the triangle of A specified by the Triangle argument t, and will assume for indices i,j outside that triangle that A(j,i) equals A(i,j).
16.9.3.6.4. Rank-1 update of a Hermitian matrix [linalg.algs.blas2.rank1.her]
template<class in_vector_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_1_update(in_vector_t x, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class in_vector_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, in_vector_t x, inout_matrix_t A, Triangle t);

template<class T, class in_vector_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_1_update(T alpha, in_vector_t x, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class T, class in_vector_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_1_update(ExecutionPolicy&& exec, T alpha, in_vector_t x, inout_matrix_t A, Triangle t);
[Note:
These functions correspond to the BLAS functions xHER and xHPR.
They have overloads taking a scaling factor alpha, because it would be impossible to express the update A = A - x x^H otherwise.
--end note]
- Requires:
  - A.extent(0) equals A.extent(1), and
  - A.extent(0) equals x.extent(0).
- Constraints:
  - A.rank() equals 2 and x.rank() equals 1;
  - A either has unique layout, or layout_blas_packed layout; and
  - if A has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1); and
  - if neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0).
- Effects:
  - Overloads without alpha assign to A on output the sum of A on input, and the outer product of x and the conjugate of x.
  - Overloads with alpha assign to A on output the sum of A on input, and alpha times the outer product of x and the conjugate of x.
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in x.extent(0) times x.extent(0).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t.
  - If inout_matrix_t::element_type is complex<RA> for some RA, then the functions will assume for indices i,j outside that triangle that A(j,i) equals conj(A(i,j)); otherwise, they will assume that A(j,i) equals A(i,j).
16.9.3.7. Rank-2 update of a symmetric matrix [linalg.algs.blas2.rank2.syr2]
template<class in_vector_1_t, class in_vector_2_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_2_update(in_vector_1_t x, in_vector_2_t y, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class inout_matrix_t, class Triangle>
void symmetric_matrix_rank_2_update(ExecutionPolicy&& exec, in_vector_1_t x, in_vector_2_t y, inout_matrix_t A, Triangle t);
[Note: These functions correspond to the BLAS functions xSYR2 and xSPR2. --end note]
- Requires:
  - A.extent(0) equals A.extent(1),
  - A.extent(0) equals x.extent(0), and
  - A.extent(0) equals y.extent(0).
- Constraints:
  - A.rank() equals 2, x.rank() equals 1, and y.rank() equals 1;
  - A either has unique layout, or layout_blas_packed layout; and
  - if A has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1);
  - if neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0); and
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0).
- Effects: Assigns to A on output the sum of A on input, the outer product of x and y, and the outer product of y and x.
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in x.extent(0) times y.extent(0).
- Remarks: The functions will only access the triangle of A specified by the Triangle argument t, and will assume for indices i,j outside that triangle that A(j,i) equals A(i,j).
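Because the result x yᵀ + y xᵀ is symmetric, the update only needs to touch one triangle of A. A sketch with lower_triangle semantics on plain containers (illustration only, not the proposed API; the helper name is ours, and it returns the updated matrix for convenience):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): symmetric rank-2 update
// A := A + outer(x, y) + outer(y, x), writing only the lower triangle
// of A; the upper triangle is implied by symmetry and never touched.
std::vector<std::vector<double>> rank_2_update_lower(
    std::vector<std::vector<double>> A,
    const std::vector<double>& x, const std::vector<double>& y) {
  for (std::size_t i = 0; i < x.size(); ++i)
    for (std::size_t j = 0; j <= i; ++j)  // lower triangle only
      A[i][j] += x[i] * y[j] + y[i] * x[j];
  return A;
}
```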
16.9.3.8. Rank-2 update of a Hermitian matrix [linalg.algs.blas2.rank2.her2]
template<class in_vector_1_t, class in_vector_2_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_2_update(in_vector_1_t x, in_vector_2_t y, inout_matrix_t A, Triangle t);

template<class ExecutionPolicy, class in_vector_1_t, class in_vector_2_t, class inout_matrix_t, class Triangle>
void hermitian_matrix_rank_2_update(ExecutionPolicy&& exec, in_vector_1_t x, in_vector_2_t y, inout_matrix_t A, Triangle t);
[Note: These functions correspond to the BLAS functions xHER2 and xHPR2. --end note]
- Requires:
  - A.extent(0) equals A.extent(1),
  - A.extent(0) equals x.extent(0), and
  - A.extent(0) equals y.extent(0).
- Constraints:
  - A.rank() equals 2, x.rank() equals 1, and y.rank() equals 1;
  - A either has unique layout, or layout_blas_packed layout; and
  - if A has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1);
  - if neither A.static_extent(0) nor x.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal x.static_extent(0); and
  - if neither A.static_extent(0) nor y.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal y.static_extent(0).
- Effects: Assigns to A on output the sum of A on input, the outer product of x and the conjugate of y, and the outer product of y and the conjugate of x.
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in x.extent(0) times y.extent(0).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t.
  - If inout_matrix_t::element_type is complex<RA> for some RA, then the functions will assume for indices i,j outside that triangle that A(j,i) equals conj(A(i,j)); otherwise, they will assume that A(j,i) equals A(i,j).
16.9.4. BLAS 3 functions [linalg.algs.blas3]
16.9.4.1. General matrix-matrix product [linalg.algs.blas3.gemm]
[Note: These functions correspond to the BLAS function xGEMM. --end note]
The following requirements apply to all functions in this section.
- Requires:
  - C.extent(0) equals E.extent(0) (for overloads that take a matrix E),
  - C.extent(1) equals E.extent(1) (likewise),
  - A.extent(1) equals B.extent(0),
  - A.extent(0) equals C.extent(0), and
  - B.extent(1) equals C.extent(1).
- Constraints:
  - in_matrix_1_t, in_matrix_2_t, in_matrix_3_t, and out_matrix_t have unique layout; and
  - A.rank() equals 2, B.rank() equals 2, C.rank() equals 2, and E.rank() equals 2.
- Mandates:
  - For all r in 0, ..., C.rank() - 1, if neither C.static_extent(r) nor E.static_extent(r) equals dynamic_extent, then C.static_extent(r) shall equal E.static_extent(r);
  - if neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal B.static_extent(0);
  - if neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal C.static_extent(0); and
  - if neither B.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then B.static_extent(1) shall equal C.static_extent(1).
- Complexity: For all algorithms in this Clause, the count of mdspan array accesses and arithmetic operations is linear in A.extent(0) times B.extent(1) times A.extent(1).
16.9.4.1.1. Overwriting general matrix-matrix product
template<class in_matrix_1_t, class in_matrix_2_t, class out_matrix_t>
void matrix_product(in_matrix_1_t A, in_matrix_2_t B, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class in_matrix_2_t, class out_matrix_t>
void matrix_product(ExecutionPolicy&& exec, in_matrix_1_t A, in_matrix_2_t B, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the product of the matrices A and B.
16.9.4.1.2. Updating general matrix-matrix product
template<class in_matrix_1_t, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void matrix_product(in_matrix_1_t A, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void matrix_product(ExecutionPolicy&& exec, in_matrix_1_t A, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices A and B.
- Remarks: C and E may refer to the same matrix.
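The updating product C = E + A B is a triple loop; the sketch below illustrates the Effects wording on plain containers (not the proposed API; the helper name is ours, and it builds the result in a copy of E, which is also why E and C may trivially "alias" here).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch (not the proposed API): updating matrix-matrix
// product C = E + A*B.  E is copied and accumulated into, so the
// result starts from E's elements as the wording specifies.
std::vector<std::vector<double>> gemm_update(
    const std::vector<std::vector<double>>& A,
    const std::vector<std::vector<double>>& B,
    std::vector<std::vector<double>> E) {
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t k = 0; k < B.size(); ++k)
      for (std::size_t j = 0; j < B[0].size(); ++j)
        E[i][j] += A[i][k] * B[k][j];
  return E;  // this is C
}
```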
16.9.4.2. Symmetric matrix-matrix product [linalg.algs.blas3.symm]
[Note:
These functions correspond to the BLAS function xSYMM.
Unlike the symmetric rank-1 or rank-k update functions, these functions assume
that the input matrix -- not the output matrix -- is symmetric.
--end note]
The following requirements apply to all functions in this section.
- Requires:
  - A.extent(0) equals A.extent(1),
  - C.extent(0) equals E.extent(0) (for overloads that take a matrix E), and
  - C.extent(1) equals E.extent(1) (likewise).
- Constraints:
  - in_matrix_1_t either has unique layout, or layout_blas_packed layout;
  - in_matrix_2_t, in_matrix_3_t, and out_matrix_t have unique layout;
  - if in_matrix_1_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument; and
  - A.rank() equals 2, B.rank() equals 2, C.rank() equals 2, and E.rank() equals 2.
- Mandates:
  - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) shall equal A.static_extent(1); and
  - for all r in 0, ..., C.rank() - 1, if neither C.static_extent(r) nor E.static_extent(r) equals dynamic_extent, then C.static_extent(r) shall equal E.static_extent(r).
- Remarks:
  - The functions will only access the triangle of A specified by the Triangle argument t, and will assume for indices i,j outside that triangle that A(j,i) equals A(i,j).
  - C and E (for overloads that take a matrix E) may refer to the same matrix.
The following requirements apply to all overloads of symmetric_matrix_left_product.
- Requires:
  - A.extent(1) equals B.extent(0),
  - A.extent(0) equals C.extent(0), and
  - B.extent(1) equals C.extent(1).
- Mandates:
  - If neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) shall equal B.static_extent(0);
  - if neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) shall equal C.static_extent(0); and
  - if neither B.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then B.static_extent(1) shall equal C.static_extent(1).
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in A.extent(0) times B.extent(1) times A.extent(1).
The following requirements apply to all overloads of symmetric_matrix_right_product.
- Requires:
  - B.extent(1) equals A.extent(0),
  - B.extent(0) equals C.extent(0), and
  - A.extent(1) equals C.extent(1).
- Mandates:
  - If neither B.static_extent(1) nor A.static_extent(0) equals dynamic_extent, then B.static_extent(1) shall equal A.static_extent(0);
  - if neither B.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then B.static_extent(0) shall equal C.static_extent(0); and
  - if neither A.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then A.static_extent(1) shall equal C.static_extent(1).
- Complexity: The count of mdspan array accesses and arithmetic operations is linear in B.extent(0) times A.extent(1) times B.extent(1).
16.9.4.2.1. Overwriting symmetric matrix-matrix left product [linalg.algs.blas3.symm.ov.left]
template<class in_matrix_1_t, class Triangle, class in_matrix_2_t, class out_matrix_t>
void symmetric_matrix_left_product(in_matrix_1_t A, Triangle t, in_matrix_2_t B, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class Triangle, class in_matrix_2_t, class out_matrix_t>
void symmetric_matrix_left_product(ExecutionPolicy&& exec, in_matrix_1_t A, Triangle t, in_matrix_2_t B, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the product of the matrices A and B.
16.9.4.2.2. Overwriting symmetric matrix-matrix right product [linalg.algs.blas3.symm.ov.right]
template<class in_matrix_1_t, class Triangle, class in_matrix_2_t, class out_matrix_t>
void symmetric_matrix_right_product(in_matrix_1_t A, Triangle t, in_matrix_2_t B, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class Triangle, class in_matrix_2_t, class out_matrix_t>
void symmetric_matrix_right_product(ExecutionPolicy&& exec, in_matrix_1_t A, Triangle t, in_matrix_2_t B, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the product of the matrices B and A.
16.9.4.2.3. Updating symmetric matrix-matrix left product [linalg.algs.blas3.symm.up.left]
template<class in_matrix_1_t, class Triangle, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void symmetric_matrix_left_product(in_matrix_1_t A, Triangle t, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class Triangle, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void symmetric_matrix_left_product(ExecutionPolicy&& exec, in_matrix_1_t A, Triangle t, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices A and B.
16.9.4.2.4. Updating symmetric matrix-matrix right product [linalg.algs.blas3.symm.up.right]
template<class in_matrix_1_t, class Triangle, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void symmetric_matrix_right_product(in_matrix_1_t A, Triangle t, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);

template<class ExecutionPolicy, class in_matrix_1_t, class Triangle, class in_matrix_2_t, class in_matrix_3_t, class out_matrix_t>
void symmetric_matrix_right_product(ExecutionPolicy&& exec, in_matrix_1_t A, Triangle t, in_matrix_2_t B, in_matrix_3_t E, out_matrix_t C);
- Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices B and A.
16.9.4.3. Hermitian matrix-matrix product [linalg.algs.blas3.hemm]
[Note:
These functions correspond to the BLAS function xHEMM.
Unlike the Hermitian rank-1 or rank-k update functions, these functions assume that the input matrix -- not the output matrix -- is Hermitian.
--end note]
The following requirements apply to all functions in this section.
- 
     Requires: 
     - A.extent(0) equals A.extent(1). 
     - C.extent(0) equals E.extent(0). 
     - C.extent(1) equals E.extent(1). 
- 
     Constraints: 
     - in_matrix_1_t either has unique layout, or layout_blas_packed layout. 
     - in_matrix_2_t, in_matrix_3_t, and out_matrix_t have unique layout. 
     - If in_matrix_1_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
     - A.rank(), B.rank(), C.rank(), and E.rank() each equal 2. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) equals A.static_extent(1). 
     - For all r in 0, 1, ..., C.rank() - 1, if neither C.static_extent(r) nor E.static_extent(r) equals dynamic_extent, then C.static_extent(r) equals E.static_extent(r). 
- 
     Remarks: 
     - The functions will only access the triangle of A specified by the Triangle argument t. 
     - If in_matrix_1_t::element_type is complex<RA> for some RA, then for all i,j in the domain of A, the functions will interpret A(j,i) as conj(A(i,j)); otherwise, they will interpret A(j,i) as A(i,j). 
     - C and E may refer to the same matrix. 
The following requirements apply to all overloads of hermitian_matrix_left_product.
- 
     Requires: 
     - A.extent(1) equals B.extent(0). 
     - A.extent(0) equals C.extent(0). 
     - B.extent(1) equals C.extent(1). 
- 
     Mandates: 
     - If neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(0). 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither B.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then B.static_extent(1) equals C.static_extent(1). 
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( A.extent(0) × B.extent(1) × A.extent(1) ). 
The following requirements apply to all overloads of hermitian_matrix_right_product.
- 
     Requires: 
     - B.extent(1) equals A.extent(0). 
     - B.extent(0) equals C.extent(0). 
     - A.extent(1) equals C.extent(1). 
- 
     Mandates: 
     - If neither B.static_extent(1) nor A.static_extent(0) equals dynamic_extent, then B.static_extent(1) equals A.static_extent(0). 
     - If neither B.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then B.static_extent(0) equals C.static_extent(0). 
     - If neither A.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then A.static_extent(1) equals C.static_extent(1). 
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( B.extent(0) × A.extent(1) × B.extent(1) ). 
16.9.4.3.1. Overwriting Hermitian matrix-matrix left product [linalg.algs.blas3.hemm.ov.left]
template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the product of the matrices A and B. 
16.9.4.3.2. Overwriting Hermitian matrix-matrix right product [linalg.algs.blas3.hemm.ov.right]
template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class out_matrix_t > void hermitian_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the product of the matrices B and A. 
16.9.4.3.3. Updating Hermitian matrix-matrix left product [linalg.algs.blas3.hemm.up.left]
template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_left_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices A and B. 
16.9.4.3.4. Updating Hermitian matrix-matrix right product [linalg.algs.blas3.hemm.up.right]
template < class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_right_product ( in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void hermitian_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices B and A. 
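The distinguishing rule for the Hermitian products is the interpretation of the unstored triangle: for complex element types, A(j,i) is read as conj(A(i,j)). The plain-C++ sketch below illustrates that rule for C = A·B with only the upper triangle of A valid; row-major `std::vector` storage and the function name are assumptions of this illustration, not part of the proposed interface.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Semantic sketch of the Hermitian left product C = A * B.
// A is n x n with only the upper triangle (i <= j) valid; B is n x m.
// The unstored triangle is interpreted as the conjugate of the stored one.
std::vector<cplx> hermitian_left_product(
    const std::vector<cplx>& A, std::size_t n,
    const std::vector<cplx>& B, std::size_t m)
{
  auto herm = [&](std::size_t i, std::size_t j) {
    // A(j,i) == conj(A(i,j)): mirror the stored triangle with conjugation.
    return i <= j ? A[i * n + j] : std::conj(A[j * n + i]);
  };
  std::vector<cplx> C(n * m);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) {
      cplx sum{};
      for (std::size_t k = 0; k < n; ++k) {
        sum += herm(i, k) * B[k * m + j];
      }
      C[i * m + j] = sum;
    }
  }
  return C;
}
```

Multiplying by the identity recovers the full Hermitian matrix, including the conjugated lower triangle that was never stored.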
16.9.4.4. Triangular matrix-matrix product [linalg.algs.blas3.trmm]
[Note: These functions correspond to the BLAS functions xTRMM. --end note]
The following requirements apply to all functions in this section.
- 
     Requires: 
     - A.extent(0) equals A.extent(1). 
     - C.extent(0) equals E.extent(0). 
     - C.extent(1) equals E.extent(1). 
- 
     Constraints: 
     - in_matrix_1_t either has unique layout, or layout_blas_packed layout. 
     - in_matrix_2_t, in_matrix_3_t, out_matrix_t, and inout_matrix_t have unique layout. 
     - If in_matrix_1_t has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
     - A.rank(), B.rank(), C.rank(), and E.rank() each equal 2. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) equals A.static_extent(1). 
     - For all r in 0, 1, ..., C.rank() - 1, if neither C.static_extent(r) nor E.static_extent(r) equals dynamic_extent, then C.static_extent(r) equals E.static_extent(r). 
- 
     Remarks: 
     - The functions will only access the triangle of A specified by the Triangle argument t. 
     - If the DiagonalStorage template argument has type implicit_unit_diagonal_t, then the functions will not access the diagonal of A, and will assume that the diagonal elements of A all equal one (the multiplicative identity of the element_type). 
     - C and E may refer to the same matrix. 
The following requirements apply to all overloads of triangular_matrix_left_product.
- 
     Requires: 
     - A.extent(1) equals B.extent(0). 
     - A.extent(0) equals C.extent(0). 
     - B.extent(1) equals C.extent(1). 
- 
     Mandates: 
     - If neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(0). 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither B.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then B.static_extent(1) equals C.static_extent(1). 
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( A.extent(0) × B.extent(1) × A.extent(1) ). 
The following requirements apply to all overloads of triangular_matrix_right_product.
- 
     Requires: 
     - B.extent(1) equals A.extent(0). 
     - B.extent(0) equals C.extent(0). 
     - A.extent(1) equals C.extent(1). 
- 
     Mandates: 
     - If neither B.static_extent(1) nor A.static_extent(0) equals dynamic_extent, then B.static_extent(1) equals A.static_extent(0). 
     - If neither B.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then B.static_extent(0) equals C.static_extent(0). 
     - If neither A.static_extent(1) nor C.static_extent(1) equals dynamic_extent, then A.static_extent(1) equals C.static_extent(1). 
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( B.extent(0) × A.extent(1) × B.extent(1) ). 
16.9.4.4.1. Overwriting triangular matrix-matrix left product [linalg.algs.blas3.trmm.ov.left]
Not-in-place overwriting triangular matrix-matrix left product
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the product of the matrices A and B. 
In-place overwriting triangular matrix-matrix left product
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); 
For both overloads:
- 
     Requires: A.extent(1) equals C.extent(0). 
- 
     Mandates: If neither A.static_extent(1) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(1) equals C.static_extent(0). 
- 
     Effects: Overwrites C with the product of A and C (on input). 
16.9.4.4.2. Overwriting triangular matrix-matrix right product [linalg.algs.blas3.trmm.ov.right]
Not-in-place overwriting triangular matrix-matrix right product
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the product of the matrices B and A. 
In-place overwriting triangular matrix-matrix right product
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t C ); 
For both overloads:
- 
     Requires: C.extent(1) equals A.extent(0). 
- 
     Mandates: If neither C.static_extent(1) nor A.static_extent(0) equals dynamic_extent, then C.static_extent(1) equals A.static_extent(0). 
- 
     Effects: Overwrites C with the product of C (on input) and A. 
16.9.4.4.3. Updating triangular matrix-matrix left product [linalg.algs.blas3.trmm.up.left]
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_left_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_left_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices A and B. 
16.9.4.4.4. Updating triangular matrix-matrix right product [linalg.algs.blas3.trmm.up.right]
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_right_product ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class in_matrix_3_t , class out_matrix_t > void triangular_matrix_right_product ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , in_matrix_3_t E , out_matrix_t C ); 
For both overloads:
- 
     Effects: Assigns to the elements of the matrix C the elementwise sum of E and the product of the matrices B and A. 
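The implicit_unit_diagonal_t rule above is easy to get wrong, so here is a plain-C++ sketch of its semantics for the left product C = A·B with A upper triangular: the diagonal of A is never dereferenced and is treated as all ones. The row-major `std::vector` storage and the function name are assumptions of this illustration, not the proposed interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Semantic sketch of the triangular left product C = A * B with
// implicit unit diagonal.  A is n x n upper triangular; B is n x m.
// Diagonal and lower-triangle entries of A are never read.
std::vector<double> tri_left_product_unit_diag(
    const std::vector<double>& A, std::size_t n,
    const std::vector<double>& B, std::size_t m)
{
  std::vector<double> C(n * m);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) {
      double sum = B[i * m + j];                 // implicit A(i,i) == 1
      for (std::size_t k = i + 1; k < n; ++k) {  // strictly above the diagonal
        sum += A[i * n + k] * B[k * m + j];
      }
      C[i * m + j] = sum;
    }
  }
  return C;
}
```

Because the diagonal is never accessed, callers may leave it uninitialized, which is exactly what packed factorizations such as LU with unit-diagonal L rely on.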
16.9.4.5. Rank-k update of a symmetric or Hermitian matrix [linalg.alg.blas3.rank-k]
[Note: Users can achieve the effect of the BLAS' trans argument by applying transposed or conjugate_transposed to A. --end note]
- 
     Complexity: For all algorithms in this Clause, the count of mdspan array accesses and arithmetic operations is O( A.extent(0) × A.extent(1) × A.extent(0) ). 
16.9.4.5.1. Rank-k symmetric matrix update [linalg.alg.blas3.rank-k.syrk]
template < class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( ExecutionPolicy && exec , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_k_update ( ExecutionPolicy && exec , T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); 
[Note:
These functions correspond to the BLAS functions xSYRK.
They take an optional scaling factor alpha.
--end note]
The following requirements apply to all overloads, both those with and those without the scaling factor alpha.
- 
     Requires: 
     - A.extent(0) equals C.extent(0). 
     - C.extent(0) equals C.extent(1). 
- 
     Constraints: 
     - A.rank() equals 2 and C.rank() equals 2. 
     - C either has unique layout, or layout_blas_packed layout. 
     - If C has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither C.static_extent(0) nor C.static_extent(1) equals dynamic_extent, then C.static_extent(0) equals C.static_extent(1). 
- 
     Effects: 
     - Overloads without alpha assign to C on output the elementwise sum of C on input and the product of the matrix A and the transpose of A. 
     - Overloads with alpha assign to C on output the elementwise sum of C on input and alpha times the product of the matrix A and the transpose of A. 
- 
     Remarks: The functions will only access the triangle of C specified by the Triangle argument t, and for all i,j in the domain of C, will interpret C(j,i) as C(i,j). 
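The Remarks clause above means a rank-k update touches only one triangle of C. The sketch below shows C = C + alpha·A·Aᵀ writing only the upper triangle, in plain C++ rather than the proposed mdspan interface; the row-major storage and function name are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Semantic sketch of the rank-k symmetric update C = C + alpha * A * A^T,
// writing only the upper triangle of C.  A is n x k; C is n x n, row-major.
void sym_rank_k_update_upper(double alpha,
                             const std::vector<double>& A,
                             std::vector<double>& C,
                             std::size_t n, std::size_t k)
{
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i; j < n; ++j) {   // upper triangle only: j >= i
      double sum = 0.0;
      for (std::size_t l = 0; l < k; ++l) {
        sum += A[i * k + l] * A[j * k + l]; // (A * A^T)(i,j)
      }
      C[i * n + j] += alpha * sum;          // lower triangle is never written
    }
  }
}
```

Passing a negative alpha gives the downdate C = C - A·Aᵀ that the note about the optional scaling factor refers to, and that the Cholesky example in Section 17.1 uses.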
16.9.4.5.2. Rank-k Hermitian matrix update [linalg.alg.blas3.rank-k.herk]
template < class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( ExecutionPolicy && exec , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class T , class in_matrix_1_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_k_update ( ExecutionPolicy && exec , T alpha , in_matrix_1_t A , inout_matrix_t C , Triangle t ); 
[Note:
These functions correspond to the BLAS functions xHERK.
They take an optional scaling factor alpha.
--end note]
The following requirements apply to all overloads, both those with and those without the scaling factor alpha.
- 
     Requires: 
     - A.extent(0) equals C.extent(0). 
     - C.extent(0) equals C.extent(1). 
- 
     Constraints: 
     - A.rank() equals 2 and C.rank() equals 2. 
     - C either has unique layout, or layout_blas_packed layout. 
     - If C has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither C.static_extent(0) nor C.static_extent(1) equals dynamic_extent, then C.static_extent(0) equals C.static_extent(1). 
- 
     Effects: 
     - Overloads without alpha assign to C on output the elementwise sum of C on input and the product of the matrix A and the conjugate transpose of A. 
     - Overloads with alpha assign to C on output the elementwise sum of C on input and alpha times the product of the matrix A and the conjugate transpose of A. 
- 
     Remarks: The functions will only access the triangle of C specified by the Triangle argument t, and for all i,j in the domain of C, will interpret C(j,i) as conj(C(i,j)). 
16.9.4.6. Rank-2k update of a symmetric or Hermitian matrix [linalg.alg.blas3.rank2k]
[Note: Users can achieve the effect of the BLAS' trans argument by applying transposed or conjugate_transposed to A and B. --end note]
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( A.extent(0) × A.extent(1) × B.extent(1) ). 
16.9.4.6.1. Rank-2k symmetric matrix update [linalg.alg.blas3.rank2k.syr2k]
template < class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2k_update ( in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void symmetric_matrix_rank_2k_update ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); 
[Note: These functions correspond to the BLAS functions xSYR2K. --end note]
For both overloads:
- 
     Requires: 
     - A.extent(0) equals C.extent(0). 
     - B.extent(1) equals C.extent(0). 
     - C.extent(0) equals C.extent(1). 
- 
     Constraints: 
     - A.rank(), B.rank(), and C.rank() each equal 2. 
     - C either has unique layout, or layout_blas_packed layout. 
     - If C has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither B.static_extent(1) nor C.static_extent(0) equals dynamic_extent, then B.static_extent(1) equals C.static_extent(0). 
     - If neither C.static_extent(0) nor C.static_extent(1) equals dynamic_extent, then C.static_extent(0) equals C.static_extent(1). 
- 
     Effects: Assigns to C on output the elementwise sum of C on input, the product of the matrices A and the transpose of B, and the product of the matrices B and the transpose of A. 
- 
     Remarks: The functions will only access the triangle of C specified by the Triangle argument t, and for all i,j in the domain of C, will interpret C(j,i) as C(i,j). 
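The rank-2k update's Effects clause, C = C + A·Bᵀ + B·Aᵀ restricted to one triangle, can be sketched in plain C++ as follows. Row-major storage and the function name are assumptions of this illustration, not the proposed mdspan interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Semantic sketch of the rank-2k symmetric update C = C + A*B^T + B*A^T,
// writing only the upper triangle of C.  A and B are n x k; C is n x n.
void sym_rank_2k_update_upper(const std::vector<double>& A,
                              const std::vector<double>& B,
                              std::vector<double>& C,
                              std::size_t n, std::size_t k)
{
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i; j < n; ++j) {   // upper triangle only
      double sum = 0.0;
      for (std::size_t l = 0; l < k; ++l) {
        // (A*B^T)(i,j) + (B*A^T)(i,j); the sum is symmetric even though
        // neither term is on its own.
        sum += A[i * k + l] * B[j * k + l] + B[i * k + l] * A[j * k + l];
      }
      C[i * n + j] += sum;
    }
  }
}
```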
16.9.4.6.2. Rank-2k Hermitian matrix update [linalg.alg.blas3.rank2k.her2k]
template < class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2k_update ( in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); template < class ExecutionPolicy , class in_matrix_1_t , class in_matrix_2_t , class inout_matrix_t , class Triangle > void hermitian_matrix_rank_2k_update ( ExecutionPolicy && exec , in_matrix_1_t A , in_matrix_2_t B , inout_matrix_t C , Triangle t ); 
[Note: These functions correspond to the BLAS functions xHER2K. --end note]
For both overloads:
- 
     Requires: 
     - A.extent(0) equals C.extent(0). 
     - B.extent(1) equals C.extent(0). 
     - C.extent(0) equals C.extent(1). 
- 
     Constraints: 
     - A.rank(), B.rank(), and C.rank() each equal 2. 
     - C either has unique layout, or layout_blas_packed layout. 
     - If C has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument. 
- 
     Mandates: 
     - If neither A.static_extent(0) nor C.static_extent(0) equals dynamic_extent, then A.static_extent(0) equals C.static_extent(0). 
     - If neither B.static_extent(1) nor C.static_extent(0) equals dynamic_extent, then B.static_extent(1) equals C.static_extent(0). 
     - If neither C.static_extent(0) nor C.static_extent(1) equals dynamic_extent, then C.static_extent(0) equals C.static_extent(1). 
- 
     Effects: Assigns to C on output the elementwise sum of C on input, the product of the matrices A and the conjugate transpose of B, and the product of the matrices B and the conjugate transpose of A. 
- 
     Remarks: 
     - The functions will only access the triangle of C specified by the Triangle argument t. 
     - If inout_matrix_t::element_type is complex<RC> for some RC, then for all i,j in the domain of C, the functions will interpret C(j,i) as conj(C(i,j)); otherwise, they will interpret C(j,i) as C(i,j). 
16.9.4.7. Solve multiple triangular linear systems [linalg.alg.blas3.trsm]
[Note: These functions correspond to the BLAS functions xTRSM. --end note]
The following requirements apply to all functions in this section.
- 
     Requires: 
     - For all r in 0, 1, ..., B.rank() - 1, X.extent(r) equals B.extent(r). 
     - A.extent(0) equals A.extent(1). 
- 
     Constraints: 
     - A.rank() equals 2 and B.rank() equals 2. 
     - X.rank() equals 2. 
     - in_matrix_1_t either has unique layout, or layout_blas_packed layout. 
     - in_matrix_2_t has unique layout. 
     - out_matrix_t has unique layout. 
     - inout_matrix_t has unique layout. 
     - For all r,j in the domain of X and B, the expression X(r,j) = B(r,j) is well formed. 
     - If the DiagonalStorage template argument has type explicit_diagonal_t, then for all i,j in the domain of X, the expression X(i,j) /= A(i,i) is well formed. 
- 
     Mandates: 
     - For all r in 0, 1, ..., X.rank() - 1, if neither X.static_extent(r) nor B.static_extent(r) equals dynamic_extent, then X.static_extent(r) equals B.static_extent(r). 
- 
     - If neither A.static_extent(0) nor A.static_extent(1) equals dynamic_extent, then A.static_extent(0) equals A.static_extent(1). 
- 
     Remarks: 
     - The functions will only access the triangle of A specified by the Triangle argument t. 
     - If the DiagonalStorage template argument has type implicit_unit_diagonal_t, then the functions will not access the diagonal of A, and will assume that the diagonal elements of A all equal one (the multiplicative identity of the element_type). 
16.9.4.7.1. Solve multiple triangular linear systems with triangular matrix on the left [linalg.alg.blas3.trsm.left]
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( A.extent(0) × A.extent(1) × X.extent(1) ). 
Not-in-place left solve of multiple triangular systems
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_left_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_left_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); 
For both overloads:
- 
     Requires: A.extent(1) equals B.extent(0). 
- 
     Mandates: If neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(0). 
- 
     Effects: Assigns to the elements of X the solution of the triangular linear system AX = B. 
In-place left solve of multiple triangular systems
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_left_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_left_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); 
[Note:
This algorithm makes it possible to compute factorizations like Cholesky and LU in place.
Performing triangular solve in place hinders parallelization.
However, other ExecutionPolicy-specific optimizations, such as vectorization, are still possible.
--end note]
For both overloads:
- 
     Requires: A.extent(1) equals B.extent(0). 
- 
     Mandates: If neither A.static_extent(1) nor B.static_extent(0) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(0). 
- 
     Effects: Overwrites B with the solution X of the triangular linear system AX = B. 
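The in-place left solve above amounts to forward (or back) substitution applied column by column of B. Here is a plain-C++ sketch for a lower triangular A with explicit diagonal; row-major storage and the function name are assumptions of this illustration, not the proposed mdspan interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Semantic sketch of the in-place left solve: overwrite B (n x m) with the
// solution X of A X = B, where A (n x n) is lower triangular with an
// explicit diagonal.  Each column of B is an independent right-hand side.
void tri_left_solve_in_place(const std::vector<double>& A, std::size_t n,
                             std::vector<double>& B, std::size_t m)
{
  for (std::size_t j = 0; j < m; ++j) {       // each right-hand side
    for (std::size_t i = 0; i < n; ++i) {     // forward substitution
      double sum = B[i * m + j];
      for (std::size_t k = 0; k < i; ++k) {
        sum -= A[i * n + k] * B[k * m + j];   // subtract already-solved terms
      }
      B[i * m + j] = sum / A[i * n + i];      // explicit_diagonal case
    }
  }
}
```

The sequential dependence of row i on rows 0..i-1 is what the note means by in-place solves hindering parallelization; the m right-hand sides, by contrast, are independent.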
16.9.4.7.2. Solve multiple triangular linear systems with triangular matrix on the right [linalg.alg.blas3.trsm.right]
- 
     Complexity: The count of mdspan array accesses and arithmetic operations is O( B.extent(0) × B.extent(1) × A.extent(1) ). 
Not-in-place right solve of multiple triangular systems
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_right_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class in_matrix_2_t , class out_matrix_t > void triangular_matrix_matrix_right_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , in_matrix_2_t B , out_matrix_t X ); 
For both overloads:
- 
     Requires: A.extent(1) equals B.extent(1). 
- 
     Mandates: If neither A.static_extent(1) nor B.static_extent(1) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(1). 
- 
     Effects: Assigns to the elements of X the solution of the triangular linear system XA = B. 
In-place right solve of multiple triangular systems
template < class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_right_solve ( in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); template < class ExecutionPolicy , class in_matrix_1_t , class Triangle , class DiagonalStorage , class inout_matrix_t > void triangular_matrix_matrix_right_solve ( ExecutionPolicy && exec , in_matrix_1_t A , Triangle t , DiagonalStorage d , inout_matrix_t B ); 
[Note:
This algorithm makes it possible to compute factorizations like Cholesky and LU in place.
Performing triangular solve in place hinders parallelization.
However, other ExecutionPolicy-specific optimizations, such as vectorization, are still possible.
--end note]
For both overloads:
- 
     Requires: A.extent(1) equals B.extent(1). 
- 
     Mandates: If neither A.static_extent(1) nor B.static_extent(1) equals dynamic_extent, then A.static_extent(1) equals B.static_extent(1). 
- 
     Effects: Overwrites B with the solution X of the triangular linear system XA = B. 
17. Examples
17.1. Cholesky factorization
This example shows how to compute the Cholesky factorization of a real
symmetric positive definite matrix A stored as an mdspan with a unique, non-packed layout. The algorithm imitates DPOTRF2 in LAPACK 3.9.0.
#include <linalg>
#include <cmath>

template<class inout_matrix_t, class Triangle>
int cholesky_factor(inout_matrix_t A, Triangle t)
{
  using element_type = typename inout_matrix_t::element_type;
  constexpr element_type ZERO {};
  constexpr element_type ONE (1.0);
  const auto n = A.extent(0);

  if (n == 0) {
    return 0;
  }
  else if (n == 1) {
    if (A(0,0) <= ZERO || isnan(A(0,0))) {
      return 1;
    }
    A(0,0) = sqrt(A(0,0));
  }
  else {
    // Partition A into [A11, A12,
    //                   A21, A22],
    // where A21 is the transpose of A12.
    const extents<>::size_type n1 = n / 2;
    const extents<>::size_type n2 = n - n1;
    auto A11 = submdspan(A, pair{0, n1}, pair{0, n1});
    auto A22 = submdspan(A, pair{n1, n}, pair{n1, n});

    // Factor A11
    const int info1 = cholesky_factor(A11, t);
    if (info1 != 0) {
      return info1;
    }

    using std::linalg::symmetric_matrix_rank_k_update;
    using std::linalg::transposed;
    if constexpr (std::is_same_v<Triangle, upper_triangle_t>) {
      // Compute the Cholesky factorization A = U^T * U.
      //
      // Update and scale A12.
      auto A12 = submdspan(A, pair{0, n1}, pair{n1, n});
      using std::linalg::triangular_matrix_matrix_left_solve;
      triangular_matrix_matrix_left_solve(transposed(A11),
        upper_triangle, explicit_diagonal, A12);
      // A22 = A22 - A12^T * A12
      symmetric_matrix_rank_k_update(-ONE, transposed(A12), A22, t);
    }
    else {
      // Compute the Cholesky factorization A = L * L^T.
      //
      // Update and scale A21.
      auto A21 = submdspan(A, pair{n1, n}, pair{0, n1});
      using std::linalg::triangular_matrix_matrix_right_solve;
      triangular_matrix_matrix_right_solve(transposed(A11),
        lower_triangle, explicit_diagonal, A21);
      // A22 = A22 - A21 * A21^T
      symmetric_matrix_rank_k_update(-ONE, A21, A22, t);
    }

    // Factor A22
    const int info2 = cholesky_factor(A22, t);
    if (info2 != 0) {
      return info2 + n1;
    }
  }
  return 0; // success
}
17.2. Solve linear system using Cholesky factorization
This example shows how to solve a symmetric positive definite linear
system Ax=b, using the Cholesky factorization computed in the previous
example in-place in the matrix A.
template<class in_matrix_t, class Triangle, class in_vector_t, class out_vector_t>
void cholesky_solve(in_matrix_t A, Triangle t, in_vector_t b, out_vector_t x)
{
  using std::linalg::transposed;
  using std::linalg::triangular_matrix_vector_solve;

  if constexpr (std::is_same_v<Triangle, upper_triangle_t>) {
    // Solve Ax=b where A = U^T U.
    //
    // Solve U^T c = b, using x to store c.
    triangular_matrix_vector_solve(transposed(A), t, explicit_diagonal, b, x);
    // Solve U x = c, overwriting x with the result.
    triangular_matrix_vector_solve(A, t, explicit_diagonal, x);
  }
  else {
    // Solve Ax=b where A = L L^T.
    //
    // Solve L c = b, using x to store c.
    triangular_matrix_vector_solve(A, t, explicit_diagonal, b, x);
    // Solve L^T x = c, overwriting x with the result.
    triangular_matrix_vector_solve(transposed(A), t, explicit_diagonal, x);
  }
}
17.3. Compute QR factorization of a tall skinny matrix
This example shows how to compute the QR factorization of a "tall and
skinny" matrix A, using a cache-blocked algorithm based on rank-k symmetric matrix update and Cholesky factorization.
// Compute QR factorization A = Q R, with A storing Q.
template<class inout_matrix_t, class out_matrix_t>
int cholesky_tsqr_one_step(
  inout_matrix_t A, // A on input, Q on output
  out_matrix_t R)
{
  // One might use cache size, sizeof(element_type), and A.extent(1)
  // to pick the number of rows per block.  For now, we just pick
  // some constant.
  constexpr extents<>::size_type max_num_rows_per_block = 500;

  using R_element_type = typename out_matrix_t::element_type;
  constexpr R_element_type ZERO {};
  for (extents<>::size_type i = 0; i < R.extent(0); ++i) {
    for (extents<>::size_type j = 0; j < R.extent(1); ++j) {
      R(i,j) = ZERO;
    }
  }

  // Cache-blocked version of R = R + A^T * A.
  auto A_rest = A;
  while (A_rest.extent(0) > 0) {
    const extents<>::size_type num_rows_per_block =
      min(A_rest.extent(0), max_num_rows_per_block);
    auto A_cur = submdspan(A_rest,
      pair{0, num_rows_per_block}, full_extent);
    A_rest = submdspan(A_rest,
      pair{num_rows_per_block, A_rest.extent(0)}, full_extent);
    // R = R + A_cur^T * A_cur
    using std::linalg::symmetric_matrix_rank_k_update;
    using std::linalg::transposed;
    symmetric_matrix_rank_k_update(transposed(A_cur), R, upper_triangle);
  }

  const int info = cholesky_factor(R, upper_triangle);
  if (info != 0) {
    return info;
  }
  using std::linalg::triangular_matrix_matrix_left_solve;
  triangular_matrix_matrix_left_solve(R, upper_triangle,
    explicit_diagonal, A);
  return info;
}

// Compute QR factorization A = Q R.  Use R_tmp as temporary R factor
// storage for iterative refinement.
template<class in_matrix_t, class out_matrix_1_t,
         class out_matrix_2_t, class out_matrix_3_t>
int cholesky_tsqr(
  in_matrix_t A,
  out_matrix_1_t Q,
  out_matrix_2_t R_tmp,
  out_matrix_3_t R)
{
  assert(R.extent(0) == R.extent(1));
  assert(A.extent(1) == R.extent(0));
  assert(R_tmp.extent(0) == R_tmp.extent(1));
  assert(A.extent(0) == Q.extent(0));
  assert(A.extent(1) == Q.extent(1));

  copy(A, Q);
  const int info1 = cholesky_tsqr_one_step(Q, R);
  if (info1 != 0) {
    return info1;
  }
  // Use one step of iterative refinement to improve accuracy.
  const int info2 = cholesky_tsqr_one_step(Q, R_tmp);
  if (info2 != 0) {
    return info2;
  }
  // R = R_tmp * R
  using std::linalg::triangular_matrix_left_product;
  triangular_matrix_left_product(R_tmp, upper_triangle, explicit_diagonal, R);
  return 0;
}