1. Introduction
This paper proposes making std::error_code
formattable using the formatting
facility introduced in C++20 (std::format)
and fixes encoding issues in the
underlying API ([LWG4156]).
2. Changes since R3
- Added LEWG poll results for R3.
- Fixed a typo in the wording.
- Fixed wording for the debug output.
- Clarified why the format specifier for value is not provided.
3. Changes since R2
- Added a reference to [P2930] and how it differs from the current proposal.
4. Changes since R1
- Added a debug format to avoid ambiguity when formatting error codes in maps.
- Added SG16 poll results.
5. Changes since R0
- Changed the title from "Formatting of std::error_code" to "Fix encoding issues and add a formatter for std::error_code" to reflect the fact that the paper also fixes [LWG4156].
- Specified that error_category::name() returns a string in the ordinary literal encoding per SG16 feedback.
- Made transcoding in error_category::message() implementation-defined if the literal encoding is not UTF-8 per SG16 feedback and for consistency with other similar cases in the standard.
6. Polls
LEWG poll results for R3:
POLL: P3395 should explore format specifier support to define which information (error number/category/message) to format.
| SF | F | N | A | SA |
| 1 | 9 | 4 | 2 | 0 |
Outcome: Consensus in favour
SG16 poll results for R0:
POLL: Forward P3395R0 to LEWG amended to specify an encoding for
name() and for transcoding to be to UTF-8 if that
matches the ordinary literal encoding and to an implementation-defined encoding
otherwise.
| SF | F | N | A | SA |
| 1 | 6 | 0 | 0 | 0 |
Outcome: Strong consensus
7. Motivation
std::error_code has a rudimentary
ostream inserter. For example:
    std::error_code ec;
    auto size = std::filesystem::file_size("nonexistent", ec);
    std::cout << ec;
This works and prints
generic:2.
However, the following code doesn’t compile:
    std::print("{}\n", ec);
Unfortunately, the existing inserter has several issues, such as I/O manipulators applying only to the category name rather than the entire error code, resulting in confusing output:
    std::cout << std::left << std::setw(12) << ec;
This prints:
    generic     :2
Additionally, it doesn’t allow formatting the error message and introduces potential encoding issues, as the encoding of the category name is unspecified.
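The width problem can be reproduced without a filesystem call. The sketch below is illustrative only: stream_directly and stream_as_whole are hypothetical helper names, and the error code is constructed by hand with value 2 in the generic category. The first helper shows the behaviour criticised above; the second shows a common workaround of rendering to a string first.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <system_error>

// Streams ec directly. The width set by setw is consumed by the first
// insertion inside operator<<, which is the category name.
std::string stream_directly(const std::error_code& ec, int width) {
    std::ostringstream os;
    os << std::left << std::setw(width) << ec;
    return os.str();
}

// Workaround: render the error code to a string first so that the width
// applies to the whole "category:value" text.
std::string stream_as_whole(const std::error_code& ec, int width) {
    std::ostringstream text;
    text << ec;
    std::ostringstream os;
    os << std::left << std::setw(width) << text.str();
    return os.str();
}
```

With std::error_code ec(2, std::generic_category()), stream_directly(ec, 12) pads only the category name, while stream_as_whole(ec, 12) pads the complete text.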
A specifier for an error code’s value is intentionally not provided because it is of limited use without the associated category information. Moreover, the value can be easily accessed and formatted using other means, for example:
    std::print("{}\n", ec.value());
This functionality is not currently provided by {fmt}, and over several years of usage, there have been no requests to add it. However, if sufficient demand emerges, it could be considered for future inclusion.
8. Proposal
This paper proposes adding a formatter
specialization for std::error_code
to address the problems discussed in the previous section.
The default format will produce the same output as the
ostream inserter:
    std::print("{}\n", ec);
Output:
generic:2
It will correctly handle width and alignment:
    std::print("[{:>12}]\n", ec);
Output:
    [   generic:2]
Additionally, it will allow formatting the error message:
    std::print("{:s}\n", ec);
Output:
No such file or directory
(The actual message depends on the platform.)
The main challenge lies in the standard’s lack of specification for the
encodings of strings returned by error_category::name()
and error_code::message() / error_category::message()
([syserr.errcat.virtuals]):
    virtual const char* name() const noexcept = 0;
Returns: A string naming the error category.
    virtual string message(int ev) const = 0;
Returns: A string that describes the error condition denoted by ev.
In practice, implementations typically define category names as string literals, meaning they are in the ordinary literal encoding.
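To make the two virtual functions concrete, here is a minimal sketch of a user-defined category. The names demo_category and demo_error_category and the message texts are hypothetical and chosen only for illustration. The category name is a string literal, as is typical in practice, so it is in the ordinary literal encoding; nothing in the current standard says which encoding the returned message must use.

```cpp
#include <string>
#include <system_error>

// Hypothetical error category used only to illustrate the API.
class demo_category final : public std::error_category {
public:
    // Category names are typically string literals.
    const char* name() const noexcept override { return "demo"; }

    // The encoding of the returned message is unspecified today.
    std::string message(int ev) const override {
        switch (ev) {
            case 1:  return "resource missing";
            default: return "unknown demo error";
        }
    }
};

// Categories are compared by address, so a single instance is shared.
const std::error_category& demo_error_category() {
    static demo_category instance;
    return instance;
}
```

An error code built as std::error_code(1, demo_error_category()) reports the name and message defined above through the generic error_code interface.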
However, there is significant divergence in message encodings. libc++ and
libstdc++ use strerror / strerror_r for the generic category, which is in the C
(not "C") locale encoding, but disagree on the encoding for the system category:
libstdc++ uses the Active Code Page (ACP) while libc++ again uses
strerror / C locale on Windows. Microsoft STL uses a table of string literals in the
ordinary literal encoding for the generic category and ACP for the system
category.
The following table summarizes the differences:
| | libstdc++ | libc++ | Microsoft STL |
| POSIX | strerror | strerror | N/A |
| Windows | strerror / ACP | strerror | ordinary literals / ACP |
None of this is usable in a portable way through the generic
error_category API because the encodings can be, and often are, different.
To address this, the proposal suggests using the C locale encoding (execution
character set), which is already employed in most cases and aligns with
underlying system APIs. Microsoft STL’s implementation has a number of bugs in
error_category::message() ([MSSTL-3254], [MSSTL-4711]) and will
likely need to change anyway. This also resolves [LWG4156].
An alternative approach could involve communicating the encoding from
error_category. However, this introduces ABI challenges and complicates usage
compared to adopting a single encoding.
9. Previous work
A formatter for std::error_code
was proposed as part of [P2930], which
has more formatting options for the numeric code but doesn’t try to address
encoding issues or provide a debug format.
10. Wording
Add to "Header <system_error> synopsis" [system.error.syn]:
    // [system.error.fmt], formatter
    template<class charT> struct formatter<error_code, charT>;
Add a new section "Formatting" [system.error.fmt] under "Class
error_code" [syserr.errcode]:
    template<class charT> struct formatter<error_code, charT> {
      constexpr void set_debug_format();

      constexpr typename basic_format_parse_context<charT>::iterator
        parse(basic_format_parse_context<charT>& ctx);

      template<class FormatContext>
        typename FormatContext::iterator
          format(const error_code& ec, FormatContext& ctx) const;
    };
    constexpr void set_debug_format();
Effects: Modifies the state of the formatter
to be as if the error-code-format-spec parsed by the last call to
parse contained the ? option.
    constexpr typename basic_format_parse_context<charT>::iterator
      parse(basic_format_parse_context<charT>& ctx);
Effects: Parses the format specifier as an error-code-format-spec and stores the
parsed specifiers in *this.
error-code-format-spec:
    fill-and-align(opt) width(opt) ?(opt) s(opt)
where the productions fill-and-align and width are described in [format.string].
Returns: An iterator past the end of the error-code-format-spec.
    template<class FormatContext>
      typename FormatContext::iterator
        format(const error_code& ec, FormatContext& ctx) const;
Effects: If the s option is used, then:
- If the ordinary literal encoding is UTF-8, then let msg be ec.message() transcoded to UTF-8 with maximal subparts of ill-formed subsequences substituted with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9 U+FFFD Substitution in Conversion.
- Otherwise, let msg be ec.message() transcoded to an implementation-defined encoding.
Otherwise, let msg be format("{}:{}", ec.category().name(), ec.value()).
Writes msg into ctx.out(), adjusted according to the error-code-format-spec. If the
? option is used then msg is formatted as
an escaped string ([format.string.escaped]).
Returns: An iterator past the end of the output range.
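As a non-normative illustration of the U+FFFD substitution referenced above, the following sketch covers only the simplest case, in which the message bytes are already nominally UTF-8 and only ill-formed subsequences need replacing. sanitize_utf8 is a hypothetical helper name; real transcoding from an arbitrary execution character set needs platform facilities that are not shown here.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Copies well-formed UTF-8 sequences and replaces each maximal subpart of
// an ill-formed subsequence with U+FFFD (encoded as EF BF BD).
std::string sanitize_utf8(std::string_view in) {
    static constexpr std::string_view replacement = "\xEF\xBF\xBD";
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;                 // expected sequence length
        unsigned char lo = 0x80, hi = 0xBF;  // valid range of the 2nd byte
        if (lead < 0x80) {
            len = 1;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            len = 2;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            len = 3;
            if (lead == 0xE0) lo = 0xA0;  // excludes overlong forms
            if (lead == 0xED) hi = 0x9F;  // excludes surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            len = 4;
            if (lead == 0xF0) lo = 0x90;  // excludes overlong forms
            if (lead == 0xF4) hi = 0x8F;  // excludes values above U+10FFFF
        }
        if (len == 0) {  // byte that can never start a sequence
            out += replacement;
            ++i;
            continue;
        }
        std::size_t n = 1;  // bytes of the sequence accepted so far
        while (n < len && i + n < in.size()) {
            const unsigned char c = static_cast<unsigned char>(in[i + n]);
            const unsigned char lower = (n == 1) ? lo : 0x80;
            const unsigned char upper = (n == 1) ? hi : 0xBF;
            if (c < lower || c > upper) break;
            ++n;
        }
        if (n == len) {
            out.append(in.substr(i, n));  // complete, well-formed sequence
        } else {
            out += replacement;           // one U+FFFD per maximal subpart
        }
        i += n;
    }
    return out;
}
```

A truncated three-byte sequence such as E2 82 becomes a single U+FFFD, whereas F0 80 80 becomes three, because 80 is not a valid second byte after F0 and each byte is then its own ill-formed subsequence.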
Modify [syserr.errcat.virtuals]:
    virtual const char* name() const noexcept = 0;
Returns: A string in the ordinary literal encoding naming the error category.
...
    virtual string message(int ev) const = 0;
Returns: A string of multibyte characters in the execution character
set that describes the error condition denoted by ev.
11. Implementation
The proposed formatter for std::error_code
has been implemented in the
open-source {fmt} library ([FMT]).