1. Changelog
1.1. Changes since R1
- Reword Abstract
- Clarify in § 2 Motivation that int8_t and uint8_t are aliases to signed char and unsigned char, respectively.
- Touch up wording
1.2. Changes since R0
- Add a study on possible impact
- Update the code example in Motivation
- Add note about char8_t in C
- Add note about [P0487R1]
2. Motivation
    #include <cstdint>
    #include <iostream>
    #include <format>

    // In the standard library:
    namespace std {
      using int8_t = signed char;
      using uint8_t = unsigned char;
    }

    int main() {
      // Prints:
      std::cout
        << static_cast<char>(48) << '\n'                        // 0
        << static_cast<signed char>(48) << '\n'                 // 0 (Proposing deprecation)
        << static_cast<unsigned char>(48) << '\n'               // 0 (Proposing deprecation)
        << static_cast<int8_t>(48) << '\n'                      // 0 (Proposing deprecation)
        << static_cast<uint8_t>(48) << '\n'                     // 0 (Proposing deprecation)
        << static_cast<short>(48) << '\n'                       // 48
        << std::format("{}\n", static_cast<char>(48))           // 0
        << std::format("{}\n", static_cast<signed char>(48))    // 48
        << std::format("{}\n", static_cast<unsigned char>(48))  // 48
        << std::format("{}\n", static_cast<int8_t>(48))         // 48
        << std::format("{}\n", static_cast<uint8_t>(48))        // 48
        << std::format("{}\n", static_cast<short>(48));         // 48
    }
There are overloads of operator<< for basic_ostream
that take an (un)signed char, and a const (un)signed char*.
In addition, there are overloads of operator>> for basic_istream
that take an (un)signed char& and an (un)signed char(&)[N].
These overloads are specified to have equivalent behavior to
the overloads without a signedness qualifier:
see [istream.extractors] and
[ostream.inserters.character].
This is surprising. Per [basic.fundamental] p1 and p2:
There are five standard signed integer types: "signed char", "short int", "int", "long int", and "long long int"... There may also be implementation-defined extended signed integer types. The standard and extended signed integer types are collectively called signed integer types.
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", "unsigned long int", and "unsigned long long int"... Likewise, for each of the extended signed integer types, there exists a corresponding extended unsigned integer type. The standard and extended unsigned integer types are collectively called unsigned integer types.
Thus, signed char and unsigned char should be treated as integers, not as characters.
This is highlighted by the fact that int8_t and uint8_t
are specified to be aliases to (un)signed integer types,
which are in practice going to be signed char and unsigned char.
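The alias relationship and its effect on stream output can be checked directly. The following is a minimal sketch, assuming an implementation where std::int8_t and std::uint8_t exist and alias signed char and unsigned char (common in practice, but not guaranteed by the standard) and an ASCII-compatible execution character set. The helper name `inserted` is invented for this illustration.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <type_traits>

// Assumption: the fixed-width 8-bit types alias the character-sized integer
// types, as they do on mainstream implementations.
static_assert(std::is_same_v<std::int8_t, signed char>);
static_assert(std::is_same_v<std::uint8_t, unsigned char>);
// They are integral types as far as the type traits are concerned.
static_assert(std::is_integral_v<std::int8_t>);
static_assert(std::is_integral_v<std::uint8_t>);

// Hypothetical helper: returns the text that operator<< writes for a value.
template <class T>
std::string inserted(T value) {
    std::ostringstream os;
    os << value;
    return os.str();
}
```

With these definitions, `inserted(std::int8_t{65})` yields the one-character string "A", whereas `inserted(short{65})` yields "65".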
Note:
The Solaris implementation is different, and defines int8_t to be char by default.
This is not conformant.
signed char and unsigned char are not character types.
Per [basic.fundamental] p11, since [P2314R4]:
The types char, wchar_t, char8_t, char16_t, and char32_t are collectively called character types.
signed char and unsigned char are included in the set of ordinary character types
and narrow character types ([basic.fundamental] p7),
but these definitions are used for specifying alignment, padding, and indeterminate values
([basic.indet]),
and are arguably not related to characters in the sense of pieces of text.
std::format has already taken a step in the right direction here,
by treating signed char and unsigned char as integers.
It’s specified to not give special treatment to these types,
but to use the standard definitions of (un)signed integer type
to determine whether a type is to be treated as an integer when formatting.
This paper proposes that these overloads in iostreams should be deprecated.
3. Impact
It’s difficult to find examples where this is the sought-after behavior, and would become deprecated with this change. These snippets aren’t easily greppable.
It’s easy to find counter-examples, however, where workarounds have to be employed to insert or extract signed chars or unsigned chars
as integers. Some of them can be found with isocpp.org codesearch,
although false positives there are very prevalent.
    /* ... */ << static_cast<int>(my_schar);
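The kind of workaround mentioned above can be sketched for both directions. This is only an illustration: the helper names `insert_as_int` and `extract_as_int` are invented here, and the range check on extraction is one possible policy, not something prescribed by this paper.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: formats a signed char as an integer by converting it
// to int first, so that the int inserter is selected.
inline std::string insert_as_int(signed char value) {
    std::ostringstream os;
    os << static_cast<int>(value);
    return os.str();
}

// Hypothetical helper: parses an integer and stores it in a signed char.
// Returns false if parsing fails or the value does not fit; in the latter
// case failbit is set on the stream and `out` is left unchanged.
inline bool extract_as_int(std::istream& in, signed char& out) {
    int tmp = 0;
    if (!(in >> tmp)) return false;
    if (tmp < std::numeric_limits<signed char>::min() ||
        tmp > std::numeric_limits<signed char>::max()) {
        in.setstate(std::ios_base::failbit);
        return false;
    }
    out = static_cast<signed char>(tmp);
    return true;
}
```

Both helpers make the intent explicit at the call site, at the cost of extra boilerplate that would be unnecessary for any other integer type.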
These overloads have existed since C++98.
The signature of operator>> for basic_istream was updated for C++20 in [P0487R1],
where these functions were changed to take a reference to an array (charT(&)[N]) instead of a pointer (charT*), for safety reasons.
No other changes to these overloads have been made in standard C++.
    // Changes in P0487, applied to C++20:
    // the pointer parameter was replaced by a reference to an array
    template<class charT, class traits, size_t N>
      basic_istream<charT, traits>&
        operator>>(basic_istream<charT, traits>&, charT(&)[N]);          // was: charT*
    template<class traits, size_t N>
      basic_istream<char, traits>&
        operator>>(basic_istream<char, traits>&, unsigned char(&)[N]);   // was: unsigned char*
    template<class traits, size_t N>
      basic_istream<char, traits>&
        operator>>(basic_istream<char, traits>&, signed char(&)[N]);     // was: signed char*
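The effect of the array-reference signature can be illustrated with a small helper. This is a sketch only: `read_word` is a name invented here, and the size-limiting behavior relies on a C++20 standard library (before C++20 the array would decay to a pointer and only the stream width would bound the extraction).

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one whitespace-delimited word into a buffer
// of N unsigned chars and returns what was stored.
// In C++20 the extractor sees the array bound N and stores at most
// min(width, N) - 1 characters (N - 1 if no width is set), plus a null.
template <std::size_t N>
std::string read_word(const std::string& input, int width = 0) {
    std::istringstream in(input);
    unsigned char buffer[N] = {};
    if (width > 0) in >> std::setw(width);  // optional explicit limit
    in >> buffer;
    return std::string(reinterpret_cast<const char*>(buffer));
}
```

Under C++20, `read_word<4>("abcdef")` returns "abc": three characters are stored, and the fourth element holds the terminating null.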
It should be noted that the C standard has defined char8_t to be an alias (typedef) to unsigned char.
In C++, char8_t is a distinct type with an underlying type of unsigned char.
3.1. Impact of removal
This paper proposes deprecating these overloads to discourage their use, but doesn’t propose removing them. However, since deprecation is often followed by removal, let’s see what that would eventually look like.
The operator<< taking (un)signed char is the simplest. If that overload is removed,
overload resolution will kick in to find a match, and the overload taking an int is chosen.
This would be a breaking change, but also the behavior we would want, and what is currently in std::format.
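The overload selection described above can be reproduced with a small stand-in type. This is a sketch under stated assumptions: `probe` is not a real stream, and it only models one member inserter taking int plus a non-member inserter taking char, with no overloads for signed char or unsigned char.

```cpp
#include <string>

// Hypothetical stand-in for a stream; records which overload was picked.
struct probe {
    std::string picked;
    // Models the member inserter taking int.
    probe& operator<<(int) { picked = "int"; return *this; }
};

// Models the non-member character inserter taking char.
inline probe& operator<<(probe& p, char) { p.picked = "char"; return p; }

// Reports the overload chosen when inserting a value of type T.
template <class T>
std::string picked_for(T value) {
    probe p;
    p << value;
    return p.picked;
}
```

For signed char and unsigned char, converting to int is an integral promotion, while converting to char is an integral conversion, so the int overload wins. A plain char argument is an exact match for the char overload.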
The other overloads proposed for deprecation in this paper aren’t as simple, since they take either references or pointers. Removal of these overloads would thus make code calling them ill-formed, as there would be no viable candidate in overload resolution.
For the operator>> taking (un)signed char&, the overload should either be removed or marked as = delete
(making code ill-formed), or its behavior should be changed to extract an integer instead of a character.
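The current behavior of these extractors can be observed with a short helper. This is an illustrative sketch; `first_extracted` is a name invented here, and the value -1 is used only as a failure marker for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads one value of type T from the text and returns
// it converted to int, or -1 if the extraction fails.
template <class T>
int first_extracted(const std::string& text) {
    std::istringstream in(text);
    T value{};
    if (!(in >> value)) return -1;
    return static_cast<int>(value);
}
```

Reading "65" into a signed char stores the single character '6', whereas reading the same text into a short parses the number 65.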
For the operator<< and operator>> overloads taking references to arrays or pointers to strings, these overloads should arguably be removed altogether.
Their behavior is currently defined essentially as reinterpret_cast-ing to a pointer to the stream character type,
and forwarding that pointer to the appropriate operator. When we’re treating (un)signed chars as integers, this behavior will no longer make sense.
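The equivalence with a reinterpret_cast can be demonstrated for the inserter taking a pointer. This is a minimal sketch; the helper names `via_unsigned` and `via_char` are invented for this illustration.

```cpp
#include <sstream>
#include <string>

// Inserts a null-terminated sequence of unsigned chars through the overload
// proposed for deprecation.
inline std::string via_unsigned(const unsigned char* s) {
    std::ostringstream os;
    os << s;
    return os.str();
}

// Inserts the same bytes after explicitly reinterpreting them as the
// stream's character type.
inline std::string via_char(const unsigned char* s) {
    std::ostringstream os;
    os << reinterpret_cast<const char*>(s);
    return os.str();
}
```

Both functions produce the same text for the same input, which is what the current specification describes.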
3.2. Impact study
To gauge the potential impact of this deprecation, the author tried building open source C++ code bases, using a patched version of libc++. Below are the instances where the overloads proposed for deprecation were used in these builds.
For reference, the author built tensorflow-lite and Tenzir using a custom version of libc++ where these overloads
were marked as [[deprecated]]. These code bases number ~1½ MLoC in total, with a large number of dependencies,
are reasonably modern, and use iostreams.
3.2.1. Abseil
The Abseil logging library seems to
treat signed char and unsigned char as character types.
This is likely because the syntax used by the library is very similar to that
used by iostreams:
    signed char my_schar = 65;
    LOG(ERROR) << my_schar;
    // Will output:
    // E0520 13:49:47.968463 123694 absl_log.cpp:8] A
    // where the message itself is the 'A' here ----^
Internally in the library, this is achieved with this overload set:
    // Abseil, version 20230802.1:
    // absl/log/internal/check_op.cc
    void MakeCheckOpValueString(std::ostream& os, const char v) {
      if (v >= 32 && v <= 128) {
        os << "'" << v << "'";
      } else {
        os << "char value " << int{v};
      }
    }
    void MakeCheckOpValueString(std::ostream& os, const signed char v) {
      if (v >= 32 && v <= 128) {
        os << "'" << v << "'";
      } else {
        os << "signed char value " << int{v};
      }
    }
    void MakeCheckOpValueString(std::ostream& os, const unsigned char v) {
      if (v >= 32 && v <= 128) {
        os << "'" << v << "'";
      } else {
        os << "unsigned char value " << int{v};
      }
    }
where signed char and unsigned char are explicitly and intentionally
treated similarly to char, and are passed to an underlying std::ostream.
Notably, the values between 32 and 128 are really treated as character values,
as they are printed within single quotes, and are cast to integers otherwise.
3.2.2. FlatBuffers
In the implementation of flatc (the FlatBuffers schema compiler), there’s the following
function template:
    // Flatbuffers, version 23.5.26:
    // src/annotated_binary_text_gen.cpp
    template <typename T>
    std::string ToString(T value) {
      if (std::is_floating_point<T>::value) {
        std::stringstream ss;
        ss << value;
        return ss.str();
      } else {
        return std::to_string(value);
      }
    }
where the proposed-for-deprecation overload of operator<< is instantiated
if T is signed char or unsigned char. The overloads are never actually called,
but because the above code is using if instead of if constexpr, the compiler
warns about usage, anyway.
The current behavior when using signed char or unsigned char is to use std::to_string,
which formats the value as an integer, as the int overload is picked
in overload resolution.
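For comparison, the same function template can be written with if constexpr, so that the branch not taken is discarded rather than instantiated. This is a sketch of that alternative, not a change made by FlatBuffers itself, and it assumes C++17 or later.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Variant using `if constexpr`: for non-floating-point T the stream branch is
// discarded, so operator<< is never instantiated for signed char or
// unsigned char.
template <typename T>
std::string ToString(T value) {
    if constexpr (std::is_floating_point<T>::value) {
        std::stringstream ss;
        ss << value;
        return ss.str();
    } else {
        return std::to_string(value);
    }
}
```

The observable behavior is unchanged: a signed char holding 65 is still formatted as "65" through std::to_string, and no diagnostic about the deprecated inserter would be triggered.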
3.2.3. simdjson
The following piece of code is present in the implementation of simdjson:
    // simdjson, version 3.9.1:
    // include/simdjson/dom/document-inl.h
    inline bool document::dump_raw_tape(std::ostream& os) const noexcept {
      uint32_t string_length;
      size_t tape_idx = 0;
      uint64_t tape_val = tape[tape_idx];
      uint8_t type = uint8_t(tape_val >> 56);
      os << tape_idx << " : " << type;
      // ...
      os << tape_idx << " : " << type << "\t// pointing to " << // ...
      if (type == 'r') // ...
      switch (type) {
        case '"': // ...
        case 'l': // ...
      }
    }
This member function is apparently intended to be used for debugging.
The referenced tape is a library-internal representation of a parsed JSON document.
Above, type has the type of uint8_t, but is clearly treated as a character type.
Its value is compared to character literals, and thus, when written to a std::ostream,
is intended to be formatted as a character. The proposed deprecation would break this.
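Code of this shape could keep its current output by making the character intent explicit. The following is only a sketch of that idea and not simdjson code: the helper names `tape_tag` and `describe` are invented here, and the layout of the tape word (tag in the top 8 bits) is taken from the snippet above.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: the top byte of a tape word, as a character tag.
inline char tape_tag(std::uint64_t tape_val) {
    return static_cast<char>(tape_val >> 56);
}

// Hypothetical helper: formats "index : tag", like the start of the
// debugging output above, without relying on the uint8_t inserter.
inline std::string describe(std::size_t tape_idx, std::uint64_t tape_val) {
    std::ostringstream os;
    os << tape_idx << " : " << tape_tag(tape_val);
    return os.str();
}
```

Because the tag is converted to char before insertion, the char inserter is used and the output stays a single character.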
3.2.4. yaml-cpp
In yaml-cpp, the following piece of code can be found,
where the operator<< overload is called with signed char and unsigned char:
    // yaml-cpp, version 0.8.0:
    // include/yaml-cpp/node/convert.h
    // Used with T=signed char and T=unsigned char
    template <typename T>
    typename std::enable_if<!std::is_floating_point<T>::value, void>::type
    inner_encode(const T& rhs, std::stringstream& stream) {
      stream << rhs;
    }
This function template is instantiated and called when writing to an existing YAML document:
    signed char my_schar = 65;
    unsigned char my_uchar = 65;
    auto node = YAML::Load("{schar: 0, uchar: 0}");
    node["schar"] = my_schar;
    node["uchar"] = my_uchar;
    std::cout << node;
    // Outputs: {schar: A, uchar: A}
It’s unclear whether treating signed char as a character type here is the desired behavior,
or simply an oversight caused by the usage of std::stringstream.
Elsewhere in the library, signed char is treated unambiguously as an integer,
whereas unsigned char is treated as a character:
    signed char my_schar = 65;
    unsigned char my_uchar = 65;
    YAML::Emitter out;
    out << YAML::BeginMap
        << YAML::Key << "schar" << YAML::Value << my_schar
        << YAML::Key << "uchar" << YAML::Value << my_uchar
        << YAML::EndMap;
    std::cout << out.c_str();
    // Outputs:
    // schar: 65
    // uchar: A
There are two long-standing issues against yaml-cpp to inquire about this inconsistency, without a resolution before the mailing deadline.
3.2.5. Conclusion
Only four instances of use were found during this study, which is not a lot.
Notably, only uses of operator<< taking a signed char or unsigned char were found.
No uses of the array-version of operator<< or any of the operator>> overloads were identified.
In these four cases:
- One (probably) contained a bug, which could have been identified with the deprecation proposed here (§ 3.2.4 yaml-cpp)
- One was essentially a false positive, where the deprecated overloads were never called, only instantiated (§ 3.2.2 FlatBuffers)
- Two were cases where the deprecated behavior was the desired one (§ 3.2.1 Abseil, § 3.2.3 simdjson)
So, the use in the wild for these overloads seems to be quite limited.
In some cases, the current behavior is asked for, but it’s difficult to ascertain whether the developers writing that code
initially got tripped up by this behavior.
With this deprecation, at least a single possible bug was identified, and it’s possible even more could be found,
once developers are forced to check their usages as their compilers start warning them.
If anything, forcing users to cast explicitly to a character or integer type could be argued to be an increase in readability,
compared to relying on the current behavior with signed char and unsigned char.
4. Wording
This wording is relative to [N5032].
The wording essentially transplants the existing wording of the to-be-deprecated overloads to Annex D. Any modifications done are explicitly noted.
4.1. Modify [istream.general] p1
    // ...
    // [istream.extractors], character extraction templates
    template<class charT, class traits>
      basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT&);
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char&);
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char&);
    template<class charT, class traits, size_t N>
      basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>&, charT(&)[N]);
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, unsigned char(&)[N]);
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>&, signed char(&)[N]);
4.2. Modify [istream.extractors], around p7 to p12
    template<class charT, class traits, size_t N>
      basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>& in, charT(&s)[N]);
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char(&s)[N]);
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char(&s)[N]);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
After a sentry object is constructed, operator>> extracts characters and stores them into s.
If width() is greater than zero, n is min(size_t(width()), N).
Otherwise n is N.
n is the maximum number of characters stored.
Characters are extracted and stored until any of the following occurs:
- n - 1 characters are stored;
- end of file occurs on the input sequence;
- letting ct be use_facet<ctype<charT>>(in.getloc()), ct.is(ct.space, c) is true.
operator>> then stores a null byte (charT()) in the next position, which may be the first position if no characters were extracted.
operator>> then calls width(0).
If the function extracted no characters, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
    template<class charT, class traits>
      basic_istream<charT, traits>& operator>>(basic_istream<charT, traits>& in, charT& c);
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char& c);
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char& c);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
A character is extracted from in, if one is available, and stored in c.
Otherwise, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
4.3. Modify [ostream.general] p1
    // ...
    // [ostream.inserters.character], character inserters
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, charT);
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, char);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char);

    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, signed char);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, unsigned char);

    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, wchar_t) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char8_t) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char16_t) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, char32_t) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char8_t) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char16_t) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, char32_t) = delete;

    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const charT*);
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&, const char*);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char*);

    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const signed char*);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const unsigned char*);

    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const wchar_t*) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char8_t*) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char16_t*) = delete;
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>&, const char32_t*) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char8_t*) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char16_t*) = delete;
    template<class traits>
      basic_ostream<wchar_t, traits>& operator<<(basic_ostream<wchar_t, traits>&, const char32_t*) = delete;
    // ...
4.4. Modify [ostream.inserters.character]
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& out, charT c);
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& out, char c);
    // specialization
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, char c);
    // signed and unsigned
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, signed char c);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, unsigned char c);
Effects: Behaves as a formatted output function of out.
Constructs a character sequence seq.
If c has type char and the character container type of the stream is not char,
then seq consists of out.widen(c); otherwise seq consists of c.
Determines padding for seq as described in [ostream.formatted.reqmts].
Inserts seq into out.
Calls os.width(0).
Returns: out.
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& out, const charT* s);
    template<class charT, class traits>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>& out, const char* s);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const char* s);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const signed char* s);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const unsigned char* s);
Preconditions: s is not a null pointer.
Effects: Behaves like a formatted inserter (as described in [ostream.formatted.reqmts]) of out.
Creates a character sequence seq of n characters starting at s, each widened using out.widen() ([basic.ios.members]),
where n is the number that would be computed as if by:
- traits::length(s) for the overload where the first argument is of type basic_ostream<charT, traits>& and the second is of type const charT*, and also for the overload where the first argument is of type basic_ostream<char, traits>& and the second is of type const char*,
- char_traits<char>::length(s) for the overload where the first argument is of type basic_ostream<charT, traits>& and the second is of type const char*,
- traits::length(reinterpret_cast<const char*>(s)) for the other two overloads.
Determines padding for seq as described in [ostream.formatted.reqmts].
Inserts seq into out.
Calls width(0).
Returns: out.
4.5. Add a new subclause in Annex D after [depr.atomics]
Note: The wording after the first paragraph ("The header ...")
is identical to the wording in § 4.2 Modify [istream.extractors], around p7 to p12,
which had no special behavior for these overloads.
Deprecated signed char and unsigned char extraction [depr.istream.extractors]
The header <istream> has the following additions:
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char& c);
    template<class traits>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char& c);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
A character is extracted from in, if one is available, and stored in c.
Otherwise, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, unsigned char(&s)[N]);
    template<class traits, size_t N>
      basic_istream<char, traits>& operator>>(basic_istream<char, traits>& in, signed char(&s)[N]);
Effects: Behaves like a formatted input member (as described in [istream.formatted.reqmts]) of in.
After a sentry object is constructed, operator>> extracts characters and stores them into s.
If width() is greater than zero, n is min(size_t(width()), N).
Otherwise n is N.
n is the maximum number of characters stored.
Characters are extracted and stored until any of the following occurs:
- n - 1 characters are stored;
- end of file occurs on the input sequence;
- letting ct be use_facet<ctype<charT>>(in.getloc()), ct.is(ct.space, c) is true.
operator>> then stores a null byte (charT()) in the next position, which may be the first position if no characters were extracted.
operator>> then calls width(0).
If the function extracted no characters, ios_base::failbit is set in the input function’s local error state before setstate is called.
Returns: in.
4.6. Add a new subclause in Annex D after the above ([depr.istream.extractors])
Note: This wording is novel. It captures the behavior for which the wording is removed in § 4.4 Modify [ostream.inserters.character].
Deprecated signed char and unsigned char insertion [depr.ostream.inserters]
The header <ostream> has the following additions:
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, signed char c);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, unsigned char c);
Effects: Equivalent to: .
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const signed char* s);
    template<class traits>
      basic_ostream<char, traits>& operator<<(basic_ostream<char, traits>& out, const unsigned char* s);
Effects: Equivalent to: .