1. Revision History
1.1. Revision 3 - June 15th, 2022
- Fix typo "orindary" ➡ "ordinary".
1.2. Revision 2 - May 15th, 2022
- Add new Tony Table.
- Passed EWG with the addition of excluding signed char from the u8"" initialization rules.
- Wording updated to reflect this behavior.
1.3. Revision 1 - February 15th, 2022
- Fix typos and other grammar mistakes in various sections such as in § 4.2 Casting/Aliasing?.
- Use "may" in both places in the wording, rather than "can" and then "may".
- "Fix" for the title, rather than "Fixes".
- Discuss the aggregate-initialization-with-overloading case related to fixed-size arrays and brace initialization in § 4.6 Overload Resolution for Array-Containing Structure Initialization.
- Adjust wording to include Annex C entry in § 5.1.3 Add Annex C.1.6 example for change in code [diff.cpp20].
- Successfully passed SG16 vote to be forwarded to EWG, potentially for C++23.
1.4. Revision 0 - January 15th, 2022
- Initial Release! 🎉
2. Polls & Votes
Votes are done in a Strongly in Favor (SF) / Favor (F) / Neutral (N) / Against (A) / Strongly Against (SA) format. Any difference between the vote count and the number of attendees is due to abstention.
2.1. May 12th, 2022 - EWG
Accept P2513R1 as a Defect Report against C++20.
SF F N A SA
3 5 2 1 0
Result: Consensus (8-1)
Accept P2513R1, with the modification to exclude 'signed char' from the allowable conversions list, as a Defect Report against C++20.
SF F N A SA
5 4 1 1 0
Result: Consensus (9-1), stronger
The second poll has stronger consensus, so it will be forwarded to electronic polling.
The one to remove 
2.2. February 9th, 2022 - SG16
Add an Annex C entry and discussion to D2513R1, and forward the published paper as revised to EWG as a defect report.
SF F N A SA
1 5 0 1 0
Attendance: 8
Author position: SF
Consensus: Strong consensus
Against rationale: Adding another weird inconsistency between pointers and arrays; discussion decreased comfort; breakage is concerning.
3. Introduction and Motivation
| Pre-C++20 | 
 | 
 | 
| C++20 | 
 | 
 | 
| C++-20-with-DR | 
 | 
 | 
The introduction of 
Among the breakages, ones that stood out were that several kinds of string initialization and pointer conversions were illegal, particularly ones involving 
const char* a = u8"a";            // broken in C++20
const char b[] = u8"b";           // broken in C++20
const unsigned char c[] = u8"c";  // broken in C++20
This has also exasperated 
#include <utility>

template <std::size_t N>
struct char8_t_string_literal {
    static constexpr inline std::size_t size = N;
    template <std::size_t... I>
    constexpr char8_t_string_literal(const char8_t (&r)[N], std::index_sequence<I...>)
        : s{r[I]...} {}
    constexpr char8_t_string_literal(const char8_t (&r)[N])
        : char8_t_string_literal(r, std::make_index_sequence<N>()) {}
    auto operator<=>(const char8_t_string_literal&) = default;
    char8_t s[N];
};

template <char8_t_string_literal L, std::size_t... I>
constexpr inline const char as_char_buffer[sizeof...(I)] = { static_cast<char>(L.s[I])... };

template <char8_t_string_literal L, std::size_t... I>
constexpr auto& make_as_char_buffer(std::index_sequence<I...>) {
    return as_char_buffer<L, I...>;
}

constexpr char operator "" _as_char(char8_t c) {
    return c;
}

template <char8_t_string_literal L>
constexpr auto& operator "" _as_char() {
    return make_as_char_buffer<L>(std::make_index_sequence<decltype(L)::size>());
}

#if defined(__cpp_char8_t)
# define U8(x) u8##x##_as_char
#else
# define U8(x) u8##x
#endif

int main() {
    constexpr const char* p = U8("text");
    constexpr const char& r = U8('x');
    return 0;
}
With all due respect to the effort involved, these are solutions only a C++ expert could love. It harkens back to days long-gone-by of 
There are other solutions as well, such as constructing a 
3.1. C Compatibility
Worse, this code impacts C Compatibility both before and after any changes to 
extern const char* a = u8"a";            // Works in C (using default extensions), broken in C++20
extern const char b[] = u8"b";           // Works in C, broken in C++20
extern const unsigned char* c = u8"c";   // Works in C (using default extensions), broken in C++20
extern const unsigned char d[] = u8"d";  // Works in C, broken in C++20
This kind of break in previously working code may be too far-reaching. Even if the char8_t for C paper, N2653, passes for C23 (or later), it only introduces 
extern const unsigned char d[] = u8"d";  // Works in C even after N2653, breaks in C++20
These breaks have caused issues, including for very popular C and C++ libraries, and the solution is adding C++20-specific overloads. But this does nothing to help individuals who are trying to write C++11, 14, and 17 code that needs to eventually transition to use u8 string literals.
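As a rough illustration of what such a C++20-specific overload tends to look like, the sketch below uses a hypothetical entry point named make_label (it is not taken from any of the libraries discussed here). It assumes the library stores UTF-8 text in plain char buffers and only needs to forward char8_t input to the existing function.

```cpp
#include <string>

// Hypothetical library entry point (an assumption for illustration):
// it expects UTF-8 encoded text in a plain char buffer.
inline std::string make_label(const char* utf8_text) {
    return std::string(utf8_text);
}

#if defined(__cpp_char8_t)
// Extra overload needed once char8_t exists: u8 string literals no longer
// convert to const char*, so the library forwards them explicitly.
// Reading char8_t storage through a char pointer is allowed because
// char may alias any object type.
inline std::string make_label(const char8_t* utf8_text) {
    return make_label(reinterpret_cast<const char*>(utf8_text));
}
#endif
```

With both overloads present, a call such as make_label(u8"text") compiles whether or not char8_t is enabled, but every entry point that accepts text needs the same treatment, which is the maintenance cost described above.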
3.2. Compatibility Troubles in Existing Libraries
There are many libraries that have sustained usability decreases from the introduction of u8 string literals. Popular user libraries such as Dear imgui, nlohmann::json, and many others suffer from these issues. For example:
Basically dear imgui wants to uses low-level types here const char * + promote terse code, u8"" was perfect for encoding strings. When using the lib users typically use LOTS of literals. Now users can’t without a cast or us adding overloads to several hundreds entry points. Those users, the majority are silent in the first place, they are used to that kind of software not working well for their languages, they move on. Dear imgui supported them somehow (very imperfectly but enough to attract a crowd). Now things became much less attractive.
… The lib is designed for very fast iteration, compact code, imho it is a great loss.
This kind of pain has been repeated in other libraries, such as 
Watch on this! std::u8string_view is serialized as number array now.. I have to explicitly convert it into std::string_view every time.
You are right, std::u8string is currently not supported. I currently see no blocker in supporting it, but I cannot promise any timeline for the feature. Any help (and PRs) welcome!
The tests for u8 strings. Where necessary the library (and many others) simply use by-hand byte sequence encoding in non-prefixed string literals when they know they cannot influence the use of command line arguments for UTF-8 encoded strings.
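The by-hand encoding mentioned above can be sketched as follows. This is only an illustration with hypothetical function names; it spells out the UTF-8 code units of U+00E9 (0xC3 0xA9) as hexadecimal escapes inside a non-prefixed literal, so the bytes do not depend on the source or execution character set chosen on the command line.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper returning "café" with the last character written as
// explicit UTF-8 code units. The literal is split after "caf" so the hex
// escapes stay clearly separated from the surrounding text.
inline const char* cafe_utf8() {
    return "caf" "\xC3\xA9";
}

// Number of bytes (code units), not the number of user-visible characters.
inline std::size_t cafe_utf8_length() {
    return std::strlen(cafe_utf8());
}
```

The result has type const char*, so it works with existing interfaces in every language mode, at the cost of readability and of losing any type-level signal that the data is UTF-8.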
Some code just remains broken currently, such as the antlr4 project which generates u8 literals. That will require greater surgery to fix.
This proposal allows for a dedicated migration path, although it still requires minor changes. In particular, users will have to first create a variable so that the UTF-8 string literal can be used to initialize a 
4. Design
There are three core goals this proposal is out to achieve, specifically around the usage of single 
- code written in both C and C++ in a header file will initialize and work properly when using unsigned char typedef
- code written to be compatible with both pre-C++17 and C++20-and-beyond, as well as C, can work properly by using unsigned char
- code that wants to remain compatible with old u8"" literal behavior can initialize to const char [] const signed char []
- and, enabling a gradual migration path that is not a hard break that can be mechanically accounted for, rather than requiring larger, more involved and architected changes.
This proposal is the smallest, simplest possible fix. It explicitly does not attempt to deal with conversion or use as a pointer value, and deals strictly with array initialization. This means that function calls and initialization of a 
4.1. Why unsigned char
There is strong in-the-industry usage of 
Groups with the power to control the entire vertical stack — from their data centers to the final services running in the browser and on end-user machines — can guarantee that they can simply set their locale to be UTF-8 on their native machine. This is not exactly possible across all tech stacks, however: Microsoft has only just started to encourage UTF-8, after all. However, the option for turning on UTF-8 as the default Active Code Page (ACP) is still hidden in the legacy control panel settings behind 3 dialog boxes and a checkmark to turn on a "BETA" feature. This means that the wide variety of software that still uses 
Therefore, this proposal focuses on 
Tapping into this current industry best-practice is a good way to give people in pre-C++20 code practice for working with a 
4.2. Casting/Aliasing?
We do not provide a way for a 
4.3. C Compatibility
Because of the nature of C and the fact that the only proposal on the table that is likely to be accepted is that it uses 
const unsigned char str[] = u8"";
may become the lingua-franca of dealing with UTF-8 in a way that is type-level different from normal non-prefixed string literals. This code will work before and after the changes proposed in [n2653]. But, it breaks when transitioning to C++20-and-beyond in headers. This can become a problem for end-users, which is why we present this as a fix. The functions in [n2730] are also going in this direction, with both papers having general approval from WG14 and slated to make it either in late C23 or early C2y/C3Y.
Additionally, Tom Honermann’s accepted u8 and u8 literals to u8 string literals continues to work.
Therefore, we additionally propose to allow initialization of 
We do not propose allowing 
4.4. Defect Report
This paper is being pushed forward as a Defect Report to C++20, which is when 
4.5. What about special unsigned char *
We do not propose u8 string literal. This is strictly due to rules around u8 string literal can change it so that the backing storage for the 
Still, this problem can be solved, in general, by using special 
Clearly, this was not the case and has continued to be an enduring problem, but there is little we can do now to solve this problem besides accept that we made a mistake in C++11 and try to course correct sooner, rather than later.
4.5.1. Compound Literals with C?
One way to get a 
void f(const unsigned char*);
f((unsigned char[]){ u8"text" });
This is overly verbose and, unfortunately, compound literals are not supported in Standard C++ (though they are supported as an implementation extension in some C++ compilers with C modes, such as Clang). There is a proposal for compound literals that has seen some renewed interest over the last year, Zhihao Yuan’s [p2174r0]. It has not progressed but has been brought up for multiple use cases, meaning that it may once more be brought forward. This can be seen as an alternative solution that can be made viable by Yuan’s proposal, but is not pursued in this one.
4.5.2. But you CAN make it work??
In a way, yes, but it would get messy to solve this for all existing use cases. For example, consider the following code (using C++20 with all of its features available):
#include <cstdio>

void f(const unsigned char* f) { printf("%s", "unsigned char\n"); }
void f(const char* f) { printf("%s", "char\n"); }
void f(const char8_t* f) { printf("%s", "char8_t\n"); }

int main() {
    // (1)
    const unsigned char* p = u8"";
    // (2)
    f(u8"");
    return 0;
}
The case for the code under 
<source>:17:34: error: pointer targets in initialization of 'const unsigned char *' from 'char *' differ in signedness [-Werror=pointer-sign]
   17 | const unsigned char * p = u8"";
(Uses -std=c2x -O3 -Wall -Wpedantic -Werror)
This makes the case for 
Thusly, we consider only the array initialization case, since this paper primarily focuses on compatibility. We also do not want to disturb overload sets which contain a choice between 
We do think that, in the future, there can be improved interoperation with u8 may decay into a 
4.6. Overload Resolution for Array-Containing Structure Initialization
There exists an ambiguity when initializing character arrays from 
The question of whether or not this matters, in overall analysis, leans into it not having significant impact. This same kind of code snippet has similar impact for string literal initialization using a plain 
struct A { unsigned char s[10]; };
struct B { char s[10]; };

void f(A);
void f(B);

int main() {
    f({ "" }); // ambiguous
}
This situation now becomes the same deal when working with u8 in this scenario and having 
struct C { char8_t s[10]; };
struct D { char s[10]; };

void f(C);
void f(D);

int main() {
    f({ u8"" }); // ambiguous
}
Users could not rely on this code successfully disambiguating before C++20, so going back to it being ambiguous for this very specific case is fine. Furthermore, this only applies in C++ with C-like aggregate structures: C has no such problem in its codebases, and so it should not show up at all in C code being ported to C++. Because this paper is a Defect Report, it restores the behavior that has been in place since C++11, meaning that there has been very little time for this to manifest. Given that there has been a lack of 
5. Specification
The specification is relative to the latest C++ Working Draft, [n4901].
5.1. Language Wording
5.1.1. Adjust Feature Test Macro for char8_t
Editor’s Note: Please replace with a suitable value.
| Macro Name | Value |
| __cpp_char8_t | 201811L → 202XXXL |
5.1.2. Modify Initialization of Character Arrays in [dcl.init.string]
An array of ordinary character type ([basic.fundamental]), char8_t array, char16_t array, char32_t array, or wchar_t array may be initialized by an ordinary string literal, UTF-8 string literal, UTF-16 string literal, UTF-32 string literal, or wide string literal, respectively, or by an appropriately-typed string-literal enclosed in braces ([lex.string]). Additionally, an array of char or unsigned char may be initialized by a UTF-8 string literal, or by such a string literal enclosed in braces. Successive characters of the value of the string-literal initialize the elements of the array, with an integral conversion [conv.integral] if necessary for the source and destination value.
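The element-by-element behavior described in this wording can be approximated in library code. The sketch below is not part of the proposed wording; it uses a hypothetical helper named to_unsigned_chars that copies each code unit of a string literal into an unsigned char array and applies an integral conversion through static_cast, which is roughly what the initialization would do for each element.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (an assumption for illustration): copy every code unit
// of a string literal, including the terminating null, into an array of
// unsigned char. CharT is char before char8_t exists and char8_t afterwards.
template <typename CharT, std::size_t N>
std::array<unsigned char, N> to_unsigned_chars(const CharT (&literal)[N]) {
    std::array<unsigned char, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Integral conversion from the source code unit to unsigned char.
        out[i] = static_cast<unsigned char>(literal[i]);
    }
    return out;
}
```

For example, to_unsigned_chars(u8"\u00E9") yields the three elements 0xC3, 0xA9, and 0, regardless of whether the literal's element type is char or char8_t.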
5.1.3. Add Annex C.1.6 example for change in code [diff.cpp20]
Affected subclause: [dcl.init.string]
Change: UTF-8 string literals may initialize arrays of char or unsigned char.
Rationale: Compatibility with previously written code that conformed to previous versions of this document.
Effect on original feature: Arrays of char or unsigned char may now be initialized with a UTF-8 string literal. This can affect initialization that includes arrays that are directly initialized within class types, typically aggregates.
[Example 1:
struct A { char8_t s[10]; };
struct B { char s[10]; };

void f(A);
void f(B);

int main() {
    f({ u8"" }); // ambiguous
}
— end example]