Deprecate implicit conversions between char8_t and char16_t, char32_t, or wchar_t
- Document number:
- P3695R2
- Date:
- 2025-09-28 
- Audience:
- EWG, SG16
- Project:
- ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
- Author:
- Jan Schultke <janschultke@gmail.com>
- GitHub Issue:
- wg21.link/P3695/github
- Source:
- github.com/Eisenwave/cpp-proposals/blob/master/src/deprecate-unicode-conversion.cow
Contents
Revision history
Changes since R1
Changes since R0
Introduction
It's not hypothetical. This really happens.
The underlying problem
Scope
What about "safe" comparisons?
What about char16_t and char32_t?
What about char and wchar_t?
What about conversions with integers?
What comes after deprecation?
Why not make these conversions narrowing?
Impact on existing code
Replacement for deprecated behavior
Implementation experience
Wording
[conv.integral]
[expr.arith.conv]
[expr.static.cast]
[depr.conv.unicode]
References
1. Revision history
1.1. Changes since R1
R1 of the paper was seen by SG16, with the following poll results:
P3695R1: Recommend deprecating conversions between char and the charN_t types.
- Attendees: 10
- No objection to unanimous dissent.
P3695R1: Recommend deprecating conversions between char8_t and wchar_t.
- Attendees: 10
- No objection to unanimous consent.
P3695R1: Recommend deprecating conversions between char16_t and char32_t.
- Attendees: 10
- SF: 0, F: 0, N: 3, A: 7, SA: 0
- Consensus against.
Consequently, the following changes were made:
- also deprecated conversion between char8_t wchar_t char wchar_t
- changed title and abstract to reflect this new direction
- rewrote §3.2. What about char16_t and char32_t?
- expanded note on tautology warnings in §2.1. It's not hypothetical. This really happens.
- added §3.6. Why not make these conversions narrowing?
- restructured §6. Wording and added editorial notes
1.2. Changes since R0
- limited deprecation to conversions involving char8_t, char16_t, and char32_t
- rebased §6. Wording on [N5014]
2. Introduction
Implicit conversions between 
The assertion succeeds because Ԡ (U+0520) is UTF-8 encoded as 
Note that the "bad comparison" occurs between two 
Conversions "the other way" (e.g. 
2.1. It's not hypothetical. This really happens.
These kinds of bugs are not far-fetched hypotheticals either;
I have written such bugs myself,
and have had them contributed
to my syntax highlighter [µlight],
which makes extensive use of 
Using 
2.2. The underlying problem
The underlying problem is that 
To be fair, Unicode character types aren't strictly required to store Unicode code units.
However, that is their primary purpose, and the assumption holds true for any Unicode
3. Scope
I propose to deprecate implicit conversions between
3.1. What about "safe" comparisons?
In comparisons between code units,
certain ranges of code points yield the expected result.
For example, 
However, even those should be deprecated because:
- Keeping these valid would essentially leak implementation details of UTF-8 into the set of implicit conversions in the C++ core language, which seems like unclean design.
- To rely on this "feature", the developer needs to memorize which code points are "safe to use". It is not obvious whether c == U'€' or c == U'$' is safe.
- It would make this "feature" (or lack thereof) harder to teach than it needs to be. The rule can be very simple: char8_t 
3.2. What about char16_t and char32_t?
Following some negative feedback on [ClangWarning],
the proposal no longer seeks to deprecate conversions between 
Other code points are encoded using high surrogates ([
It is possible to have false negatives
when searching for a UTF-32 code unit
outside the Basic Multilingual Plane (BMP) in UTF-16 text.
However, these searches are tautologically false because values
≥ 
It is also much less likely that 
Last but not least, UTF-8 is becoming the "default encoding", especially on the web,
while UTF-16 is increasingly becoming a "legacy encoding".
This makes it unattractive to raise warnings for 
Recently it has become clear that the overhead of translating from/to UTF-8 on input and output, and dealing with potential encoding errors in the input UTF-8, overwhelms any benefits UTF-16 could offer. So newer software systems are starting to use UTF-8. The default string primitive used in newer programming languages, such as Go, Julia, Rust and Swift 5, assume UTF-8 encoding. PyPy also uses UTF-8 for its strings, and Python is looking into storing all strings in UTF-8. Microsoft now recommends the use of UTF-8 for applications using the Windows API, while continuing to maintain a legacy "Unicode" (meaning UTF-16) interface.
In summary, in 
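The difference from the UTF-8 case can be sketched as follows. The helper `any_utf16_unit_equals` is hypothetical. The surrogate pair in the comment follows from the UTF-16 encoding rules.

```cpp
#include <string_view>

// Hypothetical helper: compares every UTF-16 code unit against a code point.
constexpr bool any_utf16_unit_equals(std::u16string_view text, char32_t code_point) {
    for (char16_t unit : text) {
        if (unit == code_point) {
            return true;
        }
    }
    return false;
}

// U+20AC lies in the BMP and is a single UTF-16 code unit 0x20AC,
// so the comparison gives the expected result.
static_assert(any_utf16_unit_equals(u"\u20AC", U'\u20AC'));

// U+1F600 lies outside the BMP and is encoded as the surrogate pair
// 0xD83D 0xDE00. No char16_t value can equal 0x1F600, so the search is
// always false: a false negative, but never a false positive.
static_assert(!any_utf16_unit_equals(u"\U0001F600", U'\U0001F600'));
```

Unlike the char8_t case, a mismatch here can only make a search fail; it cannot make it match the wrong character.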
3.3. What about char and wchar_t?
As recommended by SG16,
I propose to leave 
The following conversions are not deprecated:
- char char8_t
- wchar_t char16_t wchar_t char32_t wchar_t char16_t wchar_t char32_t
Furthermore, deprecating any conversion from 
It may also be possible to deprecate conversions with 
3.4. What about conversions with integers?
It is quite common to compare character types to integer types.
For example, we may write 
3.5. What comes after deprecation?
The goal is to eventually remove these conversions entirely. Since the behavior is easily detected (§5. Implementation experience) and easily replaced (§4.1. Replacement for deprecated behavior), removal should be feasible within one or two revisions of the language.
Furthermore, I don't believe that "tombstone behavior" would be necessary, that is, continuing to allow the conversion but making the program ill-formed if it happens.
The reason is that 
3.6. Why not make these conversions narrowing?
Another possible option (instead of deprecation or following deprecation)
is to make the affected 
There are multiple problems with this approach, which is why it is not proposed:
- char8_t char32_t 
- A long time has passed since C++11, and there is a lot of code using list-initialization now. This means that the "blast radius" of the change may still be quite large. If we accept that a non-trivial number of warnings would be raised in existing code, this half-measure seems unattractive.
- A lot of the problematic cases are not initialization, but comparisons as shown in §2. Introduction. Narrowing conversions play no role in equality comparison or in the usual arithmetic conversions.
4. Impact on existing code
It is not trivial to estimate how much code would be affected by a deprecation like this.
However, that is ultimately not what makes or breaks this proposal.
The goal is not to deprecate a rarely used feature to give it new meaning,
like 
The goal is to deprecate a bug-prone and harmful feature to make the language safer.
The longer we wait, the more mistakes will be made using 
4.1. Replacement for deprecated behavior
If the new deprecation warnings spot a bug like the one in §2. Introduction, some work will be required to fix it, but the deprecation will have done its job.
If the comparison is obviously safe, such as 
5. Implementation experience
Corentin Jabot has recently implemented a 
However, the warning is more conservative than the proposed deprecation; it does not warn on "safe comparisons" (§3.1. What about "safe" comparisons?).
6. Wording
The following changes are relative to [N5014].
[conv.integral]
Change [conv.integral] paragraph 1 as follows, and split it into two paragraphs:
1 A prvalue of an integer type
can be converted to a prvalue of another integer type.
The conversion is deprecated ([depr.conv.unicode]) if
one of the types involved in the conversion is 
[Note: This deprecation also applies to cv-qualified types because prvalues of such types are adjusted to cv-unqualified types ([expr.type]). — end note]
2 A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
[expr.arith.conv]
Change [expr.arith.conv] paragraph 1 as follows:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
- The lvalue-to-rvalue conversion ([conv.lval]) is applied to each operand and the resulting prvalues are used in place of the original operands for the remainder of this section.
- […]
- Otherwise, each operand is converted to a common type C char8_t char16_t char32_t wchar_t T1 T2
- […]
[expr.static.cast]
Immediately prior to [expr.static.cast] paragraph 5, insert a new paragraph:
An expression  of type cv 
[Note: Integral conversions ([conv.integral]) between these types have the same effect and are deprecated, unlike this explicit conversion ([depr.conv.unicode]). — end note]
Do not change [expr.static.cast] paragraph 5; it is cited here for reference:
Otherwise, an expression E can be explicitly converted to a type T
if there is an implicit conversion sequence ([over.best.ics]) from E to T, […]. […], the result object is direct-initialized from E.
[depr.conv.unicode]
Insert a new subclause in [depr] between [depr.local] and [depr.capture.this], containing a single paragraph:
Unicode character conversions [depr.conv.unicode]
The following conversions are deprecated:
- Integral conversions ([conv.integral]), where out of the types involved in the conversion, one is char8_t and the other is char16_t, char32_t, or wchar_t
- Usual arithmetic conversions ([expr.arith.conv]) where out of the operand types after lvalue-to-rvalue conversion ([conv.lval]), one is char8_t and the other is char16_t, char32_t, or wchar_t
[Example:
char16_t char32_t char char8_t — end example]