SG16: Unicode meeting summaries 2018/10/17 - 2019/01/09
Summaries of SG16 meetings are maintained at
https://github.com/sg16-unicode/sg16-meetings.  This paper contains a
snapshot of select meeting summaries from that repository.
October 17th, 2018
Draft agenda:
  - char8_t: Markus' concerns, motivation, type safety, Unicode sandwich,
      most C++ code is yet to be written, transition story.
- Code points, EGCs, or explicit ranges for text views/containers?
    
      - How to decide? Pick a direction now? Write a pros/cons paper for the
          committee?
 
Attendees:
  - Artem Tokmakov
- Cameron Gunnin
- JeanHeyd Meneide
- Mark Zeren
- Markus Scherer
- Martinho Fernandes
- Sergey Zubkov
- Steve Downey
- Tom Honermann
- Zach Laine
Meeting summary:
  - Issue #30: Unclear behavior for octal and hex escape sequences in
      Unicode character and string literals
    
      - Tom explained the current situation;
          CWG#2333 tracks this issue.
          CWG discussed it at their August 2017 teleconference and decided that
          numeric escape sequences should be ill-formed in UTF-8 character
          literals.  Mike Miller offered to reconsider the issue if requested
          by SG16.
- Markus mentioned the utility in using numeric escapes to create
          ill-formed strings for testing purposes.
- Markus also presented an alternative possibility, that numeric
          escapes only be ill-formed if used to encode a code unit value that
          is never valid in a UTF string, e.g., 0xff.
- Markus additionally noted that there is a distinction between Unicode
          strings (may contain ill-formed contents) and UTF strings (must be
          well-formed).
- Zach asserted that the ability to use numeric escapes is more
          important than preventing encoding of ill-formed UTF sequences.
- Tom noted that the current CWG resolution seems evolutionary given
          that it contradicts existing practice.
- Markus noted a further benefit, maintaining consistency with
          languages like Java. Additionally, he explained that some logging
          libraries write strings with non-printable characters replaced with
          escape sequences and that the ability to copy and paste those
          strings verbatim into code is useful.
- Tom noted an additional use case; strings encoded as Modified UTF-8.
          Modified UTF-8 requires use of escapes to encode U+0000 as an
          overlong two-byte sequence.
- Markus added that the same use case applies to creation of CESU-8
          strings; escape sequences are needed for the individual encoding of
          UTF-16 surrogate pairs.
- Tom stated that it is useful to embed a null terminator with
          \0, though it would still be possible to do so using
          \u0000.
- Mark observed that implementations can warn if a literal that
          contains numeric escape sequences produces an ill-formed UTF
          string.
- Poll: Continue to allow hex and octal escapes that indicate code unit
          values, requiring only that they fit into the range of the code unit
          type.
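The use cases raised above (Modified UTF-8, CESU-8, deliberately ill-formed test data) all depend on numeric escapes producing code units that a strict UTF-8 validator would reject. A minimal sketch of that distinction follows. The helper names `is_well_formed_utf8` and `literal_is_well_formed` are hypothetical (they are not standard library functions), and the templates are written so that they accept both `char` based and `char8_t` based literals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if the n code units at s are well-formed UTF-8
// (byte ranges follow Table 3-7 of the Unicode Standard).
template <typename CodeUnit>
bool is_well_formed_utf8(const CodeUnit* s, std::size_t n) {
    static_assert(sizeof(CodeUnit) == 1, "UTF-8 code units are one byte wide");
    assert(s != nullptr || n == 0);
    std::size_t i = 0;
    while (i < n) {
        const unsigned b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) { ++i; continue; }        // ASCII
        std::size_t len = 0;
        unsigned lo = 0x80, hi = 0xBF;            // allowed range of the 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;            // no overlong 3-byte forms
            if (b0 == 0xED) hi = 0x9F;            // no encoded surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;            // no overlong 4-byte forms
            if (b0 == 0xF4) hi = 0x8F;            // nothing above U+10FFFF
        } else {
            return false;                         // 0x80..0xC1 and 0xF5..0xFF
        }
        if (n - i < len) return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned b = static_cast<unsigned char>(s[i + k]);
            if (b < lo || b > hi) return false;
            lo = 0x80; hi = 0xBF;                 // later bytes: plain continuation
        }
        i += len;
    }
    return true;
}

// Deduces the literal's size from its array type and drops the terminator,
// so embedded nulls do not cut the check short.
template <typename CodeUnit, std::size_t N>
bool literal_is_well_formed(const CodeUnit (&lit)[N]) {
    return is_well_formed_utf8(lit, N - 1);
}
```

With these helpers, `u8"caf\u00E9"` is accepted, while `u8"\xC0\x80"` (the Modified UTF-8 encoding of U+0000), `u8"\xED\xA0\xBD\xED\xB8\x80"` (the CESU-8 encoding of U+1F600) and `u8"\xFF"` are all rejected. That rejection is the kind of diagnostic Mark suggested an implementation could issue as a warning rather than an error.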
          
      
 
- char8_t:
    
      - Zach started the discussion by noting that use of char8_t
          does not help to enforce preconditions; ill-formed UTF-8 can appear
          in sequences of char8_t just as it can in sequences of
          char. How does char8_t help?
- Mark acknowledged that preconditions can always be violated.
- Tom offered make_text_view and UDLs as examples.
          char8_t enables writing generic functions that work with
          ordinary and UTF-8 string literals.
- Zach summarized, I see, it allows authors of overload sets to
          differentiate behavior.
- Markus chimed in, starting to see the motivation for char8_t;
          generic code can't distinguish encodings unless it is represented in
          the type system.
- Markus further noted that the standard library has a high percentage
          of generic code relative to code outside the standard.
- Tom agreed, but noted there is more focus on generic libraries now
          than in the past and that the committee is working hard to improve
          support for generic programming as exemplified by Concepts.
- Tom mentioned that we have multiple encodings we have to support.
- Markus acknowledged the dilemma; many other languages have settled on
          a single internal encoding, but C++ supports multiple encodings and
          there is no clear dominant one across the industry.
- Mark added that there is considerable baggage with char and
          the implementation definedness of the execution encoding.
- Markus acknowledged the existence of many incompatible string types
          in C++ that are all similar in intent.
- Tom stated that Concepts helps to bring these different string types
          together such that they can be supported by generic code.
- Markus observed that the char8_t proposal changes existing
          behavior.
- Mark noted that u8 literals aren't used much in C++.
- Markus mentioned that Google uses unsigned char and ensures
          use of UTF-8 internally.
- Tom responded that there is a backward compatibility story that is
          aided by C++20 support for class types as non-type template
          parameters as proposed in
          P0732.
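The point about overload sets can be illustrated with a small sketch. The names `encoding`, `encoding_of` and `has_char8_t` are hypothetical and chosen for illustration only; the code is guarded with the `__cpp_char8_t` feature-test macro so that it compiles both with and without `char8_t` support.

```cpp
#include <cassert>
#include <string_view>

enum class encoding { execution, utf8 };

// True when the compiler provides char8_t as a distinct type.
#if defined(__cpp_char8_t)
constexpr bool has_char8_t = true;
#else
constexpr bool has_char8_t = false;
#endif

// Ordinary literals: all that is known is "the execution encoding".
constexpr encoding encoding_of(std::string_view) { return encoding::execution; }

#if defined(__cpp_char8_t)
// UTF-8 literals have a distinct type, so they can select a different overload.
constexpr encoding encoding_of(std::u8string_view) { return encoding::utf8; }
#endif

inline void example() {
    assert(encoding_of("text") == encoding::execution);
    // Without char8_t, a u8 literal is char based and indistinguishable
    // from an ordinary literal by type alone.
    assert(encoding_of(u8"text") ==
           (has_char8_t ? encoding::utf8 : encoding::execution));
}
```

When `char8_t` is unavailable, both calls resolve to the same overload, which is the limitation Markus described: generic code cannot distinguish encodings unless they are represented in the type system.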
 
- Code points vs grapheme clusters:
    
      - Martinho led the discussion by expressing concern that grapheme
          cluster boundaries are not stable.  The situation with Swift today
          is that behavior depends on the version of ICU installed on the
          system.  Behavior is therefore non-portable.
- Mark mentioned that we have a similar issue with the timezone
          database and <chrono>. Behavior depends on which
          version of the database is installed.
- Tom acknowledged the concern; we won't have portable grapheme
          breaking in C++ either.
- Markus provided a link to a recent document authored by Mark Davis
          and noted a limitation imposed by the instability of grapheme cluster
          boundaries; stored EGC indexes are invalidated when changing Unicode
          versions.
        
      
- Zach asked, as someone without a lot of end user experience, how
          often do programmers make poor choices regarding handling of Unicode
          text?
- Steve responded that he sees bug reports frequently where programmers
          inadvertently sliced grapheme clusters.
- Martinho provided links to a couple of example defects:
        
      
- Tom asked, so how do we make a decision about how to proceed?
- Martinho countered that we don't need to yet.
- Steve chimed in with, how do we make them less scary?
- Mark responded with a question, how are things going to look?  New
          types on top of std::string_view and
          std::string?
- Zach provided a brief overview of how Boost.Text handles grapheme
          clusters.
- Markus asked, does Boost.Text enforce well-formed UTF-8?
- Zach responded that it encourages, but does not require well-formed
          UTF-8.
- Markus mentioned that validation can be expensive.  If you know your
          input is well-formed, then lookups can be optimized without having to
          decode.
- Tom described this as a design trade off; validate up front and reap
          performance benefits later, or skip validation and lazily validate
          later.
- Markus noted that it is common for programmers to slam content into
          strings and then validate them later.
- Mark mentioned that P1072 helps
          to support that use case.
- Tom asked, assuming that we standardize a type that enforces
          well-formedness, is there room for standardizing a non-validating
          type as well?  Or does that become an expert level do-it-yourself
          feature?
- JeanHeyd advocated an adapter-over-range approach for
          std::text; tags can suppress validation when it isn't
          necessary.
- Tom observed that it isn't possible to enforce well-formedness on
          views without introducing validation costs.
- Steve mentioned that adapters over containers make memory allocation
          someone else's problem, for better or worse.
- Martinho stated that, if validation is performed on container
          construction, he would prefer replacement character substitution
          over throwing, since throwing gives you nothing.  He noted that
          invalid input can be used as an attack vector; if UTF-8 input is
          all 0x80, replacement will triple the buffer size.
- Zach expressed openness to an adapter approach for Boost.Text.
- Mark expressed a preference for the adapter approach as it supports
          underlying containers with reference counts or small buffer
          optimizations.
- Mark also mentioned that wrapping std::string provides a
          nice transition story.
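The buffer growth Martinho described can be reproduced with a short sketch. It assumes a simplified substitution policy (one U+FFFD per offending byte, rather than the "maximal subpart" practice recommended by the Unicode Standard), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if none.
inline std::size_t well_formed_length(std::string_view s, std::size_t i) {
    assert(i < s.size());
    const unsigned b0 = static_cast<unsigned char>(s[i]);
    if (b0 <= 0x7F) return 1;
    std::size_t len = 0;
    unsigned lo = 0x80, hi = 0xBF;                // allowed range of the 2nd byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;                // overlong
        if (b0 == 0xED) hi = 0x9F;                // surrogates
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;                // overlong
        if (b0 == 0xF4) hi = 0x8F;                // above U+10FFFF
    } else {
        return 0;                                 // cannot start a sequence
    }
    if (s.size() - i < len) return 0;             // truncated
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned b = static_cast<unsigned char>(s[i + k]);
        if (b < lo || b > hi) return 0;
        lo = 0x80; hi = 0xBF;
    }
    return len;
}

// Copies well-formed sequences and substitutes U+FFFD (EF BF BD in UTF-8)
// for every byte that is not part of one.
inline std::string replace_ill_formed(std::string_view in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (const std::size_t len = well_formed_length(in, i)) {
            out.append(in.data() + i, len);
            i += len;
        } else {
            out += "\xEF\xBF\xBD";
            ++i;
        }
    }
    return out;
}
```

An input of four `0x80` bytes comes back as twelve bytes, since each lone continuation byte is replaced by the three-byte encoding of U+FFFD. Well-formed input is returned unchanged.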
 
- Tom then summarized the plan for the San Diego meeting: discussion of the
      Unicode Direction paper,
      P1072, Isabella Muerte's
      P1275, and then small groups to
      focus on further proposal incubation.
December 5th, 2018
Draft agenda:
  - Draft guidelines for other WGs and SGs to request SG16 review.
- char8_t remediation for backward compatibility impact.
- Review P1072 following San Diego LEWGI feedback.
Attendees:
  - Bryce Adelstein Lelbach
- Cameron Gunnin
- Corentin Jabot
- Florin Trofin
- JeanHeyd Meneide
- Mark Zeren
- Markus Scherer
- Peter Bindels
- Steve Downey
- Tom Honermann
- Zach Laine
Meeting summary:
  - Draft guidelines for other WGs and SGs to request SG16 review.
    
      - Tom introduced the topic.  Bryce had suggested that SG16 produce a
          rubric detailing guidance for when other WGs and SGs should consult
          SG16.  SG7 recently produced such a document.  Tom felt this was an
          excellent idea and is now bringing it before SG16 for discussion.
- Tom first asked Bryce where SG7's rubric can be found.
- Bryce replied that it will be in the San Diego post-meeting
          mailing.
- Tom then asked for suggested guidance.
- Steve suggested a simple litmus test; "if it smells like
          Unicode..."
- Corentin mentioned having discussed this with Titus in San Diego and
          suggested that anything having to do with text processing should be
          sent our way.
- Bryce asked about locales and it was agreed that Unicode has locale
          dependencies.
- Peter mentioned the {fmt} library; code units vs code points?
- Tom replied that we discussed {fmt} with Victor in SG16 on several
          occasions.
- Bryce asked if {fmt} is in C++20 and whether SG16 has any concerns
          about it.
 [Editor's note: not yet, but it passed LEWG review in San
          Diego].
- Zach replied that it is certainly no worse than what we have
          now.
- Mark commented, bird in hand... even if we had issues with the {fmt}
          library, there is no competing proposal.
- Corentin mentioned that {fmt} does not yet handle char16_t
          and char32_t, but can be extended later.
- JeanHeyd elaborated, template overloads are present, but formatting
          strings must be char or wchar_t at the moment.
- Zach suggested a requirement; that we need to reserve the right to
          explicitly specialize standard library templates that might be
          instantiated by users with char8_t.
- Tom asked for a volunteer to identify such templates.
- Zach volunteered.  Hooray for Zach!
- Steve suggested that anything involving command lines, file names,
          and environment variables should be sent our way.
- Mark added, any kind of encoding.  Including source encoding.
- Tom asked, do we want SG13 (HMI) members consulting us for text
          input and presentation issues?
- Steve replied, when they get to that point, yes.
- Tom asked for a volunteer to draft the rubric paper.
- Steve volunteered.  Hooray for Steve!
 
- char8_t remediation for backward compatibility impact.
    
      - Tom gave a brief introduction and pointed the group at a rough draft
          paper posted to the mailing list
          (http://www.open-std.org/pipermail/unicode/2018-December/000180.html).
- Time was given for those who had not yet seen it to quickly scan
          it.
- Steve commented on the proposed change to make ostream inserters for
          char16_t and char32_t ill-formed; for anyone
          actually relying on printing pointer values, a fix should be easy,
          add a cast to void*.
- Corentin wondered if anyone actually does
          std::cout << u8"text".
- Zach observed that someone could conceivably want to use the ostream
          inserters to print char16_t values formatted as hex integers,
          say when dumping UTF-16 code units for diagnostic purposes.
- Steve asked if it would be problematic to allow std::string
          to be constructed with char8_t based data.
- Zach responded that he didn't see any harm.
- Peter chimed in that std::string always holds UTF-8 in the
          code base he works on.
- Tom stated that supporting std::string interoperability with
          u8 literals would require a lot of overloads for the
          char based specialization of std::basic_string.
          Implementors would not like that.
- Zach asserted that he wants, somehow, to be able to construct
          std::string objects initialized with u8
          literals.
- Tom asked if using a factory function would suffice.
- Zach responded that would require updates and therefore doesn't
          address existing code.
- Markus advised thinking of std::string_view in addition to
          std::string.
- JeanHeyd asked about allowing std::u8string to be
          convertible to std::string.
- Tom stated he thought that might allow most existing code to just
          work.  But, would we really want that?  Implicit conversions are
          often undesirable.
- Peter responded that he thought so, yes.  Existing code mixes UTF-8
          with char.
- Corentin observed that implicit conversion from
          std::u8string could lead to mojibake.
- Zach acknowledged that std::string doesn't guarantee any
          encoding.
- Peter asked about the possibility of making it UB for
          std::u8string to contain non-UTF-8 data.
- Zach requested not adding encoding guarantees for strings.
- Peter responded, it doesn't actually work anyway since you couldn't
          update a string without introducing UB.
- Tom asked if the UDL approach to providing UTF-8 data in char
          via u8 literals was realistic.
- Zach stated we shouldn't be suggesting macros as solutions.
 [Editor's note, macros are not required to create a solution that
          works for C++17 and C++20, but source code changes are
          required].
- Tom asked if use of -fno-char8_t is a valid option noting
          that it forks the language.
- Zach suggested, perhaps this is our first good opportunity to put
          tooling to use as part of a C++20 migration story.
- Corentin observed that it should be easy to use clang-tidy
          to update code.
- JeanHeyd asked if char8_t could implicitly convert to
          char.
- Corentin stated that he wants conversions to be explicit.
- Tom mentioned that the draft paper is intended to tell a migration
          story.
- Markus explained that he felt the economics are not right.  The
          current situation puts the burden of addressing breakage on many
          programmers.
- Zach suggested adding tooling automation to the paper.
- Tom said he could add clang-tidy, what else should be
          mentioned?
- Zach stated he'd like to see compilers do fix-ups themselves.
- Corentin observed that implementors are unlikely to have something
          in place in the necessary time frame.
- Tom asked about experimentation.
- Peter stated his code base isn't using u8 literals today
          and won't be able to.
- Markus observed that not all code is equally modifiable.  For
          example, Google's code base has a lot of Google specific code, but
          also uses a lot of third party code.  Updating the third party code
          and potentially maintaining differences from upstream, is more
          difficult than updating Google's own code.
- Tom suggested a C++17 compatibility library could be made available
          that implements some of the remediation approaches noted in the draft
          paper.
- Bryce asked about the possibility that the char8_t proposal
          might be re-litigated due to backward compatibility concerns.
- Tom replied, sure, anything is possible.
- Bryce suggested adding data about expected breakage to the
          remediation paper to avoid scaring people.
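The factory function idea Tom raised can be sketched as follows. The name `to_char_string` is hypothetical and is not taken from the draft paper; the sketch only assumes that a `u8` literal is an array of one-byte code units, which holds whether the literal is `char` based (C++17) or `char8_t` based (C++20).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical factory: builds a std::string from a UTF-8 literal without
// relying on any implicit conversion between char8_t and char.
template <typename CodeUnit, std::size_t N>
std::string to_char_string(const CodeUnit (&lit)[N]) {
    static_assert(sizeof(CodeUnit) == 1, "expects char or char8_t code units");
    // The iterator-pair constructor converts each code unit to char;
    // N - 1 drops the literal's null terminator.
    return std::string(lit, lit + (N - 1));
}

inline void example() {
    const std::string s = to_char_string(u8"caf\u00E9");
    assert(s.size() == 5);            // 'c' 'a' 'f' plus the two bytes of U+00E9
    assert(s == "caf\xC3\xA9");
}
```

As Zach pointed out, an approach like this requires every call site to be updated, so it helps new or actively maintained code but does nothing for existing code that passes `u8` literals straight to `std::string`.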
 
- Peter requested time in SG16 for presenting and collecting feedback
      on a simple 2D graphics library he has been working on.
December 19th, 2018
Draft agenda:
  - Continue discussion of char8_t remediation for backward compatibility
      impact.
    
      - Discuss pros/cons of keeping u8 literals char based and introducing
          new char8_t based U8 literals.
 
- Review P1072 following San Diego LEWGI feedback.
Attendees:
  - Bryce Adelstein Lelbach
- JeanHeyd Meneide
- Mark Zeren
- Peter Bindels
- Steve Downey
- Tom Honermann
Meeting summary:
  - Continued discussion of char8_t remediation for backward compatibility
      impact.
    
      - Tom introduced the discussion topic.  One approach to minimizing
          backward compatibility impact would be to restore u8
          literals being char-based and to introduce a new U8
          literal prefix for char8_t based UTF-8 literals.
- Mark suggested following up with Google folks to determine if this
          would address their concerns.
- Tom stated he talked to Chandler following the San Diego vote.
          Concerns expressed were that the potential backward compatibility
          impact exceeded the benefits.
- Tom asked for pros and cons for a new U8 literal prefix.
- JeanHeyd was first to note the obvious primary benefit, avoids
          backward compatibility issues.
- Tom agreed, but added that P0482 does have other minor breakage; the
          changes to the return types of the u8string member functions
          of std::filesystem::path.
- JeanHeyd pointed out that the visual difference between u8
          (lowercase) and U8 (uppercase) is subtle and bad for
          readability.
- Bryce agreed and pointed out that MISRA forbids identifiers that
          look similar.
- Bryce further stated that use of u and U for
          char16_t and char32_t literals was a mistake for
          the same reason.
- Mark mentioned a pro, this approach preserves investment in any
          increased use of u8 literals in code over the next few
          years before migration to C++20.
- Bryce suggested that compiler warnings could be added to help educate
          programmers about the change when compiling in pre-C++20 language
          modes.  This still depends on compiler upgrades of course.
- Tom agreed and noted that Clang trunk already issues such a warning
          when invoked with -Wc++2a-compat.
- Mark asked if a cast or similar approach for converting u8
          literals to char-based types doesn't suffice.
- Tom responded that Zach expressed a desire for existing code to
          continue working at our last meeting.
- Tom asked what adopting an additional literal prefix would mean for
          messaging.  What would we be telling programmers going forward?  We
          could deprecate u8 literals and promote U8 going
          forward.
- JeanHeyd responded that deprecation doesn't really help to move
          programmers towards use of char8_t.  He'd prefer to break
          things, get over the migration hump, and keep a cleaner design.
- Mark asked why the as_char approach suggested in the draft
          paper doesn't suffice.
- JeanHeyd responded that it requires markup, so existing code requires
          changes.
- Mark pondered, a new prefix does kind of fix everything.  It doesn't
          have to be U8, we could use utf8 or similar.
- JeanHeyd suggested we could introduce new prefixes for all of UTF-8,
          UTF-16, and UTF-32 in order to maintain symmetry and to address the
          subtle u vs U concerns.
- Tom suggested another pro; a new prefix avoids potentially forking
          the language by unintentionally encouraging use of a
          -fno-char8_t option as has happened with -fno-rtti
          and -fno-exceptions.
- Mark asked where we're at with proposing char8_t to
          WG14.
- Tom responded that he would like to get a proposal in front of WG14
          at their October 2019 meeting in Ithaca.  In addition, he'd like to
          have WG14 proposals ready that correspond to our other proposals
          targeting core language features:
        
          - P1097 - "Named character
              escapes"
- P1041 - "Make
              char16_t/char32_t string literals be UTF-16/32"
- Source file encoding tags (no proposal yet).
 
- Tom added another pro, or con, depending on perspective; a new prefix
          maintains the ability to continue writing UTF-8 based applications
          with char-based types.
- Mark opined that moving away from char aliasing issues is
          compelling.
- Steve noted that UTF-8 in char-based types often seems to
          work, but works for the wrong reasons.  For example, UTF-8 encoded
          source files compiled as "8-bit ASCII" such that the UTF-8 code units
          just get copied from the source file.
- Tom asked about messaging again, what message are we sending to
          library authors?  Do they write their UTF-8 based interfaces against
          char or char8_t?  How do they choose?
- Mark observed that this isn't a new problem.  Library authors code
          against std::string today and it isn't a universal string
          type or a great type for Unicode.  We'll have similar concerns
          with the introduction of std::text vs
          std::string.
- Tom concluded, sounds like templates will be the way to go.
- JeanHeyd commented that views help.  For example, text_view
          can effectively type erase the code unit type.  But what does one
          assume for encoding for char?
- Tom responded that the execution encoding must be assumed per
          existing precedent in the standard.
- Mark concluded that he doesn't see a way out of the char
          vs char8_t problem.  But, with char8_t being
          available, we'll get experience using it that will inform future
          library efforts.  In the short term, being able to use either
          char or char8_t is advantageous.
- Peter chimed in from chat (due to a non-functioning microphone):
        
          - "looks like my mic is completely broken. From what I can tell
              this is like the uptake of uint8_t, it takes some time
              but over time everybody learns that these types have a given
              fixed meaning and others are a :shrug: type"
 
- Tom presented a few polls.
        
          - Poll 1: Add defined-as-deleted overloads for
              operator<< for
              basic_ostream<char, ...> specializations.
              
          
- Poll 2: Allow deprecated std::filesystem::u8path to be
              called with sources with char8_t value type.
              
            
              - Peter explained his against vote; this maintains working
                  around something that we don't really want to work in the
                  first place.
 
- Poll 3: Restore char-based u8 literals and
              introduce new char8_t based literals with a new prefix.
              
            
              - Bryce explained his against vote; we'll need to converge on
                  a very short prefix, 2 characters at most.  That seems
                  unlikely.
- JeanHeyd commented that he still prefers to go with a
                  solution that pushes the community in a new and consistent
                  direction.  u8 literals aren't widely used, so we
                  still have time to course correct.
- Mark asked if tooling could be used to fix existing code by
                  converting u8 literals to ordinary literals encoded
                  with escapes.
- Tom responded that we discussed tooling possibilities at the
                  last meeting.  Specifically Zach's suggestion that this
                  could be a good test for Titus' goals for tooling.
 
- Poll 4: Assuming u8 literals remain char8_t
              based, allow char arrays to be initialized with
              u8 string literals.
            
              - Tom stated that the reason to consider this is that the
                  as_char approach doesn't work for array
                  initialization.
- Bryce stated he wanted more time to think about this.
- Mark agreed with wanting more time.
- Poll not taken.
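Regarding the stream inserters covered by Poll 1, the diagnostic use case raised at the previous meeting (dumping UTF-16 code units as hex) does not need a `char16_t` inserter at all if the conversion is written explicitly. A minimal sketch, with `hex_dump` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: formats UTF-16 code units as space-separated,
// zero-padded, four-digit hex values.
inline std::string hex_dump(std::u16string_view s) {
    std::ostringstream os;
    os << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) os << ' ';
        // The cast selects the unsigned integer inserter explicitly, so the
        // code does not depend on how operator<< treats char16_t itself.
        os << std::setw(4) << static_cast<unsigned>(s[i]);
    }
    return os.str();
}

inline void example() {
    // U+1F600 is encoded in UTF-16 as the surrogate pair D83D DE00.
    assert(hex_dump(u"A\U0001F600") == "0041 D83D DE00");
}
```

The same pattern applies to code that intentionally prints the address of a literal: an explicit `static_cast<const void*>(...)` states the intent and keeps compiling whether or not the character-type overloads are deleted.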
 
 
 
- Review P1072 following San Diego LEWGI feedback.
    
      - Mark provided a summary of changes:
        
          - No buffer moving features; feedback from San Diego was negative
              regarding that due to exposure of implementation details.
- resize_default_init() resizes the string such that the
              added content is default initialized.  Failure to write to the
              added elements results in undefined behavior.
- This approach matches Google's existing implementation.
- This approach is compatible with existing allocators.
- libc++ is already using this approach as part of its
              std::filesystem implementation to remove an
              allocation.
- This doesn't preclude a buffer migration feature in the
              future.
- The paper establishes that basic_string is allocator
              aware.
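For context, the pattern that `resize_default_init()` is meant to improve on looks roughly like the sketch below. It uses only facilities that exist in C++17; `copy_into_string` is a hypothetical name, and the proposed member function itself is deliberately not used since it is only a proposal at this point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies n bytes from src into a new std::string in two passes over the buffer.
inline std::string copy_into_string(const char* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::string s;
    s.resize(n);                        // pass 1: every new element is set to '\0'
    if (n != 0) {
        std::memcpy(s.data(), src, n);  // pass 2: the zeros are overwritten
    }
    return s;
}
```

The first pass is pure overhead when the caller is about to overwrite every element, which is the cost the paper aims to remove by leaving the added elements default initialized.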
 
 
January 9th, 2019
Draft agenda:
  - Preparation for the Kona pre-meeting mailing deadline on 1/21.
    
      - Review the SG16 rubric assuming a draft is available.
- Review the char8_t remediation paper assuming a revision is
          available.
- Review other papers requiring an update for Kona (P1041, P1097).
 
Attendees:
  - Cameron Gunnin
- JeanHeyd Meneide
- Mark Zeren
- Michael Spencer
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - Tom stated that he was unable to get a revision of the char8_t
      remediation paper ready for this meeting, so no further discussion on
      it for now.
- We then started reviewing Steve's draft SG16 rubric.
    
      - Victor asked about locales as he and Howard have been working on
          chrono updates that add overloads based on locale.
- Tom said, yes, bring to SG16 anything involving locales.
- Zach expressed a preference for just those locale features that
          relate to Unicode.
- Tom stated a preference for having a chance to offer our expertise;
          to help ensure appropriate use of locales.
- Michael asserted that we don't want new Unicode stuff dependent on
          std::locale.
- Zach observed that it is very hard to write portable code that uses
          std::locale due to implementation defined things.  For
          example,
        
          - the set of locales is not specified.
- even the "C" locale is not portable.
 
- Tom suggested that the language regarding "requires review" by SG16
          be softened as we don't have standing to actually require review.
- Zach disagreed and offered the perspective that this paper should be
          adopted by the LEWG and EWG chairs with the expectation that the
          chairs will enforce review requirements.
- Tom expressed enthusiasm for that perspective; this paper should be
          targeted to LEWG and EWG to get their buy-in.
- Tom asked about the SG7 rubric in the hopes that we could
          compare/contrast with it.
- Michael located it and provided a link:
        
      
- Tom suggested we should have a section on text containers and string
          builders.
- Zach asked if we care about string builders.  If a string builder is
          used in such a way that it slices code unit sequences, isn't that
          just an incorrect use of the builder?
- Tom stated he wants to catch any new operations that are problematic
          for some encodings.  For example, reliance on broken interfaces like
          std::ctype::widen.
- Cameron suggested we're interested in any new overloads involving
          Unicode types.
- Zach proposed adding a section detailing encoding assumptions.
- Tom agreed and suggested that can appear in the text encoding
          section; we need to make it explicit that char based values of
          unknown origin are assumed to have execution encoding.
- Zach disagreed with the assumption of execution encoding stating that
          they should instead have an unknown encoding and their contents
          should only be forwarded and operated on generically (e.g., as a bag
          of bytes), not examined as having data in any particular
          encoding.
- Tom challenged this noting that reasonable assumptions can be made.
          On Windows, execution encoding matches the system code page, on POSIX
          it corresponds to the LANG or LC_CTYPE environment
          variables, and is generally ASCII elsewhere (except z/OS).
- Zach noted that assumption doesn't work for file names.
- Tom agreed that filenames are special; they don't have a known
          encoding.  But C++17 at least offers std::filesystem with
          means to get a filename in a displayable format via the
          *string and generic_*string member functions of
          std::filesystem::path.
- Zach asserted those member functions are a trap; the names retrieved
          via those member functions don't necessarily round trip.
- Michael observed that programmers need to be able to display file
          names and, if the standard doesn't provide a way to do it,
          programmers will do it themselves, probably badly.
- Steve noted that file names may not be presentable at all.
- Michael reiterated that we need interfaces that do the right thing
          easily; e.g., to create a display name for a file in something other
          than std::filesystem::path.
- JeanHeyd observed that some of these problems would go away with a
          new I/O layer that uses std::filesystem::path instead of
          const char* interfaces.
- Steve noted that we can't replace the OS interfaces though.
- Tom stated that we need to update the paper to require consultation
          with SG16 for anything involving file names.
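One way to see why an interface like `std::ctype::widen` is problematic for some encodings: it maps each `char` to exactly one `wchar_t`, so a multi-byte sequence can never be combined into a single character. A minimal sketch, assuming the classic "C" locale and a hypothetical helper name `widen_per_char`:

```cpp
#include <cassert>
#include <locale>
#include <string>

// Widens each char of `in` independently using std::ctype<wchar_t>::widen.
inline std::wstring widen_per_char(const std::string& in) {
    const auto& facet =
        std::use_facet<std::ctype<wchar_t>>(std::locale::classic());
    std::wstring out(in.size(), L'\0');
    // The interface is strictly one char in, one wchar_t out.
    facet.widen(in.data(), in.data() + in.size(), out.data());
    return out;
}

inline void example() {
    assert(widen_per_char("abc") == L"abc");
    // U+00E9 is two UTF-8 code units, so it comes back as two wide
    // characters; their values are implementation-specific.
    assert(widen_per_char("\xC3\xA9").size() == 2);
}
```

Only the element count is checked for the non-ASCII input, since the values produced for bytes outside the basic character set are not portable.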
 
- P1378R0: std::string_literal
    
      - JeanHeyd provided a link to an updated draft revision of the paper:
        
      
- JeanHeyd introduced the motivation; to provide means to guarantee
          that a string literal is used in invocations of std::embed
          in order to enable dependency discovery in build systems.
          Additional motivation is to provide means to avoid unintended
          array-to-pointer decay and to handle string literals with embedded
          null characters without having to depend on deduction via array
          reference in order to obtain the actual array size of the
          literal.
- JeanHeyd acknowledged that the proposal changes the type of all
          string literals in ways that are unlikely to be acceptable.
- Michael observed that the proposed design doesn't actually meet the
          motivation requirements for std::embed since the proposed
          type is copyable and therefore can be produced by many kinds of
          expressions, not just literals.
- Steve suggested another motivation: requiring string literals for
          things like format strings and SQL; requiring a literal would avoid
          the possibility of consuming user provided input that could be used
          as an attack vector as in SQL injection attacks.
- Zach observed that immediate (consteval) functions can help
          in this regard since they can't consume run-time input by design.
- Tom asked about a different implementation strategy; making all of
          the class constructors private and befriending a UDL.  This would
          ensure the class could only be constructed by calling a UDL (assuming
          copy constructors are deleted).
- Michael suggested the constructors could also use compiler magic to
          require construction via a literal.
- Steve noted that having the size of a string literal readily
          available would be useful.
- Michael noted that this design impacts type deduction for
          auto declared variables and template parameters.
- Zach suggested that two-step conversion as would be required for
          backward compatibility would be problematic.
- JeanHeyd responded that any number of builtin implicit conversions
          are already permitted.
- Tom wondered if the number of conversions might impact overload
          resolution.
- JeanHeyd suggested the design might be useful to limit when error
          handling and encoding validation would be necessary for
          std::text.
- Zach countered that string literals can form ill-formed code unit
          sequences.
- Zach acknowledged that the ability to avoid strlen could be
          a big deal.
- Michael asserted that the motivational use cases can largely be met
          with immediate (consteval) functions.
- JeanHeyd provided an additional motivation; comparison between string
          literals.  Today, whether "foo" == "foo" is unspecified.
          The proposed std::string_literal could make such comparisons
          work as expected.
- Mark asserted that an implementation is needed to evaluate backward
          compatibility impact.
- Mark noted having previously had a desire to determine if a pointer
          pointed to a string literal; to avoid storing the string
          contents.
- Zach and Tom both expressed having used or encountered string pool
          classes that exist to collapse matching strings to a single copy.
 
- WG21 Direction group response to P1238R0: SG16: Unicode Direction
    
      - Steve summarized the response.
- Tom noted that the DG did not comment on the constraints listed in
          the paper.
- Mark noted the DG request to clarify scope.
- Zach stated that we need an elevator pitch and suggested: We want all
          Unicode algorithms available via standard interfaces for C++23.
 
- Tom announced that the next meeting will start an hour later than
      usual.