Document number: N3398=12-0088
Date: 2012-09-19
Project: Programming Language C++, Library Working Group
Reply-to: Beman Dawes <bdawes at acm dot org>

String Interoperation Library
Adapting Standard Library Strings and I/O to a Unicode World

This paper proposes library components to ease string interoperability problems for Unicode and other string encodings. These problems occur with the current C++11 standard library. See the Proposed components and their motivation section for a full description of the problems and some simple examples.

I first encountered the C++03 version of string interoperability problems while providing Unicode support for the internationalization of commercial GIS software. These problems appeared again while working on the Boost Filesystem Library. They have become more apparent as compiler support for C++11's additional Unicode support has made it easier to write programs that run up against current limitations.

Work began on the proposal when the Library Working Group requested that string encoding conversion arguments be removed from class path in the initial C++11 proposal for a Filesystem library. That request sparked this proposal as a far more general solution to string encoding conversion problems than a Filesystem-specific proposal.

The proposed components are separable. Any of the components except codecs and codec helpers could be removed, although ease-of-use would suffer as a result.

The proposed components are suitable for a C++ standard library Technical Specification (TS), either standalone or as part of a larger TS.

The proposed components are pure additions. No C++03 or C++11 headers are changed and no current user or standard library code is broken, subject only to the usual namespace discipline caveats.

Proposed wording is provided. The proposed wording relies only on C++11 features. Should a basic_string-reference library TS be accepted, it might be used to reduce the number of signatures in this proposal.

A "proof-of-concept" implementation of the proposals (and more) is available at github.com/beman/string-interoperability.

Table of contents

Introduction
Revision history
Proposed components and their motivation
   Codecs and their helpers
   conversion_iterator class template
   copy_string algorithm
   make_string function template
   to_string conversion functions, converting stream inserters and extractors
   Explicit UTF-8 encoded types char8_t and u8string
Design
   Design paths not taken
Existing practice with string interoperability
Existing practice with conversion iterators
Acknowledgements
TODO List
Proposed Wording
String interoperation 
Header <string_interop.hpp> synopsis 
Codecs 
   Class default_codec 
   Requirements on codec classes 
      end-of-sequence iterator requirements 
      from_iterator requirements 
      to_iterator requirements 
      Constructor requirements 
   select_codec 
UTF-8 typedefs (Informative) 
Class template conversion_iterator 
   Synopsis 
   Constructors 
Algorithm copy_string 
make_string function templates 
to_string function templates 
UTF-8 string support 
Stream inserters 
Stream extractors 

Revision history

This paper is a complete rewrite of N3336, Adapting Standard Library Strings and I/O to a Unicode World. It reflects C++ committee feedback from the LWG's review of N3336 and further analysis and experimentation.

Proposed components and their motivation

Codecs and their helpers

Provide an iterator-based composable solution to string encoding and type conversion that works well in generic code and does not require heap allocated temporary strings or buffers.

These low-level components provide the foundation for most of the higher level components. They provide an abstraction of string encoding and type conversion that frees higher-level components from details.

Specific motivations include:

conversion_iterator class template

Provides an iterator adapter that performs character type and encoding conversion on-the-fly.

While codecs (see below) offer worthwhile benefits, they essentially provide low-level, encoding specific, iterators. The conversion_iterator class template provides a simple iterator adaptor that composes two codecs regardless of encoding into a single, easy-to-use iterator.

With conversion_iterator, implementation of many mid and high level character type and encoding conversions becomes trivial. It is useful to standard, user, and third-party library implementers, as it provides a vocabulary iterator type that is far easier to use than roll-your-own conversions based on codecvt facets.

copy_string algorithm

Provides an algorithm like std::copy, except performing type and encoding conversion as it copies.

Solves many end user problems.

Provides a simple way to both specify and implement other high-level convenience functions.

make_string function template

Provides a generic string type and encoding conversion factory function.

to_string conversion functions, converting stream inserters and extractors

Provide easy-to-use (automatic, in the case of inserters and extractors) solutions to irritating string interoperability problems, in the style of similar standard library functionality.

With the C++11 standard library:

int i = 50;                      // OK
long j = i;                      // OK
cout << j;                       // OK
string s = to_string(i);         // OK, C++11 provides this overload
wstring t = to_wstring(s);       // error!
u8string u = to_u8string(t);     // error!
u16string v = to_u16string(s);   // error!
u32string w = to_u32string(v);   // error!
string x = to_string(v.c_str()); // error!
string y = to_string(U"50");     // error!
std::cout << t;                  // error!

With the proposal (and the unmodified C++11 standard library):

int i = 50;                      // OK
long j = i;                      // OK
cout << j;                       // OK
string s = to_string(i);         // OK
wstring t = to_wstring(s);       // OK
u8string u = to_u8string(t);     // OK
u16string v = to_u16string(s);   // OK
u32string w = to_u32string(v);   // OK
string x = to_string(v.c_str()); // OK
string y = to_string(U"50");     // OK
std::cout << t;                  // OK

Explicit UTF-8 encoded types char8_t and u8string

Specifies a character type and a string type that are unambiguously UTF-8 encoded.

UTF-8 is the most important, and often the only, byte-sized character encoding required by many internationalized applications. Yet it is the only one of the critical Unicode encodings (UTF-8, UTF-16, UTF-32) that does not have its own C++ character type. This causes endless technical problems, such as the inability to overload on a UTF-8 character type, for those who want to write portable code. It causes developers who otherwise think highly of C++ to believe the standards committee is stuck in the distant past when dinosaurs roamed the earth.

The proposed string interoperability facilities run afoul of the lack of a UTF-8 character type because they use generic programming techniques that depend on a one-to-one relationship between character value types and their encodings.

This feature is far more speculative than the rest of the proposal. It has been implemented and has been used in an experimental branch of the Filesystem library. But there is no user experience whatsoever. It leaves u8 string literals twisting in the wind, and that's a serious problem. It needs much further study and discussion before moving forward.

Design

The copy_string algorithm was a starting point for the design. The algorithm was arrived at by analyzing numerous real-world string conversion problems encountered by Boost Filesystem and while internationalizing various industrial applications. During that analysis, it was observed that the std::copy algorithm would be a common solution to those problems if it could be given generic versions of John Maddock's Unicode conversion iterator adaptors used in his Boost Regex implementation. The conversion_iterator and codec designs evolved as the underlying conversion abstractions needed to support copy_string.

The key design for composition of codecs is the use of UTF-32 as a common intermediate encoding that works without an intermediate temporary string when applied at the iterator level. This is the same approach, albeit at compile time rather than run time, taken by the International Components for Unicode (ICU) library.
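The pivot idea can be illustrated with a minimal sketch, assuming well-formed input and no error handling. The helper names decode_utf8, encode_utf16, and utf8_to_utf16 are hypothetical and are not part of the proposal; std::string is assumed to hold UTF-8 bytes. Each code point is decoded to a char32_t and re-encoded immediately, so no intermediate UTF-32 string is ever built.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Decode one code point from the UTF-8 bytes of s starting at index i,
// advancing i past it. Assumes well-formed input (hypothetical helper).
inline char32_t decode_utf8(const std::string& s, std::size_t& i)
{
  const unsigned char lead = static_cast<unsigned char>(s[i++]);
  if (lead < 0x80) return lead;                        // single byte (ASCII)
  const int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
  char32_t cp = lead & (0x3F >> extra);                // payload bits of lead byte
  for (int k = 0; k < extra; ++k)                      // 6 bits per trailing byte
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  return cp;
}

// Append one code point to out as UTF-16, using a surrogate pair if needed.
inline void encode_utf16(char32_t cp, std::u16string& out)
{
  if (cp < 0x10000) { out.push_back(static_cast<char16_t>(cp)); return; }
  cp -= 0x10000;
  out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
  out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
}

// UTF-8 to UTF-16 with a single char32_t as the pivot between the two steps.
inline std::u16string utf8_to_utf16(const std::string& in)
{
  std::u16string out;
  for (std::size_t i = 0; i < in.size(); )
    encode_utf16(decode_utf8(in, i), out);
  return out;
}

// Usage: U+00E9 (C3 A9 in UTF-8) becomes the single UTF-16 unit 0x00E9.
inline void example()
{
  const std::u16string r = utf8_to_utf16("\xC3\xA9");
  assert(r.size() == 1 && r[0] == 0x00E9);
}

}  // namespace sketch
```

With N encodings, this arrangement needs N decoders and N encoders rather than a dedicated converter for every pair.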

Design paths not taken

This proposal deals with C++11 std::basic_string, standard character types, and their encodings. The deeper attributes of Unicode characters are not addressed. See Mathias Gaunard's Unicode project for an example of deeper Unicode support.

This proposal provides compile-time solutions. It does not provide runtime solutions such as provided by the ICU library.

This proposal provides work-arounds for C++11's lack of UTF-8 strings. Several users have argued that instead of work-arounds, the C++ standard should require UTF-8 encoding for both C-style char strings and std::string. This proposal assumes that is too great a leap forward at this time.

Existing practice with string interoperability

Boost Filesystem Version 3's class path solves some of the string interoperability problems, albeit in a limited context. A function that is declared like this:

void f(const path&);

Can be called like this:

f("Meow");
f(L"Meow");
f(u8"Meow");
f(u"Meow");
f(U"Meow");
// ... many additional variations such as basic_strings and iterators

This string interoperability support has been a success. It does, however, raise the question of why std::basic_string isn't providing the interoperability support. Users are misusing paths as general string containers because they provide interoperability. The string interoperability cat is out of the bag. The toothpaste is out of the tube.

See Boost.Filesystem V3 class path for an example of how such interoperability might be achieved.

Experience with Boost.Filesystem V3 class path has demonstrated that string interoperability brings a considerable simplification and improvement to internationalized user code, but that having to provide interoperability without the resolution of the issues presented here is a band-aid.

Existing practice with conversion iterators

Boost Regex for many years has included a set of Unicode conversion iterators as an implementation detail. Although these do not provide composition, they do demonstrate the technique of using encoding conversion iterators to avoid creation of temporary strings.

Acknowledgements

Peter Dimov inspired the idea of string interoperability by arguing that the Boost Filesystem library should treat a path as a single type (i.e. not a template) regardless of character size and encoding. The experience gained with that approach led to a much clearer understanding of where to draw the line between functionality provided by a library such as Filesystem, and the standard library (or a TS) itself.

John Maddock's Unicode conversion iterators demonstrated an easy-to-use, more efficient, and STL friendly way to perform character type and encoding conversions.

Yakov Galka suggested attacking string interoperability with free functions to reduce or eliminate changes to basic_string.

The C++11 standard deserves acknowledgement as it provides the underlying language and library features that allow Unicode string interoperability:

TODO List

To Do

  • Add error handling argument where appropriate.

  • Add three pointer case signatures for basic_ostream<wchar_t>&

  • Add stream extractors.

  • Add usage examples to Proposed Wording.

  • Add example of how this would apply to Filesystem class path.

Proposed Wording

Italic text highlighted in yellow is commentary and not part of the proposal.

The wording assumes the whole of the ISO C++ Standard Library introduction [lib.library] is included by reference.

String interoperation    [str-x]

This library provides facilities that allow interoperation between strings of differing types and encodings, and ease the use of strings with UTF-8 encoding. The following encodings are supported:

Header <string_interop.hpp> synopsis    [str-x.synopsis]

namespace std {

  template <> struct char_traits<unsigned char>;

namespace tbd {  // tbd is to be decided

  //  UTF-8 typedefs [str-x.utf8-typedefs]
  typedef unsigned char           char8_t;
  typedef basic_string<char8_t>   u8string;
 
  //  codecs [str-x.codec]
  class narrow;
  class wide;     
  class utf8;     
  class utf16;    
  class utf32;    
  class default_codec;  // See [str-x.codec.default]

  //  select_codec [str-x.codec.select]
  template <class charT> struct select_codec;
  template <> struct select_codec<char>       { typedef narrow type; };
  template <> struct select_codec<wchar_t>    { typedef wide   type; };
  template <> struct select_codec<char8_t>    { typedef utf8   type; };
  template <> struct select_codec<char16_t>   { typedef utf16  type; };
  template <> struct select_codec<char32_t>   { typedef utf32  type; };
 
  //  conversion_iterator [str-x.cvt-iter]
  template <class ToCodec, class FromCodec, class InputIterator>
    class conversion_iterator;

  //  copy_string algorithm [str-x.copy_string]
  template<class InputIterator, class FromCodec,
           class OutputIterator, class ToCodec>
  OutputIterator copy_string(InputIterator first, InputIterator last,
    OutputIterator result);

  //  make_string function templates [str-x.make_string]
  template <class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string<typename ToCodec::value_type>,
            class FromString>
  ToString make_string(const FromString& ctr);

  template <class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string<typename ToCodec::value_type>,
            class InputIterator>
  ToString make_string(InputIterator begin);

  template <class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string<typename ToCodec::value_type>,
            class InputIterator>
  ToString make_string(InputIterator begin, std::size_t sz);

  template <class ToCodec,
            class FromCodec = default_codec,
            class ToString = std::basic_string<typename ToCodec::value_type>,
            class InputIterator,
            class InputIterator2>
  ToString make_string(InputIterator begin, InputIterator2 end);

  //  to_string function templates [str-x.to_string]
  template <class FromCodec = default_codec,
    class ToString = std::basic_string<char>, class FromString>
      ToString to_string(const FromString& s);
  template <class FromCodec = default_codec,
    class ToString = std::basic_string<char>, class InputIterator>
      ToString to_string(InputIterator begin);
  template <class FromCodec = default_codec,
    class ToString = std::basic_string<char>, class InputIterator>
      ToString to_string(InputIterator begin, std::size_t sz);
  template <class FromCodec = default_codec,
    class ToString = std::basic_string<char>, class InputIterator>
      ToString to_string(InputIterator begin, InputIterator end);
  // Repeat pattern for to_wstring, to_u8string, to_u16string, to_u32string

  //  UTF-8 string support [str-x.utf8]
  inline const char8_t* u8(const char* s) noexcept;
  inline const char8_t* u8(const string& s) noexcept;
  inline const char*    u8(const char8_t* s) noexcept;
  inline const char*    u8(const u8string& s) noexcept;

}  // namespace tbd

  // stream inserters [str-x.cvt.ins]
  template <class Ostream, class charT, class Traits, class Allocator>
  Ostream& operator<<(Ostream& os, const basic_string<charT, Traits, Allocator>& str);
  basic_ostream<char>& operator<<(basic_ostream<char>& os, const wchar_t* p);
  basic_ostream<char>& operator<<(basic_ostream<char>& os, const char16_t* p);
  basic_ostream<char>& operator<<(basic_ostream<char>& os, const char32_t* p);
  
}  // namespace std

Codecs    [str-x.codec]

Codecs are classes that package one typedef and three class templates. They contain no data or function members and never need to be instantiated. Codec classes may be predefined or user defined. All codec classes except default_codec shall meet the codec requirements ([str-x.codec.req]).

Table: Predefined codec classes

Class          value_type  Encoding
narrow         char        Default locale's char encoding.
wide           wchar_t     Implementation specific wchar_t encoding.
utf8           char8_t     UTF-8
utf16          char16_t    UTF-16
utf32          char32_t    UTF-32
default_codec  N/A         N/A

Class default_codec    [str-x.codec.default]

Class default_codec is a pseudo-codec that provides lazy select_codec selection. It is for use as a default for codec template parameters that appear before the template parameter that determines charT. Class default_codec is not required to meet the codec class requirements.

class default_codec
{
public:
  template <class charT>
  struct codec
  { 
    typedef typename select_codec<charT>::type type; 
  };
};
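The lazy selection can be seen in a minimal sketch, assuming stub codecs that model only value_type and the nested codec selector. The resolve helper is hypothetical and not part of the proposal. The point is that default_codec defers the choice until charT is known, while a real codec named explicitly ignores charT and always yields itself.

```cpp
#include <type_traits>

namespace sketch {

// Stub codecs: only value_type and the nested codec<> selector are modeled.
struct narrow { typedef char     value_type; template <class> struct codec { typedef narrow type; }; };
struct wide   { typedef wchar_t  value_type; template <class> struct codec { typedef wide   type; }; };
struct utf16  { typedef char16_t value_type; template <class> struct codec { typedef utf16  type; }; };
struct utf32  { typedef char32_t value_type; template <class> struct codec { typedef utf32  type; }; };

template <class charT> struct select_codec;
template <> struct select_codec<char>     { typedef narrow type; };
template <> struct select_codec<wchar_t>  { typedef wide   type; };
template <> struct select_codec<char16_t> { typedef utf16  type; };
template <> struct select_codec<char32_t> { typedef utf32  type; };

// Pseudo-codec: the actual codec is chosen only once charT is supplied.
struct default_codec
{
  template <class charT>
  struct codec { typedef typename select_codec<charT>::type type; };
};

// Hypothetical helper: what a function template does with its FromCodec
// parameter after the character type of its argument has been deduced.
template <class FromCodec, class charT>
struct resolve { typedef typename FromCodec::template codec<charT>::type type; };

// default_codec follows the character type ...
static_assert(std::is_same<resolve<default_codec, char32_t>::type, utf32>::value,
              "char32_t selects utf32");
// ... while an explicitly named codec wins regardless of the character type.
static_assert(std::is_same<resolve<utf16, char>::type, utf16>::value,
              "explicit codec is kept");

}  // namespace sketch
```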

Requirements on codec classes    [str-x.codec.req]

Codecs are required to contain the following:

  typedef implementation-defined value_type;

  template <class charT>
  struct codec { typedef codec-class-name type; };

  template <class InputIterator>  
  class from_iterator
  {
  public:
    
    from_iterator();
    from_iterator(InputIterator begin);
    from_iterator(InputIterator begin, size_t sz);
    template <class InputIterator2>
      from_iterator(InputIterator begin, InputIterator2 end);
  };

  template <class InputIterator>  
  class to_iterator
  {
  public:
    to_iterator();
    to_iterator(InputIterator begin);
  };

end-of-sequence iterator requirements    [str-x.codec.req.eos]

An end-of-sequence iterator becomes equal to the end-of-sequence value upon reaching the end of the sequence being iterated over. An end-of-sequence iterator constructor with no arguments constructs the end-of-sequence value, which is the only legitimate iterator value to be used for the end condition. The behavior of operator* on an iterator with the end-of-sequence value is undefined. For any other iterator value a const T& is returned. The behavior of operator-> for an iterator with the end-of-sequence value is undefined. For any other iterator value a const T* is returned. The behavior of operator++() for an iterator with the end-of-sequence value is undefined.

Two iterators with the end-of-sequence value are equal. An iterator with the end-of-sequence value is not equal to an iterator that does not have the end-of-sequence value. Two iterators that do not have the end-of-sequence value are equal iff they point to the same element of the sequence.
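As an illustration of these requirements, here is a minimal sketch of an end-of-sequence iterator over a null-terminated character array. The names cstr_iterator and count_chars are hypothetical and not part of the proposal. The default constructor yields the end-of-sequence value, and the iterator switches itself to that value when it reaches the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

namespace sketch {

template <class charT>
class cstr_iterator
{
public:
  typedef std::input_iterator_tag iterator_category;
  typedef charT                   value_type;
  typedef std::ptrdiff_t          difference_type;
  typedef const charT*            pointer;
  typedef const charT&            reference;

  cstr_iterator() : p_(nullptr) {}                       // end-of-sequence value
  explicit cstr_iterator(const charT* p) : p_(p) { settle(); }

  reference operator*() const  { return *p_; }           // undefined at end-of-sequence
  pointer   operator->() const { return p_; }
  cstr_iterator& operator++()  { ++p_; settle(); return *this; }
  cstr_iterator operator++(int) { cstr_iterator t(*this); ++*this; return t; }

  // Two end-of-sequence values are equal; otherwise compare positions.
  friend bool operator==(const cstr_iterator& a, const cstr_iterator& b)
  { return a.p_ == b.p_; }
  friend bool operator!=(const cstr_iterator& a, const cstr_iterator& b)
  { return !(a == b); }

private:
  // Become the end-of-sequence value on reaching the terminator.
  void settle() { if (p_ && *p_ == charT()) p_ = nullptr; }
  const charT* p_;
};

// Usage: the loop needs no separate end pointer, only a default-constructed
// iterator to compare against.
inline std::size_t count_chars(const char* s)
{
  std::size_t n = 0;
  for (cstr_iterator<char> it(s), end; it != end; ++it) ++n;
  return n;
}

inline void example() { assert(count_chars("abc") == 3); }

}  // namespace sketch
```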

from_iterator requirements    [str-x.codec.req.from]

The class template from_iterator is an input iterator that is an adaptation of an InputIterator template parameter whose value_type is the same as the parent codec class value_type. It has a value_type of char32_t and meets the input iterator requirements of the C++ standard and the end-of-sequence iterator requirements ([str-x.codec.req.eos]).

to_iterator requirements    [str-x.codec.req.to]

The class template to_iterator is an input iterator that is an adaptation of an InputIterator template parameter whose value_type is char32_t. It has a value_type that is the same as the parent codec class value_type. It meets the input iterator requirements of the C++ standard and the end-of-sequence iterator requirements ([str-x.codec.req.eos]).

Constructor requirements    [str-x.codec.req.ctors]

from_iterator();

Effects: Constructs an iterator with the end-of-sequence iterator value ([str-x.codec.req.eos]).

from_iterator(InputIterator begin);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at the first element with a value of value_type().

from_iterator(InputIterator begin, size_t sz);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at begin + sz.

template <class InputIterator2>
from_iterator(InputIterator begin, InputIterator2 end);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at end.

Remarks: Shall not participate in overload resolution unless InputIterator and InputIterator2 are the same type.

to_iterator();

Effects: Constructs an object with the end-of-sequence iterator value ([str-x.codec.req.eos]).

to_iterator(InputIterator begin);

InputIterator is required to meet the end-of-sequence iterator requirements ([str-x.codec.req.eos]).

Effects: Constructs an iterator for the half-open range that begins at begin and ends when the end-of-sequence iterator value is reached.
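The four from_iterator constructors could be sketched as follows for an identity (UTF-32) codec, where the source value_type is already char32_t and only the range-termination logic matters. The names utf32_from_iterator and collect are hypothetical, and using enable_if for the Remarks clause is just one possible way to satisfy it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>

namespace sketch {

template <class InputIterator>
class utf32_from_iterator
{
public:
  typedef std::input_iterator_tag iterator_category;
  typedef char32_t                value_type;
  typedef std::ptrdiff_t          difference_type;
  typedef const char32_t*         pointer;
  typedef const char32_t&         reference;

  utf32_from_iterator()                                      // end-of-sequence value
    : cur_(), end_(), count_(0), mode_(by_range), eos_(true), value_() {}
  utf32_from_iterator(InputIterator begin)                   // ends at value_type()
    : cur_(begin), end_(), count_(0), mode_(by_null), eos_(false), value_() { load(); }
  utf32_from_iterator(InputIterator begin, std::size_t sz)   // ends at begin + sz
    : cur_(begin), end_(), count_(sz), mode_(by_count), eos_(false), value_() { load(); }
  template <class InputIterator2>                            // ends at end
  utf32_from_iterator(InputIterator begin, InputIterator2 end,
      typename std::enable_if<
        std::is_same<InputIterator, InputIterator2>::value>::type* = nullptr)
    : cur_(begin), end_(end), count_(0), mode_(by_range), eos_(false), value_() { load(); }

  reference operator*() const  { return value_; }
  pointer   operator->() const { return &value_; }
  utf32_from_iterator& operator++()
  {
    ++cur_;
    if (mode_ == by_count) --count_;
    load();
    return *this;
  }
  utf32_from_iterator operator++(int)
  { utf32_from_iterator t(*this); ++*this; return t; }

  friend bool operator==(const utf32_from_iterator& a, const utf32_from_iterator& b)
  { return a.eos_ == b.eos_ && (a.eos_ || a.cur_ == b.cur_); }
  friend bool operator!=(const utf32_from_iterator& a, const utf32_from_iterator& b)
  { return !(a == b); }

private:
  enum mode { by_null, by_count, by_range };

  // Take the end-of-sequence value when the range is exhausted;
  // otherwise cache the current element.
  void load()
  {
    if ((mode_ == by_count && count_ == 0) || (mode_ == by_range && cur_ == end_))
      { eos_ = true; return; }
    value_ = *cur_;
    if (mode_ == by_null && value_ == char32_t()) eos_ = true;
  }

  InputIterator cur_, end_;
  std::size_t   count_;
  mode          mode_;
  bool          eos_;
  char32_t      value_;
};

// Usage helper: iterate until the default-constructed end-of-sequence value.
template <class Iterator>
std::u32string collect(Iterator first)
{
  std::u32string out;
  for (; first != Iterator(); ++first) out.push_back(*first);
  return out;
}

inline void example()
{
  const char32_t* s = U"abc";
  assert(collect(utf32_from_iterator<const char32_t*>(s, 2)) == U"ab");
}

}  // namespace sketch
```

Because the end is recorded inside the iterator, all three range forms can be consumed by the same loop against a default-constructed iterator.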

select_codec   [str-x.codec.select]

To be supplied.

UTF-8 typedefs (Informative)    [str-x.utf8-typedefs]

In portable internationalized applications, use of UTF-8 encoded C-style array of char strings and std::string is problematic for passing arguments to functions which assume the encoding is the native narrow character encoding. For example, arguments representing filenames for I/O functions or arguments representing content for web sites. Disciplined conversion of all narrow character strings to UTF-8 encoding within an application is a partial solution, but is not enforceable via the C++ language type system and does not help with third-party or standard library functions that assume char strings use native narrow encoding.

The char8_t and u8string typedefs allow the C++ type system to distinguish between native encoded and UTF-8 encoded character strings. The actual type used for char8_t is unsigned char because the C++ language rules require that the representation of the underlying bytes for char and unsigned char are the same (C++ standard: [basic.types]). This allows conversion by compile-time casts with no runtime cost.
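A minimal sketch of what the distinct type buys: overloads can tell native narrow strings from UTF-8 strings. The typedef name utf8_char is a hypothetical stand-in for the proposed char8_t, used here because char8_t has been a keyword since C++20 and so cannot be declared as a typedef by code compiled in that mode. encoding_of is likewise hypothetical.

```cpp
#include <cassert>
#include <string>

namespace sketch {

typedef unsigned char utf8_char;   // hypothetical stand-in for the proposed char8_t

// utf8_char is a different type from char, so these overloads can coexist.
inline std::string encoding_of(const char*)      { return "native narrow"; }
inline std::string encoding_of(const utf8_char*) { return "UTF-8"; }

inline void example()
{
  const char      native[] = "cafe";
  const utf8_char utf8[]   = { 'c', 'a', 'f', 0xC3, 0xA9, 0 };  // "caf" + U+00E9 in UTF-8
  assert(encoding_of(native) == "native narrow");
  assert(encoding_of(utf8)   == "UTF-8");
}

}  // namespace sketch
```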

Class template conversion_iterator    [str-x.cvt-iter]

Class template conversion_iterator composes an input iterator from a codec to_iterator, a codec from_iterator, and an input iterator. It adapts the input iterator to behave as an iterator to ToCodec::value_type. The type iterator_traits<InputIterator>::value_type is required to be the same as FromCodec::value_type. conversion_iterator meets the standard library input iterator requirements and the end-of-sequence iterator requirements ([str-x.codec.req.eos]).

Synopsis    [str-x.cvt-iter.synop]

template <class ToCodec, class FromCodec, class InputIterator>
  class conversion_iterator
    : public ToCodec::template to_iterator<
        typename FromCodec::template from_iterator<InputIterator>>
{
public:
  typedef typename FromCodec::template from_iterator<InputIterator>
    from_iterator_type;
  typedef typename ToCodec::template to_iterator<from_iterator_type>
    to_iterator_type;

  conversion_iterator();
  conversion_iterator(InputIterator begin);
  conversion_iterator(InputIterator begin, std::size_t sz);
  template <class InputIterator2>
    conversion_iterator(InputIterator begin, InputIterator2 end);

  // other functions as needed to meet standard library requirements
  // for input iterators [input.iterators]
  ...
};

Constructors    [str-x.cvt-iter.ctors]

conversion_iterator();

Effects: Constructs an iterator with the end-of-sequence iterator value ([str-x.codec.req.eos]).

conversion_iterator(InputIterator begin);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at the first element with a value of value_type().

conversion_iterator(InputIterator begin, size_t sz);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at begin + sz.

template <class InputIterator2>
conversion_iterator(InputIterator begin, InputIterator2 end);

Effects: Constructs an iterator for the half-open range that begins at begin and ends at end.

Remarks: Shall not participate in overload resolution unless InputIterator and InputIterator2 are the same type.
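The composition by inheritance shown in the synopsis can be exercised with a deliberately trivial codec. This is a minimal sketch: direct_codec and ascii_to_utf32 are hypothetical names, the codec simply treats each character value as a code point (true for char32_t, assumed here for char restricted to ASCII), and only the (begin, end) constructor plus the operations std::copy needs are provided. Postfix increment and operator-> are omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

namespace sketch {

template <class charT>
struct direct_codec
{
  typedef charT value_type;

  template <class InputIterator>        // charT sequence -> char32_t
  class from_iterator
  {
  public:
    typedef std::input_iterator_tag iterator_category;
    typedef char32_t                value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const char32_t*         pointer;
    typedef char32_t                reference;

    from_iterator() : cur_(), end_(), eos_(true) {}
    from_iterator(InputIterator begin, InputIterator end)
      : cur_(begin), end_(end), eos_(false) {}

    char32_t operator*() const { return static_cast<char32_t>(*cur_); }
    from_iterator& operator++() { ++cur_; return *this; }
    friend bool operator==(const from_iterator& a, const from_iterator& b)
    { return a.done() == b.done() && (a.done() || a.cur_ == b.cur_); }
    friend bool operator!=(const from_iterator& a, const from_iterator& b)
    { return !(a == b); }
  private:
    bool done() const { return eos_ || cur_ == end_; }
    InputIterator cur_, end_;
    bool eos_;
  };

  template <class InputIterator>        // char32_t end-of-sequence iterator -> charT
  class to_iterator
  {
  public:
    typedef std::input_iterator_tag iterator_category;
    typedef charT                   value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef const charT*            pointer;
    typedef charT                   reference;

    to_iterator() : src_() {}
    to_iterator(InputIterator begin) : src_(begin) {}

    charT operator*() const { return static_cast<charT>(*src_); }
    to_iterator& operator++() { ++src_; return *this; }
    friend bool operator==(const to_iterator& a, const to_iterator& b)
    { return a.src_ == b.src_; }
    friend bool operator!=(const to_iterator& a, const to_iterator& b)
    { return !(a == b); }
  private:
    InputIterator src_;
  };
};

// The composed iterator: ToCodec's to_iterator wrapped around FromCodec's
// from_iterator, with char32_t flowing between the two layers.
template <class ToCodec, class FromCodec, class InputIterator>
class conversion_iterator
  : public ToCodec::template to_iterator<
      typename FromCodec::template from_iterator<InputIterator> >
{
  typedef typename FromCodec::template from_iterator<InputIterator> from_iterator_type;
  typedef typename ToCodec::template to_iterator<from_iterator_type> to_iterator_type;
public:
  conversion_iterator() : to_iterator_type() {}
  conversion_iterator(InputIterator begin, InputIterator end)
    : to_iterator_type(from_iterator_type(begin, end)) {}
};

// Usage: widen an ASCII std::string to std::u32string through std::copy.
inline std::u32string ascii_to_utf32(const std::string& s)
{
  typedef conversion_iterator<direct_codec<char32_t>, direct_codec<char>,
                              std::string::const_iterator> iter;
  std::u32string out;
  std::copy(iter(s.begin(), s.end()), iter(), std::back_inserter(out));
  return out;
}

inline void example() { assert(ascii_to_utf32("abc") == U"abc"); }

}  // namespace sketch
```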

Algorithm copy_string    [str-x.copy_string]

template<class InputIterator, class FromCodec,
         class OutputIterator, class ToCodec>
OutputIterator copy_string(InputIterator first, InputIterator last,
                           OutputIterator result);

Requires: result shall not be in the range [first,last).

Effects:

typedef conversion_iterator<ToCodec,
  typename FromCodec::template
    codec<typename std::iterator_traits<InputIterator>::value_type>::type,
  InputIterator>
iter_type;

Returns: std::copy(iter_type(first, last), iter_type(), result).
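To show the shape of "std::copy with conversion" without the full codec machinery, here is a minimal sketch fixed to one conversion, UTF-16 to UTF-32. The name copy_utf16_to_utf32 is hypothetical; the proposed copy_string selects the conversion through its codec template parameters instead. Well-formed input is assumed and there is no error handling.

```cpp
#include <cassert>
#include <iterator>
#include <string>

namespace sketch {

// Copies [first, last) to result, combining UTF-16 surrogate pairs into
// single UTF-32 code points as it goes.
template <class InputIterator, class OutputIterator>
OutputIterator copy_utf16_to_utf32(InputIterator first, InputIterator last,
                                   OutputIterator result)
{
  while (first != last) {
    char32_t cp = static_cast<char32_t>(*first);
    ++first;
    if (cp >= 0xD800 && cp <= 0xDBFF && first != last) {   // high surrogate
      const char32_t low = static_cast<char32_t>(*first);
      ++first;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
    }
    *result = cp;
    ++result;
  }
  return result;
}

// Usage: three UTF-16 code units become two UTF-32 code points.
inline void example()
{
  std::u16string src;
  src.push_back(u'A');
  src.push_back(static_cast<char16_t>(0xD83D));   // U+1F600 as a surrogate pair
  src.push_back(static_cast<char16_t>(0xDE00));
  std::u32string dst;
  copy_utf16_to_utf32(src.begin(), src.end(), std::back_inserter(dst));
  assert(dst.size() == 2 && dst[0] == U'A' && dst[1] == static_cast<char32_t>(0x1F600));
}

}  // namespace sketch
```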

make_string function templates    [str-x.make_string]

The make_string functions create a string from a source sequence of characters. The conversion of the type and encoding of the characters in the source sequence of characters to the type and encoding of characters in the created string is performed by conversion_iterator<ToCodec, typename FromCodec::template codec<typename FromString::value_type>::type, typename FromString::const_iterator>, where ToCodec, FromCodec, and FromString are template parameters, as is ToString, the type of the resulting string.

template <class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string<typename ToCodec::value_type>,
          class FromString>
ToString make_string(const FromString& s);

Returns:  A string containing the characters of the sequence [s.cbegin(), s.cend()).

[Example: A conforming implementation would be:

  typedef conversion_iterator<ToCodec,
    typename FromCodec::template codec<typename FromString::value_type>::type,
    typename FromString::const_iterator>
      iter_type;

  ToString tmp;
  std::copy(iter_type(s.cbegin(), s.cend()), iter_type(),
            std::back_insert_iterator<ToString>(tmp));
  return tmp;

--end example]

template <class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string<typename ToCodec::value_type>,
          class InputIterator>
ToString make_string(InputIterator begin);

Returns:  A string containing the characters of the sequence [begin, begin+dist) where dist is the distance from begin to the first instance of character iterator_traits<InputIterator>::value_type().

Complexity: O(dist)

template <class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string<typename ToCodec::value_type>,
          class InputIterator>
ToString make_string(InputIterator begin, std::size_t sz);

Returns:  A string containing the characters of the sequence [begin, begin+sz).

template <class ToCodec,
          class FromCodec = default_codec,
          class ToString = std::basic_string<typename ToCodec::value_type>,
          class InputIterator,
          class InputIterator2>
ToString make_string(InputIterator begin, InputIterator2 end);

Returns:  A string containing the characters of the sequence [begin, end).

to_string function templates    [str-x.to_string]

template <class FromCodec = default_codec,
  class ToString = std::basic_string<char>, class FromString>
    ToString to_string(const FromString& s);
template <class FromCodec = default_codec,
  class ToString = std::basic_string<char>, class InputIterator>
    ToString to_string(InputIterator begin);
template <class FromCodec = default_codec,
  class ToString = std::basic_string<char>, class InputIterator>
    ToString to_string(InputIterator begin, std::size_t sz);
template <class FromCodec = default_codec,
  class ToString = std::basic_string<char>, class InputIterator>
    ToString to_string(InputIterator begin, InputIterator end);
// Repeat pattern for to_wstring, to_u8string, to_u16string, to_u32string

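Returns: make_string<codec, FromCodec, ToString>(arguments), where codec is narrow, wide, utf8, utf16, or utf32 for to_string, to_wstring, to_u8string, to_u16string, and to_u32string respectively, and arguments is the function's argument list: s; begin; begin, sz; or begin, end.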
 

UTF-8 string support    [str-x.utf8]

These functions provide copy-less type conversion for use with narrow character strings when no encoding conversion is required. Their semantics take advantage of C++ language rules that ensure the representation of the underlying bytes for char and unsigned char are the same (C++ standard: [basic.types]).

inline const char8_t* u8(const char* s) noexcept;

Returns: static_cast<const char8_t*>(static_cast<const void*>(s)).

inline const char8_t* u8(const string& s) noexcept;

Returns: static_cast<const char8_t*>(static_cast<const void*>(s.c_str())).

inline const char* u8(const char8_t* s) noexcept;

Returns: static_cast<const char*>(static_cast<const void*>(s)).

inline const char* u8(const u8string& s) noexcept;

Returns: static_cast<const char*>(static_cast<const void*>(s.c_str())).
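A minimal sketch of three of these functions, assuming a hypothetical typedef utf8_char in place of char8_t (which cannot be declared as a typedef in C++20 and later, where it is a keyword). The u8string overload is left out because it would need a char_traits specialization for unsigned char. The conversions change only the static type; the bytes and the address stay the same.

```cpp
#include <cassert>
#include <string>

namespace sketch {

typedef unsigned char utf8_char;   // hypothetical stand-in for the proposed char8_t

inline const utf8_char* u8(const char* s) noexcept
{ return static_cast<const utf8_char*>(static_cast<const void*>(s)); }

inline const utf8_char* u8(const std::string& s) noexcept
{ return u8(s.c_str()); }

inline const char* u8(const utf8_char* s) noexcept
{ return static_cast<const char*>(static_cast<const void*>(s)); }

inline void example()
{
  const std::string text("\xC3\xA9");             // U+00E9 encoded as UTF-8
  const utf8_char* bytes = u8(text);
  assert(bytes[0] == 0xC3 && bytes[1] == 0xA9);   // same bytes, read as unsigned
  assert(u8(bytes) == text.c_str());              // round trip yields the same address
}

}  // namespace sketch
```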

Stream inserters    [str-x.ins]

The stream inserter functions perform stream insertion of an insertion character sequence converted from a source character sequence. The conversion of the type and encoding of the source sequence to the type and encoding of the insertion sequence is performed by a conversion_iterator.

template <class Ostream, class charT, class traits, class Allocator>
Ostream& operator<<(Ostream& os, const basic_string<charT, traits, Allocator>& str);

Effects: For each value of an iterator of type conversion_iterator<typename select_codec<typename Ostream::char_type>::type, typename select_codec<charT>::type, typename basic_string<charT, traits, Allocator>::const_iterator> initialized with the source sequence [str.cbegin(), str.cend()), iterate until the end-of-sequence value ([str-x.codec.req.eos]) is reached, inserting the dereferenced value of the iterator into os.

Returns: os.

Remarks: Does not participate in overload resolution if charT and Ostream::char_type are the same type.

basic_ostream<char>& operator<<(basic_ostream<char>& os, const wchar_t* p);
basic_ostream<char>& operator<<(basic_ostream<char>& os, const char16_t* p);
basic_ostream<char>& operator<<(basic_ostream<char>& os, const char32_t* p);

Effects: For each value of an iterator of type conversion_iterator<typename select_codec<char>::type, typename select_codec<p's value_type>::type, p's type> initialized with p, iterate until the end-of-sequence value ([str-x.codec.req.eos]) is reached, inserting the dereferenced value of the iterator into os.

Returns: os.

[Note: The existing basic_ostream<charT,traits>& operator<<(const void* p) prevents use of a template to abstract away the differences between the pointer types covered by above signatures. --end note]
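The effect of a converting inserter can be sketched for one fixed case: a UTF-32 string written to a narrow stream. This is a minimal sketch that assumes the narrow encoding is UTF-8 and that input is well formed. The proposed inserters would instead use conversion_iterator with the narrow codec, and would live in namespace std rather than in the hypothetical namespace sketch used here. The helper to_narrow is also hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// Insert each code point of str into os, encoded as UTF-8 on the fly.
inline std::ostream& operator<<(std::ostream& os, const std::u32string& str)
{
  for (std::u32string::const_iterator it = str.begin(); it != str.end(); ++it) {
    const char32_t cp = *it;
    if (cp < 0x80) {
      os.put(static_cast<char>(cp));
    } else if (cp < 0x800) {
      os.put(static_cast<char>(0xC0 | (cp >> 6)));
      os.put(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      os.put(static_cast<char>(0xE0 | (cp >> 12)));
      os.put(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      os.put(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
      os.put(static_cast<char>(0xF0 | (cp >> 18)));
      os.put(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
      os.put(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
      os.put(static_cast<char>(0x80 | (cp & 0x3F)));
    }
  }
  return os;
}

// Demonstration helper: format a UTF-32 string through the inserter above.
inline std::string to_narrow(const std::u32string& s)
{
  std::ostringstream oss;
  oss << s;   // found by ordinary lookup because this function is in namespace sketch
  return oss.str();
}

// Usage: U+20AC (euro sign) is written as the three bytes E2 82 AC.
inline void example() { assert(to_narrow(U"\u20AC") == "\xE2\x82\xAC"); }

}  // namespace sketch
```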

Stream extractors    [str-x.ext]

To be supplied.