Doc: N1985==06-055

Date: 2006-04-06

Author: Jack W. Reeves

jack.reeves@bleading-edge.com

 

Request the Standard Provide Explicit
Specialization of char_traits For All Built-in Character Types

 

Changes to Section 21.1 [lib.char.traits]

 

The Standard does not require explicit specializations of std::char_traits<> for any type other than ‘char’ and ‘wchar_t’. This can lead to some unexpected and undesirable behavior.

 

Consider the following:

            std::basic_string<unsigned char> buffer;

 

This is shorthand for

            std::basic_string<unsigned char, std::char_traits<unsigned char>, std::allocator<unsigned char> > buffer;

 

This yields undefined behavior, since the Standard provides only a declaration for std::char_traits<> and does not require an explicit specialization of std::char_traits<unsigned char>.

 

Naturally, undefined behavior can mean anything, but several possible results are common.

 

  a. The implementation is minimal and provides only the required explicit specializations. In this case, the above code will not compile.

 

  b. Some implementations attempt to provide a definition of the class template std::char_traits<> itself, based upon the requirements in Table 39. Although some have argued that this follows from the Standard, I have always felt that such a definition has to be considered an implementation extension. In any case, it is problematic, since it is essentially impossible to provide a template definition that correctly implements the requirements of Table 39 for every POD type. The typical definitions do usually work when instantiated with unsigned char, however. In such cases, std::basic_string<unsigned char> will appear to work correctly.

 

  c. Some implementations may provide an implementation extension in the form of an explicit specialization of std::char_traits<unsigned char>. Naturally, on such platforms, std::basic_string<unsigned char> will work correctly.

 

Since the presence of a correct specialization of std::char_traits<unsigned char> is currently an implementation extension, any client attempting to use std::basic_string<unsigned char> in a portable manner will have to provide their own character traits class. Furthermore, they cannot provide this as an explicit specialization of std::char_traits<>. The Standard explicitly states that explicit specialization of templates defined in namespace std for anything other than user-defined types yields undefined behavior. As a practical matter, such an explicit specialization might work on platforms (a) and (b) above, but would clearly clash with the implementation-defined extension on platform (c). Furthermore, such an approach causes problems if multiple libraries try it, even on platforms (a) and (b).
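To make the cost of the portable workaround concrete, the following is a minimal sketch of a user-supplied traits class. The names uchar_traits, ustring, and make_buffer are hypothetical and chosen only for this illustration. The member set follows the declarations proposed later in this paper, and the bodies (delegating to the C library's memory functions) are one plausible choice, not a required implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <cstring>   // memcmp, memchr, memmove, memcpy, memset
#include <cwchar>    // std::mbstate_t
#include <ios>       // std::streamoff, std::streampos
#include <string>

// Hypothetical user-defined traits class (not part of the Standard).
struct uchar_traits {
    typedef unsigned char  char_type;
    typedef int            int_type;
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& c1, const char_type& c2) { c1 = c2; }
    static bool eq(const char_type& c1, const char_type& c2) { return c1 == c2; }
    static bool lt(const char_type& c1, const char_type& c2) { return c1 < c2; }

    // memcmp compares bytes as unsigned char, which matches lt() above.
    static int compare(const char_type* s1, const char_type* s2, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(s1, s2, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& a) {
        return n == 0 ? 0 : static_cast<const char_type*>(std::memchr(s, a, n));
    }
    static char_type* move(char_type* s1, const char_type* s2, std::size_t n) {
        return n == 0 ? s1 : static_cast<char_type*>(std::memmove(s1, s2, n));
    }
    static char_type* copy(char_type* s1, const char_type* s2, std::size_t n) {
        return n == 0 ? s1 : static_cast<char_type*>(std::memcpy(s1, s2, n));
    }
    static char_type* assign(char_type* s, std::size_t n, char_type a) {
        return n == 0 ? s : static_cast<char_type*>(std::memset(s, a, n));
    }

    static int_type not_eof(const int_type& c) { return c != EOF ? c : 0; }
    static char_type to_char_type(const int_type& c) {
        return static_cast<char_type>(c);
    }
    static int_type to_int_type(const char_type& c) { return c; }
    static bool eq_int_type(const int_type& c1, const int_type& c2) {
        return c1 == c2;
    }
    static int_type eof() { return EOF; }
};

// The traits argument must now be spelled out wherever the string is named.
typedef std::basic_string<unsigned char, uchar_traits> ustring;

// Builds a small binary buffer that contains an embedded zero byte.
inline ustring make_buffer() {
    const unsigned char raw[] = { 0xDE, 0xAD, 0x00, 0xFF };
    return ustring(raw, sizeof raw);
}
```

Note that ustring is a different type from std::basic_string<unsigned char>, so two libraries that each define their own traits class cannot exchange strings without conversion.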

 

Most users want the declaration of a std::basic_string<unsigned char> to work. The typical user simply expects it to, and the sophisticated user who understands why it doesn’t will still be annoyed with the alternative. After all, the Standard uses std::basic_string<char> so why shouldn’t it work the same for unsigned char?

 

A typical reaction to the fact that std::basic_string<unsigned char> will not work portably is to suggest std::vector<unsigned char>. While this will ordinarily solve the immediate problem, using vector instead of basic_string for built-in types carries a potential performance cost that makes it undesirable for reusable library code. Vector is required to work with any type. As a result, internal copying of vector’s storage must involve loops which invoke copy constructors and destructors. While these operations are no-ops for built-in types, the loops still exist, and it becomes a “quality of implementation” issue whether they will be optimized away. This is clearly a valid concern, or the Standard would not require every valid character traits class to provide the copy(), move(), and assign() operations.
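As an illustration of the bulk operations mentioned above, the sketch below calls copy(), move(), and assign() on the existing std::char_traits<char> specialization. The helper names shift_right_one and bulk_copy are hypothetical. Whether a given library lowers these calls to memcpy, memmove, and memset is an implementation detail, although that is the evident intent of the interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Shifts the first n characters of buf one position to the right and
// writes `fill` into the vacated first slot. The source and destination
// ranges overlap, so move() must be used rather than copy().
// The caller must ensure buf has room for n + 1 characters.
inline void shift_right_one(char* buf, std::size_t n, char fill) {
    typedef std::char_traits<char> traits;
    traits::move(buf + 1, buf, n);  // overlap-safe block move
    traits::assign(buf, 1, fill);   // block fill of one character
}

// Copies n characters between non-overlapping buffers in a single call,
// without any per-element constructor or destructor being involved.
inline void bulk_copy(char* dst, const char* src, std::size_t n) {
    std::char_traits<char>::copy(dst, src, n);
}
```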

 

Suggested resolution: the Standard should require that explicit specializations of std::char_traits<> be provided for all built-in character types.

 

Proposed resolution:

 

Change 21.1/1 as follows:

This subclause defines requirements on classes representing character traits, and declares a class template char_traits<charT>, along with four specializations, char_traits<char>, char_traits<wchar_t>, char_traits<unsigned char>, and char_traits<signed char>, that satisfy those requirements.

 

Change 21.1/4 as follows:

This subclause specifies a struct template, char_traits<charT>, and four explicit specializations of it, char_traits<char>, char_traits<wchar_t>, char_traits<unsigned char>, and char_traits<signed char>, all of which appear in the header <string> and satisfy the requirements below.

 

Change 21.1.3 to read:

namespace std {
    template <> struct char_traits<char>;
    template <> struct char_traits<wchar_t>;
    template <> struct char_traits<unsigned char>;
    template <> struct char_traits<signed char>;
}

 

Change 21.1.3/1 as follows:

The header <string> declares four structs that are specializations of the template struct char_traits.

 

 

 

Add  

21.1.3.3 struct char_traits<unsigned char>

 

namespace std {
  template <>
  struct char_traits<unsigned char> {
    typedef unsigned char char_type;
    typedef int int_type;
    typedef streamoff off_type;
    typedef streampos pos_type;
    typedef mbstate_t state_type;

    static void assign(char_type& c1, const char_type& c2);
    static bool eq(const char_type& c1, const char_type& c2);
    static bool lt(const char_type& c1, const char_type& c2);

    static int compare(const char_type* s1, const char_type* s2, size_t n);
    static size_t length(const char_type* s);
    static const char_type* find(const char_type* s, size_t n,
                                 const char_type& a);
    static char_type* move(char_type* s1, const char_type* s2, size_t n);
    static char_type* copy(char_type* s1, const char_type* s2, size_t n);
    static char_type* assign(char_type* s, size_t n, char_type a);

    static int_type not_eof(const int_type& c);
    static char_type to_char_type(const int_type& c);
    static int_type to_int_type(const char_type& c);
    static bool eq_int_type(const int_type& c1, const int_type& c2);
    static int_type eof();
  };
}

 

The header <string> (21.2) declares a specialization of the template struct char_traits for unsigned char. It is for narrow-oriented iostream classes.

 

The defined types for int_type, pos_type, off_type, and state_type are int, streampos, streamoff, and mbstate_t respectively.

 

The type streampos is an implementation-defined type that satisfies the requirements for POS_T in 21.1.2.

 

The type streamoff is an implementation-defined type that satisfies the requirements for OFF_T in 21.1.2.

 

The type mbstate_t is defined in <cwchar> and can represent any of the conversion states possible to occur in an implementation-defined set of supported multibyte character encoding rules.

 

The two-argument member assign is defined identically to the built-in operator =. The two-argument members eq and lt are defined identically to the built-in operators == and < for type unsigned char.

 

The member eof() returns EOF.
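The character traits requirements say that eof() must yield a value e for which eq_int_type(e, to_int_type(c)) is false for every character value c. With int_type defined as int and eof() returning EOF, that property can be checked by brute force for a narrow character type. The following is a minimal sketch of such a check. The function name eof_is_distinct is hypothetical, and the check is meant only for character types whose full value range is small enough to enumerate.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Returns true if Traits::eof() compares unequal (via eq_int_type) to
// to_int_type(c) for every value c of Traits::char_type.
template <class Traits>
bool eof_is_distinct() {
    typedef typename Traits::char_type char_type;
    const long lo = std::numeric_limits<char_type>::min();
    const long hi = std::numeric_limits<char_type>::max();
    for (long v = lo; v <= hi; ++v) {
        const char_type c = static_cast<char_type>(v);
        if (Traits::eq_int_type(Traits::eof(), Traits::to_int_type(c)))
            return false;  // some real character collides with eof()
    }
    return true;
}
```

Today the check can be applied portably only to traits that are guaranteed to exist, for example eof_is_distinct<std::char_traits<char> >(), or to a user-written traits class. Under this proposal it could also be applied to std::char_traits<unsigned char>.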

 

 

Add

21.1.3.4 struct char_traits<signed char>

 

namespace std {
  template <>
  struct char_traits<signed char> {
    typedef signed char char_type;
    typedef int int_type;
    typedef streamoff off_type;
    typedef streampos pos_type;
    typedef mbstate_t state_type;

    static void assign(char_type& c1, const char_type& c2);
    static bool eq(const char_type& c1, const char_type& c2);
    static bool lt(const char_type& c1, const char_type& c2);

    static int compare(const char_type* s1, const char_type* s2, size_t n);
    static size_t length(const char_type* s);
    static const char_type* find(const char_type* s, size_t n,
                                 const char_type& a);
    static char_type* move(char_type* s1, const char_type* s2, size_t n);
    static char_type* copy(char_type* s1, const char_type* s2, size_t n);
    static char_type* assign(char_type* s, size_t n, char_type a);

    static int_type not_eof(const int_type& c);
    static char_type to_char_type(const int_type& c);
    static int_type to_int_type(const char_type& c);
    static bool eq_int_type(const int_type& c1, const int_type& c2);
    static int_type eof();
  };
}

 

The header <string> (21.2) declares a specialization of the template struct char_traits for signed char. It is for narrow-oriented iostream classes.

 

The defined types for int_type, pos_type, off_type, and state_type are int, streampos, streamoff, and mbstate_t respectively.

 

The type streampos is an implementation-defined type that satisfies the requirements for POS_T in 21.1.2.

 

The type streamoff is an implementation-defined type that satisfies the requirements for OFF_T in 21.1.2.

 

The type mbstate_t is defined in <cwchar> and can represent any of the conversion states possible to occur in an implementation-defined set of supported multibyte character encoding rules.

 

The two-argument member assign is defined identically to the built-in operator =. The two-argument members eq and lt are defined identically to the built-in operators == and < for type signed char.

 

The member eof() returns EOF.
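For signed char, an implementer has to take some care that to_int_type() never produces the value returned by eof(). EOF is a negative int (commonly -1), and a signed char holding -1 converts directly to the int -1. The sketch below contrasts the direct conversion with a conversion through unsigned char. The helper names are hypothetical, and the wording above does not mandate either approach; this shows only one way an implementation might satisfy the requirement.

```cpp
#include <cassert>
#include <cstdio>   // EOF

typedef int int_type;

// Direct conversion: the value is preserved, so signed char(-1) becomes
// int(-1), which collides with EOF wherever EOF is defined as -1.
inline int_type to_int_naive(signed char c) { return c; }

// Conversion through unsigned char: the result is never negative, so it
// can never compare equal to EOF, which is required to be negative.
inline int_type to_int_via_unsigned(signed char c) {
    return static_cast<int_type>(static_cast<unsigned char>(c));
}

// Inverse of to_int_via_unsigned. Converting an out-of-range unsigned
// value to signed char was implementation-defined before C++20; on
// two's complement platforms it restores the original value.
inline signed char to_char(int_type i) {
    return static_cast<signed char>(static_cast<unsigned char>(i));
}
```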