Doc. no.   WG21/N1973=06-0043
Date:         2006-04-10
Project:     Programming Language C++
Reply to:   Kevlin Henney <kevlin@curbralan.com>
                Beman Dawes <bdawes@acm.org>

Lexical Conversion Library Proposal for TR2

Introduction
Motivation and Scope
Impact on the Standard
Important Design Decisions
Proposed Text for TR2
    Synopsis
    Function template lexical_cast
    Class bad_lexical_cast

Introduction

This paper proposes addition of a lexical conversion library component to the C++ Standard Library Technical Report 2. The proposal is based on the Boost Conversion Library's lexical_cast (www.boost.org/libs/conversion/lexical_cast.htm).

The lexical_cast function template offers a convenient and consistent form for supporting common conversions to and from arbitrary types when they are represented as text. The Boost version of lexical_cast is very widely used. It would be a pure addition to the C++ standard.

Boost lexical_cast is particularly popular with end users. Five of the six in-house users listed on the "Who's Using Boost" page name lexical_cast as one of the Boost libraries they use.

For a good discussion of the options and issues involved in string-based formatting, including comparison of stringstream, lexical_cast, and others, see Herb Sutter's article, The String Formatters of Manor Farm.

Also see Björn Karlsson, Beyond the C++ Standard Library, 73-77, Addison Wesley, ISBN 0-321-13354-4, www.awprofessional.com/title/0321133544

Motivation and Scope

Why is this important?

Sometimes a value must be converted to a literal text form, such as an int represented as a string, or vice versa, when a string is interpreted as an int. Such conversions are common when moving between data types internal to a program and representations external to it, such as windows and configuration files.

The standard C and C++ libraries offer a number of facilities for performing such conversions. However, they vary in their ease of use, extensibility, and safety.

For instance, there are a number of limitations with the family of standard C functions typified by atoi:

The standard C functions typified by strtol have the same basic limitations, but offer finer control over the conversion process. However, for the common case such control is often either not required or not used. The scanf family of functions offer even greater control, but also lack safety and ease of use.
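As a small illustration of the trade-off described above, the following sketch contrasts atoi with a strtol-based helper. The helper name parse_long and its bool-returning interface are hypothetical and chosen only for this example; they are not part of the proposal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts text to long with strtol and reports
// failure through the return value instead of silently yielding 0.
bool parse_long(const char* text, long& out)
{
    char* end = 0;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return false;          // no digits, trailing characters, or out of range
    out = value;
    return true;
}

void atoi_vs_strtol_example()
{
    // atoi cannot distinguish "0" from unparsable input: both yield 0.
    assert(std::atoi("0") == 0);
    assert(std::atoi("zero") == 0);

    long n = -1;
    assert(parse_long("42", n) && n == 42);
    assert(!parse_long("zero", n));   // the failure is now detectable
}
```

The finer control comes at the price of an end pointer, an errno check, and an output parameter, which is more ceremony than the common case usually needs.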

The standard C++ library offers stringstream for the kind of in-core formatting being discussed. It offers a great deal of control over the formatting and conversion of I/O to and from arbitrary types through text. However, for simple conversions direct use of stringstream can be either clumsy (with the introduction of extra local variables and the loss of infix-expression convenience) or obscure (where stringstream objects are created as temporary objects in an expression). Facets provide a comprehensive concept and facility for controlling textual representation, but their perceived complexity and high entry level require an extreme degree of involvement for simple conversions, and exclude all but a few programmers.

The lexical_cast function template offers a convenient and consistent form for supporting common conversions to and from arbitrary types when they are represented as text. The simplification it offers is in expression-level convenience for such conversions. For more involved conversions, such as where precision or formatting need tighter control than is offered by the default behavior of lexical_cast, the conventional stringstream approach is recommended. Where the conversions are numeric to numeric, other approaches may offer more reasonable behavior than lexical_cast.
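To make the expression-level convenience concrete, here is a deliberately simplified sketch of such a function template. It lives in a hypothetical namespace sketch rather than std::tr2, handles narrow streams only, and omits the string and precision special cases, so it should be read as an illustration rather than as the proposed implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <typeinfo>

namespace sketch   // hypothetical namespace, not std::tr2
{
    struct bad_lexical_cast : std::bad_cast {};

    // Simplified: narrow streams only, no string or precision special cases.
    template<typename Target, typename Source>
    Target lexical_cast(const Source& arg)
    {
        std::stringstream interpreter;
        Target result;
        if (!(interpreter << arg) ||            // write the source as text
            !(interpreter >> result) ||         // read the target back
            !(interpreter >> std::ws).eof())    // reject trailing characters
            throw bad_lexical_cast();
        return result;
    }
}

void lexical_cast_example()
{
    // The conversion can be used directly inside a larger expression.
    assert(sketch::lexical_cast<int>("42") + 1 == 43);
    assert(sketch::lexical_cast<std::string>(3) + "!" == "3!");

    bool threw = false;
    try { sketch::lexical_cast<int>("forty-two"); }
    catch (const sketch::bad_lexical_cast&) { threw = true; }
    assert(threw);
}
```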

What kinds of problems does it address, and what kinds of programmers is it intended to support?

The library addresses everyday needs, for both application programs and libraries. It is useful across many application domains. It is useful to all levels of programmers, from rank beginners to seasoned experts.

Is it based on existing practice? Is there a reference implementation?

Yes, very much so. It has been a mainstay of Boost for many years.

Impact on the Standard

What does it depend on, and what depends on it?

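It depends on existing standard library components, notably std::basic_stringstream, std::numeric_limits, and std::bad_cast, in terms of which it is specified. No other proposals depend on it.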

Is it a pure extension, or does it require changes to standard components?

It is a pure extension.

Can it be implemented using today's compilers, or does it require language features that will only be available as part of C++0x?

It can be (and has been) implemented with current compilers, and with many older compilers as well.

Important Design Decisions

FAQ

Why is the << plus >> analogy broken for the std::string special case?

The default asymmetric behavior of I/O for strings is often a cause for surprise amongst novices and, when wrapped inside lexical_cast, experts as well. Converting from a string and back again is expected to be an identity operation, which is what is now supported. This expectation is important, and the response is to make the behavior consistent with the intent of the conversion rather than its underlying implementation. Over time, lexical_cast has become more symmetric with respect to its conversions.
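The asymmetry in question can be reproduced with a plain stringstream. In the sketch below (function names are hypothetical), insertion writes the whole string while the default extraction stops at the first whitespace character, so the round trip is not an identity unless the whole content is taken.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Round-trips a string through a stream using the default operator>>.
std::string round_trip_with_default_extraction(const std::string& text)
{
    std::stringstream stream;
    stream << text;            // insertion writes every character
    std::string result;
    stream >> result;          // extraction stops at the first whitespace
    return result;
}

// Round-trips a string while keeping its whole content, spaces included.
std::string round_trip_whole_content(const std::string& text)
{
    std::stringstream stream;
    stream << text;
    return stream.str();
}

void string_round_trip_example()
{
    assert(round_trip_with_default_extraction("hello world") == "hello");
    assert(round_trip_whole_content("hello world") == "hello world");
}
```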

There is also a little bit of handling to ensure that numeric types do not lose precision. Again, the I/O stream defaults are not what many people would expect. And then there is special support for wchar_t<->char conversions, because again I/O streams don't quite do the right thing. We are not in a position to change I/O streams at this late stage, but something like lexical_cast is not required to repeat those little surprises.

Before these changes, Boost regularly received complaints and bug reports about lexical_cast behavior. Once the changes were made, complaints and bug reports stopped.

I don't like the name. Why don't you change it?

Suggestions always welcome. However, until something better comes along, the proposal authors don't believe that there is sufficient reason to change from lexical_cast, which is very well established, used in books and other teaching material, and does not seem to cause confusion among real users.

Since either the source or the target is usually a string, why not provide separate to_string(x) and string_to<t>(x) functions?

The source or target isn't always a string. Furthermore, the from/to idea cannot be expressed in a simple and consistent form. The illusion is that they are easier than lexical_cast because of the name. This is theory. The practice is that the two forms, although similarly and symmetrically named, are not at all similar in use: one requires explicit provision of a template parameter and the other not. This is a simple usability pitfall that is guaranteed to catch experienced and inexperienced users alike -- the only difference being that the experienced user will know what to do with the error message.
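The difference in use can be seen in a sketch of the suggested pair. Both functions are hypothetical, placed in an illustrative namespace sketch, and omit error handling.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace sketch   // hypothetical namespace for illustration only
{
    template<typename Source>
    std::string to_string(const Source& arg)
    {
        std::ostringstream out;
        out << arg;
        return out.str();
    }

    template<typename Target>
    Target string_to(const std::string& text)
    {
        std::istringstream in(text);
        Target result = Target();
        in >> result;          // error handling omitted for brevity
        return result;
    }
}

void split_interface_example()
{
    assert(sketch::to_string(42) == "42");        // Source is deduced
    assert(sketch::string_to<int>("42") == 42);   // Target must be spelled out
    // sketch::string_to("42");                   // error: Target cannot be deduced
}
```

With lexical_cast, by contrast, the target type is always written explicitly, in both directions.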

Change history


Proposed Text for Technical Report 2

Text in gray is commentary and not part of the proposed text.


Synopsis

Choice of a new or existing header is deferred pending outcome of other conversion related proposals.

namespace std
{
  namespace tr2
  {
    class bad_lexical_cast;
    template<typename Target, typename Source>
      Target lexical_cast(const Source& arg);
  }
}

Function template lexical_cast

The lexical_cast function template supplies common conversions to and from arbitrary types represented as text, providing expression-level convenience for such conversions.

The requirements on the argument and result types are:

lexical_cast behavior is specified in terms of operator<< and operator>> on a std::basic_stringstream object. Implementations are not required to actually use a std::basic_stringstream object to achieve the required behavior. Implementations are permitted to provide specializations of the lexical_cast template.

[Note: Implementations may use this "as if" leeway to achieve efficiency. -- end note.]

template<typename Target, typename Source>
  Target lexical_cast(const Source& arg);

Effects:

Throws: bad_lexical_cast if:

Returns: The result as created by the effects.

Remarks: If Target is either std::string or std::wstring, stream extraction takes the whole content of the string, including spaces, rather than relying on the default operator>> behavior.

The character type of the underlying stream is assumed to be char unless either the Source or the Target requires wide-character streaming, in which case the underlying stream uses wchar_t. Source types that require wide-character streaming are wchar_t, wchar_t *, and std::wstring. Target types that require wide-character streaming are wchar_t and std::wstring.
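One way an implementation might select the stream character type is with a pair of traits, sketched below. The trait names are hypothetical, the use of std::conditional assumes a C++11 or later library, and the const wchar_t * specialization is an assumption added for the example; none of this is mandated by the proposed text.

```cpp
#include <string>
#include <type_traits>

// Source types that require wide-character streaming.
template<typename T> struct source_is_wide : std::false_type {};
template<> struct source_is_wide<wchar_t> : std::true_type {};
template<> struct source_is_wide<wchar_t*> : std::true_type {};
template<> struct source_is_wide<const wchar_t*> : std::true_type {};  // assumption
template<> struct source_is_wide<std::wstring> : std::true_type {};

// Target types that require wide-character streaming.
template<typename T> struct target_is_wide : std::false_type {};
template<> struct target_is_wide<wchar_t> : std::true_type {};
template<> struct target_is_wide<std::wstring> : std::true_type {};

// char unless either side requires wchar_t.
template<typename Target, typename Source>
struct stream_char
{
    typedef typename std::conditional<
        target_is_wide<Target>::value || source_is_wide<Source>::value,
        wchar_t, char>::type type;
};

static_assert(std::is_same<stream_char<int, std::string>::type, char>::value,
              "narrow source and target use a char stream");
static_assert(std::is_same<stream_char<std::wstring, int>::type, wchar_t>::value,
              "a wide target selects a wchar_t stream");
static_assert(std::is_same<stream_char<int, wchar_t*>::type, wchar_t>::value,
              "a wide source selects a wchar_t stream");
```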

If std::numeric_limits<Target>::is_specialized, the underlying stream precision is set according to std::numeric_limits<Target>::digits10 + 1, otherwise if std::numeric_limits<Source>::is_specialized, the underlying stream precision is set according to std::numeric_limits<Source>::digits10 + 1.
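The effect of the precision rule can be observed with an ordinary ostringstream. The sketch below uses hypothetical function names and shows only the double case; the value 16 in the comment assumes an IEEE 754 binary64 double.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Formats with the stream's default precision of 6 significant digits.
std::string format_default(double value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// Formats with the precision described above: digits10 + 1
// (16 where double is IEEE 754 binary64).
std::string format_with_digits10_plus_1(double value)
{
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::digits10 + 1);
    out << value;
    return out.str();
}

void precision_example()
{
    const double value = 3.14159265358979;
    assert(format_default(value) == "3.14159");   // digits are lost
    assert(format_with_digits10_plus_1(value).size() >
           format_default(value).size());          // more digits are kept
}
```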

[Note: Where a higher degree of control is required over conversions, std::stringstream and std::wstringstream offer a more appropriate path. Where non-stream-based conversions are required, lexical_cast is the wrong tool for the job and is not special-cased for such scenarios. -- end note.]

Class bad_lexical_cast

namespace std
{
  namespace tr2
  {
    class bad_lexical_cast : public std::bad_cast
    {
    public:
      bad_lexical_cast () throw ();
      bad_lexical_cast ( const bad_lexical_cast &) throw ();
      bad_lexical_cast & operator =( const bad_lexical_cast &) throw ();
      virtual const char * what () const throw ();
    };
  }
}

The virtual destructor is not shown, following the practice of 18.5.2 Class bad_cast [lib.bad.cast].

The class bad_lexical_cast defines the type of objects thrown as exceptions by the implementation to report runtime lexical_cast failure.
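Because the class derives from std::bad_cast, a failure can be handled by code that knows only about the base class. The sketch below defines a stand-in class in a hypothetical namespace sketch; the message text is an assumption, and noexcept is used as the current spelling of the throw() specification shown in the synopsis.

```cpp
#include <cassert>
#include <cstring>
#include <exception>
#include <typeinfo>

namespace sketch   // hypothetical stand-in for std::tr2
{
    class bad_lexical_cast : public std::bad_cast
    {
    public:
        virtual const char* what() const noexcept
        {
            return "bad lexical cast";   // message text is an assumption
        }
    };
}

// Returns true if the exception was caught through the std::bad_cast base
// and what() dispatched to the derived class.
bool caught_as_bad_cast()
{
    bool matched = false;
    try { throw sketch::bad_lexical_cast(); }
    catch (const std::bad_cast& e)
    {
        matched = std::strcmp(e.what(), "bad lexical cast") == 0;
    }
    return matched;
}

void bad_lexical_cast_example()
{
    assert(caught_as_bad_cast());
}
```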

bad_lexical_cast () throw ();

Effects: Constructs an object of class bad_lexical_cast.

Remarks: The result of calling what() on the newly constructed object is implementation-defined.

bad_lexical_cast ( const bad_lexical_cast &) throw ();
bad_lexical_cast & operator =( const bad_lexical_cast &) throw ();

Effects: Copies an object of class bad_lexical_cast.

virtual const char * what () const throw ();

Returns: An implementation-defined NTBS.

Remarks: The message may be a null-terminated multibyte string (17.3.2.1.3.2), suitable for conversion and display as a wstring (21.2, 22.2.1.4).


Copyright Kevlin Henney 2000-2005
Copyright Beman Dawes 2006

Last revised: 2006-04-10