Doc. no. N2321=07-0181
Obsoletes: N2211=07-0071
Date: 2007-06-22
Project: Programming Language C++
Reply to: Martin Sebor

Enhancing the time_get facet for POSIX® compatibility, Revision 2

Changes in This Revision

This minor revision clarifies the permission that Revision 1 of the document granted to implementations to fail to parse input sequences that use complex conversion directives such as %c, %x, and %X. The permission now explicitly extends to the same sequences even when they involve the optional modifiers E and O.

In addition, this revision adds a Compatibility paragraph.

It should be noted that cases where the function cannot correctly parse even these complex sequences should be quite rare, especially on POSIX platforms, where the function nl_langinfo may be used to retrieve the broken-down string, consisting of a sequence of simple conversion directives, that corresponds to each of the complex ones. For example, in the C locale, the broken-down string corresponding to the %c directive is "%a %b %e %T %Y". The nl_langinfo function also makes it possible to retrieve the alternative symbols used instead of ordinary digits in directives involving the E and O modifiers.
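The lookup described above can be sketched as follows. This is only an illustration for POSIX platforms, not part of the proposed wording: the helper name expand_directive is hypothetical, and the sketch assumes the program is still running in the default "C" locale.

```cpp
#include <langinfo.h>   // POSIX: nl_langinfo, D_T_FMT, D_FMT, T_FMT
#include <string>

// Hypothetical helper: returns the broken-down format string that the
// current C library locale associates with one of the complex directives
// %c, %x, or %X. Returns an empty string for any other character.
std::string expand_directive(char directive)
{
    switch (directive) {
    case 'c': return nl_langinfo(D_T_FMT);   // date and time representation
    case 'x': return nl_langinfo(D_FMT);     // date representation
    case 'X': return nl_langinfo(T_FMT);     // time representation
    default:  return std::string();
    }
}
```

An implementation could feed the returned string back into the parsing loop one simple directive at a time. POSIX also defines the items ERA_D_T_FMT, ERA_D_FMT, ERA_T_FMT, and ALT_DIGITS for the forms that involve the E and O modifiers.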

Motivation

The time_get and time_put facets provide a low-level asymmetric interface for the parsing and formatting of time values. The interfaces are asymmetric because the time_put facet is capable of producing a much larger set of sequences than the time_get facet is capable of parsing. The time_put interface can also readily expose useful implementation-defined extensions by recognizing additional formatting specifiers and modifiers, while the time_get interface provides no such flexibility.

The behavior of the time_put facet is specified in terms of the C standard library function strftime, and the facet's interface allows programs to take advantage of the rich set of the 60 or so strftime conversion specifiers (including their optional modifiers). In contrast, the behavior of time_get is restricted to parsing a limited set of time and date sequences produced by a handful of formatting specifiers: the locale-independent and trivial %T (which is the same as "%H:%M:%S", the 24 hour time representation), the locale-specific and less trivial %x (the locale's date representation), simple weekday names (%a and %A), and the names of calendar months (%b and %B).

Presumably, this restriction exists only because the C standard library provides no function for parsing time sequences. Such a function is, however, specified by the ISO/IEC 9945 standard (also known as POSIX) -- see strptime. Thus, C++ programs that need to process date and time sequences produced by any of the other 56 or so formatting specifiers cannot rely on the time_get facet's parsing functionality, even though much of that functionality often already exists in implementations that parse non-trivial date sequences and is simply not exposed in the interface of the facet. For instance, even the simple task of parsing a 12 hour time representation is beyond the ability of the facet, as is the often needed ability to recognize and interpret time zones.
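The formatting half of this asymmetry can be illustrated with the existing time_put interface. The sketch below is not part of the proposal; the helper names format_time and sample_time are hypothetical. It produces a 12 hour time representation, a sequence that the existing time_get members have no way to read back.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats t according to a strftime-style pattern
// using the time_put facet of the classic "C" locale.
std::string format_time(const std::tm& t, const std::string& pattern)
{
    std::ostringstream out;
    out.imbue(std::locale::classic());
    const std::time_put<char>& tp =
        std::use_facet<std::time_put<char> >(out.getloc());
    tp.put(std::ostreambuf_iterator<char>(out), out, out.fill(), &t,
           pattern.data(), pattern.data() + pattern.size());
    return out.str();
}

// Hypothetical helper: a struct tm for 14:30:05, all other members zero.
std::tm sample_time()
{
    std::tm t = std::tm();
    t.tm_hour = 14;
    t.tm_min  = 30;
    t.tm_sec  = 5;
    return t;
}
```

Calling format_time(sample_time(), "%I:%M:%S %p") exercises the %I and %p directives, neither of which has a counterpart among the existing get_time, get_date, get_weekday, get_monthname, and get_year members.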

Description

This paper proposes to extend the time_get facet interface so as to permit the parsing of most of the same set of date and time sequences as produced by time_put, thus providing a subset of the same functionality as POSIX strptime. Specifically, we propose to add two get member functions and one do_get member function to class template time_get to parallel those declared by time_put.

Proposed Changes

Add to the declaration of class time_get in [lib.locale.time.get], immediately below the declaration of the member function get_year, the following declarations:

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier = 0) const;
iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, const char_type* fmt, const char_type* fmtend) const;

Add to the declaration of class time_get, immediately below the declaration of the virtual member function do_get_year, the following declaration:

virtual iter_type do_get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier) const;

Add to the end of [lib.locale.time.get.members] the following text:

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier = 0) const;

Returns: do_get(s, end, f, err, t, format, modifier)
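A call to the single-directive overload might look like the sketch below. It assumes a library that provides the proposed member; the C++11 standard library has an equivalent function. The helper name parse_year is hypothetical.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses a four-digit year using the directive
// formed from '%' and the format character 'Y' (no modifier).
// Returns true unless the facet reported failbit.
bool parse_year(const std::string& text, std::tm& t)
{
    typedef std::istreambuf_iterator<char> Iter;

    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    t = std::tm();   // start from a zeroed object
    tg.get(Iter(in), Iter(), in, err, &t, 'Y');
    return (err & std::ios_base::failbit) == 0;
}
```

As with strptime, the parsed year is stored in tm_year as an offset from 1900.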

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, const char_type* fmt, const char_type* fmtend) const;

Requires: [fmt, fmtend) is a valid range.

Effects: The function starts by evaluating err = ios_base::goodbit. It then enters a loop, reading zero or more characters from s at each iteration. Unless otherwise specified below, the loop terminates when the first of the following conditions holds:

Note: The function uses the ctype<charT> facet installed in f's locale to determine valid whitespace characters. It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so.

Returns: s.
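The format-string overload can be exercised along the following lines. This is a sketch under the same assumption as before, namely a library that provides the proposed member (the C++11 standard library has an equivalent). The helper name parse_with_format is hypothetical.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses text according to a strptime-style format
// string passed as a character range. Returns true unless the facet
// reported failbit.
bool parse_with_format(const std::string& text, const std::string& fmt,
                       std::tm& t)
{
    typedef std::istreambuf_iterator<char> Iter;

    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    t = std::tm();   // start from a zeroed object
    tg.get(Iter(in), Iter(), in, err, &t,
           fmt.data(), fmt.data() + fmt.size());
    return (err & std::ios_base::failbit) == 0;
}
```

Ordinary characters in the format, such as the colons in "%H:%M:%S", have to match the input, so an input that uses a different separator is reported as a failure.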

Add the following paragraphs to the end of [lib.locale.time.get.virtuals]:

virtual iter_type do_get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier) const;

Requires: t is dereferenceable.

Effects: The function starts by evaluating err = ios_base::goodbit. It then reads characters starting at s until it encounters an error, or until it has extracted and assigned those struct tm members, and any remaining format characters, corresponding to a conversion directive appropriate for the ISO/IEC 9945 function strptime, formed by concatenating '%', the modifier character, when non-NUL, and the format character. When the concatenation fails to yield a complete valid directive the function leaves the object pointed to by t unchanged and evaluates err |= ios_base::failbit. When (s == end) evaluates to true after reading a character the function evaluates err |= ios_base::eofbit.

For complex conversion directives such as %c, %x, or %X, or directives that involve the optional modifiers E or O, when the function is unable to unambiguously determine some or all struct tm members from the input sequence [s, end), it evaluates err |= ios_base::eofbit. In such cases the values of those struct tm members are unspecified and may be outside their valid range.

Note: It is unspecified whether multiple calls to do_get() with the address of the same struct tm object will update the current contents of the object or simply overwrite its members. Portable programs must zero out the object before invoking the function.

Returns: An iterator pointing immediately beyond the last character recognized as possibly part of a valid input sequence for the given format and modifier.
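The error reporting described above can be observed through the public get member, which forwards to do_get. The sketch below assumes a library that provides the proposed members (the C++11 standard library has equivalents); the helper name parse_hour is hypothetical. It zeroes the struct tm object before every call, as the note above recommends for portable programs.

```cpp
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses an hour with the directive "%H" and returns
// the resulting error state so the caller can tell eofbit from failbit.
std::ios_base::iostate parse_hour(const std::string& text, std::tm& t)
{
    typedef std::istreambuf_iterator<char> Iter;

    std::istringstream in(text);
    in.imbue(std::locale::classic());
    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    std::ios_base::iostate err = std::ios_base::goodbit;
    t = std::tm();   // zero the object before each call
    tg.get(Iter(in), Iter(), in, err, &t, 'H');
    return err;
}
```

With this helper, an input that ends right after the hour reports eofbit without failbit, an input with trailing characters reports goodbit, and an input that does not start with a digit reports failbit.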

Implementation

A reference implementation of this extension is available for review in the Open Source Apache C++ Standard Library. The same extension has been implemented in the Rogue Wave® C++ Standard Library and shipped since 2001. See this page for the latest documentation of the feature.

Impact On Programs

The proposed extensions are largely source compatible with the existing interface of the time_get facet. There is a very small chance that the introduction of a new base class member function might affect the well-formedness or even the behavior of a program that calls a function with the same name in a class derived from the base.

Compatibility

Adding a new virtual member function is a binary-incompatible change. During the discussion of this proposal at the Oxford meeting in April 2007, a number of attendees expressed concern about introducing such a change in a Technical Report (such as TR2) and felt that a change of this nature would be more appropriate for the upcoming revision of the C++ standard.