Doc. no. N2070=06-0140
Date: 2006-09-08
Project: Programming Language C++
Reply to: Martin Sebor

Enhancing the time_get facet for POSIX® compatibility

Motivation

The time_get and time_put facets provide a low-level, asymmetric interface for parsing and formatting time values. The interfaces are asymmetric because the time_put facet can produce a much larger set of sequences than the time_get facet can parse. The time_put interface can also readily expose useful implementation-defined extensions by recognizing additional formatting specifiers and modifiers, while the time_get interface provides no such flexibility.

The behavior of the time_put facet is specified in terms of the C standard library function strftime, and the facet's interface allows programs to take advantage of the rich set of 60 or so strftime conversion specifiers (including their optional modifiers). In contrast, the behavior of time_get is restricted to parsing a limited set of time and date sequences produced by a handful of formatting specifiers: the locale-independent and trivial %T (the same as "%H:%M:%S", the 24-hour time representation), the locale-specific and less trivial %x (the locale's date representation), simple weekday names (%a and %A), and the names of calendar months (%b and %B).

Presumably, this restriction exists only because the C standard library provides no function for parsing time sequences. Such a function is, however, specified by the ISO/IEC 9945 standard (also known as POSIX); see strptime. Thus, C++ programs that need to process date and time sequences produced by any of the other 56 or so formatting specifiers cannot rely on time_get's parsing functionality, even though implementations that parse non-trivial date sequences often already contain much of the necessary machinery without exposing it in the interface of the facet. For instance, even the simple task of parsing a 12-hour time representation is beyond the ability of the facet, as is the often-needed ability to recognize and interpret time zones.

Description

This paper proposes to extend the time_get facet interface so that it can parse most of the date and time sequences produced by time_put, thus providing a subset of the functionality of POSIX strptime. Specifically, we propose to add two get member functions and one do_get member function to class time_get, paralleling those declared by time_put.

Proposed Changes

Add to the declaration of class time_get in [lib.locale.time.get], immediately below the declaration of the member function get_year, the following:

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier = 0) const;
iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, const char_type* fmt, const char_type* fmtend) const;

Add to the declaration of class time_get, immediately below the declaration of the virtual member function do_get_year, the following:

virtual iter_type do_get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier) const;

Add to the end of [lib.locale.time.get.members] the following text:

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier = 0) const;

Returns: do_get(s, end, f, err, t, format, modifier)

iter_type get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, const char_type* fmt, const char_type* fmtend) const;

Requires: [fmt, fmtend) is a valid range.

Effects: The function starts by evaluating err = ios_base::goodbit. It then enters a loop, reading zero or more characters from s at each iteration. Unless otherwise specified below, the loop terminates when the first of the following conditions holds:

Note: The function uses the ctype<charT> facet installed in f's locale to determine valid whitespace characters. It is unspecified by what means the function performs case-insensitive comparison or whether multi-character sequences are considered while doing so.

Returns: s.
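
As an illustration of how a program might call the format-range overload of get, the following is a minimal sketch. It assumes a library that provides the proposed member function, and it uses the classic "C" locale. The helper name parse_with_format is hypothetical and not part of the proposal.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the proposal): parses `input` according
// to the strptime-style pattern `fmt`, using the classic "C" locale.
// Returns true unless the facet reported failbit.
bool parse_with_format(const std::string& input, const std::string& fmt,
                       std::tm* out)
{
    assert(out != nullptr);            // the facet needs a valid pointer

    std::istringstream in(input);
    in.imbue(std::locale::classic());

    typedef std::time_get<char> facet_type;
    const facet_type& tg = std::use_facet<facet_type>(in.getloc());

    *out = std::tm();                  // start from a zeroed object
    std::ios_base::iostate err = std::ios_base::goodbit;

    std::istreambuf_iterator<char> first(in), last;
    const char* const pattern = fmt.data();

    // The last two arguments delimit the format string.
    tg.get(first, last, in, err, out, pattern, pattern + fmt.size());

    // eofbit alone only means the input was consumed to its end.
    return (err & std::ios_base::failbit) == 0;
}
```

For example, calling parse_with_format("14:05:09", "%H:%M:%S", &t) would be expected to store 14, 5, and 9 in t.tm_hour, t.tm_min, and t.tm_sec respectively.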

Add the following paragraphs to the end of [lib.locale.time.get.virtuals]:

iter_type do_get (iter_type s, iter_type end, ios_base& f, ios_base::iostate& err, tm* t, char format, char modifier) const;

Requires: t is a valid pointer.

Effects: The function starts by evaluating err = ios_base::goodbit. It then reads characters starting at s until it encounters an error, or until it has extracted those struct tm members, and any remaining format characters, corresponding to a conversion directive appropriate for the ISO/IEC 9945 function strptime formed by concatenating '%', the modifier character, when non-NUL, and the format character. When the concatenation fails to yield a valid complete directive the function leaves the object pointed to by t unchanged and evaluates err |= ios_base::failbit. When (s == end) evaluates to true after reading a character the function evaluates err |= ios_base::eofbit.

Note: It is unspecified whether multiple calls to do_get() with the address of the same struct tm object will update the current contents of the object or simply overwrite its members. Portable programs must zero out the object before invoking the function.

Returns: An iterator pointing immediately beyond the last character recognized as possibly part of a valid input sequence for the given format and modifier.
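
The single-directive form can be exercised through the public get member, which forwards to do_get. The sketch below assumes a library that implements the proposed functions. The helper name parse_one_field is hypothetical. In line with the note above, it zeroes the struct tm object before each call.

```cpp
#include <cassert>
#include <ctime>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the proposal): extracts the single field
// described by '%' + modifier (when non-zero) + format from `input`.
// Returns true unless the facet reported failbit.
bool parse_one_field(const std::string& input, char format, std::tm* out,
                     char modifier = 0)
{
    assert(out != nullptr);            // Requires: t is a valid pointer

    std::istringstream in(input);
    in.imbue(std::locale::classic());

    const std::time_get<char>& tg =
        std::use_facet<std::time_get<char> >(in.getloc());

    // Whether the facet updates or overwrites *out is unspecified,
    // so portable code zeroes the object first.
    *out = std::tm();
    std::ios_base::iostate err = std::ios_base::goodbit;

    std::istreambuf_iterator<char> first(in), last;
    tg.get(first, last, in, err, out, format, modifier);

    return (err & std::ios_base::failbit) == 0;
}
```

With this helper, parse_one_field("2006", 'Y', &t) would be expected to succeed and leave t.tm_year equal to 106 (years since 1900), while input that does not begin with digits would be expected to fail.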

Implementation

A reference implementation of this extension is available for review in the Open Source Apache C++ Standard Library. The same extension has been implemented in the Rogue Wave® C++ Standard Library and shipped since 2001. See this page for the latest documentation of the feature.

Impact On Programs

The proposed extensions are largely source compatible with the existing interface of the time_get facet (there is a very small chance that the introduction of a new base class member function might affect the well-formedness or even the behavior of a program that calls a function with the same name in a class derived from the base). Adding a new virtual member function is a binary incompatible change.
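
The source compatibility caveat can be illustrated with a small, self-contained sketch. It shows one way in which such a change could arise. All names in it are hypothetical stand-ins: base_before and base_after model a base class before and after it gains a member named get, and the namespace-scope get models an unrelated user function that a derived class calls without qualification.

```cpp
#include <cassert>

namespace sketch {

// An unrelated namespace-scope function that user code already calls.
int get(int) { return 1; }

// Stand-in for a base class that has no member named get.
struct base_before {};

// Stand-in for the same base class after a member named get is added.
struct base_after {
    int get(long) const { return 2; }
};

struct derived_before : base_before {
    // No member named get is in scope, so this calls sketch::get(int).
    int call() const { return get(0); }
};

struct derived_after : base_after {
    // Class-scope lookup now finds base_after::get first and stops there,
    // so the identical expression calls the member instead.
    int call() const { return get(0); }
};

// Documents the expected outcome of the two otherwise identical calls.
inline void demo()
{
    assert(derived_before().call() == 1);
    assert(derived_after().call() == 2);
}

} // namespace sketch
```

Had the new member's parameters not accepted the argument, the call in derived_after would have been ill-formed rather than silently redirected.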