ISO/IEC JTC1/SC22/WG14 N764

SC22/WG14 N764    J11/97/128

Issues about time
Clive D.W. Feather
clive@demon.net
1997-09-22


Abstract
--------
N735 contained a number of items (20 to 25) concerning the time-related
functions. In addition, N733 added four new conversion specifiers to
strftime().

Since discussion in the ISO 8601 community shows that some of these changes
were flawed, and since all these items are related, this paper attempts to
address all these issues at once.


Discussion - ISO 8601 weeks
---------------------------

ISO 8601 specifies the concept of a week number in the year and the day
within the week. Weeks always begin on a Monday, so, for example, Wednesday
8th January 1997 is the third day of week two of 1997 which, in ISO format,
is "1997-W02-3" or "1997W023".

The first week of the year is specified to be that containing January 4th
or, equivalently, that containing the first Thursday of January. However,
ISO 8601 does not explicitly show how to indicate days in January before
week 1 of the year, or days in December that are in the same week as week 1
of the next year. For example, in 1999 week 1 starts on Monday 4th January,
and so there is an issue as to how to express January 1st to 3rd; similarly,
in 1998 week 1 includes Thursday 1st January, and so there is an issue as to
how to express the last three days of 1997.

The changes in N733 assumed that every date belongs to a week of its own calendar year.
However, current practice among users of ISO 8601 is to give every day of
a week the same week and year number. Thus we see the following:

    Date              N733            Current practice
    1998-12-31        1998-W53-4      1998-W53-4
    1999-01-01        1999-W00-5      1998-W53-5
    1999-01-02        1999-W00-6      1998-W53-6
    1999-01-03        1999-W00-7      1998-W53-7
    1999-01-04        1999-W01-1      1999-W01-1

    1997-12-28        1997-W52-7      1997-W52-7
    1997-12-29        1997-W53-1      1998-W01-1
    1997-12-30        1997-W53-2      1998-W01-2
    1997-12-31        1997-W53-3      1998-W01-3
    1998-01-01        1998-W01-4      1998-W01-4

Current practice also uses different letters for the specifiers.

Given this, the changes made by N733 should be revised.


Proposals
---------
Part A
------

In subclause 7.16.3.5 (strftime()), paragraph 3 (the list of specifiers):

* Change the item %f to be %u (wording unaltered).

* Change the wording of %V to be:

    %V  is replaced by the ISO 8601 week number (see below) as a decimal
        number (01-53).

* Add the following items:

    %g  is replaced by the last 2 digits of the week-based year (see below)
        as a decimal number (00-99).
    %G  is replaced by the week-based year (see below) as a decimal number
        (e.g. 1997).

* Add the following text at the end of the list:

    %g, %G, and %V give values according to the ISO 8601 week-based year.
    In this system, weeks begin on a Monday and week 1 of the year is the
    week that includes both January 4th and the first Thursday of the year.
    If the first Monday of January is the 2nd, 3rd, or 4th, the preceding
    days are part of the last week of the preceding year; thus Saturday
    2nd January 1999 will have %G == 1998 and %V == 53. If December 29th,
    30th, or 31st is a Monday, it and any following days are part of week 1
    of the following year. Thus Tuesday 30th December 1997 will have
    %G == 1998 and %V == 1.
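The rule above can be sketched in C as follows. This is only an illustration: `iso_week_fields()` is a hypothetical helper that reads just tm_year, tm_yday and tm_wday, and it relies on the equivalent formulation that an ISO week belongs to the year containing its Thursday.

```c
#include <assert.h>
#include <time.h>

/* Gregorian leap-year test for a full year number such as 1997. */
static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical helper: compute the %G (week-based year) and %V (week
   number) values from tm_year, tm_yday and tm_wday only. An ISO week
   belongs to the year containing its Thursday, and its number is
   determined by that Thursday's day of the year. */
static void iso_week_fields(const struct tm *tp, int *G, int *V)
{
    int year, iso_wday, thu;

    assert(tp->tm_wday >= 0 && tp->tm_wday <= 6);
    year = tp->tm_year + 1900;
    iso_wday = (tp->tm_wday + 6) % 7 + 1;   /* Monday = 1 ... Sunday = 7 */
    thu = tp->tm_yday + (4 - iso_wday);     /* tm_yday of this week's Thursday */

    if (thu < 0) {                          /* Thursday is in the previous year */
        year--;
        thu += 365 + is_leap(year);
    } else if (thu >= 365 + is_leap(year)) {/* Thursday is in the next year */
        thu -= 365 + is_leap(year);
        year++;
    }
    *G = year;
    *V = thu / 7 + 1;                       /* week 1 holds the first Thursday */
}
```

For Saturday 2nd January 1999 (tm_yday == 1, tm_wday == 6) the sketch gives G == 1998 and V == 53, matching the example in the text; %g would then be G % 100 for non-negative years.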


Part B
------

In subclause 7.16.3.5 (strftime()), paragraph 3 (the list of specifiers),
change the wording of the following items to avoid confusion (e.g. 2000 is
in the 20th century but 2001 is in the 21st):

    %y  is replaced by the last 2 digits of the year as a decimal
        number (00-99).
    %Y  is replaced by the whole year as a decimal number (e.g. 1997).
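As a small illustration that neither description needs the word "century", the sketch below formats the years 2000 and 2001 with the existing %y and %Y specifiers through strftime(); `format_years()` is a hypothetical helper name. It produces "00 2000" and "01 2001" respectively.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format FULL_YEAR with "%y %Y". Only tm_year is
   relevant to these specifiers; the other members are zeroed apart from
   a valid tm_mday. */
static size_t format_years(int full_year, char *buf, size_t size)
{
    struct tm t;

    assert(buf != NULL && size > 0);
    memset(&t, 0, sizeof t);
    t.tm_year = full_year - 1900;
    t.tm_mday = 1;
    return strftime(buf, size, "%y %Y", &t);
}
```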


Part C
------
[Was N735 item 23]

In subclause 7.16.1, change the range of tm_sec to [0,60] and remove
footnote 243. See various WG14 mailing list items (e.g. 3482) or:
# The International Earth Rotation Service periodically uses leap seconds
# to keep UTC to within 0.9 s of TAI (atomic time); see
# Terry J Quinn, The BIPM and the accurate measure of time,
# Proc IEEE 79, 7 (July 1991), 894-905.
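With tm_sec allowed to reach 60, a positive leap second can be represented and formatted directly. The sketch below uses the leap second inserted at the end of 1998; `format_leap_second()` is a hypothetical name, while strftime() is the standard function. It writes "1998-12-31 23:59:60".

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: format the leap second inserted at the end of
   1998, 1998-12-31 23:59:60 UTC, using a tm_sec value of 60. */
static size_t format_leap_second(char *buf, size_t size)
{
    struct tm t;

    assert(buf != NULL && size > 0);
    memset(&t, 0, sizeof t);
    t.tm_year = 98;     /* 1998 */
    t.tm_mon  = 11;     /* December */
    t.tm_mday = 31;
    t.tm_hour = 23;
    t.tm_min  = 59;
    t.tm_sec  = 60;     /* the leap second itself */
    t.tm_wday = 4;      /* Thursday */
    t.tm_yday = 364;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", &t);
}
```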


Part D
------
[Was N735 item 20]

Some specifiers of strftime() generate a number, which usually has a
known width and is zero filled (though %Y is potentially an exception).
However, the other specifiers generate strings of unknown width. While it
is possible to expand the value into a string and then manipulate it,
this is an inconvenient approach.

In subclause 7.16.3.5 (strftime()), paragraph 2, change:

    A conversion specifier consists of a % character followed by a
    character that determines the behavior of the conversion specifier.

to:

    A conversion specifier consists of a % character followed by a
    character that determines the behavior of the conversion specifier,
    possibly separated by various modifiers.

Add the following paragraph between paragraphs 2 and 3:

    Any or all of the following may occur between the % character and the
    letter of a conversion specifier (they are not permitted for %%). Those
    that appear must be in this order:
    * A minus sign, indicating that any padding (see below) is to be on the
      right, not the left.
    * A field width, as a decimal integer. If the replacement string has
      fewer characters than the field width, it is padded with spaces on
      the left (right if a minus sign was used).
    * A dot followed by a precision, as a decimal integer.
      * if the specifier produces a decimal number that contains more
        characters than the precision, then leading zeros (if available)
        are removed from the replacement string until the precision is
        reached;
      * otherwise, if the replacement string contains more characters than
        the precision, then only that many characters are placed in the
        array, taken from the left end of the replacement string.
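The modifier semantics above can be sketched as a post-processing step on an already expanded replacement string. The helper `apply_modifiers()` and its parameters are hypothetical; in a real strftime() this logic would sit inside the conversion loop. Note that a number with no leading zeros left to remove is not truncated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: apply the proposed "-", width and ".precision"
   modifiers to an already expanded replacement string REPL.
   is_number  - nonzero if the specifier yields a decimal number
   left_align - nonzero if a minus sign was given
   width      - field width, or 0 if none
   precision  - precision, or -1 if none
   Writes a NUL-terminated result to OUT (capacity OUTSZ) and returns
   its length, or 0 if it does not fit. */
static size_t apply_modifiers(const char *repl, int is_number, int left_align,
                              size_t width, long precision,
                              char *out, size_t outsz)
{
    size_t len, pad;

    assert(repl != NULL && out != NULL);
    len = strlen(repl);
    if (precision >= 0 && len > (size_t)precision) {
        if (is_number) {
            /* Drop leading zeros, if any, until the precision is reached. */
            while (len > (size_t)precision && *repl == '0') {
                repl++;
                len--;
            }
        } else {
            /* Keep only the leftmost PRECISION characters. */
            len = (size_t)precision;
        }
    }

    pad = width > len ? width - len : 0;
    if (len + pad + 1 > outsz)
        return 0;

    if (left_align) {
        memcpy(out, repl, len);
        memset(out + len, ' ', pad);
    } else {
        memset(out, ' ', pad);
        memcpy(out + pad, repl, len);
    }
    out[len + pad] = '\0';
    return len + pad;
}
```

For example, "%-10.3A" applied to "Wednesday" would give "Wed" followed by seven spaces, and "%3.1d" applied to "08" would give "  8".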


Part E
------
[Was N735 item 22]

The only facility for generating the time zone is a locale-specific
specifier (%Z) in strftime(). However, zone names are not standardised,
and there are two common numeric formats that give the offset from UTC:
ISO 8601 and Internet common practice. Both use notation such as "+0830"
for an offset of 8 hours 30 minutes, but the signs differ: ISO 8601 uses
+ for east of Greenwich, while Internet common practice uses it for west
of Greenwich.

Add the following conversion specifiers to subclause 7.16.3.5 (strftime())
paragraph 3:

    %o  is replaced by the offset from UTC in the form "+0830" (meaning
        8 hours 30 minutes behind UTC). This format is common on the Internet.
    %O  is replaced by the offset from UTC in the form "-0830" (meaning
        8 hours 30 minutes behind UTC). This is the ISO 8601 format.
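A minimal sketch of the two forms follows, assuming the offset is already known in minutes with positive values meaning ahead of UTC (the convention part H proposes for tm_utcoffset). The helper name is hypothetical, the signs follow the descriptions of %o and %O given here, and writing a zero offset as "+0000" in both forms is an arbitrary choice of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write an offset from UTC as "+hhmm" or "-hhmm".
   MINUTES is positive ahead of (east of) UTC. With ISO_FORM nonzero the
   sign follows %O ("+" = east); otherwise it is inverted, as described
   for %o. BUF must hold at least 6 characters. */
static void format_utc_offset(int minutes, int iso_form, char *buf)
{
    int m = abs(minutes);
    char sign;

    assert(m <= 1439);
    if (minutes == 0)
        sign = '+';                       /* arbitrary choice of this sketch */
    else if (iso_form)
        sign = minutes > 0 ? '+' : '-';
    else
        sign = minutes > 0 ? '-' : '+';
    snprintf(buf, 6, "%c%02d%02d", sign, m / 60, m % 60);
}
```

An offset of -510 minutes (8 hours 30 minutes behind UTC) comes out as "-0830" in the %O form and "+0830" in the %o form.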


Part F
------
[Was N735 item 24]

Subclause 7.16.3.5 (strftime()) is unclear on how the values of the
members of /timeptr/ affect the result, especially if they are outside
the normal range.

Add one of the following sets of wording, in each case after
paragraph 4:

Option [Fa]:

    If the value of any member of the structure pointed to by /timeptr/
    is out of the normal range, or the values are not consistent with
    one another [*], the behaviour is undefined.

    [*] For example, the contents represent "30th Feb", "29th Feb 1997",
    or "Monday 10th May 1997".

Option [Fb]:

    If the value of any member of the structure pointed to by /timeptr/
    is out of the normal range, or the values are not consistent with
    one another [*], the value returned and the contents of the array
    are unspecified.

    [*] For example, the contents represent "30th Feb", "29th Feb 1997",
    or "Monday 10th May 1997".

Option [Fc]:

    The characters placed in the array by each conversion specifier depend
    on a member of the structure pointed to by /timeptr/, as specified in
    brackets in the description. If this value is outside the normal range,
    the characters stored are unspecified.

If option [Fc] is taken, add the following to each specifier in
paragraph 3:

    %a [tm_wday]
    %A [tm_wday]
    %b [tm_mon]
    %B [tm_mon]
    %c [all specified in 7.16.1]
    %d [tm_mday]
    %H [tm_hour]
    %I [tm_hour]
    %j [tm_yday]
    %m [tm_mon]
    %M [tm_min]
    %p [tm_hour]
    %S [tm_sec]
    %U [tm_year, tm_wday, tm_yday]
    %w [tm_wday]
    %W [tm_year, tm_wday, tm_yday]
    %x [all specified in 7.16.1]
    %X [all specified in 7.16.1]
    %y [tm_year]
    %Y [tm_year]
    %Z [tm_isdst]

If part A is accepted, add:
    %g [tm_year, tm_wday, tm_yday]
    %G [tm_year, tm_wday, tm_yday]
    %u [tm_wday]
    %V [tm_year, tm_wday, tm_yday]

If part E is accepted, add:
    %o [tm_isdst]
    %O [tm_isdst]

If part H is accepted, then %o, %O, and %Z become:
    %o [tm_utcoffset, tm_isdst, tm_xisdst]
    %O [tm_utcoffset, tm_isdst, tm_xisdst]
    %Z [tm_utcoffset, tm_isdst, tm_xisdst]
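Options [Fa] and [Fb] both depend on deciding when the members are "out of the normal range" or "not consistent with one another". A hypothetical check along those lines is sketched below: it validates ranges (allowing tm_sec == 60 as in part C) and cross-checks tm_yday and tm_wday against the date. The weekday calculation uses Sakamoto's method and, like the rest of the sketch, assumes years from 1 CE onwards.

```c
#include <assert.h>
#include <time.h>

static int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in month MON (0-11) of YEAR. */
static int days_in_month(int year, int mon)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    return days[mon] + (mon == 1 && is_leap(year));
}

/* Day of the week (0 = Sunday) for MON 1-12, by Sakamoto's method. */
static int day_of_week(int year, int mon, int mday)
{
    static const int t[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (mon < 3)
        year--;
    return (year + year / 4 - year / 100 + year / 400 + t[mon - 1] + mday) % 7;
}

/* Hypothetical check: nonzero if every member of *TP is in its normal
   range and tm_yday and tm_wday agree with the calendar date. */
static int tm_consistent(const struct tm *tp)
{
    int year, yday = 0, m;

    assert(tp != NULL);
    year = tp->tm_year + 1900;
    if (tp->tm_sec < 0 || tp->tm_sec > 60 || tp->tm_min < 0 || tp->tm_min > 59
        || tp->tm_hour < 0 || tp->tm_hour > 23 || tp->tm_mon < 0 || tp->tm_mon > 11)
        return 0;
    if (tp->tm_mday < 1 || tp->tm_mday > days_in_month(year, tp->tm_mon))
        return 0;                          /* e.g. "30th Feb", "29th Feb 1997" */
    for (m = 0; m < tp->tm_mon; m++)
        yday += days_in_month(year, m);
    yday += tp->tm_mday - 1;
    return tp->tm_yday == yday             /* "Monday 10th May 1997" fails below */
        && tp->tm_wday == day_of_week(year, tp->tm_mon + 1, tp->tm_mday);
}
```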


Part G
------
[Was N735 item 25]

Those conversion specifiers in subclause 7.16.3.5 (strftime()) that
generate variable strings should have values specified for the C locale.

Add at the end of the subclause:

    In the C locale the replacement strings for the following specifiers
    are:

        %a  the first three characters of %A
        %A  one of "Sunday", "Monday", ..., "Saturday"
        %b  the first three characters of %B
        %B  one of "January", "February", ..., "December"
        %c  equivalent to "%A %B %d %T %Y"
        %p  one of "am" or "pm"
        %x  equivalent to "%A %B %d %Y"
        %X  equivalent to "%T"
        %Z  implementation-defined
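The table could be realised with fixed name arrays, as in the sketch below. The helpers are hypothetical; `proposed_c_locale_c()` builds the %c string in the format proposed above (taking %T to be "%H:%M:%S") rather than calling strftime(), so it shows the proposal, not any existing implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

static const char *const day_names[7] = {
    "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"
};
static const char *const month_names[12] = {
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December"
};

/* %a and %b in the proposed C locale: the first three characters of
   the full name. OUT must hold at least 4 characters. */
static void abbrev3(const char *full, char *out)
{
    memcpy(out, full, 3);
    out[3] = '\0';
}

/* Hypothetical helper: the proposed C-locale %c, "%A %B %d %T %Y",
   assuming %T means "%H:%M:%S". Returns what snprintf() returns. */
static int proposed_c_locale_c(const struct tm *tp, char *buf, size_t size)
{
    assert(tp->tm_wday >= 0 && tp->tm_wday <= 6);
    assert(tp->tm_mon >= 0 && tp->tm_mon <= 11);
    return snprintf(buf, size, "%s %s %02d %02d:%02d:%02d %d",
                    day_names[tp->tm_wday], month_names[tp->tm_mon],
                    tp->tm_mday, tp->tm_hour, tp->tm_min, tp->tm_sec,
                    tp->tm_year + 1900);
}
```

For Wednesday 8th January 1997 at 12:34:56 this gives "Wednesday January 08 12:34:56 1997".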


Part H
------
[Was N735 item 21]

The conversion carried out by localtime() does not provide any way of
determining the time zone used, and the normalization done by mktime()
does not specify how DST changes are handled. Similarly, many systems
are now aware of leap seconds, but the Standard is not clear on how
these are to be handled. Adding this information is not trivial, because
there is no obvious way to extend /struct tm/ in a compatible manner.
This proposal therefore contains a kludge.

[The following is not final wording, as I wanted to see agreement on the
semantics before trying to craft them. Given that time is short, I will
attempt to produce final wording if I have the opportunity.]

Add the following fields to struct tm:

    int tm_version;    /* version number of the structure layout */
    int tm_utcoffset;  /* offset from UTC in minutes - [-1439, +1439] */
    int tm_leapsecs;   /* leap seconds applied */
    int tm_xisdst;     /* daylight saving time flag - [-1, +1439] */

and add the following macros to <time.h>, all constant integral
expressions capable of being stored in an object of type int:

    _EXTENDED_TM
    _NO_LEAP_SECONDS
    _LOCALTIME

The gmtime() function shall set tm_utcoffset to 0, while the localtime()
function shall set it according to the local time zone, including any
DST corrections; a positive value for tm_utcoffset indicates ahead of
UTC, so that PDT is represented by -420. If the implementation is unable
to determine the local zone, localtime() shall set this field to
_LOCALTIME and gmtime() shall fail.

Both functions shall set tm_isdst to represent whether DST is (believed
to be) in effect at the represented time, and tm_xisdst to -1, 0, or the
(positive) size of the DST offset, in minutes, according to whether
tm_isdst is less than, equal to, or greater than zero.

Both functions shall set tm_leapsecs to indicate the number of leap
seconds that have been applied to the resulting value (if tm_sec == 60,
the relevant leap second is *not* included in the count). If the
implementation is not aware of leap seconds, it shall set tm_leapsecs to
_NO_LEAP_SECONDS.

Both functions shall set tm_version to 1.

The mktime() function shall behave as follows. If the tm_isdst field is
equal to _EXTENDED_TM, then the tm_version field shall be 1. The
broken-down time is normalized according to the following rules, and also
converted to a time_t representation.

If the call is successful, a second call to mktime() with the resulting
struct tm value shall always leave it unchanged and return the same
value as the first call.

If the call is successful and the normalized time is exactly representable
as a time_t value, then the normalized broken-down time, and the
broken-down time generated by converting the result of mktime() as if by
a call to localtime(), shall be identical except that, if the tm_isdst
member of the former originally had the value _EXTENDED_TM, it shall remain
unchanged.

A time is normalized according to the following rules. The principle
behind normalization is that the date is converted to a number of
seconds past some epoch, and then converted back to the correct
normalized form.

If the tm_isdst member does not equal _EXTENDED_TM, then the rules shall
be applied as if:
- tm_leapsecs is _NO_LEAP_SECONDS;
- tm_utcoffset is _LOCALTIME;
- tm_xisdst is -1, 0, or +60 according to whether tm_isdst is less than,
  equal to, or greater than zero.
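The "as if" rule above, together with the tm_xisdst derivation given earlier for localtime() and gmtime(), might look like the following. Everything here is hypothetical: `struct tm_ext` stands in for the extended struct tm, and the HYP_ macros stand in for _EXTENDED_TM, _NO_LEAP_SECONDS and _LOCALTIME, whose values the proposal leaves open (INT_MIN is only a placeholder).

```c
#include <assert.h>
#include <limits.h>

/* Placeholder values for the proposed _EXTENDED_TM, _NO_LEAP_SECONDS
   and _LOCALTIME; the proposal does not fix them. */
#define HYP_EXTENDED_TM      INT_MIN
#define HYP_NO_LEAP_SECONDS  INT_MIN
#define HYP_LOCALTIME        INT_MIN

/* Hypothetical stand-in for struct tm with the proposed extra members. */
struct tm_ext {
    int tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year;
    int tm_wday, tm_yday, tm_isdst;
    int tm_version, tm_utcoffset, tm_leapsecs, tm_xisdst;
};

/* tm_xisdst derived from tm_isdst: -1, 0, or the DST size in minutes. */
static int xisdst_from_isdst(int isdst, int dst_minutes)
{
    assert(dst_minutes > 0);
    if (isdst < 0)
        return -1;
    return isdst == 0 ? 0 : dst_minutes;
}

/* Copy *IN to *OUT, substituting the values mktime() is to assume when
   tm_isdst is not the extended-structure marker. */
static void effective_fields(const struct tm_ext *in, struct tm_ext *out)
{
    *out = *in;
    if (in->tm_isdst != HYP_EXTENDED_TM) {
        out->tm_leapsecs  = HYP_NO_LEAP_SECONDS;
        out->tm_utcoffset = HYP_LOCALTIME;
        out->tm_xisdst    = xisdst_from_isdst(in->tm_isdst, 60);
    }
}
```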

All dates are in the Gregorian calendar. Thus a value of -800 for
tm_year represents 1100 CE, while a value of -2000 represents year -100
(101 BCE); neither is a leap year, while -2300 (year -400, 401 BCE) is.
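The year mapping and leap-year rule can be checked with a short sketch using astronomical year numbering, in which year 0 is 1 BCE. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Proleptic Gregorian leap-year rule for astronomical year numbers
   (year 0 is 1 BCE). C's % truncates toward zero, but each test only
   compares a remainder with 0, so negative years are handled correctly. */
static int gregorian_leap(long year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical helper: describe the year a tm_year value represents. */
static void describe_tm_year(long tm_year, char *buf, size_t size)
{
    long year = tm_year + 1900;          /* astronomical year number */

    assert(buf != NULL && size > 0);
    if (year >= 1)
        snprintf(buf, size, "%ld CE", year);
    else
        snprintf(buf, size, "%ld BCE", 1 - year);
}
```

With these definitions, tm_year values -800, -2000 and -2300 describe 1100 CE, 101 BCE and 401 BCE, and only the last is a leap year.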

The value of tm_leapsecs is the number of leap seconds applied (the
value of UTC-UT0) at the represented time. It should therefore be added
to the value determined by (days*86400 + hours*3600 + mins*60 +
seconds). If the value is _NO_LEAP_SECONDS, then the implementation
should determine the correct number if it can, and use 0 otherwise.

The value of tm_utcoffset is a number of minutes to be subtracted from
the time to convert it to UTC. The value _LOCALTIME is a request for the
implementation to determine this; if it is unknown, it should assume
that local time is UTC plus any DST offset determined from tm_xisdst.

If tm_mon is outside the range [0, 11], it shall be converted to that
range by adding or subtracting a multiple of 12 and adjusting the year
accordingly. This shall then be used to determine the number of days in
the year prior to the month. Thus tm_year == 97 and tm_mon == -8
represents May of 1996, a leap year.

Apart from this, the final date can be determined simply by adding
together the various fields, each with a suitable weight, to get the
number of seconds past the epoch.
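Putting the pieces together, the weighted sum can be sketched as below. This assumes C99's long long, takes the day count from Howard Hinnant's days_from_civil algorithm (a swapped-in technique, not part of the proposal), and follows the rules above that tm_leapsecs is added and tm_utcoffset (in minutes) is subtracted. The function names are hypothetical, and the epoch is taken to be 1970-01-01T00:00:00Z purely for illustration.

```c
#include <assert.h>

/* Days from 1970-01-01 to day 1 of month M (1-12) of year Y in the
   proleptic Gregorian calendar (after Howard Hinnant's days_from_civil). */
static long long days_from_civil(long long y, int m)
{
    long long era, yoe, doy, doe;

    y -= m <= 2;                                     /* count years from March */
    era = (y >= 0 ? y : y - 399) / 400;
    yoe = y - era * 400;                             /* [0, 399] */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5;   /* day 1 of month, [0, 337] */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;     /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Hypothetical: seconds since 1970-01-01T00:00:00Z for a possibly
   unnormalized broken-down time. Months are folded into years first;
   everything else is a plain weighted sum. */
static long long seconds_since_epoch(int tm_year, int tm_mon, int tm_mday,
                                     int tm_hour, int tm_min, int tm_sec,
                                     int tm_leapsecs, int tm_utcoffset)
{
    long long year = (long long)tm_year + 1900;
    long long mon = tm_mon;
    long long carry = mon / 12 - (mon % 12 < 0);     /* floor(mon / 12) */
    long long days;

    year += carry;
    mon -= carry * 12;                               /* now in [0, 11] */
    assert(mon >= 0 && mon <= 11);
    days = days_from_civil(year, (int)mon + 1) + ((long long)tm_mday - 1);
    return days * 86400
         + tm_hour * 3600LL + tm_min * 60LL + tm_sec
         + tm_leapsecs
         - tm_utcoffset * 60LL;
}
```

Because tm_mday, tm_hour and the rest enter only through multiplication and addition, out-of-range values such as tm_mday == 32 simply roll over into the following month.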

The normalization should be exact provided that there is no unreasonable
overflow. I would consider reasonable limitations to be that each of the
following expressions is in the range [-(1<<30), +(1<<30)]:
    tm_year * 366
    tm_mon  * 31
    tm_mday
    tm_hour * 3600
    tm_min  * 60
    tm_sec
    tm_leapsecs
    tm_utcoffset * 60
    tm_xisdst * 60       [if nonnegative, else tm_xisdst must be -1]

This would ensure that separate "seconds in the day" and "days since
epoch" calculations won't overflow in 32 bits.

==== ENDS ====