ISO/IEC JTC1/SC22/WG14 N833

                                                       N833 J11/98-032

	  printf/scanf modifiers for inttypes abstract types
			     Randy Meyers
			     May 29, 1998


INTRODUCTION

At the London meeting, Clive Feather proposed printf/scanf modifiers
for intmax_t and other such "abstract" integer types.  The committee
rejected the proposal initially because of concerns about
release-to-release binary compatibility (an issue I raised).  There is great
likelihood that an implementation will change the size of its largest
supported integer type over time.  Such changes tend to break programs
that use printf from a shared library unless great care is taken by
the implementation.

At the Colorado meeting, the committee reconsidered.  Modifiers for
intmax_t and other types have great utility, and techniques for
preserving binary compatibility when changing datatypes
supported by printf/scanf are well known to implementations.  The
utility to the programmer outweighs the cost of implementation.

A straw poll at the Colorado meeting showed very strong support for
adding printf/scanf modifiers for intmax_t, size_t, and ptrdiff_t.  I
volunteered to write proposed words.

Unfortunately, there are no clearly good letters left to use as
modifiers.  So, this paper contains a list of candidates and lets the
committee resolve the issue via straw poll.  Exact wording is proposed
with a "fill in the blank" for the modifier letters.

All references to the Standard in this paper are against N828, the
pre-Copenhagen draft.

BACKGROUND

The C90 Standard and the C9x draft (Subclause 7.26.8, Page 386,
Paragraph 1) say "Lowercase letters may be added to the conversion
specifiers in fprintf and fscanf.  Other characters may be used in
extensions."  Because of this, it would be wise to consider only
lowercase letters for the new printf/scanf modifiers.


Subclause 6.5, Page 58, Paragraph 7, allows an integer value to be
referenced by an lvalue of either the signed or the unsigned
version of its effective type.  Because of this, a strictly conforming
program may printf a signed number as unsigned or an unsigned number
as signed.  A suggestion that a single modifier letter be used for
size_t (an unsigned type) and ptrdiff_t (a signed type) is not wise.
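
As a small illustration of the point above, the following sketch prints
a non-negative int with the unsigned conversion specifier "u".  It
assumes the reading of Subclause 6.5 given above; the function name
format_signed_as_unsigned is hypothetical, and snprintf is used only so
that the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print a signed value through an unsigned
   conversion specifier.  The value must be representable in both
   int and unsigned int, so negative values are rejected. */
int format_signed_as_unsigned(char *buf, size_t buflen, int value)
{
    assert(buf != NULL && buflen > 0);
    assert(value >= 0);
    return snprintf(buf, buflen, "%u", value);
}
```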

The following table lists the lowercase and uppercase letters in use by
the Standard or various implementations.  If the line for a letter
does not list a meaning or an implementation using the letter, the
letter is free for use.

a  hex float conversion spec (C9X)
   malloc flag (GNU)
b  unsigned binary integer (SCO)
   unknown use (Cray)
c  character conversion spec (std)
d  decimal integer conversion spec (std)
e  exponential float conversion spec (std)
f  fraction float conversion spec (std)
g  general float conversion spec (std)
h  short modifier (std)
i  integer conversion spec (std)
j
k
l  long modifier (std)
m  errno message conversion spec (GNU)
n  count conversion spec (std)
   pointer (SCO scanf-only)
o  octal integer conversion spec (std)
p  pointer conversion spec (std)
   pointer to pointer (SCO scanf-only)
q  quad (long long) modifier (BSD, GNU)
r
s  string conversion spec (std)
t
u  unsigned integer conversion spec (std)
v
w  wide modifier (SUN)
x  hex integer conversion spec (std)
y
z

A  hex float conversion spec (C9X)
B  no-op flag (AIX)
   unknown use (Cray)
   unsigned binary integer (SCO)
C  wide char conversion spec (lots-SUS)
D  same as ld (BSD deprecated)
E  exponential float conversion spec (std)
   same as le (BSD)
F  fraction float conversion spec (std)
   same as lf (BSD deprecated)
G  general float conversion spec (std)
   same as lg (BSD)
H
I
J  no-op flag (AIX)
K
L  long double modifier (std)
   long long int (GNU)
M
N  no-op flag (AIX)
O  same as lo (BSD deprecated)
P
Q
R
S  wide string conversion spec (lots-SUS)
T
U  same as lu (BSD deprecated)
V
W
X  hex integer conversion spec (std)
   same as lx (BSD)
Y
Z  size_t modifier (GNU)


Note that GNU has established a precedent using "Z" for size_t values.

DISCUSSION

Since there are few good letters left, the committee could take one of
the following approaches:

1) Only add a modifier for intmax_t.  For example, "z" (the last
letter) for intmax_t (the ultimate integer type).  All output
could be done by casting other integer types to intmax_t or uintmax_t,
and using the printf/scanf modifier "z".

I believe that a modifier for intmax_t is the single most important
part of this proposal, and I hope that the committee adds this at a
minimum.  However, I believe that modifiers for size_t and ptrdiff_t
are also useful.

2) The committee could pick three lower case letters to be the
printf/scanf modifiers for size_t, ptrdiff_t, and intmax_t.  For
example,
	z for size_t (GNU uses uppercase "Z")
	t for ptrdiff_t (random assignment)
	j for intmax_t (j is sort of like "i" for integer)

3) The committee could pick a single letter to be an "escape"
character for two character modifiers.  For example, "z" could be the
first character of a two character modifier sequence (not unlike
"ll").  This opens up the full set of letters to be used after "z" for
future uses, and allows mnemonic letters for the three integer types
discussed in this paper:
	zs for size_t
	zp for ptrdiff_t
	zm for intmax_t
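
The three approaches above can be sketched in C as follows.  This is
an illustration under stated assumptions, not proposed wording.  For
approaches 1 and 2 the sketch uses "j", "z", and "t", the example
letters from approach 2 (they are also the letters C99 later adopted),
so that it compiles with a current compiler.  For approach 3 no library
supports "zs", "zp", or "zm", so the sketch only shows how a formatter
might recognize such a two character modifier.  All function and
enumeration names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Approach 1: only an intmax_t modifier exists, so other integer
   types are cast.  size_t is unsigned and is widened to uintmax_t;
   ptrdiff_t is signed and is widened to intmax_t. */
int format_with_casts(char *buf, size_t buflen, size_t n, ptrdiff_t d)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%ju %jd", (uintmax_t)n, (intmax_t)d);
}

/* Approach 2: one modifier letter per type, so no casts are needed. */
int format_direct(char *buf, size_t buflen, size_t n, ptrdiff_t d)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%zu %td", n, d);
}

/* Approach 3: classify a modifier under the "z" escape scheme. */
enum length_mod { LEN_NONE, LEN_SIZE, LEN_PTRDIFF, LEN_INTMAX, LEN_INVALID };

/* fmt points just past the flags, width, and precision.  On return,
   *consumed holds the number of characters the modifier occupies. */
enum length_mod parse_escape_modifier(const char *fmt, size_t *consumed)
{
    assert(fmt != NULL && consumed != NULL);
    *consumed = 0;
    if (fmt[0] != 'z')
        return LEN_NONE;        /* no escape character present */
    *consumed = 2;
    switch (fmt[1]) {
    case 's': return LEN_SIZE;
    case 'p': return LEN_PTRDIFF;
    case 'm': return LEN_INTMAX;
    default:
        *consumed = 0;
        return LEN_INVALID;     /* "z" followed by an unassigned letter */
    }
}
```

Approaches 1 and 2 produce the same text for the same values; the
difference is only whether the caller must write the casts.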


I recommend the following straw votes:

1.  Does the committee want to add some additional modifiers for
integer types?

2.  Does the committee want modifiers for only intmax_t, or for size_t
and ptrdiff_t as well?

3.  Does the committee want to add an "escape" character to be used
for further modifiers (like zs, zp, zm)?

4.  Does the committee want to just pick three letters (which three)?


PROPOSED WORDING

Add the following three items to the list of fprintf length modifiers
(Subclause 7.19.6.1, Page 252, Paragraph 7).  (The underlined blanks
are to be filled in with the one- or two-letter modifiers chosen by the
committee.)

__	Specifies that a following d, i, o, u, x, or X conversion
	specifier applies to an intmax_t or uintmax_t argument.

__	Specifies that a following d, i, o, u, x, or X conversion
	specifier applies to a size_t (or the corresponding signed
	integer type) argument.

__	Specifies that a following d, i, o, u, x, or X conversion
	specifier applies to a ptrdiff_t (or the corresponding unsigned
	integer type) argument.

Also add the above three items to the list of fwprintf length
modifiers (Subclause 7.24.2.1, Paragraph 7).
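
Because the proposed items cover all of d, i, o, u, x, and X, the new
modifiers would combine with the hexadecimal specifiers as well as the
decimal ones.  A minimal sketch follows, assuming "z" is chosen for
size_t and "j" for intmax_t/uintmax_t (the blanks above are still
open; these happen to be the letters C99 later adopted).  The function
name format_hex is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print a size_t in lowercase hex and a uintmax_t in uppercase hex.
   Both arguments already have the unsigned type the specifier
   expects, so no casts are required. */
int format_hex(char *buf, size_t buflen, size_t n, uintmax_t m)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%zx %jX", n, m);
}
```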

Note that I did not bother adding the ability to use %n to store
the number of characters written to the output stream into an
intmax_t, size_t, or ptrdiff_t.


Add the following three items to the list of fscanf length modifiers
(Subclause 7.19.6.2, Page 260, Paragraph 11).  (The underlined blanks
are to be filled in with the one- or two-letter modifiers chosen by the
committee.)

__	Specifies that a following d, i, o, u, x, X, or n conversion
	specifier applies to an argument with type pointer to intmax_t
	or uintmax_t.

__	Specifies that a following d, i, o, u, x, X, or n conversion
	specifier applies to an argument with type pointer to size_t
	or pointer to the signed integer type corresponding to size_t.

__	Specifies that a following d, i, o, u, x, X, or n conversion
	specifier applies to an argument with type pointer to ptrdiff_t
	or pointer to the unsigned integer type corresponding to
	ptrdiff_t.

Also add the above three items to the list of fwscanf length modifiers
(Subclause 7.24.2.2, Paragraph 11).
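
On the input side, the proposed items would be used roughly as in the
sketch below.  It again assumes the letters "z", "t", and "j" for the
three blanks, and it uses sscanf in place of fscanf so that the example
needs no input file.  The function name parse_three is hypothetical.
The final "%jn" shows the n conversion specifier storing its count
through a pointer to intmax_t, which the fscanf wording above permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read a size_t, a ptrdiff_t, and an intmax_t from text.  Returns the
   number of items converted (3 on success).  *used receives the number
   of characters consumed; an n directive does not add to the return
   value. */
int parse_three(const char *text, size_t *n, ptrdiff_t *d,
                intmax_t *m, intmax_t *used)
{
    assert(text != NULL && n != NULL && d != NULL);
    assert(m != NULL && used != NULL);
    return sscanf(text, "%zu %td %jd%jn", n, d, m, used);
}
```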