N833 J11/98-032

printf/scanf modifiers for inttypes abstract types

Randy Meyers
May 29, 1998


INTRODUCTION

At the London meeting, Clive Feather proposed printf/scanf modifiers
for intmax_t and other such "abstract" integer types. The committee
initially rejected the proposal because of concerns about
release-to-release binary compatibility (an issue I raised). There is
a great likelihood that an implementation will change the size of its
largest supported integer type over time. Such changes tend to break
programs that use printf from a shared library unless the
implementation takes great care.

At the Colorado meeting, the committee reconsidered. Modifiers for
intmax_t and other types have great utility, and techniques for
preserving binary compatibility when changing the datatypes supported
by printf/scanf are well known to implementations. The utility to the
programmer outweighs the cost of implementation.

A straw poll at the Colorado meeting showed very strong support for
adding printf/scanf modifiers for intmax_t, size_t, and ptrdiff_t. I
volunteered to write proposed words.

Unfortunately, there are no clearly good letters left to use as
modifiers. So, this paper contains a list of candidates and lets the
committee resolve the issue via straw poll. Exact wording is proposed
with a "fill in the blank" for the modifier letters.

All references to the Standard in this paper are against N828, the
pre-Copenhagen draft.


BACKGROUND

The C90 Standard and the C9X draft (Subclause 7.26.8, Page 386,
Paragraph 1) say "Lowercase letters may be added to the conversion
specifiers in fprintf and fscanf. Other characters may be used in
extensions." Because of this, it would be wise to consider only
lowercase letters for the new printf/scanf modifiers.

Subclause 6.5, Page 58, Paragraph 7, allows an integer value to be
referenced by an lvalue of either the signed or the unsigned version
of its effective type.
Because of this, a strictly conforming program may printf a signed
number as unsigned or an unsigned number as signed. A suggestion that
a single modifier letter be used for both size_t (an unsigned type)
and ptrdiff_t (a signed type) is not wise.

The following table lists the lowercase and uppercase letters in use
by the Standard or by various implementations. If the line for a
letter does not list a meaning or an implementation using the letter,
the letter is free for use.

    a   hex float conversion spec (C9X)
        malloc flag (GNU)
    b   unsigned binary integer (SCO)
        unknown use (Cray)
    c   character conversion spec (std)
    d   decimal integer conversion spec (std)
    e   exponential float conversion spec (std)
    f   fraction float conversion spec (std)
    g   general float conversion spec (std)
    h   short modifier (std)
    i   integer conversion spec (std)
    j
    k
    l   long modifier (std)
    m   errno message conversion spec (GNU)
    n   count conversion spec (std)
        pointer (SCO scanf-only)
    o   octal integer conversion spec (std)
    p   pointer conversion spec (std)
        pointer to pointer (SCO scanf-only)
    q   quad (long long) modifier (BSD, GNU)
    r
    s   string conversion spec (std)
    t
    u   unsigned integer conversion spec (std)
    v
    w   wide modifier (SUN)
    x   hex integer conversion spec (std)
    y
    z

    A   hex float conversion spec (C9X)
    B   no-op flag (AIX)
        unknown use (Cray)
        unsigned binary integer (SCO)
    C   wide char conversion spec (lots-SUS)
    D   same as ld (BSD deprecated)
    E   exponential float conversion spec (std)
        same as le (BSD)
    F   fraction float conversion spec (std)
        same as lf (BSD deprecated)
    G   general float conversion spec (std)
        same as lg (BSD)
    H
    I
    J   no-op flag (AIX)
    K
    L   long double modifier (std)
        long long int (GNU)
    M
    N   no-op flag (AIX)
    O   same as lo (BSD deprecated)
    P
    Q
    R
    S   wide string conversion spec (lots-SUS)
    T
    U   same as lu (BSD deprecated)
    V
    W
    X   hex integer conversion spec (std)
        same as lx (BSD)
    Y
    Z   size_t modifier (GNU)

Note that GNU has established a precedent by using "Z" for size_t
values.
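
The signed/unsigned interchange described in the BACKGROUND text above
can be illustrated with the existing int conversions. This is only a
sketch: the helper names are hypothetical, snprintf is assumed to be
available (it is part of the C9X draft library), and the values passed
are assumed to be representable in both int and unsigned int.

```c
#include <stdio.h>

/* Hypothetical helper: formats a non-negative int with the unsigned
   conversion %u. The caller must pass a value representable in both
   int and unsigned int. */
int format_int_as_unsigned(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%u", value);
}

/* Hypothetical helper: formats an unsigned int with the signed
   conversion %d. The caller must pass a value no larger than INT_MAX. */
int format_unsigned_as_int(char *buf, size_t size, unsigned int value)
{
    return snprintf(buf, size, "%d", value);
}
```

Because the conversion specifier (d versus u) already selects the
signedness, a length modifier only has to name the underlying type.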

DISCUSSION

Since there are few good letters left, the committee could take one of
the following approaches:

1) Only add a modifier for intmax_t. For example, "z" (the last
   letter) for intmax_t (the ultimate integer type). All output could
   be done by casting other integer types to intmax_t or uintmax_t and
   using the printf/scanf modifier "z".

   I believe that a modifier for intmax_t is the single most important
   part of this proposal, and I hope that the committee adds this at a
   minimum. However, I believe that modifiers for size_t and ptrdiff_t
   are also useful.

2) The committee could pick three lowercase letters to be the
   printf/scanf modifiers for size_t, ptrdiff_t, and intmax_t. For
   example,

        z   for size_t (GNU uses uppercase "Z")
        t   for ptrdiff_t (random assignment)
        j   for intmax_t (j is sort of like "i" for integer)

3) The committee could pick a single letter to be an "escape"
   character for two-character modifiers. For example, "z" could be
   the first character of a two-character modifier sequence (not
   unlike "ll"). This opens up the full set of letters to be used
   after "z" for future uses, and allows mnemonic letters for the
   three integer types discussed in this paper:

        zs  for size_t
        zp  for ptrdiff_t
        zm  for intmax_t

I recommend the following straw votes:

1. Does the committee want to add some additional modifiers for
   integer types?

2. Does the committee want modifiers for only intmax_t, or for size_t
   and ptrdiff_t as well?

3. Does the committee want to add an "escape" character to be used
   for further modifiers (like zs, zp, zm)?

4. Does the committee want to just pick three letters (and if so,
   which three)?


PROPOSED WORDING

Add the following three items to the list of fprintf length modifiers
(Subclause 7.19.6.1, Page 252, Paragraph 7). (The underlined blanks
are to be filled in with the one- or two-letter modifiers chosen by
the committee.)

    __  Specifies that a following d, i, o, u, x, or X conversion
        specifier applies to an intmax_t or uintmax_t argument.

    __  Specifies that a following d, i, o, u, x, or X conversion
        specifier applies to a size_t (or the corresponding signed
        integer type) argument.

    __  Specifies that a following d, i, o, u, x, or X conversion
        specifier applies to a ptrdiff_t (or the corresponding
        unsigned integer type) argument.

Also add the above three items to the list of fwprintf length
modifiers (Subclause 7.24.2.1, Paragraph 7).

Note that I did not bother adding the ability to use %n to store the
number of characters written to the output stream into an intmax_t,
size_t, or ptrdiff_t.

Add the following three items to the list of fscanf length modifiers
(Subclause 7.19.6.2, Page 260, Paragraph 11). (The underlined blanks
are to be filled in with the one- or two-letter modifiers chosen by
the committee.)

    __  Specifies that a following d, i, o, u, x, X, or n conversion
        specifier applies to an argument with type pointer to intmax_t
        or uintmax_t.

    __  Specifies that a following d, i, o, u, x, X, or n conversion
        specifier applies to an argument with type pointer to size_t
        or pointer to the signed integer type corresponding to size_t.

    __  Specifies that a following d, i, o, u, x, X, or n conversion
        specifier applies to an argument with type pointer to
        ptrdiff_t or pointer to the unsigned integer type
        corresponding to ptrdiff_t.

Also add the above three items to the list of fwscanf length modifiers
(Subclause 7.24.2.2, Paragraph 11).
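
To make the proposed wording concrete, here is a minimal sketch of how
the modifiers might be used. It assumes the blanks are filled in with
the letters from approach 2 (z for size_t, t for ptrdiff_t, j for
intmax_t); the wording above leaves that choice open. Those letters
are accepted by C99-conforming libraries, so the sketch compiles as
written. The function names are hypothetical, and the type names are
assumed to come from <stddef.h> and <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Approaches 2 and 3 in spirit: one modifier per abstract type
   (assumed letters: z, t, j). */
int format_direct(char *buf, size_t size,
                  size_t n, ptrdiff_t d, intmax_t m)
{
    return snprintf(buf, size, "%zu %td %jd", n, d, m);
}

/* Approach 1: only the intmax_t modifier exists, so every other
   integer type is cast to intmax_t or uintmax_t before printing. */
int format_via_intmax(char *buf, size_t size,
                      size_t n, ptrdiff_t d, intmax_t m)
{
    return snprintf(buf, size, "%ju %jd %jd",
                    (uintmax_t)n, (intmax_t)d, m);
}

/* fscanf-family side: the same modifier, applied to an argument of
   type pointer to size_t. Returns the number of items converted. */
int parse_size(const char *text, size_t *out)
{
    return sscanf(text, "%zu", out);
}
```

Both formatting helpers produce the same text for the same values; the
casts in the second are the price of having only one modifier. On
input there is no such workaround short of reading into a temporary
intmax_t or uintmax_t object and assigning.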