Document: WG14 N952

BDTI's Comments on N948
(Extension for the programming language C to support embedded processors)

John R. Hauser
Berkeley Design Technology, Inc.
2001 August 16

Since Copenhagen, we've adjusted our prototype fixed-point implementation at BDTI to reflect the changes that came out of that meeting. In working with the new system, we found a major flaw in the arithmetic rules, for which we will be proposing a fix. Also, in response to concerns voiced by David Keaton, we've tried to find a set of "least common denominator" requirements for the fixed-point types that we can accept. At the same time, I've been collecting details for most of the "multimedia extensions" to mainstream processors (such as MMX for Intel Pentiums), so we can be sure they aren't shortchanged by the language.

Most of the changes we suggest to the draft are detailed in text attached to the bottom of this document. The most significant points are:

- Relax the requirements on fixed-point type formats.

  The exact rules we are proposing are spelled out below, and I won't try to repeat them here. Of note, our new rules would no longer require that corresponding "fract" and "accum" types have the same number of fractional bits. Although the result is less concise than before, several rules are needed to ensure that the arithmetic makes sense (e.g., "long fract" cannot have fewer fractional bits than "fract").

  Included in the text below are our recommendations for specific fixed-point formats on various systems. These recommendations are based on a careful evaluation of the capabilities of each processor, and I'm prepared to argue that the formats are realistic for each system. You may note that, except in one instance, the formats we propose satisfy the stricter type rules of the existing draft. We expect the current draft's rules to be kept as "recommended practice", as they should be.

- Suppress the "usual arithmetic conversions" for fixed-point types.
  BDTI's original proposal did not permit an arithmetic operation with, for example, "accum" for one operand and "long fract" for the other. This combination was disallowed on the grounds that "accum" has more integral bits than "long fract", but "long fract" may have more fractional bits than "accum", so neither type is guaranteed to represent every value of the other. When the committee in Copenhagen insisted that this be allowed, some "usual arithmetic conversion" rules were invented that converted the "long fract" operand to "accum" before the operation occurred, similar to the way the "usual arithmetic conversions" work for other C types.

  Unfortunately, these automatic conversions were a mistake, for much the same reason that we agreed not to support "usual arithmetic conversions" from integers to fixed-point. As we discussed in Copenhagen, for integer operands, such conversions would make it impossible to perform a meaningful multiplication or division between an integer and a "fract" fixed-point value, because automatic conversion of the integer operand to the "fract" type would be guaranteed to overflow unless the integer happened to be 0 or -1. To avoid this gratuitous overflow, the current draft defines an arithmetic operation between an integer and a fixed-point value as occurring directly on the two operands as they are, without first converting the integer operand to the type of the fixed-point operand.

  Now consider what happens when an "accum" is multiplied by a "long fract", both fixed-point types. Let's assume for argument's sake that "accum" has format s16.15 (sign bit + 16 integral bits + 15 fractional bits) and "long fract" is s.31 (sign bit + 31 fractional bits). There's nothing strange about wanting to multiply a 32-bit "accum" value by a 32-bit fraction between -1 and 1 to produce an "accum" result; furthermore, any hardware capable of a "long fract" multiplication (32 x 32 -> 32-bit fraction) can execute this "accum" x "long fract" multiplication perfectly well.
  However, our "usual arithmetic conversions" require that the "long fract" operand first be converted to "accum" before the multiplication occurs, which results in a drastic and gratuitous loss of 16 bits of significance in the "long fract" operand (31 fractional bits are cut to 15), making the result far less accurate than it deserves to be.

  The current workaround is to cast the operands to an encompassing larger type such as "long accum" and then cast (or assign) the "long accum" result back to "accum". Problems with the workaround include:

  * Given the new, relaxed rules proposed below for fixed-point formats, there might not be an encompassing type. The "long accum" type is not required to have as many fractional bits as "long fract" in our new rules.

  * The compiler is responsible for reducing this expression to a simple 32-bit fractional multiplication on the hardware. With rounding effects, the full expression might not be identical to a single multiplication, and the compiler might decline to perform the optimization.

  * The same solution using casts was offered with BDTI's original proposal, which disallowed multiplying "accum" and "long fract" directly. This casting was rejected by the committee as awkward and nonintuitive. (Note, by the way, that with the current system a drastic loss of precision occurs quietly if the casts are omitted, whereas questionable operations would have been flagged by the compiler under the original proposal.)

  We see two possible fixes:

  * Restore the original prohibition against combining incompatible types such as "accum" and "long fract"; or,

  * Eliminate the automatic conversions and define the fixed-point arithmetic operations as occurring directly on the two operands, as already happens for operations between integers and fixed-point types.

  In the text below, we've assumed the second solution, since the committee has already voted against the first one.

- Relax the accuracy requirements slightly for multiplication and division.
  The current Embedded C draft (following BDTI's proposal) requires in most cases that a fixed-point operation occur with a rounding error of less than 1 ulp. (An "ulp" is a "unit in the last place", equal to 2^-F where F is the number of fractional bits in the destination format.) We're now convinced this is too strict for multiplications on some implementations, and so we're suggesting instead a maximum rounding error of less than 2 ulps for multiplications and divisions.

  One important beneficiary would be mainstream 32-bit processors, on which "long fract" might reasonably be implemented as 32 bits (format s.31). A multiplication of two 32-bit "long fract"s to a "long fract" result would typically be compiled as a 32 x 32 -> 64-bit integer multiplication followed by a shift right by 31 bits, keeping only the bottom 32 bits at the end. On many of these processors, the 64-bit product would be obtained in two 32-bit registers (say, R0 holding the high half and R1 the low half), and then the 31-bit shift across the register pair would take three instructions:

      shift R0 left 1 bit
      shift R1 right 31 bits (an unsigned shift)
      OR R1 into R0

  which leaves the 32-bit "long fract" result in R0. But note that the most significant 31 bits of the result are already available in R0 after the first shift; the other two instructions serve only to move the last, least significant bit into position. If the product is permitted to be up to 2 ulps in error, an implementation could instead leave the least significant bit zero and dispense with the last two instructions. Although we'd prefer the tighter 1-ulp bound in principle, savings such as this will be significant enough on many processors to justify the greater leniency.

  As a side effect of this change, the special case about multiplication results of 1 and -1 could be dropped from the Embedded C document.
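To make the precision argument of the second point ("Suppress the usual arithmetic conversions") concrete, here is a minimal sketch that emulates the two formats assumed there, "accum" as s16.15 and "long fract" as s.31, with plain 32-bit integers whose value is raw x 2^-F. The names `shift_right_floor`, `mul_direct`, and `mul_converted` are hypothetical, not part of any proposal; both multiplications truncate toward minus infinity and ignore overflow. `mul_direct` models operating on the operands as they are, while `mul_converted` models the draft's conversion of the "long fract" operand to "accum" first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulation of the formats assumed in the text:
   "accum" = s16.15 and "long fract" = s.31, each stored as an int32_t
   whose value is raw * 2^-F. */
enum { ACCUM_FBITS = 15, LFRACT_FBITS = 31 };

/* floor(v / 2^n), written so that it does not depend on the
   implementation-defined right shift of negative signed values. */
static int64_t shift_right_floor(int64_t v, int n)
{
    assert(n >= 0 && n < 63);
    return v >= 0 ? v >> n : -((-v - 1) >> n) - 1;
}

/* Operate on the operands as they are: one 32 x 32 -> 64-bit multiply,
   whose product has 15 + 31 = 46 fractional bits, then drop 31 of them
   to land on the s16.15 "accum" result (overflow is ignored). */
static int32_t mul_direct(int32_t accum, int32_t lfract)
{
    int64_t product = (int64_t)accum * lfract;
    return (int32_t)shift_right_floor(product, LFRACT_FBITS);
}

/* Draft rule: convert the "long fract" operand to "accum" first, which
   discards its 16 low-order fractional bits, then multiply two accums. */
static int32_t mul_converted(int32_t accum, int32_t lfract)
{
    int64_t lfract_as_accum =
        shift_right_floor(lfract, LFRACT_FBITS - ACCUM_FBITS);
    int64_t product = (int64_t)accum * lfract_as_accum;
    return (int32_t)shift_right_floor(product, ACCUM_FBITS);
}
```

With "accum" = 1000.0 (raw value 32768000) and "long fract" approximately 1/3 (raw value 715827882), `mul_direct` returns the raw value 10922666 (about 333.33331), whereas `mul_converted` returns 10922000 (about 333.31299), an error of roughly 667 ulps caused entirely by the 16 discarded bits.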
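The register-pair sequence described under the third point can be modelled in C as follows. This is only a sketch: it assumes "long fract" is s.31 held in an int32_t, that converting a uint32_t above INT32_MAX to int32_t wraps (two's complement), and it uses the hypothetical names `lfract_mul_exact` and `lfract_mul_fast`. The first function performs all three instructions; the second stops after the first shift, so the two results differ only in the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes "long fract" = s.31 stored in an int32_t. The 64-bit product of
   two s.31 values carries 62 fractional bits, so the s.31 result is bits
   62..31 of the product. The overflow case (-1) x (-1) is excluded. */

/* Full three-instruction sequence: R0 holds the high half of the product,
   R1 the low half. Taking bits 62..31 truncates, so the error is < 1 ulp. */
static int32_t lfract_mul_exact(int32_t a, int32_t b)
{
    assert(!(a == INT32_MIN && b == INT32_MIN));
    uint64_t p = (uint64_t)((int64_t)a * b);
    uint32_t r0 = (uint32_t)(p >> 32);
    uint32_t r1 = (uint32_t)p;
    r0 <<= 1;       /* shift R0 left 1 bit */
    r1 >>= 31;      /* shift R1 right 31 bits (unsigned shift) */
    r0 |= r1;       /* OR R1 into R0 */
    return (int32_t)r0;
}

/* Shortcut permitted by a 2-ulp bound: keep only the first instruction and
   leave the least significant bit of the result zero. */
static int32_t lfract_mul_fast(int32_t a, int32_t b)
{
    assert(!(a == INT32_MIN && b == INT32_MIN));
    uint64_t p = (uint64_t)((int64_t)a * b);
    uint32_t r0 = (uint32_t)(p >> 32);
    return (int32_t)(r0 << 1);
}
```

For instance, multiplying raw values 3 and 0x40000000 (that is, 3 x 2^-31 by 0.5) has a true result of 1.5 ulps; the full sequence returns 1 ulp and the shortcut returns 0, an error of 1.5 ulps that is still within the proposed bound of less than 2 ulps.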
Other comments we have:

- The statement

      A conforming implementation shall support at least two different signed "fract" fixed point datatypes, and one signed accum fixed point datatype.

  in Section 2.1.1 is inappropriate. An implementation should be required to support all six signed fixed-point types, and probably all of the unsigned fixed-point types, too. Just as for the integers, there's no requirement that the types all have different formats.

- The entire discourse on "containers" should be dropped. Most of the important notions are already covered under the topic of "representations of types" in Section 6.2.6.1 of the C Standard. Issues specific to the new fixed-point types should follow the model of Section 6.2.6.2 for integer types. In particular, the encoding of a fixed-point type should be divided into "padding bits", "fractional bits", "integral bits", and "sign bit", analogous to the integer types in 6.2.6.2.

  In the same vein, Sections 2.1.4.1.2 and 2.1.4.1.4 of the draft are (I think) redundant with the existing Standard and should be deleted.

- In the "usual arithmetic conversions" in Section 2.1.3, the following statement is out of place and should be deleted:

      If the type of either of the operands has the sat qualifier, the resulting type shall have the sat qualifier; if the type of either of the operands has the modwrap qualifier, the resulting type shall have the modwrap qualifier.

  The "usual arithmetic conversions" are not concerned with the result type of an operation.

- Regarding the renaming of "abs" to "fpabs": the abbreviation "f.p." is often taken to mean "floating-point", so that may not be a good choice. In any event, one or both of "roundfx" and "fpabs" should be renamed so that the two names are consistent.

- Since the committee at Copenhagen rejected BDTI's proposal to have the "bits" construct be assignable, we are now proposing a "setbits" operator to serve this purpose.
  The rejected form

      bits(x) = expr

  would instead become

      setbits(x, expr)

  While the Embedded C draft calls "bits" a type-generic function, "setbits" will have to be either a keyword or a macro in order to work properly, because an ordinary C function cannot assign to an object passed to it as an argument.

- The "additional information and rationale" for fixed-point in Section A.1 needs work, especially as some of it contradicts the more normative part of the document.

We also agree that there are other issues (such as "printf" format specifiers, or the possibility of complex fixed-point types) that need to be considered but that haven't been addressed here. For now, our interest is concentrated on getting consensus for the core fixed-point types and operations before we widen our scope to include other topics.

============================================================================

The following constitutes more specific normative changes we propose for the draft document.

----------------------------------------------------------------------------
2.1.1 The datatypes

To fix a number of problems with the draft and to provide a more flexible set of type rules, replace Section 2.1.1 with the following (or something similar):

/-------------------------------------------------------------------------\

Twelve new fixed-point types are defined:

    unsigned short fract     unsigned short accum
    unsigned fract           unsigned accum
    unsigned long fract      unsigned long accum
    signed short fract       signed short accum
    signed fract             signed accum
    signed long fract        signed long accum

The names

    short fract     short accum
    fract           accum
    long fract      long accum

without either "unsigned" or "signed" are aliases for the corresponding signed fixed-point types.

The fixed-point types are assigned a fixed-point rank. The following types are listed in order of increasing rank:

    short fract, fract, long fract, short accum, accum, long accum

Each unsigned fixed-point type has the same size (in bytes) and the same rank as its corresponding signed fixed-point type.
The bits of an unsigned fixed-point type are divided into padding bits, fractional bits, and integral bits. The bits of a signed fixed-point type are divided into padding bits, fractional bits, integral bits, and a sign bit. The "fract" types have no integral bits; consequently, unsigned "fract" types encode values in the range of 0 to 1, and signed "fract" types encode values in the range of -1 to 1.

The minimal formats for each type are:

    signed short fract     s.7      signed short accum     s4.7
    signed fract           s.15     signed accum           s4.15
    signed long fract      s.23     signed long accum      s4.23
    unsigned short fract   .7       unsigned short accum   4.7
    unsigned fract         .15      unsigned accum         4.15
    unsigned long fract    .23      unsigned long accum    4.23

(For the unsigned formats, the notation "x.y" means x integral bits and y fractional bits, for a total of x + y bits. The added "s" in the signed formats denotes the sign bit.)

An implementation may give any of the fixed-point types more fractional bits, and may also give any of the "accum" types more integral bits, subject to the following restrictions:

- Each unsigned "fract" type has either the same number of fractional bits or one more fractional bit than its corresponding signed "fract" type.

- The number of fractional bits is nondecreasing for each of the following sets of fixed-point types when arranged in order of increasing rank:

    * signed "fract" types
    * unsigned "fract" types
    * signed "accum" types
    * unsigned "accum" types

- The number of integral bits is nondecreasing for each of the following sets of fixed-point types when arranged in order of increasing rank:

    * signed "accum" types
    * unsigned "accum" types

- Each signed "accum" type has at least as many integral bits as its corresponding unsigned "accum" type.

Furthermore, the following are recommended practice where practical:

- The "signed long fract" type has at least 31 fractional bits.
- Each "accum" type has at least 8 integral bits.
- Each unsigned "accum" type has the same number of fractional bits as its corresponding unsigned "fract" type.
- Each signed "accum" type has the same number of fractional bits as either its corresponding signed "fract" type or its corresponding unsigned "fract" type.

\-------------------------------------------------------------------------/

By way of example, these tables show the fixed-point formats we would suggest for various classes of processors:

                              signed fract---------  signed accum---------
                              short  middle  long    short  middle  long
typical desktop processor     s.7    s.15    s.31    s8.7   s16.15  s32.31
typical 16-bit DSP            s.15   s.15    s.31    s8.15  s8.15   s8.31
typical 24-bit DSP            s.23   s.23    s.47    s8.23  s8.23   s8.47
Intel MMX                     s.7    s.15    s.31    s8.7   s16.15  s32.31
PowerPC AltiVec               s.7    s.15    s.31    s8.7   s16.15  s32.31
Sun VIS                       s.7    s.15    s.31    s8.7   s16.15  s32.31
MIPS MDMX                     s.7    s.15    s.31    s8.7   s8.15   s17.30
Lexra Radiax                  s.7    s.15    s.31    s8.7   s8.15   s8.31
ARM Piccolo                   s.7    s.15    s.31    s8.7   s16.15  s16.31

                              unsigned fract-------  unsigned accum-------
                              short  middle  long    short  middle  long
typical desktop processor     .8     .16     .32     8.8    16.16   32.32
typical 16-bit DSP            .16    .16     .32     8.16   8.16    8.32
typical 24-bit DSP            .24    .24     .48     8.24   8.24    8.48
Intel MMX                     .8     .16     .32     8.8    16.16   32.32
PowerPC AltiVec               .8     .16     .32     8.8    16.16   32.32
Sun VIS                       .8     .16     .32     8.8    16.16   32.32
MIPS MDMX                     .8     .16     .32     8.8    8.16    16.32
Lexra Radiax                  .8     .16     .32     8.8    8.16    8.32
ARM Piccolo                   .8     .16     .32     8.8    16.16   16.32

(The "typical" DSPs referred to in the tables cannot address units in memory smaller than 16 or 24 bits, which is why these processors aren't expected to support a "short fract" smaller than "fract".)
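The type rules proposed for Section 2.1.1 are mechanical enough to check automatically. The sketch below (hypothetical names `fx_format`, `fx_system`, `fx_conforms`, and `fx_recommended`; each format given as integral and fractional bit counts, with the sign bit of signed types left implicit) encodes the mandatory restrictions, including the minimal formats, separately from the recommended practice, and transcribes two rows of the example tables.

```c
#include <assert.h>
#include <stdbool.h>

/* A format as (integral bits, fractional bits); the sign bit of a
   signed type is implicit, so s16.15 is {16, 15} and .8 is {0, 8}. */
typedef struct { int ibits, fbits; } fx_format;

enum { FX_SHORT, FX_MIDDLE, FX_LONG, FX_NSIZES };

/* Hypothetical description of one system's twelve formats. */
typedef struct {
    fx_format sfract[FX_NSIZES], saccum[FX_NSIZES];
    fx_format ufract[FX_NSIZES], uaccum[FX_NSIZES];
} fx_system;

/* Minimal formats: 7/15/23 fractional bits, 4 integral bits for accums. */
static const int min_fbits[FX_NSIZES] = { 7, 15, 23 };
enum { MIN_ACCUM_IBITS = 4 };

/* True if the chosen bit count never decreases with rank. */
static bool nondecreasing(const fx_format f[], bool integral)
{
    for (int i = 1; i < FX_NSIZES; i++) {
        int prev = integral ? f[i - 1].ibits : f[i - 1].fbits;
        int cur = integral ? f[i].ibits : f[i].fbits;
        if (cur < prev)
            return false;
    }
    return true;
}

/* Mandatory rules of the proposed 2.1.1. */
static bool fx_conforms(const fx_system *s)
{
    for (int i = 0; i < FX_NSIZES; i++) {
        if (s->sfract[i].ibits != 0 || s->ufract[i].ibits != 0)
            return false;   /* "fract" types have no integral bits */
        if (s->sfract[i].fbits < min_fbits[i] || s->ufract[i].fbits < min_fbits[i] ||
            s->saccum[i].fbits < min_fbits[i] || s->uaccum[i].fbits < min_fbits[i] ||
            s->saccum[i].ibits < MIN_ACCUM_IBITS || s->uaccum[i].ibits < MIN_ACCUM_IBITS)
            return false;   /* below the minimal formats */
        int extra = s->ufract[i].fbits - s->sfract[i].fbits;
        if (extra != 0 && extra != 1)
            return false;   /* unsigned fract: same or one more bit */
        if (s->saccum[i].ibits < s->uaccum[i].ibits)
            return false;   /* signed accum: at least as many integral bits */
    }
    return nondecreasing(s->sfract, false) && nondecreasing(s->ufract, false) &&
           nondecreasing(s->saccum, false) && nondecreasing(s->uaccum, false) &&
           nondecreasing(s->saccum, true) && nondecreasing(s->uaccum, true);
}

/* Recommended practice of the proposed 2.1.1. */
static bool fx_recommended(const fx_system *s)
{
    if (s->sfract[FX_LONG].fbits < 31)
        return false;
    for (int i = 0; i < FX_NSIZES; i++) {
        if (s->saccum[i].ibits < 8 || s->uaccum[i].ibits < 8)
            return false;
        if (s->uaccum[i].fbits != s->ufract[i].fbits)
            return false;
        if (s->saccum[i].fbits != s->sfract[i].fbits &&
            s->saccum[i].fbits != s->ufract[i].fbits)
            return false;
    }
    return true;
}

/* Two rows transcribed from the example tables. */
static const fx_system desktop = {
    .sfract = { {0, 7}, {0, 15}, {0, 31} },
    .saccum = { {8, 7}, {16, 15}, {32, 31} },
    .ufract = { {0, 8}, {0, 16}, {0, 32} },
    .uaccum = { {8, 8}, {16, 16}, {32, 32} },
};

static const fx_system mips_mdmx = {
    .sfract = { {0, 7}, {0, 15}, {0, 31} },
    .saccum = { {8, 7}, {8, 15}, {17, 30} },
    .ufract = { {0, 8}, {0, 16}, {0, 32} },
    .uaccum = { {8, 8}, {8, 16}, {16, 32} },
};
```

Under this encoding both the desktop and MIPS MDMX rows satisfy the mandatory rules, but MDMX fails the last recommended-practice rule, because its "signed long accum" (s17.30) has fewer fractional bits than either "long fract" type (s.31 or .32).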
----------------------------------------------------------------------------
2.1.3 Type conversions, usual arithmetic conversions

To suppress most of the "usual arithmetic conversions" for fixed-point types, replace the four rules in the text with the following:

    Otherwise, if one operand has fixed-point type and the other operand has integer type, then no conversions are needed.

    Otherwise, if both operands have signed fixed-point types, or if both operands have unsigned fixed-point types, then no conversions are needed.

    Otherwise, if one operand has signed fixed-point type and the other operand has unsigned fixed-point type, the operand with unsigned type is converted to the signed fixed-point type corresponding to its own unsigned fixed-point type.

In any event, delete the rule

    If the type of either of the operands has the sat qualifier, the resulting type shall have the sat qualifier; if the type of either of the operands has the modwrap qualifier, the resulting type shall have the modwrap qualifier.

----------------------------------------------------------------------------
2.1.4.1.2 Address and indirection operators

Delete this section.

----------------------------------------------------------------------------
2.1.4.1.4 The "sizeof" operator

Delete this section.

----------------------------------------------------------------------------
2.1.4.2.1 Binary arithmetic operators

To suppress most of the "usual arithmetic conversions" for fixed-point types, replace the second bullet point with:

    - Otherwise if both operands are fixed-point, the result type is the operand type with greater rank (after the usual arithmetic conversions have been applied), with the adoption of any "sat" or "modwrap" qualifier from either operand. (For example, if the operands of an addition have types "unsigned long accum" and "sat fract", the result type is "sat long accum".) It is a constraint error for one operand to have a "sat" qualifier and the other a "modwrap" qualifier.
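As a sketch of how the proposed 2.1.3 conversions and the 2.1.4.2.1 result-type rule fit together, the hypothetical function `fx_result_type` below takes two fixed-point operand types, each described by rank, signedness, and qualifier, and computes the result type. Integer operands, which need no conversion under the proposal, are left out, and the names of the enums and struct are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Fixed-point ranks, in increasing order as listed in the proposed 2.1.1. */
typedef enum {
    SHORT_FRACT, FRACT, LONG_FRACT, SHORT_ACCUM, ACCUM, LONG_ACCUM
} fx_rank;

typedef enum { QUAL_NONE, QUAL_SAT, QUAL_MODWRAP } fx_qual;

/* Hypothetical description of a fixed-point operand type. */
typedef struct {
    fx_rank rank;
    bool is_unsigned;
    fx_qual qual;
} fx_type;

/* Computes the result type of a binary arithmetic operator whose operands
   are both fixed-point. Returns false for the constraint error of mixing
   "sat" and "modwrap"; otherwise stores the result type in *result. */
static bool fx_result_type(fx_type a, fx_type b, fx_type *result)
{
    if ((a.qual == QUAL_SAT && b.qual == QUAL_MODWRAP) ||
        (a.qual == QUAL_MODWRAP && b.qual == QUAL_SAT))
        return false;

    /* Proposed usual arithmetic conversions: only a signed/unsigned mix
       converts anything, and then the unsigned operand becomes the signed
       type of the same rank. No operand changes rank. */
    if (a.is_unsigned != b.is_unsigned) {
        a.is_unsigned = false;
        b.is_unsigned = false;
    }

    /* Result: the operand type of greater rank, adopting a "sat" or
       "modwrap" qualifier from either operand. */
    *result = (a.rank >= b.rank) ? a : b;
    result->qual = (a.qual != QUAL_NONE) ? a.qual : b.qual;
    return true;
}
```

Given "unsigned long accum" and "sat fract", it yields "sat long accum", matching the example in the proposed text; given "accum" and "long fract", it yields "accum" without converting either operand first.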
To relax the accuracy requirement for multiplication and division, replace the text starting with "However, if the mathematical result of ..." with the following:

    For arithmetic operators other than "*" and "/", this rounded result is returned as the result of the operation. The "*" and "/" operators may return either this rounded result or, alternatively, the closest larger or closest smaller value representable by the result fixed-point type. The circumstances in which the rounded result might be replaced by a neighboring value in this manner are implementation-defined. (Between rounding and this optional adjustment, the multiplication and division operations permit a mathematical error of almost 2 units in the last place of the result type.)

It should also be stated that division by zero is undefined.

----------------------------------------------------------------------------
2.1.5.1 The "roundfx" function
2.1.5.4 The "fpabs" function

Rename one or both of these functions so that their names are consistent with one another.

----------------------------------------------------------------------------
2.1.5.3 The "bits" function

Append the following text for "setbits":

    The inverse operation, setting the bits of a fixed-point variable, is provided by the "setbits" macro, which has the syntax

        setbits ( <fixed-lvalue>, <n> )

    For example, starting with the declaration of "a" above, the assignment

        setbits(a, 0x2000);

    gives "a" the fixed-point value of 0.25. The value of the second operand is converted to an integer before the assignment is made. If this integer value is too large for the type of the first operand, only the bottom N bits of the value are used, where N is the total number of (nonpadding) bits of the fixed-point type.

----------------------------------------------------------------------------
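A minimal emulation of the proposed "setbits" semantics follows, assuming (as the 0x2000 -> 0.25 example implies) a "fract" with 15 fractional bits and therefore 16 nonpadding bits. Because portable C has no fixed-point types, the sketch wraps the bit pattern in a hypothetical struct `fract_s15`, with a hypothetical helper `fract_pattern`; `setbits` is written as a macro since it must assign to its first operand, and it keeps only the bottom 16 bits of the converted integer. Unlike the proposed operator, it works for this one emulated type only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "fract" of format s.15: 1 sign bit plus
   15 fractional bits, so N = 16 nonpadding bits and value = raw * 2^-15. */
typedef struct { int16_t raw; } fract_s15;

enum { FRACT_FBITS = 15, FRACT_NBITS = 16 };

/* Keeps only the bottom N bits of n and reinterprets them as an N-bit
   two's complement pattern, without implementation-defined narrowing. */
static int16_t fract_pattern(long long n)
{
    unsigned long long mask = (1ULL << FRACT_NBITS) - 1;
    long pattern = (long)((unsigned long long)n & mask);
    if (pattern >= (1L << (FRACT_NBITS - 1)))
        pattern -= 1L << FRACT_NBITS;
    return (int16_t)pattern;
}

/* A macro rather than a function because it must assign to its first
   operand; the second operand is converted to an integer first. */
#define setbits(lv, n) ((lv).raw = fract_pattern((long long)(n)))

static double fract_to_double(fract_s15 x)
{
    return x.raw / (double)(1L << FRACT_FBITS);
}
```

For example, `setbits(a, 0x2000)` makes `fract_to_double(a)` equal 0.25, and `setbits(a, 0x12000)` stores the same bit pattern, because bit 16 lies outside the 16 nonpadding bits.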