Document: WG14 N952

BDTI's Comments on N948
(Extension for the programming language C to support embedded processors)

John R. Hauser
Berkeley Design Technology, Inc.
2001 August 16


Since Copenhagen, we've adjusted our prototype fixed-point implementation at
BDTI to reflect the changes that came out of that meeting.  In working with
the new system, we found a major flaw in the arithmetic rules, for which
we will be proposing a fix.  Also, in response to concerns voiced by David
Keaton, we've tried to find a set of "least common denominator" requirements
for the fixed-point types that we can accept.  At the same time, I've been
collecting details for most of the "multimedia extensions" to mainstream
processors (such as MMX for Intel Pentiums), so we can be sure they aren't
shortchanged by the language.

Most of the changes we suggest to the draft are detailed in text attached to
the bottom of this document.  The most significant points are:

 - Relax the requirements on fixed-point type formats.

   The exact rules we are proposing are spelled out below, and I won't try
   to repeat them here.  Of note, our new rules would no longer require
   that corresponding "fract" and "accum" types have the same number of
   fractional bits.  Although perhaps less concise than before, several
   rules are needed to ensure that the arithmetic makes sense (e.g., "long
   fract" cannot have fewer fractional bits than "fract"; etc.).

   Included in the text below are our recommendations for specific fixed-
   point formats on various systems.  These recommendations are based on
   a careful evaluation of the capabilities of each processor, and I'm
   prepared to argue that the formats are realistic for each system.  You
   may note that, except in one instance, the formats we propose satisfy the
   stricter type rules of the existing draft.  We expect the current draft's
   rules to be kept as "recommended practice", because they should be.

 - Suppress the "usual arithmetic conversions" for fixed-point types.

   BDTI's original proposal did not permit an arithmetic operation with, for
   example, "accum" for one operand and "long fract" for the other.  The
   combination was disallowed because neither type encompasses the other:
   "accum" has more integral bits than "long fract", but "long fract" may
   have more fractional bits than "accum".
   When the committee in Copenhagen insisted that this be allowed, some
   "usual arithmetic conversion" rules were invented that promoted the "long
   fract" operand to "accum" before the operation occurred, similar to the
   way the "usual arithmetic conversions" work for other C types.

   Unfortunately, the automatic conversions we added were a mistake.
   The reason they're a mistake is similar to the reason we agreed not to
   support "usual arithmetic conversions" from integers to fixed-point.  As
   we discussed in Copenhagen, for integer operands, such conversions would
   make it impossible to perform a meaningful multiplication or division
   between an integer and a "fract" fixed-point value, because automatic
   conversion of the integer operand to the "fract" type would be guaranteed
   to overflow unless the integer happened to be 0 or -1.  To avoid this
   gratuitous overflow, the current draft defines an arithmetic operation
   between an integer and a fixed-point value as occurring directly on the
   two types as they are, without first converting the integer operand to
   match the type of the fixed-point operand.

   Now consider what happens when an "accum" is multiplied by a "long
   fract", both fixed-point types.  Let's assume for argument's sake that
   "accum" has format s16.15 (sign bit + 16 integral bits + 15 fractional
   bits) and "long fract" is s.31 (sign bit + 31 fractional bits).  Of
   course, there's nothing strange about the desire to multiply a 32-bit
   "accum" value by a 32-bit fraction between -1 and 1, generating an
   "accum" result; and furthermore, any hardware capable of a "long fract"
   multiplication (32 x 32 -> 32-bit fraction) can execute this "accum" x
   "long fract" multiplication perfectly well.  However, our "usual
   arithmetic conversions" require that the "long fract" operand be
   converted first to "accum" before the multiplication occurs, which
   results in a drastic and gratuitous loss of 16 bits of significance in
   the "long fract" operand, making the result far less accurate than it
   deserves to be.

   The current workaround is to cast the operands to an encompassing larger
   type such as "long accum" and then cast (or assign) the "long accum"
   result back to "accum".  Problems with the workaround include:

    * Given the new, relaxed rules proposed below for fixed-point formats,
      there might not be an encompassing type.  The "long accum" type is not
      required to have as many fractional bits as "long fract" in our new
      rules.

    * The compiler is responsible for reducing this expression to a simple
      32-bit fractional multiplication on the hardware.  With rounding
      effects, the full expression might not be identical to a single
      multiplication, and the compiler might decline to perform the
      optimization.

    * The same solution using casts was offered with BDTI's original
      proposal, which disallowed multiplication of "accum" and "long fract"
      directly.  This casting was rejected by the committee as being awkward
      and nonintuitive.  (Note, by the way, that, with the current system,
      drastic loss of precision occurs quietly if the casts aren't included,
      whereas questionable operations would have been flagged by the
      compiler in the original proposal.)

   We see two possible fixes:

    * Restore the original prohibition against combining incompatible types
      such as "accum" and "long fract"; or,

    * Eliminate the automatic conversions and define the fixed-point
      arithmetic operations as occurring directly on the two operands, as
      already happens for operations between integers and fixed-point types.

   In the text below, we've assumed the second solution, since the committee
   has already voted against the first one.
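
   To make the loss of significance concrete, here is a minimal sketch in
   plain C that emulates both behaviors with ordinary integers, using the
   s16.15 and s.31 formats from the example above.  The typedefs, the
   function names, and the choice of truncation toward zero are assumptions
   made for illustration only; they are not part of the draft or of BDTI's
   implementation, and saturation is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical raw encodings (assumptions, not from the draft):
     "accum"      s16.15 held in int32_t, value = raw / 2^15
     "long fract" s.31   held in int32_t, value = raw / 2^31   */
typedef int32_t accum_raw;
typedef int32_t lfract_raw;

/* Direct multiplication on the two types as they are.  The exact product
   has 15 + 31 = 46 fractional bits and is reduced to the 15 fractional
   bits of "accum" in a single step (truncating toward zero). */
accum_raw mul_direct(accum_raw a, lfract_raw f)
{
    int64_t q = ((int64_t)a * (int64_t)f) / ((int64_t)1 << 31);
    assert(q >= INT32_MIN && q <= INT32_MAX);   /* overflow not modeled */
    return (accum_raw)q;
}

/* Multiplication under the current "usual arithmetic conversions": the
   "long fract" operand is converted to "accum" first, which discards its
   16 low-order fractional bits before the multiply takes place. */
accum_raw mul_converted(accum_raw a, lfract_raw f)
{
    accum_raw f_as_accum = f / ((int32_t)1 << 16);          /* s.31 -> s16.15 */
    int64_t q = ((int64_t)a * (int64_t)f_as_accum) / ((int64_t)1 << 15);
    assert(q >= INT32_MIN && q <= INT32_MAX);   /* overflow not modeled */
    return (accum_raw)q;
}

/* Converts a raw s16.15 value to double for inspection. */
double accum_to_double(accum_raw a)
{
    return (double)a / 32768.0;
}
```

   With a = 1000.0 (raw 32768000) and f approximately 0.1 (raw 214748365),
   the direct product in this sketch is exactly 100.0, while the converted
   product is about 99.9756.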

 - Relax the accuracy requirements slightly for multiplication and division.

   The current Embedded C draft (following BDTI's proposal) requires in most
   cases that a fixed-point operation occur with a rounding error of less
   than 1 ulp.  (An "ulp" is a "unit in the last place", equal to 2^-F where
   F is the number of fractional bits in the destination format.)  We're now
   convinced this is too strict for multiplications on some implementations,
   and so we're suggesting instead a maximum rounding error of less than
   2 ulps for multiplications and divisions.

   One important beneficiary would be mainstream 32-bit processors, on which
   "long fract" might reasonably be implemented as 32 bits (format s.31).
   A multiplication of two 32-bit "long fract"s to a "long fract" result
   would typically be compiled as a 32 x 32 -> 64-bit integer multiplication
   followed by a shift right by 31 bits, keeping only the bottom 32 bits
   at the end.  On many of these processors, the 64-bit product would be
   obtained in two 32-bit registers (say, R0 for the upper half and R1 for
   the lower half), and then the 31-bit shift across the register pair would
   take three instructions:

       shift R0 left 1 bit
       shift R1 right 31 bits (an unsigned shift)
       OR R1 into R0

   which leaves the 32-bit "long fract" result in R0.  But note that the
   most significant 31 bits of the result are already available in R0
   after the first shift; the other two instructions serve only to move the
   last, least significant bit into position.  If the product is permitted
   to be up to 2 ulps in error, an implementation could choose instead
   to leave the least significant bit zero and dispense with the last two
   instructions.  Although we'd prefer the tighter 1-ulp bound in principle,
   savings such as this will be significant enough on many processors to
   justify the greater leniency.

   As a side effect of this change, the special case about multiplication
   results of 1 and -1 could be dropped from the Embedded C document.
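
   The register-pair shift described above can be sketched in portable C
   as follows.  This is an emulation under stated assumptions, not compiler
   output: "long fract" is taken to be s.31 held as a raw 32-bit pattern,
   results are returned as bit patterns, and the function names are
   hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Full sequence: 32 x 32 -> 64-bit product, then the three-instruction
   shift (R0 << 1) | (R1 >> 31) across the register pair. */
uint32_t lfract_mul_exact(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    uint32_t r0 = (uint32_t)(p >> 32);    /* upper half of the product */
    uint32_t r1 = (uint32_t)p;            /* lower half of the product */
    return (uint32_t)((r0 << 1) | (r1 >> 31));
}

/* Shortcut: only the first shift is performed, so the least significant
   bit of the result is always left as zero. */
uint32_t lfract_mul_fast(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    uint32_t r0 = (uint32_t)(p >> 32);    /* upper half of the product */
    return (uint32_t)(r0 << 1);
}

/* Gap between the two results, in units of the last place of s.31.
   Only the least significant bit can differ, so the gap is 0 or 1. */
uint32_t lfract_mul_gap(int32_t a, int32_t b)
{
    uint32_t gap = lfract_mul_exact(a, b) - lfract_mul_fast(a, b);
    assert(gap <= 1u);
    return gap;
}
```

   The full shift truncates the exact product, which by itself contributes
   an error of less than 1 ulp; the shortcut can lose at most one further
   ulp, so the total stays under the proposed 2-ulp bound.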

Other comments we have:

 - The statement

       A conforming implementation shall support at least two different
       signed "fract" fixed point datatypes, and one signed accum fixed
       point datatype.

   in Section 2.1.1 is inappropriate.  An implementation should be required
   to support all six signed fixed-point types, and probably all of the
   unsigned fixed-point types, too.  Just as for the integers, there's no
   requirement that the types all be different formats.

 - The entire discourse on "containers" should be dropped.  Most of the
   important notions are already covered under the topic of "representations
   of types" in Section 6.2.6.1 of the C Standard.  Issues specific to the
   new fixed-point types should follow the model of Section 6.2.6.2 for
   integer types.  In particular, the encoding of a fixed-point type should
   be divided into "padding bits", "fractional bits", "integral bits", and
   "sign bit", analogous to the integer types in 6.2.6.2.

   In the same vein, Sections 2.1.4.1.2 and 2.1.4.1.4 of the draft are
   (I think) redundant with the existing Standard and should be deleted.

 - In the "usual arithmetic conversions" in Section 2.1.3, the following
   statement is out of place and should be deleted:

       If the type of either of the operands has the sat qualifier, the
       resulting type shall have the sat qualifier; if the type of either of
       the operands has the modwrap qualifier, the resulting type shall have
       the modwrap qualifier.

   The "usual arithmetic conversions" are not concerned with the result type
   of an operation.

 - Regarding the renaming of "abs" to "fpabs":  the abbreviation "f.p." is
   often taken to mean "floating-point", so that may not be a good choice.
   In any event, one or both of "roundfx" and "fpabs" should be renamed to
   be consistent.

 - Since the committee at Copenhagen rejected BDTI's proposal to have the
   "bits" construct be assignable, we are now proposing a "setbits" operator
   to serve this purpose.  The rejected form

       bits(x) = expr

   would instead become

       setbits(x, expr)

   While the Embedded C draft calls "bits" a type-generic function,
   "setbits" will have to be either a keyword or a macro in order to work
   properly, since a C function receives its arguments by value and so
   could not modify x.

 - The "additional information and rationale" for fixed-point in Section A.1
   needs work, especially as some of it contradicts the more normative part
   of the document.

We also agree that there are other issues (such as "printf" format
specifiers, or the possibility of complex fixed-point types) that need to
be considered but that haven't been addressed here.  For now, our interest
is concentrated on reaching consensus on the core fixed-point types and
operations before we widen our scope to include other topics.


============================================================================

The following constitutes more specific normative changes we propose for the
draft document.


----------------------------------------------------------------------------
2.1.1 The datatypes

To fix a number of problems with the draft and to provide a more flexible
set of type rules, replace Section 2.1.1 with the following (or something
similar):

 /-------------------------------------------------------------------------\

  Twelve new fixed-point types are defined:

      unsigned short fract    unsigned short accum
      unsigned fract          unsigned accum
      unsigned long fract     unsigned long accum

      signed short fract      signed short accum
      signed fract            signed accum
      signed long fract       signed long accum

  The names

      short fract    short accum
      fract          accum
      long fract     long accum

  without either "unsigned" or "signed" are aliases for the corresponding
  signed fixed-point types.

  The fixed-point types are assigned a fixed-point rank.  The following
  types are listed in order of increasing rank:

      short fract,  fract,  long fract,  short accum,  accum,  long accum

  Each unsigned fixed-point type has the same size (in bytes) and the same
  rank as its corresponding signed fixed-point type.

  The bits of an unsigned fixed-point type are divided into padding bits,
  fractional bits, and integral bits.  The bits of a signed fixed-point type
  are divided into padding bits, fractional bits, integral bits, and a sign
  bit.

  The "fract" types have no integral bits; consequently, unsigned "fract"
  types encode values in the range [0, 1), and signed "fract" types encode
  values in the range [-1, 1).  The minimal formats for each type are:

      signed short fract      s.7       signed short accum      s4.7
      signed fract            s.15      signed accum            s4.15
      signed long fract       s.23      signed long accum       s4.23

      unsigned short fract     .7       unsigned short accum     4.7
      unsigned fract           .15      unsigned accum           4.15
      unsigned long fract      .23      unsigned long accum      4.23

  (For the unsigned formats, the notation "x.y" means x integral bits and
  y fractional bits, for a total of x + y bits.  The added "s" in the signed
  formats denotes the sign bit.)

  An implementation may give any of the fixed-point types more fractional
  bits, and may also give any of the "accum" types more integral bits,
  subject to the following restrictions:

   - Each unsigned "fract" type has either the same number of fractional
     bits or one more fractional bit than its corresponding signed "fract"
     type.

   - The number of fractional bits is nondecreasing for each of the
     following sets of fixed-point types when arranged in order of
     increasing rank:
      * signed "fract" types
      * unsigned "fract" types
      * signed "accum" types
      * unsigned "accum" types

   - The number of integral bits is nondecreasing for each of the following
     sets of fixed-point types when arranged in order of increasing rank:
      * signed "accum" types
      * unsigned "accum" types

   - Each signed "accum" type has at least as many integral bits as its
     corresponding unsigned "accum" type.

  Furthermore, the following are recommended practice where practical:

   - The "signed long fract" type has at least 31 fractional bits.

   - Each "accum" type has at least 8 integral bits.

   - Each unsigned "accum" type has the same number of fractional bits as
     its corresponding unsigned "fract" type.

   - Each signed "accum" type has the same number of fractional bits as
     either its corresponding signed "fract" type or its corresponding
     unsigned "fract" type.

 \-------------------------------------------------------------------------/
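
As a cross-check on the rules in the box above, the following is a minimal
sketch in C of a validator for a proposed set of formats.  The structure
layout and all names are hypothetical, introduced only for illustration; the
checks cover the minimal formats and the four restrictions, not the
recommended practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one format; the sign bit is not counted. */
struct fx_format {
    int integral_bits;      /* 0 for every "fract" type */
    int fractional_bits;
};

/* Index 0 = short, 1 = unadorned ("middle"), 2 = long. */
struct fx_formats {
    struct fx_format s_fract[3], u_fract[3], s_accum[3], u_accum[3];
};

/* True if fractional bits (and, if requested, integral bits) never
   decrease as rank increases within one set of three types. */
static bool nondecreasing(const struct fx_format t[3], bool check_integral)
{
    for (int i = 1; i < 3; i++) {
        if (t[i].fractional_bits < t[i - 1].fractional_bits)
            return false;
        if (check_integral && t[i].integral_bits < t[i - 1].integral_bits)
            return false;
    }
    return true;
}

bool fx_formats_valid(const struct fx_formats *f)
{
    static const int min_frac[3] = { 7, 15, 23 };

    assert(f != NULL);
    for (int i = 0; i < 3; i++) {
        const struct fx_format *sf = &f->s_fract[i], *uf = &f->u_fract[i];
        const struct fx_format *sa = &f->s_accum[i], *ua = &f->u_accum[i];

        /* The "fract" types have no integral bits. */
        if (sf->integral_bits != 0 || uf->integral_bits != 0)
            return false;
        /* Minimal formats: 7/15/23 fractional bits, 4 integral bits. */
        if (sf->fractional_bits < min_frac[i] ||
            uf->fractional_bits < min_frac[i] ||
            sa->fractional_bits < min_frac[i] ||
            ua->fractional_bits < min_frac[i])
            return false;
        if (sa->integral_bits < 4 || ua->integral_bits < 4)
            return false;
        /* Unsigned "fract": same fractional bits as signed, or one more. */
        int extra = uf->fractional_bits - sf->fractional_bits;
        if (extra != 0 && extra != 1)
            return false;
        /* Signed "accum": at least as many integral bits as unsigned. */
        if (sa->integral_bits < ua->integral_bits)
            return false;
    }
    return nondecreasing(f->s_fract, false) && nondecreasing(f->u_fract, false)
        && nondecreasing(f->s_accum, true) && nondecreasing(f->u_accum, true);
}
```

A set of formats passes when fx_formats_valid returns true; a set in which,
say, "accum" has more fractional bits than "long accum" is rejected.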

By way of example, these tables show the fixed-point formats we would
suggest for various classes of processors:

                             signed fract---------     signed accum---------
                             short   middle   long     short   middle   long

   typical desktop processor  s.7     s.15    s.31     s8.7   s16.15  s32.31
   typical 16-bit DSP         s.15    s.15    s.31     s8.15   s8.15   s8.31
   typical 24-bit DSP         s.23    s.23    s.47     s8.23   s8.23   s8.47

   Intel MMX                  s.7     s.15    s.31     s8.7   s16.15  s32.31
   PowerPC AltiVec            s.7     s.15    s.31     s8.7   s16.15  s32.31
   Sun VIS                    s.7     s.15    s.31     s8.7   s16.15  s32.31
   MIPS MDMX                  s.7     s.15    s.31     s8.7    s8.15  s17.30
   Lexra Radiax               s.7     s.15    s.31     s8.7    s8.15   s8.31
   ARM Piccolo                s.7     s.15    s.31     s8.7   s16.15  s16.31

                             unsigned fract-------     unsigned accum-------
                             short   middle   long     short   middle   long

   typical desktop processor   .8      .16     .32      8.8    16.16   32.32
   typical 16-bit DSP          .16     .16     .32      8.16    8.16    8.32
   typical 24-bit DSP          .24     .24     .48      8.24    8.24    8.48

   Intel MMX                   .8      .16     .32      8.8    16.16   32.32
   PowerPC AltiVec             .8      .16     .32      8.8    16.16   32.32
   Sun VIS                     .8      .16     .32      8.8    16.16   32.32
   MIPS MDMX                   .8      .16     .32      8.8     8.16   16.32
   Lexra Radiax                .8      .16     .32      8.8     8.16    8.32
   ARM Piccolo                 .8      .16     .32      8.8    16.16   16.32

(The "typical" DSPs referred to in the table cannot address units in memory
smaller than 16 or 24 bits, which is why these processors aren't expected to
support a "short fract" smaller than "fract".)


----------------------------------------------------------------------------
2.1.3 Type conversions, usual arithmetic conversions

To suppress most of the "usual arithmetic conversions" for fixed-point
types, replace the four rules in the text with the following:

    Otherwise, if one operand has fixed-point type and the other operand has
    integer type, then no conversions are needed.

    Otherwise, if both operands have signed fixed-point types, or if both
    operands have unsigned fixed-point types, then no conversions are
    needed.

    Otherwise, if one operand has signed fixed-point type and the other
    operand has unsigned fixed-point type, the operand with unsigned type
    is converted to the signed fixed-point type corresponding to its own
    unsigned fixed-point type.

In any event, delete the rule

    If the type of either of the operands has the sat qualifier, the
    resulting type shall have the sat qualifier; if the type of either of
    the operands has the modwrap qualifier, the resulting type shall have
    the modwrap qualifier.


----------------------------------------------------------------------------
2.1.4.1.2 Address and indirection operators

Delete this section.


----------------------------------------------------------------------------
2.1.4.1.4 The "sizeof" operator

Delete this section.


----------------------------------------------------------------------------
2.1.4.2.1 Binary arithmetic operators

To suppress most of the "usual arithmetic conversions" for fixed-point
types, replace the second bullet point with:

    - Otherwise if both operands are fixed-point, the result type is the
      operand type with greater rank (after the usual arithmetic conversions
      have been applied), with the adoption of any "sat" or "modwrap"
      qualifier from either operand.  (For example, if the operands of an
      addition have types "unsigned long accum" and "sat fract", the result
      type is "sat long accum".)  It is a constraint error for one operand
      to have a "sat" qualifier and the other a "modwrap" qualifier.
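
The result-type rule above, together with the signed/unsigned conversion
proposed for Section 2.1.3, can be sketched in C as follows.  The enums, the
structure, and the function name are hypothetical and serve only to model
the rule; a constraint error is reported by returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fixed-point rank, listed in increasing order as in Section 2.1.1. */
enum fx_rank {
    FX_SHORT_FRACT, FX_FRACT, FX_LONG_FRACT,
    FX_SHORT_ACCUM, FX_ACCUM, FX_LONG_ACCUM
};

enum fx_qual { FX_NONE, FX_SAT, FX_MODWRAP };

/* Hypothetical model of a (possibly qualified) fixed-point type. */
struct fx_type {
    enum fx_rank rank;
    bool is_unsigned;
    enum fx_qual qual;
};

/* Computes the result type of a binary operation on two fixed-point
   operands.  Returns false if "sat" and "modwrap" are mixed. */
bool fx_result_type(struct fx_type a, struct fx_type b, struct fx_type *out)
{
    assert(out != NULL);

    if ((a.qual == FX_SAT && b.qual == FX_MODWRAP) ||
        (a.qual == FX_MODWRAP && b.qual == FX_SAT))
        return false;

    /* Usual arithmetic conversions: with one signed and one unsigned
       operand, the unsigned operand becomes its signed counterpart. */
    if (a.is_unsigned != b.is_unsigned) {
        a.is_unsigned = false;
        b.is_unsigned = false;
    }

    /* The result has the type of the operand with the greater rank ... */
    *out = (a.rank >= b.rank) ? a : b;
    /* ... and adopts any qualifier present on either operand. */
    out->qual = (a.qual != FX_NONE) ? a.qual : b.qual;
    return true;
}
```

Applied to the example in the rule, operands of type "unsigned long accum"
and "sat fract" yield a signed, "sat"-qualified "long accum".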

To relax the accuracy requirement for multiplication and division, replace
the text starting with "However, if the mathematical result of ..." with the
following:

    For arithmetic operators other than "*" and "/", this rounded result is
    returned as the result of the operation.  The "*" and "/" operators may
    return either this rounded result or, alternatively, the closest larger
    or closest smaller value representable by the result fixed-point type.
    The circumstances in which the rounded result might be replaced by a
    neighboring value in this manner are implementation-defined.  (Between
    rounding and this optional adjustment, the multiplication and division
    operations permit a mathematical error of almost 2 units in the last
    place of the result type.)

It should be stated that "division by zero is undefined".


----------------------------------------------------------------------------
2.1.5.1 The "roundfx" function
2.1.5.4 The "fpabs" function

Rename one or both of these functions to be consistent with one another.


----------------------------------------------------------------------------
2.1.5.3 The "bits" function

Append the following text for "setbits":

    The opposite operation of setting the bits of a fixed-point variable is
    provided by the "setbits" macro, which has the syntax

        setbits ( <fixed-lvalue>, <n> )

    For example, starting with the declaration of "a" above, the assignment

        setbits(a, 0x2000);

    gives "a" the fixed-point value of 0.25.  The value of the second
    operand is converted to an integer before the assignment is made.  If
    this integer value is too large for the type of the first operand, only
    the bottom N bits of the value are used, where N is the total number of
    (nonpadding) bits of the fixed-point type.
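
The intended behavior of "setbits" can be emulated in plain C as a rough
sketch.  It assumes that "a" is a "fract" with format s.15 (which is what
makes 0x2000 equal to 0.25), that the encoding is two's complement with no
padding bits, and that the names below are hypothetical stand-ins; a real
"setbits" would be a keyword or macro, not a function taking a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulated "fract" in s.15 format: N = 16 nonpadding bits. */
typedef struct {
    int16_t raw;    /* value = raw / 2^15 */
} fract_emul;

/* Emulates setbits(x, n): only the bottom 16 bits of n are used, and
   they are interpreted as a two's complement bit pattern. */
void fract_setbits(fract_emul *x, long n)
{
    assert(x != NULL);
    unsigned long low = (unsigned long)n & 0xFFFFUL;   /* bottom N bits */
    int32_t v = (int32_t)low;
    if (v >= 0x8000)
        v -= 0x10000;                                  /* sign bit is set */
    x->raw = (int16_t)v;
}

/* Converts the emulated "fract" to double for inspection. */
double fract_value(fract_emul x)
{
    return (double)x.raw / 32768.0;
}
```

In this sketch, setting the bits to 0x2000 gives 0.25, setting them to
0x12000 also gives 0.25 because the excess high-order bit is discarded, and
setting them to 0x8000 gives -1.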


----------------------------------------------------------------------------