N2218
Signed Integers are Two’s Complement

New Proposal,

Author:
(Apple)
Project:
ISO JTC1/SC22/WG14: Programming Language C
Source:
github.com/jfbastien/papers/blob/master/source/N2218.bs

Abstract

There is One True Representation for signed integers, and that representation is two’s complement.

1. Introduction

[C11] Integer types allows three representations for signed integral types: sign and magnitude, ones’ complement, and two’s complement.

See §3 C Signed Integer Wording for full wording.
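As an illustration of how the three representations differ in practice, the low two bits of -1 are distinct in each one, which gives a well-known way to probe an implementation. The following is a minimal sketch only; the names `sign_repr` and `detect_sign_repr` are hypothetical and do not come from the standard or from this paper.

```c
#include <assert.h>

/* Hypothetical names, introduced here for illustration only. */
enum sign_repr {
    SIGN_AND_MAGNITUDE = 1,
    ONES_COMPLEMENT    = 2,
    TWOS_COMPLEMENT    = 3
};

/* The low two bits of -1 differ in each representation C11 allows:
 *   sign and magnitude: ...01  ->  (-1 & 3) == 1
 *   ones' complement:   ...10  ->  (-1 & 3) == 2
 *   two's complement:   ...11  ->  (-1 & 3) == 3 */
static enum sign_repr detect_sign_repr(void)
{
    int low_bits = -1 & 3;
    assert(low_bits >= 1 && low_bits <= 3);
    return (enum sign_repr)low_bits;
}
```

On every implementation surveyed in §4 that is still in mainstream use, this function would be expected to return `TWOS_COMPLEMENT`.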

C++17 goes further than C and only requires that "the representations of integral types shall define values by use of a pure binary numeration system". To the author’s knowledge no modern machine uses both C++ and a signed integer representation other than two’s complement (see §4 Survey of Signed Integer Representations). None of [MSVC], [GCC], and [LLVM] support other representations. This means that the C++ that is taught is effectively two’s complement, and the C++ that is written is two’s complement. It is extremely unlikely that there exists any significant codebase developed for two’s complement machines that would actually work when run on a non-two’s complement machine.

C and C++ as specified, however, are not two’s complement. Signed integers currently allow the existence of an extraordinary value which traps, extra padding bits, integral negative zero, and introduce undefined behavior and implementation-defined behavior for the sake of this extremely abstract machine.

[P0907r1] stands to change C++20 and make two’s complement the only signed integer representation that is supported. WG21 wants to hear from WG14 before making this change, and hopes that WG14 will be interested in making the same changes. Aaron Ballman has volunteered to present this paper at the WG14 Brno meeting, as well as to the C Safety and Security study group.

Based on guidance received on this paper, the author is happy to write a follow-up proposal with wording for WG14. The author will communicate WG14’s feedback to WG21 at the Rapperswil meeting in June 2018.

2. Details

The following is proposed to C++:

3. C Signed Integer Wording

The following is the wording on integers from the C11 Standard.

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
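The relationship between the number of value bits and the maximum value can be checked with a few lines of C. This is a sketch under stated assumptions: `max_from_value_bits` is a hypothetical helper, and the comparison uses unsigned char because C11 guarantees that type has no padding bits, so its number of value bits is exactly CHAR_BIT.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: the largest value representable with n value bits,
 * computed by summing the weight 2^i of every bit i in [0, n). */
static uintmax_t max_from_value_bits(int n)
{
    assert(n > 0 && n <= (int)(sizeof(uintmax_t) * CHAR_BIT));
    uintmax_t max = 0;
    for (int i = 0; i < n; i++) {
        max += (uintmax_t)1 << i;   /* bit i contributes 2^i */
    }
    return max;                     /* equals 2^n - 1 */
}
```

For example, `max_from_value_bits(CHAR_BIT)` is equal to `UCHAR_MAX`.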

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways: the corresponding value with sign bit 0 is negated (sign and magnitude); the sign bit has the value −(2^M) (two’s complement); the sign bit has the value −(2^M − 1) (ones’ complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.
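The effect of the sign bit under each representation can be sketched by decoding the same bit pattern three ways. The example below assumes a hypothetical 8-bit signed type with one sign bit and M = 7 value bits; the function names are illustrative only. In sign and magnitude a set sign bit negates the value, in two's complement it contributes -(2^M), and in ones' complement it contributes -(2^M - 1).

```c
#include <assert.h>
#include <stdint.h>

enum { M = 7 };  /* value bits in this hypothetical 8-bit signed type */

static int value_bits_of(uint8_t bits) { return bits & ((1 << M) - 1); }
static int sign_bit_of(uint8_t bits)   { return (bits >> M) & 1; }

/* Sign and magnitude: a set sign bit negates the value bits. */
static int decode_sign_and_magnitude(uint8_t bits)
{
    int v = value_bits_of(bits);
    return sign_bit_of(bits) ? -v : v;
}

/* Two's complement: the sign bit has the value -(2^M). */
static int decode_twos_complement(uint8_t bits)
{
    return value_bits_of(bits) - sign_bit_of(bits) * (1 << M);
}

/* Ones' complement: the sign bit has the value -(2^M - 1). */
static int decode_ones_complement(uint8_t bits)
{
    return value_bits_of(bits) - sign_bit_of(bits) * ((1 << M) - 1);
}

/* Usage: the patterns singled out in the wording above. */
static void decode_examples(void)
{
    assert(decode_sign_and_magnitude(0x80) == 0);    /* negative zero or trap */
    assert(decode_twos_complement(0x80) == -128);    /* extra value or trap */
    assert(decode_ones_complement(0xFF) == 0);       /* negative zero or trap */
}
```

The three patterns in `decode_examples` are exactly the ones the standard allows an implementation to treat as trap representations.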

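If the implementation supports negative zeros, they shall be generated only by: the &, |, ^, ~, <<, and >> operators with operands that produce such a value; the +, -, *, /, and % operators where one operand is a negative zero and the result is zero; compound assignment operators based on the above cases.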

It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.

If the implementation does not support negative zeros, the behavior of the &, |, ^, ~, <<, and >> operators with operands that would produce such a value is undefined.
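To make the negative zero case concrete, the following sketch models a ones' complement machine on raw 8-bit patterns. It is an assumption-laden illustration, not a description of any real implementation: the helper names are hypothetical, and the patterns are held in `uint8_t` so that the code has fully defined behavior on a two's complement host. Applying bitwise complement to the normal zero pattern yields the all-ones pattern, which also has the value zero.

```c
#include <assert.h>
#include <stdint.h>

/* Value of an 8-bit pattern read as ones' complement:
 * the sign bit is worth -(2^7 - 1) = -127. */
static int ones_complement_value(uint8_t bits)
{
    int v = bits & 0x7F;
    return (bits & 0x80) ? v - 127 : v;
}

/* Bitwise complement of the raw pattern, as ~ would act on it. */
static uint8_t complement_pattern(uint8_t bits)
{
    return (uint8_t)~bits;
}

/* Negative zero in ones' complement: sign bit and all value bits set. */
static int is_negative_zero(uint8_t bits)
{
    return bits == 0xFF;
}

/* Usage: ~ applied to normal zero generates negative zero. */
static void negative_zero_example(void)
{
    uint8_t zero = 0x00;
    uint8_t result = complement_pattern(zero);
    assert(is_negative_zero(result));
    assert(ones_complement_value(result) == ones_complement_value(zero));
}
```

On an implementation that does not support negative zeros, this is the kind of operation whose behavior the wording above leaves undefined.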

The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.

The precision of an integer type is the number of bits it uses to represent values, excluding any sign and padding bits. The width of an integer type is the same but including any sign bit; thus for unsigned integer types the two values are the same, while for signed integer types the width is one greater than the precision.
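Precision and width can be recovered from the limits in `<limits.h>`. The sketch below uses hypothetical helper names and relies only on the fact that the maximum value of an integer type has the form 2^k - 1, where k is the precision.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of value bits needed to represent max.
 * max must have the form 2^k - 1 (all low bits set). */
static int value_bits(uintmax_t max)
{
    assert((max & (max + 1)) == 0);
    int bits = 0;
    while (max != 0) {
        bits++;
        max >>= 1;
    }
    return bits;
}

/* Precision of int: value bits only, no sign or padding bits. */
static int int_precision(void) { return value_bits(INT_MAX); }

/* Width of int: the precision plus the single sign bit. */
static int int_width(void) { return int_precision() + 1; }

/* For unsigned int, width and precision are the same number. */
static int uint_width(void) { return value_bits(UINT_MAX); }
```

Any difference between `sizeof(int) * CHAR_BIT` and `int_width()` would be the number of padding bits in int.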

4. Survey of Signed Integer Representations

Here is a non-comprehensive history of signed integer representations:

Wikipedia offers more details and has comprehensive sources for the above.

Thomas Rodgers surveyed popular DSPs and found the following:

In short, the only machines the author could find using non-two’s complement are made by Unisys, and no counter-example was brought by any member of the C++ standards committee. Nowadays Unisys emulates their old architecture using x86 CPUs with attached FPGAs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++; signed integers are the least of their problems. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica.

References

Informative References

[C11]
Programming Languages — C. URL: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
[GCC]
GCC C Implementation-Defined Behavior: Integers. URL: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
[LLVM]
LLVM Language Reference Manual. URL: https://llvm.org/docs/LangRef.html
[LWG3047]
Tim Song. atomic compound assignment operators can cause undefined behavior when corresponding fetch_meow members don't. New. URL: https://wg21.link/lwg3047
[MSVC]
MSVC C Implementation-Defined Behavior: Integers. URL: https://docs.microsoft.com/en-us/cpp/c-language/integers
[P0907r1]
Signed integers are two's complement. URL: https://wg21.link/P0907r1