1. Changelog
1.1. Revision 4 - July 6th, 2022
- Wording improvements:
  - Revised generic_count_type
  - Provide concrete definitions for most significant index and least significant index in the wording.
  - Rework all of the preamble and requirements into a "General" paragraph for the <stdbit.h>
- Add additional verification of the 8-bit memreverse function and use that implementation exactly in § 4.5.2 Vetting the Implementation / Algorithm for memreverse.
- Add additional verification of the 8-bit endian-aware load and store functions in § 4.6.1 Vetting the Implementation / Algorithm for 8-bit loads and stores, and try to transcribe that implementation directly to avoid errors in the wording.
1.2. Revision 3 - June 17th, 2022
- Removed a bullet point about adding a reservation for stdc_
- Fixed the wording for Question 0 and Question 1 with respect to mentioning generic_count_type
- Adjust the position of the <stdbit.h>
- Formulated the return descriptions of the Endian-Aware Load and Store functions much more clearly and use pure mathematics notation, as suggested by Joseph Myers.
1.3. Revision 2 - April 12th, 2022
- Deeply discussed the rotate_left rotate_right
- Committed to taking a poll with the new information given about the functionality.
- Use bool _Bool
- Ensure the wording for stdc_first_(leading/trailing)_(one/zero)
- Discuss potentially having bit functions which take a width parameter in the future in polling section about CHAR_BIT and width definitions.
- Adjust wording for Endian-Aware Load/Store functions.
  - Typo fixes for the names of mask and index values.
  - For right shifts, introduce a new unsigned_value
- Provide alternative wording solutions for various committee decisions in § 5.1 Decisions for the C Standards Committee.
1.4. Revision 1 - January 1st, 2022
- Drastically rework design section and motivations after several rounds of feedback from at least 4 vendors, 6 business partners, 3 Open Source maintainers, and more.
- Add additional bit utilities and design them from existing practice in C, C++, Go, Rust, Zig, and implementation-specific constraints in Visual C++, Clang, GCC, SDCC, TCC and more.
  - stdc_first_(leading/trailing)_(one/zero)
  - stdc_count_(leading/trailing)_(ones/zeros)
  - Return types for bit functions counting bits is int
  - Arguments types should be int
- Provide backing implementation for all functionality in this paper at an official repository.
- Provide benchmarks showing performance comparisons using the intrinsics vs. not in § 3.1 Bits: How Much Faster?.
- Use zeros zeroes
1.5. Revision 0 - October 15th, 2021
- Initial release. ✨
2. Polls
These polls help guide the design of this paper in accordance with WG14 consensus. Where consensus was insufficient or close (or there were many abstentions alongside little consensus), the author chose a particular direction and provided rationale.
2.1. WG14 Virtual Meeting - February 2022
WG14 reviewed an earlier version of this paper in N2903, discussing many of its design choices and aspects. WG14 was asked about which functions from the given set below to keep in the paper or remove: all sets of functions were approved when asking the 5 questions about which functionality should be kept (answered questions were moved to the Appendix in § 6.1 Decisions to Committee Questions). This was interpreted as unanimous consent to proceed with all of the functionality in this paper. If there is anyone who is interested in bisecting or taking pieces apart from this proposal, please let the authors know as soon as is humanly possible.
2.1.1. Does WG14 want the memreverse8 
   | Yes | No | Abstain | 
|---|---|---|
| 6 | 5 | 8 | 
This was interpreted as not strong enough consensus, but it was left to the author to decide. As we do not want to leave freestanding implementations which have 
One of the suggestions that came from doing this would also be to require the generic bit functions to take a parameter indicating the desired final width of the integer result, that the user would then cast. This is seen currently in the standard in functions such as 
For example, MISRA C and CERT discourage 
We also do not have existing practice for bit functions that are specified in this way. These functions are usually meant to map to a tight set of hardware instructions, and are meant to be cheaply translatable to said hardware instructions. So, we focus on providing things that map directly to standard and extended unsigned integer types as well as bit-precise integers that match exact-width integer types. This proposal does not spend further time exploring providing 
2.1.2. Does WG14 want new signed-count rotate functions in addition to what is in N2903?
| Yes | No | Abstain | 
|---|---|---|
| 8 | 6 | 6 | 
This was interpreted as very close consensus, and also left to the author to decide. However, it was made clear in post-discussion that the current design for rotate left/right is fine, because it is a symmetrical operation, and is completely free to implement on 2’s complement implementations. Another important factor in making this decision was noting that most compilers already generate optimal code with a signed count value, including x86_64, x32_64, i686, AARCH64 (Arm 64-bit), and Arm 32-bit targets. Finally, there are architectures where both rotate left and rotate right instructions are available, but they do not have the same performance characteristics: the end-user should be able to use either 
2.1.3. Does WG14 want to put something along the lines of N2903 into C23?
| Yes | No | Abstain | 
|---|---|---|
| 19 | 2 | 2 | 
This is very clear direction to put it into C23, provided that the wording and other design details are hammered into place. We are working on these details.
3. Introduction & Motivation
A lot of proposals and work go into figuring out the "byte order" of integer values that occupy more than 1 octet (8 bits). This is nominally important when dealing with data that comes over network interfaces or is read from files, where the data can be laid out in various orders of octets for 2-, 3-, 4-, 6-, or 8-tuples of octets. The most well-known endian structures on existing architectures include "Big Endian", where the least significant byte comes "last" and which is featured prominently in network protocols and file protocols; and "Little Endian", where the least significant byte comes "first" and which is typically the orientation of data for the processor and user architectures most prevalent today.
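As a minimal illustration of the two orderings described above, the sketch below writes one 32-bit value into an octet buffer both ways. The `sketch_` names are hypothetical and are not part of this proposal; the code assumes only that the low 8 bits of each `unsigned char` carry one octet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write the low 32 bits of `value` as 4 octets,
   most significant octet first ("big endian"). */
static void sketch_put_be32(uint_least32_t value, unsigned char out[static 4]) {
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical helper: the same octets, least significant octet first
   ("little endian"). */
static void sketch_put_le32(uint_least32_t value, unsigned char out[static 4]) {
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
    out[2] = (unsigned char)((value >> 16) & 0xFFu);
    out[3] = (unsigned char)((value >> 24) & 0xFFu);
}

/* Usage: the two layouts of one value are mirror images of each other. */
static void sketch_byte_order_demo(void) {
    unsigned char be[4], le[4];
    sketch_put_be32(UINT32_C(0xAABBCCDD), be);
    sketch_put_le32(UINT32_C(0xAABBCCDD), le);
    for (int i = 0; i < 4; ++i) {
        assert(be[i] == le[3 - i]);
    }
}
```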
In more legacy architectures (Honeywell, PDP), there also exist other orientations called "mixed" or "middle" endian. The uses of such endianness are of dubious benefit and are vanishingly rare amongst commodity and readily available hardware today, but nevertheless still represent an applicable ordering of octets.
In other related programming interfaces, the C functions/macros 
This proposal puts forth the fundamentals that make a homegrown implementation of 
3.1. Bits: How Much Faster?
Just how much faster can using intrinsics and bit operations as proposed in this paper be? Below is a quantification of the performance differences from naïve algorithms that worked over one "bit" (or 
If you don’t read the previous link, then at the very least it should be shown that the code described in this proposal provides the means to implement the improvements shown in the ztdc_packed group of benchmark bars.
4. Design
This is a library addition. It is meant to expose both macros and functions that can be used for translation time-suitable checks. It provides a way to check endianness within the preprocessor, and gives definitive names that allow for knowing whether the endianness is big, little, or neither. We state big, little, or neither, because there is no settled-upon name for the legacy endianness of "middle" or "mixed", nor any agreed upon ordering for such a "middle" or "mixed" endianness between architectures. This is not the case for big endian or little endian, where one is simply the reverse of the other, always, in every case, across architectures, file protocols, and network specifications.
The next part of the design is functions for working with groupings of 8 bits. They are meant to communicate with network or file protocols and formats that have become ubiquitous in computing for the last 30 years.
This design also provides a small but essential suite of bit utilities, all within the 
4.1. Preliminary: Why the stdc_ 
   We use the 
4.2. Charter: unsigned char const ptr[static sizeof(uintN_t)] 
   There are 2 choices on how to represent sized pointer arguments. The first is a 
To start, we still put any 
One of the tipping arguments in favor of our choice of 
4.3. Signed vs. Unsigned
This paper has gone back and forth between signed vs. unsigned 
- All of the values returned from the functions here return conceptually unsigned/natural numbers (0 to potentially infinity, but not negative).
- Some existing practice (e.g., C++) has in recent years struggled against unsigned integers and tried to move towards signed. "Anything that is a count should just be an int 
- Conversely, some of C’s most fierce proponents use unsigned numbers almost exclusively until they have a proper justification for a signed number. For them, unsigned size_t 
- Whatever decision we make for one (e.g., for the argument type of rotate_left rotate_right count_ones popcount 
This brings up a lot of questions about whether or not the functions here should be signed or unsigned. We will analyze this primarily from the standpoint of 
4.3.1. In Defense of Signed Integers
Let us consider a universe where 
SDCC and several other compilers optimize for left and right shifts ([sdcc]). Texas Instruments and a handful of other specialist architectures also have "variable shift" instructions (SSHVL), which use the sign of the argument to shift in one direction or the other ([ti-tms320c64x]). Having a 
To test code generation for using a signed integer and 2’s complement arithmetic, we used both C++ and C code samples. It’s a fairly accurate predictor of how notable compilers handle this kind of specification. The generated assembly for the compilers turns out to be optimal, so long as an implementation does not do a literal copy-paste of the specification’s text.
Using a non-constant offset, with generated x86_64 assembly:

    #include <bit>

    extern unsigned int x;
    extern int offset;

    int main() {
        int l = std::rotl(x, offset);
        int r = std::rotr(x, offset);
        return l + r;
    }

    main:                                   # @main
            mov     eax, dword ptr [rip + x]
            mov     cl, byte ptr [rip + offset]
            mov     edx, eax
            rol     edx, cl
            ror     eax, cl
            add     eax, edx
            ret

And, using a constant offset, with generated x86_64 assembly:

    #include <bit>

    extern unsigned int x;

    int main() {
        int l = std::rotl(x, -13);
        int r = std::rotr(x, -13);
        return l + r;
    }

    main:                                   # @main
            mov     eax, dword ptr [rip + x]
            mov     ecx, eax
            rol     ecx, 19
            rol     eax, 13
            add     eax, ecx
            ret

The generated code shows that the compiler understands the symmetric nature of the operations (from the constant code) and also shows that it will appropriately handle it even when it cannot see through constant values. The same can be shown when writing C code using a variety of styles, as shown here:

    #if UNSIGNED_COUNT == 1

    /* Variant 1: the count is unsigned and reduced modulo the width. */
    static unsigned int rotate_right(unsigned int value, unsigned int count);

    inline static unsigned int rotate_left(unsigned int value, unsigned int count) {
        unsigned int c = count % 32;
        /* (32 - c) % 32 avoids an undefined shift by 32 when c == 0 */
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, unsigned int count) {
        unsigned int c = count % 32;
        return value >> c | value << ((32 - c) % 32);
    }

    #elif TWOS_COMPLEMENT_CAST == 1

    /* Variant 2: the count is signed, converted to unsigned, then reduced. */
    static unsigned int rotate_right(unsigned int value, int count);

    inline static unsigned int rotate_left(unsigned int value, int count) {
        unsigned int c = (unsigned int)count;
        c = c % 32;
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, int count) {
        unsigned int c = (unsigned int)count;
        c = c % 32;
        return value >> c | value << ((32 - c) % 32);
    }

    #else

    /* Variant 3: the count is signed; a negative count rotates the other way. */
    static unsigned int rotate_right(unsigned int value, int count);

    inline static unsigned int rotate_left(unsigned int value, int count) {
        int c = count % 32;
        if (c < 0) {
            return rotate_right(value, -c);
        }
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, int count) {
        int c = count % 32;
        if (c < 0) {
            return rotate_left(value, -c);
        }
        return value >> c | value << ((32 - c) % 32);
    }

    #endif

    #if UNSIGNED_COUNT == 1
    unsigned int f(unsigned int x, unsigned int offset) {
    #else
    unsigned int f(unsigned int x, int offset) {
    #endif
        unsigned int l = rotate_left(x, offset);
        unsigned int r = rotate_right(x, offset);
        return l + r;
    }

When using the various definitions, we find that the generated assembly for 
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
— §6.3.1.3, ¶2, ISO/IEC 9899:202x "C2x" Standard
Finally, the vast majority of existing practice takes the offset value in as a signed integer, and all the return types are also still some form of signed integer (unless the intrinsic is returning the exact same unsigned value put in that was manipulated). It also allows "plain math" being done on the type to naturally manifest negative numbers without accidentally having roundtripping or signed/unsigned conversion issues.
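The conversion rule quoted above is what makes a signed count cheap to handle. The following is a sketch under stated assumptions, not this proposal's wording: the `sketch_` names are hypothetical, and the width is fixed at 32 bits for brevity. Converting the count to `unsigned int` reduces it modulo a power of two, so masking with 31 turns a count of -13 into 19.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit rotate left taking a signed count.
   (unsigned int)count is defined as reduction modulo UINT_MAX + 1
   (C 6.3.1.3p2), which is a multiple of 32, so the mask gives the
   mathematically correct count modulo 32, even for negative counts. */
static uint32_t sketch_rotl32(uint32_t value, int count) {
    unsigned int c = (unsigned int)count & 31u;
    /* (32 - c) & 31 keeps the second shift in range when c == 0 */
    return (uint32_t)((value << c) | (value >> ((32u - c) & 31u)));
}

/* Hypothetical 32-bit rotate right, the mirror image of the above. */
static uint32_t sketch_rotr32(uint32_t value, int count) {
    unsigned int c = (unsigned int)count & 31u;
    return (uint32_t)((value >> c) | (value << ((32u - c) & 31u)));
}

/* Usage: a negative count rotates in the opposite direction. */
static void sketch_rotate_demo(void) {
    const uint32_t x = UINT32_C(0x12345678);
    assert(sketch_rotl32(x, 8) == UINT32_C(0x34567812));
    assert(sketch_rotl32(x, -13) == sketch_rotr32(x, 13));
    assert(sketch_rotl32(x, -13) == sketch_rotl32(x, 19));
    assert(sketch_rotr32(x, -13) == sketch_rotl32(x, 13));
}
```

The last two assertions correspond to the `rol ecx, 19` and `rol eax, 13` instructions in the constant-offset assembly shown earlier.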
4.3.2. In Defense of Unsigned
Unsigned, on the other hand, has existing practice in hardware. While the intrinsics defined by glibc, C++'s standard libraries, and many more use signed integers, they are conceptually unsigned in their implementations. For example, for a 32-bit rotate, most standard libraries taking an 

    count = count & 31;

This is critical for optimization here. Note that, if we were to provide a specification using a 

    count = count % 32;

produces optimal code generation for most compilers, as they understand that bit 
Rust is one of the few languages that provides optimal versions of this code using 
All in all, unsigned naturally optimizes better and matches the size type of C. It has no undefined behavior on overflow and produces better assembly in general when it comes to bit intrinsics. Shifting behavior is also well-defined for unsigned types and not signed types, further compounding unsigned types as far better than their signed counterparts.
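To make the difference between the two reductions concrete, here is a small sketch (the `sketch_` names are hypothetical). For an unsigned operand, `% 32` and `& 31` always agree; for a signed operand, C's remainder truncates toward zero and keeps the sign, so the two differ for negative counts.

```c
#include <assert.h>

/* Signed remainder: the result takes the sign of the dividend. */
static int sketch_reduce_signed_mod(int count) {
    return count % 32;
}

/* Convert to unsigned first, then mask: always in the range [0, 31]. */
static unsigned int sketch_reduce_masked(int count) {
    return (unsigned int)count & 31u;
}

/* Unsigned remainder by a power of two: equivalent to the mask. */
static unsigned int sketch_reduce_unsigned_mod(unsigned int count) {
    return count % 32u;
}

/* Usage: the three reductions compared on the same inputs. */
static void sketch_reduce_demo(void) {
    assert(sketch_reduce_signed_mod(-13) == -13);
    assert(sketch_reduce_masked(-13) == 19u);
    assert(sketch_reduce_unsigned_mod(45u) == (45u & 31u));
    assert(sketch_reduce_signed_mod(45) == 13);
}
```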
4.3.3. Which Does This Paper Choose?
Ultimately, this paper chooses signed integer types. This is primarily to satisfy architectures which have sign-based, variable-style shifts. These platforms would have to convert to signed values to perform their variable shifts either way, so it benefits them. We also know that, for 2’s complement architectures, signed can be treated best by simply deploying 
Furthermore, existing practice in C uses signed integer types for the count for 
I expect this decision will not be extremely popular. Ultimately, I expect to poll this at the next meeting. Whichever direction gets higher consensus will be the direction I pursue for this functionality.
4.4. The __STDC_ENDIAN_ * 
   The enumeration is specified as follows:

    #include <stdbit.h>

    #define __STDC_ENDIAN_LITTLE__ /* some unique value */
    #define __STDC_ENDIAN_BIG__ /* some other unique value */
    #define __STDC_ENDIAN_NATIVE__ /* see below! */

The goal of these macros is that if the system identifies as a "little endian" system, then 

    #include <stdbit.h>
    #include <stdio.h>

    int main() {
        if (__STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__) {
            printf("little endian! uwu\n");
        } else if (__STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__) {
            printf("big endian OwO!\n");
        } else {
            printf("what is this?!\n");
        }
        return 0;
    }

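The macros above are translation-time constants. For comparison only, the same three-way classification can be sketched as a run-time probe of how a known 32-bit value is laid out in memory. Everything named `sketch_` or `SKETCH_` is hypothetical, and the cross-check relies on the `__BYTE_ORDER__` family of predefined macros, which GCC and Clang provide and which other compilers may lack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum sketch_endian {
    SKETCH_ENDIAN_LITTLE,
    SKETCH_ENDIAN_BIG,
    SKETCH_ENDIAN_OTHER
};

/* Inspect the object representation of a known value. */
static enum sketch_endian sketch_probe_endian(void) {
    const uint32_t probe = UINT32_C(0x01020304);
    unsigned char bytes[sizeof probe];
    if (sizeof probe != 4) {
        return SKETCH_ENDIAN_OTHER; /* CHAR_BIT != 8: out of scope here */
    }
    memcpy(bytes, &probe, sizeof probe);
    if (bytes[0] == 0x04 && bytes[1] == 0x03 && bytes[2] == 0x02 && bytes[3] == 0x01) {
        return SKETCH_ENDIAN_LITTLE;
    }
    if (bytes[0] == 0x01 && bytes[1] == 0x02 && bytes[2] == 0x03 && bytes[3] == 0x04) {
        return SKETCH_ENDIAN_BIG;
    }
    return SKETCH_ENDIAN_OTHER; /* e.g. a "middle endian" layout */
}

/* Usage: cross-check against compiler-specific macros where available. */
static void sketch_probe_demo(void) {
    enum sketch_endian e = sketch_probe_endian();
    (void)e;
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && defined(__ORDER_BIG_ENDIAN__)
    if (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__) {
        assert(e == SKETCH_ENDIAN_LITTLE);
    }
    if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) {
        assert(e == SKETCH_ENDIAN_BIG);
    }
#endif
}
```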
If a user has a Honeywell architecture or a PDP architecture, it is up to them to figure out which flavor of "middle endian"/"mixed endian"/"bi endian" they are utilizing. We do not give these a name in the set of macros because neither the Honeywell nor the PDP communities ever figured out which flavor of the 32-bit byte order of 
4.4.1. A (Brief) Discussion of Endianness
There is a LOT of design space and deployed existing practice in the endianness space of both architectures and their instruction sets. A non-exhaustive list of behaviors is as follows:
- Instruction set, OS, and register conventions are in-sync (Windows, Apple, and most *Nix Distributions).
- Instruction set has variability that can be toggled (ARM with the SETEND 
- Instruction set has no variability, but data can be stored in unconventional endianness (RISC-V, mainframe architectures, and similar).
- Instruction set has no variability, but it changes endianness between types/sizes (FORTRAN-implemented floating point units used Big Endian, PDP-11 compatibility with those machines required 32-bit big-endian instructions on a little-endian machine (hilarity/shenanigans ensued)).
- Instruction set has no variability, but historical weight forces certain choices (PDP-11 had 16-bit little-endian integers. Some folk interpreted two of them next to each other as a single 32-bit integer, resulting in the 2143 
Suffice to say, there exists a lot of deployed practice. Note that this list effectively has these concerns in priority order. The first is the most conventional software; as the list goes down, each occurrence becomes more rare and less interesting. Therefore, we try not to spend too much time focusing on what are effectively the edge cases of software and hardware. Some of the past choices in endianness and similar were simply due to "going with the flow" (PDP’s "2143" order) or severe historical baggage (early FORTRAN dealing in big endian floating point numbers, and those algorithms and serialization methods being given to PDP machines without thinking about the ordering). With much of the industry moving away from such modes in both newer mainframes and architectures and towards newer implementations and architectures, it does not seem prudent to try to standardize the multitude of their behaviors.
This proposal constrains its definition of endianness to integer types without padding, strictly because trying to capture the vast breadth of existing architectures and their practices can quickly devolve down a slope that deeply convolutes this proposal’s core mission: endian and bit utilities.
4.4.2. Hey! Some Architectures Can Change Their Endianness at Run-time!
This is beyond the scope of this proposal. This is meant to capture the translation-time endianness. There also does not appear to be any operating system written today that can tolerate an endianness change of the whole program happening arbitrarily at runtime, after a program has launched. This means that the property is effectively a translation-time property, and therefore can be exposed as a compile-time constant. A future proposal to determine the run-time byte order is more than welcome from someone who has suitable experience dealing with such architectures and programs, and this proposal does not preclude their ability to provide such a run-time function e.g. 
Certain instruction sets have ways to set the endianness of registers, to change how data is accessed ([arm-setend]). This functionality is covered by byte swapping, and byte swaps can be implemented using the 
4.4.3. Floating Point has a Byte Order, Too.
For the design of this paper, we strictly consider the design space for (unsigned) integers, only. Floating point numbers already have an implementation-defined byte order, and none of these functions are meant to interact with the floating point types. While the 
It shall be noted that for C++, since C++20, its endian enumeration applies to all scalar types:
This subclause describes the endianness of the scalar types of the execution environment.
— C++ Standard Working Draft, bit.endian/p1
It does not specify what this means for padding bits or similar; nor, I think, does it have to. Byte order means very little for padding bits until serialization comes into play. C++ does not define any functions which do byte-order aware serialization. So, it does not have to write any specification governing what may or may not happen, and the rest is left undefined / unspecified.
For this proposal, we focus purely on integer types and, more specifically, on integer types which do not have padding or where we can work with a padding bits-agnostic representation. While it is acknowledged that floating point types and pointers have byte orders too, we do not want to interact directly with these types when it comes to endianness load and store functions. Byte swaps, (bit) population counts, and other bit operations can be performed on floating point types after they have been copied or type-punned (with implementation checking/blessing) into equivalent (unsigned) integer objects to do the necessary work.
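The copy-then-operate approach described above can be sketched as follows. The `sketch_` names are hypothetical. The code assumes `float` occupies exactly 32 bits, and the expected bit pattern in the usage function additionally assumes the IEEE 754 binary32 format, which the C standard does not require.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Copy the object representation of a float into an unsigned integer. */
static uint32_t sketch_float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Portable population count: each iteration clears the lowest set bit. */
static int sketch_count_ones32(uint32_t v) {
    int n = 0;
    while (v != 0) {
        v &= (uint32_t)(v - 1u);
        ++n;
    }
    return n;
}

/* Usage: 1.0f is 0x3F800000 in IEEE 754 binary32, which has 7 bits set. */
static void sketch_float_demo(void) {
    uint32_t one = sketch_float_bits(1.0f);
    assert(one == UINT32_C(0x3F800000));
    assert(sketch_count_ones32(one) == 7);
}
```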
4.5. Generic 8-bit Memory Reverse and Exact-width 8-bit Memory Reverse
In order to both accommodate a wide variety of architectures and support minimum-width integer optimized intrinsics, this proposal takes from the industry 2 forms of byteswap:
- one generic mem_ 
- a sequence of exact-width byte swapping instructions which (typically) map directly to intrinsics available in compilers and instructions in hardware.
These end up inhabiting the 

    #include <stdbit.h>
    #include <limits.h>
    #include <stdint.h>

    #if (CHAR_BIT % 8 == 0)
    void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
    uintN_t stdc_memreverse8uN(uintN_t value);
    #endif

where 
One property of note is that 

    // NOT guaranteed, if it works on CHAR_BIT
    // instead of working on 8 bits at a time.
    assert(stdc_memreverse8u32(0xAABBCCDD) == 0xDDCCBBAA);

One of the problems with this approach is that it opens us up to potentially having padding bits if 
There is also the concern of bit orderings on top of byte orderings. Unfortunately, there is no practical way to deal with sub-8-bit orderings that may differ from machine to machine, particularly in conjunction with larger-than-8-bit bytes.
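For the common case, the two forms can be sketched as below. This is a simplified illustration and not the algorithm from the proposed wording: the `sketch_` names are hypothetical, and the array form assumes `CHAR_BIT == 8` so that one `unsigned char` is exactly one octet (the proposal's wording also has to handle wider bytes).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(CHAR_BIT == 8, "this sketch only covers 8-bit bytes");

/* Generic form: reverse n octets in place by swapping from both ends. */
static void sketch_memreverse8(size_t n, unsigned char *ptr) {
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = ptr[i];
        ptr[i] = ptr[n - 1 - i];
        ptr[n - 1 - i] = tmp;
    }
}

/* Exact-width form: works on the value, 8 bits at a time, so the
   result does not depend on how the value is stored in memory. */
static uint32_t sketch_memreverse8u32(uint32_t value) {
    return (uint32_t)(((value & UINT32_C(0x000000FF)) << 24) |
                      ((value & UINT32_C(0x0000FF00)) << 8) |
                      ((value & UINT32_C(0x00FF0000)) >> 8) |
                      ((value & UINT32_C(0xFF000000)) >> 24));
}

/* Usage: both forms reverse the order of the four octets. */
static void sketch_memreverse_demo(void) {
    unsigned char bytes[4] = { 0xAA, 0xBB, 0xCC, 0xDD };
    sketch_memreverse8(sizeof bytes, bytes);
    assert(bytes[0] == 0xDD && bytes[1] == 0xCC);
    assert(bytes[2] == 0xBB && bytes[3] == 0xAA);
    assert(sketch_memreverse8u32(UINT32_C(0xAABBCCDD)) == UINT32_C(0xDDCCBBAA));
}
```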
4.5.1. But Memory Reverse Is Dangerous?
Byte swapping, by itself, is absolutely dangerous in terms of code portability. Users often program strictly for their own architecture when doing serialization, and do not take into consideration that their endianness can change. This means that, while 
The inclusion of the 
4.5.2. Vetting the Implementation / Algorithm for memreverse 
   In previous iterations of the paper, there were various off-by-one errors in transcribing the algorithm used to get the job done. Therefore, we more directly lifted the code for the algorithm from the example implementation here. To further prove that it works on "bytes" that may be larger than 8 bits, we also took the following steps.
- Implemented it as a macro (as shown from the link above).
- Use that macro implementation in the normal unsigned char 
- Use that macro implementation with all unsigned integer types that are larger than unsigned char 
- Apply -fno-strict-aliasing 
All of the tests pass across the three major compilers (MSVC, GCC, and Clang) and across platforms (Windows, Linux, Mac OS). We find this to be compelling enough to ensure that the implementation and the algorithm in the wording is suitably correct. Nevertheless, any wording failures present here represent the authors' collective inability to properly serialize wording, not that an implementation is not possible or too inventive.
4.6. stdc_load8_ * stdc_store8_ * 
   Functions meant to transport bytes to a specific endianness need 3 pieces of information:
- the sign of the input/output;
- the byte order of the input; and,
- the desired byte order of the output.
To represent any operation that goes from/to the byte order that things like 
The specification for the endianness functions borrows from many different sources listed above, and is as follows:

    #include <stdbit.h>
    #include <limits.h>
    #include <stdint.h>

    #if ((N % CHAR_BIT) == 0 && (CHAR_BIT % 8 == 0))
    void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
    uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
    uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);

    void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    int_leastN_t stdc_load8_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
    int_leastN_t stdc_load8_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
    int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
    int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
    #endif

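To give a feel for what the unsigned, unaligned variants do, here is a minimal sketch for N = 32. It is not the proposal's reference implementation: the `sketch_` names are hypothetical, and the code assumes `CHAR_BIT == 8`, so that `N / CHAR_BIT` is 4 and each array element holds one octet.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(CHAR_BIT == 8, "this sketch only covers 8-bit bytes");

/* Store the low 32 bits of `value`, least significant octet first. */
static void sketch_store8_leu32(uint_least32_t value, unsigned char ptr[static 4]) {
    for (size_t i = 0; i < 4; ++i) {
        ptr[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
    }
}

/* Load 4 octets, treating ptr[0] as the least significant. */
static uint_least32_t sketch_load8_leu32(const unsigned char ptr[static 4]) {
    uint_least32_t value = 0;
    for (size_t i = 0; i < 4; ++i) {
        value |= (uint_least32_t)ptr[i] << (8 * i);
    }
    return value;
}

/* Load 4 octets, treating ptr[0] as the most significant. */
static uint_least32_t sketch_load8_beu32(const unsigned char ptr[static 4]) {
    uint_least32_t value = 0;
    for (size_t i = 0; i < 4; ++i) {
        value = (value << 8) | (uint_least32_t)ptr[i];
    }
    return value;
}

/* Usage: the result is the same on every host, whatever its byte order. */
static void sketch_load_store_demo(void) {
    unsigned char buffer[4];
    sketch_store8_leu32(UINT32_C(0x11223344), buffer);
    assert(buffer[0] == 0x44 && buffer[3] == 0x11);
    assert(sketch_load8_leu32(buffer) == UINT32_C(0x11223344));
    assert(sketch_load8_beu32(buffer) == UINT32_C(0x44332211));
}
```

Because the functions work through shifts on the value rather than through the object representation, they never need to know the native endianness.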
Thanks to some feedback from implementers and librarians, this first implementation would also need an added signed variant to the load and store functions as well as aligned and unaligned loads and stores. While C23 will mandate a two’s complement representation for integers, because we are using the 
This specification is marginally more complicated than the 
We are fine with not making these precisely 
Similarly to 
Note that this means a 
4.6.1. Vetting the Implementation / Algorithm for 8-bit loads and stores
In previous iterations of the paper, getting the algorithm written down properly in a way that does not rely on any kind of implementation-defined behavior for signed and unsigned endian-aware loads and stores was tough and resulted in many errors in the wording. Still, we know that the implementation is solid because we have tested it (both theoretically and factually) by writing implementations whose base "unit" for writing into has a width greater than 
- Implemented the core bodies of the functions as macros whose base unit is not necessarily unsigned char 
- Use that macro implementation in the normal unsigned char 
- Use that macro implementation with all unsigned integer types that are larger than unsigned char 
- Apply -fno-strict-aliasing 
All of the tests pass across the three major compilers (MSVC, GCC, and Clang) and across platforms (Windows, Linux, Mac OS). We find this to be compelling enough to ensure that the implementation is suitably correct, even if the wording may not be proper or ideal. Therefore, we hope this can serve as a good basis in establishing that, at the very least, this is both implementable and usable. This also corroborates additional materials outside of compilers that always target 
4.7. Modern Bit Utilities
In addition, upon first pre-review of the paper there was a strong supporting groundswell for bit operations that have long been present in both hardware and as compiler intrinsics. This idea progressed naturally from the 
| operation | Intel/AMD | ARM | PowerPC | 
|---|---|---|---|
|  | ROL | - | rldicl | 
|  | ROR | ROR, EXTR | - | 
|  | POPCNT | - | popcntb | 
|  | BSR, LZCNT | CLZ | cntlzd | 
|  | - | CLS | - | 
|  | BSF, TZCNT | - | - | 
|  | - | - | - | 
Many of the bit functions below are defined to ease portability to these architectures. For places where specific compiler idioms and automatic detection are not possible, similar assembly tricks or optimized implementations can be provided by C. Further bit functions were also merged into C++, resulting in the current state of the C++ bit header.
There is further a bit of an "infamous" page amongst computer scientists for Bit Twiddling Hacks. These may not all map directly to instructions but they provide a broad set of useful functionality commonly found in not only CPU-based programming libraries, but GPU-based programming libraries and other high performance computing resources as well.
We try to take the most useful subset of these functions that most closely represent functionality on both old and new CPU architectures as well as common, necessary operations that have been around in the last 25 years for various industries. We have left out operations such as sign extension, parity computation, bit merging, clear/setting bits, fast negation, bit swaps, lexicographic next bit permutation, and bit interleaving. The rest are more common and appear across a wide range of industries from cryptography to graphics to simulation to efficient property lookup and kernel scheduling.
4.7.1. "Why not only generic interfaces or (u)intmax_t 
   For many of the bit-based utilities, you will see it introduces functions with several suffixes for the various types. Often, it is asked: why? Even the GCC builtins for things like 
The generic interfaces can be used by individuals who want automatic selection of the best. And, as shown in the § 6 Appendix, platforms can use any builtins or techniques at their disposal to select an appropriate built-in, instruction, or function call to fit the use case.
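One way such automatic selection could look is a `_Generic` dispatch over per-type functions. This is a sketch and not the implementation from the appendix: the `sketch_` names are hypothetical, only the five standard unsigned types are covered, and the zero count assumes those types have no padding bits. Counting zeros is a useful example because the answer depends on the width of the argument's type.

```c
#include <assert.h>
#include <limits.h>

/* Shared helper: population count on the widest standard unsigned type. */
static int sketch_ones(unsigned long long v) {
    int n = 0;
    while (v != 0) {
        v &= v - 1; /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* One function per type: width in bits minus the number of set bits. */
#define SKETCH_DEFINE_COUNT_ZEROS(suffix, type)                     \
    static int sketch_count_zeros##suffix(type value) {             \
        return (int)(sizeof(type) * CHAR_BIT) - sketch_ones(value); \
    }
SKETCH_DEFINE_COUNT_ZEROS(uc, unsigned char)
SKETCH_DEFINE_COUNT_ZEROS(us, unsigned short)
SKETCH_DEFINE_COUNT_ZEROS(ui, unsigned int)
SKETCH_DEFINE_COUNT_ZEROS(ul, unsigned long)
SKETCH_DEFINE_COUNT_ZEROS(ull, unsigned long long)

/* Type-generic front end: picks the function matching the argument type. */
#define sketch_count_zeros(value)                 \
    _Generic((value),                             \
        unsigned char: sketch_count_zerosuc,      \
        unsigned short: sketch_count_zerosus,     \
        unsigned int: sketch_count_zerosui,       \
        unsigned long: sketch_count_zerosul,      \
        unsigned long long: sketch_count_zerosull)(value)

/* Usage: the same value gives different answers for different types. */
static void sketch_generic_demo(void) {
    unsigned char c = 1;
    unsigned short s = 1;
    assert(sketch_count_zeros(c) == CHAR_BIT - 1);
    assert(sketch_count_zeros(s) == (int)(sizeof s * CHAR_BIT) - 1);
    assert(sketch_count_zeros(1u) == (int)(sizeof(unsigned int) * CHAR_BIT) - 1);
    assert(sketch_count_zeros(1ul) == (int)(sizeof(unsigned long) * CHAR_BIT) - 1);
    assert(sketch_count_zeros(1ull) == (int)(sizeof(unsigned long long) * CHAR_BIT) - 1);
}
```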
4.7.2. Type-Generic Macros and Counts for Types
All of the functions below have type generic macros associated with them. This can bring up an interesting question: if the return value depends on the type of the argument going into the function (i.e. for 
4.7.3. Argument Types
Many of the functions below are defined over the fundamental unsigned integer types, rather than their minimum width or exact width counterparts. This is done to provide maximum portability: users can combine information from the recently-introduced 
The 
The exact-width types suffer from non-availability on specific platforms, which makes little sense for functions which do not depend on a no-padding bits requirement. As long as the values read from the array only involve 
Extended integer types, least-width integer types, and exact-width integer types, can all be used with the type-generic macros since the type-generic macros are required to work over all standard (unsigned) integer types and extended (unsigned) integer types, while excluding 
This paper does not concern itself with the implications of passing a 
Finally, in general 
4.7.4. Return Types
There is the question of what is meant to happen for types which return bit counts, such as 
At the moment, the functions do not accept all bit-precise integer types (just ones that are bit-width equivalent to the existing standard and extended integer types), so this is technically a non-issue. But, if and when bit-precise integer types are given better handling in 
4.7.5. stdc_count_ones stdc_count_zeros 
   
The API for it is as such:

    #include <stdbit.h>

    int stdc_count_onesuc(unsigned char value);
    int stdc_count_onesus(unsigned short value);
    int stdc_count_onesui(unsigned int value);
    int stdc_count_onesul(unsigned long value);
    int stdc_count_onesull(unsigned long long value);

    int stdc_count_zerosuc(unsigned char value);
    int stdc_count_zerosus(unsigned short value);
    int stdc_count_zerosui(unsigned int value);
    int stdc_count_zerosul(unsigned long value);
    int stdc_count_zerosull(unsigned long long value);

    // type-generic macros
    generic_return_type stdc_count_ones(generic_value_type value);
    generic_return_type stdc_count_zeros(generic_value_type value);

It covers all of the built-in unsigned integer types. The type-generic macro supports all of the built-in types as well as any of the implementation-defined extended integer types. See the appendix for an implementation.
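As a rough illustration of the semantics only, the counting functions for `unsigned int` could be sketched as below. The `sketch_` names are hypothetical stand-ins and are not part of the proposed header; a real implementation would likely map to a compiler intrinsic or a hardware instruction instead of a loop.

```c
/* Hypothetical stand-in for stdc_count_onesui: repeatedly clears the
   lowest set bit and counts how many times that was possible. */
static int sketch_count_ones_ui(unsigned int value) {
    int count = 0;
    while (value != 0u) {
        value &= value - 1u; /* drops the lowest 1 bit */
        ++count;
    }
    return count;
}

/* Hypothetical stand-in for stdc_count_zerosui: the 0 bits of value are
   exactly the 1 bits of its complement. */
static int sketch_count_zeros_ui(unsigned int value) {
    return sketch_count_ones_ui(~value);
}
```

For any input, the two results add up to the number of value bits in `unsigned int`.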
4.7.6. stdc_rotate_left stdc_rotate_right 
   
#include <stdbit.h>

unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);

unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);

// type-generic macro
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count);
generic_value_type stdc_rotate_right(generic_value_type value, generic_count_type count);
They cover all of the built-in unsigned integer types. A discussion of signed vs. unsigned integer types for the count type and the return type can be found in § 4.3 Signed vs. Unsigned.
As for choosing a single function like 
f:                                      # @f
        mov     r8d, edi
        mov     ecx, esi
        rol     r8d, cl
        mov     edx, edi
        ror     edx, cl
        mov     ecx, esi
        neg     ecx
        mov     eax, edi
        rol     eax, cl
        ror     edi, cl
        test    esi, esi
        cmovs   edx, r8d
        cmovle  eax, edi
        add     eax, edx
        ret
This is more than double the size of the rotates produced by using left/right directly in § 4.3 Signed vs. Unsigned. Because of this, we decided that a single function with a signed count and no fixed direction was not advantageous: it is important to be able to bias the optimizer toward whether a given rotate is left- or right-oriented.
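The separate left/right pair can be sketched as follows, transcribing the "Returns" rules given later in the wording for Rotate Left and Rotate Right. This is only an illustration under stated assumptions: it is fixed to `uint32_t` (so N is 32), and the `sketch_` names are hypothetical, not part of the proposal.

```c
#include <stdint.h>

static uint32_t sketch_rotate_right_u32(uint32_t value, int count);

/* Hypothetical 32-bit rotate left: r = count % N keeps the sign of count,
   and a negative r is forwarded to the opposite rotate. */
static uint32_t sketch_rotate_left_u32(uint32_t value, int count) {
    const int N = 32;
    const int r = count % N;
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (uint32_t)((value << r) | (value >> (N - r)));
    }
    return sketch_rotate_right_u32(value, -r);
}

/* Hypothetical 32-bit rotate right, the mirror image of the above. */
static uint32_t sketch_rotate_right_u32(uint32_t value, int count) {
    const int N = 32;
    const int r = count % N;
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (uint32_t)((value >> r) | (value << (N - r)));
    }
    return sketch_rotate_left_u32(value, -r);
}
```

Because the `r == 0` case returns early, neither shift is ever performed with a shift amount equal to the width, which would be undefined behavior in C.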
4.7.7. stdc_leading_zeros stdc_leading_ones stdc_trailing_zeros stdc_trailing_ones 
   
#include <stdbit.h>

int stdc_leading_zerosuc(unsigned char value);
int stdc_leading_zerosus(unsigned short value);
int stdc_leading_zerosui(unsigned int value);
int stdc_leading_zerosul(unsigned long value);
int stdc_leading_zerosull(unsigned long long value);

int stdc_leading_onesuc(unsigned char value);
int stdc_leading_onesus(unsigned short value);
int stdc_leading_onesui(unsigned int value);
int stdc_leading_onesul(unsigned long value);
int stdc_leading_onesull(unsigned long long value);

int stdc_trailing_zerosuc(unsigned char value);
int stdc_trailing_zerosus(unsigned short value);
int stdc_trailing_zerosui(unsigned int value);
int stdc_trailing_zerosul(unsigned long value);
int stdc_trailing_zerosull(unsigned long long value);

int stdc_trailing_onesuc(unsigned char value);
int stdc_trailing_onesus(unsigned short value);
int stdc_trailing_onesui(unsigned int value);
int stdc_trailing_onesul(unsigned long value);
int stdc_trailing_onesull(unsigned long long value);

// type-generic macros
generic_return_type stdc_leading_zeros(generic_value_type value);
generic_return_type stdc_leading_ones(generic_value_type value);
generic_return_type stdc_trailing_zeros(generic_value_type value);
generic_return_type stdc_trailing_ones(generic_value_type value);
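A minimal sketch of what these four functions compute, assuming a 32-bit value and using hypothetical `sketch_` names that are not part of the proposal. It follows the wording's description (the number of consecutive 0 or 1 bits from the most or least significant end), so an all-zero input yields the full width of 32 for the zero-counting variants.

```c
#include <stdint.h>

/* Counts consecutive 0 bits starting at the most significant bit. */
static int sketch_leading_zeros_u32(uint32_t value) {
    int count = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (value & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

/* Counts consecutive 0 bits starting at the least significant bit. */
static int sketch_trailing_zeros_u32(uint32_t value) {
    int count = 0;
    for (uint32_t mask = 1; mask != 0 && (value & mask) == 0; mask <<= 1) {
        ++count;
    }
    return count;
}

/* The "ones" variants count zeros in the complement. */
static int sketch_leading_ones_u32(uint32_t value) {
    return sketch_leading_zeros_u32((uint32_t)~value);
}

static int sketch_trailing_ones_u32(uint32_t value) {
    return sketch_trailing_zeros_u32((uint32_t)~value);
}
```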
4.7.8. stdc_first_leading_zero stdc_first_leading_one stdc_first_trailing_zero stdc_first_trailing_one 
   
We specify things to use the interpretation that 
It is of note that users can implement the 
#include <stdbit.h>

int stdc_first_leading_zerouc(unsigned char value);
int stdc_first_leading_zerous(unsigned short value);
int stdc_first_leading_zeroui(unsigned int value);
int stdc_first_leading_zeroul(unsigned long value);
int stdc_first_leading_zeroull(unsigned long long value);

int stdc_first_leading_oneuc(unsigned char value);
int stdc_first_leading_oneus(unsigned short value);
int stdc_first_leading_oneui(unsigned int value);
int stdc_first_leading_oneul(unsigned long value);
int stdc_first_leading_oneull(unsigned long long value);

int stdc_first_trailing_zerouc(unsigned char value);
int stdc_first_trailing_zerous(unsigned short value);
int stdc_first_trailing_zeroui(unsigned int value);
int stdc_first_trailing_zeroul(unsigned long value);
int stdc_first_trailing_zeroull(unsigned long long value);

int stdc_first_trailing_oneuc(unsigned char value);
int stdc_first_trailing_oneus(unsigned short value);
int stdc_first_trailing_oneui(unsigned int value);
int stdc_first_trailing_oneul(unsigned long value);
int stdc_first_trailing_oneull(unsigned long long value);

// type-generic macros
generic_return_type stdc_first_leading_zero(generic_value_type value);
generic_return_type stdc_first_leading_one(generic_value_type value);
generic_return_type stdc_first_trailing_zero(generic_value_type value);
generic_return_type stdc_first_trailing_one(generic_value_type value);
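The "index plus 1, or 0 if not found" convention used in the wording for these functions can be illustrated with the sketch below. It assumes a 32-bit value, and the `sketch_` names are hypothetical. The leading variants count their index from the most significant bit (index 0), the trailing variants from the least significant bit (index 0).

```c
#include <stdint.h>

/* Least significant index of the first 1 bit, plus 1; 0 if there is none. */
static int sketch_first_trailing_one_u32(uint32_t value) {
    for (int i = 0; i < 32; ++i) {
        if ((value >> i) & 1u) {
            return i + 1;
        }
    }
    return 0;
}

/* Most significant index of the first 1 bit, plus 1; 0 if there is none. */
static int sketch_first_leading_one_u32(uint32_t value) {
    for (int i = 0; i < 32; ++i) {
        if ((value >> (31 - i)) & 1u) {
            return i + 1;
        }
    }
    return 0;
}

/* The "zero" variants search for a 1 bit in the complement. */
static int sketch_first_trailing_zero_u32(uint32_t value) {
    return sketch_first_trailing_one_u32((uint32_t)~value);
}

static int sketch_first_leading_zero_u32(uint32_t value) {
    return sketch_first_leading_one_u32((uint32_t)~value);
}
```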
4.7.9. stdc_has_single_bit 
   This is a function that determines if an unsigned integer is a power of 2. It can be written either using a normal expression such as 
#include <stdbit.h>

bool stdc_has_single_bituc(unsigned char value);
bool stdc_has_single_bitus(unsigned short value);
bool stdc_has_single_bitui(unsigned int value);
bool stdc_has_single_bitul(unsigned long value);
bool stdc_has_single_bitull(unsigned long long value);

// type-generic macro
bool stdc_has_single_bit(generic_value_type value);
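One common way to write the power-of-2 test is sketched below. This is a widely used formulation and not necessarily the expression the text above refers to; the `sketch_` name is hypothetical.

```c
#include <stdbool.h>

/* A value has exactly one bit set if it is nonzero and clearing its
   lowest set bit leaves nothing behind. */
static bool sketch_has_single_bit_ui(unsigned int value) {
    return value != 0u && (value & (value - 1u)) == 0u;
}
```

The explicit `value != 0u` check matters: without it, zero would be reported as a power of 2.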
4.7.10. stdc_bit_width stdc_bit_ceil stdc_bit_floor 
   This set of functions provides a way to determine the number of bits it takes to represent a given value (
The declarations look as follows:
#include <stdbit.h>

unsigned char stdc_bit_flooruc(unsigned char value);
unsigned short stdc_bit_floorus(unsigned short value);
unsigned int stdc_bit_floorui(unsigned int value);
unsigned long stdc_bit_floorul(unsigned long value);
unsigned long long stdc_bit_floorull(unsigned long long value);

unsigned char stdc_bit_ceiluc(unsigned char value);
unsigned short stdc_bit_ceilus(unsigned short value);
unsigned int stdc_bit_ceilui(unsigned int value);
unsigned long stdc_bit_ceilul(unsigned long value);
unsigned long long stdc_bit_ceilull(unsigned long long value);

int stdc_bit_widthuc(unsigned char value);
int stdc_bit_widthus(unsigned short value);
int stdc_bit_widthui(unsigned int value);
int stdc_bit_widthul(unsigned long value);
int stdc_bit_widthull(unsigned long long value);

// type-generic macro
generic_return_type stdc_bit_floor(generic_value_type value);
generic_return_type stdc_bit_ceil(generic_value_type value);
generic_return_type stdc_bit_width(generic_value_type value);
Notably, 
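A minimal sketch of two of these functions for `unsigned int`, under the assumption that they behave like the identically named C++20 `<bit>` functions: bit width is the number of bits needed to represent the value (0 for an input of 0), and bit floor is the largest power of 2 not greater than the value (0 for an input of 0). The `sketch_` names are hypothetical, and `stdc_bit_ceil` is left out because its behavior for results that do not fit the type is not shown in this section.

```c
/* Number of bits needed to represent value; 0 when value is 0. */
static int sketch_bit_width_ui(unsigned int value) {
    int width = 0;
    while (value != 0u) {
        ++width;
        value >>= 1;
    }
    return width;
}

/* Largest power of 2 that is not greater than value; 0 when value is 0. */
static unsigned int sketch_bit_floor_ui(unsigned int value) {
    const int width = sketch_bit_width_ui(value);
    return width == 0 ? 0u : 1u << (width - 1);
}
```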
5. Wording
The following wording is relative to N2912. For the rotate functions, wording is attached for all permutations of the polls taken, which are listed just below.
5.1. Decisions for the C Standards Committee
These are decisions the Committee might want to make to alter the wording below. Alternative wording is provided to guide the discussion and to make voting with the actual alternative specification in front of people’s eyes easier.
5.1.1. Question 0
— Given the new information present in the paper, do we want a single 
NOTE: #3 from § 5.1.2 Question 1 does not apply if this question is accepted, because then the rotate must have a sign to communicate left/right.
If the answer to this question is "Yes", then the below sections on "§7.✨.15 Rotate Left" and "§7.✨.16 Rotate Right" will be swapped out for the following wording:
7.✨.15 Rotate
Synopsis
unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count);
Description
The stdc_rotate functions perform a bitwise rotate left or right. This operation is typically known as a left or right circular shift.
Returns
Let N be the width corresponding to the type of the input value. Let r be count % N.
— If r is 0, returns value;
— otherwise, if r is positive, returns (value << r) | (value >> (N - r));
— otherwise, if r is negative, returns (value >> -r) | (value << (N - -r)).
The type-generic function (marked by its generic_value_type argument) returns the above described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large unsigned integer type capable of representing the width of the computed result. The generic_count_type shall be a signed integer type.
5.1.2. Question 1
— Do we want unsigned (
If the answer to this question is "Yes", then the following mechanical changes are made to the wording:
1. The return types for the following functions are changed:
   - stdc_count_ones: from int to unsigned int
   - stdc_count_zeros: from int to unsigned int
   - stdc_leading_ones: from int to unsigned int
   - stdc_leading_zeros: from int to unsigned int
   - stdc_trailing_ones: from int to unsigned int
   - stdc_trailing_zeros: from int to unsigned int
   - stdc_first_leading_one: from int to unsigned int
   - stdc_first_leading_zero: from int to unsigned int
   - stdc_first_trailing_one: from int to unsigned int
   - stdc_first_trailing_zero: from int to unsigned int
   - stdc_bit_width: from int to unsigned int
2. Replace all instances of the following text:
   - — "The generic_return_type value … with … — "The generic_return_type value
3. Make the following modifications to the stdc_rotate_left and stdc_rotate_right wording:
   - Replace the parameter type for all the rotate functions from int to unsigned int.
   - Remove the bullet point for when a negative count/"r" is encountered: — otherwise, if r is negative, returns ….
   - Change the last sentence for both functions concerning the types of the generic count and returns from:
     - The generic_return_type generic_count_type … to …
     - The generic_return_type generic_count_type
NOTE: #3 does not apply if § 5.1.1 Question 0 is accepted, because then the rotate must have a sign. This is captured in the wording shown above.
5.1.3. Question 2
There is also one more question that has been consistently asked of me as I’ve moved this proposal forward: changing how the suffixes for the types are formed. Rather than doing 
— Do we want to change the suffixes of all of the type-specific functions to use an underscore before the suffix?
5.2. Add < stdbit . h > 
   A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (Clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbit.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>.
5.3. Add a new §7.✨ sub-clause for "Bit and Byte Utilities" in §7
7.✨ Bit and Byte Utilities <stdbit.h>
7.✨.1 General
The header <stdbit.h> defines the following macros, types, and functions, to work with the byte and bit representation of many types, typically integer types. This header makes available the size_t type name (7.19) and any uintN_t, intN_t, uint_leastN_t, or int_leastN_t type names defined by the implementation (7.20).
For declarations and definitions in 7.✨, an identifier with a suffix containing le typically represents little-endian. An identifier with a suffix containing be typically represents big-endian. This clause describes the endianness of the execution environment with respect to bit-precise integer types, standard integer types, and extended integer types which do not have padding bits.
The most significant index is the 0-based index counting from the most significant bit, $0$, to the least significant bit, $w - 1$, where $w$ is the width of the type that is having its most significant index computed.
The least significant index is the 0-based index counting from the least significant bit, $0$, to the most significant bit, $w - 1$, where $w$ is the width of the type that is having its least significant index computed.
It is unspecified whether any generic function declared in <stdbit.h> is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the name of a generic function, the behavior is unspecified.
5.3.1. Add a new §7.✨.1 sub-sub-clause for "Endian" in §7.✨
7.✨.2 Endian
Two common methods of byte ordering in multi-byte scalar types are little-endian and big-endian. Little-endian is a format for storage of binary data in which the least significant byte is placed first, with the rest in ascending order. Equivalently, the least significant byte is stored at the smallest memory address. Big-endian is a format for storage or transmission of binary data in which the most significant byte is placed first, with the rest in descending order. Equivalently, the most significant byte is stored at the smallest memory address. Other byte orderings are also possible.
The macros are:
__STDC_ENDIAN_LITTLE__
which represents a method of byte order storage in which the least significant byte is placed first and the rest are in ascending order, and is an integer constant expression;
__STDC_ENDIAN_BIG__
which represents a method of byte order storage in which the most significant byte is placed first and the rest are in descending order, and is an integer constant expression;
__STDC_ENDIAN_NATIVE__ /* see below */
which represents the method of byte order storage for the execution environment and is an integer constant expression.
__STDC_ENDIAN_NATIVE__ shall expand to an integer constant expression whose value is equivalent to the value of __STDC_ENDIAN_LITTLE__ if the execution environment is little-endian. Otherwise, __STDC_ENDIAN_NATIVE__ shall expand to an integer constant expression whose value is equivalent to the value of __STDC_ENDIAN_BIG__ if the execution environment is big-endian. If __STDC_ENDIAN_NATIVE__ is not equivalent to either, then the byte order for the execution environment is implementation-defined.
5.3.2. Add a new §7.✨.3 sub-sub-clause for "8-bit Memory Reversal" in §7.✨
7.✨.3 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#if (CHAR_BIT % 8) == 0
void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
#endif
Description
The stdc_memreverse8 function provides an interface to reverse the order of a given sequence of bytes by treating them as sequences of 8 bits at a time. The function is only present if CHAR_BIT is a multiple of 8. It is equivalent to the following algorithm:
for (size_t index = 0, limit = ((n * CHAR_BIT) / 2); index < limit;) {
    const size_t ptr_index = index / CHAR_BIT;
    const size_t rev_ptr_index = n - 1 - ptr_index;
    unsigned char* p = ptr + ptr_index;
    unsigned char* rev_p = ptr + rev_ptr_index;
    const unsigned char b_temp = *p;
    const unsigned char rev_b_temp = *rev_p;
    *p = 0;
    *rev_p = 0;
    const size_t bit_limit = CHAR_BIT;
    for (size_t bit_index = 0; bit_index < bit_limit; bit_index += 8) {
        const size_t rev_bit_index = CHAR_BIT - 8 - bit_index;
        const unsigned char bit_mask = ((unsigned char)0xFF) << bit_index;
        const unsigned char rev_bit_mask = ((unsigned char)0xFF) << rev_bit_index;
        *p |= (((rev_b_temp & rev_bit_mask) >> rev_bit_index) << bit_index);
        *rev_p |= (((b_temp & bit_mask) >> bit_index) << rev_bit_index);
        index += 8;
    }
}
7.✨.4 Exact-width 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#include <stdint.h>
#if ((N % 8) == 0) && ((CHAR_BIT % 8) == 0)
uintN_t stdc_memreverse8uN(uintN_t value);
#endif
Description
The stdc_memreverse8uN functions provide an interface to swap the bytes of a corresponding uintN_t object, where N matches one of the exact-width integer types (7.20.1.1). If an implementation provides the corresponding uintN_t typedef, it shall define the corresponding exact-width memory reversal function for that value of N.
Returns
The stdc_memreverse8uN functions return the 8-bit memory reversed uintN_t value, as if by invoking stdc_memreverse8(sizeof(value), (unsigned char*)&value).
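For the common case where CHAR_BIT is exactly 8, the reversal reduces to swapping bytes from both ends of the buffer. The sketch below illustrates that special case and the "as if by invoking" relationship for a 32-bit value. It is not the general algorithm from the wording, and the `sketch_` names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if CHAR_BIT != 8
#error "this sketch only covers CHAR_BIT == 8"
#endif

/* Reverses n bytes in place by swapping from both ends toward the middle. */
static void sketch_memreverse8(size_t n, unsigned char *ptr) {
    for (size_t i = 0; i < n / 2; ++i) {
        const unsigned char tmp = ptr[i];
        ptr[i] = ptr[n - 1 - i];
        ptr[n - 1 - i] = tmp;
    }
}

/* Reverses the bytes of a 32-bit value through its object representation,
   copying in and out with memcpy to avoid aliasing concerns. */
static uint32_t sketch_memreverse8u32(uint32_t value) {
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);
    sketch_memreverse8(sizeof value, bytes);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

The value-level result does not depend on the host byte order: reversing all four bytes of the representation reverses the four bytes of the value on both little-endian and big-endian hosts.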
5.3.3. Add a new §7.✨.5 sub-sub-clause for "Endian-Aware" functions in §7.✨
7.✨.5 Endian-Aware 8-bit Load
Synopsis
#include <stdbit.h>
#if ((N % 8) == 0) && ((CHAR_BIT % 8) == 0)
uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
#endif
Description
The 8-bit load family of functions read an int_leastN_t or uint_leastN_t object from the provided ptr in an endian-aware (7.✨.2) manner, where N matches an existing minimum-width integer type (7.20.1.2). If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be a multiple of 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N for a signed or unsigned variant of the function, respectively. If the function name contains the sN suffix in the name, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Returns
Let the computed value $result$ be:
$$\sum_{index = 0}^{(N \div{} CHAR\_{}BIT) - 1} b_{index} \times{} 2^{8 \times{} index}$$
where $b_{index}$ is:
— (ptr[index / (CHAR_BIT / 8)] >> ((index % (CHAR_BIT / 8)) * 8)) & 0xFF, if the function is the little-endian variant;
— otherwise, (ptr[(((N / CHAR_BIT) - 1) - index) / (CHAR_BIT / 8)] >> (((((N / CHAR_BIT) - 1) - index) % (CHAR_BIT / 8)) * 8)) & 0xFF, if the function is the big-endian variant.
If the function is an unsigned variant, return $result$. Otherwise, if the function is a signed variant, return:
$result$, if $result$ is less than $2^{N-1}$;
otherwise, $result - 2^{N}$.
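For N = 32 and CHAR_BIT == 8, the sum above can be sketched as follows. The `sketch_` names are hypothetical, only the unaligned little-endian and big-endian loads plus one signed variant are shown, and the signed conversion follows the rule above ($result$ if it is below $2^{31}$, otherwise $result - 2^{32}$).

```c
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this sketch only covers CHAR_BIT == 8"
#endif

/* Little-endian: byte index contributes at bit position 8 * index. */
static uint_least32_t sketch_load8_leu32(const unsigned char ptr[static 4]) {
    uint_least32_t result = 0;
    for (int index = 0; index < 4; ++index) {
        result |= (uint_least32_t)(ptr[index] & 0xFFu) << (8 * index);
    }
    return result;
}

/* Big-endian: the same sum, reading the bytes in the opposite order. */
static uint_least32_t sketch_load8_beu32(const unsigned char ptr[static 4]) {
    uint_least32_t result = 0;
    for (int index = 0; index < 4; ++index) {
        result |= (uint_least32_t)(ptr[3 - index] & 0xFFu) << (8 * index);
    }
    return result;
}

/* Signed little-endian variant: maps results of 2^31 and above to
   result - 2^32 using a wider signed type, avoiding any
   implementation-defined narrowing of out-of-range values. */
static int_least32_t sketch_load8_les32(const unsigned char ptr[static 4]) {
    const uint_least32_t result = sketch_load8_leu32(ptr);
    if (result < UINT32_C(0x80000000)) {
        return (int_least32_t)result;
    }
    return (int_least32_t)((int_least64_t)result - (INT64_C(1) << 32));
}
```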
7.✨.6 Endian-Aware 8-bit Store
Synopsis
#include <stdbit.h>
#if ((N % CHAR_BIT) == 0) && ((CHAR_BIT % 8) == 0)
void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
#endif
Description
The 8-bit store family of functions write an int_leastN_t or uint_leastN_t object into the provided ptr in an endian-aware (7.✨.2) manner, where N matches an existing minimum-width integer type (7.20.1.2). If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be a multiple of 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N. If the function name contains the sN suffix in the name, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Let value_unsigned be value if the function is an unsigned variant. Otherwise, let value_unsigned be the conversion of value to its corresponding unsigned type, if the function is a signed variant.
Let index be an integer in a sequence that
— starts from 0 and increments by 8 in the range of [0, N), if the function is a little-endian variant;
— starts from N - 8 and decrements by 8 in the range of [0, N), if the function is a big-endian variant.
Let ptr_bit_index be an integer that starts from 0. Let byte_index8 be index % CHAR_BIT. For each index in the order of the above-specified sequence:
Let byte_mask8 be an object of value (0xFF << byte_index8) of a suitably large unsigned type.
Sets the 8 bits in ptr[ptr_bit_index / CHAR_BIT] at offset byte_index8 to (value_unsigned >> index) & byte_mask8.
Increments ptr_bit_index by 8.
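For N = 32 and CHAR_BIT == 8, every byte_index8 is 0 and the steps above reduce to writing one octet per array element. The sketch below illustrates that special case; the `sketch_` names are hypothetical, and only the unaligned variants plus one signed variant are shown.

```c
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this sketch only covers CHAR_BIT == 8"
#endif

/* Little-endian: index runs 0, 8, 16, 24 while ptr_bit_index advances by 8. */
static void sketch_store8_leu32(uint_least32_t value, unsigned char ptr[static 4]) {
    int ptr_bit_index = 0;
    for (int index = 0; index < 32; index += 8) {
        ptr[ptr_bit_index / 8] = (unsigned char)((value >> index) & 0xFFu);
        ptr_bit_index += 8;
    }
}

/* Big-endian: index runs 24, 16, 8, 0 while ptr_bit_index advances by 8. */
static void sketch_store8_beu32(uint_least32_t value, unsigned char ptr[static 4]) {
    int ptr_bit_index = 0;
    for (int index = 32 - 8; index >= 0; index -= 8) {
        ptr[ptr_bit_index / 8] = (unsigned char)((value >> index) & 0xFFu);
        ptr_bit_index += 8;
    }
}

/* Signed variant: converting to the unsigned type is well-defined
   (modulo arithmetic), so the unsigned store can be reused. */
static void sketch_store8_les32(int_least32_t value, unsigned char ptr[static 4]) {
    sketch_store8_leu32((uint_least32_t)value, ptr);
}
```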
5.3.4. Add a new §7.✨.7 sub-sub-clause for Low-Level Bit Utilities in §7.✨
7.✨.7 Count Leading Zeros
Synopsis
int stdc_leading_zerosuc(unsigned char value);
int stdc_leading_zerosus(unsigned short value);
int stdc_leading_zerosui(unsigned int value);
int stdc_leading_zerosul(unsigned long value);
int stdc_leading_zerosull(unsigned long long value);
generic_return_type stdc_leading_zeros(generic_value_type value);
Returns
Returns the number of consecutive 0 bits in value, starting from the most significant bit.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.8 Count Leading Ones
Synopsis
int stdc_leading_onesuc(unsigned char value);
int stdc_leading_onesus(unsigned short value);
int stdc_leading_onesui(unsigned int value);
int stdc_leading_onesul(unsigned long value);
int stdc_leading_onesull(unsigned long long value);
generic_return_type stdc_leading_ones(generic_value_type value);
Returns
Returns the number of consecutive 1 bits in
value, starting from the most significant bit.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.9 Count Trailing Zeros
Synopsis
int stdc_trailing_zerosuc(unsigned char value);
int stdc_trailing_zerosus(unsigned short value);
int stdc_trailing_zerosui(unsigned int value);
int stdc_trailing_zerosul(unsigned long value);
int stdc_trailing_zerosull(unsigned long long value);
generic_return_type stdc_trailing_zeros(generic_value_type value);
Returns
Returns the number of consecutive 0 bits in
value, starting from the least significant bit.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.10 Count Trailing Ones
Synopsis
int stdc_trailing_onesuc(unsigned char value);
int stdc_trailing_onesus(unsigned short value);
int stdc_trailing_onesui(unsigned int value);
int stdc_trailing_onesul(unsigned long value);
int stdc_trailing_onesull(unsigned long long value);
generic_return_type stdc_trailing_ones(generic_value_type value);
Returns
Returns the number of consecutive 1 bits in
value, starting from the least significant bit.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.11 First Leading Zero
Synopsis
int stdc_first_leading_zerouc(unsigned char value);
int stdc_first_leading_zerous(unsigned short value);
int stdc_first_leading_zeroui(unsigned int value);
int stdc_first_leading_zeroul(unsigned long value);
int stdc_first_leading_zeroull(unsigned long long value);
generic_return_type stdc_first_leading_zero(generic_value_type value);
Returns
Returns the most significant index of the first 0 bit in
value, plus 1. If it is not found, this function returns 0.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.12 First Leading One
Synopsis
int stdc_first_leading_oneuc(unsigned char value);
int stdc_first_leading_oneus(unsigned short value);
int stdc_first_leading_oneui(unsigned int value);
int stdc_first_leading_oneul(unsigned long value);
int stdc_first_leading_oneull(unsigned long long value);
generic_return_type stdc_first_leading_one(generic_value_type value);
Returns
Returns the most significant index of the first 1 bit in
value, plus 1. If it is not found, this function returns 0.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.13 First Trailing Zero
Synopsis
int stdc_first_trailing_zerouc(unsigned char value);
int stdc_first_trailing_zerous(unsigned short value);
int stdc_first_trailing_zeroui(unsigned int value);
int stdc_first_trailing_zeroul(unsigned long value);
int stdc_first_trailing_zeroull(unsigned long long value);
generic_return_type stdc_first_trailing_zero(generic_value_type value);
Returns
Returns the least significant index of the first 0 bit in
value, plus 1. If it is not found, this function returns 0.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.14 First Trailing One
Synopsis
int stdc_first_trailing_oneuc(unsigned char value);
int stdc_first_trailing_oneus(unsigned short value);
int stdc_first_trailing_oneui(unsigned int value);
int stdc_first_trailing_oneul(unsigned long value);
int stdc_first_trailing_oneull(unsigned long long value);
generic_return_type stdc_first_trailing_one(generic_value_type value);
Returns
Returns the least significant index of the first 1 bit in
value, plus 1. If it is not found, this function returns 0.
The type-generic function (marked by its generic_value_type argument) returns the appropriate value based on the type of the input value, so long as it is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large signed integer type capable of representing the computed result.
7.✨.15 Rotate Left
Synopsis
unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count);
Description
The stdc_rotate_left functions perform a bitwise rotate left. This operation is typically known as a left circular shift.
Returns
Let N be the width corresponding to the type of the input
value. Let r be count % N.
— If r is 0, returns value;
— otherwise, if r is positive, returns (value << r) | (value >> (N - r));
— otherwise, if r is negative, returns stdc_rotate_right(value, -r).
The type-generic function (marked by its generic_value_type argument) returns the above described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large unsigned integer type capable of representing the width of the computed result. The generic_count_type shall be a signed integer type.
7.✨.16 Rotate Right
Synopsis
unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);
generic_value_type stdc_rotate_right(generic_value_type value, generic_count_type count);
Description
The stdc_rotate_right functions perform a bitwise rotate right. This operation is typically known as a right circular shift.
Returns
Let N be the width corresponding to the type of the input
value. Let r be count % N.
— If r is 0, returns value;
— otherwise, if r is positive, returns (value >> r) | (value << (N - r));
— otherwise, if r is negative, returns stdc_rotate_left(value, -r).
The type-generic function (marked by its generic_value_type argument) returns the above described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type shall be a suitably large unsigned integer type capable of representing the width of the computed result. The generic_count_type shall be a signed integer type.
7.✨.17 Count Ones
Synopsis
int stdc_count_onesuc(unsigned char value);
int stdc_count_onesus(unsigned short value);
int stdc_count_onesui(unsigned int value);
int stdc_count_onesul(unsigned long value);
int stdc_count_onesull(unsigned long long value);
generic_return_type stdc_count_ones(generic_value_type value);
Returns
The
functions returns the total number of 1 bits within the givenstdc_count_ones .value The type-generic function (marked by its
argument) returns the previously described result for a given input value so long as thegeneric_value_type is angeneric_value_type 
— standard unsigned integer type, excluding
;bool 
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding
.bool The
type shall be a suitably large signed integer type capable of representing the computed result.generic_return_type 7.✨.18 Count ZerosSynopsisint stdc_count_zerosuc ( unsigned char value ); int stdc_count_zerosus ( unsigned short value ); int stdc_count_zerosui ( unsigned int value ); int stdc_count_zerosul ( unsigned long value ); int stdc_count_zerosull ( unsigned long long value ); generic_return_type stdc_count_zeros ( generic_value_type value ); ReturnsThe
functions returns the total number of 0 bits within the givenstdc_count_zeros .value The type-generic function (marked by its
argument) returns the previously described result for a given input value so long as thegeneric_value_type is angeneric_value_type 
— standard unsigned integer type, excluding
;bool 
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding
.bool The
type for the type-generic function need not be the same as the type ofgeneric_return_type . It shall be suitably large unsigned integer type capable of representing the computed result.value 
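As a non-normative illustration (not part of the proposed wording), the three cases in the Returns clauses of 7.✨.15 and 7.✨.16 can be sketched for unsigned int as below. The sketch_ names are hypothetical, and the width N is computed from sizeof, which assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sketch_ names; these are not the proposed stdc_ functions. */
unsigned int sketch_rotate_rightui(unsigned int value, int count);

/* Rotate left, following the three cases of the Returns clause. */
unsigned int sketch_rotate_leftui(unsigned int value, int count) {
    /* Assumption: no padding bits, so the width is sizeof * CHAR_BIT. */
    const int N = (int)(sizeof(unsigned int) * CHAR_BIT);
    /* % truncates toward zero, so r is 0 or carries the sign of count. */
    const int r = count % N;
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (value << r) | (value >> (N - r));
    }
    return sketch_rotate_rightui(value, -r);
}

/* Rotate right, the mirror image of the function above. */
unsigned int sketch_rotate_rightui(unsigned int value, int count) {
    const int N = (int)(sizeof(unsigned int) * CHAR_BIT);
    const int r = count % N;
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (value >> r) | (value << (N - r));
    }
    return sketch_rotate_leftui(value, -r);
}

/* One worked value per case. */
void sketch_rotate_examples(void) {
    /* Only the most significant bit set. */
    const unsigned int top = ~(UINT_MAX >> 1);
    assert(sketch_rotate_leftui(1u, 0) == 1u);   /* r is 0 */
    assert(sketch_rotate_leftui(top, 1) == 1u);  /* r positive: top bit wraps */
    assert(sketch_rotate_leftui(1u, -1) == top); /* r negative: rotate right by 1 */
}
```

Because count % N truncates toward zero in C, r always lies strictly between -N and N. Negating it in the third case therefore cannot overflow, and neither shift amount ever reaches N.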
5.3.5. Add a new §7.✨.19 sub-sub-clause for Fundamental Bit Utilities in §7.✨
7.✨.19 Single-bit Check
Synopsis
    bool stdc_has_single_bituc(unsigned char value);
    bool stdc_has_single_bitus(unsigned short value);
    bool stdc_has_single_bitui(unsigned int value);
    bool stdc_has_single_bitul(unsigned long value);
    bool stdc_has_single_bitull(unsigned long long value);
    bool stdc_has_single_bit(generic_value_type value);
Returns
The stdc_has_single_bit functions return true if and only if there is a single 1 bit in value.
The type-generic function (marked by its generic_value_type argument) returns the previously described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
7.✨.20 Bit Width
Synopsis
    int stdc_bit_widthuc(unsigned char value);
    int stdc_bit_widthus(unsigned short value);
    int stdc_bit_widthui(unsigned int value);
    int stdc_bit_widthul(unsigned long value);
    int stdc_bit_widthull(unsigned long long value);
    generic_return_type stdc_bit_width(generic_value_type value);
Description
The stdc_bit_width functions compute the smallest number of bits needed to store value.
Returns
The stdc_bit_width functions return 0 if value is 0. Otherwise, they return 1 + ⌊log₂(value)⌋.
The type-generic function (marked by its generic_value_type argument) returns the previously described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large signed integer type capable of representing the computed result.
7.✨.21 Bit Floor
Synopsis
    unsigned char stdc_bit_flooruc(unsigned char value);
    unsigned short stdc_bit_floorus(unsigned short value);
    unsigned int stdc_bit_floorui(unsigned int value);
    unsigned long stdc_bit_floorul(unsigned long value);
    unsigned long long stdc_bit_floorull(unsigned long long value);
    generic_value_type stdc_bit_floor(generic_value_type value);
Description
The stdc_bit_floor functions compute the largest integral power of 2 that is not greater than value.
Returns
The stdc_bit_floor functions return 0 if value is 0. Otherwise, they return the largest integral power of 2 that is not greater than value.
The type-generic function (marked by its generic_value_type argument) returns the previously described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
7.✨.22 Bit Ceiling
Synopsis
    unsigned char stdc_bit_ceiluc(unsigned char value);
    unsigned short stdc_bit_ceilus(unsigned short value);
    unsigned int stdc_bit_ceilui(unsigned int value);
    unsigned long stdc_bit_ceilul(unsigned long value);
    unsigned long long stdc_bit_ceilull(unsigned long long value);
    generic_value_type stdc_bit_ceil(generic_value_type value);
Description
The stdc_bit_ceil functions compute the smallest integral power of 2 that is not less than value. If the computation does not fit in the given return type, the behavior is undefined.
Returns
The stdc_bit_ceil functions return the smallest integral power of 2 that is not less than value.
The type-generic function (marked by its generic_value_type argument) returns the previously described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding bool.
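As a non-normative illustration (not part of the proposed wording), the Returns clauses of 7.✨.19 through 7.✨.22 can be sketched for unsigned int with plain loops and shifts rather than intrinsics. The sketch_ names are hypothetical. The assert in the ceiling function is only a stand-in for the case the wording leaves undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sketch_ names; these are not the proposed stdc_ functions. */

/* True if and only if exactly one bit is set: the value is nonzero and
   clearing its lowest set bit leaves nothing behind. */
bool sketch_has_single_bitui(unsigned int value) {
    return value != 0 && (value & (value - 1)) == 0;
}

/* 0 for 0, otherwise 1 + floor(log2(value)). */
int sketch_bit_widthui(unsigned int value) {
    int width = 0;
    while (value != 0) {
        ++width;
        value >>= 1;
    }
    return width;
}

/* 0 for 0, otherwise the largest power of 2 not greater than value. */
unsigned int sketch_bit_floorui(unsigned int value) {
    if (value == 0) {
        return 0;
    }
    return 1u << (sketch_bit_widthui(value) - 1);
}

/* The smallest power of 2 not less than value. */
unsigned int sketch_bit_ceilui(unsigned int value) {
    if (value <= 1) {
        return 1u; /* 0 and 1 both round up to 1, the smallest power of 2 */
    }
    /* Above the top bit the result does not fit in unsigned int; the
       wording makes that undefined, so this sketch simply asserts. */
    assert(value <= ~(UINT_MAX >> 1));
    return 1u << sketch_bit_widthui(value - 1);
}
```

For the value 19 (binary 10011) this gives a bit width of 5, a bit floor of 16, and a bit ceiling of 32. Subtracting 1 before taking the width in the ceiling function is what keeps exact powers of 2 unchanged.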
5.4. Add one new entry for Implementation-Defined Behavior in Annex J.3
— The value of __STDC_ENDIAN_NATIVE__ if the execution environment is not big-endian or little-endian (7.✨.2).
— The value of __STDC_ENDIAN_BIG__, and __STDC_ENDIAN_LITTLE__ if the execution environment is not big-endian or little-endian (7.✨.2).
5.5. Modify an existing entry for Unspecified behavior in Annex J.1
— The macro definition of a generic function is suppressed in order to access an actual function (7.17.1), (7.✨).
6. Appendix
A collection of miscellaneous and helpful bits of information and implementation.
6.1. Decisions to Committee Questions
Originally titled "Committee Polls / Questions", this section listed all of the different pieces of functionality the Committee wanted. Each of the five sets of functionality below was put to WG14 as a question, and nobody raised an objection strong enough to even ask for a poll on any of them. This is interpreted as unanimous consent amongst participants to include all of this functionality in the paper, even though no formal poll was taken for each of the five questions. If this changes, it is imperative to let the paper author know.
For the Committee, this proposal is, effectively, five parts:
- the endianness definitions;
- the stdc_memreverse8 functions (generic and width-specific);
- the stdc_load8_*/stdc_store8_* endianness functions;
- the suite of low-level bit functions: stdc_count_(leading/trailing)_(ones/zeros), stdc_count_(ones/zeros), stdc_rotate_(left/right), and stdc_first_(leading/trailing)_(zero/one), which map directly to instructions and/or intrinsics; and,
- the suite of useful bit functions: stdc_bit_ceil, stdc_bit_floor, stdc_bit_width, and stdc_has_single_bit, which may not map directly to instructions but are useful nonetheless in a wide variety of contexts.
These can be polled together or separately, depending on what the Committee desires.
6.2. Example Implementations in Publicly-Available Libraries
Optimized routines following the naming conventions present in this paper can be found in the Shepherd’s Oasis Industrial Development Kit (IDK) library, compilable with a conforming C11 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:
Optimized routines following the basic principles present in this paper and used as motivation to improve several C++ Standard Libraries can be found in the Itsy Bitsy Bit Libraries, compilable with a conforming C++17 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:
- 
     Bit Intrinsics (Declarations) (Sources) 
The endianness routines and the original motivation that spawned this proposal came from David Seifert’s Portable Endianness library and its deep dive into compiler optimizations and efficient code generation when alignment came into play:
- 
     Endian Load/Store (Declarations) (Sources) 
6.3. Implementation of Generic stdc_count_ones 
   Sample implementation on Godbolt (clang/gcc specific builtins):
    #define stdc_count_ones(...) \
        _Generic((__VA_ARGS__), \
            char: __builtin_popcount, \
            unsigned char: __builtin_popcount, \
            unsigned short: __builtin_popcount, \
            unsigned int: __builtin_popcount, \
            unsigned long: __builtin_popcountl, \
            unsigned long long: __builtin_popcountll \
        )(__VA_ARGS__)

    int main() {
        return stdc_count_ones((unsigned char)'0') + stdc_count_ones(13ull);
    }
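For compilers without the clang/gcc builtins, a portable fallback could look like the following sketch. All sketch_ names are hypothetical and not part of this proposal. It leaves out the char association so that only unsigned types are accepted, and sketch_count_zeros assumes the operand's type has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical portable fallback; the sketch_ names are not from the proposal.
   Each loop iteration clears the lowest set bit, so the loop runs once
   per 1 bit in value. */
int sketch_count_onesull(unsigned long long value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1;
        ++count;
    }
    return count;
}

/* Converting any unsigned type to unsigned long long adds no 1 bits,
   so a single helper can serve every association. */
#define sketch_count_ones(value) \
    _Generic((value), \
        unsigned char: sketch_count_onesull, \
        unsigned short: sketch_count_onesull, \
        unsigned int: sketch_count_onesull, \
        unsigned long: sketch_count_onesull, \
        unsigned long long: sketch_count_onesull \
    )(value)

/* The 0 bits are whatever remains of the operand's own width
   (assumption: no padding bits). */
#define sketch_count_zeros(value) \
    ((int)(sizeof(value) * CHAR_BIT) - sketch_count_ones(value))

/* Worked values: 0x30 is binary 00110000, and 13 is binary 1101. */
void sketch_count_examples(void) {
    assert(sketch_count_ones((unsigned char)0x30) == 2);
    assert(sketch_count_ones(13ull) == 3);
    assert(sketch_count_zeros((unsigned char)0x30) == CHAR_BIT - 2);
}
```

Passing a signed argument to sketch_count_ones fails to compile because no association matches, which mirrors the restriction to unsigned types in the wording.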
6.4. Implementation of Generic stdc_bit_ceil 
   Sample implementation on Godbolt (clang/gcc specific builtins):
    #include <limits.h>

    #define stdc_leading_zeros(...) \
        (_Generic((__VA_ARGS__), \
            char: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(char)) * CHAR_BIT), \
            unsigned char: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(unsigned char)) * CHAR_BIT), \
            unsigned short: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(unsigned short)) * CHAR_BIT), \
            unsigned int: __builtin_clz((__VA_ARGS__)), \
            unsigned long: __builtin_clzl((__VA_ARGS__)), \
            unsigned long long: __builtin_clzll((__VA_ARGS__)) \
        ))

    #define stdc_bit_width(...) \
        _Generic((__VA_ARGS__), \
            char: (CHAR_BIT - stdc_leading_zeros((__VA_ARGS__))), \
            unsigned char: (UCHAR_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
            unsigned short: (USHRT_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
            unsigned int: (UINT_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
            unsigned long: (ULONG_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
            unsigned long long: (ULLONG_WIDTH - stdc_leading_zeros((__VA_ARGS__))) \
        )

    // integer promotion rules mean we need to
    // precisely calculate the value here;
    // values 0 and 1 both round up to 1, the smallest power of 2
    #define __stdc_bit_ceil_promotion_protection(_Type, _Value) \
        _Generic((_Value), \
            char: (_Value <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
            unsigned char: (_Value <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
            unsigned short: (_Value <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - USHRT_WIDTH)) >> (UINT_WIDTH - USHRT_WIDTH)), \
            default: (_Type)0 \
        )

    // the unsigned int and wider branches assume a value greater than 1:
    // __builtin_clz(0) is undefined, and so is a shift by the full width
    #define stdc_bit_ceil(...) \
        _Generic((__VA_ARGS__), \
            char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
            unsigned char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
            unsigned short: __stdc_bit_ceil_promotion_protection(unsigned short, (__VA_ARGS__)), \
            unsigned int: (unsigned int)(1u << stdc_bit_width((unsigned int)((__VA_ARGS__) - 1))), \
            unsigned long: (unsigned long)(1ul << stdc_bit_width((unsigned long)((__VA_ARGS__) - 1))), \
            unsigned long long: (unsigned long long)(1ull << stdc_bit_width((unsigned long long)((__VA_ARGS__) - 1))) \
        )

    int main() {
        int x = stdc_bit_ceil((unsigned char)'\x13');
        int y = stdc_bit_ceil(33u);
        return x + y;
    }
6.5. Endian Enumeration
The endian enumeration was struck from this paper. It had very marginal benefit and was mostly redundant for Standard C code, since the macros would suffice well enough. Nevertheless, the old rationale is presented below.
6.5.1. Rationale
A 
The other portion of this is that providing an enumeration helps users pass this information along to functions. Users defining functions that take an endianness, without the enumeration, would define it as so:
    void my_conversion_unsafe(int endian, size_t data_size, unsigned char data[static data_size]);
The name may specify that it is for an endian, but the range of values is not really known without looking at the documentation. It is also impossible for the compiler to diagnose problematic uses: calling 
    void my_conversion_safe(stdc_endian endian, size_t data_size, unsigned char data[static data_size]);
This function call can get diagnosed in (some) implementations:
    #include <stddef.h>

    typedef enum stdc_endian {
        stdc_endian_little = __ORDER_LITTLE_ENDIAN__,
        stdc_endian_big = __ORDER_BIG_ENDIAN__,
        stdc_endian_native = __BYTE_ORDER__,
    } stdc_endian;

    void my_conversion_unsafe(int endian, size_t n, unsigned char ptr[static n]) {}
    void my_conversion_safe(stdc_endian endian, size_t n, unsigned char ptr[static n]) {}

    int main() {
        unsigned char arr[4];
        my_conversion_unsafe(48558395, sizeof(arr), arr);
        my_conversion_safe(48558395, sizeof(arr), arr);
        //                 ^
        // <source>:15:24: error: integer constant not in range
        // of enumerated type 'stdc_endian' (aka 'enum stdc_endian') [-Werror,-Wassign-enum]
        my_conversion_unsafe((stdc_endian)48558395, sizeof(arr), arr);
        my_conversion_safe((stdc_endian)48558395, sizeof(arr), arr);
        return 0;
    }
(Many current implementations do not diagnose it in the current landscape because such implicit conversions are, unfortunately, incredibly common, sometimes for good reason.)
7. Acknowledgements
Many thanks to David Seifert, Aaron Bachmann, Jens Gustedt, Tony Finch, Erin AO Shepherd, and many others who helped fight to get the semantics and wording into the right form, providing motivation, giving example code, pointing out existing libraries, and helping to justify this proposal.