1. Changelog
1.1. Revision 1 - January 1st, 2022

Drastically rework design section and motivations after several rounds of feedback from at least 4 vendors, 6 business partners, 3 Open Source maintainers, and more.

Add additional bit utilities and design them from existing practice in C, C++, Go, Rust, Zig, and implementation-specific constraints in Visual C++, Clang, GCC, SDCC, TCC and more.

stdc_first_(leading/trailing)_(one/zero)
stdc_count_(leading/trailing)_(ones/zeros)
Return types for bit functions counting bits are int, and for type-generic functions computing an input-related value the return type is a "type suitably large enough to hold the result".
Argument types should be int by default or the target input type, unless otherwise specified.
Provide backing implementation for all functionality in this paper at an official repository.

Provide benchmarks showing performance comparisons using the intrinsics vs. not in § 2.1 Bits: How Much Faster?.

Use zeros consistently in the function name spelling instead of zeroes.
1.2. Revision 0 - October 15th, 2021

Initial release. ✨
2. Introduction & Motivation
A lot of proposals and work go into figuring out the "byte order" of integer values that occupy more than 1 octet (8 bits). This is nominally important when dealing with data that comes over network interfaces and is read from files, where the data can be laid out in various orders of octets for 2-, 3-, 4-, 6-, or 8-tuples of octets. The most well-known endian structures on existing architectures are "Big Endian" and "Little Endian". In Big Endian, the least significant byte comes "last", and this order is featured prominently in network protocols and file protocols. In Little Endian, the least significant byte comes "first", and this is typically the orientation of data for the processor and user architectures most prevalent today.
In more legacy architectures (Honeywell, PDP), there also exist other orientations called "mixed" or "middle" endian. The uses of such endianness are of dubious benefit and are vanishingly rare amongst commodity and readily available hardware today, but they nevertheless still represent an applicable ordering of octets.
In other related programming interfaces, the C functions/macros ntoh ("network to host") and hton ("host to network") were used to change the byte order of a value ([ntohl]). They were usually suffixed with s or l, as in ntohs or ntohl, to specify which native data type they operated on. This became such a common operation that many compilers, among them Clang and GCC, optimized the code down to byte swap intrinsics (for example, _byteswap_ulong for MSVC and __builtin_bswap32 for Clang and GCC). These intrinsics often compiled into binary code representing cheap, simple, and fast byte swapping instructions available on many CPUs for 16, 32, 64, and sometimes 128 bit numbers. The byte swap intrinsics were used as the fundamental underpinning for the ntoh and hton functions. In those functions, a check for the translation-time endianness of the program determined whether the byte order would be flipped or not.
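The check-then-swap pattern described above can be sketched as follows. This is a minimal illustration, not the proposal's API; all example_ names are hypothetical. A shift-and-mask byte swap is a pattern GCC and Clang commonly recognize and lower to a single byte swap instruction. The endianness test is done at run time with memcpy here, standing in for the translation-time check. The sketch assumes CHAR_BIT == 8 and a host that is either big or little endian.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shift-and-mask byte swap; GCC and Clang commonly recognize this
   pattern and emit a single byte swap instruction for it. */
static uint32_t example_bswap32(uint32_t v) {
	return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8)
	     | ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Hypothetical stand-in for the translation-time endianness check:
   inspect the first byte of the object representation of 1. */
static int example_host_is_little_endian(void) {
	uint32_t one = 1;
	unsigned char first;
	memcpy(&first, &one, 1);
	return first == 1;
}

/* htonl-like conversion: swap only when the host is little endian,
   so the result is always laid out in big-endian (network) order. */
static uint32_t example_host_to_network32(uint32_t v) {
	return example_host_is_little_endian() ? example_bswap32(v) : v;
}
```

Skipping the endianness check and always calling example_bswap32 is exactly the portability bug discussed later in § 3.4.1.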
This proposal puts forth the fundamentals that make a homegrown implementation of ntoh, hton, and other endianness-based functions possible in Standard C code. It also addresses many of the compiler-based intrinsics found to generate efficient machine code, with a few simpler utilities layered on top of them.
2.1. Bits: How Much Faster?
Just how much faster can using intrinsics and bit operations as proposed in this paper be? Below is a quantification of the performance differences compared with naïve algorithms that work over one "bit" at a time, obtained by implementing a few algorithms each way. The explanations of these graphs can be found in the documentation of one of the publicly available implementations of this code: https://ztdidk.readthedocs.io/en/latest/benchmarks/bit.html.
Even without following that link, note at the very least that the functionality described in this proposal provides the means to implement the improvements shown in the ztdc_packed group of benchmark bars.
3. Design
This is a library addition. It is meant to expose both macros and functions, including ones suitable for translation-time checks. It provides a way to check endianness within the preprocessor, and gives definitive names that allow for knowing whether the endianness is big, little, or neither. We state big, little, or neither because there is no settled-upon name for the legacy endianness of "middle" or "mixed". There is also no agreed-upon ordering for such a "middle" or "mixed" endianness between architectures. This is not the case for big endian or little endian, where one is simply the reverse of the other, always, in every case, across architectures, file protocols, and network specifications.
The next part of the design consists of functions for working with groupings of 8 bits. They are meant to communicate with network or file protocols and formats that have become ubiquitous in computing over the last 30 years.
This design also provides a small but essential suite of bit utilities, all within the <stdbit.h> header.
3.1. Preliminary: Why the stdc_ prefix?
We use the stdc_ prefix for these functions so that we do not have to struggle with taking common words away from the end user. Because we now have 31 bytes of linker name significance, we can afford to have a prefix rather than spend all of our time carving out reserved words or header-specific extensions. This lets us have good names that very clearly map to industry practice. It also avoids replacing industry code, and avoids being forced into compatibility with existing code that has already taken the name with sometimes-conflicting argument conventions.
3.2. Charter: unsigned char const ptr[static sizeof(uintN_t)] and More?
There are 2 choices for how to represent sized pointer arguments in this proposal. The first is an unsigned char ptr[static n] convention for function arguments. The second is a void*/size_t convention.
To start, we still put any size and pointer argument pairs in the proper "size first, pointer second" configuration. This way, implementation extensions which permit size-annotated parameters can exist no matter what choice is made here. That part does not change.
The void* argument convention means that pointers to structures, or similar, can be passed to these functions without needing a cast. This represents the totality of the ease-of-use argument. The unsigned char ptr[static n] argument convention can produce better compile-time safety. It can also articulate requirements purely through the function declaration, without needing to look up prose from the C Standard or implementation documentation. The cost is that any use of the function will require a cast in strictly conforming code.
One of the tipping arguments in favor of our choice of unsigned char ptr[static n] is that void* can be dangerous. This is especially true since we still do not have a dedicated null pointer constant with its own type in the language (like C++'s nullptr), and 0 can be used for both the size and the pointer argument. Very sadly, this is an actual bug that happens in existing code. It happens especially when users mix up calls with similar signatures and pass the wrong 0 argument because they wrote one and meant the other, copying values over a large region starting at the null pointer in their low-level driver code.
Using an unsigned char* (or its statically-sized array function argument form) means that usage of the functions below requires explicit casting on the part of the user. This is, in fact, the way it is presented in [portableendianness]. As far as existing practice is concerned, users of the code would rather cast and preserve safety than easily pass something like a void* to the guts of their structure.
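The cost side of this trade-off can be illustrated with a small sketch. The helper example_sum_bytes is hypothetical and written in the unsigned char ptr[static n] style. Storage that is already unsigned char can be passed directly, while any other object must be explicitly cast at the call site. The expected values in the usage assume CHAR_BIT == 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper in the "size first, pointer second" style; the
   [static n] declarator documents that ptr must point to at least n
   unsigned char objects (so n must be greater than zero). */
static unsigned example_sum_bytes(size_t n, const unsigned char ptr[static n]) {
	unsigned sum = 0;
	assert(n > 0);
	for (size_t i = 0; i < n; ++i)
		sum += ptr[i];
	return sum;
}
```

A call such as example_sum_bytes(sizeof word, (const unsigned char *)&word) on a uint16_t object makes the reinterpretation visible where it happens, which is the explicit cast described above.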
3.3. The __STDC_ENDIAN_* Macros
The macros are specified as follows:
```c
#include <stdbit.h>

#define __STDC_ENDIAN_LITTLE__ /* some unique value */
#define __STDC_ENDIAN_BIG__    /* some other unique value */
#define __STDC_ENDIAN_NATIVE__ /* see below! */
```
The goal of these macros is that if the system identifies as a "little endian" system, then __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__, and that is how an end user knows that the implementation is little endian. Similarly, a user can check __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__ to know that the implementation is big endian. Finally, if the system is neither big nor little endian, then __STDC_ENDIAN_NATIVE__ is a unique value that does not compare equal to either value:
```c
#include <stdbit.h>
#include <stdio.h>

int main(void) {
	if (__STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__) {
		printf("little endian! uwu\n");
	}
	else if (__STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__) {
		printf("big endian OwO!\n");
	}
	else {
		printf("what is this?!\n");
	}
	return 0;
}
```
If a user has a Honeywell architecture or a PDP architecture, it is up to them to figure out which flavor of "middle endian"/"mixed endian"/"bi endian" they are utilizing. We do not give these a name in the set of macros. Neither the Honeywell nor the PDP communities ever figured out which flavor of 32-bit byte order (for example, 2143 or 3412) was strongly assigned to which name ("mixed" endian? "mixed-big" endian? "bi-little" endian?). Since this is not a settled matter in existing practice, we do not provide a name for it in the C Standard. It is also of dubious determination what the byte order for a 3-byte, 5-byte, 6-byte, or 7-byte integer is in these mixed-endian types, whereas both big and little endian have dependable orderings.
3.3.1. A (Brief) Discussion of Endianness
There is a LOT of design space and deployed existing practice in the endianness space of both architectures and their instruction sets. A non-exhaustive list of behaviors is as follows:

Instruction set, OS, and register conventions are in sync (Windows, Apple, and most *Nix distributions).

Instruction set has variability that can be toggled (ARM with the SETEND instruction).

Instruction set has no variability, but data can be stored in unconventional endianness (RISC-V, mainframe architectures, and similar).

Instruction set has no variability, but it changes endianness between types/sizes. FORTRAN-implemented floating point units used big endian, and PDP-11 compatibility with those machines required 32-bit big-endian instructions on a little-endian machine (hilarity and shenanigans ensued).

Instruction set has no variability, but historical weight forces certain choices. The PDP-11 had 16-bit little-endian integers, and some folk interpreted two of them next to each other as a single 32-bit integer, resulting in the 2143 byte order.
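The "2143" ordering can be made concrete with a small sketch that uses a hypothetical helper name. It stores a 32-bit value as two 16-bit little-endian words with the most significant word first, which is how the PDP-11 convention is usually described. Compared with the big-endian byte sequence 0A 0B 0C 0D, the output bytes come from positions 2, 1, 4, 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value PDP-11 style, as two 16-bit
   little-endian words with the most significant word first. */
static void example_store_pdp32(uint32_t v, unsigned char out[static 4]) {
	uint16_t hi = (uint16_t)(v >> 16);
	uint16_t lo = (uint16_t)(v & 0xFFFFu);
	out[0] = (unsigned char)(hi & 0xFFu); /* low byte of the high word */
	out[1] = (unsigned char)(hi >> 8);    /* high byte of the high word */
	out[2] = (unsigned char)(lo & 0xFFu); /* low byte of the low word */
	out[3] = (unsigned char)(lo >> 8);    /* high byte of the low word */
}
```

For 0x0A0B0C0D this produces the bytes 0B 0A 0D 0C.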
Suffice it to say, there exists a lot of deployed practice. Note that this list effectively has these concerns in priority order. The first is the most conventional software; as the list goes down, each occurrence becomes rarer and less interesting. Therefore, we try not to spend too much time focusing on what are effectively the edge cases of software and hardware. Some of the past choices in endianness and similar were simply due to "going with the flow" (PDP’s "2143" order). Others came from severe historical baggage: early FORTRAN dealt in big endian floating point numbers, and those algorithms and serialization methods were given to PDP machines without thinking about the ordering. With much of the industry moving away from such modes in both newer mainframes and newer architectures, it does not seem prudent to try to standardize the multitude of their behaviors.
This proposal constrains its definition of endianness to integer types without padding. It does so strictly because trying to capture the vast breadth of existing architectures and their practices can quickly devolve down a slope that deeply convolutes this proposal’s core mission: endian and bit utilities.
3.3.2. Hey! Some Architectures Can Change Their Endianness at Runtime!
This is beyond the scope of this proposal, which is meant to capture the translation-time endianness. There also does not appear to be any operating system written today that can tolerate an endianness change of the whole program happening arbitrarily at runtime, after a program has launched. This means that the property is effectively a translation-time property, and therefore can be exposed as a compile-time constant. A future proposal to determine the runtime byte order is more than welcome from someone who has suitable experience dealing with such architectures and programs. This proposal does not preclude providing such a runtime function.
Certain instruction sets have ways to set the endianness of registers, to change how data is accessed ([armsetend]). This functionality is covered by byte swapping, and byte swaps can be implemented using the SETEND instruction plus an access. However, the compiler would have to remember to restore the endian state to its original value, or risk contaminating the entire program and breaking things.
3.3.3. Floating Point has a Byte Order, Too.
For the design of this paper, we strictly consider the design space for (unsigned) integers only. Floating point numbers already have an implementation-defined byte order, and none of these functions are meant to interact with the floating point types. The stdc_memreverse8 function can work on any memory region, which includes any structure, scalar, or similar type with or without padding bits. Even so, the function just swaps bytes. Nothing needs to be said about padding bits in this case, since the operation is well-defined in all cases.
It should be noted that in C++, since C++20, the std::endian enumeration applies to all scalar types:
This subclause describes the endianness of the scalar types of the execution environment.
— C++ Standard Working Draft, bit.endian/p1
It does not specify what this means for padding bits or similar; nor, I think, does it have to. Byte order means very little for padding bits until serialization comes into play. C++ does not define any functions which do byte-order-aware serialization. So it does not have to write any specification governing what may or may not happen, and the rest is left undefined/unspecified.
For this proposal, we focus purely on integer types and, more specifically, on integer types which do not have padding or where we can work with a padding-bits-agnostic representation. We acknowledge that floating point types and pointers have byte orders too. However, we do not want to interact directly with these types when it comes to endianness load and store functions. Byte swaps, (bit) population counts, and other bit operations can still be performed on floating point types. To do so, the value is first copied or type-punned (with implementation checking/blessing) into an equivalent (unsigned) integer object, which then does the necessary work.
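The copy-then-operate approach just described can be sketched as follows. It assumes float is a 32-bit IEEE 754 binary32 type, which is common but not required by C. The bits are copied into a uint32_t with memcpy (the portable way to type-pun), and then counted with a simple bit-clearing loop. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32 on most platforms). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "this sketch assumes a 32-bit float");

/* Copy the object representation of a float into an integer object. */
static uint32_t example_float_bits(float f) {
	uint32_t bits;
	memcpy(&bits, &f, sizeof bits);
	return bits;
}

/* Count set bits by repeatedly clearing the lowest one. */
static int example_count_ones32(uint32_t v) {
	int n = 0;
	while (v) {
		v &= v - 1;
		++n;
	}
	return n;
}
```

Under the binary32 assumption, 1.0f has the bit pattern 0x3F800000, which has 7 bits set.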
3.4. Generic 8-bit Memory Reverse and Exact-width 8-bit Memory Reverse
In order to accommodate a wide variety of architectures while also supporting intrinsics optimized for exact-width integers, this proposal takes 2 forms of byte swap from the industry:

one generic mem-prefixed version (stdc_memreverse8) which takes a pointer and the number of bytes on which to perform a reverse operation; and,

a sequence of exact-width byte swapping functions which (typically) map directly to intrinsics available in compilers and instructions in hardware.
These end up inhabiting the <stdbit.h> header and have the following interface:
```c
#include <stdbit.h>
#include <limits.h>
#include <stdint.h>

#if (CHAR_BIT % 8 == 0)
void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
uintN_t stdc_memreverse8uN(uintN_t value);
#endif
```
where uintN_t is one of the exact-width integer types such as uint8_t, uint16_t, uint32_t, uint64_t, and others. On most architectures, this matches the builtins (MSVC, Clang, GCC). It also matches the result of compiler optimizations that produce instructions for many existing architectures, as shown in the README of this portable endianness function implementation. We use the exact-width types for the uN-suffixed functions because we expect that C compilers would want to lower the stdc_memreverse8uN call to existing practice of byte swap instructions and compiler intrinsics. Using uint_leastN_t instead would reduce the ability to match these existing optimizations in the case where the exact-width types are not defined.
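Both forms can be sketched in portable C under the assumption CHAR_BIT == 8. The names are hypothetical stand-ins, not the proposed functions. The generic form reverses n bytes in place. The 32-bit exact-width form uses the shift-and-mask idiom that compilers typically lower to a byte swap instruction.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "this sketch assumes CHAR_BIT == 8"
#endif

/* Generic form (hypothetical stand-in for stdc_memreverse8):
   reverse the order of n bytes in place. */
static void example_memreverse8(size_t n, unsigned char *ptr) {
	for (size_t i = 0; i < n / 2; ++i) {
		unsigned char tmp = ptr[i];
		ptr[i] = ptr[n - 1 - i];
		ptr[n - 1 - i] = tmp;
	}
}

/* Exact-width form (hypothetical stand-in for stdc_memreverse8u32):
   move each 8-bit group to its mirrored position. */
static uint32_t example_memreverse8u32(uint32_t v) {
	return (v >> 24) | ((v >> 8) & 0x0000FF00u)
	     | ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Reversing the bytes of a uint32_t object in memory gives the same value as the exact-width form on both big- and little-endian hosts, since either way the byte sequence is mirrored.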
One property of note is that stdc_memreverse8 swaps 8 bits at a time rather than CHAR_BIT bits at a time, which is why it has the suffix "8" in the name. This matches existing practice: all known byte swap operations work on 8 bits. This caveat is here because we need to retain cross-platform behavior. If we switched to using CHAR_BIT, then the behavior of a program that uses no implementation-defined properties would suddenly become dependent on implementation/architecture properties:
```c
// NOT guaranteed, if it works on CHAR_BIT
// instead of working on 8 bits at a time.
assert(stdc_memreverse8u32(0xAABBCCDD) == 0xDDCCBBAA);
```
One of the problems with this approach is that it opens us up to potentially having padding bits if CHAR_BIT is not a multiple of 8. There are a number of approaches to this, but the ultimate reality is that it is simply not portable using any other definition. The goal is standard functions, and the purpose of these types is to create a way to talk to other processors (or different kinds of cores all along the same bus), files in specific formats, or networks. Given that, we have to stick to using an 8-bit byte and not let unspecified amounts of padding filter into the representation. This also allows the code, when present, to map reasonably to available intrinsics: note that even the GCC builtins work explicitly on 8-bit bytes, no matter the platform. We are simply following existing practice here.
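Working on 8 bits at a time can be expressed purely with value arithmetic, so CHAR_BIT never enters the computation. The following minimal sketch uses a hypothetical helper. It reverses the low width / 8 octets of a value and also covers non-power-of-two widths such as 24 bits. It assumes width is a multiple of 8 between 8 and 64.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low (width / 8) octets of v. Only shifts and masks on the
   value are used, so the result does not depend on CHAR_BIT or on the
   host byte order. */
static uint64_t example_reverse_octets(uint64_t v, unsigned width) {
	uint64_t result = 0;
	assert(width % 8 == 0 && width >= 8 && width <= 64);
	for (unsigned i = 0; i < width / 8; ++i) {
		result = (result << 8) | (v & 0xFFu); /* append the lowest octet */
		v >>= 8;
	}
	return result;
}
```

For example, example_reverse_octets(0xAABBCC, 24) yields 0xCCBBAA.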
3.4.1. But Memory Reverse Is Dangerous?
Byte swapping, by itself, is absolutely dangerous in terms of code portability. Users often program strictly for their own architecture when doing serialization, and do not take into consideration that their endianness can change. Byte swapping functions can compile down to intrinsics, but those intrinsics get employed to change "little endian" to "big endian" without performing the necessary "am I already in the right endianness?" check. Values that are already in the proper byte order for their target serialization get swapped anyway. The result is an incorrect byte order for the target network protocol, file format, or other binary serialization target.
The inclusion of the <stdbit.h> header reduces this problem by giving access to the __STDC_ENDIAN_NATIVE__ macro definition, but does not fully eliminate it. This is why many Linux distributions and BSDs include functions which directly transcribe from one endianness to another. It is also why the Byte Order Fallacy has spread so far in Systems Programming communities. Many create their own versions of this, both in official widespread vendor code ([linuxendian]) and in more personal code used for specific distributions ([portableendianness]). Thus, this proposal includes some endianness functions, specified just below.
3.5. stdc_load8_*/stdc_store8_* Endian-Aware Functions
Functions meant to transport bytes to a specific endianness need 3 pieces of information:

the sign of the input/output;

the byte order of the input; and,

the desired byte order of the output.
The Linux/BSD/etc. APIs use the term "host", represented by h, for any operation that goes from/to the byte order in which native integers are kept. Every other operation is represented by explicitly naming its byte order, particularly as be or le for "big endian" or "little endian". There is severe confusion about what exact byte order a "mixed endian" multi-byte scalar is meant to be in. Again, because of this, there seems not to exist any widely available practice regarding what to call a PDP/Honeywell endian configuration. Therefore, mixed/bi/middle-endian is not included in this proposal. It can be added at a later date if the community ever settles on a well-defined naming convention that can be shared between codebases, standards, and industries.
The specification for the endianness functions borrows from many different sources listed above, and is as follows:
#include <stdbit.h>#include <limits.h>#include <stdint.h>#if ((N % CHAR_BIT) == 0 && (CHAR_BIT % 8 == 0)) void stdc_store8_leuN ( uint_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_beuN ( uint_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); uint_leastN_t stdc_load8_leuN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); uint_leastN_t stdc_load8_beuN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_aligned_leuN ( uint_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_aligned_beuN ( uint_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); uint_leastN_t stdc_load8_aligned_leuN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); uint_leastN_t stdc_load8_aligned_beuN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_lesN ( int_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_besN ( int_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); int_leastN_t stdc_load8_lesN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); int_leastN_t stdc_load8_besN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_aligned_lesN ( int_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); void stdc_store8_aligned_besN ( int_leastN_t value , unsigned char ptr [ static ( N / CHAR_BIT )]); int_leastN_t stdc_load8_aligned_lesN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); int_leastN_t stdc_load8_aligned_besN ( const unsigned char ptr [ static ( N / CHAR_BIT )]); #endif
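For N = 32 on a CHAR_BIT == 8 platform, the unsigned load and store functions can be sketched using only shifts and masks. With this approach the host byte order never matters, which is the point of the "transcribe directly" style. The example_ names are hypothetical stand-ins for stdc_store8_beu32, stdc_store8_leu32, stdc_load8_beu32, and stdc_load8_leu32. The sketch does not model the aligned variants, which differ only in the alignment guarantee they give the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Big endian: most significant octet first. */
static void example_store8_beu32(uint_least32_t value, unsigned char ptr[static 4]) {
	for (int i = 0; i < 4; ++i)
		ptr[i] = (unsigned char)((value >> (24 - 8 * i)) & 0xFFu);
}

/* Little endian: least significant octet first. */
static void example_store8_leu32(uint_least32_t value, unsigned char ptr[static 4]) {
	for (int i = 0; i < 4; ++i)
		ptr[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
}

static uint_least32_t example_load8_beu32(const unsigned char ptr[static 4]) {
	uint_least32_t value = 0;
	for (int i = 0; i < 4; ++i)
		value = (value << 8) | (ptr[i] & 0xFFu);
	return value;
}

static uint_least32_t example_load8_leu32(const unsigned char ptr[static 4]) {
	uint_least32_t value = 0;
	for (int i = 3; i >= 0; --i)
		value = (value << 8) | (ptr[i] & 0xFFu);
	return value;
}
```

Compilers commonly recognize loops like these and turn them into a plain load or store, plus a byte swap when the host order differs from the requested one.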
Thanks to some feedback from implementers and librarians, this first version also needs signed variants of the load and store functions, as well as both aligned and unaligned loads and stores. C23 will mandate a two’s complement representation for integers. Even so, because we are using the int_leastN_t types (which may be wider than the intended intN_t or N-bit specification), it is important for the sign bit to be properly serialized and transported. Therefore, during load/store operations, the sign bit will be directly serialized into the resulting signed value or byte array where necessary.
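The sign handling just described can be sketched for N = 32, big endian, and CHAR_BIT == 8, using hypothetical stand-ins for stdc_store8_bes32 and stdc_load8_bes32. Storing converts to unsigned, which is defined modulo 2^width and therefore yields the two's complement bit pattern, and keeps the low 32 bits. Loading maps the 32-bit pattern back to a signed value without relying on the implementation-defined unsigned-to-signed conversion. This way the sign is preserved even when int_least32_t is wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

static void example_store8_bes32(int_least32_t value, unsigned char ptr[static 4]) {
	/* Two's complement bit pattern of value, limited to 32 bits. */
	uint_least32_t u = (uint_least32_t)value & 0xFFFFFFFFu;
	for (int i = 0; i < 4; ++i)
		ptr[i] = (unsigned char)((u >> (24 - 8 * i)) & 0xFFu);
}

static int_least32_t example_load8_bes32(const unsigned char ptr[static 4]) {
	uint_least32_t u = 0;
	for (int i = 0; i < 4; ++i)
		u = (u << 8) | (ptr[i] & 0xFFu);
	if (u <= 0x7FFFFFFFu)
		return (int_least32_t)u;             /* sign bit clear */
	return -(int_least32_t)(0xFFFFFFFFu - u) - 1; /* sign bit set */
}
```

For instance, -2 is stored as the bytes FF FF FF FE and loads back as -2.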
This specification is marginally more complicated than the stdc_memreverse8uN functions because these functions operate on uint_leastN_t, where N is the minimum-width bit value. On most normal implementations, these functions will just fill in the exact number of 8, 16, 32, 64, etc. bits. But for Digital Signal Processors (DSPs), select embedded architectures, and many freestanding implementations, it is impossible to offer an exact-width guarantee. For example, some Digital Signal Processors have a CHAR_BIT of 16 or 32, and several of the least-width types (such as uint_least8_t and uint_least16_t) are aliased to the same fundamental type.
We are fine with not making these precisely uintN_t/intN_t because the upcoming C23 Standard includes a specific allowance on this point. If uintN_t/intN_t exist, then uint_leastN_t/int_leastN_t must match their exact-width counterparts exactly. This has been existing practice on almost all implementations for quite some time now.
Similarly to stdc_memreverse8, we want a dependable set of functionality that can work across platforms. Therefore, the functions only exist if N is evenly divisible by CHAR_BIT and CHAR_BIT is evenly divisible by 8. We still use the uint_leastN_t types because we want these functions to be generally available when the requirements are met. We can guarantee a proper value as long as a user is working within the N-bit range as anticipated. Unlike the stdc_memreverse8uN functions, which depend on exact-width types, these functions do not require a lack of padding bits to work with the memory correctly.
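Note that this means an implementation with, for example, CHAR_BIT == 16 can still implement a 32-bit function such as stdc_store8_leu32. It satisfies both 32 % 16 == 0 and 16 % 8 == 0, and it uses the uint_least32_t parameter type, which is guaranteed to be available in that implementation’s <stdint.h> header.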
3.6. Modern Bit Utilities
In addition, upon the first pre-review of the paper there was a strong groundswell of support for bit operations that have long been present both in hardware and as compiler intrinsics. This idea progressed naturally from the byte swap and endianness discussion. As indicated in [p0553] (merged into C++20 already), here is a basic rundown of some common architectures and their support for various bit functionality:
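| operation | Intel/AMD | ARM | PowerPC |
|---|---|---|---|
| rotl | ROL | | rldicl |
| rotr | ROR | ROR, EXTR | |
| popcount | POPCNT | | popcntb |
| countl_zero | BSR, LZCNT | CLZ | cntlzd |
| countl_one | | CLS | |
| countr_zero | BSF, TZCNT | | |
| countr_one | | | |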
Many of the bit functions below are defined to ease portability to these architectures. Where specific compiler idioms and automatic detection are not possible, the C implementation can provide similar assembly tricks or optimized implementations. Further bit functions were also merged into C++, resulting in the current state of the C++ <bit> header.
There is also a somewhat "infamous" page amongst computer scientists: Bit Twiddling Hacks. Its tricks may not all map directly to instructions. However, they provide a broad set of useful functionality commonly found not only in CPU-based programming libraries, but also in GPU-based programming libraries and other high performance computing resources.
We try to take the most useful subset of these functions: those that most closely represent functionality on both old and new CPU architectures, as well as common, necessary operations that have been used for the last 25 years in various industries. We have left out operations such as sign extension, parity computation, bit merging, clearing/setting bits, fast negation, bit swaps, lexicographic next bit permutation, and bit interleaving. The rest are more common and appear across a wide range of industries, from cryptography to graphics to simulation to efficient property lookup and kernel scheduling.
3.6.1. "Why not only generic interfaces or (u)intmax_t interfaces?"
For many of the bit-based utilities, this proposal introduces functions with several suffixes for the various types. Often, people ask why. Even the GCC builtins for things like __builtin_popcount only come in a handful of fixed-type forms (unsigned int, unsigned long, and unsigned long long). The answer is in the blank spaces in the table above. For architectures that do not have perfect instruction mappings for a given operation (e.g., ARM for population count), the number of bits one is utilizing for the given function is actually incredibly important. Counting 8 bits in a loop is different from counting 64 bits (or more, for extended integer types). The various forms are therefore provided to allow implementations to produce the most efficient code on their platforms when the user requests a specific size.
The generic interfaces can be used by individuals who want automatic selection of the best function for the argument's type. And, as shown in the § 6 Appendix, platforms can use any builtins or techniques at their disposal to select an appropriate builtin, instruction, or function call to fit the use case.
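The relationship between the suffixed functions and the generic interface can be sketched with _Generic. In this minimal sketch, all example_ names are hypothetical, and each per-type function uses a portable bit-clearing loop. A real implementation would instead dispatch to whatever builtin or instruction suits each width. Because the fundamental types are guaranteed to be distinct, every association below is valid on every implementation.

```c
#include <assert.h>

/* Per-type functions: clear the lowest set bit until the value is zero
   (Kernighan's method), counting how many bits were cleared. */
#define EXAMPLE_DEFINE_COUNT_ONES(suffix, type)         \
	static int example_count_ones##suffix(type value) { \
		int count = 0;                                  \
		while (value) {                                 \
			value &= (type)(value - 1);                 \
			++count;                                    \
		}                                               \
		return count;                                   \
	}

EXAMPLE_DEFINE_COUNT_ONES(uc, unsigned char)
EXAMPLE_DEFINE_COUNT_ONES(us, unsigned short)
EXAMPLE_DEFINE_COUNT_ONES(ui, unsigned int)
EXAMPLE_DEFINE_COUNT_ONES(ul, unsigned long)
EXAMPLE_DEFINE_COUNT_ONES(ull, unsigned long long)

/* Type-generic front end: the argument's type selects the function. */
#define example_count_ones(x) _Generic((x),   \
	unsigned char: example_count_onesuc,      \
	unsigned short: example_count_onesus,     \
	unsigned int: example_count_onesui,       \
	unsigned long: example_count_onesul,      \
	unsigned long long: example_count_onesull)(x)
```

For instance, example_count_ones((unsigned char)0xFF) selects the unsigned char function and yields 8.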
3.6.2. Type-Generic Macros and Counts for Types
All of the functions below have type-generic macros associated with them. This raises an interesting question. For some functions, such as stdc_count_zeros and the leading/trailing bit counting functions, the return value depends on the type of the argument going into the function. Is that bad for literal arguments? The answer is the same as it has always been when dealing with literal values in C: use the suffix for the appropriate type, cast, or put the value in a variable of the intended type so that it can be used with the expected semantics. We cannot sink macro-based generic code use cases on the off-chance that someone calls stdc_count_zeros(0) and thinks it returns a dependable answer. Integers (and their literals) are the least portable part of Standard C code: use the exact-width types if you are expecting exact-width semantics. Or, call the fundamental-type suffixed versions to get answers dependable for that given type (e.g., stdc_count_zerosul).
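The literal issue is easy to see in a sketch. The example_ names are hypothetical, and the sketch assumes the unsigned types have no padding bits, so that a type's width is CHAR_BIT * sizeof(type). The zero count of the value 0 is the width of whatever type carries it, so the suffix or cast on the literal decides the answer. In this sketch, a plain int literal such as 0 has no matching association and fails to compile. That is one way of forcing the caller to state the intended type.

```c
#include <assert.h>
#include <limits.h>

/* count_zeros = width - count_ones, with width taken as CHAR_BIT * sizeof(type)
   under the no-padding-bits assumption. */
#define EXAMPLE_DEFINE_COUNT_ZEROS(suffix, type)             \
	static int example_count_zeros##suffix(type value) {     \
		int ones = 0;                                        \
		while (value) {                                      \
			value &= (type)(value - 1);                      \
			++ones;                                          \
		}                                                    \
		return (int)(CHAR_BIT * sizeof(type)) - ones;        \
	}

EXAMPLE_DEFINE_COUNT_ZEROS(uc, unsigned char)
EXAMPLE_DEFINE_COUNT_ZEROS(us, unsigned short)
EXAMPLE_DEFINE_COUNT_ZEROS(ui, unsigned int)
EXAMPLE_DEFINE_COUNT_ZEROS(ul, unsigned long)
EXAMPLE_DEFINE_COUNT_ZEROS(ull, unsigned long long)

#define example_count_zeros(x) _Generic((x),  \
	unsigned char: example_count_zerosuc,     \
	unsigned short: example_count_zerosus,    \
	unsigned int: example_count_zerosui,      \
	unsigned long: example_count_zerosul,     \
	unsigned long long: example_count_zerosull)(x)
```

example_count_zeros((unsigned char)0) gives CHAR_BIT, while example_count_zeros(0ull) gives the width of unsigned long long. The same value yields different, but each dependable, answers.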
3.6.3. Argument Types
Many of the functions below are defined over the fundamental unsigned integer types, rather than their minimum-width or exact-width counterparts. This is done to provide maximum portability. Users can combine information from the recently-introduced *_WIDTH macros (such as UINT_WIDTH) to determine the widths of these types at translation time. They also get a disjoint and distinct set of fundamental types over which generic selection works.
The least-width types also have width macros, but those macros are not guaranteed to cover a wide range of actual bit sizes either. If the exact-width types do not exist, a conforming implementation can simply map several of the least-width types onto the same underlying type and call it a day. An implementation could also define each of the distinct fundamental types from unsigned char to unsigned long long to all be the same width. Even then, we are at the very least guaranteed that they are, in fact, distinct types. This makes selection over types in _Generic predictable and usable. By contrast, a _Generic selection with separate uint_least8_t and uint_least16_t associations is not guaranteed to compile, since those types are not required to form a mutually exclusive or disjoint set.
The exact-width types suffer from non-availability on specific platforms, which makes little sense for functions which do not depend on a no-padding-bits requirement. As long as the values read from the array only involve N bits (including the sign bit), and the rest are zero-initialized, we can have predictable semantics.
Extended integer types, least-width integer types, and exact-width integer types can all be used with the type-generic macros. This is because the type-generic macros are required to work over all standard (unsigned) integer types and extended (unsigned) integer types. They exclude bool and bit-precise (_BitInt(N)) integer types that do not match pre-existing type widths. This provides a complete set of functionality that is maximally portable while also allowing for precise semantic control with exact-width or least-width types.
This paper does not concern itself with the implications of passing a uint_leastN_t directly to bit-counting type-generic functions like stdc_count_zeros. A user must account for such use and be prepared to have types larger than N bits in width. This is, very literally, what users are signing up for when they use such types, and it is their responsibility to query the corresponding width macros (such as UINT_LEAST32_WIDTH). We expect users to consult the width macros of their exact-width integer types when using them with the type-generic macros as well.
Finally, bool objects are in general disallowed from the above functions. There is just not a meaningful body of functionality that can be provided for them. There is also a fundamental difference between something that is expected to be a boolean value and something that is expected to be a 1-bit number, even if they can both serve similar purposes. It is also questionable to compute things such as rotation for bool objects. If the industry grows a consistent set of answers for these operations, then we can weaken the requirements and add the behavior in. (Note that if we put it in now and choose a behavior, we cut off any improvements made in the future, so it is best to be conservative here.)
3.6.4. Return Types
There is the question of what is meant to happen for functions which return bit counts, such as stdc_count_ones, stdc_count_zeros, and stdc_count_leading_zeros. Ostensibly, part of the motivation to capture here is that the type used to do things such as rotations should be identical to the return type used to do things like count zeros, e.g. int. This is mostly non-problematic until someone uses _BitInt(N): Clang already supports megabyte-sized _BitInt(N) types. On platforms where int is actually 16 bits, int is far too small to hold the bit count of even a 1 MB _BitInt(N).
At the moment, the functions do not accept all bit-precise integer types, just the ones that are bit-width equivalent to the existing standard and extended integer types, so this is technically a non-issue. But bit-precise integer types may be given better handling in _Generic or similar features that make them more suitable for type-generic macro implementations. If and when that happens, this could become a problem. For now, we use wording to defer the issue by saying that type-generic macros return a type suitably large for the range of the computed value. This allows forward compatibility while fixing the return types of the non-type-generic functions to int. The specification gives the type-generic macros the flexibility to return larger signed integer types. This aids a smooth transition once bit-precise integer types see more standard support.
3.6.5. stdc_count_ones/stdc_count_zeros
Popcount (also known as Population Count) is an older computer science term taken from the statistics/biology nomenclature to indicate how many bits are set within a grouping. It’s a very useful instruction with applications in everything from game development to scientific computing. It is also directly provided by many instruction sets. Its antithesis is stdc_count_zeros, which counts the number of zero bits in the value. There exist efficient computations, intrinsics, and instructions for both the zeros and ones computations, albeit the latter is more prevalent as popcount. We chose the names stdc_count_ones and stdc_count_zeros because there is no good, industry-settled term for the zeros-analogous version of popcount. The count_ones/count_zeros split has been used to good success in C libraries, C++ libraries, Julia, Rust, and other (standard) libraries.
The API is as follows:
#include <stdbit.h>
int stdc_count_onesuc(unsigned char value);
int stdc_count_onesus(unsigned short value);
int stdc_count_onesui(unsigned int value);
int stdc_count_onesul(unsigned long value);
int stdc_count_onesull(unsigned long long value);
int stdc_count_zerosuc(unsigned char value);
int stdc_count_zerosus(unsigned short value);
int stdc_count_zerosui(unsigned int value);
int stdc_count_zerosul(unsigned long value);
int stdc_count_zerosull(unsigned long long value);
// type-generic macros
int stdc_count_ones(generic_integer_type value);
int stdc_count_zeros(generic_integer_type value);
It covers all of the built-in unsigned integer types. The type-generic macro supports all of the built-in types as well as any of the implementation-defined extended integer types. See the appendix for an implementation.
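The kind of portable fallback an implementation might use when no intrinsic is available can be sketched as follows. This is a minimal illustration rather than the appendix implementation: the sketch_ names are hypothetical, only unsigned int is shown, and it assumes unsigned int has no padding bits so that its width equals sizeof(unsigned int) * CHAR_BIT. The loop uses the classic value &= value - 1 trick, which clears the lowest set bit on each iteration; counting zeros then falls out as the width minus the count of ones.

```c
#include <limits.h>

/* Hypothetical portable fallback for counting 1 bits in an unsigned int. */
static int sketch_count_ones_ui(unsigned int value) {
    int count = 0;
    while (value != 0u) {
        value &= value - 1u; /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Counting 0 bits: width minus the number of 1 bits.
   Assumes unsigned int has no padding bits. */
static int sketch_count_zeros_ui(unsigned int value) {
    return (int)(sizeof(unsigned int) * CHAR_BIT) - sketch_count_ones_ui(value);
}
```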
3.6.6. stdc_rotate_left/stdc_rotate_right
/
are common CPU instructions and the commonly used forms of circular shifts, with applications in cyclic codes among other areas. For 32-bit numbers, they are commonly expressed as
(rotate left) or
(rotate right).
#include <stdbit.h>
unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);
unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);
// type-generic macros
generic_integer_type stdc_rotate_left(generic_integer_type value, int count);
generic_integer_type stdc_rotate_right(generic_integer_type value, int count);
They cover all of the built-in unsigned integer types. Note that
is a signed integer! If (e.g.)
is called, it will call itself again with
; if (e.g.)
is called, it will call itself again with
. This matches the behavior of C++ and avoids undefined behavior, while also avoiding too-large shift errors from signed-to-unsigned conversions.
SDCC and several other compilers optimize for left and right shifts ([sdcc]). Texas Instruments and a handful of other specialist architectures also have "variable shift" instructions (SSHVL), which use the sign of the argument to shift in one direction or the other ([titms320c64x]). Having a
where a negative number produces the opposite
cyclic operation (and vice versa) means that both kinds of architectures can optimize efficiently in the case of hard-coded constants, and still produce well-defined behavior otherwise (
instructions simply negate the count value or not, depending on whether the
or
variant is called; other architectures propagate the information to shift left or right). This also follows existing practice with analogous functions from the C++
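The direction-flipping rule described above can be sketched in portable C for unsigned int. This is a minimal illustration, not the proposal's reference implementation. The sketch_ names and UI_WIDTH are hypothetical, and the sketch assumes unsigned int has no padding bits, so its width is sizeof(unsigned int) * CHAR_BIT. Because C's % truncates toward zero, count % width keeps the sign of count, and that sign is what routes negative counts to the opposite direction.

```c
#include <limits.h>

/* Hypothetical sketch for unsigned int only; assumes no padding bits. */
enum { UI_WIDTH = (int)(sizeof(unsigned int) * CHAR_BIT) };

static unsigned int sketch_rotate_right_ui(unsigned int value, int count);

static unsigned int sketch_rotate_left_ui(unsigned int value, int count) {
    /* % truncates toward zero: r keeps the sign of count and
       lies strictly between -UI_WIDTH and UI_WIDTH */
    int r = count % UI_WIDTH;
    if (r == 0)
        return value;
    if (r < 0) /* negative count: rotate the other way instead */
        return sketch_rotate_right_ui(value, -r);
    /* 0 < r < UI_WIDTH, so neither shift count reaches the width */
    return (value << r) | (value >> (UI_WIDTH - r));
}

static unsigned int sketch_rotate_right_ui(unsigned int value, int count) {
    int r = count % UI_WIDTH;
    if (r == 0)
        return value;
    if (r < 0)
        return sketch_rotate_left_ui(value, -r);
    return (value >> r) | (value << (UI_WIDTH - r));
}
```

For example, rotating left by -1 gives the same result as rotating right by 1, and a count equal to the width leaves the value unchanged.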
3.6.7. stdc_count_leading_zeros, stdc_count_leading_ones, stdc_count_trailing_zeros, and stdc_count_trailing_ones
,
,
, and
are semi-common CPU instructions for counting the number of zeros/ones from the most significant bit ("leading") or the least significant bit ("trailing"). C++ adopted these using names of the form
. The
/
stand for "left" and "right". C++ uses "left" to match the concept of the left-hand side of integers in lexical parsing and the left shift operators in C and C++. We choose "leading" and "trailing" here because those are the more common instruction names, and they tie in slightly better with "most/least significant bit" than "left" or "right" do. The name
(and its variations for the other 3 operations) could also work, albeit it would be one of the longest names in the C standard library if we chose it. (This could potentially be shortened to
or even
). It may also run afoul of the 31 significant initial characters guaranteed for external identifiers, so we chose these names instead.
#include <stdbit.h>
int stdc_count_leading_zerosuc(unsigned char value);
int stdc_count_leading_zerosus(unsigned short value);
int stdc_count_leading_zerosui(unsigned int value);
int stdc_count_leading_zerosul(unsigned long value);
int stdc_count_leading_zerosull(unsigned long long value);
int stdc_count_leading_onesuc(unsigned char value);
int stdc_count_leading_onesus(unsigned short value);
int stdc_count_leading_onesui(unsigned int value);
int stdc_count_leading_onesul(unsigned long value);
int stdc_count_leading_onesull(unsigned long long value);
int stdc_count_trailing_zerosuc(unsigned char value);
int stdc_count_trailing_zerosus(unsigned short value);
int stdc_count_trailing_zerosui(unsigned int value);
int stdc_count_trailing_zerosul(unsigned long value);
int stdc_count_trailing_zerosull(unsigned long long value);
int stdc_count_trailing_onesuc(unsigned char value);
int stdc_count_trailing_onesus(unsigned short value);
int stdc_count_trailing_onesui(unsigned int value);
int stdc_count_trailing_onesul(unsigned long value);
int stdc_count_trailing_onesull(unsigned long long value);
// type-generic macros
int stdc_count_leading_zeros(generic_integer_type value);
int stdc_count_leading_ones(generic_integer_type value);
int stdc_count_trailing_zeros(generic_integer_type value);
int stdc_count_trailing_ones(generic_integer_type value);
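As a rough illustration of the semantics, the leading and trailing counts can be written as simple bit-scanning loops. Loops like these are exactly what a single instruction replaces. This is a hypothetical sketch, not the proposal's reference implementation: it covers only unsigned int and assumes no padding bits. A zero input yields the full width for the zero-counting forms, and the ones-counting forms reduce to the zero-counting forms applied to the complemented value.

```c
#include <limits.h>

/* Hypothetical sketch; assumes unsigned int has no padding bits. */
#define SKETCH_UI_WIDTH ((int)(sizeof(unsigned int) * CHAR_BIT))

static int sketch_count_leading_zeros_ui(unsigned int value) {
    int count = 0;
    unsigned int mask = 1u << (SKETCH_UI_WIDTH - 1); /* most significant bit */
    while (mask != 0u && (value & mask) == 0u) {
        ++count;
        mask >>= 1;
    }
    return count; /* SKETCH_UI_WIDTH when value == 0 */
}

static int sketch_count_trailing_zeros_ui(unsigned int value) {
    int count = 0;
    unsigned int mask = 1u; /* least significant bit */
    while (mask != 0u && (value & mask) == 0u) {
        ++count;
        mask <<= 1;
    }
    return count; /* SKETCH_UI_WIDTH when value == 0 */
}

/* Counting ones is counting zeros of the complement. */
static int sketch_count_leading_ones_ui(unsigned int value) {
    return sketch_count_leading_zeros_ui(~value);
}

static int sketch_count_trailing_ones_ui(unsigned int value) {
    return sketch_count_trailing_zeros_ui(~value);
}
```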
3.6.8. stdc_first_leading_zero, stdc_first_leading_one, stdc_first_trailing_zero, and stdc_first_trailing_one
,
,
, and
are semi-common CPU instructions (
/
for Intel,
for Motorola,
for VAX, and so on) for finding the first zero/one bit, searching from the most significant bit ("leading") or the least significant bit ("trailing"). The caveat here is that these functions produce the bit index plus one. There are a few compiler-based implementations of this. The first is MSVC’s
and
(with
prefix for 64-bit versions). They are meant to mimic Intel’s instruction behavior, which sets a flag if "0" is passed; that flag is returned to the user who called the
function. The actual index is written through an output pointer of type
. Notably, MSVC does not offer any ISA protection: it will emit an illegal CPU instruction if the target architecture doesn’t support the functionality. The other implementations are from Clang, GCC and NVIDIA CUDA, which have a compiler intrinsic that is mapped to instructions where possible. They return
when the input value is zero.
We specify these functions so that
produces the return value
and otherwise returns
. This interpretation is favorable because it allows an end-user to easily check the return value in a way consistent with typical C boolean checking, which is with
. If
is zero, then the user knows no bit was found. Otherwise, they can proceed and subtract 1 to get the index suitable for shifts. If a user knows in advance that the input is nonzero, they can skip the branch and subtract immediately.
and its similarly named variants cover the behavior behind
. The others are permutations of this behavior: we provide them for completeness, and because other architectures provide some or all of these other operations. Whatever happens,
is incredibly important, if only because it is responsible for significant speedups in algorithms that scan over bits to find certain patterns. The others can be built out of the other existing intrinsics or with specially crafted code, but providing the operations directly, rather than taxing the compiler’s optimizer to recognize such code, may be of great benefit.
Note that users can implement find_first_set using the
functions.
#include <stdbit.h>
int stdc_first_leading_zerouc(unsigned char value);
int stdc_first_leading_zerous(unsigned short value);
int stdc_first_leading_zeroui(unsigned int value);
int stdc_first_leading_zeroul(unsigned long value);
int stdc_first_leading_zeroull(unsigned long long value);
int stdc_first_leading_oneuc(unsigned char value);
int stdc_first_leading_oneus(unsigned short value);
int stdc_first_leading_oneui(unsigned int value);
int stdc_first_leading_oneul(unsigned long value);
int stdc_first_leading_oneull(unsigned long long value);
int stdc_first_trailing_zerouc(unsigned char value);
int stdc_first_trailing_zerous(unsigned short value);
int stdc_first_trailing_zeroui(unsigned int value);
int stdc_first_trailing_zeroul(unsigned long value);
int stdc_first_trailing_zeroull(unsigned long long value);
int stdc_first_trailing_oneuc(unsigned char value);
int stdc_first_trailing_oneus(unsigned short value);
int stdc_first_trailing_oneui(unsigned int value);
int stdc_first_trailing_oneul(unsigned long value);
int stdc_first_trailing_oneull(unsigned long long value);
// type-generic macros
int stdc_first_leading_zero(generic_integer_type value);
int stdc_first_leading_one(generic_integer_type value);
int stdc_first_trailing_zero(generic_integer_type value);
int stdc_first_trailing_one(generic_integer_type value);
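The return convention described above (0 for "not found", otherwise the bit index plus one) can be sketched for the trailing variants, together with the check-then-subtract idiom. The sketch_ names are hypothetical, only unsigned int is shown, and bit indices are counted from the least significant bit starting at 0. The trailing-one form follows the same convention as POSIX ffs.

```c
/* Hypothetical sketch of the "index plus one, or 0" convention. */
static int sketch_first_trailing_one_ui(unsigned int value) {
    for (int index = 0; value != 0u; ++index, value >>= 1) {
        if (value & 1u)
            return index + 1; /* 1-based position of the lowest set bit */
    }
    return 0; /* no bit is set */
}

/* The first 0 bit of value is the first 1 bit of its complement. */
static int sketch_first_trailing_zero_ui(unsigned int value) {
    return sketch_first_trailing_one_ui(~value);
}

/* Check-then-subtract idiom: sums the 0-based indices of all set bits. */
static int sketch_sum_set_bit_indices(unsigned int bits) {
    int sum = 0;
    int pos;
    while ((pos = sketch_first_trailing_one_ui(bits)) != 0) {
        sum += pos - 1;             /* pos - 1 is the 0-based index */
        bits &= ~(1u << (pos - 1)); /* clear that bit and keep scanning */
    }
    return sum;
}
```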
3.6.9. stdc_has_single_bit
This function determines whether an unsigned integer is a power of 2. It can be written either using a plain expression such as
, or by using
. Checking that something is a power of 2 (or, equivalently, that it has exactly one bit set) is used to determine whether a value can be turned into a mask efficiently (useful in specific kinds of containers with specific bit limits, such as hash tables), among many other applications. This one does not map directly to a hardware instruction.
#include <stdbit.h>
_Bool stdc_has_single_bituc(unsigned char value);
_Bool stdc_has_single_bitus(unsigned short value);
_Bool stdc_has_single_bitui(unsigned int value);
_Bool stdc_has_single_bitul(unsigned long value);
_Bool stdc_has_single_bitull(unsigned long long value);
// type-generic macro
_Bool stdc_has_single_bit(generic_integer_type value);
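The expression-based formulation amounts to a one-liner. Here it is under a hypothetical sketch_ name for unsigned int. Subtracting 1 from a power of 2 sets exactly the bits below its single set bit, so the AND is zero for powers of 2 and nonzero for any other nonzero value. The explicit nonzero check is still needed because 0 & (0u - 1u) is also 0.

```c
#include <stdbool.h>

/* Hypothetical sketch: true iff exactly one bit of value is set. */
static bool sketch_has_single_bit_ui(unsigned int value) {
    return value != 0u && (value & (value - 1u)) == 0u;
}
```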
3.6.10. stdc_bit_width/stdc_bit_ceil/stdc_bit_floor
This set of functions provides a way to determine the number of bits it takes to represent a given value (
), the smallest power of 2 that is not less than the value (
), and the largest power of 2 that is not greater than the value (
). All of these operations are extremely useful, especially in the context of GPUs.
can be used to drastically simplify the implementation of both
and
.
can be calculated with
, where
is one of the
macros for the given unsigned integer type.
's computation is subtle and involves a bit of preparation to avoid problems with integer promotions and bit shifts in specific cases (typically
,
, and
on most implementations). This helps make the case that it is a good candidate for standardization, since it can be hard to get right. One can detect integer promotion by checking whether
and
yield the same type. If not, then an integer promotion happens, and the implementation needs to account for that. See the appendix for an implementation.
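The promotion problem can be made concrete with a short sketch. SKETCH_PROMOTES is a hypothetical helper that uses _Generic and unary + (which applies the integer promotions) to report whether a value's type changes under promotion. sketch_bit_ceil_uc is a hypothetical unsigned char bit_ceil that does its arithmetic in unsigned int and converts back. Returning 0 when the result is not representable is an assumption of this sketch, not behavior specified by the proposal.

```c
#include <limits.h>

/* Hypothetical helper: 1 if the integer promotions change the type of x.
   Unary + applies the promotions; _Generic does not evaluate its operand. */
#define SKETCH_PROMOTES(x) _Generic((x),                            \
    unsigned char: _Generic(+(x), unsigned char: 0, default: 1),    \
    unsigned short: _Generic(+(x), unsigned short: 0, default: 1),  \
    default: 0)

/* Hypothetical bit_ceil for unsigned char. Writing x - 1 directly would
   subtract in int (x == 0 would give -1, not UCHAR_MAX), so the small cases
   are handled first and the rest is computed in unsigned int. */
static unsigned char sketch_bit_ceil_uc(unsigned char x) {
    if (x <= 1u)
        return 1u;
    unsigned int v = (unsigned int)x - 1u;
    unsigned int result = 1u;
    while (v != 0u) { /* result = 2 raised to the bit width of (x - 1) */
        v >>= 1;
        result <<= 1;
    }
    /* assumption of this sketch: 0 when the power of 2 does not fit */
    return result > UCHAR_MAX ? 0u : (unsigned char)result;
}
```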
is simpler, and consists of a single computation of
(with appropriately typed/cast constants so the right type is returned without promotions or casts).
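Following the relationships described above, bit_width and bit_floor for unsigned int can be sketched as shown here. The names are hypothetical. bit_width is written as a loop in this sketch; as noted, it equals the type's width minus its count of leading zeros and would normally use that instruction. The zero check in sketch_bit_floor_ui avoids a shift by bit_width(0) - 1, which would wrap around to a huge shift count.

```c
/* Hypothetical sketch: position of the highest set bit plus one (0 for 0). */
static unsigned int sketch_bit_width_ui(unsigned int value) {
    unsigned int width = 0u;
    while (value != 0u) {
        value >>= 1;
        ++width;
    }
    return width;
}

/* Hypothetical sketch: largest power of 2 not greater than value (0 for 0). */
static unsigned int sketch_bit_floor_ui(unsigned int value) {
    if (value == 0u)
        return 0u; /* avoids shifting by bit_width(0) - 1 */
    return 1u << (sketch_bit_width_ui(value) - 1u);
}
```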
The declarations look as follows:
#include <stdbit.h>
unsigned char stdc_bit_flooruc(unsigned char value);
unsigned short stdc_bit_floorus(unsigned short value);
unsigned int stdc_bit_floorui(unsigned int value);
unsigned long stdc_bit_floorul(unsigned long value);
unsigned long long stdc_bit_floorull(unsigned long long value);
unsigned char stdc_bit_ceiluc(unsigned char value);
unsigned short stdc_bit_ceilus(unsigned short value);
unsigned int stdc_bit_ceilui(unsigned int value);
unsigned long stdc_bit_ceilul(unsigned long value);
unsigned long long stdc_bit_ceilull(unsigned long long value);
unsigned char stdc_bit_widthuc(unsigned char value);
unsigned short stdc_bit_widthus(unsigned short value);
unsigned int stdc_bit_widthui(unsigned int value);
unsigned long stdc_bit_widthul(unsigned long value);
unsigned long long stdc_bit_widthull(unsigned long long value);
// type-generic macros
generic_return_type stdc_bit_floor(generic_integer_type value);
generic_return_type stdc_bit_ceil(generic_integer_type value);
generic_return_type stdc_bit_width(generic_integer_type value);
Notably,
requires that the return type be large enough to represent the result. Conceivably, it might be beneficial to synchronize these return types and just return
. But in the case of something like an implementation for
,
can be so catastrophically enormous that we could not count it in a (presumably 16- or 32-bit)
or
type. C++ always returns the type
that was put in, and we follow that here since any unsigned type is large enough to hold its own width in bits. However, in anticipation of a potentially enormous
in
, and not wanting to return, e.g., a 4 GB
to represent a
that has an
of 4 billion, we allow the return type for the generic functions to be a "suitably large (unsigned/signed) integer type".
4. Committee Polls / Questions
For the Committee, this proposal is, effectively, five parts:

the endianness definitions;

the stdc_memreverse8 functions (generic and width-specific);

the stdc_load8_*/stdc_store8_* endianness functions;

the suite of low-level bit functions:

stdc_count_(leading/trailing)_(ones/zeros),
stdc_count_(ones/zeros),
stdc_rotate_(left/right), and
stdc_first_(leading/trailing)_(zero/one),
which map directly to instructions and/or intrinsics; and,

the suite of useful bit functions:

stdc_bit_ceil,
stdc_bit_floor,
stdc_bit_width, and
stdc_has_single_bit,
which may not map directly to instructions but are useful nonetheless in a wide variety of contexts.

These can be polled together or separately, depending on what the Committee desires. The author recommends adopting all of them to make serialization and bit work with scalars much simpler and easier.
5. Wording
The following wording is relative to N2596.
5.1. Add <stdbit.h> to freestanding headers in §4, paragraph 6
A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (Clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, <stdbit.h>, and <stdnoreturn.h>.
5.2. Add a new bullet point at the top for globally-reserved macro and library names to §7.1.3 "Reserved Identifiers", paragraph 1.
— All identifiers starting with stdc_ are reserved for future use.
5.3. Add a new §7.3�x subclause for "Bit and Byte Utilities" in §7
7.3�x Bit and Byte Utilities <stdbit.h>
The header <stdbit.h> defines the following macros, types, and functions to work with the byte and bit representation of many types, typically integer types. This header makes available the size_t type name (7.19) and any uintN_t or uint_leastN_t type names defined by the implementation (7.20).
5.3.1. Add a new §7.3�x.1 subsubclause for "Endian" in §7.3�x
7.3�x.1 Endian
Two common methods of byte ordering in multibyte scalar types are big-endian and little-endian. Big-endian is a format for storage or transmission of binary data in which the most significant byte is placed first, with the rest in descending order. Little-endian is a format for storage or transmission of binary data in which the least significant byte is placed first, with the rest in ascending order. Other byte orderings are also possible. In declarations and definitions in 7.3�x, a suffix containing le typically represents little-endian. A suffix containing be typically represents big-endian. This clause describes the endianness of the execution environment with respect to bit-precise integer types without padding bits, standard integer types, and extended integer types.
It is unspecified whether any generic function declared in <stdbit.h> is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the name of a generic function, the behavior is undefined.
The macros are:
__STDC_ENDIAN_LITTLE__
which represents a method of byte order storage where the least significant byte is placed first and the rest are in ascending order, and is suitable for use in an #if preprocessing directive;
__STDC_ENDIAN_BIG__
which represents a method of byte order storage where the most significant byte is placed first and the rest are in descending order, and is suitable for use in an #if preprocessing directive;
__STDC_ENDIAN_NATIVE__ /* see below */
which represents the method of byte order storage for the execution environment and is suitable for use in an #if preprocessing directive.
__STDC_ENDIAN_NATIVE__ shall be identical to __STDC_ENDIAN_LITTLE__ if the execution environment is little-endian. Otherwise, __STDC_ENDIAN_NATIVE__ shall be identical to __STDC_ENDIAN_BIG__ if the execution environment is big-endian. If __STDC_ENDIAN_NATIVE__ is not equivalent to either, then the byte order for the execution environment is implementation-defined.
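As an illustration only (not proposed wording), a program might consume these macros roughly as follows. sketch_native_is_little_endian is a hypothetical name. The compile-time branch applies only where an implementation actually provides the proposed macros. The run-time fallback inspects the lowest-addressed byte of an unsigned int, so it cannot tell big-endian apart from mixed orderings.

```c
#include <string.h>

/* Hypothetical sketch: 1 if the least significant byte is stored first. */
static int sketch_native_is_little_endian(void) {
#if defined(__STDC_ENDIAN_NATIVE__) && defined(__STDC_ENDIAN_LITTLE__)
    /* compile-time answer from the proposed macros, when available */
    return __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__;
#else
    /* run-time probe: copy out the lowest-addressed byte of the value 1 */
    unsigned int probe = 1u;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1u;
#endif
}
```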
5.3.2. Add a new §7.3�x.2 subsubclause for "Memory Reordering" in §7.3�x
7.3�x.2 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#if (CHAR_BIT % 8) == 0
void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
#endif
Description
The stdc_memreverse8 function provides an interface to reverse the order of a given sequence of bytes by treating them as sequences of 8 bits at a time. The function is only present if CHAR_BIT is a multiple of 8. It is equivalent to the following algorithm:
/* (n + 1) / 2 also visits the middle byte of an odd-length sequence,
   so its octets are reversed as well when CHAR_BIT > 8 */
for (size_t index = 0; index < ((n + 1) / 2); ++index) {
    const size_t reverse_index = n - 1 - index;
    unsigned char* p = ptr + index;
    unsigned char* reverse_p = ptr + reverse_index;
    const unsigned char b_temp = *p;
    const unsigned char reverse_b_temp = *reverse_p;
    *p = 0;
    *reverse_p = 0;
    for (size_t bit_index = 0; bit_index < CHAR_BIT; bit_index += 8) {
        /* move the octet at reverse_bit_index of one byte to bit_index
           of the other, and vice versa */
        const size_t reverse_bit_index = CHAR_BIT - 8 - bit_index;
        const unsigned char bit_mask = 0xFFu << bit_index;
        const unsigned char reverse_bit_mask = 0xFFu << reverse_bit_index;
        *p |= ((reverse_b_temp & reverse_bit_mask) >> reverse_bit_index) << bit_index;
        *reverse_p |= ((b_temp & bit_mask) >> bit_index) << reverse_bit_index;
    }
}
7.3�x.3 Exact-width 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#include <stdint.h>
#if ((N % 8) == 0) && ((CHAR_BIT % 8) == 0)
uintN_t stdc_memreverse8uN(uintN_t value);
#endif
Description
The stdc_memreverse8uN functions provide an interface to swap the bytes of a corresponding uintN_t object, where N matches one of the exact-width integer types (7.20.1.1). If an implementation provides the corresponding uintN_t typedef, it shall define the corresponding byte swap function for that value of N.
Returns
The stdc_memreverse8uN functions return the 8-bit memory reversed uintN_t value, as if by invoking stdc_memreverse8(sizeof(value), (unsigned char*)&value).
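The "as if by invoking" definition above can be illustrated with a small sketch for N == 32. The sketch is restricted to implementations where CHAR_BIT == 8, so that reversing octets is the same as reversing bytes. It is not proposed wording; sketch_memreverse8 and sketch_memreverse8u32 are hypothetical stand-ins for the proposed functions. The result does not depend on the host's byte order, because only the object's own bytes are reversed.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#if CHAR_BIT == 8
/* Hypothetical stand-in for stdc_memreverse8 when bytes are octets. */
static void sketch_memreverse8(size_t n, unsigned char ptr[]) {
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = ptr[i];
        ptr[i] = ptr[n - 1 - i];
        ptr[n - 1 - i] = tmp;
    }
}

/* Mirrors the "as if by invoking" definition for N == 32. */
static uint32_t sketch_memreverse8u32(uint32_t value) {
    sketch_memreverse8(sizeof(value), (unsigned char *)&value);
    return value;
}
#endif
```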
5.3.3. Add a new §7.3�x.4 subsubclause for "Endian Aware" functions in §7.3�x
7.3�x.4 Endian-Aware 8-bit Load
Synopsis
#include <stdbit.h>
#if ((N % 8) == 0) && ((CHAR_BIT % 8) == 0)
uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static (N / CHAR_BIT)]);
uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static (N / CHAR_BIT)]);
int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static (N / CHAR_BIT)]);
#endif
Description
The functions of the 8-bit load family read an int_leastN_t or uint_leastN_t object from the provided ptr in an endian-aware (7.3�x.1) manner, where N matches an existing minimum-width integer type (7.20.1.2). If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be a multiple of 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N. If the function name contains the sN suffix, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Returns
Let value be an object of type int_leastN_t if the function is a signed variant, or uint_leastN_t if the function is an unsigned variant, initialized to 0. Let index be an integer in a sequence that
 — starts from 0 and increments by 8 in the range of [0, N), if the function is a little-endian variant;
 — starts from N - 8 and decrements by 8 in the range of [0, N), if the function is a big-endian variant.
Let ptr_bit_index be an integer that starts from 0. Let byte_index8 be index % CHAR_BIT. For each index in the order of the above-specified sequence:
 1. let byte_mask8 be:
 — (0x7F << byte_index8), if the function is a signed variant, byte_index8 is equal to (CHAR_BIT - 8), and index is equal to N - 8;
 — otherwise, (0xFF << byte_index8).
 2. compute value |= (((ptr[ptr_bit_index / CHAR_BIT] & byte_mask8) >> byte_index8) << index);
 3. increment ptr_bit_index by 8.
Finally, if the function is a signed variant, and either:
 — (ptr[(N / CHAR_BIT) - 1] >> (CHAR_BIT - 1)) & 0x1 is nonzero for the little-endian variant;
 — or, (ptr[0] >> (CHAR_BIT - 1)) & 0x1 is nonzero for the big-endian variant;
then the most significant bit of value is set to 1. Otherwise, it is set to 0.
Returns the computed value.
7.3�x.5 Endian-Aware 8-bit Store
Synopsis
#include <stdbit.h>
#if ((N % CHAR_BIT) == 0 && (CHAR_BIT % 8 == 0))
void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static (N / CHAR_BIT)]);
#endif
Description
The functions of the 8-bit store family write an int_leastN_t or uint_leastN_t object into the provided ptr in an endian-aware (7.3�x.1) manner, where N matches an existing minimum-width integer type (7.20.1.2). If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be a multiple of 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N. If the function name contains the sN suffix, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Let index be an integer in a sequence that
 — starts from 0 and increments by 8 in the range of [0, N), if the function is a little-endian variant;
 — starts from N - 8 and decrements by 8 in the range of [0, N), if the function is a big-endian variant.
Let ptr_bit_index be an integer that starts from 0. Let byte_index8 be index % CHAR_BIT. For each index in the order of the above-specified sequence:
 1. let byte_mask8 be:
 — (0x7F << byte_index8), if the function is a signed variant, byte_index8 is equal to (CHAR_BIT - 8), and index is equal to N - 8;
 — otherwise, (0xFF << byte_index8).
 2. set the 8 bits in ptr[ptr_bit_index / CHAR_BIT] at offset byte_index8 to (value >> index) & 0xFF;
 3. increment ptr_bit_index by 8.
Finally, if the function is a signed variant and value is less than 0, then either:
 — ptr[(N / CHAR_BIT) - 1] has its high bit set to 1, if the function is the little-endian variant;
 — or, ptr[0] has its high bit set to 1, if the function is the big-endian variant.
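As an illustration only (not proposed wording), the unsigned 32-bit cases of the load and store functions reduce to shifts when CHAR_BIT == 8; that restriction is an assumption of this sketch. The sketch_ names are hypothetical, and the signed and _aligned variants are omitted. Because octets are shifted into place arithmetically, the results do not depend on the host's native byte order.

```c
#include <stdint.h>

/* Hypothetical sketches for N == 32, assuming CHAR_BIT == 8. */
static uint_least32_t sketch_load8_leu32(const unsigned char ptr[static 4]) {
    return (uint_least32_t)ptr[0]
         | ((uint_least32_t)ptr[1] << 8)
         | ((uint_least32_t)ptr[2] << 16)
         | ((uint_least32_t)ptr[3] << 24);
}

static uint_least32_t sketch_load8_beu32(const unsigned char ptr[static 4]) {
    return ((uint_least32_t)ptr[0] << 24)
         | ((uint_least32_t)ptr[1] << 16)
         | ((uint_least32_t)ptr[2] << 8)
         | (uint_least32_t)ptr[3];
}

/* Least significant octet goes to the lowest address. */
static void sketch_store8_leu32(uint_least32_t value, unsigned char ptr[static 4]) {
    for (int i = 0; i < 4; ++i)
        ptr[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
}

/* Most significant octet goes to the lowest address. */
static void sketch_store8_beu32(uint_least32_t value, unsigned char ptr[static 4]) {
    for (int i = 0; i < 4; ++i)
        ptr[3 - i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
}
```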
5.3.4. Add a new §7.3�x.6 subsubclause for Low-Level Bit Utilities in §7.3�x
7.3�x.6 Count Leading Zeros
Synopsis
int stdc_leading_zerosuc(unsigned char value);
int stdc_leading_zerosus(unsigned short value);
int stdc_leading_zerosui(unsigned int value);
int stdc_leading_zerosul(unsigned long value);
int stdc_leading_zerosull(unsigned long long value);
generic_return_type stdc_leading_zeros(generic_integer_type value);
Returns
Returns the number of consecutive 0 bits in value, starting from the most significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3�x.7 Count Leading Ones
Synopsis
int stdc_leading_onesuc(unsigned char value);
int stdc_leading_onesus(unsigned short value);
int stdc_leading_onesui(unsigned int value);
int stdc_leading_onesul(unsigned long value);
int stdc_leading_onesull(unsigned long long value);
generic_return_type stdc_leading_ones(generic_integer_type value);
Returns
Returns the number of consecutive 1 bits in value, starting from the most significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3�x.8 Count Trailing Zeros
Synopsis
int stdc_trailing_zerosuc(unsigned char value);
int stdc_trailing_zerosus(unsigned short value);
int stdc_trailing_zerosui(unsigned int value);
int stdc_trailing_zerosul(unsigned long value);
int stdc_trailing_zerosull(unsigned long long value);
generic_return_type stdc_trailing_zeros(generic_integer_type value);
Returns
Returns the number of consecutive 0 bits in value, starting from the least significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3�x.9 Count Trailing Ones
Synopsis
int stdc_trailing_onesuc(unsigned char value);
int stdc_trailing_onesus(unsigned short value);
int stdc_trailing_onesui(unsigned int value);
int stdc_trailing_onesul(unsigned long value);
int stdc_trailing_onesull(unsigned long long value);
generic_return_type stdc_trailing_ones(generic_integer_type value);
Returns
Returns the number of consecutive 1 bits in value, starting from the least significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3�x.10 Rotate Left
Synopsis
unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);
generic_integer_type stdc_rotate_left(generic_integer_type value, int count);
Description
The stdc_rotate_left functions perform a bitwise rotate left. This operation is typically known as a left circular shift.
Returns
Let N be the width corresponding to the type of the input value. Let r be count % N.
 — If r is 0, returns value;
 — otherwise, if r is positive, returns (value << r) | (value >> (N - r));
 — otherwise, if r is negative, returns stdc_rotate_right(value, -r).
The type-generic function (marked by its generic_integer_type argument) returns the above-described result for a given input value so long as the generic_integer_type is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
7.3�x.11 Rotate Right
Synopsis
unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);
generic_integer_type stdc_rotate_right(generic_integer_type value, int count);
Description
The stdc_rotate_right functions perform a bitwise rotate right. This operation is typically known as a right circular shift.
Returns
Let N be the width corresponding to the type of the input value. Let r be count % N.
 — If r is 0, returns value;
 — otherwise, if r is positive, returns (value >> r) | (value << (N - r));
 — otherwise, if r is negative, returns stdc_rotate_left(value, -r).
The type-generic function (marked by its generic_integer_type argument) returns the above-described result for a given input value so long as the generic_integer_type is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
7.3�x.12 Count Ones
Synopsis
int stdc_count_onesuc(unsigned char value);
int stdc_count_onesus(unsigned short value);
int stdc_count_onesui(unsigned int value);
int stdc_count_onesul(unsigned long value);
int stdc_count_onesull(unsigned long long value);
generic_return_type stdc_count_ones(generic_integer_type value);
Returns
The stdc_count_ones functions return the total number of 1 bits within the given value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is a:
 — standard unsigned integer type, excluding _Bool;
 — extended unsigned integer type;
 — or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3�x.13 Count Zeros
Synopsis
int stdc_count_zerosuc(unsigned char value);
int stdc_count_zerosus(unsigned short value);
int stdc_count_zerosui(unsigned int value);
int stdc_count_zerosul(unsigned long value);
int stdc_count_zerosull(unsigned long long value);
generic_return_type stdc_count_zeros(generic_integer_type value);
Returns
The stdc_count_zeros functions return the total number of 0 bits within the given value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type, excluding _Bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
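The rotate and count wording above can be exercised with a small, non-normative sketch for unsigned int. It assumes unsigned int has no padding bits, so its width is sizeof(unsigned int) * CHAR_BIT; the sketch_ prefix marks hypothetical names that are not part of this proposal. The rotations follow the three-way split on r = count % N, relying on C's truncating % so that a negative count yields a negative r, and the count ones loop clears one set bit per iteration instead of calling any compiler intrinsic.

```c
#include <limits.h>

/* Hypothetical, non-normative sketch: unsigned int only, sketch_ names are illustrative. */

/* Width of unsigned int, assuming no padding bits. */
#define SKETCH_UINT_WIDTH ((int)(sizeof(unsigned int) * CHAR_BIT))

unsigned int sketch_rotate_rightui(unsigned int value, int count);

/* Rotate left: r = count % N; r == 0 is the identity, negative r defers to rotate right. */
unsigned int sketch_rotate_leftui(unsigned int value, int count) {
    const int n = SKETCH_UINT_WIDTH;
    const int r = count % n; /* in (-n, n), so both shift counts below stay in range */
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (value << r) | (value >> (n - r));
    }
    return sketch_rotate_rightui(value, -r);
}

/* Rotate right: mirror image of rotate left. */
unsigned int sketch_rotate_rightui(unsigned int value, int count) {
    const int n = SKETCH_UINT_WIDTH;
    const int r = count % n;
    if (r == 0) {
        return value;
    }
    if (r > 0) {
        return (value >> r) | (value << (n - r));
    }
    return sketch_rotate_leftui(value, -r);
}

/* Count ones: each value & (value - 1) clears the lowest set bit. */
int sketch_count_onesui(unsigned int value) {
    int count = 0;
    while (value != 0u) {
        value &= value - 1u;
        ++count;
    }
    return count;
}

/* Count zeros: every bit that is not a 1 bit is a 0 bit. */
int sketch_count_zerosui(unsigned int value) {
    return SKETCH_UINT_WIDTH - sketch_count_onesui(value);
}
```

For example, sketch_count_onesui(13u) counts the three set bits of 0b1101, and a rotation by a full width (or any multiple of it) returns the input unchanged because r is 0.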
5.3.5. Add a new §7.3x.3 sub-subclause for Fundamental Bit Utilities in §7.3x
7.3x.14 Single-bit Check
Synopsis
_Bool stdc_has_single_bituc(unsigned char value);
_Bool stdc_has_single_bitus(unsigned short value);
_Bool stdc_has_single_bitui(unsigned int value);
_Bool stdc_has_single_bitul(unsigned long value);
_Bool stdc_has_single_bitull(unsigned long long value);
_Bool stdc_has_single_bit(generic_integer_type value);
Returns
The stdc_has_single_bit functions return true if and only if there is a single 1 bit in value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type, excluding _Bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
7.3x.15 Bit Width
Synopsis
unsigned char stdc_bit_widthuc(unsigned char value);
unsigned short stdc_bit_widthus(unsigned short value);
unsigned int stdc_bit_widthui(unsigned int value);
unsigned long stdc_bit_widthul(unsigned long value);
unsigned long long stdc_bit_widthull(unsigned long long value);
generic_return_type stdc_bit_width(generic_integer_type value);
Description
The stdc_bit_width functions compute the smallest number of bits needed to store value.
Returns
The stdc_bit_width functions return 0 if value is 0. Otherwise, they return $1 + \lfloor \log_2(\mathit{value}) \rfloor$.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type, excluding _Bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
The generic_return_type type for the type-generic function need not be the same as the type of value. It shall be a suitably large unsigned integer type capable of representing the computed result.
7.3x.16 Bit Floor
Synopsis
unsigned char stdc_bit_flooruc(unsigned char value);
unsigned short stdc_bit_floorus(unsigned short value);
unsigned int stdc_bit_floorui(unsigned int value);
unsigned long stdc_bit_floorul(unsigned long value);
unsigned long long stdc_bit_floorull(unsigned long long value);
generic_integer_type stdc_bit_floor(generic_integer_type value);
Description
The stdc_bit_floor functions compute the largest integral power of 2 that is not greater than value.
Returns
The stdc_bit_floor functions return 0 if value is 0. Otherwise, they return the largest integral power of 2 that is not greater than value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type, excluding _Bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
7.3x.17 Bit Ceiling
Synopsis
unsigned char stdc_bit_ceiluc(unsigned char value);
unsigned short stdc_bit_ceilus(unsigned short value);
unsigned int stdc_bit_ceilui(unsigned int value);
unsigned long stdc_bit_ceilul(unsigned long value);
unsigned long long stdc_bit_ceilull(unsigned long long value);
generic_integer_type stdc_bit_ceil(generic_integer_type value);
Description
The stdc_bit_ceil functions compute the smallest integral power of 2 that is not less than value. If the computation does not fit in the given return type, the behavior is undefined.
Returns
The stdc_bit_ceil functions return the smallest integral power of 2 that is not less than value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type, excluding _Bool;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose width matches a standard or extended integer type, excluding _Bool.
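The fundamental bit utilities above can be sketched portably for unsigned int with plain shifts and masks. This is a non-normative illustration under stated assumptions: the sketch_ names are hypothetical, the width assumes unsigned int has no padding bits, and where the wording makes stdc_bit_ceil undefined for an unrepresentable result, this sketch chooses to return 0 so that it stays well-defined. Real implementations would typically use count-leading-zeros instructions instead of the shift loop.

```c
#include <limits.h>

/* Hypothetical, non-normative sketch: unsigned int only, sketch_ names are illustrative. */

/* Width of unsigned int, assuming no padding bits. */
#define SKETCH_UINT_WIDTH ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Single-bit check: a power of two shares no bits with its predecessor. */
_Bool sketch_has_single_bitui(unsigned int value) {
    return value != 0u && (value & (value - 1u)) == 0u;
}

/* Bit width: 0 for 0, otherwise 1 + floor(log2(value)), found by shifting out bits. */
unsigned int sketch_bit_widthui(unsigned int value) {
    unsigned int width = 0u;
    while (value != 0u) {
        value >>= 1;
        ++width;
    }
    return width;
}

/* Bit floor: 0 for 0, otherwise the highest set bit of value. */
unsigned int sketch_bit_floorui(unsigned int value) {
    if (value == 0u) {
        return 0u;
    }
    return 1u << (sketch_bit_widthui(value) - 1u);
}

/* Bit ceiling: smallest power of 2 not less than value (1 for both 0 and 1).
   The wording leaves unrepresentable results undefined; this sketch returns 0. */
unsigned int sketch_bit_ceilui(unsigned int value) {
    if (value <= 1u) {
        return 1u;
    }
    const unsigned int width = sketch_bit_widthui(value - 1u);
    if (width >= (unsigned int)SKETCH_UINT_WIDTH) {
        return 0u; /* would need a bit beyond the width of unsigned int */
    }
    return 1u << width;
}
```

Computing the ceiling from the width of value - 1 is what makes exact powers of two map to themselves: 64 - 1 is 63, whose width is 6, giving 1u << 6, which is 64 again.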
5.4. Add one new entry for Implementation-defined Behavior in Annex J.3
— The value of __STDC_ENDIAN_NATIVE__ if the execution environment is not big-endian or little-endian (7.3x.1).
— The values of __STDC_ENDIAN_BIG__ and __STDC_ENDIAN_LITTLE__ if the execution environment is not big-endian or little-endian (7.3x.1).
5.5. Modify an existing entry for Unspecified behavior in Annex J.1
— The macro definition of a generic function is suppressed in order to access an actual function (7.17.1), (7.3x).
6. Appendix
A collection of miscellaneous and helpful bits of information and implementation.
6.1. Example Implementations in Publicly-Available Libraries
Optimized routines following the naming conventions present in this paper can be found in the Shepherd’s Oasis Industrial Development Kit (IDK) library, compilable with a conforming C11 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:
Optimized routines following the basic principles present in this paper and used as motivation to improve several C++ Standard Libraries can be found in the Itsy Bitsy Bit Libraries, compilable with a conforming C++17 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:

Bit Intrinsics (Declarations) (Sources)
Endianness routines and original motivation that spawned this proposal came from David Seifert’s Portable Endianness library and its deep dive into compiler optimizations and efficient code generation when alignment came into play:

Endian Load/Store (Declarations) (Sources)
6.2. Implementation of Generic stdc_count_ones
Sample implementation on Godbolt (clang/gcc specific builtins):
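// Plain char may be signed, so it is converted to unsigned char first;
// otherwise a negative value would be sign-extended before counting.
#define stdc_count_ones(...) \
    _Generic((__VA_ARGS__), \
        char: __builtin_popcount((unsigned char)(__VA_ARGS__)), \
        unsigned char: __builtin_popcount((__VA_ARGS__)), \
        unsigned short: __builtin_popcount((__VA_ARGS__)), \
        unsigned int: __builtin_popcount((__VA_ARGS__)), \
        unsigned long: __builtin_popcountl((__VA_ARGS__)), \
        unsigned long long: __builtin_popcountll((__VA_ARGS__)) \
    )

int main () {
    return stdc_count_ones((unsigned char)'0') + stdc_count_ones(13ull);
}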
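On compilers without __builtin_popcount (for example, MSVC), the same type-generic shape can instead dispatch to one function per type, mirroring the uc/us/ui/ul/ull suffixes of the synopsis. This is a hedged, portable sketch rather than an optimized implementation: the sketch_ names are hypothetical, plain char is deliberately left out because the wording only admits unsigned types, and every per-type function forwards to a single unsigned long long loop, which is value-preserving for all narrower unsigned types.

```c
/* Hypothetical, non-normative sketch: sketch_ names are illustrative only. */

/* Shared loop: each value & (value - 1) clears the lowest set bit. */
int sketch_popcount_ull(unsigned long long value) {
    int count = 0;
    while (value != 0ull) {
        value &= value - 1ull;
        ++count;
    }
    return count;
}

/* One function per type, following the uc/us/ui/ul/ull suffix pattern. */
int sketch_count_onesuc(unsigned char value) { return sketch_popcount_ull(value); }
int sketch_count_onesus(unsigned short value) { return sketch_popcount_ull(value); }
int sketch_count_onesui(unsigned int value) { return sketch_popcount_ull(value); }
int sketch_count_onesul(unsigned long value) { return sketch_popcount_ull(value); }
int sketch_count_onesull(unsigned long long value) { return sketch_popcount_ull(value); }

/* Type-generic front end: _Generic selects a function designator, which is then called. */
#define sketch_count_ones(value) \
    _Generic((value), \
        unsigned char: sketch_count_onesuc, \
        unsigned short: sketch_count_onesus, \
        unsigned int: sketch_count_onesui, \
        unsigned long: sketch_count_onesul, \
        unsigned long long: sketch_count_onesull)(value)
```

For example, sketch_count_ones((unsigned char)'0') selects sketch_count_onesuc and yields 2, since '0' is 48 (0b110000) in ASCII. Because the macro expands to a call through an ordinary function, suppressing it (for instance with (sketch_count_ones)(x)) would require a real function of that name, which is the situation the Annex J.1 entry on suppressed generic-function macros describes.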
6.3. Implementation of Generic stdc_bit_ceil
Sample implementation on Godbolt (clang/gcc specific builtins):
#include <limits.h>

// The *_WIDTH macros used below are the C2x additions to <limits.h>.
// __builtin_clz* is undefined for 0, so 0 maps to the full width of the type;
// narrower types are promoted to unsigned int, so the extra leading zeros are subtracted.
#define stdc_leading_zeros(...) \
    (_Generic((__VA_ARGS__), \
        char: ((unsigned char)(__VA_ARGS__) == 0 ? UCHAR_WIDTH : __builtin_clz((unsigned char)(__VA_ARGS__)) - (UINT_WIDTH - UCHAR_WIDTH)), \
        unsigned char: ((__VA_ARGS__) == 0 ? UCHAR_WIDTH : __builtin_clz((__VA_ARGS__)) - (UINT_WIDTH - UCHAR_WIDTH)), \
        unsigned short: ((__VA_ARGS__) == 0 ? USHRT_WIDTH : __builtin_clz((__VA_ARGS__)) - (UINT_WIDTH - USHRT_WIDTH)), \
        unsigned int: ((__VA_ARGS__) == 0 ? UINT_WIDTH : __builtin_clz((__VA_ARGS__))), \
        unsigned long: ((__VA_ARGS__) == 0 ? ULONG_WIDTH : __builtin_clzl((__VA_ARGS__))), \
        unsigned long long: ((__VA_ARGS__) == 0 ? ULLONG_WIDTH : __builtin_clzll((__VA_ARGS__))) \
    ))

#define stdc_bit_width(...) \
    _Generic((__VA_ARGS__), \
        char: (CHAR_BIT - stdc_leading_zeros((__VA_ARGS__))), \
        unsigned char: (UCHAR_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
        unsigned short: (USHRT_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
        unsigned int: (UINT_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
        unsigned long: (ULONG_WIDTH - stdc_leading_zeros((__VA_ARGS__))), \
        unsigned long long: (ULLONG_WIDTH - stdc_leading_zeros((__VA_ARGS__))) \
    )

// integer promotion rules mean we need to
// precisely calculate the value here; 0 and 1 both round up to 1
#define __stdc_bit_ceil_promotion_protection(_Type, _Value) \
    _Generic((_Value), \
        char: ((_Type)(_Value) <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)((_Value) - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
        unsigned char: ((_Value) <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)((_Value) - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
        unsigned short: ((_Value) <= (_Type)1) ? (_Type)1 : (_Type)(1u << (stdc_bit_width((_Type)((_Value) - 1)) + (UINT_WIDTH - USHRT_WIDTH)) >> (UINT_WIDTH - USHRT_WIDTH)), \
        default: (_Type)0 \
    )

#define stdc_bit_ceil(...) \
    _Generic((__VA_ARGS__), \
        char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
        unsigned char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
        unsigned short: __stdc_bit_ceil_promotion_protection(unsigned short, (__VA_ARGS__)), \
        unsigned int: ((__VA_ARGS__) <= 1u ? 1u : (unsigned int)(1u << stdc_bit_width((unsigned int)((__VA_ARGS__) - 1)))), \
        unsigned long: ((__VA_ARGS__) <= 1ul ? 1ul : (unsigned long)(1ul << stdc_bit_width((unsigned long)((__VA_ARGS__) - 1)))), \
        unsigned long long: ((__VA_ARGS__) <= 1ull ? 1ull : (unsigned long long)(1ull << stdc_bit_width((unsigned long long)((__VA_ARGS__) - 1)))) \
    )

int main () {
    int x = stdc_bit_ceil((unsigned char)'\x13'); // 0x13 is 19, so x is 32
    int y = stdc_bit_ceil(33u); // y is 64
    return x + y;
}
6.4. Endian Enumeration
The endian enumeration was struck from this paper. It had very marginal benefit and was mostly redundant for Standard C code, since the macros would suffice well enough. Nevertheless, the old rationale is presented below.
6.4.1. Rationale
A stdc_endian enumeration could have some benefits, and it mirrors the enumeration that came from the (accepted) C++20 paper and idioms found in [p0463], which also went into a <bit> header. Similar ideas are also present in libraries such as [libcorkbyteorder], which are hybrid C and C++ libraries that give definitions similar to the ones here. Compilers also define macros such as __BYTE_ORDER__ (Clang/GCC family), or are well-defined to be a certain endianness (Windows is always little-endian).
The other portion of this is that providing an enumeration helps users pass this information along to functions. Without the enumeration, users defining functions that take an endianness would declare them like so:
void my_conversion_unsafe(int endian, size_t data_size, unsigned char data[static data_size]);
The name may specify that the parameter is an endianness, but the range of valid values is not known without looking at the documentation. It is also impossible for the compiler to diagnose problematic uses: calling my_conversion_unsafe(48558395, sizeof(arr), arr) is legal, and compilers will not diagnose such a call as wrong. Now, consider the same with the enumeration:
void my_conversion_safe(stdc_endian endian, size_t data_size, unsigned char data[static data_size]);
This function call can get diagnosed in (some) implementations:
#include <stddef.h>

typedef enum stdc_endian {
    stdc_endian_little = __ORDER_LITTLE_ENDIAN__,
    stdc_endian_big = __ORDER_BIG_ENDIAN__,
    stdc_endian_native = __BYTE_ORDER__,
} stdc_endian;

void my_conversion_unsafe (int endian, size_t n, unsigned char ptr[static n]) {}
void my_conversion_safe (stdc_endian endian, size_t n, unsigned char ptr[static n]) {}

int main () {
    unsigned char arr[4];
    my_conversion_unsafe(48558395, sizeof(arr), arr);
    my_conversion_safe(48558395, sizeof(arr), arr);
    // ^
    // <source>:15:24: error: integer constant not in range
    // of enumerated type 'stdc_endian' (aka 'enum stdc_endian') [-Werror,-Wassign-enum]
    my_conversion_unsafe((stdc_endian)48558395, sizeof(arr), arr);
    my_conversion_safe((stdc_endian)48558395, sizeof(arr), arr);
    return 0;
}
(Many implementations do not diagnose this today because such implicit conversions are, unfortunately, incredibly common, sometimes for good reason.)
7. Acknowledgements
Many thanks to David Seifert, Aaron Bachmann, Jens Gustedt, Tony Finch, Erin AO Shepherd, and many others who helped fight to get the semantics and wording into the right form, providing motivation, giving example code, pointing out existing libraries, and helping to justify this proposal.