1.1. Revision 6 - June 17th, 2022
Editorial changes were made to the paper. These changes are non-consequential:
examples to the proper OPTIONAL section.
examples to the embed parameter wording sub-clause.
Added a history section explaining some of the pre-proposal steps this proposal went through to reach the form it is in today in § 6.1 Why a Preprocessor Directive, Specifically?.
Add the letter of support sent in to the C Standards and Shepherd’s Oasis, LLC in § 3.3 Support.
1.2. Revision 5 - April 12th, 2022
Additional syntax changes based on feedback from Joseph Myers, Hubert Tong, and users.
Minor wording tweaks and typo clean up.
An implementation is available on Godbolt (as of the last revision as well, and noted below).
The paper’s source code has been refactored:
Separated WG21 paper from WG14 paper.
Kept the core paper together (rationale, reasoning); it is included in both the C and C++ papers since the rationale is identical.
to match feedback from the last standards meeting, namely that an empty resource returns
(but both decay to a truthy value during preprocessor conditional inclusion expressions). Modified by the wording and the prose in § 4.4 __has_embed.
As a reaction to this, the
embed parameter is an optional part of the proposal, as explained in § 188.8.131.52 Empty Signifier. This change did affect one user significantly; the new functionality is fine, but it has some downsides w.r.t. "repeating yourself".
The wording for the limit parameter (in the embed parameter sub-clauses) was adjusted to perform macro expansion at least once. Exact wording may need help.
1.3. Revision 4 - February 7th, 2022
Clean up syntax.
Reimplemented and deployed the extension in Clang to ensure that an implementation of named parameters works.
Change wording to encapsulate the new fixes.
Removed C++ wording to focus on C wording for this document.
1.4. Revision 3 - May 15th, 2021
Added post C meeting fixes to prepare for hopeful success next meeting.
Added 2 more examples to C and C++ wording.
Vastly improved wording and reduced ambiguities in syntax and semantics.
Fixed various wording issues.
1.5. Revision 2 - October 25th, 2020
Added post C++ meeting notes and discussion.
Removed type or bit specifications from the
Moved "Type Flexibility" section and related notes to the Appendix as they are now unpursued.
1.6. Revision 1 - April 10th, 2020
Added post C meeting notes and discussion.
Added discussion of potential endianness.
Improved the wording section at the end to be more detailed in handling the preprocessor (which does not understand types).
1.7. Revision 0 - January 5th, 2020
Initial release! 🎉
2. Polls & Votes
The votes for the C Committee are as follows:
2.1. January/February 2022 C Meeting
"Does WG14 want the embed parameter specification as shown in N2898?"
From the January/February 2022 Meeting Minutes, Summary of Decisions:
WG14 wants the embed parameter specification as shown in N2898.
We interpret this as consensus. We keep the parameters but make the one that folks were questioning (
) optional in response to the feedback during and after the meeting.
2.2. December 2020 Virtual C Meeting
"Do we want to allow #embed to appear in any context that is different from an initialization of a character array?"
"Leaning in the direction of no but not clear." The paper author after consideration chose to keep this as-is right now. Discussion of the feature meant that trying to ban this from different contexts meant that a naïve, separated-preprocessor implementation would be banned and it would require special compiler magic to diagnose. Others pointed out that just trying to leave it "unspecified whether it works outside of the initialization of an array or not" is very dangerous to portability. The author agrees with this assessment and therefore will leave it as-is. The goal of this feature is to enable implementers to use the magic if they so choose, as an implementation detail and a Quality of Implementation selling point. Vendors who provide a simple expansion may not see improvements to throughput and speed of translation but that is their choice as an implementer. Therefore, we cannot do anything which would require them or any preprocessor implementation to traffic in magic directives unless they want to.
2.3. April 2020 Virtual C Meeting
"We want to have a proper preprocessor
This had UNANIMOUS CONSENT to pursue a proper preprocessor directive and NOT use the
syntax. It is noted that the author deems this to be the best decision!
The following poll was later superseded in the C and C++ Committees.
"We want to specify embed as using
." (2-way poll.)
Y: 10 bits-per-element (Ye)
N: 2 type-based (Nay)
A: 4 Abstain (Abstain)
This poll will be a bit harder to accommodate properly. Using a
that produces a numeric constant means that the max-length specifier is now ambiguous. The syntax of the directive may need to change to accommodate further exploration.
For well over 40 years, people have been trying to plant data into executables for varying reasons. Whether it is to provide a base image with which to flash hardware in a hard reset, icons that get packaged with an application, or scripts that are intrinsically tied to the program at compilation time, there has always been a strong need to couple and ship binary data with an application.
Neither C nor C++ makes this easy for users to do, resulting in many individuals reaching for utilities such as
, writing python scripts, or engaging in highly platform-specific linker calls to set up
variables pointing at their data. Each of these approaches comes with benefits and drawbacks. For example, while working with the linker directly allows injection of very large amounts of data (5 MB and upwards), it does not allow accessing that data at any other point except runtime. Conversely, doing all of these things portably across systems and additionally maintaining the dependencies of all these resources and files in build systems both like and unlike
is a tedious task.
Thusly, we propose a new preprocessor directive whose sole purpose is to be #include, but for binary data:
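The following is an illustrative sketch only, not the proposal's normative example. It assumes a compiler that already implements the #embed directive together with __has_embed (see § 4.4 __has_embed), and it embeds its own source file through __FILE__ so that no external resource is needed. The names SKETCH_HAVE_EMBED, self_bytes, embedded_size, and embedded_byte are hypothetical, and the fallback branch exists only so that compilers without the feature can still build the example.

```c
#include <stddef.h>

/* Detect the directive: __has_embed is only evaluated if the preprocessor
   knows it, so compilers without the feature skip the inner check. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
#    define SKETCH_HAVE_EMBED 1
#  endif
#endif

#ifdef SKETCH_HAVE_EMBED
/* The directive expands to a comma-separated list of integer constants,
   one per byte of the resource (here: this very source file). */
static const unsigned char self_bytes[] = {
#embed __FILE__
};
#else
/* Fallback for compilers without #embed: a hand-written initializer list
   (hypothetical placeholder data, the ASCII bytes of "#embed"). */
static const unsigned char self_bytes[] = { 0x23, 0x65, 0x6d, 0x62, 0x65, 0x64 };
#endif

/* Size of the embedded data, known at translation time. */
size_t embedded_size(void)
{
    return sizeof self_bytes;
}

/* Reads one embedded byte, or returns -1 when the index is out of range. */
int embedded_byte(size_t index)
{
    return index < sizeof self_bytes ? (int)self_bytes[index] : -1;
}
```

Because the data arrives as an ordinary array initializer, sizeof works on it directly and the contents stay visible to the optimizer, which is the property the linker-based approaches give up.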
The reason this needs a new language feature is simple: current source-level encodings of "producing binary" to the compiler are incredibly inefficient both ergonomically and mechanically. Creating a brace-delimited list of numerics in C comes with baggage in the form of how numbers and lists are formatted. C’s preprocessor and its mandatory tokenization also impose an unavoidable cost on how the lexer and parser handle values.
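To make the cost of the brace-delimited approach concrete, here is a rough sketch of what a generator tool has to produce (in the spirit of what `xxd -i` emits), along with the token count the compiler must then lex. The function names format_initializer_list and initializer_token_count are hypothetical, and the formatting (lowercase hex, ", " separators) is one arbitrary choice among many.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats n bytes as "0x00, 0xff, ..." into out (capacity cap, including
   the terminating NUL). Returns the number of characters written, or 0
   if the buffer is too small. */
size_t format_initializer_list(const unsigned char *data, size_t n,
                               char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        /* Every byte becomes one integer constant; all but the last are
           followed by a comma separator. */
        int w = snprintf(out + used, cap - used, "0x%02x%s",
                         (unsigned)data[i], (i + 1 < n) ? ", " : "");
        if (w < 0 || (size_t)w >= cap - used) {
            out[0] = '\0';
            return 0;
        }
        used += (size_t)w;
    }
    return used;
}

/* Tokens the compiler must lex for n bytes: n integer constants plus
   n - 1 commas (the surrounding braces are not counted). */
size_t initializer_token_count(size_t n)
{
    return n == 0 ? 0 : 2 * n - 1;
}
```

Under this scheme three input bytes become 16 characters of source text and 5 tokens, and a 40,000,000-byte input becomes 79,999,999 tokens, each of which must be lexed and parsed before the array exists.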
Therefore, using arrays with specific initialized values of any significant size becomes borderline impossible. One would think this old problem would be work-around-able in a succinct manner. Given how old this desire is (that comp.std.c thread is not even the oldest recorded feature request), proper solutions would have arisen. Unfortunately, that could not be farther from the truth. Even the compilers themselves suffer build time and memory usage degradation, as contributors to the LLVM compiler ran the gamut of the biggest problems that motivate this proposal in a matter of a week or two earlier this very year. Luke is not alone in his frustrations: developers all over suffer from the inability to include binary in their program quickly and perform exceptional gymnastics to get around the compiler’s inability to handle these cases.
C developer progress is impeded by the inability to handle this use case, and it leaves both old and new programmers wanting.
Finally, Microsoft has an ABI problem with its maximum string literal size that cannot be solved using string literals or anything treated like string literals, as the LLVM thread and the thread from Claire Xen make clear. It has also frustrated both C and C++ programmers alike, despite their best efforts. It was so frustrating that even extended-C-and-C++-compilers, like Circle, solve this problem with custom directives.
3.2. But How Expensive Is This?
Many different options as opposed to this proposal were seriously evaluated. Implementations were attempted in at least 2 production-use compilers, and more in private. To give an idea of usage and size, here are results for various compilers on a machine with the following specification:
Intel Core i7 @ 2.60 GHz
24.0 GB RAM
Debian Sid or Windows 10
Method: Execute command hundreds of times, stare extremely hard at
work well for getting accurate timing information and can be run several times in a loop to produce a good average value, tracking memory consumption without intrusive efforts was much harder and thusly relied on OS reporting with fixed-interval probes. Memory usage is therefore approximate and may not represent the actual maximum of consumed memory. All of these are using the latest compiler built from source if available, or the latest technology preview if available. Optimizations at
(GCC & Clang style)/
(MSVC style) or equivalent were employed to generate the final executable.
| Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes |
| | 0.236 s | 0.231 s | 0.300 s | 1.069 s |
| | 0.406 s | 2.135 s | 23.567 s | 225.290 s |
| | 0.366 s | 1.063 s | 8.309 s | 83.250 s |
| | 0.552 s | 3.806 s | 52.397 s | Out of Memory |
3.2.2. Memory Size
| Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes |
| | 17.26 MB | 17.96 MB | 53.42 MB | 341.72 MB |
| | 24.85 MB | 134.34 MB | 1,347.00 MB | 12,622.00 MB |
| | 41.83 MB | 103.76 MB | 718.00 MB | 7,116.00 MB |
| | ~48.60 MB | ~477.30 MB | ~5,280.00 MB | Out of Memory |
The numbers here give little reassurance that compiler developers can reduce the memory and compilation time burdens of large initializer lists. Furthermore, privately owned compilers and other static analysis tools perform almost exponentially worse here, taking vastly more memory and thrashing CPUs to 100% for several minutes (and sometimes several hours if, e.g., swap is engaged due to lack of main memory). Every compiler must always consume an amount of memory that is directly linear in the number of tokens produced. After that, what happens to the data is largely implementation-dependent.
The GNU Compiler Collection (GCC) uses a tree representation and has many places where it spawns extra "garbage", as it's called in the various bug reports and work items from implementers. There has been a 16+ year effort on the part of GCC to reduce its memory usage and speed up initializers (C Bug Report and C++ Bug Report). Significant improvements have been made and there is plenty of room for GCC to improve here with respect to compiler and memory size. Somewhat unfortunately, one of the current changes in flight for GCC is the removal of all location information beyond the 256th initializer of large arrays in order to save on space. This technique is not viable for static analysis compilers that promise to recreate source code exactly as it was written, and therefore discarding location or token information for large initializers is not a viable cross-implementation strategy.
LLVM’s Clang, on the other hand, is much more optimized. They maintain a much better scaling and ratio but still suffer the pain of their token overhead and Abstract Syntax Tree representation, though to a much lesser degree than GCC. A bug report was filed but talk from two prominent LLVM/Clang developers made it clear that optimizing things any further would require an extremely large refactor of parser internals with a lot of added functionality, with potentially dubious gains. As part of this proposal, the implementation provided does attempt to do some of these optimizations, and follows some of the work done in this post to try and prove memory and file size savings. (The savings in trying to optimize parsing large array literals were "around 10%", compared to the order-of-magnitude gains from
and similar techniques).
Microsoft Visual C (MSVC) scales the worst of all the compilers, even when given the benefit of being on its native operating system. Both Clang and GCC outperform MSVC on Windows 10 or WINE as of the time of writing.
Linker tricks on all platforms perform better with time (though slower than
implementation), but force the data to be optimizer-opaque (even on the most aggressive "Link Time Optimization" or "Whole Program Optimization" modes compilers had). Linker tricks are also exceptionally non-portable: whether it is the
assembly command supported by certain compilers, specific invocations of
or others, non-portability plagues their usefulness in writing Cross-Platform C (see Appendix for listing of techniques). This makes C decidedly unlike the "portable assembler" advertised by its proponents (and my Professors and co-workers).
3.3. Support
To say that
enjoys broad C Community support is an understatement. In all the years we have written proposals for C and C++, this is the only one where someone physically mailed us a letter - from a different country - directly to the Standards Body to try and make a case for the feature directly, rather than what was already in the paper: