1. Revision History
1.1. Revision 7 - December 12th, 2024
- Evaluate several different implementations and talk to several experts about the (now-closed) [p1130] with respect to #depend
- Settle on #depend
- Keep #depend
- Choose a specific option for how module behavior works (it is entirely private and local to the module).
- Provide limit offset N
1.2. Revision 6 - March 2nd, 2020
- Add new section § 2 Relevant Polls.
- Add new section § 5.3.4 Dependency-Scanning Friendly with #depend.
- Add new section § 5.3.5 Modules.
- Improve section § 5.3.6 Statically Polymorphic.
- Add new section § 5.3.7 Optional Limit.
- Add new section § 5.3.8 UTF-8 Only.
- Add new section § 6 Previous Implementations.
- Improve wording and add static version (thanks, @lichray).
1.3. Revision 5 - January 13th, 2020
- Split #embed
- Add memory and time benchmarks from various implementation strategies in the new Current Practice section.
- Address concerns for a generic API and similar in the new Results Analysis section.
- Retarget to EWG and SG 7.
1.4. Revision 4 - November 26th, 2018
- Wording is now relative to [n4778].
- Minor typo and tweak fixes.
1.5. Revision 3 - November 26th, 2018
- Change to using consteval
- Discuss potential issues with accessing resources after full semantic analysis is performed. Prepare to poll Evolution Working Group. Reference new paper, [p1130], about resource management.
1.6. Revision 2 - October 10th, 2018
- Destroy embed_options alignment constexpr constexpr !
1.7. Revision 1 - June 10th, 2018
- Create future directions section, follow up on Library Evolution Working Group comments.
- Change std::embed_options::null_terminated std::embed_options::null_terminate
- Add more code demonstrating the old way and motivating examples.
- Incorporate LEWG feedback, particularly alignment requirements illuminated by Odin Holmes and Niall Douglass. Add a feature macro on top of having __has_include(<embed>)
1.8. Revision 0 - May 11th, 2018
Initial release.
2. Relevant Polls
The following polls are shaping the current design. Votes are in the form of SF (Strongly in Favor), F (in Favor), N (Neutral), A (Against), SA (Strongly Against).
| SF | F | N | A | SA |
|---|---|---|---|---|
| 14 | 13 | 2 | 0 | 0 |

- Consensus: Do more work.

| SF | F | N | A | SA |
|---|---|---|---|---|
| 2 | 3 | 9 | 12 | 4 |

- Consensus: Do not want.
- Vote Commentary:
  - A: Complexity?
  - SA: Windows is slow with recursive globs.

It should be mandatory that EVERY file for

| SF | F | N | A | SA |
|---|---|---|---|---|
| 4 | 12 | 5 | 6 | 3 |

- Consensus: Split, no consensus. Add why/why not.
- Vote Commentary:
  - SF: This MUST exist. Both compiler and build system authors. (Implementers.)
  - SA: Can make user experience sad face for common case. Build system should scream at you for making the mistake instead.

| SF | F | N | A | SA |
|---|---|---|---|---|
| 3 | 11 | 7 | 2 | 2 |

- Consensus: Do it.
- Vote Commentary:
  - SA: Hell to implement. (This was the author.)

Make std::embed ill-formed inside of a module interface (with a plan to revisit later).

| SF | F | N | A | SA |
|---|---|---|---|---|
| 4 | 2 | 7 | 1 | 1 |

- Consensus: Yes, but Meh.
- Vote Commentary:
  - SA: Modules are important we should make sure it interacts well with modules (figure it out now).
  - SF: How does this work with #depend?
  - SF: std::embed is basically a #include -- why would we want it in interface? Just focus on getting feature working and doing it well.
  - A: Jumping the gun. Space needs more exploration.
  - N: We are highly undecided - need to answer more questions (especially about Modules).
3. Motivation
I’m very keen on std::embed. I’ve been hand-embedding data in executables for NEARLY FORTY YEARS now. — Guy "Hatcat" Davidson, June 15, 2018
A very large number of C and C++ programmers -- at some point -- attempt to 
- Financial Development
  - representing coefficients and numeric constants for performance-critical algorithms;
- Game Development
  - assets that do not change at runtime, such as icons, fixed textures and other data;
  - Shader and scripting code;
- Embedded Development
  - storing large chunks of binary, such as firmware, in a well-compressed format;
  - placing data in memory on chips and systems that do not have an operating system or file system;
- Application Development
  - compressed binary blobs representing data
  - non-C++ script code that is not changed at runtime;
- Server Development
  - configuration parameters which are known at build-time and are baked in to set limits and give compile-time information to tweak performance under certain loads;
  - SSL/TLS Certificates hard-coded into your executable (requiring a rebuild and potential authorization before deploying new certificates), and;
- Static Analyzers
  - Static analyzers suffer -- much like their binary code generating friends -- from having to parse extremely large array literals;
  - Reduces memory pressure and enables better information tracking and potential sanitization (file source is not lost in build system).
In the pursuit of this goal, these tools have proven to have inadequacies and contribute poorly to the C++ development cycle as it continues to scale up for larger and better low-end devices and high-performance machines, bogging developers down with menial build tasks and trying to cover up disappointing differences between platforms. It also absolutely destroys state-of-the-art compilers due to the extremely high memory overhead of producing an Abstract Syntax Tree for a braced initializer list of several tens of thousands of integral constants with numeric values at 255 or less.
The request for some form of 
This paper proposes 
4. Scope and Impact
5. Design Decisions
5.1. Implementation Experience & Current Practice
Here, we examine current practices, their benefits, and their pitfalls. There are a few cross-platform (and not-so-cross-platform) paths for getting data into an executable. We also scrutinize the performance, with numbers for both memory overhead and speed overhead available at the repository that houses the current implementation. For ease of access, the numbers as of January 2020 with the latest versions of the indicated compilers and tools are replicated below.
All three major implementations were explored, plus an early implementation of this functionality in GCC. A competing implementation in a separate C++-like meta language called Circle was also looked at, at the behest of Study Group 7.
5.1.1. Speed Results
Below are timing results for a file of random bytes using a specific strategy. The file is of the size specified at the top of the column. Files are kept the same between strategies and tests.
- Intel Core i7-6700HQ @ 2.60 GHz
- 24.0 GB RAM 2952 MHz
- Debian Sid or Windows 10
- Method: Gather timings from time Measure-Command { ... }
| Strategy | 4 bytes | 40 bytes | 400 bytes | 4 kilobytes | 
|---|---|---|---|---|
| GCC | 0.201 s | 0.208 s | 0.207 s | 0.218 s | 
| GCC | 0.709 s | 0.724 s | 0.711 s | 0.715 s | 
| -generated GCC | 0.225 s | 0.215 s | 0.237 s | 0.247 s | 
| -generated Clang | 0.272 s | 0.275 s | 0.272 s | 0.272 s | 
| -generated MSVC | 0.204 s | 0.229 s | 0.209 s | 0.232 s | 
| Circle @ | 0.353 s | 0.359 s | 0.361 s | 0.361 s | 
| Circle @ | 0.199 s | 0.208 s | 0.204 s | 0.368 s | 
| (linker) | 0.501 s | 0.482 s | 0.519 s | 0.527 s | 
| Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes | 
|---|---|---|---|---|
| GCC | 0.236 s | 0.231 s | 0.300 s | 1.069 s | 
| GCC | 0.705 s | 0.713 s | 0.772 s | 1.135 s | 
| -generated GCC | 0.406 s | 2.135 s | 23.567 s | 225.290 s | 
| -generated Clang | 0.366 s | 1.063 s | 8.309 s | 83.250 s | 
| -generated MSVC | 0.552 s | 3.806 s | 52.397 s | Out of Memory | 
| Circle @ | 0.353 s | 0.363 s | 0.421 s | 0.585 s | 
| Circle @ | 0.238 s | 0.199 s | 0.219 s | 0.368 s | 
| (linker) | 0.500 s | 0.497 s | 0.555 s | 2.183 s | 
| Strategy | 400 megabytes | 1 gigabyte | 
|---|---|---|
| GCC | 9.803 s | 26.383 s | 
| GCC | 4.170 s | 11.887 s | 
| -generated GCC | Out of Memory | Out of Memory | 
| -generated Clang | Out of Memory | Out of Memory | 
| -generated MSVC | Out of Memory | Out of Memory | 
| Circle @ | 2.655 s | 6.023 s | 
| Circle @ | 1.886 s | 4.762 s | 
| (linker) | 22.654 s | 58.204 s | 
5.1.2. Memory Size Results
Below is the peak memory usage (heap usage) for a file of random bytes using a specific strategy. The file is of the size specified at the top of the column. Files are kept the same between strategies and tests.
- Intel Core i7-6700HQ @ 2.60 GHz
- 24.0 GB RAM 2952 MHz
- Debian Sid or Windows 10
- Method: /usr/bin/time -v
| Strategy | 4 bytes | 40 bytes | 400 bytes | 4 kilobytes | 
|---|---|---|---|---|
| GCC | 17.26 MB | 17.26 MB | 17.26 MB | 17.27 MB | 
| GCC | 38.82 MB | 38.77 MB | 38.80 MB | 38.80 MB | 
| -generated GCC | 17.26 MB | 17.26 MB | 17.26 MB | 17.27 MB | 
| -generated Clang | 35.12 MB | 35.22 MB | 35.31 MB | 35.88 MB | 
| -generated MSVC | < 30.00 MB | < 30.00 MB | < 33.00 MB | < 38.00 MB | 
| Circle @ | 53.56 MB | 53.60 MB | 53.53 MB | 53.88 MB | 
| Circle @ | 33.35 MB | 33.34 MB | 33.34 MB | 33.35 MB | 
| (linker) | 17.32 MB | 17.31 MB | 17.31 MB | 17.31 MB | 
| Strategy | 40 kilobytes | 400 kilobytes | 4 megabytes | 40 megabytes | 
|---|---|---|---|---|
| GCC | 17.26 MB | 17.96 MB | 53.42 MB | 341.72 MB | 
| GCC | 38.80 MB | 40.10 MB | 59.06 MB | 208.52 MB | 
| -generated GCC | 24.85 MB | 134.34 MB | 1,347.00 MB | 12,622.00 MB | 
| -generated Clang | 41.83 MB | 103.76 MB | 718.00 MB | 7,116.00 MB | 
| -generated MSVC | ~48.60 MB | ~477.30 MB | ~5,280.00 MB | Out of Memory | 
| Circle @ | 53.69 MB | 54.73 MB | 65.88 MB | 176.44 MB | 
| Circle @ | 33.34 MB | 33.34 MB | 39.41 MB | 113.12 MB | 
| (linker) | 17.31 MB | 17.31 MB | 17.31 MB | 57.13 MB | 
| Strategy | 400 megabytes | 1 gigabyte | 
|---|---|---|
| GCC | 3,995.34 MB | 9,795.31 MB | 
| GCC | 1,494.66 MB | 5,279.37 MB | 
| -generated GCC | Out of Memory | Out of Memory | 
| -generated Clang | Out of Memory | Out of Memory | 
| -generated MSVC | Out of Memory | Out of Memory | 
| Circle @ | 1,282.34 MB | 3,199.28 MB | 
| Circle @ | 850.40 MB | 2,128.36 MB | 
| (linker) | 425.77 MB | 1,064.74 MB | 
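To put the 40 megabyte column of the memory tables in perspective, the peak-memory figures can be divided by the input size. The following is only a rough sketch of that arithmetic: it assumes the "40 megabytes" input and the "MB" figures use the same unit, it does not subtract each tool's baseline memory usage, and the names `input_mb` and `overhead` are illustrative rather than part of any implementation. Only rows whose labels are unambiguous in the tables above are used.

```cpp
// Peak memory (in MB) reported in the 40 megabyte column of the memory table.
constexpr double input_mb = 40.0;

// Ratio of peak memory to input size (hypothetical helper for illustration).
constexpr double overhead(double peak_mb) { return peak_mb / input_mb; }

constexpr double generated_gcc   = overhead(12622.00); // "-generated GCC" row
constexpr double generated_clang = overhead(7116.00);  // "-generated Clang" row
constexpr double linker          = overhead(57.13);    // "(linker)" row

// Generated array literals cost hundreds of times the size of the data itself,
// while the linker-based approach stays within a small multiple of it.
static_assert(generated_gcc > 315.0 && generated_gcc < 316.0);
static_assert(generated_clang > 177.0 && generated_clang < 178.0);
static_assert(linker > 1.4 && linker < 1.5);
```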
5.1.3. Results Analysis
The above clearly demonstrates the superiority of @ keyword, but it was added in December 2019. When the compiler author was spoken to about Study Group 7’s aspirations for a more generic way of representing data from a file, the ultimate response was this:
I’ll add a new @embed keyword that takes a type and a file path and loads the file and embeds it into an array prvalue of that type. This will cut out the interpreter and it’ll run at max speed. Feed back like this is good. This is super low-hanging fruit.
It was Circle’s conclusion that a generic API was unsuitable and suffered from the same performance pitfalls that plague current-generation compilers today. It was SG7’s insistence, however, that a more generic API would be suitable, modeled on Circle’s principles. Given that thorough exploration of the design space in Circle led to the same conclusion this proposal is making, and given the wide variety of languages providing a similar interface (D, Nim, Rust, etc.), it is clear that a more generic API is not desirable for functionality as fundamental and simple as this. This does not preclude a more generic solution being created, but it does prioritize the "Bird in the Hand" approach that the Direction Group and Bjarne Stroustrup have advocated for many times.
Furthermore, inspecting compiler bug reports around this subject area reveals that this is not the first time GCC has suffered monumental memory blowup over unoptimized representation of data. In fact, this is a 16+ year old problem that GCC has been struggling with for a long time now (C++ version here). That the above numbers are nearing the best that can be afforded by some of the most passionate volunteers and experts curating an extremely large codebase should be testament to how hard the language is in this area for compiler developers, and how painful it is for regular developers using their tools.
Clang, while having a better data representation and more optimized structures at its disposal, is similarly constrained. Even with significant implementation work, its developers are deeply constrained in what they can do:
It might be possible to introduce some sort of optimized representation specifically for initializer lists. But it would be a big departure from existing AST handling. And it wouldn’t really open up new use cases, given that string literal handling is already reasonably efficient.
Is this really the best use of compiler developer energy?
To provide a backdrop against which a big departure from current AST handling can be compared, an implementation of the built-in necessary for this proposal is -- for an experienced developer -- at most a few days' work in either GCC or Clang. Other compiler engineers have reported similar ease of implementation and integration. Should this really be delegated to Quality of Implementation that will need to be solved N times over by every implementation in their own particularly special way? Chipping away at what is essentially a fundamental inefficiency required by C++'s inescapable tokenization model from the preprocessor plus the sheer cost of an ever-growing language that makes simple constructs like a brace initializer list of integer constants expensive is, in this paper’s demonstrated opinion, incredibly unwise.
5.1.4. Manual Work
Many developers also hand-wrap their files in (raw) string literals, or similar, to massage their data -- binary or not -- into a conforming representation that can be parsed as source code:
- Have a file data.json:

        { "Hello" : "World!" }

- Mangle that file with raw string literals, and save it as raw_include_data.h:

        R"json({ "Hello": "World!" })json"

- Include it into a variable, optionally made constexpr:

        #include <iostream>
        #include <string_view>

        int main() {
            constexpr std::string_view json_view =
        #include "raw_include_data.h"
                ;
            // { "Hello": "World!" }
            std::cout << json_view << std::endl;
            return 0;
        }

This happens often in the case of people who have not yet taken the "add a build step" mantra to heart. The biggest problem is that the above C++-ready source file is no longer valid in its original representation, meaning the file as-is cannot be passed to any validation tools, schema checkers, or otherwise. This hurts the portability and interop story of C++ with other tools and languages.
Furthermore, if the string literal is too big, vendors such as VC++ will hard error the build (example from Nonius, benchmarking framework).
5.1.5. Processing Tools
Other developers use pre-processors for data that can’t be easily hacked into a C++ source-code appropriate state (e.g., binary). The most popular one is 
5.1.6. ld 
Resource files and other "link time" or post-processing measures have one benefit over the previous method: they are fast to perform in terms of compilation time. An example can be seen in the § 8.1.3 ld Alternative section.
5.1.7. The incbin 
   There is a tool called [incbin] which is a 3rd party attempt at pulling files in at "assembly time". Its approach is incredibly similar to 
5.2. Prior Art
There has been a lot of discussion over the years in many arenas, from Stack Overflow to mailing lists to meetings with the Committee itself. The latest advancement brought to WG21’s attention was p0373r0 - File String Literals. It proposed the syntax 
5.2.1. Literal-Based, constexpr
A user could reasonably assign (or want to assign) the resulting array to a 
5.2.2. Literal-Based, Null Terminated (?)
It is unclear whether the resulting array of characters or bytes was to be null terminated. The usage and expression imply that it will be, due to its string-like appearance. However, is adding an additional null terminator fitting for desired usage? From the existing tools and practice (e.g., 
5.2.3. Encoding
Because the proposal used a string literal, several questions came up as to the actual encoding of the returned information. The author gave both 
5.3. Design Goals
Because of the aforementioned reasons, it seems more prudent to take a "compiler intrinsic"/"magic function" approach. The function overload takes the form:

    template <typename T = byte>
    consteval span<const T> embed(string_view resource_identifier);

    template <typename T = byte>
    consteval span<const T> embed(string_view resource_identifier, size_t limit);

    template <typename T = byte>
    consteval span<const T> embed(string_view resource_identifier, size_t offset, size_t limit);

    template <size_t N, typename T = byte>
    consteval span<const T, N> embed(string_view resource_identifier);

    template <size_t N, typename T = byte>
    consteval span<const T> embed(string_view resource_identifier, size_t offset);

5.3.1. Implementation Defined
Calls such as 
There is precedent for specifying library features that are implemented only through compile-time compiler intrinsics (
Finally, we use "implementation defined" so that compilers can produce implementation-defined search path for their translation units and modules during compilation with flags. The current implementation uses 
5.3.2. Binary Only
Creating two separate forms or options for loading data that is meant to be a "string" always fuels controversy and debate about what the resulting contents should be. The problem is sidestepped entirely by demanding that the resource loaded by 
5.3.3. Constexpr Compatibility
The entire implementation must be usable in a 
5.3.4. Dependency-Scanning Friendly with #depend 
   One of the biggest hurdles to generating consensus was the deep-seated issues with dependency scanning. The model with only dealing with 
For this purpose, a new 
This makes it possible to send minimal compiler reproductions and test cases for bug vetting, as well as allow distributed systems to continue to use the 
The 

    // single-dependency directives
    #depend <config/graph.bin>
    #depend <foo.txt>

    // family-dependencies
    // do not "recurse" into directories
    #depend "art/*"
    #depend "art/mocks/*.json"

    // recursive-family-dependency
    // recurse through directories and similar
    #depend "assets/**"

    // mixed: all resources starting with
    // "translation/", with all files that end in ".po",
    // that have at least one "/" (one directory)
    // after the "translation/", found recursively
    #depend "translation/**/*.po"

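To make the difference between the two wildcards concrete, here is a minimal sketch of how a resource name might be checked against a #depend pattern. It is not taken from any implementation of this paper: `depend_match` and `is_separator` are hypothetical names, and the sketch assumes that `*` matches any run of characters other than a directory separator while `**` matches any run of characters at all, which is one reading of the descriptions above.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: characters treated as directory separators.
constexpr bool is_separator(char c) { return c == '/' || c == '\\'; }

// Hypothetical matcher: does `name` satisfy the #depend `pattern`?
constexpr bool depend_match(std::string_view pattern, std::string_view name) {
    if (pattern.empty()) return name.empty();

    // "**" may consume any characters, including directory separators.
    if (pattern.size() >= 2 && pattern[0] == '*' && pattern[1] == '*') {
        const std::string_view rest = pattern.substr(2);
        for (std::size_t i = 0; i <= name.size(); ++i)
            if (depend_match(rest, name.substr(i))) return true;
        return false;
    }

    // "*" may consume any characters up to, but not including, a separator.
    if (pattern.front() == '*') {
        const std::string_view rest = pattern.substr(1);
        for (std::size_t i = 0; i <= name.size(); ++i) {
            if (depend_match(rest, name.substr(i))) return true;
            if (i < name.size() && is_separator(name[i])) break;
        }
        return false;
    }

    // Any other character must match literally.
    return !name.empty() && pattern.front() == name.front()
        && depend_match(pattern.substr(1), name.substr(1));
}

static_assert(depend_match("art/*", "art/hero.png"));
static_assert(!depend_match("art/*", "art/mocks/a.json"));
static_assert(depend_match("assets/**", "assets/a/b/c.bin"));
static_assert(depend_match("translation/**/*.po", "translation/fr/app.po"));
static_assert(!depend_match("translation/**/*.po", "translation/app.po"));
```

The last two assertions mirror the "mixed" directive: at least one directory must sit between "translation/" and the ".po" file for the name to match under this reading.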
Due to Windows being a pile of garbage for 
5.3.5. Modules
Modules front-load and bring front-and-center one of the largest problems not with 
The problem is illustrated most powerfully by the following snippet:

    export module m0;

    #depend "header.bin"
    import <embed>;
    import combine;

    export consteval auto f0(std::string_view f) {
        const auto h = std::embed("header.bin");
        const auto t = std::embed(f);
        return combine(h, t);
    }

    export module m1;

    #depend "default.stuff"
    import m0;

    export consteval auto f1(std::string_view f = "default.stuff") {
        return f0(f);
    }

    export consteval auto getpath() {
        return "default.stuff";
    }

    import m1;
    import ;
    import <embed>
    #depend "coolstuff.bin"

    int main() {
        (f1("coolstuff.bin"));       // [0] fails
        (f1());                      // [1] fails
        std::embed("header.bin");    // [2] fails
        std::embed(getpath());       // [3] fails
        std::embed("coolstuff.bin"); // [4] ok
    }

All of 

    export module deps;

    #depend <sdk/private/**>
    #depend export <sdk/everything/**>

    import deps;

    int main() {
        std::embed("sdk/everything/meow.wav");            // ok
        std::embed("sdk/private/super_secret_sauce.mix"); // ill-formed
        return 0;
    }

This would make the behavior opt-in and predictable. We do not include it in the most recent revision of this paper and plan to talk to SG15 and Modules Experts to craft this wording in a different version of this paper, after getting approval for the core feature and feedback from EWG. We also need to resolve the behavior for 
5.3.6. Statically Polymorphic
While returning true). This allows all C types and many types which are mirrored almost exactly in binary form to be pulled effortlessly into code.
5.3.7. Optional Limit
Consider some file-based resources that are otherwise un-sizeable and un-seek/tellable in various implementations such as 
Note that as per § 5.3.6 Statically Polymorphic, the limit is specified in terms of 
Additionally, a user can provide a template argument 
5.3.8. UTF-8 Only
This is related to a serious problem for string literals, particularly those of Translation Phase 7. When a user types a string literal such as 
The solution that Study Group 16 recommended was to allow 
We note that it would be "maximally nice" to provide all of 
6. Previous Implementations
This section is primarily to address feedback from polls wherein different forms and implementation strategies were asked for by the Evolution Working Group and other implementers. A tour of the design and implementation of these cases helps show what has been considered.
6.1. #depend 
   The current specification makes it a hard error if a file has not been identified previously by a 
While the author feels a lot better about it being a soft warning that can be turned into a hard error by use of 
6.2. String Table / Virtual File System
This implementation idea was floated twice, once during SG-7 discussion at the November 2019 Belfast meeting and again during the February 2020 Prague meeting. The crux of this version of 

    // map "foo.txt" to file name "foo"
    #depend_name "foo.txt" "foo"

    #ifdef _WIN32
    // map Windows-specific resource
    // "win/bazunga.bin" to file name "baz"
    #depend_name "win/bazunga.bin" "baz"
    #else
    // map Unix-specific resource
    // "nix/bazooka.bin" to file name "baz"
    #depend_name "nix/bazooka.bin" "baz"
    #endif

    #include <embed>

    int main() {
        // pulls foo.txt
        constexpr std::span<const std::byte> foo = std::embed("foo");
        // pulls either bazunga or bazooka
        constexpr std::span<const std::byte> baz = std::embed("baz");
        return foo[0] == (std::byte)'f' && baz[2] == (std::byte)'\x3';
    }

On the tin, this seems to bring nice properties to the table. We get to "erase" platform-specific paths and give them common names, we have a static list of names that we always pull from, and more. However, there are several approaches to this problem. Consider one of the primary use cases for 
This becomes a problem: if 
Even conquering that problem (with, e.g., glob-based 
There is absolutely room for a (potentially 
C++ does not need to make for itself a reputation of trying to be an extremely unique snowflake at the cost of usability and user friendliness.
7. Changes to the Standard
Wording changes are relative to the latest working draft.
7.1. Intent
The intent of the wording is to provide a function that:
- handles the provided resource identifying string_view
- and, returns the specified constexpr span T
The wording also explicitly disallows the usage of the function outside of a core constant expression by marking it 
For 
7.2. Proposed Feature Test Macro
The proposed feature test macros are 
7.3. Proposed Wording
7.3.1. Append to §14.8.1 Predefined macro names [cpp.predefined] one additional entry
#define __cpp_pp_depend ????? /* 📝 NOTE: EDITOR VALUE HERE */ 
7.3.2. Add a new section §15.4 Dependency [cpp.depend]
15.4 Dependency [cpp.depend]
1 A #depend directive establishes inputs or family of inputs upon which a translation unit depends.
2 A preprocessing directive of the form

    # depend < h-char-sequence > new-line

or

    # depend " q-char-sequence " new-line

provides a dependency name. If any search for a resource using q-char-sequence is not supported, or if the search fails, the directive is reprocessed and treated as

    # depend < h-char-sequence > new-line

using the same q-char-sequence, including any < or >.
3 The q-char-sequence or h-char-sequence may have one of three meanings, depending on the use of * and/or ** within the sequence.
- — If the sequence contains a * it denotes a dependency-family.
- — If the sequence contains a ** it denotes a recursive-dependency-family.
- — Otherwise, it denotes a single-dependency.
4 [ Example—

    #depend "art.png"
    // this translation unit depends on 'art.png'

    #depend <config/*.json>
    // this translation unit depends on all resources
    // the implementation can find that
    // end in ".json" and start with "config/".

    #depend <data/*/*.bin>
    // this translation unit depends on all resources
    // the implementation can find that
    // end in ".bin", start with "data/"
    // and contain a single "/" in-between.

    #depend <sdk/**>
    // this translation unit depends on all resources
    // the implementation can find that
    // start with "sdk/" and will search exhaustively

— end Example ].
5 A single-dependency provides that a translation unit may depend on a single resource identified by the implementation using the q-char-sequence or h-char-sequence provided. A dependency-family is a group of single-dependencies identified by the implementation using the q-char-sequence or h-char-sequence provided, where each * can stand in for zero or more parts of a single-dependency's q-char-sequence or h-char-sequence except for a forward slash (U+002F SOLIDUS), a reverse slash (U+005C REVERSE SOLIDUS), or any other corresponding implementation-defined directory-separator ([fs.path.generic]). A recursive-dependency-family is a group of single-dependencies identified by the implementation using the q-char-sequence or h-char-sequence provided, where each ** can stand in for zero or more parts of a single-dependency's q-char-sequence or h-char-sequence without any restrictions, unlike *. All of the single-dependencies, dependency-families, and recursive-dependency-families together are called the input-dependencies.
6 Input-dependencies are empty at the start of every translation unit, and each #depend directive accumulates more input-dependencies within the translation unit which contains it.
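The three meanings in paragraph 3 can be illustrated with a small classification sketch. The names `dependency_kind` and `classify_dependency` are hypothetical and appear nowhere in the proposed wording. The sketch also makes one interpretive assumption explicit: because every sequence containing `**` necessarily contains `*` as well, it checks for `**` first so that such a sequence is treated as a recursive-dependency-family.

```cpp
#include <string_view>

// Hypothetical names mirroring the three meanings in paragraph 3.
enum class dependency_kind { single, family, recursive_family };

constexpr dependency_kind classify_dependency(std::string_view sequence) {
    // Check "**" first: any sequence containing "**" also contains "*".
    if (sequence.find("**") != std::string_view::npos)
        return dependency_kind::recursive_family;
    if (sequence.find('*') != std::string_view::npos)
        return dependency_kind::family;
    return dependency_kind::single;
}

// The directives from the example in paragraph 4.
static_assert(classify_dependency("art.png") == dependency_kind::single);
static_assert(classify_dependency("config/*.json") == dependency_kind::family);
static_assert(classify_dependency("data/*/*.bin") == dependency_kind::family);
static_assert(classify_dependency("sdk/**") == dependency_kind::recursive_family);
```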
7.3.3. Append to §16.3.1 General [support.limits.general]'s one additional entry
#define __cpp_lib_embed ????? /* 📝 NOTE: EDITOR VALUE HERE */ 
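As a rough sketch of how user code might probe for the feature, the two macros proposed in this section can be combined with the `__has_include(<embed>)` check mentioned in the revision history. The variable name `has_constant_resources` is hypothetical, and since the macro values are still to be assigned by the editor, the sketch only tests whether the macros are defined. On implementations without this proposal, the fallback branch is the one that compiles.

```cpp
// Pull in <embed> only when the implementation actually ships it.
#if defined(__has_include)
#  if __has_include(<embed>)
#    include <embed>
#  endif
#endif

// Hypothetical flag: true only when both proposed macros are present.
#if defined(__cpp_lib_embed) && defined(__cpp_pp_depend)
inline constexpr bool has_constant_resources = true;
#else
inline constexpr bool has_constant_resources = false;
#endif
```

Code that needs a fallback (for example, a generated array from a build step) can then branch on `has_constant_resources` with `if constexpr` or further preprocessor checks.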
7.3.4. Append to §19.1 General [utilities.general]'s Table 38 one additional entry
| Subclause | Header(s) |
|---|---|
| 19.✨ Constant Resources | <embed> |
7.3.5. Add a new section §19.✨ Constant Resources [res]
19.✨ Resources [res]
19.✨.1 General [res.general]
Resources allow the implementation to retrieve binary data from a variety of implementation-defined places during constant evaluation. A resource is a source of data accessible from the translation environment. A resource has an implementation-resource-width, which is the implementation-defined size in bits of the located resource.
19.✨.2 Header <embed> synopsis [res.embed.syn]

    namespace std {
      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier) noexcept;

      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier, size_t offset) noexcept;

      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier, size_t offset, size_t limit) noexcept;

      template <std::size_t N, typename T = byte>
      consteval span<const T, N> embed(string_view resource_identifier) noexcept;

      template <std::size_t N, typename T = byte>
      consteval span<const T, N> embed(string_view resource_identifier, size_t offset) noexcept;
    }

19.✨.3 Function template embed [const.embed]

    namespace std {
      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier) noexcept;

      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier, size_t offset) noexcept;

      template <typename T = byte>
      consteval span<const T> embed(string_view resource_identifier, size_t offset, size_t limit) noexcept;

      template <std::size_t N, typename T = byte>
      consteval span<const T, N> embed(string_view resource_identifier) noexcept;

      template <std::size_t N, typename T = byte>
      consteval span<const T, N> embed(string_view resource_identifier, size_t offset) noexcept;
    }

1 Let r denote the result of the function call. Let implementation-resource-count be implementation-resource-width / sizeof(T). Let res-offset be:
- — offset if an overload with the offset parameter is used.
- — Otherwise, 0.
Let res-limit be:
- — limit if an overload with the limit parameter is used.
- — N if an overload with the template parameter N is used.
- — Otherwise, implementation-resource-count.
2 Mandates:
- — implementation-resource-width is a multiple of sizeof(T) * CHAR_BIT.
- — If an overload with the template parameter N is used, then (implementation-resource-count) - (res-offset) is at least N.
- — T is one of std::byte, char, or unsigned char. [ Note— If T is char and std::is_signed_v<char> is true, then each char’s value is the unsigned value converted by a static cast to char. — end Note ]
3 Ensures: r.size() is max(0, min((implementation-resource-count) - (res-offset), (res-limit))). r.data() is a pointer to an array of static storage duration.
4 The value of resource_identifier is used to search a sequence of implementation-defined places for a resource identified by resource_identifier. If the implementation cannot find the resource identified by the resource_identifier after exhausting the sequence of implementation-defined search locations, or if the implementation finds the resource specified but that same resource does not match one of the input-dependencies identified by a #depend directive in some manner, then the program is ill-formed.
5 Returns: A read-only view to a resource identified by the resource_identifier over a contiguous sequence of objects of type T with static storage duration.
6 Effects: Each object of type T in the contiguous sequence is obtained as-if by performing a std::fread(/* unspecified */, sizeof(T), 1, /* resource */) ([cstdio.syn]) from the resource, as a file, except at constant evaluation time. If any call to std::fread returns anything other than 1, the program is ill-formed. Before the objects in the sequence are initialized, at most res-offset std::fread calls are performed into a buffer that is discarded, up to implementation-resource-count times. If res-offset is greater than the implementation-resource-count, then r.empty() is true. Otherwise, exactly res-limit std::fread calls are performed to initialize each object of type T in the contiguous sequence.
7 Remarks: The translation unit of a std::embed call is the translation unit whose input-dependencies are used.
8 Recommended Practice: Implementations should provide a mechanism similar but distinct from #include (15.3 [cpp.include]) for finding the specified resource and in coordination with #depend (15.4 [cpp.depend]). It is encouraged to be identical to #embed [cpp.res]. The contiguous sequence of T should closely represent the bit stream of the resource unmodified. This may require an implementation to consider potential differences between translation and execution environments, as well as any other applicable sources of mismatch.
[Example:

    #include <cstring>
    #include <cstddef>
    #include <fstream>
    #include <cassert>
    #include <embed>

    #depend <data.dat>

    int main() {
        // if the file is the same as the resource in the translation environment,
        // no assert in this program should fail
        constexpr const auto d = std::embed("data.dat");
        constexpr std::size_t expected_size = d.size();

        // same file in execution environment
        // as was embedded
        std::ifstream f_source("data.dat", std::ios::binary | std::ios::in);
        unsigned char runtime_d[expected_size];
        char* ifstream_ptr = reinterpret_cast<char*>(runtime_d);
        // the read must succeed and deliver exactly the embedded number of bytes
        assert(f_source.read(ifstream_ptr, expected_size));
        std::size_t ifstream_size = f_source.gcount();
        assert(ifstream_size == expected_size);
        // memcmp returns 0 when both buffers hold identical bytes
        int is_same = std::memcmp(d.data(), ifstream_ptr, ifstream_size);
        assert(is_same == 0);
    }

— end example]
9 [Example: Given a hypothetical resource identified by "sdk/jump.wav" with an implementation-resource-count of at least 4:
#depend <sdk/*>
#include <embed>
#include <cstddef>

int main () {
    constexpr const auto sound_signature = std::embed("sdk/jump.wav", 4);
    constexpr const auto truncated_sound_signature = std::embed("sdk/jump.wav", 2, 2);
    // verify PCM WAV resource
    static_assert(sound_signature.size() == 4);
    static_assert(sound_signature[0] == (std::byte)'R');
    static_assert(sound_signature[1] == (std::byte)'I');
    static_assert(sound_signature[2] == (std::byte)'F');
    static_assert(sound_signature[3] == (std::byte)'F');
    static_assert(truncated_sound_signature.size() == 2);
    static_assert(truncated_sound_signature[0] == (std::byte)'F');
    static_assert(truncated_sound_signature[1] == (std::byte)'F');
}
— end example]
10 [Example: All resources must be depended on first, regardless of the implementation’s ability to find the identified resource without the dependency:
#include <embed>

constexpr auto data = std::embed("oh.no"); // ill-formed
— end example]
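The as-if std::fread behaviour in the wording above (discard up to res-offset elements into a scratch buffer, then read up to res-limit elements one at a time) can be approximated at run time. The following is only a sketch under stated assumptions: read_resource_as_if is a hypothetical helper and not part of the proposed API, it works on unsigned char rather than a generic T, and it reports a resource that cannot be opened as an empty sequence rather than diagnosing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <string>
#include <vector>

// Hypothetical run-time stand-in for the as-if behaviour; not the proposed API.
// `limit` plays the role of res-limit and `offset` the role of res-offset.
std::vector<unsigned char> read_resource_as_if(
    const std::string& path,
    std::size_t limit = std::numeric_limits<std::size_t>::max(),
    std::size_t offset = 0) {
    assert(!path.empty());
    std::vector<unsigned char> result;
    std::FILE* resource = std::fopen(path.c_str(), "rb");
    if (resource == nullptr) {
        return result;  // sketch only: a missing resource is reported as empty
    }
    // Discard `offset` elements, one std::fread call per element.
    unsigned char scratch = 0;
    for (std::size_t i = 0; i < offset; ++i) {
        if (std::fread(&scratch, sizeof(scratch), 1, resource) != 1) {
            std::fclose(resource);
            return result;  // offset exceeds the resource: empty sequence
        }
    }
    // Initialize each element with one std::fread call, up to `limit` elements.
    unsigned char element = 0;
    while (result.size() < limit &&
           std::fread(&element, sizeof(element), 1, resource) == 1) {
        result.push_back(element);
    }
    std::fclose(resource);
    return result;
}
```

For a file that begins with the bytes RIFF, read_resource_as_if(path, 4) yields those four bytes and read_resource_as_if(path, 2, 2) yields the two bytes FF, mirroring the sound_signature example above.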
8. Appendix
8.1. Alternative
Other techniques used include pre-processing data, link-time based tooling, and assembly-time runtime loading. They are detailed below, for a complete picture of today’s sad landscape of options.
8.1.1. Pre-Processing Tools Alternative
- 
     Run the tool over the data (xxd -i xxd_data.bin > xxd_data.h) to obtain the generated file (xxd_data.h):
unsigned char xxd_data_bin[] = {
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64,
    0x0a
};
unsigned int xxd_data_bin_len = 13;
- 
     Compile main.cpp:
#include <cstddef>
#include <iostream>
#include <string_view>

// prefix as constexpr,
// even if it generates some warnings in g++/clang++
constexpr
#include "xxd_data.h"
;

template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) {
    return N;
}

int main() {
    static_assert(xxd_data_bin[0] == 'H');
    static_assert(array_size(xxd_data_bin) == 13);

    std::string_view data_view(
        reinterpret_cast<const char*>(xxd_data_bin),
        array_size(xxd_data_bin));
    std::cout << data_view << std::endl; // Hello, World
    return 0;
}
Others still use python or other small scripting languages as part of their build process, outputting data in the exact C++ format that they require.
There are problems with the pre-processing tool approach. Representing binary data as C(++) arrays incurs the overhead of comma-delimiting every single byte, and it requires the compiler to verify that every entry in the array is a valid literal or entry according to the C++ language.
This scales poorly with larger files, and build times suffer for any non-trivial binary file, especially once it reaches megabytes in size (e.g., firmware and similar).
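To make the per-byte overhead concrete, here is a minimal sketch of a generator in the style of xxd -i. The function to_c_array is a hypothetical helper written for illustration, not the real tool, and its layout is simplified (no line wrapping). Each input byte becomes roughly six characters of source text (", 0x48"), all of which the compiler must later lex and check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats bytes as an xxd -i style array definition.
std::string to_c_array(const std::string& name,
                       const std::vector<unsigned char>& bytes) {
    assert(!name.empty());
    std::string out = "unsigned char " + name + "[] = {";
    char literal[8];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // One hexadecimal literal per byte, comma-delimited.
        std::snprintf(literal, sizeof(literal), "0x%02x",
                      static_cast<unsigned>(bytes[i]));
        out += (i == 0 ? " " : ", ");
        out += literal;
    }
    out += " };\n";
    out += "unsigned int " + name + "_len = " +
           std::to_string(bytes.size()) + ";\n";
    return out;
}
```

Under these assumptions a 1 MB input produces about 6 MB of generated source, which is the scaling problem described above.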
8.1.2. python 
   Other companies are forced to create their own ad-hoc tools to embed data and files into their C++ code. MongoDB uses a custom python script, just to get their data into C++:
import os
import sys

def jsToHeader(target, source):
    outFile = target
    h = [
        '#include "mongo/base/string_data.h"',
        '#include "mongo/scripting/engine.h"',
        'namespace mongo {',
        'namespace JSFiles{',
    ]

    def lineToChars(s):
        return ','.join(str(ord(c)) for c in (s.rstrip() + '\n')) + ','

    for s in source:
        filename = str(s)
        objname = os.path.split(filename)[1].split('.')[0]
        stringname = '_jscode_raw_' + objname
        h.append('constexpr char ' + stringname + "[] = {")
        with open(filename, 'r') as f:
            for line in f:
                h.append(lineToChars(line))
        h.append("0};")
        # symbols aren't exported w/o this
        h.append('extern const JSFile %s;' % objname)
        h.append('const JSFile %s = { "%s", StringData(%s, sizeof(%s) - 1) };' %
                 (objname, filename.replace('\\', '/'), stringname, stringname))

    h.append("} // namespace JSFiles")
    h.append("} // namespace mongo")
    h.append("")

    text = '\n'.join(h)
    with open(outFile, 'wb') as out:
        try:
            out.write(text)
        finally:
            out.close()

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Must specify [target] [source] "
        sys.exit(1)
    jsToHeader(sys.argv[1], sys.argv[2:])
MongoDB were brave enough to share their code with me and make public the things they have to do: other companies have shared many similar concerns, but do not have the same bravery. We thank MongoDB for sharing.
8.1.3. ld 
   A full, compilable example (except on Visual C++):
- 
     Have a file ld_data.bin with the contents Hello, World!.
- 
     Run ld -r -b binary -o ld_data.o ld_data.bin.
- 
     Compile the following main.cpp with c++ -std=c++17 ld_data.o main.cpp:
#include <iostream>
#include <string_view>

#ifdef __APPLE__
#include <mach-o/getsect.h>

#define DECLARE_LD(NAME) extern const unsigned char _section$__DATA__##NAME[];
#define LD_NAME(NAME) _section$__DATA__##NAME
#define LD_SIZE(NAME) (getsectbyname("__DATA", "__" #NAME)->size)

#elif (defined __MINGW32__) /* mingw */

#define DECLARE_LD(NAME) \
    extern const unsigned char binary_##NAME##_start[]; \
    extern const unsigned char binary_##NAME##_end[];
#define LD_NAME(NAME) binary_##NAME##_start
#define LD_SIZE(NAME) ((binary_##NAME##_end) - (binary_##NAME##_start))

#else /* gnu/linux ld */

#define DECLARE_LD(NAME) \
    extern const unsigned char _binary_##NAME##_start[]; \
    extern const unsigned char _binary_##NAME##_end[];
#define LD_NAME(NAME) _binary_##NAME##_start
#define LD_SIZE(NAME) ((_binary_##NAME##_end) - (_binary_##NAME##_start))

#endif

DECLARE_LD(ld_data_bin);

int main() {
    // impossible
    //static_assert(xxd_data_bin[0] == 'H');
    std::string_view data_view(
        reinterpret_cast<const char*>(LD_NAME(ld_data_bin)),
        LD_SIZE(ld_data_bin)
    );
    std::cout << data_view << std::endl; // Hello, World!
    return 0;
}
This scales a little better in terms of raw compilation time, but it is shockingly OS-, vendor-, and platform-specific in ways that novice developers would not be able to handle fully. The macros are required to erase these differences, lest subtle differences in symbol names destroy one’s ability to use the data portably. We omitted the code for handling VC++ resource files because it is far more verbose than what is present here.
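N.B.: Because these declarations are extern, the values in the array cannot be accessed at compile time, which is why the static_assert in the example above is commented out as impossible.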
9. Acknowledgements
A big thank you to Andrew Tomazos for replying to the author’s e-mails about the prior art. Thank you to Arthur O’Dwyer for providing the author with incredible insight into the Committee’s previous process for how they interpreted the Prior Art.
A special thank you to Agustín Bergé for encouraging the author to talk to the creator of the Prior Art and getting started on this. Thank you to Tom Honermann for direction and insight on how to write a paper and apply for a proposal.
Thank you to Arvid Gerstmann for helping the author understand and use the link-time tools.
Thank you to Tony Van Eerd for valuable advice in improving the main text of this paper.
Thank you to Lilly (Cpplang Slack, @lillypad) for the valuable bikeshed and hole-poking in original designs, alongside Ben Craig who very thoroughly explained his woes when trying to embed large firmware images into a C++ program for deployment into production. Thank you to Elias Kounen and Gabriel Ravier for wording review.
For all this hard work, it is the author’s hope to carry this into C++. It would be the author’s distinct honor to make development cycles easier and better with the programming language we work in and love. ♥