| Document | P1703R1 | 
|---|---|
| Audience | SG2, EWG | 
| Authors | Boris Kolpackov (Code Synthesis) | 
| Reply-To | boris@codesynthesis.com | 
| Date | 2019-07-19 | 
Abstract
Currently, recognizing header unit imports requires full preprocessing
  which is problematic for dependency scanning and partial preprocessing. This
  paper proposes changes that will allow handling such imports with the same
  degree of preprocessing as #include directives.
Revision History
R1 – Add the Wording section.
1 Background
With the current wording, recognizing a header unit import
  declaration requires performing macro replacement and tokenization of every
  line in a translation unit. As a representative example, consider the
  following line:
MYDECL import <int>;
Whether this is a header unit importation or something else depends on
  what MYDECL expands to. Compare:
#define MYDECL int x;
MYDECL import <int>;
And:
template <typename> class import;
#define MYDECL using x =
MYDECL import <int>;
While the second example is contrived, it is valid (again, according to
  the current wording) because import is a context-sensitive
  keyword.
Requiring such full macro replacement is, at a minimum, wasteful for header dependency scanning. It may also be something that tools other than compilers cannot easily and correctly do.
Additionally, several implementations provide support for partial
  preprocessing (GCC's -fdirectives-only and Clang's
  -frewrite-includes) and this requirement is in conflict with
  the essence of that functionality.
More specifically, GCC is currently unable to support header unit imports
  in its -M (dependency scanning) and
  -fdirectives-only (partial preprocessing) modes because in
  these modes it does not perform macro replacement in non-directive
  lines.
While Clang currently performs full preprocessing in its -M
  and -frewrite-includes modes, there is agreement that it's not
  ideal for it to be impossible to correctly extract dependencies without full
  preprocessing.
Finally, consulting with the developers of clang-scan-deps
  (a Clang-based tool for fast dependency extraction) revealed that this
  requirement would be problematic for their implementation.
2 Proposal
We propose to further restrict header unit import declarations so that
  they can be recognized and handled with the same degree of preprocessing as
  #include directives.
Specifically, we propose recognizing a declaration as a header unit import if, in addition to the restrictions in [cpp.module.1]:
- It starts with the import token or the export import token sequence, neither of which has been produced by macro replacement.
- It is followed, after macro replacement, by header-name-tokens.
- The entire declaration is on a single line and is the only declaration on that line.
We believe this should not detract much from usability because header
  imports are replacing #include directives where we have the
  same restrictions.
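As a rough illustration of how little work these rules demand from a tool, the following sketch classifies one physical line without any macro table. It is not part of the proposal or of any implementation: the names LineKind, skip_blanks, eat_word, and classify_line are hypothetical, comments are assumed to have been stripped already, and any `;` before the final one is simply treated as a second declaration.

```cpp
#include <cctype>
#include <string_view>

// Outcome of looking at one physical line, without any macro table.
enum class LineKind {
    NotImport,        // does not have the `import ...;` one-line shape
    HeaderImport,     // `import <...>;` or `import "...";`
    NeedsMacroLookup  // `import IDENT...;`: a macro naming a header, or a module name
};

// Skip spaces and horizontal tabs at the front of `s`.
inline std::string_view skip_blanks(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    return s;
}

// If `s` starts with the identifier `word` as a whole token, consume it.
inline bool eat_word(std::string_view& s, std::string_view word) {
    if (s.substr(0, word.size()) != word)
        return false;
    if (s.size() > word.size()) {
        const unsigned char next = static_cast<unsigned char>(s[word.size()]);
        if (std::isalnum(next) || next == '_')
            return false;  // e.g. `important` is not `import`
    }
    s.remove_prefix(word.size());
    return true;
}

inline LineKind classify_line(std::string_view line) {
    std::string_view s = skip_blanks(line);
    if (eat_word(s, "export"))
        s = skip_blanks(s);
    if (!eat_word(s, "import"))
        return LineKind::NotImport;
    s = skip_blanks(s);

    // Strip trailing blanks, then require the closing `;` on the same line.
    while (!s.empty() && (s.back() == ' ' || s.back() == '\t'))
        s.remove_suffix(1);
    if (s.empty() || s.back() != ';')
        return LineKind::NotImport;

    // Simplification: an earlier `;` means more than one declaration on the line.
    if (s.find(';') != s.size() - 1)
        return LineKind::NotImport;

    if (s.front() == '<' || s.front() == '"')
        return LineKind::HeaderImport;
    const unsigned char first = static_cast<unsigned char>(s.front());
    if (std::isalpha(first) || first == '_')
        return LineKind::NeedsMacroLookup;
    return LineKind::NotImport;
}
```

The NeedsMacroLookup result reflects that the tokens after import are still macro-expanded, so a tool needs the same macro state it already maintains for #include directives to decide such lines.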
3 Before/After Tables ("Tony Tables")
3.1 Affected Use Cases
| before | after |
|---|---|
| int x; import <map>; int y; | int x;<br>import <map>;<br>int y; |
| import <map>; import <set>; | import <map>;<br>import <set>; |
| export<br>import <map>; | export import <map>; |
| #ifdef MAYBE_EXPORT<br>export<br>#endif<br>import <map>; | #ifdef MAYBE_EXPORT<br>export import <map>;<br>#else<br>import <map>;<br>#endif |
| #define MAYBE_EXPORT export<br>MAYBE_EXPORT import <map>; | #define MAYBE_EXPORT<br>#ifdef MAYBE_EXPORT<br>export import <map>;<br>#else<br>import <map>;<br>#endif |
3.2 Unaffected Use Cases
Header unit names are still macro-expanded (similar to
  #include):
#define MYMODULE <map>
import MYMODULE;
Normal module imports are unaffected:
import std.set;
using int_set = std::set<int>;
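To make the distinction between these two cases concrete, here is a minimal sketch of how a scanner might resolve the operand of an import once it has its usual macro state. Everything in it is an assumption for illustration: MacroTable and resolve_header_operand are hypothetical names, only object-like macros are modeled, and only a single level of expansion is performed.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical table of object-like macros: name -> replacement text.
using MacroTable = std::map<std::string, std::string>;

// `operand` is the text between `import` and the closing `;`, already trimmed.
// Returns the header name (with its delimiters) if the operand denotes a
// header unit, or nothing if it is left to be treated as a module name.
inline std::optional<std::string> resolve_header_operand(std::string_view operand,
                                                         const MacroTable& macros) {
    auto is_header_name = [](std::string_view s) {
        return s.size() >= 2 &&
               ((s.front() == '<' && s.back() == '>') ||
                (s.front() == '"' && s.back() == '"'));
    };

    // Literal header name: nothing to expand.
    if (is_header_name(operand))
        return std::string(operand);

    // One level of object-like macro expansion, as for #include operands.
    const auto it = macros.find(std::string(operand));
    if (it != macros.end() && is_header_name(it->second))
        return it->second;

    return std::nullopt;  // e.g. `std.set`: a normal module import
}
```

With a table that maps MYMODULE to <map>, the operand MYMODULE resolves to <map>, while std.set yields no header name and is left alone.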
3.3 Unsupported Use Cases
With the proposed change the following will no longer be possible:
#define MYIMPORT(x) import x
MYIMPORT(<set>);
Note also that the following is already impossible (because neither
  #include nor import's closing ; can
  be the result of macro replacement):
#define IMPORT_OR_INCLUDE(x) ???
IMPORT_OR_INCLUDE(<set>)
4 Discussion
4.1 Context-Sensitive Keywords
The proposed change does not fit well with the semantics of the
  context-sensitive modules keywords. In the current wording, the context is
  "wide", taking into account (after macro expansion) previous lines as well as
  {}-nesting. The following examples illustrate the problem:
#define MYDECL using x =
MYDECL import <int>;
BEGIN_NAMESPACE
template<> class import<int>;
END_NAMESPACE
Our proposed resolution is to adjust context-sensitivity for header unit
  imports to be based solely on the declaration itself. The fact that import
  should be at the beginning of the line followed by header-name-tokens
  and terminated with ; already makes the "pattern" fairly
  constrained. We could not think of any plausible use cases for import
  followed by ", while those for import followed by < all seem to boil down to
  multi-line template-related declarations. All such cases are easily fixed
  either by adjusting newlines or with ::-qualification. For example:
| before | after |
|---|---|
| using x =<br>import<int>;<br>template<> class<br>import<int>; | using x =<br>::import<int>;<br>template<> class import<int>; |
Doing a search for import < on https://codesearch.isocpp.org
  yielded 2562 matches which unfortunately also included
  #import <... directives. Doing a search for
  #import < produced 2540 matches. From this we can
  conclude (though, without seeing the actual code, with a low degree of
  certainty) that there are roughly 20 occurrences (2562 - 2540 = 22) of the
  import < token sequence, though not necessarily at
  the beginning of the line. We've managed to track at least some of these
  matches to the Boost.Metaparse library, with none of the occurrences being
  problematic.
4.2 One Line Requirement
Requiring the entire header unit import declaration to be on
  a single line is not strictly necessary. The benefit of this restriction is
  the simplification of tools that may then be able to reuse the same code to
  handle both #include directives and header unit
  import declarations (at least we found this to be the case for
  GCC). However, the ability to split the declaration across multiple lines
  could be beneficial in the presence of attributes. For example (courtesy of
  Richard Smith):
import "foo.h" [[clang::import_macros(FOO, BAR, BAZ, QUUX),
                 clang::wrap_in_namespace(foo_namespace)]];
5 Wording
Note that according to the direction given at the Cologne meeting, this section extends the original proposal of this paper to all (module and header unit) imports.
In [lex.pptoken]:
preprocessing-token:
    header-name
    import-keyword
    ...
- (3.3) Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name ([lex.header]) is only formed
  - (3.3.1) within a #include directive ([cpp.include]), after the include or import preprocessing token in a #include ([cpp.include]) or import ([cpp.import]) directive, or
  - (3.3.2) within a has-include-expression, or.
  - (3.3.3) outside of any preprocessing directive, if applying phase 4 of translation to the sequence of preprocessing tokens produced thus far is valid and results in an import-seq ([cpp.module]).
- (4) The import-keyword is produced by processing an import directive ([cpp.import]) and has no associated grammar productions.
In [basic.link]:
- (3) A token sequence beginning with exportopt module or exportopt import and not immediately followed by :: is never interpreted as the declaration of a top-level-declaration.
In [cpp]:
- (1) A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints: The first token in the sequence, referred to as a directive-introducing token, is a # preprocessing token, an import preprocessing token, or an export preprocessing token immediately followed by an import preprocessing token, that (at the start of translation phase 4) either is begins with the first character in the source file (optionally after white space containing no new-line characters) or follows white space containing at least one new-line character. The last token in the sequence is the first new-line character that follows the first token in the sequence. A new-line character ends the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.
...
control-line:
    # include pp-tokens new-line
    exportopt import pp-tokens new-line
    ...
- (4) The only white-space characters that shall appear between preprocessing tokens within a preprocessing directive (from just after the introducing # preprocessing directive-introducing token through just before the terminating new-line character) are space and horizontal-tab (including spaces that have replaced comments or possibly other white-space characters in translation phase 3).
In [cpp.include]:
- (7) If the header identified by the header-name denotes an importable header ([module.import]), the #include preprocessing directive is instead replaced by the preprocessing-tokens an import directive ([cpp.import]) of the form
    import header-name ; new-line
Rename [cpp.module] "Header units" to [cpp.import] "Header unit importation" and change order so that it appears immediately after [cpp.include] "Source file inclusion".
Move the pp-balanced-token-seq and associated productions to [cpp.glob.frag].
In [cpp.import] (previously [cpp.module]):
import-seq:
    top-level-token-seqopt exportopt import
top-level-token-seq:
    any pp-balanced-token-seq ending in ; or }
pp-import:
    import header-name pp-import-suffixopt ;
    import header-name-tokens pp-import-suffixopt ;
    exportopt import header-name pp-tokensopt ; new-line
    exportopt import header-name-tokens pp-tokensopt ; new-line
    exportopt import pp-tokens ; new-line
pp-import-suffix:
    pp-import-suffix-token
    pp-import-suffix pp-import-suffix-token
pp-import-suffix-token:
    any pp-balanced-token other than ;
- (1) The preprocessing tokens after the import preprocessing token in the import control-line are processed just as in normal text (i.e., each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens). A sequence of preprocessing-tokens An import directive matching the first two forms of a pp-import instructs the preprocessor to import macros from the header unit ([module.import]) denoted by the header-name. A pp-import is only recognized when the sequence of tokens produced by phase 4 of translation up to the import token forms an import-seq, and the import token is not within the header-name-tokens or pp-import-suffix of another pp-import. The ; preprocessing-token terminating a pp-import shall not have been produced by macro replacement ([cpp.replace]). The point of macro import for a the first two forms of pp-import is immediately after the ; new-line terminating the pp-import. The last form of pp-import is only considered if the first two forms did not match.
- (2) In all three forms of pp-import the import token is replaced by the import-keyword token. Additionally, In in the second form of pp-import, a header-name token is formed as if the header-name-tokens were the pp-tokens of a #include directive. The header-name-tokens are replaced by the header-name token. [Note: This ensures that imports are treated consistently by the preprocessor and later phases of translation. — end note]
In [module.import]:
module-import-declaration:
    exportopt import import-keyword module-name attribute-specifier-seqopt ;
    exportopt import import-keyword module-partition attribute-specifier-seqopt ;
    exportopt import import-keyword header-name attribute-specifier-seqopt ;
6 Questions and Answers
6.1 Who will be in Cologne to present this paper?
Boris Kolpackov
6.2 Is there implementation experience?
Yes, an implementation is available in the boris/c++-modules-ex
  GCC branch. This includes working -fdirectives-only mode.
One encouraging result of implementing the proposed change was the
  relative ease of generalizing the #include directive handling
  code in the GCC preprocessor (libcpp) and module mapper to also
  handle header unit imports.
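To give a feel for what such a generalization can look like in a standalone tool, below is a minimal sketch of a line-based scanner that handles #include directives and header unit imports through one code path. It is not the libcpp code: scan_headers and its helpers are hypothetical names, comments are assumed to be stripped, conditional directives are not evaluated, macro-named headers are skipped rather than expanded, and the trailing `;` of an import is not checked.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Skip spaces and horizontal tabs at the front of `s`.
inline std::string_view skip_blanks(std::string_view s) {
    while (!s.empty() && (s.front() == ' ' || s.front() == '\t'))
        s.remove_prefix(1);
    return s;
}

// If `s` starts with the identifier `word` as a whole token, consume it.
inline bool eat_word(std::string_view& s, std::string_view word) {
    if (s.substr(0, word.size()) != word)
        return false;
    if (s.size() > word.size()) {
        const unsigned char next = static_cast<unsigned char>(s[word.size()]);
        if (std::isalnum(next) || next == '_')
            return false;
    }
    s.remove_prefix(word.size());
    return true;
}

// Return the header name (with delimiters) at the front of `s`, or "" if none.
inline std::string header_name_at(std::string_view s) {
    if (s.empty())
        return {};
    const char close = s.front() == '<' ? '>' : (s.front() == '"' ? '"' : '\0');
    if (close == '\0')
        return {};
    const std::size_t end = s.find(close, 1);
    if (end == std::string_view::npos)
        return {};
    return std::string(s.substr(0, end + 1));
}

// Collect literal header names from `#include` and `[export] import` lines.
inline std::vector<std::string> scan_headers(std::string_view source) {
    std::vector<std::string> deps;
    while (!source.empty()) {
        const std::size_t nl = source.find('\n');
        std::string_view line = source.substr(0, nl);
        source.remove_prefix(nl == std::string_view::npos ? source.size() : nl + 1);

        line = skip_blanks(line);
        bool introduced = false;
        if (!line.empty() && line.front() == '#') {
            line.remove_prefix(1);
            line = skip_blanks(line);
            introduced = eat_word(line, "include");
        } else {
            if (eat_word(line, "export"))
                line = skip_blanks(line);
            introduced = eat_word(line, "import");
        }
        if (!introduced)
            continue;

        // From here on both kinds of line share the same operand handling.
        std::string name = header_name_at(skip_blanks(line));
        if (!name.empty())
            deps.push_back(std::move(name));
    }
    return deps;
}
```

Only the first few tokens of each line are inspected, and lines such as `int x; import <set>; int y;` are never considered, which is what makes this kind of scan possible without macro-expanding every line.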
6.3 Is there usage experience?
Yes, the build2 build
  system implements support for header unit importation relying on this
  functionality.
6.4 What shipping vehicle do you target with this proposal?
The same as C++ Modules, presumably C++20.
7 Acknowledgments
To our knowledge this issue was first discovered and documented (in the GCC manual) by Nathan Sidwell.
Thanks to Nathan Sidwell, Richard Smith, Gabriel Dos Reis, Alex Lorenz, Michael Spencer, Cameron DaCamara, David Stone, and Ben Boeckel for discussions regarding this issue and for feedback on earlier drafts of this paper.