Name
	n3633, alx-0068r2 - C source files are text files

Principles
	-  Keep the language small and simple.
	-  Codify existing practice to address evident deficiencies.
	-  Follow international standards.

Category
	Attributes

Author
	Reported-by: Martin Uecker
	Suggested-by: Alejandro Colomar
	Reported-by: Joseph Myers
	Suggested-by: Marcus Johnson
	Suggested-by: Aaron Peter Bachmann
	Suggested-by: Christopher Bazley
	Cc: Martin Uecker
	Cc: John McCall
	Signed-off-by: Alejandro Colomar

History
	r0 (2025-09-19):
	-  Initial draft.

	r1 (2025-12-02; n3758):
	-  Remove line.

	r2 (2025-12-09; n3633):
	-  Allow empty files, as POSIX does.
	-  Rebase on n3685.

Description
	One could think that this paper acts in bad faith, by trying to
	revise a very recent decision taken by strong consensus of WG14.
	Oh, hilarity!  No, wake me from this bad dream!

	Just kidding.  Let's go straight to the proposal.

	So, WG14 decided, by strong consensus, to remove some weird
	undefined behavior, just for the sake of saying they've removed
	one undefined behavior.  Did WG14 consider the consequences?
	In retrospect, it seems not.

	It is left as an exercise to the reader to judge whether removing
	undefined behavior without enough analysis is a good or a bad
	thing, and how the committee should make sure this doesn't happen
	often in the future.

	If only anyone had warned that POSIX already requires text files
	as input to most tools, for good reasons.  And that defining the
	behavior could cause second-order bugs, even if just through
	mistakes in implementations, which would need to be unnecessarily
	more complex to deal with non-text input.  Too bad that nobody
	raised such concerns.  Or did they?

	But to rectify is for the wise.  The committee, or part of it,
	seems to have realized the mistake, and that a constraint
	violation would have been a better choice.  Let's forgive those
	who rectify their own mistakes.

	Since this is mostly a theoretical UB, and no hard drives were
	damaged due to it, let's constrain it.
	(Low) quality implementations are free to define their behavior
	after the mandatory diagnostic, and are even free to hide that
	diagnostic behind a flag, such as -Wpedantic, that turns on
	conforming diagnostics.  So, this is not even a problem for users
	that need the behavior defined, if they really exist.  They'll go
	to their vendors in the black market of extensions, and make an
	appropriate deal with them.

	Let's fix the standard, and keep it simple.

Prior art
	POSIX requires that C source files are text files, which
	essentially means that, unless they are empty, they are
	terminated by a newline character.  (It also means that they
	don't contain NUL characters, and that no line exceeds LINE_MAX,
	but let's ignore those details for this discussion.)

	This is not a requirement specific to C source files.  POSIX
	requires text files as input to almost every command.  (With the
	obvious exceptions of those commands that are not meant to handle
	text.)

	Let's follow POSIX, and require that C source files are text
	files.  A source file with 0 bytes remains allowed: POSIX defines
	a text file as zero or more lines, so an empty file is a text
	file.

See also
	This proposal reverts course from n3411.  However, instead of
	bringing back the UB, it turns it into a constraint violation.

Proposed wording
	Based on N3685.

    6.4.1  Lexical elements :: General
	@@ Constraints, p2+1
	+A source file
	+that is not empty
	+shall end in a new-line character,
	+which shall not be immediately preceded
	+by a backslash character
	+before splicing takes place as described in 5.2.1.2.
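
	As an illustration only, the proposed constraint could be checked
	along the following lines.  This is a minimal sketch, not taken
	from any implementation: the name src_ends_properly() is
	hypothetical, and the sketch assumes that the buffer already
	holds the source characters after any end-of-line mapping done in
	translation phase 1, and before line splicing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the proposal).
 *
 * Returns true if the len bytes at buf satisfy the proposed
 * constraint: the file is empty, or it ends in a new-line that is
 * not immediately preceded by a backslash.
 */
bool
src_ends_properly(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len == 0)
		return true;   /* Empty files are allowed. */
	if (buf[len - 1] != '\n')
		return false;  /* Missing final new-line. */
	if (len >= 2 && buf[len - 2] == '\\')
		return false;  /* Final new-line would be spliced away. */

	return true;
}
```

	With this sketch, an empty buffer and "int x;\n" pass, while
	"int x;" (no final new-line) and a file ending in a backslash
	followed by a new-line would each require a diagnostic.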