Enumerating Core Undefined Behavior

Draft Proposal,

ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++


Adding an undefined behavior annex to the Standard and creating a C++ undefined behavior TR

1. Introduction

Explaining undefined behavior is complicated. First you need to explain what undefined behavior is, then all the unintuitive consequences it entails, including the removal of safety checks, finite loops turning infinite, booleans that can be both false and true, and how undefined behavior can time travel 🤯

Then comes the next logical question: how can I know what all the undefined behaviors are so I can avoid them? This may be followed by an awkward silence. “That is complicated”, one might say. We might follow up and mention that we have both explicit and implicit undefined behavior. A fair response might be, “Makes sense, but surely you can tell me what all the explicit undefined behaviors are?” This would be followed by more awkward silence, and then perhaps a sheepish, “Well, you see, that is also complicated”.

We would have to follow up and point out that the C++ Standard does indeed list all the explicit undefined behavior, but you would have to manually go through the 1700+ page Standard to find it. Merely finding all the mentions of “undefined” is only partially helpful. The Standard, being a specification and not a tutorial, does not explain each one in plain language, and honestly some defy explanation in plain language. Examples are not always provided, nor do we have rationales or explanations of how to avoid or catch violations of these rules (if possible).

The goal of this paper is twofold. The first goal is to create an annex of undefined behavior. Its purpose would be to list all the explicit core undefined behavior along with at least one example demonstrating each. Having this list will enable the C++ community to better grasp the scope and depth of undefined behavior. It should benefit not just users but also those teaching C++ and those developing tools for writing better code. It will benefit implementors because it lets them know what is undefined and how. It will help the committee track its undefined behavior and revisit it.

The second goal is to develop a core undefined behavior TR, which would expand upon the content of the annex with more examples, including examples showing surprising consequences. It would also include tools, if any, that could aid in detecting or avoiding each undefined behavior. If possible, we would also like to include a rationale for each undefined behavior. The TR would have all the benefits of the annex, and its additional detail and rationale should aid in teaching. It should also help researchers in understanding undefined behavior, developing better tools, and perhaps finding alternative approaches to undefined behavior.

2. Edit History

2.1. r0 → r1

[P1705R0] was seen by SG12 in Cologne. This update does the following:

| Poll | Group | SF | F | N | A | SA | Outcome |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Add an informative annex to the WD which lists core language UB organized by type to be maintained by someone after initial version from author. | SG12 | 8 | 2 | 0 | 0 | 0 | |
| Include implicit (by omission) UB on a best effort basis. | SG12 | 5 | 4 | 1 | 0 | 0 | |
| Target an informative document (TR/SD/?) that contains UB along with rationale and additional information to be maintained by SG12 after initial version from author along the lines of the contents of P1705R0 | SG12 | 3 | 3 | 3 | 1 | 0 | |

3. Goals of Undefined Behavior Annex

4. Benefits of Undefined Behavior Annex

5. Implementation of Undefined Behavior Annex

Annex E will be modeled on Annex C, with a description of each undefined behavior and at least one example. Each Annex E entry will have its own stable name that we can use to refer to the specific undefined behavior unambiguously.

There will be two LaTeX macros: one to link each annex entry back to the normative wording via a reference, and a second to link the normative wording to Annex E via a reference. Having a two-way reference will allow us to maintain Annex E more seamlessly, without having to worry about the normative text moving or being removed without this being reflected in Annex E.

6. Stable Names

Stable names will take the form of ub.SectionName.UniqueName. For example:

7. Goals of Undefined Behavior TR

8. Benefits of Undefined Behavior TR

9. How would the Undefined Behavior TR relate to the Core Guidelines

The C++ Core Guidelines are focused on "relatively high-level issues", which is appropriate for a document that seeks to "help people to use modern C++ effectively". Undefined behavior itself may be one high-level topic and does deserve specific mention in the Core Guidelines. The Undefined Behavior TR would be more focused, drilling into each specific core undefined behavior with details. The undefined behavior TR would therefore inform the Core Guidelines.

10. What about Standard Library Undefined Behavior?

Undefined behavior is a large topic. To make it a more tractable problem, we believe it makes sense to tackle Core undefined behavior separately from Library undefined behavior. Core and Library already have separate processes, and tackling them separately will allow those with expertise in Core or Library to focus on those areas respectively. This proposal focuses specifically on Core; while we acknowledge that documenting Library undefined behavior is important, we leave that to a future proposal.

11. How Might the TR look

There has been some effort to document core undefined behavior, and below I provide an example of one approach to an undefined behavior TR. This work covers most of the explicit core undefined behavior, with at least one example for each undefined behavior. To a lesser extent it covers rationales, background, and tools:

11.1. [lex]

11.1.1. [lex.phases]

11.1.2. [lex.string]

11.2. [basic]

11.2.1. [basic.def.odr]

11.2.2. [basic.life]

11.2.3. [basic.indet]

11.2.4. [basic.start]

11.3. [expr]

11.3.1. [expr.pre]
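
As a hedged illustration of the kind of entry that could appear under this heading (the sketch is not taken from the paper): [expr.pre] makes the behavior undefined when the result of an expression is not in the range of representable values for its type, which covers signed `int` overflow. A check written as `a + b < a` has already evaluated the overflowing addition, so it cannot be relied upon. The hypothetical helper `checked_add` below, which assumes C++17 for `std::optional`, performs the range test before adding.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper (not from the paper): returns std::nullopt instead of
// evaluating an int addition whose result would not be representable.
std::optional<int> checked_add(int a, int b) {
    constexpr int max = std::numeric_limits<int>::max();
    constexpr int min = std::numeric_limits<int>::min();
    // These tests only use subtractions that cannot themselves overflow.
    if (b > 0 && a > max - b) return std::nullopt;  // a + b would exceed max
    if (b < 0 && a < min - b) return std::nullopt;  // a + b would fall below min
    return a + b;
}
```

For instance, `checked_add(std::numeric_limits<int>::max(), 1)` yields an empty optional rather than evaluating the overflowing sum.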

11.3.2. [conv.double]

11.3.3. [conv.fpint]
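
A possible example for this heading, offered as a sketch rather than the paper's own text: [conv.fpint] makes a floating-point to integer conversion undefined when the truncated value cannot be represented in the destination type. The hypothetical helper `checked_to_int` below tests the range before converting. It assumes C++17, a 32-bit `int`, and IEEE 754 `double`, so that both `int` limits are exactly representable as `double`.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper (not from the paper): converts a double to int only
// when the truncated value fits in int.
// Assumption: int is 32 bits wide and double is IEEE 754 binary64, so both
// int limits convert to double exactly.
std::optional<int> checked_to_int(double d) {
    if (std::isnan(d)) return std::nullopt;  // NaN has no integer value
    const double truncated = std::trunc(d);  // the conversion truncates toward zero
    const double lo = static_cast<double>(std::numeric_limits<int>::min());
    const double hi = static_cast<double>(std::numeric_limits<int>::max());
    if (truncated < lo || truncated > hi) return std::nullopt;  // includes +/- infinity
    return static_cast<int>(truncated);
}
```

A call such as `checked_to_int(1e10)` returns an empty optional, whereas a bare `static_cast<int>(1e10)` would fall under this undefined behavior on such a platform.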

11.3.4. [expr.call]

11.3.5. [expr.static.cast]

11.3.6. [expr.delete]

11.3.7. [expr.mptr.oper]

11.3.8. [expr.mul]
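
To illustrate what an entry under this heading might show (a sketch, not the paper's text): [expr.mul] makes `/` and `%` undefined when the second operand is zero, and also when the quotient is not representable in the result type, as with the most negative `int` divided by `-1` on a two's complement implementation. The hypothetical helper `checked_div` below, assuming C++17, rejects both cases before dividing.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper (not from the paper): returns std::nullopt for the two
// operand combinations that make int division undefined.
std::optional<int> checked_div(int a, int b) {
    if (b == 0) return std::nullopt;  // division by zero
    // With two's complement int, min / -1 is not representable. On any other
    // representation this test would merely be conservative.
    if (a == std::numeric_limits<int>::min() && b == -1) return std::nullopt;
    return a / b;  // quotient is representable; truncates toward zero
}
```

The same two guards apply to `%`, since the Standard ties the behavior of `a % b` to that of `a / b`.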

11.3.9. [expr.add]

11.3.10. [expr.shift]
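
One way an example for this heading could look, given as a sketch and not as the paper's text: [expr.shift] makes a shift undefined when the right operand is negative, or greater than or equal to the width of the promoted left operand. The hypothetical helper `checked_shl` below, assuming C++17, validates the shift count first. It takes an `unsigned` left operand to keep the sketch independent of the signed left-shift rules, which differ between Standard revisions.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper (not from the paper): left-shifts only when the count
// lies in [0, width), where width is the number of value bits in unsigned.
std::optional<unsigned> checked_shl(unsigned value, int count) {
    constexpr int width = std::numeric_limits<unsigned>::digits;
    if (count < 0 || count >= width) return std::nullopt;  // undefined shift count
    return value << count;  // well defined: high bits are discarded
}
```

With this helper, `checked_shl(1u, 32)` on a platform with 32-bit `unsigned` returns an empty optional instead of evaluating `1u << 32`.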

11.3.11. [expr.ass]

11.4. [stmt.stmt]

11.4.1. [stmt.return]

11.4.2. [stmt.dcl]

11.5. [dcl.dcl]

11.5.1. [dcl.type.cv]

11.5.2. [dcl.attr.contract.syn]

11.5.3. [dcl.attr.contract.syn]

11.5.4. [dcl.attr.contract.check]

11.5.5. [dcl.attr.noreturn]

11.6. [class]

11.6.1. [class.mfct.non-static]

11.6.2. [class.dtor]

11.6.3. [class.union]

11.6.4. [class.abstract]

11.6.5. [class.base.init]

11.6.6. [class.cdtor]

12. Acknowledgement

Thanks to JF Bastien for his review.


Informative References

[P1705R0] Shafik Yaghmour. Enumerating Core Undefined Behavior. 13 June 2019. URL: https://wg21.link/p1705r0