Doc. no.   WG21/N2062=06-0132
Date:        2006-09-06
Project:     Programming Language C++
Reply to:   Beman Dawes <bdawes@acm.org>

POD's Revisited

Introduction
Features and benefits of POD types
Motivating examples
    std::pair example
    Endian example
    Two structs example
    Atomic example
Coupling between POD's and aggregates
Rationale for changes
Proposed changes to the Working Paper
    POD in the Standard, with changes
Impact on existing code
Interactions with other proposals
Acknowledgements
References

Introduction

This paper proposes resolutions for Core Issue 568, Definition of POD is too strict, submitted by Matt Austern.

The current working paper has several problems with POD's:

Features and benefits of POD types

Features Benefits
Byte copyable guarantees [3.9 2-3, basic.types]
  • Programs can safely apply coding optimizations, particularly std::memcpy.
C layout-compatible guarantees, including byte copyable [9.2 14-17, class.mem]
  • C++ programs can interoperate with functions written in C and other languages.
  • C++ programs can, after considering compiler, alignment, and data type constraints, perform binary I/O such that files interoperate with other languages and platforms.
C code compatibility guarantees, including byte copyable, C layout compatible, and numerous initialization rules
  • C language compatibility.
Static initialization guarantees [3.6.2, basic.start.init]
  • Programs can avoid order-of-initialization issues.
  • Multi-threaded programs can avoid data races during initialization.
Various rules for non-POD's
  • Compilers apply data layout optimizations.
  • Compilers assume non-aliasing, allowing code generation optimizations.

Motivating examples

std::pair example

Matt Austern provided this example:

If a program has two arrays of type std::pair<int,int>, then it is natural to expect that memcpy(A2,A1,sizeof(A2)) would be safe. Programmers have trouble imagining any implementation in which a byte-for-byte copy of std::pair<int,int> wouldn't do the right thing. Unfortunately, that's not what the language standard says. It says that byte-for-byte copies are guaranteed to work only for PODs. std::pair<T,U> isn't a class aggregate, since it has a user-defined constructor, and that means it also isn't a POD.

std::pair has a user-defined constructor essentially for syntactic reasons: because in some cases it looks nicer to write "std::pair<int,int> p(1,2);" than to write "std::pair<int,int> p = {1,2};". It seems a shame that this syntactic change caused the loss of the important semantic property of PODness. It's especially a shame because it means something formally doesn't work when on all real-world implementations it actually does work. It also encourages programmers to rely on undefined behavior, which is something the standard should not encourage.

With the proposed wording, the example pair becomes a POD, solving the issue.
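
The expectation described above can be sketched as follows. This is only an illustration under stated assumptions: `int_pair`, `copy_pairs`, and `demo_copy_pairs` are hypothetical names, and `int_pair` stands in for std::pair<int,int> because library implementations of std::pair differ in detail. The check uses std::is_trivially_copyable from the C++11 `<type_traits>` header, which postdates this paper and is assumed here only as a convenient compile-time test of the byte-copyable property.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for std::pair<int,int>: public data plus
// user-defined constructors added purely for convenient syntax.
struct int_pair {
  int first;
  int second;
  int_pair() : first(0), second(0) {}
  int_pair(int a, int b) : first(a), second(b) {}
};

// Assumption: a C++11 or later compiler. The constructors do not affect
// this property; only copy operations and the destructor matter.
static_assert(std::is_trivially_copyable<int_pair>::value,
              "int_pair is expected to be byte-copyable");

// Copies count elements from src to dst byte for byte.
void copy_pairs(int_pair* dst, const int_pair* src, std::size_t count) {
  std::memcpy(static_cast<void*>(dst), src, count * sizeof(int_pair));
}

// Usage: copy one array into another, as in memcpy(A2, A1, sizeof(A2)).
void demo_copy_pairs() {
  int_pair a1[3] = { int_pair(1, 2), int_pair(3, 4), int_pair(5, 6) };
  int_pair a2[3];
  copy_pairs(a2, a1, 3);
  assert(a2[0].first == 1 && a2[0].second == 2);
  assert(a2[2].first == 5 && a2[2].second == 6);
}
```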

Endian example

Beman Dawes provided this example:

Here is an example of something in development for Boost, based on classes used in industrial applications for many years. The fact that it is a template partial specialization isn't material to this discussion and can be ignored.

template <typename T, std::size_t n_bits>
class endian< big, T, n_bits, unaligned > : cover_operators< endian< big, T, n_bits >, T >
{
  BOOST_STATIC_ASSERT( (n_bits/8)*8 == n_bits );
public:
  typedef T value_type;
  endian() {}
  endian(T i) { detail::store_big_endian<T, n_bits/8>(bytes, i); }
  operator T() const { return detail::load_big_endian<T, n_bits/8>(bytes); }
private:
  char bytes[n_bits/8];
};

But it isn't a POD, so it won't work at all in unions. Some uses such as binary I/O rely on undefined behavior. Since the rationale for having endian is to do binary I/O, forcing the user to rely on undefined behavior is unfortunate to say the least.

Here is what would have to be done to make it a POD:

Remove the constructors. But that makes initialization painful, so boosters are proposing to add an ugly and unintuitive static init function, and an operator= from the value_type. Those are partial workarounds, but not really what the designers, Beman Dawes and Darin Adler, want.

Make the data member public. But this encourages a poor design practice.

Eliminate the base class. But the only way to do that without the highly error-prone duplication of the functions provided by the base class is to introduce a lengthy macro. Enough said.

In other words, making this class a POD under current language rules would do serious damage to interface ease-of-use and to code quality, and would encourage poor design practices. Yet the only data member of the class is an array of char, so programmers intuitively expect the class to be memcpyable and binary I/O-able.

With the proposed wording, the class becomes a POD, solving all the issues.
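
A much-reduced sketch of the same idea may help. It is not the Boost class: `big_u32` and `demo_big_u32` are hypothetical names, the template machinery and base class are omitted, and the compile-time checks rely on C++11 `<type_traits>`, which is an assumption beyond this paper. The sketch shows a class with constructors and a private char array whose byte image can still be moved with std::memcpy, as binary I/O would require.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 32-bit big-endian integer: constructors, a conversion
// operator, and a single private array of bytes.
class big_u32 {
public:
  big_u32() {}
  big_u32(std::uint32_t v) {
    bytes[0] = static_cast<unsigned char>(v >> 24);
    bytes[1] = static_cast<unsigned char>(v >> 16);
    bytes[2] = static_cast<unsigned char>(v >> 8);
    bytes[3] = static_cast<unsigned char>(v);
  }
  operator std::uint32_t() const {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
  }
private:
  unsigned char bytes[4];
};

// Assumption: C++11 traits used as stand-ins for the proposed properties.
static_assert(std::is_trivially_copyable<big_u32>::value,
              "byte-for-byte copies are expected to work");
static_assert(std::is_standard_layout<big_u32>::value,
              "the array is expected to sit at the start of the object");

// Usage: take the byte image, as a binary write would, then restore it.
void demo_big_u32() {
  big_u32 x(0x01020304u);
  unsigned char wire[sizeof(big_u32)];
  std::memcpy(wire, &x, sizeof x);
  assert(wire[0] == 0x01 && wire[3] == 0x04);  // most significant byte first

  big_u32 y;
  std::memcpy(static_cast<void*>(&y), wire, sizeof y);
  assert(static_cast<std::uint32_t>(y) == 0x01020304u);
}
```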

Two structs example

Matt Austern provided this example in Core DR 568:

It’s silly for the standard to make layout and memcpy guarantees for this class:

struct A {
  int n;
};

but not for this one:

struct B {
  int n;
  B(int n_) : n(n_) { }
};

With either A or B, it ought to be possible to save an array of those objects to disk with a single call to Unix’s write(2) system call or the equivalent. At present the standard says that it’s legal for A but not B, and there isn’t any good reason for that distinction.

With the proposed wording, B becomes a POD just as A is, removing the distinction.
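
The following sketch illustrates the point under stated assumptions. `save_bytes` and `demo_save_bytes` are hypothetical helpers; `save_bytes` stands in for a single write(2) call by returning the byte image of an array instead of writing it to a file. The C++11 traits std::is_trivially_copyable and std::is_standard_layout are assumed as approximations of the copy and layout guarantees discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct A {
  int n;
};

struct B {
  int n;
  B(int n_) : n(n_) { }
};

// Assumption: C++11 or later. Both classes pass the same checks.
static_assert(std::is_trivially_copyable<A>::value &&
              std::is_standard_layout<A>::value, "A");
static_assert(std::is_trivially_copyable<B>::value &&
              std::is_standard_layout<B>::value, "B");

// Hypothetical stand-in for one write(2) call: returns the raw bytes
// of an array of count objects.
template <typename T>
std::vector<unsigned char> save_bytes(const T* objs, std::size_t count) {
  std::vector<unsigned char> out(count * sizeof(T));
  if (count != 0) std::memcpy(out.data(), objs, out.size());
  return out;
}

// Usage: save an array of B, then restore it from the saved bytes.
void demo_save_bytes() {
  B arr[2] = { B(7), B(9) };
  std::vector<unsigned char> bytes = save_bytes(arr, 2);
  assert(bytes.size() == 2 * sizeof(B));

  B back[2] = { B(0), B(0) };
  std::memcpy(static_cast<void*>(back), bytes.data(), bytes.size());
  assert(back[0].n == 7 && back[1].n == 9);
}
```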

Atomic example

Lawrence Crowl provided this example.

Consider a class providing atomic operations. Among other requirements, it should:

For best C++ coding practice, the data should be private. But that would make the class a non-POD under current rules. Under the proposed rules, it is allowable for the data members to be private, as long as all are private.

Under both the current and proposed rules, there doesn't seem to be any way to make a POD non-copyable.
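
The access-control rule can be illustrated with a sketch. The classes below are hypothetical and do not implement atomic operations; they only contrast a class whose data members are all private with one that mixes access levels. The C++11 trait std::is_standard_layout, assumed here as an approximation of the layout part of the proposed rules, also requires that all non-static data members have the same access control.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class with all data private, as good practice suggests.
class counter_cell {
public:
  counter_cell() : value_(0) {}
  int load() const { return value_; }
  void store(int v) { value_ = v; }
private:
  int value_;
};

// Hypothetical class mixing public and private data members.
class mixed_access {
public:
  mixed_access() : a(0), b_(0) {}
  int a;
  int b() const { return b_; }
private:
  int b_;
};

// Assumption: C++11 or later.
static_assert(std::is_standard_layout<counter_cell>::value,
              "all-private data keeps the layout guarantees");
static_assert(std::is_trivially_copyable<counter_cell>::value,
              "private data does not affect byte-copyability");
static_assert(!std::is_standard_layout<mixed_access>::value,
              "mixed access control loses the layout guarantees");

// Usage: the class is still copyable, which the text above notes
// cannot be prevented for a POD.
void demo_counter_cell() {
  counter_cell c;
  c.store(42);
  counter_cell copy = c;
  assert(copy.load() == 42);
}
```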

Coupling between POD's and aggregates

POD's provide object representation guarantees, layout-compatibility guarantees, memory contiguity guarantees, and memory copy-ability guarantees for fairly simple types, yet leave compilers much latitude in such matters for more complicated types.

Aggregates provide well-defined initialization from initializer-clauses.

The two concepts are at most tangential, if not completely orthogonal. Thus to define POD in terms of aggregates creates an unnecessary and confusing dependency. It makes otherwise straightforward changes to the Standard POD and aggregate sections much more difficult because of the need to analyze a potential change for impact on both POD's and aggregates. The coupling is confusing to users, causing them to make mistaken assumptions about POD's. The coupling may be part of the reason even committee members cannot accurately remember the full rules for POD-ness.
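
The independence of the two concepts can be sketched with two hypothetical classes, one that is an aggregate but not byte-copyable and one that is byte-copyable but not an aggregate. The sketch assumes a C++17 compiler for std::is_aggregate, and uses std::is_trivially_copyable as a stand-in for the byte-copyable property; neither trait existed when this paper was written.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Aggregate: brace initialization works, but the std::string member
// means a byte-for-byte copy is not safe.
struct named_value {
  std::string name;
  int value;
};

// Not an aggregate because of the user-defined constructor, yet a
// byte-for-byte copy is safe.
struct wrapped_int {
  int n;
  wrapped_int(int n_) : n(n_) {}
};

// Assumption: C++17 or later for std::is_aggregate.
static_assert(std::is_aggregate<named_value>::value, "aggregate");
static_assert(!std::is_trivially_copyable<named_value>::value,
              "not byte-copyable");
static_assert(!std::is_aggregate<wrapped_int>::value, "not an aggregate");
static_assert(std::is_trivially_copyable<wrapped_int>::value,
              "byte-copyable");

// Usage: each class is initialized in the style it supports.
void demo_coupling() {
  named_value v = { "answer", 42 };  // aggregate initialization
  wrapped_int w(42);                 // constructor call
  assert(v.value == w.n);
}
```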

Rationale for changes

The proposed changes separate the byte-copyability requirement from the larger POD requirements. The definition of POD no longer depends on the definition of aggregate. Instead, additional POD requirements are tailored to the needs of POD's. Because these requirements are somewhat less restrictive than the requirements for aggregates, the effect is to make POD's more broadly useful and solve the problems identified in the Introduction and Motivating examples.

Changes are not proposed that would allow POD's to be non-copyable. There was no apparent way to provide syntax for this without more complexity than is justified by a need judged to be fairly minor.

Changes are not proposed that would allow POD's to have base classes with non-static data members. There was no apparent way to allow these cases without putting undue restrictions on how compilers lay out base class data in relation to derived class data.
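
The base class restriction can be sketched as follows. All class names are hypothetical. The check uses the C++11 trait std::is_standard_layout, which is an assumption beyond this paper and only approximates the proposed rule: it also rejects a class when both the base and the derived class contribute data, but it is more permissive than the proposal in some other cases.

```cpp
#include <cassert>
#include <type_traits>

// Base class supplying only functions, in the spirit of cover_operators.
struct empty_ops {
  int twice(int x) const { return 2 * x; }
};

// Data only in the derived class: layout guarantees are retained.
struct with_empty_base : empty_ops {
  int n;
};

// Data in both base and derived class: the compiler keeps latitude in
// how the two parts are laid out relative to each other.
struct data_base {
  int a;
};
struct with_data_base : data_base {
  int b;
};

// Assumption: C++11 or later.
static_assert(std::is_standard_layout<with_empty_base>::value,
              "a base class without data keeps the layout guarantees");
static_assert(!std::is_standard_layout<with_data_base>::value,
              "data in base and derived loses the layout guarantees");
static_assert(std::is_trivially_copyable<with_data_base>::value,
              "byte-copyability is a separate property");

// Usage: the function-only base class still does its job.
void demo_base_classes() {
  with_empty_base w;
  w.n = 21;
  assert(w.twice(w.n) == 42);
}
```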

Proposed changes to the Working Paper

Added text is shown in green and underlined. Deleted text is shown in red with strikethrough.

Proposed text:

Change 9 [class] paragraph 4 as indicated:

A structure is a class defined with the class-key struct; its members and base classes (clause 10) are public by default (clause 11). A union is a class defined with the class-key union; its members are public by default and it holds only one data member at a time (9.5). [ Note: aggregates of class type are described in 8.5.1. —end note ]  A byte-copyable-class is a class that has a trivial copy constructor (12.8), a trivial copy assignment operator (13.5.3, 12.8), and a trivial destructor (12.4). [Note: Among other requirements, that precludes virtual functions, virtual bases, and members or bases with non-trivial copy constructors, copy assignments, or destructors. --end note]

A POD-struct is an aggregate a byte-copyable class that:

— has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and
— all non-static data members have the same access control (clause 11), and
— has no non-POD base classes, and no base classes with data members.

Similarly, a POD-union is an aggregate a union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. A POD class is a class that is either a POD-struct or a POD-union. [Note: virtual functions and base classes are prohibited in unions (9.5). -- end note.]

Change other WP text as indicated in the POD in the Standard table below.
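
The byte-copyable-class definition proposed above can be approximated in code. This is a sketch, not part of the proposed wording: `is_byte_copyable` is a hypothetical trait composed from three C++11 `<type_traits>` templates, which postdate this paper and may differ from the proposed wording in corner cases. The sample classes are likewise hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical trait: trivial copy constructor, trivial copy assignment
// operator, and trivial destructor, following the proposed definition.
template <typename T>
struct is_byte_copyable
    : std::integral_constant<bool,
          std::is_trivially_copy_constructible<T>::value &&
          std::is_trivially_copy_assignable<T>::value &&
          std::is_trivially_destructible<T>::value> {};

// A user-defined constructor other than a copy constructor is allowed.
struct plain {
  int n;
  plain(int n_) : n(n_) {}
};

// Virtual functions make the copy operations non-trivial.
struct with_virtual {
  virtual ~with_virtual() {}
  virtual int id() const { return 0; }
};

// A member with non-trivial copy operations does the same.
struct with_string {
  std::string s;
};

// Usage: query the trait for each sample class.
void demo_is_byte_copyable() {
  assert(is_byte_copyable<plain>::value);
  assert(!is_byte_copyable<with_virtual>::value);
  assert(!is_byte_copyable<with_string>::value);
  assert(is_byte_copyable<int>::value);
}
```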

POD in the Standard, with changes

The following table lists uses of POD in the current working paper, with proposed changes.

Working Paper Text Proposal
1.8 5 [intro.object]

An object of POD5) type (3.9) shall occupy contiguous bytes of storage.

No change
3.6.2 ¶1 Initialization of non-local objects

Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. A reference with static storage duration and an object of POD type with static storage duration can be initialized with a constant expression (5.19); this is called constant initialization. Together, zero-initialization and constant initialization are called static initialization; all other initialization is dynamic initialization.

No change
3.8 ¶2 Object Lifetime

 [ Note: the lifetime of an array object or of an object of POD type (3.9) starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array or object occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. —end note ]

No change
3.8 ¶5 Object Lifetime

Restrictions on pointers to partially constructed non-POD types.

No change
3.8 ¶6 Object Lifetime

Restrictions on l-values of partially constructed non-POD types.

No change
3.9 ¶2 Types

For any object (other than a base-class subobject) of POD byte-copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.41) If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

Change as indicated
3.9 ¶3 Types

For any POD byte-copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the value of obj1 is copied into obj2, using the std::memcpy library function, obj2 shall subsequently hold the same value as obj1.

Change as indicated
3.9 ¶4 Types

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For POD byte-copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.42)

Change as indicated
3.9 ¶10 Types

Arithmetic types (3.9.1), enumeration types, pointer types, and pointer to member types (3.9.2), and cv-qualified versions of these types (3.9.3) are collectively called scalar types. Scalar types, POD-struct types, POD-union types (clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called POD types.

No change
3.9 ¶11 Types

If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. [ Note: Layout-compatible enumerations are described in 7.2. Layout-compatible POD-structs and POD-unions are described in 9.2. —end note ]

No change
5.2 ¶7 Postfix expressions

When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg (18.8). The lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the argument expression. After these conversions, if the argument does not have arithmetic, enumeration, pointer, pointer to member, or class type, the program is ill-formed. If the argument has a non-POD class type (clause 9), the behavior is undefined. If the argument has integral or enumeration type that is subject to the integral promotions (4.5), or a floating point type that is subject to the floating point promotion (4.6), the value of the argument is converted to the promoted type before the call. These promotions are referred to as the default argument promotions.

No change
5.3.4 ¶16 New

A new-expression that creates an object of type T initializes that object as follows:
— If the new-initializer is omitted:
    — If T is a (possibly cv-qualified) non-POD class type (or array thereof), the object is default-initialized (8.5). If T is a const-qualified type, the underlying class type shall have a user-declared default constructor.
    — Otherwise, the object created has indeterminate value. If T is a const-qualified type, or a (possibly cv-qualified) POD class type (or array thereof) containing (directly or indirectly) a member of const-qualified type, the program is ill-formed;
— If the new-initializer is of the form (), the item is value-initialized (8.5);
— If the new-initializer is of the form (expression-list) and T is a class type, the appropriate constructor is called, using expression-list as the arguments (8.5);
— If the new-initializer is of the form (expression-list) and T is an arithmetic, enumeration, pointer, or pointer-to-member type and expression-list comprises exactly one expression, then the object is initialized to the (possibly converted) value of the expression (8.5);
— Otherwise the new-expression is ill-formed.

No change
5.19 ¶4 Constant expressions

An address constant expression is a pointer to an lvalue designating an object of static storage duration, a string literal (2.13.4), or a function. The pointer shall be created explicitly, using the unary & operator, or implicitly using a non-type template parameter of pointer type, or using an expression of array (4.2) or function (4.3) type. The subscripting operator [] and the class member access . and -> operators, the & and * unary operators, and pointer casts (except dynamic_casts, 5.2.7) can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An expression that designates the address of a subobject of a non-POD class object (clause 9) is not an address constant expression (12.7). Function calls shall not be used in an address constant expression, even if the function is inline and has a reference return type.

No change
5.19 ¶5 Constant expressions

A reference constant expression is an lvalue designating an object of static storage duration, a non-type template parameter of reference type, or a function. The subscripting operator [], the class member access . and -> operators, the & and * unary operators, and reference casts (except those invoking user-defined conversion functions (12.3.2) and except dynamic_casts (5.2.7)) can be used in the creation of a reference constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression. An lvalue expression that designates a member or base class of a non-POD class object (clause 9) is not a reference constant expression (12.7). Function calls shall not be used in a reference constant expression, even if the function is inline and has a reference return type.

No change
6.7 ¶3 Declaration statement

It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps82) from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5).

No change
6.7 ¶4 Declaration statement

The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of POD type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined.

No change
8.5 ¶5 Initializers

To default-initialize an object of type T means:
— if T is a non-POD class type (clause 9), the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
— if T is an array type, each element is default-initialized;
— otherwise, the object is zero-initialized.

No change
8.5 ¶9 Initializers

If no initializer is specified for an object, and the object is of (possibly cv-qualified) non-POD class type (or array thereof), the object shall be default-initialized; if the object is of const-qualified type, the underlying class type shall have a user-declared default constructor. Otherwise, if no initializer is specified for a non-static object, the object and its subobjects, if any, have an indeterminate initial value97); if the object or any of its subobjects are of const-qualified type, the program is ill-formed.

No change
8.5 ¶14 Initializers

When an aggregate with static storage duration is initialized with a brace-enclosed initializer-list, if all the member initializer expressions are constant expressions, and the aggregate is a POD type, the initialization shall be done during the static phase of initialization (3.6.2); otherwise, it is unspecified whether the initialization of members with constant expressions takes place during the static phase or during the dynamic phase of initialization.

No change
9 ¶4 Classes [class]

A structure is a class defined with the class-key struct; its members and base classes (clause 10) are public by default (clause 11). A union is a class defined with the class-key union; its members are public by default and it holds only one data member at a time (9.5). [ Note: aggregates of class type are described in 8.5.1. —end note ] A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-declared copy assignment operator and no user-declared destructor. A POD class is a class that is either a POD-struct or a POD-union.

See proposed change above.
9.2 ¶15-18 Class members [class.mem]

15 Two POD-struct (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in order) have layout-compatible types (3.9).

16 Two POD-union (clause 9) types are layout-compatible if they have the same number of non-static data members, and corresponding non-static data members (in any order) have layout-compatible types (3.9).

17 If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

18 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

No change
9.5 ¶1 Unions

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time. [ Note: one special guarantee is made in order to simplify the use of unions: If a POD-union contains several POD-structs that share a common initial sequence (9.2), and if an object of this POD-union type contains one of the POD-structs, it is permitted to inspect the common initial sequence of any of POD-struct members; see 9.2. —end note ] The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct. A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a class with a non-trivial default constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects. If a union contains a static data member, or a member of reference type, the program is ill-formed.

No change
12.6.2 ¶4 Initializing bases and members

If a given non-static data member or base class is not named by a mem-initializer-id (including the case where there is no mem-initializer-list because the constructor has no ctor-initializer), then

— If the entity is a non-static data member of (possibly cv-qualified) class type (or array thereof) or a base class, and the entity class is a non-POD class the entity is default-initialized (8.5). If the entity is a non-static data member of a const-qualified type, the entity class shall have a user-declared default constructor.

— Otherwise, the entity is not initialized. If the entity is of const-qualified type or reference type, or of a (possibly cv-qualified) POD class type (or array thereof) containing (directly or indirectly) a member of a const-qualified type, the program is ill-formed.

After the call to a constructor for class X has completed, if a member of X is neither specified in the constructor’s mem-initializers, nor default-initialized, nor value-initialized, nor given a value during execution of the body of the constructor, the member has indeterminate value.

No change
12.7 ¶1 Construction and destruction

For an object of non-POD class type (clause 9) before the constructor begins execution and after the destructor finishes execution, referring to any non-static member or base class of the object results in undefined behavior. [ Example:

struct X { int i; };                  // POD
struct Y : X { };                     // non-POD
struct A { int a; };                  // POD
struct B : public A { int j; Y y; };  // non-POD

extern B bobj;
B* pb = &bobj;         // OK
int* p1 = &bobj.a;     // undefined, refers to base class member
int* p2 = &bobj.y.i;   // undefined, refers to member’s member

A* pa = &bobj;         // undefined, upcast to a base class type
B bobj;                // definition of bobj

extern X xobj;
int* p3 = &xobj.i;     //OK, X is a POD class
X xobj;
Change as indicated
17.1.3 character container type

a class or a type used to represent a character (17.1.2). It is used for one of the template parameters of the string and iostream class templates. A character container class shall be a POD (3.9) type.

No change. Users expect characters involved in I/O to be C-layout-compatible, and thus POD types.
18.1 ¶4 Types

The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a POD structure or a POD union (clause 9), the results are undefined.189) The expression offsetof(type, member-designator) is never type-dependent (14.6.2.2) and it is value-dependent (14.6.2.3) if and only if type is dependent. The result of applying the offsetof macro to a field that is a static data member or a function member is undefined.

No change
20.4 Type traits has many uses of POD in the specification of is_pod. Most of those uses clearly will remain unchanged. TODO: uses in other type traits need to be reviewed.
21 ¶1 Strings library

This clause describes components for manipulating sequences of “characters,” where characters may be of any POD (3.9) type. In this clause such types are called char-like types, and objects of char-like types are called char-like objects or simply “characters.”

No change. Users expect c_str() and data() to return pointers to C-layout-compatible, and thus POD types.
25.4 ¶4 C library algorithms

The function signature:

qsort(void *, size_t, size_t, int (*)(const void *, const void *));

is replaced by the two declarations:

extern "C" void qsort(void* base , size_t nmemb , size_t size, int (*compar )(const void*, const void*));

extern "C++" void qsort(void* base , size_t nmemb , size_t size, int (*compar )(const void*, const void*));

both of which have the same behavior as the original declaration. The behavior is undefined unless the objects in the array pointed to by base are of POD byte-copyable type.

Change as indicated
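
The qsort row above can be illustrated with a sketch. `record`, `compare_records`, `sort_records`, and `demo_sort_records` are hypothetical names. The class has a user-defined constructor, so it is not a POD under the current rules, but it would satisfy the proposed byte-copyable requirement; std::is_trivially_copyable from C++11 is assumed as a stand-in for that requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical element type with a user-defined constructor.
struct record {
  int key;
  record(int k) : key(k) {}
};

// Assumption: C++11 or later.
static_assert(std::is_trivially_copyable<record>::value,
              "qsort moves elements byte for byte");

// Comparison function in the form qsort expects.
int compare_records(const void* a, const void* b) {
  const record* x = static_cast<const record*>(a);
  const record* y = static_cast<const record*>(b);
  return (x->key > y->key) - (x->key < y->key);
}

// Sorts count records in ascending key order.
void sort_records(record* recs, std::size_t count) {
  std::qsort(recs, count, sizeof(record), compare_records);
}

// Usage: sort a small array.
void demo_sort_records() {
  record recs[3] = { record(3), record(1), record(2) };
  sort_records(recs, 3);
  assert(recs[0].key == 1 && recs[1].key == 2 && recs[2].key == 3);
}
```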

Impact on existing code

The proposed changes will cause some existing non-POD's to become POD's. This may result in less optimization being performed. The problem can be eliminated by adding a user-defined do-nothing destructor.

Adding a user-defined do-nothing destructor to existing code to leave POD-ness unchanged is simple enough that it could be done programmatically. If a compiler vendor felt this was a serious concern for their user-base, they might wish to provide such a program. Alternately, compilers may wish to issue warnings during a transition period if the new rules change a non-POD into a POD.
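
The workaround can be sketched as follows, with hypothetical class names. The C++11 traits used for the checks are an assumption beyond this paper; they show that a user-defined destructor, even an empty one, is not trivial and therefore keeps a class out of the byte-copyable category.

```cpp
#include <cassert>
#include <type_traits>

// Would become a POD under the proposed rules.
struct becomes_pod {
  int n;
  becomes_pod(int n_) : n(n_) {}
};

// Same class with a do-nothing destructor added: remains a non-POD.
struct stays_non_pod {
  int n;
  stays_non_pod(int n_) : n(n_) {}
  ~stays_non_pod() {}
};

// Assumption: C++11 or later.
static_assert(std::is_trivially_destructible<becomes_pod>::value,
              "implicit destructor is trivial");
static_assert(!std::is_trivially_destructible<stays_non_pod>::value,
              "user-defined destructor is never trivial");
static_assert(!std::is_trivially_copyable<stays_non_pod>::value,
              "so the class is not byte-copyable");

// Usage: ordinary behavior of the class is unchanged by the destructor.
void demo_do_nothing_destructor() {
  stays_non_pod s(5);
  stays_non_pod t = s;
  assert(t.n == 5);
}
```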

Interactions with other proposals

See N1824, Extending Aggregate Initialization. Whichever proposal is accepted first, the other will have to be reviewed, and possibly revised, accordingly.

Acknowledgements

Matt Austern, Greg Colvin, Alisdair Meredith, and Clark Nelson provided helpful comments during preparation of this proposal. Our cat Jane woke me up in the middle of the night, provoking this proposal as an alternative to counting sheep (or cats).

References

N1824 Extending Aggregate Initialization, Alisdair Meredith,  www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1824.htm

Core issue 568. www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#568