Doc. no. 02-0014=N1356
Date: 12 March 2002
Project: Programming Language C++
Reply to: RWGrosse-Kunstleve@lbl.gov

Predictable data layout for certain non-POD types

By R.W. Grosse-Kunstleve & D. Abrahams

It would be nice if every kind of numeric software could be written in C++ without loss of efficiency, but unless something can be found that achieves this without compromising the C++ type system it may be preferable to rely on Fortran, assembler or architecture-specific extensions (Bjarne Stroustrup).

Problem

When combining multiple programming languages as suggested by B. Stroustrup it is essential that the data layout of the types that are to be used from two or more languages is precisely defined. We will also show that a predictable layout is a necessary prerequisite to enabling important optimizations when working within C++. According to ISO/IEC 14882:1998, the data layout of a user-defined type in C++ is predictable only if the type is POD. The conditions which cause a type to become non-POD are very liberal. In particular, the presence of any user-declared constructor is not compatible with the requirements of a POD struct.

Example 1: std::complex<T>

1(a): Language Interoperability

Since std::complex<T> includes several constructors, it is not POD and the standard doesn't define how its real and imaginary parts are stored. This inhibits portable use of a large number of FORTRAN and C libraries which manipulate complex numbers (e.g. FFTW, FFTPACK, BLAS), both as object libraries and as source code ported to C++. In the latter case a complete rewrite would be required for many libraries to achieve full portability (e.g. FFTPACK), and in some cases the C++ version could not portably achieve similar performance. Internally to these libraries, arrays of complex numbers are commonly treated as arrays of real numbers (reference: cctbx.sf.net, module fftbx).

The C99 standard includes a reserved keyword _Complex for a family of types implementing complex numbers. The data layout is precisely defined in 6.2.5/13 of ISO/IEC 9899:1999:

    Each complex type has the same representation and alignment requirements as an array type containing exactly two elements of the corresponding real type; the first element is equal to the real part, and the second element to the imaginary part, of the complex number.

The FORTRAN standard defines the same data layout for the COMPLEX type. In contrast, the definition of ISO/IEC 14882:1998 is much less specific (26.2.2/1):

    The class complex describes an object that can store the Cartesian components, real() and imag(), of a complex number.

The current standard does not define storage and alignment requirements. Some have claimed that the internal representation of complex values can be arbitrarily transformed. For example, some people interpret the standard as saying a polar internal representation might be legal. To our knowledge, the data layout of all current implementations of std::complex<T> is actually compatible with C99 and FORTRAN. However, as it stands, C++ and C99 or C++ and FORTRAN programs cannot be interfaced portably because of the liberal definition 26.2.2/1 in ISO/IEC 14882:1998.
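The compatibility observed in current implementations can be probed at run time. The following is a minimal sketch, not part of the proposal: the function name layout_matches_c99 is hypothetical, and a positive result only describes the implementation it was run on, not a guarantee made by ISO/IEC 14882:1998.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if, on this implementation, an
// array of std::complex<double> can be read as interleaved
// (real, imaginary) doubles, i.e. the C99 / FORTRAN layout.
bool layout_matches_c99()
{
    // A complex value must occupy exactly two doubles.
    if (sizeof(std::complex<double>) != 2 * sizeof(double)) return false;

    std::vector<std::complex<double> > v;
    v.push_back(std::complex<double>(1.0, 2.0));
    v.push_back(std::complex<double>(3.0, 4.0));

    // View the same storage as a plain array of doubles.
    const double* p = reinterpret_cast<const double*>(&v[0]);
    return p[0] == 1.0 && p[1] == 2.0   // real, imag of v[0]
        && p[2] == 3.0 && p[3] == 4.0;  // real, imag of v[1]
}
```

A program that passes complex arrays to a C or FORTRAN library could call such a check once at start-up and refuse to continue if it fails.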

1(b): Optimization considerations

To facilitate the discussion, we will use a highly simplified outline of one of the most important algorithms in numerical applications: an in-place real-to-complex Fast Fourier Transform (FFT).

std::vector<double> vec;
// fill vec
std::complex<double>*
result = fft_real_to_complex(&*vec.begin(), vec.size());

std::complex<double>*
fft_real_to_complex(double* seq, std::size_t n)
{
  std::complex<double>*
  result = reinterpret_cast<std::complex<double>*>(seq);
  // Do the transform. In the process the array of real
  // values will become an array of complex values.
  return result;
}

To be able to do the transform truly in place (i.e., without copying an entire array at some point in the algorithm) it is essential that either (a) the data layout of std::complex<T> is predictable or (b) the real and imaginary parts of the complex values are directly accessible, such as through references. ISO/IEC 14882:1998 provides neither of these prerequisites.

A predictable data layout or direct access through references is also a prerequisite to enabling essential speed optimizations, even for complex-to-complex transforms. Example: Any of the automatically generated codelets in FFTW, such as ftw_4.c.
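To illustrate why such code depends on the layout, here is a minimal sketch of a size-2 butterfly that works directly on interleaved real and imaginary parts. It is not taken from FFTW; the function names are hypothetical, and the reinterpret_cast assumes the C99-compatible layout discussed in 1(a).

```cpp
#include <cassert>
#include <complex>

// Size-2 DFT applied in place to two complex values stored as
// interleaved doubles: (re0, im0, re1, im1).
void butterfly2_interleaved(double* d)
{
    const double re0 = d[0], im0 = d[1];
    const double re1 = d[2], im1 = d[3];
    d[0] = re0 + re1;  d[1] = im0 + im1;  // sum
    d[2] = re0 - re1;  d[3] = im0 - im1;  // difference
}

// Same operation on an array of two std::complex<double>.
// ASSUMPTION: std::complex<double> is laid out as double[2].
void butterfly2(std::complex<double>* z)
{
    butterfly2_interleaved(reinterpret_cast<double*>(z));
}
```

Working on the raw doubles avoids constructing temporary complex objects and lets the same kernel serve C, FORTRAN and C++ callers.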


Example 2: Interfacing Python and C++

Python is a dynamically typed, object-oriented, interpreted language and therefore a powerful complement for the statically-typed, compiled C++ language. The most popular implementation of the Python programming language is written in ANSI C89. David Abrahams has been implementing a system for the integration of C-Python and C++ (reference: www.boost.org, module Boost.Python).

In the Python 'C' API, all objects are manipulated through pointers to a "base" struct PyObject. The layout of every Python object which participates in Python's cyclic garbage collection begins with the layout of a PyObject. The PyObject contains a reference count and what is for all intents and purposes a vtable. This arrangement provides a crude form of object-orientation in 'C' and the basic idioms have been repeated in the implementations of countless languages and systems.

The 'C' programmer wishing to implement a new object type in Python has the opportunity to employ two of the language's most-beloved features, macros and 'C'-style casts:

struct MyObject
{
    PyObject_HEAD   // MACRO providing the members of PyObject
    T1 additional_data_1;
    T2 additional_data_2;
};

// Return a Python string representing MyObject
PyObject* MyObject_print(PyObject* o)
{
    MyObject* x = (MyObject*)o; // downcast
    ...
}

// "vtbl"
PyTypeObject MyType = {
    ...
    MyObject_print,
    ...
};

// Creation function
PyObject* MyObject_new()
{
    // MACRO invocation which allocates memory and initializes
    MyObject* result = PyObject_New(MyObject, &MyType);
    ...more initialization...
    return (PyObject*)result;
}

In keeping with the design intention that C++ is "a better C", consider how we might solve this problem in C++. Obviously, we'd use inheritance to eliminate macros and casting as much as possible. We'd add constructors for MyObject and PyObject to eliminate the need for initialization in MyObject_new(). We'd use real virtual functions instead of an ad-hoc PyTypeObject filled with functions using the 'C' calling convention.

Unfortunately, the rest of Python is still written in 'C', so we really can't expect to replace the PyTypeObject with real virtual functions here. However, we are tantalizingly close to being able to do very much better than shown above in C++:

// Base object for all Python extension types
struct PyBaseObject : PyObject
{
    // initializes refcount and vtbl
    PyBaseObject(PyTypeObject const&);
    // allocates in Python's special GC area
    void* operator new(std::size_t n);
};

extern "C" PyObject* MyObject_print(PyObject* o) {
    MyObject* x = static_cast<MyObject*>(o); // downcast
    ...
}

PyTypeObject MyType = {
    ...
    MyObject_print,
    ...
};

struct MyObject : PyBaseObject
{
    MyObject() : PyBaseObject(MyType) {...}
};

// Just use operator new for allocation

Though the above works on every C++ implementation we know of, it relies on an assumption which is technically non-portable: that base classes in non-virtual inheritance hierarchies are laid out as though they were the first data members of a class. The assumption is invalid because the classes involved are non-POD: they have both base classes and constructors. In the absence of such a guarantee, or a way to achieve it, the C++ programmer is exposed to most of the same dangers as the 'C' programmer when interfacing to many 'C' systems, and to Python in particular.
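The layout assumption can be made explicit with a small check. The sketch below is illustrative only: CHeader is a hypothetical stand-in for a 'C' struct such as PyObject, the other names are likewise invented, and a successful check describes one implementation rather than anything ISO/IEC 14882:1998 promises for non-POD types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 'C' header struct such as PyObject.
struct CHeader
{
    long refcount;
    const void* type;
};

// Adds a constructor, which makes the type non-POD.
struct Base : CHeader
{
    explicit Base(const void* t) { refcount = 1; type = t; }
};

// Adds data after the inherited header.
struct Derived : Base
{
    Derived() : Base(0), payload(42) {}
    int payload;
};

// True if the CHeader subobject starts at the same address as the
// complete Derived object, i.e. the base is laid out "first".
bool header_is_first(Derived& d)
{
    CHeader* h = &d; // implicit derived-to-base conversion
    return static_cast<void*>(h) == static_cast<void*>(&d);
}
```

'C' code that receives the CHeader pointer treats it as the start of the allocated object, so the scheme is only sound where header_is_first returns true.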

Proposed resolution

The original considerations about POD focused strictly on being able to interoperate with types defined in 'C', but not on being able to leverage the power of C++ for interfacing with 'C' systems. The examples above illustrate the importance of a predictable data layout for this and other purposes. Therefore:

  1. To facilitate the usability of std::complex in multi-language projects we propose to adopt the definition 6.2.5/13 of ISO/IEC 9899:1999 for std::complex<T>.

  2. To ease reinterpretation of an array of std::complex<T> as an array of T and vice versa, we propose that the member functions for data access return references instead of copies.

  3. We propose that the standard include a new concept of "Enhanced POD" which allows the use of certain C++ language features such as constructors and inheritance as a notational convenience while providing POD-like guarantees for data layout. The exact definition of Enhanced POD (e.g. no virtual functions, single or multiple inheritance, etc.) is open to discussion.

    By allowing constructors, and thus ensuring initialization to a valid state, the Enhanced POD concept encourages safer programming practices. Right now certain classes (endian arithmetic, for example) are often designed without constructors so that they can be used in contexts requiring POD types. This is neither as safe nor as convenient as if these classes had constructors.

    Presumably many of the contexts now requiring POD types will be relaxed to require only Enhanced POD types. In particular, it would be very helpful if the requirements on implementations for POD types in 3.9 paragraphs 2-4 could also apply to Enhanced POD types.

We encourage the committee to consider the Enhanced POD proposal separately from the others.

The proposals will make it possible to (a) build arrays of std::complex<T> with a predictable data layout and (b) portably pass T* pointers to these arrays to other languages, e.g.:

void foo(double *data, long n); // C library function
std::vector<std::complex<double> > vec;
// fill vec
foo(&vec[0].real(), vec.size());

Acknowledgments

John Spicer's "advice and consent" was invaluable in formulating this proposal. We thank Beman Dawes for contributing substantive additional motivation, and Robert Stewart for careful proofreading.