Accredited Standards Committee X3       Doc No: X3J16/94-0182   WG21/N0569
  Information Processing Systems          Date:   Sept 27, 1994   Page 1 of 4
  Operating under the procedures of       Project: Programming Language C++
  American National Standards Institute   Ref Doc:
                                          Reply to: Josee Lajoie
                                                    (josee@vnet.ibm.com)
+--------------------------+
| Object Shape Acquisition |
+--------------------------+

1. Introduction
===============

The shape of an object describes its storage layout.  An object has the
shape of a T if it occupies a storage location of size sizeof(T), if its
address respects the alignment requirements of a T, and, if T is a class
type, if the members are allocated as required by the definition of type
T.
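
The three conditions can be stated in code.  The following is a minimal
sketch in C++11 or later; the type T and the helpers aligned_for_T and
check_shape are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>   // std::uintptr_t

// Hypothetical example type, not taken from this paper.
struct T {
    char c;
    int  n;
};

// True if address p respects the alignment requirement of T.
inline bool aligned_for_T(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Member allocation: n lies after c and entirely inside the object.
static_assert(offsetof(T, n) >= sizeof(char), "n is allocated after c");
static_assert(offsetof(T, n) + sizeof(int) <= sizeof(T), "n fits inside a T");

// An object declared with type T has the size and alignment of a T.
inline void check_shape() {
    T obj = { 'x', 1 };
    assert(sizeof obj == sizeof(T));
    assert(aligned_for_T(&obj));
}
```

All three properties are available without running any constructor,
which is what section 2 calls the static shape.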

2. Static Shape and Dynamic Shape
=================================

The static shape of an object describes the storage layout properties
of the object that are known when storage for the object is allocated.
These properties are:
    o size
    o alignment
    o location of non-virtual base class sub-objects
    o location of member sub-objects

The dynamic shape of an object describes the layout properties of the
object that are set up only when the object is initialized.  These
properties are:
    o location of virtual base class sub-objects

For example:

    struct X { int i; };
    struct Y : X { };
    struct A { int a; };
    struct B : virtual A { int b; Y y; };
    extern B bobj;
    int *p1 = &bobj.b;   //1
    int *p2 = &bobj.y.i; //2
    int *p3 = &bobj.a;   //3
    B bobj;

Line //1 has well-defined behavior.  The expression '&bobj.b' only
depends on the static shape of bobj (that is, on the location of a
member sub-object) and this location is known at program start up.

Line //2 has well-defined behavior.  The expression '&bobj.y.i' only
depends on the static shape of bobj (that is, on the location of a
non-virtual base class member of a member sub-object) and this location
is known at program start up.

Line //3, however, results in undefined behavior.  The expression
'&bobj.a' depends on the dynamic shape of bobj (that is, on the location
of a virtual base class member) and this location is known only when
bobj is initialized later on in the program.
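
One way to see why the location of a virtual base belongs to the dynamic
shape is to measure it.  The sketch below reuses the names A and B from
the example (without the member y); the classes C and D and the helpers
offset_of_A and demo are hypothetical additions.  It assumes nothing
about the implementation beyond what the language guarantees, so the
two measured offsets are left unasserted: on many implementations they
differ, but that is not promised.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a; };
struct B : virtual A { int b; };
struct C : virtual A { int c; };   // hypothetical, for illustration
struct D : B, C { };               // hypothetical, for illustration

// Distance in bytes from a B sub-object to its virtual base A.
// The conversion B* -> A* is resolved at run time because A is virtual.
inline std::ptrdiff_t offset_of_A(B* pb) {
    A* pa = pb;
    return reinterpret_cast<char*>(pa) - reinterpret_cast<char*>(pb);
}

inline void demo() {
    B b;   // here B is the complete object
    D d;   // here B is a base class sub-object of D

    std::ptrdiff_t in_b = offset_of_A(&b);
    std::ptrdiff_t in_d = offset_of_A(&d);   // D* converts to B*
    (void)in_b;   // implementation-specific values, so not asserted
    (void)in_d;

    // Guaranteed by the language: both paths reach the single shared A.
    assert(static_cast<A*>(static_cast<B*>(&d)) ==
           static_cast<A*>(static_cast<C*>(&d)));
}
```

Because the same static type B can place A at different distances, the
location cannot be read off the declaration of B alone.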

3. When is the dynamic shape established? lost?
===============================================

At which point exactly during initialization is the dynamic shape of an
object established?

There are three possibilities:

(A) Beginning of initialization for the complete object (only)

    This means that the entire object acquires its dynamic shape when
    the constructor of the complete object starts.  The member
    sub-objects would acquire their dynamic shape at the same time as
    the complete object that owns them, that is, when the constructor of
    the complete object starts.

    Example:

        struct X { };
        struct Y : virtual X { };

        struct A {
            A(Y* py) { X* px = py; }
        };
        struct B : A {
            Y y;
            B() : A(&y) { }
        };

    Base class A is initialized first, followed by the initialization
    of the member sub-object y.  Is the dynamic shape of the member
    sub-object y established at the time A's constructor attempts to
    address y's virtual base class X?

    If option (A) is chosen, the answer is yes because the dynamic
    shape of the complete object and of the member sub-objects is
    established as the construction of the object of type B starts.
    This will guarantee that A's constructor has a well-defined behavior
    because the location of y's virtual base class will be known before
    A's constructor is entered.
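
Code that has to behave the same whichever option is chosen can avoid
the question by storing the pointer and converting it only after y has
been constructed.  The following is a minimal sketch under that
assumption, not a recommendation from this paper; the member tag, the
data member py_ and the function x_of_y are hypothetical additions to
the example above.

```cpp
#include <cassert>

struct X { int tag; X() : tag(7) { } };   // tag is a hypothetical member
struct Y : virtual X { };

struct A {
    Y* py_;                             // stored, not converted yet
    explicit A(Y* py) : py_(py) { }     // no Y* -> X* conversion here
    X* x_of_y() const { return py_; }   // converted on demand, later
};

struct B : A {
    Y y;
    B() : A(&y) { }   // taking the address of y needs only static shape
};
```

Once a B has been fully constructed, x_of_y() performs the conversion
on an initialized y, so it does not depend on when the dynamic shape of
y was established.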

(B) Beginning of initialization of the complete object and
    beginning of initialization of complete _member_ sub-objects

    This means that a virtual base class sub-object acquires its
    dynamic shape when the constructor of the complete object or the
    constructor of the complete member sub-object (most derived class)
    starts.

    If this option is chosen, the example shown in section (A) above
    will have undefined behavior because the dynamic shape of the member
    sub-object y will be established only when the constructor for class
    Y (most derived class representing the member sub-object) starts.

    However, the following example:

        struct V { int i; };
        struct W {
          int& ri;
          W(V* pv) : ri(pv->i) {}
        };
        struct D : virtual W, virtual V { D() : W(this) {} };

    will have well-defined behavior because the members of D's virtual
    base V become addressable when D's constructor starts (even though
    the base V is constructed after the base W is constructed).

(C) Beginning of initialization for every sub-object

    This means that a virtual base class sub-object only acquires its
    dynamic shape when its own initialization starts.

    The example above would then have undefined behavior because D
    doesn't "know" where V's members are until V's constructor is run.
    The initialization 'ri(pv->i)' therefore results in undefined
    behavior.

There are points during destruction that correspond to (A), (B) and
(C):
   (A) end of the destruction for the complete object
   (B) end of the destruction for the complete object and end of the
       destruction for the complete _member_ sub-objects
   (C) end of destruction for every sub-object
I believe that, for simplicity, the point at which an object loses its
dynamic shape should mirror the point chosen for construction.

4. Critique of the three options
================================

(A) is the most desirable option for users since it places the fewest
limits on what programmers can do (and it's very simple to describe,
too).  However, there is a cost associated with (A).
[Bill Gibbons in core-4557:]
    The virtual base pointers must either be set twice, or there must
    be a way to avoid setting them twice.

    Constructors must already know whether they are being invoked for
    base classes (to avoid re-initializing virtual bases), so there is
    no (time) cost for avoiding setting virtual base pointers again in a
    base class constructor.

    But member constructors would either set the virtual base pointers
    again, or there would be some small nonzero cost involved in
    avoiding setting them again.  (Even an alternate entry point costs a
    jump). I suspect many vendors wouldn't bother, and would just set
    the virtual base pointers again.

    Perhaps the cost is minimal compared to the utility.  But it's not
    zero.

(C) allows the cfront-style mechanism of passing hidden arguments to a
constructor to tell it the locations of the virtual bases.  Option (A)
doesn't allow for this.  If the cost of option (A) is judged too
high by committee members, then solution (C) becomes the simplest
solution available.
[Bill Gibbons in private email: ]
    One nice property of deferring establishing the dynamic shape of
    base class subobjects until their constructors begin is that member
    subobjects and base class subobjects then behave the same way.  This
    actually simplifies the rules.  Since most of the problem cases are
    pathological, I think it's more important to have a simple and
    consistent set of rules than to try to make every possible case work
    regardless of expense.
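
The trade-off can be modelled in ordinary code.  The sketch below is a
hand-written model, not a description of any real compiler: an explicit
pointer stands in for the hidden virtual base pointer, and plain
functions stand in for the constructors of the example in section 3(A).
All names (XPart, YModel, Policy, establish_shape, construct_Y,
construct_A, construct_B) are hypothetical.  Policy::Early models
option (A); Policy::Deferred models the behavior of options (B) and (C)
for the member y.

```cpp
#include <cassert>

struct XPart { int i; };

// Model of "struct Y : virtual X".
struct YModel {
    XPart  x;        // storage for the virtual base
    XPart* vbase;    // stand-in for the hidden virtual base pointer
    YModel() : vbase(nullptr) { x.i = 0; }   // null: shape not established
};

enum class Policy { Early, Deferred };

inline void establish_shape(YModel& y) { y.vbase = &y.x; }

// Models Y's constructor.  The flag plays the role of the hidden
// argument (or alternate entry point) that avoids a second store.
inline void construct_Y(YModel& y, bool shape_already_set) {
    if (!shape_already_set) establish_shape(y);
    y.vbase->i = 1;   // the virtual base is reachable from here on
}

// Models A's constructor: reports whether y's virtual base is locatable.
inline bool construct_A(const YModel& y) { return y.vbase != nullptr; }

// Models B's constructor: base A first, then member y.
inline bool construct_B(YModel& y, Policy p) {
    if (p == Policy::Early) establish_shape(y);   // whole object up front
    bool a_found_x = construct_A(y);
    construct_Y(y, p == Policy::Early);
    return a_found_x;
}
```

With Policy::Early the model of A's constructor finds the virtual base,
at the price of the extra flag test in construct_Y; with
Policy::Deferred there is no extra work, but A's constructor runs
before the pointer is set.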

Proposal
--------
Adopt solution (A).

5. Something to think about
===========================

If solution (A) is adopted, that is, if the dynamic shape of members is
established early, that leaves globals as the only kind of object for
which the address can be taken before the dynamic shape is established.
Perhaps that hole should be plugged too.

[ Bill Gibbons in core-4552: ]
    There is no real reason why the shapes of global objects couldn't
    be required to be correct at the end of the static phase of global
    variable initialization.  The only drawback is that such objects can
    currently be emitted in the "uninitialized data" part of an object
    and they would have to be moved to the "initialized data" part.
    This may have a major impact on the size of some object files.

[ Mike Miller in edit-425: ]
    That's one way of doing it; another way is to put the virtual base
    pointer setting at the very beginning of the static initialization
    for the module, before calling the constructors and doing the other
    non-constant initializations.  Since the per-module static
    initialization is done "before the first use of any function or
    object defined in that translation unit" (3.5.2p1), that would
    guarantee addressability with no impact on the size of the
    initialized data in the load module.  (The constructors would be
    invoked in the same way as complete member constructors, invoking
    virtual base constructors but not setting virtual base pointers.)

Do we want to go this far?