Accredited Standards Committee X3       Doc No: X3J16/94-0182 WG21/N0569
Information Processing Systems          Date: Sept 27, 1994 Page 1 of 4
Operating under the procedures of       Project: Programming Language C++
American National Standards Institute   Ref Doc:
                                        Reply to: Josee Lajoie
                                                  (josee@vnet.ibm.com)

                    +--------------------------+
                    | Object Shape Acquisition |
                    +--------------------------+

1. Introduction
===============

The shape of an object describes its storage layout. An object has the
shape of a T if it occupies a storage location of size sizeof(T), if
its address respects the alignment requirements of a T, and, if T is a
class type, if its members are allocated as required by the definition
of type T.

2. Static Shape and Dynamic Shape
=================================

The static shape of an object describes the storage layout properties
of the object that are known when storage for the object is allocated.
These properties are:

   o size
   o alignment
   o location of non-virtual base class sub-objects
   o location of member sub-objects

The dynamic shape of an object describes the layout properties of the
object that are set up only when the object is initialized. These
properties are:

   o location of virtual base class sub-objects

For example:

   struct X { int i; };
   struct Y : X { };
   struct A { int a; };
   struct B : virtual A { int b; Y y; };

   extern B bobj;
   int *pb = &bobj.b;    //1
   int *pi = &bobj.y.i;  //2
   int *pa = &bobj.a;    //3
   B bobj;

Line //1 has well-defined behavior. The expression '&bobj.b' depends
only on the static shape of bobj (that is, on the location of a member
sub-object), and this location is known at program start-up.

Line //2 has well-defined behavior. The expression '&bobj.y.i' depends
only on the static shape of bobj (that is, on the location of a member
of a non-virtual base class of a member sub-object), and this location
is known at program start-up.

Line //3, however, results in undefined behavior.
The expression '&bobj.a' depends on the dynamic shape of bobj (that is,
on the location of a member of a virtual base class), and this location
is known only when bobj is initialized, later on in the program.

3. When is the dynamic shape established? When is it lost?
==========================================================

At which point exactly during initialization is the dynamic shape of an
object established? There are three possibilities:

(A) Beginning of initialization for the complete object (only)

    This means that the entire object acquires its dynamic shape when
    the constructor of the complete object starts. The member
    sub-objects would acquire their dynamic shape at the same time as
    the complete object that owns them, that is, when the constructor
    of the complete object starts.

    Example:

       struct X { };
       struct Y : virtual X { };
       struct A { A(Y* py) { X* px = py; } };
       struct B : A { Y y; B() : A(&y) { } };

    Base class A is initialized first, followed by the initialization
    of the member sub-object y. Is the dynamic shape of the member
    sub-object y established at the time A's constructor attempts to
    address y's virtual base class X?

    If option (A) is chosen, the answer is yes, because the dynamic
    shape of the complete object and of its member sub-objects is
    established when construction of the object of type B starts. This
    guarantees that A's constructor has well-defined behavior, because
    the location of y's virtual base class is known before A's
    constructor is entered.

(B) Beginning of initialization of the complete object and beginning of
    initialization of complete _member_ sub-objects

    This means that a virtual base class sub-object acquires its
    dynamic shape when the constructor of the complete object, or the
    constructor of the complete member sub-object (most derived class),
    starts.
    If this option is chosen, the example shown in option (A) above
    will have undefined behavior, because the dynamic shape of the
    member sub-object y will be established only when the constructor
    for class Y (the most derived class of the member sub-object)
    starts. However, the following example:

       struct V { int i; };
       struct W { int& ri; W(V* pv) : ri(pv->i) { } };
       struct D : virtual W, virtual V { D() : W(this) { } };

    will have well-defined behavior, because the members of D's virtual
    base V become addressable when D's constructor starts (even though
    the base V is constructed after the base W).

(C) Beginning of initialization for every sub-object

    This means that a virtual base class sub-object acquires its
    dynamic shape only when its own initialization starts. The example
    above would then have undefined behavior, because D doesn't "know"
    where V's members are until V's constructor is run. The
    initialization 'ri(pv->i)' therefore results in undefined behavior.

There are points during destruction that correspond to (A), (B) and
(C):

   (A) end of destruction for the complete object
   (B) end of destruction for the complete object and end of
       destruction for the complete _member_ sub-objects
   (C) end of destruction for every sub-object

I believe that, for simplicity, the point at which an object loses its
dynamic shape should be chosen to mirror the point at which it acquires
it during construction.

4. Critique of the three options
================================

(A) is the most desirable option for users, since it places the fewest
limits on what programmers can do (and it is very simple to describe,
too). However, there is a cost associated with (A).

   [Bill Gibbons in core-4557:]

   The virtual base pointers must either be set twice, or there must be
   a way to avoid setting them twice.
   Constructors must already know whether they are being invoked for
   base classes (to avoid re-initializing virtual bases), so there is
   no (time) cost for avoiding setting virtual base pointers again in a
   base class constructor. But member constructors would either set the
   virtual base pointers again, or there would be some small nonzero
   cost involved in avoiding setting them again. (Even an alternate
   entry point costs a jump.) I suspect many vendors wouldn't bother,
   and would just set the virtual base pointers again. Perhaps the cost
   is minimal compared to the utility. But it's not zero.

(C) allows the cfront-style mechanism of passing hidden arguments to a
constructor to tell it the locations of the virtual bases. Option (A)
doesn't allow for this. If the cost of option (A) is judged too high by
committee members, then solution (C) becomes the simplest solution
available.

   [Bill Gibbons in private email:]

   One nice property of deferring establishing the dynamic shape of
   base class subobjects until their constructors begin is that member
   subobjects and base class subobjects then behave the same way. This
   actually simplifies the rules. Since most of the problem cases are
   pathological, I think it's more important to have a simple and
   consistent set of rules than to try to make every possible case work
   regardless of expense.

Proposal
--------

Adopt solution (A).

5. Something to think about
===========================

If solution (A) is adopted, that is, if the dynamic shape of members is
established early, that leaves globals as the only kind of object whose
address can be taken before its dynamic shape is established. Perhaps
that hole should be plugged too.

   [Bill Gibbons in core-4552:]

   There is no real reason why the shapes of global objects couldn't be
   required to be correct at the end of the static phase of global
   variable initialization.
   The only drawback is that such objects can currently be emitted in
   the "uninitialized data" part of an object, and they would have to
   be moved to the "initialized data" part. This may have a major
   impact on the size of some object files.

   [Mike Miller in edit-425:]

   That's one way of doing it; another way is to put the virtual base
   pointer setting at the very beginning of the static initialization
   for the module, before calling the constructors and doing the other
   non-constant initializations. Since the per-module static
   initialization is done "before the first use of any function or
   object defined in that translation unit" (3.5.2p1), that would
   guarantee addressability with no impact on the size of the
   initialized data in the load module. (The constructors would be
   invoked in the same way as complete member constructors, invoking
   virtual base constructors but not setting virtual base pointers.)

Do we want to go this far?