SC22/WG14 N791


Solving the struct hack problem

Clive D.W. Feather

clive@demon.net

1997-10-22


Abstract

========

Several DRs have attempted to address the issue of the "struct hack". This

paper proposes an approach to making the technique available while avoiding

most of the problems of current practice.


Discussion

==========

The "struct hack" is a technique for using a dynamically sized structure:

a structure type is declared like this:


    struct hack

    {

        size_t n_elements;

        int data [1];

    };


space is then malloced:


    size_t n;

    /* ... */

    struct hack *p;

    /* allocate first, then store through p */

    p = malloc (sizeof (struct hack) + sizeof (int) * (n - 1));

    p->n_elements = n;


and the entire space is used:


    for (i = 0; i < p->n_elements; i++)

        p->data [i] = 0;


The problem is that accesses to p->data [i] for i > 0 are undefined behavior,

because they go through a pointer (p->data + i) that points beyond the end of

the declared array. To quote the DR response (slightly modified):

    Subclause 6.3.2.1 describes limitations on pointer arithmetic, in

    connection with array subscripting (see also subclause 6.3.6).

    Basically, it permits an implementation to tailor how it represents

    pointers to the size of the objects they point at. Thus, the

    expression p->data[5] may fail to designate the expected [object],

    even though the malloc call ensures that the [object] is present.

    The idiom, while common, is not strictly conforming.


This paper proposes a technique, apparently already supported by at least

one implementation, of allowing the structure to be declared as:


    struct hack

    {

        size_t n_elements;

        int data [];

    };


and then explicitly permitting the access to any element of the array that

is within the bounds of the malloced space.
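
As a rough sketch of how the proposed declaration might be used: the helpers
hack_new and hack_set below are hypothetical names introduced here for
illustration and are not part of the proposal, and the overflow check assumes
an implementation that provides SIZE_MAX in <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The structure from the text, declared with an incomplete array type. */
struct hack
{
    size_t n_elements;
    int data [];
};

/* hack_new (hypothetical): allocate room for n elements and zero them.
   Returns NULL if the size would overflow or if malloc fails. */
struct hack *hack_new (size_t n)
{
    struct hack *p;
    size_t i;

    /* Reject element counts whose byte size would not fit in size_t. */
    if (n > (SIZE_MAX - sizeof (struct hack)) / sizeof (int))
        return NULL;

    p = malloc (sizeof (struct hack) + sizeof (int) * n);
    if (p == NULL)
        return NULL;

    p->n_elements = n;
    for (i = 0; i < p->n_elements; i++)
        p->data [i] = 0;
    return p;
}

/* hack_set (hypothetical): store v at index i, which must be below the
   element count recorded when the object was allocated. */
void hack_set (struct hack *p, size_t i, int v)
{
    assert (p != NULL && i < p->n_elements);
    p->data [i] = v;
}
```

Because the array is declared without a size, the allocation adds
sizeof (int) * n to the structure size, rather than using the (n - 1)
adjustment needed with the one-element declaration.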


Proposal

========

[References are to draft 11 pre 3.]


In subclause 6.5.2.1 (Structure and union specifiers), paragraph 2, change:


    A structure or union shall not contain a member with incomplete or

    function type.


to:


    A structure or union shall not contain a member with incomplete or

    function type, except that the last element of a structure may have

    incomplete array type.


add a new paragraph at the end of the semantics:


    As a special case, the last element of a structure may be an incomplete

    array type. This is called a /flexible array member/, and the size of

    the structure shall be equal to the offset of the last element of an

    otherwise identical structure that replaces the flexible array member

    with an array of one element. When an lvalue whose type is a structure

    with a flexible array member is used to access an object, it behaves as

    if that member were replaced by the longest array that would not make

    the structure larger than the object being accessed. If this array

    would have no elements, then it behaves as if there was one element,

    but the behavior is undefined if any attempt is made to access that

    element.


and add an example:


    Example:


    After the declarations:

        struct s { int n; double d []; };

        struct ss { int n; double d [1]; };

    the three expressions:

        sizeof (struct s)

        offsetof (struct s, d)

        offsetof (struct ss, d)

    have the same value. The structure /struct s/ has a flexible array

    member /d/.


    If /sizeof (double)/ is 8, then after the following code is executed:

        struct s *s1;

        struct s *s2;

        s1 = malloc (sizeof (struct s) + 64);

        s2 = malloc (sizeof (struct s) + 46);

    and assuming that the calls to /malloc/ succeed, /s1/ and /s2/ behave

    as if they had been declared as:

        struct { int n; double d [8]; } *s1;

        struct { int n; double d [5]; } *s2;


    Following the further successful assignments:

        s1 = malloc (sizeof (struct s) + 10);

        s2 = malloc (sizeof (struct s) +  6);

    they then behave as if they had been declared as:

        struct { int n; double d [1]; } *s1, *s2;

    and:

        double *dp;

        dp = &(s1->d[0]);    // Permitted

        *dp = 42;            // Permitted

        dp = &(s2->d[0]);    // Permitted

        *dp = 42;            // Undefined behavior
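
The element counts in the example can be checked with a short calculation.
The following is only a sketch: flex_elements is a hypothetical helper, not
part of the proposal, and it assumes that struct s has no padding after d, so
that the longest fitting array is the space beyond sizeof (struct s) divided
by sizeof (double).

```c
#include <stddef.h>

struct s { int n; double d []; };
struct ss { int n; double d [1]; };

/* flex_elements (hypothetical): given the size in bytes of the object being
   accessed, return the length of the longest array d that would not make
   the structure larger than that object.  A result of 0 corresponds to the
   case where the member behaves as one element that must not be accessed. */
size_t flex_elements (size_t object_size)
{
    if (object_size < sizeof (struct s))
        return 0;
    return (object_size - sizeof (struct s)) / sizeof (double);
}
```

With sizeof (double) equal to 8, an excess of 64 bytes gives 8 elements and
an excess of 46 bytes gives 5, matching the declarations shown for s1 and s2.
An excess of 10 bytes gives 1 element, and an excess of 6 bytes gives 0, which
is the case where the access through s2 is undefined.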