From: Nick Maclaren <nmm1@cus.cam.ac.uk>
To: sc22wg14@dkuug.dk
Subject: What is an Object in C Terms?
Date: Wed, 26 Sep 2001 16:52:10 +0100

This is a slight redraft of a document that was circulated to the UK C panel. I am not sure what it is proposing, except that effective types need an improved specification, but it attempts to describe a very serious problem.

THE OBJECT PROBLEM
------------------

This is asking the question "What is an object when it is at home?" We need to start with an explanation of why a precise answer to the question is so important.

This issue is very closely related to the sequence point issue, obviously, and can be regarded as of equal importance as far as the C language's semantics are concerned. However, it is even more serious in its effects on other standards. C and POSIX have now agreed that their approaches to signal handling are incompatible, though not in those words. This needs to be resolved, but will not even be tackled for POSIX 2001, as it is far too difficult a problem.

The actual incompatibility is that C defines all parallelism and 'real' or external signal handling to be undefined behaviour, as it is outside C's model, and that POSIX leaves all language issues to C, as it is mainly a library interface. Such things may therefore be well-defined in POSIX terms, yet have undefined effects on the program that uses the POSIX facilities! The incompatibility is thus of the form that POSIX relies on C to define behaviour that C explicitly leaves undefined!

Obviously, the solution is for POSIX to define the language aspects that it needs, but the problem is that doing so is very tricky. It was pretty complex in C90, but C99 has complicated this area very considerably, and the problems are going to cause major difficulty over the next decade. The incompatibility between C90 and POSIX already causes intermittent, unrepeatable program failure on most of the systems that I have access to but, precisely because of those properties, it is almost impossible to prove this to the vendors. Few people recognise the symptoms, and a negligible proportion have the skills to investigate further.

When I refer to the SMP aspects, I am talking about the fact that two independent threads must not access overlapping objects if one of them is updated. Whether the SMP is cache-coherent is irrelevant, as exactly the same problem occurs with data in registers as with data in memory; when using parallelism for efficiency, storing and reloading all registers at every synchronisation point is an unacceptable overhead, and so is not done. Also, note that this applies to all state (such as floating-point exception flags) as well as data. There is an exactly corresponding problem with asynchronous signal handlers.
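To make that concrete, here is a minimal sketch using POSIX threads (the code is mine, purely for illustration). POSIX regards the program as perfectly reasonable, but the C abstract machine knows nothing of the second thread, so a compiler is entitled to keep 'flag' in a register and never reload it, turning the wait loop into an infinite one:

    #include <pthread.h>
    #include <stdio.h>

    /* Shared state, deliberately without any synchronisation.  Nothing
       in C99 requires the value to be refetched from memory inside the
       loop in main, because C has no concept of the second thread.     */
    static int flag = 0;

    static void *setter (void *arg) {
        flag = 1;                  /* the update from the other thread */
        return arg;
    }

    int main (void) {
        pthread_t t;
        pthread_create(&t,NULL,setter,NULL);
        while (flag == 0)
            ;                      /* may be compiled as for (;;) ;    */
        pthread_join(t,NULL);
        printf("done\n");
        return 0;
    }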
Commercial Implementors and Important Users
-------------------------------------------

The vast majority of commercial implementors and important users are not members of SC22/WG14 or even of any national body affiliated with it, and work solely from the standard. As they should, because ISO rules (and all normal commercial practice) specify that the standard is the official document, and that records of its development should be used for at most clarification.

I can witness that this area has caused major headaches ever since C90 was developed, and that many implementors are very seriously unhappy about C99. They simply do not know what most of it implies and, as is generally agreed, the subtleties in this area are critical for both optimisation and the development of robust applications. Most of the ones that I have contacts with had severe problems deciding exactly what optimisations were allowed in C90, and often went through several iterations before an industry consensus emerged.

Their approach is typically to use restrict and static in array parameters (but not effective types) just far enough to optimise the BLAS and LAPACK, and perhaps a little further, but to await clarification before being more radical (a sketch of that style of code follows below). The fact that a very few vendors are gung ho about effective types, without it being absolutely clear what the constraints mean, is going to be a major portability problem in certain complex codes. I already see far too many that have to disable optimisation on some systems because it is unclear whether they or the implementation has broken the standard.
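For illustration, the kind of code that such vendors do trust is a BLAS-style loop like the following sketch (my own, not any vendor's source). Here restrict asserts that the two arrays do not overlap, and static asserts that at least n elements are accessible, which between them license vectorisation without any appeal to effective types:

    /* y := a*x + y, in the style of the BLAS routine DAXPY. */
    void daxpy (int n, double a,
                const double x[static restrict n],
                double y[static restrict n]) {
        int i;
        for (i = 0; i < n; ++i)
            y[i] += a*x[i];
    }

    int main (void) {
        double x[4] = {1.0,2.0,3.0,4.0}, y[4] = {0.0};
        daxpy(4,2.0,x,y);
        return 0;
    }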
The C99 Standard
----------------

Let us look at some relevant definitions:

    [A] access
        to read or modify the value of an object

    [B] alignment
        requirement that objects of a particular type be located on
        storage boundaries with addresses that are particular multiples
        of a byte address

    [C] object
        region of data storage in the execution environment, the
        contents of which can represent values
        NOTE: When referenced, an object may be interpreted as having a
        particular type; see 6.3.2.1.

    [D] parameter, formal parameter, formal argument (deprecated)
        object declared as part of a function declaration or definition
        that acquires a value on entry to the function, or an identifier
        from the comma-separated list bounded by the parentheses
        immediately following the macro name in a function-like macro
        definition

    [E] value
        precise meaning of the contents of an object when interpreted as
        having a specific type

Other sections that are particularly relevant include 5.1.2.3 passim (side effects, sequence points and conformance requirements), 6.2.5 [#1] (object type and meaning of a value), 6.2.5 [#20] (derived types), 6.2.6.1 [#2,#4] (values are stored in bytes), 6.2.6.1 [#5,#6,#7] (trap representations and padding bytes), 6.2.6.1 [#8] (validity of alternate representations), 6.2.6.2 (integer types), 6.2.7 [#2] (multiple declarations and compatible types), 6.3.2.1 [#1] (definition of lvalue), 6.3.2.1 [#2] (lvalue to value conversion), 6.3.2.1 [#3] (array to pointer conversion), 6.3.2.3 [#1,#7] (pointer conversion), 6.5 [#1] (sequence points and accesses), 6.5 [#6] (effective type), 6.5 [#7] (permitted access types), 6.5 passim (expressions that require or deliver lvalues), 6.5.3.2 [#3] (unary '&' operator), 6.5.8 [#5,#6] (pointer comparison), 6.7.3.1 (formal definition of restrict), 6.7.5.3 [#7] (array parameter adjustment and the implication of static therein), 6.9.1 [#9] (parameter identifiers are lvalues) and 7.21.1 [#1] (<string.h> access conventions). I may well have missed some important ones.

What is abundantly clear is that the basic model used by C is this. An object type is specified to be a complete type that is neither void nor a function type. An expression that can access an object has a one-to-one mapping to an lvalue, and the two correspond to a unique and well-defined object and object type. A pointer value with a pointer to object type defines an lvalue that is an array object whose base type is the corresponding object type (the array may be of length 1, of course), except for the odd case of the element one beyond the end of an array. A pointer value with a pointer to an incomplete type, or to the element beyond the end of an array, defines an address but not an lvalue, though its base type correspondence is otherwise the same as that for a pointer to an object. A pointer value with a pointer to void type specifies only an address. So far, so good.

Unfortunately, this breaks down in several places in C90, and in many more in C99. In most cases, the failure of the basic model is not serious in its effects on the C language as such, because the constructions that expose the problems are either perverse or matter only with the most extreme optimising compilers. So it is mainly the High Performance Computing people who notice, and few of them. However, it is very serious as far as the effects on related standards are concerned; this applies particularly to parallel models, such as POSIX threads or OpenMP, but also includes anything that needs reliable signal handling.

To a first approximation, the major problems separate into two parts: what the base type of an object is and, if an object is an array type, what size the array is. These can be considered separately. The first question is less clear-cut than the second, and has several aspects, which all interact with one another. These include whether an object or a subobject is the relevant one, which arises in several guises, and the new concept of effective type.

There is a further subtlety, which will not be considered here in any detail, which is when a datum with one base type can be accessed through an lvalue of a different type. This was extremely unclear in C90 and has been significantly changed in C99. In practice, this is mostly about the signedness of integer types, because all other base types tend to be either wholly incompatible or wholly compatible. The issue applies just as much to them; it is merely that the problem does not show up in the current C standard. However, it is worth noting that it will probably be exposed for fixed-point data types as well, and in a worse way than for signedness, if they are introduced.

It is also critical to realise that the examples given later are a small proportion of the problems that I have seen actually cause trouble, and a very small proportion of those where I know that the C99 and POSIX or OpenMP standards' models conflict. So there is virtually no point in trying to resolve any of the following individual examples; what is needed is a clarification of these aspects of the C standard, along the lines of the sequence point approaches (only probably rather more radical).

What is the Base Type?
----------------------

In this context, type qualifiers are largely irrelevant, and the most critical properties of the base type are its size and alignment. While the C standard does not say so, it is clear that aliasing, sequence point and SMP problems are primarily affected by those two properties alone.
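As a sketch of why those are the operative properties (my example, not wording from the standard): the special permission in 6.5 to access an object through the signed or unsigned type corresponding to its own type is workable precisely because 6.2.5 requires such a pair of types to share both size and alignment:

    int main (void) {
        int n = -1;
        unsigned int *p = (unsigned int *)&n;
        /* Permitted: access through the unsigned type corresponding to
           the datum's signed type.  It is safe only because int and
           unsigned int are required to have the same size and the same
           alignment requirements.                                      */
        unsigned int u = *p;
        (void)u;
        return 0;
    }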
In C90, things were relatively clear, and it had three categories of type, all of which have been preserved in C99:

    Datum type.  This is the actual type of the datum, ignoring any
    syntactic issues, and is generally not decidable statically.

    Access type.  This is the type of the lvalue used for access, and
    can be decided statically.

    Allocation type.  This is the type of the original definition or
    the result of malloc etc., and is generally not decidable
    statically.

The requirements in both C90 and C99 are that the datum and access types must be compatible, and that the size and alignment of the allocation type must be adequate for both. There is also the exception for character access types, which may be used on any datum type. There was a thread on the reflector which claimed that the allocation type could affect later accesses which used another type of lvalue, but it did not seem to have much support, and the consensus was that its only lasting effect was through its alignment. All of this specification could do with clarification.

In C90, the base type of an object was the base type of the lvalue (i.e. the access type), except for certain library functions (e.g. memcpy) which accessed objects as arrays of bytes. While there were a good many cases where the array length was unclear (see below), I cannot think of many where the alignment of the base type was. What confusion there was concerned mainly which of a set of compatibly aligned base types to choose when using functions like memcpy.

There were also some confusions about whether the type of an argument (as distinct from a parameter) was relevant, subtleties about signedness, and whether copying only the data of a structure (i.e. not its padding) was safe. Some of these have been clarified in C99, but others have not. They will be ignored here, as they affect the semantics of C programs but have relatively little effect on other standards; i.e. they affect what can be done with an object, rather than exactly where it exists in memory.

However, C99 has changed this area quite drastically, by introducing the concept of effective type. This has not clarified the situation, and some of its problems are described below. But we now have to add:

    Effective type.  This seems to be an extension of datum type, but
    intended for type-dependent optimisation.  I shall not describe it
    in the next section, as I do not understand it, for reasons given
    later.  I think that its introduction was a very serious mistake.

Objects and Subobjects
----------------------

There are three entwined aspects to this. One is the question of when an object access refers to a subobject (as distinct from the whole object), such as an array access referring to an array element, or a structure or union access referring to a member. Another is when a valid object can be created by pure pointer manipulation, bypassing the type mechanism entirely. And the last is when the relevant object is an array of uninterpreted characters (i.e. unsigned char); while this is most common in the library, it can occur in the language proper.

One of the reasons that C90 and C99 work at all is that they define almost no type-generic facilities other than ones that map objects to arrays of characters. <tgmath.h> is irrelevant here, because it does not raise the problems being considered. However, the same is not true of other standards that use C as a base, because they may (for example) have facilities that operate on arrays of generic data pointers. But this does not mean that there is no ambiguity in C, so much as that the ambiguity is not exposed in the current language.
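As a sketch of that one type-generic mapping (my example): any object at all may be inspected as an array of uninterpreted characters, and this is essentially the whole of the portable type-genericity that C itself offers:

    #include <stdio.h>

    /* Print the representation of an arbitrary object, using the one
       access type (unsigned char) that may alias every datum type.   */
    static void dump (const void *object, size_t size) {
        const unsigned char *bytes = object;
        size_t i;
        for (i = 0; i < size; ++i)
            printf("%02x ",(unsigned)bytes[i]);
        printf("\n");
    }

    int main (void) {
        double d = 1.0;
        struct {int a; double b;} s = {1,2.0};
        dump(&d,sizeof d);
        dump(&s,sizeof s);          /* includes any padding bytes */
        return 0;
    }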
It appears that the standard says neither that a member of a struct is itself an object, nor that an element of an array is, but clearly they are. However, there is more to subsectioning than that. Consider:

    #include <stddef.h>

    typedef struct {int a; double b; int c;} COMPOSITE;
    typedef struct {int a; double b;} SUBSET;

    int main (void) {
        double array[5] = {0.0};
        COMPOSITE composite = {0,0.0,0};
        double (*p)[2];
        SUBSET *q;
        int *r;
        array[0] += (array[1] = 0.0);
        composite.a += (composite.c = 0);
        p = (double (*)[2])(array+2);
        q = (SUBSET *)&composite;
        r = (int *)(((char *)&composite)+offsetof(COMPOSITE,c));
        return 0;
    }

It is clear that the first two assignments would be illegal if the relevant objects were the whole array and structure, and constructions like that are so common that the only plausible interpretation of the C standard is that the relevant objects are the elements and members. This should be stated explicitly.

However, I assert that it is also the case that *p, *q and *r are well-defined objects of type double[2], SUBSET and int. The validity of the first follows from the equivalence of pointers and arrays and the fact that a pointer to an array is a pointer to its first element; the second follows from the common initial sequence rules for unions; and the third from the very definition and purpose of the offsetof macro. The constructions given here may not be common, but there are a vast number of programs that use equivalent ones (often implicitly), and forbidding them would break a huge number of programs. In all cases, if they are regarded as undefined, I am almost certain that I can produce a program that shows a clear inconsistency in the standard. Perhaps worse is the fact that there are so many unrelated ways of bypassing the formal type structure; I do not know offhand how many more there are. So subsetting is not a simple phenomenon.

There is a similar problem with the cases where objects are handled as arrays of characters. Consider:

    #include <string.h>

    typedef struct {int a; double b; int c;} COMPOSITE;

    int main (void) {
        COMPOSITE composite;
        char string[] = "ABB";
        memcpy(&composite,&((char *)&composite)[sizeof(int)],sizeof(int));
        strtok(string,&string[2]);
        return 0;
    }

The first function call is a fairly common construction, though rarely in that form, and generally regarded as acceptable; again, forbidding it would break a huge number of programs and could be used to show an inconsistency in the standard. The second one is obscene, but I can see no reason that it is not conforming; the consequences of forbidding such things generically are beyond my imagination.

My view is that these questions are fairly easy to resolve within the C language, except for the matter of effective type discussed below. However, it is not so clear when other standards are involved, or for C extensions that allow operations on whole arrays. For example, it is very unclear whether an array should then be treated as a single object, as an array of discrete elements or as an array of characters. I doubt that this problem is soluble without a major rethink of the C object model. Hence my view is the following:

    The generic C model is that the object used for access through an
    lvalue is the smallest possible subobject compatible with the type
    of the lvalue and the form of access, and that arrays (including
    arrays of uninterpreted characters) are invariably treated as
    discrete elements even when operated on in bulk.
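A short sketch of what that model asserts (my illustration, not standard wording): each access below touches only the smallest subobject selected by its lvalue, and even the bulk copy is, on this reading, a sequence of element accesses rather than one access to a whole array object:

    #include <string.h>

    typedef struct {int a; double b; int c;} COMPOSITE;

    int main (void) {
        COMPOSITE s = {0,0.0,0};
        double array[4] = {0.0};
        double copy[4];
        s.a = (s.c = 1);              /* accesses members s.a and s.c only */
        array[0] = (array[1] = 1.0);  /* accesses two distinct elements    */
        memcpy(copy,array,sizeof array);  /* accesses of unsigned char
                                             elements, not one access to a
                                             double[4] object              */
        return 0;
    }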
I can think of no exceptions to this model, but the point is not clearly stated and other interpretations may be possible. Some wording improvement is essential. Related to this is the interpretation of the base type of an object: if the lvalue used for access has an object type, that type and the datum type are the only ones considered; if it has none (as with memcpy), the lvalue type is treated as if it were an array of uninterpreted characters and the datum type is irrelevant. The main exception is the concept of effective type, if it is retained.

The third aspect is that a valid object can be created by any legal pointer operations, casts etc., subject to each conversion having an appropriate alignment for its type and value, and to the result ending up with an appropriate datum type and size. I would hate to have to draft the specification of this, but it badly needs doing. The current wording is probably adequate for the alignment, but is very unclear on when the result is a valid object.

The C standard should make a commitment (in Future Directions) that it will not define operators or other facilities that access arrays as a whole without resolving this problem. Without such a commitment, many implementors will be reluctant to trust the existing standard to remain stable, as a trivial change in this area could have immense consequences for implementations. It should also point out to the developers of extensions or other standards based on C that this area is a minefield. In particular, it should say that it is essential that the extension or standard specify whether arrays will be treated as single objects, as their component elements, or as arrays of bytes in any particular context, as the C language allows any of those options.

The restrict Qualifier
----------------------

This issue was raised on the reflector with the previous specification of restrict, but got dropped because it was not clearly applicable to the new one. It concerns the question of whether the objects affected by the restrict qualifier are considered to have as their type the base type of the restrict qualified pointer, or that of the lvalue used for the actual access.

    typedef struct {int a; double b; int c;} COMPOSITE;

    COMPOSITE composite;

    void copy (COMPOSITE * restrict A, COMPOSITE * restrict B) {
        A->a = B->c;
    }

    void scatter (double * restrict A, double * restrict B) {
        int i;
        for (i = 0; i < 100; i += 2) A[i] = B[i] = 0.0;
    }

    int main (void) {
        double array[200];
        copy(&composite,&composite);
        scatter(array,&array[1]);
        return 0;
    }

Now, if the objects being referred to in 6.7.3.1 are the access (lvalue) types, there is nothing wrong with either of these; but, if they are considered relative to the base type of the restrict qualified pointers, both function calls are undefined. As I understand the wording, the former is the correct interpretation, though it is not the best specification for SMP optimisation.

My belief is that this does not need attention if the previous queries are resolved and clarified in the directions described. If, however, they are resolved in any other way, then it needs reconsidering in the light of that resolution.

Effective Type
--------------

C99 introduced the concept of effective type (6.5 paragraph 6), but it has had the effect of making a confusing situation totally baffling. This is because it has introduced a new category of types, it has invented new terminology without defining it, its precise intent is most unclear, and it has not specified its effect on the library.
The first aspect was mentioned above. For the second, 6.5 paragraph 6 uses the term "declared type" of an object, a term used in only five other places, and then in contexts that make its meaning clear. In all of those cases, the context is discussing an aspect of a particular declaration or lvalue, and the term is a reference back to that declaration. But that is not the case in this section, and it is clear that the term is being used with a different meaning. It clearly and repeatedly distinguishes the type of the lvalue used for access from the declared type, so it cannot be that; but what does it mean?

The third question is related, in the sense that knowing what that term means in this context might enable one to deduce the intent of this section, and knowing the intent of this section might enable one to deduce the meaning of that term. Consider:

    #include <stdlib.h>
    #include <string.h>

    typedef struct {double a;} A;
    typedef struct {A b;} B;

    void fred (A *a) {
        double *p = (double *)a;
        void *q = malloc(sizeof(B));
        memcpy(q,(char *)p,sizeof(B));
    }

    int main (void) {
        B b[1000];
        fred((A *)b);
        return 0;
    }

After the call to memcpy, is the effective type of the object pointed to by q an array of B, A, double or char? In other words, which of those is the "declared type" and why? It is probable that it is not an array of char, because of the lvalue versus declared type distinction. But there is no obvious reason to choose any one of B, A or double as the preferred interpretation.

This is not the only problem with this section. Another very nasty one relates to partial copying, which is extremely common in many C programs. Consider:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {double a; int b;} C;

    int main (void) {
        C c = {0.0,0};
        void *p = malloc(sizeof(int));
        memcpy(p,(void *)(((char *)&c)+offsetof(C,b)),sizeof(int));
        return 0;
    }

The wording of 6.5 paragraph 6 would seem to imply that the effective type of the object pointed to by p becomes C, which means that the object is necessarily invalid, as there is insufficient space for a C datum. But surely it cannot be the intent that the effective type of the object pointed to by p is incompatible with its datum type, which is most definitely int in this case? If it is, this is the most serious incompatibility between C99 and C90 yet reported, and will break a huge number of reasonable programs.

For the fourth aspect, consider:

    #include <stdlib.h>

    int compare (const void *A, const void *B) {
        double x = *(const double *)A, y = *(const double *)B;
        return (x > y)-(x < y);
    }

    int main (void) {
        double array[100] = {0.0};
        void *p;
        p = bsearch(&array[0],array,100,sizeof(double),compare);
        return 0;
    }

What is the effective type of the object pointed to by p? It is pretty clearly double, an array of double or no effective type at all. But which, and why?

There are probably other problems, but there is little point in giving yet more examples. What is clear is that I (and some other people I have spoken to) cannot work out what this section is attempting to specify, and it desperately needs clarification. When it is clarified, it may well need further work, but the first step is to clarify its intent. If this turns out to be infeasible, then the concept of effective type should be scrapped until and unless someone can come up with an adequate description.

What Size is an Array?
----------------------

In this section, the question being asked is how large the array object that corresponds to a pointer value is. Please ignore any assumptions that could be derived from the fact that two constructions are in the same compilation unit, as they could easily be separated.
For the purposes of these examples, let us assume no padding and that all structures have the same alignment.

    typedef struct {double z[5];} small;
    typedef struct {double z[7];} large;

    double *pp;
    small *qq;

    void ONE (double *a) {a[8] = 1;}
    void TWO (double a[5]) {ONE((double *)a);}
    void THREE (double a[5]) {a[8] = 1;}
    void FOUR (double a[5]) {if (a == pp) ONE((double *)a);}
    void FIVE (double a[5]) {if (a == pp) ONE((double *)pp);}

    void SIX (large *a) {a->z[6] = 1;}
    void SEVEN (small *a) {SIX((large *)a);}
    void EIGHT (large *a) {SEVEN((small *)a);}
    void NINE (large *a) {if ((char *)a == (char *)qq) SEVEN((small *)a);}
    void TEN (large *a) {if ((char *)a == (char *)qq) SEVEN((small *)qq);}

    void ELEVEN (small *a, int n) {
        int i;
        large *b;
        for (i = 0; i < n; ++i) {b = (large *)a; a = (small *)b;}
        a->z[3] = 1;
    }

    int main (void) {
        double p[10];
        small q[2];
        ONE(p); TWO(p); THREE(p);
        pp = p;
        FOUR(p); FIVE(p);
        SIX((large *)q); SEVEN(q); EIGHT((large *)q);
        qq = q;
        NINE((large *)q); TEN((large *)q);
        ELEVEN(q,1); ELEVEN(q,2);
        return 0;
    }

Now, which of the first five calls are conforming? There are obviously plausible interpretations of the standard that allow all of them, all of them except THREE, or only ONE and FIVE. But consider TWO: exactly WHICH construction breaks any wording in the standard? There is clearly nothing in either ONE or TWO on their own that is not conforming, and the standard has no explicit concept of a value having a 'history' that affects its semantics. And is the intent REALLY to distinguish FOUR and FIVE?

The second set of five calls is mainly to show that the effect does not depend on array parameter adjustment. Let us assume that the introduction of effective type does not exclude SIX; if it does, then C99 has broken a huge number of important existing codes! The key point is that '(large *)q' points to an array object of length 1; hence, if the history matters, so does '(small *)(large *)q'; which in turn means that '(large *)(small *)(large *)q' is a zero-length array, and so the access is illegal.

The calls to ELEVEN are to show that this is not solely a static problem but, in some cases and with some interpretations, depends on the execution of the program.

The last time this issue was raised, several people said that the standard was clear, but did not agree on exactly which examples were conforming. Other people felt that they knew what was intended, but doubted that the standard made it entirely clear. And yet others were certain that the standard was unclear or even inconsistent. My view is that there are three possible approaches to consistency:

1) The original K&R unchecked approach. All pointer values are simply addresses and, if they are object pointers, that object is the largest array of their base type that will fit within the originally allocated object; i.e. the object as originally defined, returned from malloc etc. or returned from some specified library call (like getenv). I believe that this is the approach being taken by the gcc checked pointer project (on grounds of practicality), even though they know that it does not match the requirements of the C standard.

If this approach is taken, then the ONLY subsequent constraints on the size of the array object pointed to are those imposed by the restrict qualifier and the static qualifier in array parameters. I favour this interpretation, as it is most consistent with traditional expectations and existing codes, though it is the worst approach for SMP optimisation.
If this were done, there is a case for rephrasing the static qualifier in array parameters to mean precisely that number of elements (i.e. to adopt traditional Fortran semantics in that case, and that case alone).

2) The approach that only visible information is relevant. The first thing to do is to specify what 'visible' means in this context, and that is not an easy thing to do. One obvious rule is to say that only the type actually used in an array subscription is relevant (i.e. the lvalue type). As pointed out before, this still means that the declared size of array parameters (without the static qualifier) is ignored, because their type has already been adjusted to a pointer type. If the declared size of array parameters should be significant, then the wording of array parameter adjustment will need considerable rethinking, because it will introduce yet another category of types associated with an lvalue. This is not pretty, and still allows:

    void THREE (double a[5]) {double *b = &a[5]; b[3] = 1;}

To lock that out needs a more complex definition of 'visible', and I doubt that it will be easy to find a consistent one, short of the next approach.

3) The approach that pointer values have a history. This is essentially approach (2) applied dynamically as well as statically, and taken to the limit. Every time that a pointer is cast to a new type, implicitly or explicitly, the object that it refers to is reduced in size to a whole number of elements. And every time that it is passed as a complete array parameter, it is reduced to the size of the declared array (before adjustment). Thus the size associated with a pointer value can decrease but never increase.

However, this still leaves the question of whether passing a pointer through a library function like memcpy clears its history, in the same way that memchr can be used to clear the const qualifier from a pointer. And, of course, it says nothing about the properties of pointers created from integers. This is clearly the best approach for SMP optimisation, but it would break a number of important existing codes in diabolically obscure ways, though it is unclear how many of those are already not conforming. Codes which I am pretty certain would fall foul of this include most of the BSD string handling and much of the X11 system.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nmm1@cam.ac.uk
Tel.: +44 1223 334761    Fax: +44 1223 334679