WG14/N823

C9X Public Comment
==================

Sponsoring National Body: J11
Date: 98/05/15
Author: Tom MacDonald (with help from Hugh Redelmeier)
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2

Below is a copy of something Hugh Redelmeier sent to the committee
over a year ago. I don't think WG14 ever adequately addressed the
issue. I'm re-submitting the paper for the June 1998 meeting. I've
made a few tweaks, but tried to clearly identify them.

Tom MacDonald
tam@cray.com

================================================================

From: hugh@mimosa.com ("D. Hugh Redelmeier")
Date: Sat, 1 Feb 1997 04:45:42 -0500
To: sc22wg14@dkuug.dk
Subject: (SC22WG14.3377) DR166 -- lvalue constraints

I promised to write a paper on DR166. I'm sorry for the lateness of
this.

I have shown an earlier version to larry.jones@sdrc.com,
seebs@solon.com and gwyn@arl.mil. I have made some changes to address
their comments. I wish to thank them for their help. That does not
mean that they would approve of what I say here.

As I see it, the problem is with the wording of 6.2.2.1, in
particular, the first sentence [from c9x-std.txt on the ftp site]:

    [#1] An lvalue is an expression (with an object type or an
    incomplete type other than void) that designates an object.38

This looks as if the syntactic recognition of an lvalue depends on it
really designating an object. In particular, the DR suggests that
this makes the run-time behavior of the lvalue expression affect a
constraint (a compile-time notion).

There is a classic bug in English: the substitution of "that" for
"which" and vice versa. From Fowler's Modern English Usage (alas, not
the brand new edition):

    Which, that, who: ... (A) of "which" and "that", "which" is
    appropriate to non-defining and "that" to defining clauses. ...
    ...(A) "The river, which here is tidal, is dangerous", but
    "The river that flows through London is the Thames."

I think that the simple fix is to change the first sentence of
6.2.2.1:

    An _lvalue_ is the form of expression used to designate an
    object.#38 It shall have an object type or an incomplete type
    other than void.

I think that this clearly shows the purpose of an lvalue, without
making the syntactic property depend on the runtime validity.

I have moved the parenthetical remark to its own sentence to simplify
and clarify the prose. I wonder if it belongs in a constraint
section.

Doug Gwyn suggested that expressing the intent is wimpy:

    "There is no force in the "intent" that it be used to designate
    an object, except when it doesn't quite, so why bother to
    mention it?"

He suggests:

    An _lvalue_ is an expression; it shall have an object type or an
    incomplete type other than void.

I see his point, but I think that describing the purpose is useful.
I agree that the wording could be better.

It is important that any runtime restrictions be explicitly stated
somewhere. I don't think this change redistributes that burden. If
they are missing now, they already were (unless the "that designates
an object" did the job).

To express the runtime restrictions, we should add something like:

    When an lvalue expression is evaluated, the behavior is
    undefined if the expression does not designate an object.

or

    When an lvalue expression is evaluated, it shall designate an
    object.

It would probably be useful to add a footnote to the effect:

    [Footnote: note that the operand of a sizeof expression is not
    evaluated -- 6.3.3.4]

Larry asked:

    Can anyone think of a case where we need to require an lvalue to
    designate an object even though it isn't evaluated?

I think not, but the committee should consider this.

================================================================

Note: the following is a separable issue.
I have not prepared suggested wording changes, so this cannot be
considered as a proposal. I am including it in case the committee is
interested.

Many people have been surprised that the behavior of
&a[upper_bound] is undefined in C89. It was and is a common idiom. I
still use it in my code and haven't used an implementation that did
something unexpected.

Several comments expressed ambivalence about this. I think that they
would like to support &a[upper_bound], but don't really like
*(a + upper_bound) which is pretty hard to separate.

[[...TMacD... I suspect the `*' is a typo - should be just
(a + upper_bound) or &(*(a + upper_bound)) ...]]

If we wish to make this form well-defined in C9x, I think we could do
so here, and in the description of unary *, and in the description of
addition involving pointers.

We would need to refine the runtime restrictions that we just added
to 6.2.2.1, replacing them with:

    When an lvalue expression that is not the operand of a unary &
    is evaluated, it shall designate an object.

    When an lvalue expression that is the operand of a unary & is
    evaluated, it shall designate an object or one past the last
        ^ element [[...TMacD...]]
    element of an array object.

[Perhaps this should be reworded without "shall"; the flavor should
be clear.]

We need to make some changes in 6.3.3.2 (Address and indirection
operators). Here is one paragraph from the current 6.3.3.2 that would
need changing:

    [#4] The unary * operator denotes indirection. If the operand
    points to a function, the result is a function designator; if
    it points to an object, the result is an lvalue designating the
    object. If the operand has type ``pointer to type,'' the result
    has type ``type.'' If an invalid value has been assigned to the
    pointer, the behavior of the unary * operator is undefined.49

Here is a paragraph from the current 6.3.6 (Additive operators) that
would need to be adjusted (near the end).
    [#8] When an expression that has integral type is added to or
    subtracted from a pointer, the result has the type of the
    pointer operand. If the pointer operand points to an element of
    an array object, and the array is large enough, the result
    points to an element offset from the original element such that
    the difference of the subscripts of the resulting and original
    array elements equals the integral expression. In other words,
    if the expression P points to the i-th element of an array
    object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
    (where N has the value n) point to, respectively, the i+n-th and
    i-n-th elements of the array object, provided they exist.
    Moreover, if the expression P points to the last element of an
    array object, the expression (P)+1 points one past the last
    element of the array object, and if the expression Q points one
    past the last element of an array object, the expression (Q)-1
    points to the last element of the array object. If both the
    pointer operand and the result point to elements of the same
    array object, or one past the last element of the array object,
    the evaluation shall not produce an overflow; otherwise, the
    behavior is undefined. Unless both the pointer operand and the
    result point to elements of the same array object, or the
    pointer operand points one past the last element of an array
    object and the result points to an element of the same array
    object, the behavior is undefined if the result is used as an
    operand of the unary * operator.

This paragraph seems very fragile. In fact, I'm not sure that it
works.

For our purpose, I think that the only change would be to delete the
last sentence. Its function should be achieved by appropriate words
in 6.3.3.2.
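
The idiom at issue can be sketched in C. This is only an
illustration under stated assumptions: the names UPPER_BOUND,
sum_range and demo_end_pointer are hypothetical and come from neither
the draft nor the DR, and the &a[UPPER_BOUND] form is written assuming
an implementation that treats it exactly like a + UPPER_BOUND (the
rule C99 eventually specified). The end pointer is formed and
compared, but never used as the operand of unary *.

```c
#include <assert.h>
#include <stddef.h>

enum { UPPER_BOUND = 8 };   /* hypothetical array size for the sketch */

/* Sum the elements in the half-open range [first, last).
   last may point one past the last element; it is never dereferenced. */
static long sum_range(const int *first, const int *last)
{
    long total = 0;
    assert(first <= last);          /* both must point into the same array */
    for (const int *p = first; p != last; ++p)
        total += *p;                /* p is dereferenced only while p != last */
    return total;
}

/* Builds a[0..UPPER_BOUND-1] = 0..7 and sums it via an end pointer.
   Returns -1 if the two spellings of the end pointer disagree. */
static long demo_end_pointer(void)
{
    int a[UPPER_BOUND];
    for (int i = 0; i < UPPER_BOUND; ++i)
        a[i] = i;

    const int *end_by_add = a + UPPER_BOUND;  /* the (P)+N form of 6.3.6 */
    const int *end_by_idx = &a[UPPER_BOUND];  /* the idiom under discussion;
                                                 assumed equivalent (C99 rule) */
    if (end_by_add != end_by_idx)
        return -1;

    /* (Q)-1: stepping back from one past the end reaches the last element. */
    if (end_by_add - 1 != &a[UPPER_BOUND - 1])
        return -1;

    return sum_range(a, end_by_idx);
}
```

With the array filled as above, demo_end_pointer() should return 28
(the sum 0 through 7). Applying unary * to either end pointer is the
case the last sentence of [#8] leaves undefined.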
Hugh Redelmeier
hugh@mimosa.com
voice: +1 416 482-8253

=================== TMacD's proposed rewrite of 6.3.3.2 ====================

6.3.3.2 Address and indirection operators

Constraints

[#1] The operand of the unary & operator shall be either a
function designator, the result of a [] or unary * operator,
or an lvalue that designates an object that is not a
bit-field and is not declared with the register storage-class
    ^
    , or one element past the last element of an array,
specifier.

[#2] The operand of the unary * operator shall have pointer
type.

Semantics

[#3] The result of the unary & (address-of) operator is a
pointer to the object or function designated by its operand.
    ^^^^^^^^^^
    an object, or one element past the last element of an array,
If the operand has type ``type'', the result has type
``pointer to type''. If the operand is the result of a
unary * operator, neither that operator nor the & operator
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Neither operator
are evaluated, and the result shall be as if both were
    ^^^
    is
omitted, even if the intermediate object does not exist,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    resulting pointer does not point to an object with an
    effective type (described in 6.3) that can be accessed
    through this pointer.
except that the constraints on the operators still apply and
    ^^^^^^^^^^^
    However,
the result is not an lvalue. Similarly, if the operand is
the result of a [] operator, neither the & operator nor the
unary * that is implied by the [] are evaluated, and the
result shall be as if the & operator was removed and the []
operator was changed to a + operator.

[#4] The unary * operator denotes indirection. If the
operand points to a function, the result is a function
designator; if it points to an object, the result is an
lvalue designating the object. If the operand has type
``pointer to type'', the result has type ``type''.
If an invalid value has been assigned to the pointer, the
behavior of the unary * operator is undefined.71

[[... TMacD ...]]

Although Hugh suggests a rewrite of para 4 above, I think the
current wording works. The last sentence could be rewritten as:

    If the pointer does not point to an object, the behavior is
    undefined.

I also don't think these words handle the following

    &p.a
    &p->a

assuming "a" is a member of a union and "p" points one element past
the end of an array. Not sure if this is the intent.