From owner-sc22wg14@open-std.org  Thu Feb 28 16:40:58 2008
Return-Path: <owner-sc22wg14@open-std.org>
X-Original-To: sc22wg14-domo1
Delivered-To: sc22wg14-domo1@open-std.org
Received: by open-std.org (Postfix, from userid 521)
	id CA236D8934; Thu, 28 Feb 2008 16:40:58 +0100 (CET)
X-Original-To: sc22wg14@open-std.org
Delivered-To: sc22wg14@open-std.org
Received: from ace.ace.nl (smtp1.ace.nl [193.78.105.127])
	by open-std.org (Postfix) with ESMTP id CF6DD38508
	for <sc22wg14@open-std.org>; Thu, 28 Feb 2008 16:40:43 +0100 (CET)
Received: from [127.0.0.1] (localhost.ace.nl [127.0.0.1])
	by ace.ace.nl (8.12.11.20060308/8.12.11) with ESMTP id m1SFeNIa005710;
	Thu, 28 Feb 2008 16:40:24 +0100
Message-ID: <47C6D5EE.2000708@ace.nl>
Date: Thu, 28 Feb 2008 16:40:30 +0100
From: Willem Wakker <willemw@ace.nl>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: "Gwyn, Douglas (Civ, ARL/CISD)" <gwyn@arl.army.mil>
CC: sc22wg14@open-std.org
Subject: Re: (SC22WG14.11415) RE: A naive (??) question ... (UNCLASSIFIED)
References: <20080225173450.187DAD6BA5@open-std.org> <20080227004438.250A0D7A81@open-std.org>
In-Reply-To: <20080227004438.250A0D7A81@open-std.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-sc22wg14@open-std.org
Precedence: bulk

Hello Douglas,

Thanks for your response. However, I do not think the answer is
very helpful, as it does not (in my opinion) address the
points that I raised.

Let me try again.

The whole issue is about initializing static or external variables
that are not initialized explicitly.

In the (good?) old days there was K&R C which stated (appendix A,
para 8.6):
    Static and external variables which are not initialized are
    guaranteed to start off as 0; automatic and register variables
    which are not initialized are guaranteed to start off as
    garbage.
Earlier in the K&R book (chapter 4, page 84) an example is given
where external variables are explicitly initialized with 0 and
the text says:
    These initializations are actually unnecessary since all are
    zero, but it's good form to make them explicit anyway.
So, one might expect that many K&R C programs will rely on the
fact that static and external variables that are not initialized
explicitly are implicitly set to zero.
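
As a small illustration of the guarantee quoted above (a sketch only;
the names counter, scale, table and all_zero are made up for this
example), objects with static storage duration and no initializer
start out as zero:

```c
int counter;            /* external, no initializer: starts as 0 */
static double scale;    /* file-scope static: starts as 0.0 */
static int table[8];    /* every element starts as 0 */

/* Hypothetical helper: returns 1 if all the objects above are zero. */
int all_zero(void)
{
    int i;

    if (counter != 0 || scale != 0.0)
        return 0;
    for (i = 0; i < 8; i++)
        if (table[i] != 0)
            return 0;
    return 1;
}
```

Calling all_zero() before anything has been assigned to these objects
should return 1 under K&R C as well as under C89 and C99.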

Then comes C89. Subclause 6.5.7 says:
    If an object that has static storage duration is not
    initialized explicitly, it is initialized implicitly as if
    every member that has arithmetic type were assigned 0 and
    every member that has pointer type were assigned a null pointer
    constant.
This is exactly what was written in K&R C; note the use of
the term 'null pointer constant' here: a null pointer constant
is 'an integral constant expression with the value 0, or such
an expression cast to type void *' (see 6.3.2.#3).
I do not think that, at that time, the possibility was taken
into account that a null pointer (the result of assigning a null
pointer constant to a pointer) might be represented by something
other than all bits zero; I think the full K&R intentions were
still being honored.
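
The distinction between 'a null pointer' and 'all bits zero' can be
sketched as follows (the names implicit_ptr, implicit_is_null and
clear_bytes are hypothetical, chosen for this example). A static
pointer without an initializer is a null pointer by the rule above,
while storage cleared with memset is only all bits zero, which the
standard does not promise to be a null pointer:

```c
#include <stddef.h>
#include <string.h>

static char *implicit_ptr;   /* implicitly initialized: a null pointer */

/* Hypothetical check of the guaranteed case. */
int implicit_is_null(void)
{
    return implicit_ptr == NULL;
}

/* Clears a pointer object byte by byte. The result is all bits zero,
   which is a null pointer on most, but not necessarily all, systems. */
void clear_bytes(char **pp)
{
    memset(pp, 0, sizeof *pp);
}
```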

Then comes DR_016 which (in question 2) starts off with:
    This one is relevant only for hardware on which either
    null pointer or floating point zero is /not/ represented
    as all zero bits.
It deals with
    union { char *p; int i; } x;
and then states:
    If the null pointer is represented as, say, 0x80000000,
    then there is no way to implicitly initialize this object.
    Either the p member contains the null pointer, or the
    i member contains 0, but not both. So the behavior of this
    translation unit is undefined.
    This is a bad state of affairs. I assume it was not the
    Committee's intention to prohibit a large class of
    implicitly initialized unions; this would render a great
    deal of existing code nonconforming.
The issue here is about the representation of the null pointer,
which is unspecified (6.2.6.1#1: The representations of all
types are unspecified except as stated in this subclause).
The claim of non-conformance here is false (I think): if a
program depends on the unspecified representation of pointer
types then it is non-conforming anyway.
There is, however, a problem with the cited union in the
context of 6.5.7 of C89, namely: what does it mean that
everything is cleared, i.e. that all bits are set to zero?
My inclination in handling this problem would have been,
rather than stepping forward (defining what we think all
those zero bits should mean for the values of the types,
or, even worse, defining what values there should be for the
various types), to take a step backward and say something
to the effect of: 'the data space of the objects is cleared
(all bits set to zero); it is implementation defined(??)
what this means for the values of the objects'.
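
To make the DR #016 example concrete, here is a hedged sketch (the
function name first_member_is_null is invented for this example).
Under the wording that was eventually adopted in C99, only x.p is
guaranteed to hold a defined value, namely a null pointer; what x.i
reads as depends on how the implementation represents that pointer:

```c
#include <stddef.h>

/* The union from DR #016, with static storage duration and no
   explicit initializer. */
static union { char *p; int i; } x;

/* Hypothetical check: under the C99 rule the first named member, p,
   is initialized to a null pointer, whatever its bit pattern is. */
int first_member_is_null(void)
{
    return x.p == NULL;
}
```

On an implementation where the null pointer is all bits zero, x.i
will in practice also read as 0, but a portable program cannot rely
on that.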

However, the committee decided otherwise (the 'even worse'
option above), using 'null pointer' rather than 'null
pointer constant', and decided on the union issue that
only the first member was to be initialized.

This may have a silent effect on a whole class of programs
that worked correctly under the K&R/C89 specifications,
namely those programs where the 2nd (3rd, ...) member of
a union has a bigger size than the first member: for
the union
    union { int i; short s[20]; } u;
only the i is initialized leaving a hole in the rest of the
union; programs may no longer rely on the initialization
to 0 of the s.
To make things even worse: on 'normal' systems uninitialized
data is commonly allocated in a BSS or COMMON segment which
is usually cleared at start-up time; on these systems the
effect is not noticed.
Only when you port such a program to a system where
the compiler has to generate explicit initializations and
strictly follows the C99 rules does the problem occur.
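
A program that wants to stay portable under the C99 rule can avoid
depending on the hole. The sketch below shows two possibilities,
assuming a C99 compiler; the names u_first, u_desig, count_nonzero
and unions_are_zeroed are hypothetical. Either declare the largest
member first, so that implicit initialization covers it, or name the
large member in a designated initializer:

```c
#include <stddef.h>

/* Option 1: largest member first; implicit initialization then
   zeroes every element of s. */
static union { short s[20]; int i; } u_first;

/* Option 2 (C99): designated initializer; the elements of s that are
   not listed are initialized as for static objects, i.e. to zero. */
static union { int i; short s[20]; } u_desig = { .s = { 0 } };

/* Hypothetical helper: counts the nonzero elements of an array. */
static size_t count_nonzero(const short *a, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (a[i] != 0)
            count++;
    return count;
}

/* Returns 1 if every element of s is zero in both unions. */
int unions_are_zeroed(void)
{
    return count_nonzero(u_first.s, 20) == 0
        && count_nonzero(u_desig.s, 20) == 0;
}
```

Neither option helps existing code that was written against the
K&R/C89 wording, which is the point of the objection above.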

So, my arguments for reopening DR #16 are:
- the committee solution, as implemented in C99, introduces
a silent change from C89 to C99 that is not even mentioned
in the rationale, and departs explicitly from what I would
call the 'K&R/C89 spirit of C';
- the working of a program now may depend on the order
in which the members of a union are defined, which is
an unwanted effect (think of generated C programs);
- the analogy (from DR #016) with the fact that for
explicit initializations also only the first element of
a union is initialized is false: there the programmer
knows and sees what he is doing;
- one could imagine that the uninitialized holes in the
data space might be exploited and cause a security risk.

- Willem Wakker


Gwyn, Douglas (Civ, ARL/CISD) wrote:

>Classification:  UNCLASSIFIED 
>Caveats: NONE
>
>
>>In response to DR #016 (problems with the initialization of unions
>>on hardware on which either null pointer or floating point zero is
>>/not/ represented as all zero bits) the text was changed to effectively
>>require (amongst others) that for unions only the first member of a
>>union is initialized to zero.
>>Was it at the time of DR #016 (1992) considered that this creates
>>problems for cases where the members of a union have different sizes?
>>
>
>It doesn't create a problem; in C89 this behavior was arguably undefined
>due to lack of specification.
>
>
>>   union { int i; short sh[42]; } u;
>>When this program was ported to a system where the initialization needs
>>to be done explicitly by the compiler, only the first few bytes of
>>the union were set to zero, giving rather unexpected behaviour.
>>
>
>>- is this intended behaviour?
>>
>
>Yes, only the first union member is properly initialized.
>
>
>>- keeping in mind that DR #016 was written with a very special type
>>of hardware in mind:
>>
>
>But it wasn't.  The issue is that the behavior needed to be defined,
>and inherently (due to representational issues) only one member of a
>union can hold a valid value at one time, portably.
>
>
>>is it acceptable that for all other systems the behaviour of a
>>program may depend on the order in which the various members of a
>>union are defined?
>>
>
>It's inevitable in some cases, if the behavior is to have a
>reasonable, portable definition.
>
>
>>- clause 6.7.8#10 defines how the first member of a union is
>>initialized but does not say anything about the other members;
>>is their initialization (or the lack thereof) unspecified?
>>
>
>Yes.
>
>
>>If so, should the standard not say so?
>>
>
>Near the beginning is a statement about dependence on unspecified
>behavior.  Specifying unspecified behavior sounds like a
>contradiction.
>
>
>>- should we reopen DR #016?
>>
>
>No.  That some programmers have not appreciated some program
>portability requirements doesn't justify reopening it.  C99
>didn't change anything in this regard except to define some
>instances of previously undefined behavior.  It's not feasible
>to change this definition to do "what the programmer expects"
>(assuming that that really is something definable) on all
>platforms to which the C standard is meant to apply.
>
>

-- 

------------------------------------------------------------------------
Willem Wakker                                    email: <willemw@ace.nl>
------------------------------------------------------------------------
ACE Consulting bv                                tel: +31 20 6646416
De Ruijterkade 113                               fax: +31 20 6750389
1011 AB  Amsterdam, The Netherlands              www: http://www.ace.nl
------------------------------------------------------------------------



