From owner-sc22wg14@open-std.org  Sun Aug 31 15:08:05 2008
Return-Path: <owner-sc22wg14@open-std.org>
X-Original-To: sc22wg14-domo1
Delivered-To: sc22wg14-domo1@open-std.org
Received: by open-std.org (Postfix, from userid 521)
	id 953D3E2A1E; Sun, 31 Aug 2008 15:08:04 +0200 (CET DST)
X-Original-To: sc22wg14@open-std.org
Delivered-To: sc22wg14@open-std.org
Received: from v-smtp-auth-relay-4.gradwell.net (v-smtp-auth-relay-4.gradwell.net [79.135.125.43])
	by open-std.org (Postfix) with ESMTP id C707CE2A1E
	for <sc22wg14@open-std.org>; Sun, 31 Aug 2008 15:07:50 +0200 (CET DST)
Received: from digraph.polyomino.org.uk ([81.187.227.50] country=GB ident=postmaster*pop3#polyomino*org$uk)
          by v-smtp-auth-relay-4.gradwell.net with esmtpa (Gradwell gwh-smtpd 1.290) id 48ba97a5.a78.541; Sun, 31 Aug 2008 14:07:49 +0100
          (envelope-sender <jsm@polyomino.org.uk>)
Received: from jsm28 (helo=localhost)
	by digraph.polyomino.org.uk with local-esmtp (Exim 4.68)
	(envelope-from <jsm@polyomino.org.uk>)
	id 1KZmei-00030H-NQ; Sun, 31 Aug 2008 13:07:48 +0000
Date: Sun, 31 Aug 2008 13:07:48 +0000 (UTC)
From: "Joseph S. Myers" <jsm@polyomino.org.uk>
X-X-Sender: jsm28@digraph.polyomino.org.uk
To: Thomas Plum <tplum@plumhall.com>
cc: WG14 <sc22wg14@open-std.org>
Subject: RE: (SC22WG14.11499) RE: critical undefined behavior
In-Reply-To: <7CE7F921A61FC24F9C443A0E164BB3880303B585@ms12.MSE7.exchange.ms>
Message-ID: <Pine.LNX.4.64.0808311232460.11243@digraph.polyomino.org.uk>
References: <20080715014235.B13D4D84DD@open-std.org> <20080715121121.BA3BCD86A5@open-std.org>
 <20080715182412.4E658D9E48@open-std.org> <20080715205352.56F62DA196@open-std.org>
 <7CE7F921A61FC24F9C443A0E164BB3880303B585@ms12.MSE7.exchange.ms>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-sc22wg14@open-std.org
Precedence: bulk

On Fri, 29 Aug 2008, Thomas Plum wrote:

> >First, there isn't a clear definition of what the bounds are; if you are
> >trying to bound possible behavior, I think it needs to be clear enough to
> >define a set of possible executions in the abstract machine.
> 
> Several people (I thought yourself included) have suggested that each 
> time we try to define possible executions we eliminate optimizations 
> that would in fact not destroy "causality" or "system structures".

Indeed.  The logical deductions are fully intended: I believe that if you 
define instances of undefined behavior well enough to allow 
security-relevant deductions to be made about the behavior of a program in 
the possible presence of such behavior, then you require implementations 
to change in a way that eliminates useful optimizations; contrapositively, 
if your definition does not require avoiding such optimizations then I 
don't think it will allow useful reasoning about programs beyond what is 
already possible.

In many, perhaps most, cases, the optimizations won't cause the effects 
you wish to avoid, but in some cases they will, and I don't think you can 
effectively define things so as to avoid only the problem cases of 
optimizations.  Boundaries of expressions, statements, basic blocks, 
functions, translation units and source code languages that used to be 
effective barriers to optimizations are such barriers no more; whole 
program optimization will deduce whatever information about the program it 
can and propagate it wherever it can, quite possibly on internal 
representations covering translation units written in multiple languages, 
and future standard versions should bear in mind such developments in 
optimization.  This includes, obviously, addressing issues such as DR#314 
that arise in whole-program optimization; it also includes avoiding 
poorly-defined changes that impair optimization or changes based on 
outdated models of optimization that assume propagation of information can 
be restricted to a unit smaller than the whole program.

If you have references to the literature of optimization that deal with 
how to make the useful optimizations based on undefined behavior without 
the undesired consequences, they would of course be appreciated.

> >you have an if condition involving an expression whose evaluation involved
> >undefined behavior, if (cond) { A; B; } else { C; D; }.  Depending on

> >and sometimes as false.  Clearly executing A; D; or C; B; in the above
> >case would be a problematic control flow effect.

> I know that WG14 appreciates these detailed sketches of cutting-edge 
> optimization methods.  One way to summarize the observations above is that
> "indeterminate value" might include "wobbly" values that don't look the
> same each time they're used.  All of the examples above do not appear to
> modify any values that weren't produced by the offending expression, so
> I'm not sure why you're insisting on defining "critical" as stringently
> as you suggest here.

In the example above; A might initialize a pointer P and C might 
initialize a pointer Q; B store a value in *P and D store a value in *Q, 
with obvious critical results from A; D; or C; B;.

If you don't define the original undefined behavior as critical, you get 
so many possible instances like the above where the paths through the 
program differ from what might have been intended by the author that I 
doubt it can be usefully analyzed.  References to the literature showing 
otherwise would of course be appreciated.

As another example suppose you have if (cond) { A[1] = X; } else { A[0] = 
X; } - an implementation may determine that in the absence of undefined 
behavior the condition must have value 0 or 1 and so convert this to 
A[cond] = X;.  (Branches are often expensive, especially if the processor 
mispredicts them, so eliminating them where possible is a good idea, 
leading to transformations like this.)  Now undefined behavior, somewhere 
in the integer arithmetic of the condition, means the condition ends up 
evaluating to a value other than 0 or 1, and there is an out-of-bounds 
store that didn't appear in the original source code.

> >In this area as in others I think we need to avoid invention.  
> 
> Unfortunately, the security problems that arise from out-of-bounds stores
> cannot be considered "inventions"; these are problems going back to the
> origins of C.  It's just that each year the negative implications are
> increasingly appreciated.

The problems are not inventions; some solutions may be inventions and 
others not.  We should only adopt solutions that are not inventions.  For 
example, there have been various implementations of bounds checking for C, 
varying in their approaches and how thoroughly they will detect different 
sources of overflow.  We could investigate the details of these and 
consider adding a bounds checking option, such that implementations of the 
option guarantee to detect certain classes of bounds violations (that 
there is existing practice to detect) and abort on them rather than 
triggering undefine behavior.  We could call this a security profile and 
combine this with defined behavior for some other instances of undefined 
behavior (*not* "undefined except ...", but a clear definition of what the 
results are in certain cases under this profile, or at least specific 
bounds on possible executions such as "unspecified value").  Or we could 
split this up into multiple profiles, each having implementation practice 
as a guide.

I think such optional profiles are clearly better than either 
unconditional changes to restrict undefined behavior with costs for those 
uses of C where the security issues are not problems, or ill-defined 
changes whether conditional or not.  I support the inclusion of such 
optional profiles where based on existing practice for C; profiles based 
on existing implementation practice for other languages such as Ada could 
certainly be considered (and I certainly recommend bearing such languages 
in mind when designing profiles based on C practice), but much more care 
would be needed there.

-- 
Joseph S. Myers
jsm@polyomino.org.uk