Document: WG14 N1118

Disposition of comments for Austin Group Review

Date: 07-Apr-05


Austin Group Review of ISO/IEC WDTR 24731
Specification for Secure C Library Functions
============================================

During the Redmond WG 14 meeting, I was asked to request review and
comment on the Secure C Library functions by the Austin Group.

This report summarizes the comments received.

1. The Austin Group in general applauds WG 14 for focusing attention
   on the problems of buffer overflow.

2. It is generally felt that the name "Secure" is  incorrect,
   and should be changed. Possible alternates include "Enhanced
   Library Interfaces" or "Robust Library Interfaces".

Committee Response:

See N1114 CA01.
3. The term "diagnosed undefined behavior" is confusing, and
   inappropriate.  Some other term should be sought.

Committee Response:

See N1114 NL01.
4. The approach of simply providing a length argument to a number
   of string functions may cause as many problems as it solves. It
   provides no assurance that the programmer provide a sensible
   value, and may end up obfuscating otherwise clear and safe
   usage of the original function.  Functions that allocate memory
   (using malloc()) would provide safer, clearer, and more robust
   interfaces. E.g. strdup() instead of strcpy().

Committee Response:

See below
5. The asctime_s() and related functions do not support localization,
   and should be dropped in favor of using strftime().

Committee Response:

With asctime_s the TR 24731 will contain a Recommended Practice that advise the programmer to use strftime(). See N1114 GB17.
6. Namespace pollution issues ... the behavior should not be
   implementation defined if __STDC_WANT_SECURE_LIB is undefined. See
   P4 para3. The undefined case should be the same as defined as 0

Committee Response:

This is a non-normative technical report. If a vendor chooses not to implement the non-normative technical report the user has nothing to do. The comment does not seem to be generated with the understanding of what an ISO TR is or what an ISO TR is meant to accomplish.
In general, many of the group found this proposed TR to be controversial,
and the Austin Group has no strong showing of support for it.

=======================================================================
Background 
----------

The Austin Group initially looked at the document in late December,
2004, and discussed the document during their Face to Face plenary
meeting in January 2005.

At that meeting, the following observations were made:

* Responses to Draft TR from WG14: "Security" Library Extensions

WG 14 have submitted a draft of this TR for CD Registration, and have
asked AG for review and comment.

The document is titled "Specification for Secure C Library Functions"
but does not cover anything related to security. It provides bound
checked string functions and other interfaces intended to reduce
programmer error. The AG respectfully suggest that the title should
be something more like "Enhanced Library Interfaces", or similar.

Security can be divided into six basic requirements, or tenets, that
help ensure data confidentiality, integrity, and availability. The
six security tenets are:

    * Identification. This deals with user names and how users identify
      themselves to the system.
    * Authentication. This deals with passwords, smart cards,
      biometrics, and so on. Authentication is how users prove to
      the system that they are who they claim to be.
    * Access control (also called authorization). This deals with
      access and the privileges granted to users so that they may
      perform certain functions on the system.
    * Confidentiality. This deals with encryption. Confidentiality
      mechanisms ensure that only authorized people can see data
      stored on or traveling across the network.
    * Integrity. This deals with checksums and digital
      signatures. Integrity mechanisms ensure that data is not garbled,
      lost, or changed when traveling across the network.
    * Nonrepudiation. This is a means of providing proof of data
      transmission or receipt so that the occurrence of a transaction
      cannot later be denied.

This TR deals only with integrity, and only a little of that.

Committee Response:

That is a mischaracterization of this TR, this TR will improve code correctness and this return effects all six of these security tenants.

However, the committee has decided to change the title to "Specification for Safer C Library Functions" to avoid confusion.

Namespace pollution issues ... should not be implementation defined
if __STDC_WANT_SECURE_LIB is undefined. See P4 para3. The undefined
case should be the same as defined as 0.

We should continue to review this document offline. Comments
to austin-group-l. Nick Stoughton will deliver the comments
to the next meeting of WG14. Mailing deadline (for WG14)
is 7 March 2005. Our deadline should be one or two days
before that. ACTION AI-2005-01-11: All to review WG14
Security TR and report to austin-group-l by 2005-3-1.
========================================================================

The following comments have been extracted from the messages
posted in response to that action item. These responses have
been slightly edited for consistency and to remove some of the
off-topic side discussions!  The entire messages can be found at
index.html

At this stage, these are "raw" comments ... they represent the opinions
of some of the approximately 550-600 participants, and have not been
officially adopted as an offical opinion of the Austin Group itself.

-----------------------------------------------

From:   Curtis Smith

This proposal seems as though it took a bit of effort to develop.
I especially like the behaviour of strncpy_s in that it ensures a
nul terminator (when possible) and does not require the writing of
extra nuls at the end of the string.

However, I don't think [this TR] would be especially useful, and I
don't envision it gaining wide acceptance due to the following:

   awkward names with _s suffix

Committee Response:

It is less awkward than the posix_ prefix :-)

   corresponding traditional functions would apparently still be
   available, so whatever "security" the functions purport to provide
   can be circumvented

Committee Response:

The Committee agrees. With the exception of the function gets, the older functions can be used correctly, but are hard to verify,. The new functions raise the level of visibility of buffer overrun related issues, and it is intended to make it easier to demonstrate that buffer overflow has been addressed. Many vendors are proposing adding a detection mechanism to their compilation environment to detect that the older functions are being used and provide warnings to help with the transition.

Committee Response:

One of the major audiences is maintainers of legacy code. The TR will contain Recommended Practice sections and a Rationale where there are even better solutions (e.g. for gets_s, asctime_s or tmpnam_s).
   anyone interested in using the new functions would probably already
   be writing code to check buffer sizes

Committee Response:

The committee agrees. However, there is a substantial amount of legacy code that is still be retrofitted, and a standardized set of interfaces to help in this problem is believed to be beneficial.
   people that want to manipulate strings are probably better off using
   C++ unless they're concerned with speed in time-critical loops in
   which they don't want to be doing superfluous checks anyway.

Committee Response:

We believe this TR is of interest to those that have a large C Legacy code, and do not have the option or want to change to C++.
   I think the term "secure" is misleading.

Committee Response:

See previous answer.
   Perpetuation of other poor interfaces (e.g., anything to do with
   time_t) is not a good idea.

Committee Response:

See answer about legacy code.
   These changes really don't offer any functionality not already
   available.

Committee Response:

See visiliby answer. And, they do offer new functionality (constraint violation handling at the minimum).
If you did want to proceed with the proposal, I think it is missing
some functionality that really does aid in robustness.

Background:  Whenever I am trying to write robust code, especially for
libraries that 3rd parties might call, I like to do some rudimentary
argument checking for invalid pointers.  For many specific operating
environments, this is possible, e.g., in Windows, the functions
IsBadReadPtr, IsBadWritePtr, and IsBadStringPtr implement this
desired functionality (although they use logic inverted from what
I suggest below).  In operating systems which do not provide such
functions, equivalents can usually be written using machine-specific
instructions to analyse the accessibilty of certain parts of memory.
Even in environments where no such cpu instructions exist, a stub set
of functions can be written to ensure that NULL pointers aren't being
used. It can be argued that these functions are not robust enough
because they don't handle the case where code in parallel suddenly
changes the readability or writability of a particular section of
memory; nevertheless, their use does indeed provide value

[[ edited out suggestion for additional robustness checking functions,
available on request ]]

Committee Response:

If the author wants to generate a proposal and submit this proposal to WG14, it will be considered.
-----------------------------------------------

From:   Marshall Wiley

As an application (rather than OS) developer, these seem like a good
start. However, this leaves one problem that I haven't seen addressed
anywhere - the last of a supported way to determine how long a va_arg
list is, and thus prevent overflow when attempting to process one. The
Open VMS HP C library offers a non-portable va_count() routine which
returns the number of quadwords (Alpha) in a va_list, which is better
than nothing, but there doesn't seem to be anything in the POSIX or C
standards to address this problem correctly. A related issue is the
apparent lack of a supported way to create a va_list, but I can't
claim that is (directly) a security problem.

Committee Response:

The committee believes such a proposal would lead to binary incompatibility, and is therefore out of scope of this TR.
As an API developer, I often need to take an incoming va_list, break
out some of the arguments, and pass the remainder to a function
such as printf (vprintf). Since I don't know how many arguments to
expect, what types they may be, nor how long the arg list actually
is, it's virtually impossible to write a supported function that
I can guarantee won't run off the end of the list. Today I have to
attempt to determine how each OS implements va_list and try to work
around this restriction. But since the behavior isn't specified, that
implementation can change tomorrow, potentially breaking my code,
or worse, bypassing my attempts to provide some degree of security.

[[ This led to a long, off topic, discussion on the merits of this
proposal, which can be largely summarized as "this will either require
hardware support or substantial compiler changes {Nick Mclaren has
details}". Most thought it was not a good idea.]]

-----------------------------------------------

>From Paul Eggert:

Let's vote against that proposal.  It's controversial, arguably
leads to buggier software, and does not reflect the consensus of
the community.

Linus Torvalds had this to say about a similar, earlier proposal:

  The above code is slow, ugly, non-straightforward, unportable and
  no more secure than the original code was.

  In short, it is just stupid code.

  But hey, if you want to advocate stupid code in public, that's
  your prerogative.  But please don't be proud of it.

  -- msg00073

The GNU C Library maintainers have rejected similar proposals, for
similar reasons.  For example, Ulrich Drepper wrote:

  This is horribly inefficient BSD crap.  Using these functions only
  leads to other errors.  Correct string handling means that you
  always know how long your strings are and therefore you can you
  memcpy (instead of strcpy).

  -- msg00053

Given this history of controversy, I am a bit surprised to see this
proposal appear before a standards body.

I am not aware of any study demonstrating that the approach embodied
by the proposed API leads to safer or more-secure software.  On the
contrary, when I very-briefly attempted to investigate the matter,
I found that it made C code harder to read and to maintain, and
even buggier; its use did not catch any bugs in the code I surveyed
and its use apparently introduced at least one bug.  I surveyed
OpenSSH_3.0.2p1, the then-current version.  For some details please
see:

-- msg00096.html
-- msg00159.html

My understanding is that the proposed functions are mainly useful in
environments where developers must take a lot of existing code, much of
it of poor quality, and try to limit the scope of some of its faults
without necessarily having to understand it thoroughly.  The goal is
mainly to prevent certain low-level failures, not to improve the code
quality or functionality.  This is a fairly specialized need, and does
not form sufficient justification for introducing these primitives
into a widely used standard.  Those who have such a need can easily
write the functions themselves, and include their implementations as
part of their application.

A new API like this should reflect consensus.  We should not
standardize on a new API that is so obviously controversial.

[[Clive Feather points out "that this is a Technical Report. That is,
it is *NOT* a proposal to change the Standard; it's a document to
give some consensus to efforts in this area.

I have a list of detailed comments on the document, but I'll just
repeat my first one here: it should not be called "Secure functions"
or anything like that; the correct name should be something along
the lines of "Functions with Bounds Checking and/or Re-entrancy".

I don't see the harm, once my issues are fixed, in issuing this as a
TR - it gives a point of focus. Either the concepts will be taken up,
or people will adjust them to meet real-world concerns, or they'll die.

I would take a very different view, however, to bringing them into
the C Standard at this point."]]

Committee Response:

See visiablity comment. The understanding of the proposed usefullness of the new functions is correct. The intention is not to alter the current C standard.
-----------------------------------------------

>From Glenn Seeds

While I may not agree with the details of what has been proposed,
I applaud the recognition that something needs to be done, and I
believe we should support the effort by making constructive comments.

I certainly don't agree with a point of view that says "good
programmers don't need this stuff". I do agree that any "improvements"
must satisfy some key criteria: - Code must not be made less readable
by their use.  - It should be possible to implement them so that
they perform as well as equivalently robust hand-written code in
the application.

Committee Response:

The committee agrees in principal, and the TR can not control the way the new functions are used. See comments on visiability.
1) I was confused by the term "secure", which implied something about
security (i.e. access control). To avoid confusing others, could we
not use the term "robust" instead?

Committee Response:

See previous comments on this.
2) The current [POSIX] standard defines 2 methods for obtaining the
text corresponding to an error code: strerror, and strerror_r

strerror is not thread safe. strerror_r was introduced for this
purpose.

Unfortunately, strerror_r has a serious usability problem. It requires
the caller to supply a fixed buffer for the result, but there is no
way to determine how big this buffer needs to be. The only way to
make this work in general is to supply an initial buffer, check for
overflow, re-allocate, and try again. Repeat until success, or you've
used all available memory. Minimal code looks something like this:
[program fragment edited to work!]

size_t buflen; 
char *buf; 
buflen = 100; 
while (0 != (buf = malloc(buflen))) {
    if (0 == (strerror_r(errno, buf, buflen)) )
        break;
    free(buf); buflen++;
}

A possible solution is to add a strerrorlen(errno) to get the required
buffer size. The resulting code would be:

size_t buflen; char *buf; buflen = 1+strerrorlen(errno); buf =
malloc(buflen); strerror_r(errno, buf, buflen);

Committee Response:

The Committee agrees, something along this line will be considered. Also see N1114 CA04.
[Additional comments] strerror_s() as defined in the proposal does
*not* solve the usability problems identified above. Strerrorlen() is
easy to implement efficiently, and does not impair the readabaility
of the application.  While the malloc() approach used above does
have a performance impact, applications are not forced to do that. If
simple truncation is acceptable for the application, then the existing
strerror_r() is sufficient.

-----------------------------------------------

>From Bruce Korb

Every new interface introduces new complexity, so throwing in new stuff
to standardize needs to be widely regarded as useful.  Paul [Eggert]
did a reasonably careful code analysis of OpenSSH and did not find that
it had any appreciable benefit.  "appreciable benefit" to me would mean
improved conciseness with equivalent or better verification.  I do
not see that this interface provides that, either.  It is intended
to do so, but when you get around to actually coding something up
with it, it just fails to provide anything appreciable.  So, perhaps
instead of starting with the premise, "I think this will help guide
programmers into being more careful", instead start with examining
the misusages of string copies and providing interfaces that can
more helpful.  Both of which are outside of POSIX, of course.  But,
just as a couple of examples:

  char* asprintf( const char*, ...) -- allocating sprintf function
  char* strjoin( const char*, ...)  -- multi-string "strdup()"

Committee Response:

Yes, a proposal along these line would be considered if presented to the committee. See below.
-----------------------------------------------

>From Garret Wollman

I think in particular the draft's attempts to save some of the
fundamentally broken interfaces in C by restricting them to a magic
maximum buffer size is very much the wrong way to go.

General comments:

* __STDC_WANT_SECURE_LIB__ is a horrible botch.  [[but its the same
technique that POSIX uses ...]]

* tmpnam_s() (and functions like it) cannot be used securely; it
should not be included.

Committee Response:

The TR will contain Recommended Practice that will point to tmpfile_s should be used. The TR will also have a detailed Rationale. Also see US27-B.
* The whole concept of "diagnosed undefined behavior" as invoked
(e.g.)  in section 5 does not bear contemplation.  [[Clive Feather:
The *name* of the concept is foul, but what it's trying to do isn't
as bad as all that.]]

Committee Response:

See N1114 NL01.
* The fopen_s() etc. functions are not improvements over the Standard
functions.

Committee Response:

See N1114 US03.
* I don't believe the invention of `rsize_t' and `errno_t' improves
anything.  In general, new `foo_t' types are almost always a mistake,
since C does not have incomplete typedefs.

Committee Response:

See N1114 GB04.
* The improvements to the scanf() family seem reasonable, but I'm
dubious as to the value of improving scanf() since it is nearly always
the wrong function to use when parsing input.

Committee Response:

Helping to improve legacy code.
* gets_s() is a pointless interface; fgets() does the useful part of
what gets_s() is described as doing.  By contrast, an interface like
fgetln() would be a meaningful improvement.

Committee Response:

See N1114 JP01.
* getenv_s() does not offer anything over strdup(getenv(...)) except
unnecessary complexity.  [[strdup is not a C function, its a POSIX
one]]

Committee Response:

See legacy code answer.
* OpenBSD's strlcpy() would be an improvement over the strcpy_s()
interface described here, and has the benefit of actually being used
in a real implementation, Linux developers notwithstanding.

Committee Response:

See legacy code answer.
* The 4.4BSD function strsep() is an improvement over the Standard
strtok() and strtok_r() interfaces; this draft's strtok_s() is not.

Committee Response:

See legacy code answer.
* I would agree with the comments others have made regarding
strerror_s().  As defined, the whole strerror() family is a horrible
interface.  In retrospect, it would have been much better to simply
standardize "%m" as a format specifier for printf().

Committee Response:

This is outside the scope of this work item.
* Extending the current time interfaces at this time is inadvisable.
They are almost all broken with respect to timezones and localization;
this needs to be on a separate standardization track.

Committee Response:

See above.
-----------------------------------------------

>From Ulrich Drepper

The proposed safe(r) ISO C library fails to address to issue completely.
  The problem with the existing interfaces is that the programmer has to
put in a lot of additional effort to make sure the program behaves
correctly.  In many situations a much simpler code, with all kinds of
error checking removed, works equally well and therefore is left out.
This is the core of the problem.  Code is rather written like

   char *p = malloc (3 * NAME_LEN);
   strcpy (p, name1);
   strcat (p, name2);
   strcat (p, name3);

for some fixed (but arbitrarily chosen NAME_LEN) than

   char *p = malloc (strlen (name1) + strlen (name2) + strlen (name3) + 1);
   if (p != NULL)
     {
       strcpy (p, name1);
       strcat (p, name2);
       strcat (p, name3);
     }

Code like the former can be found in almost all software projects.  The
second code is just so much longer and, if something needs to change, it
has to be changed in two places.  Programmers are lazy, though, so this
is a big deterrent.


Proposing to make the life of a programmer even harder is not going to
help.  But this is exactly what is proposed.  Requiring the programming
to write

   size_t len = 3 * NAME_LEN;
   char *p = NULL;
  again:
   char *p2 = realloc (p, len);
   if (p2 == NULL)
     abort ();
   p = p2;
   if (strcpy_s (p, len, name1) != 0
       || strcat_s (p, len, name2) != 0
       || strcat_s (p, len name3) != 0)
     {
       len += 2;
       goto again;
     }

will make the situation worst, if it changes anything.  Programmers
won't put up with the amount of new code needed.  The resulting code is
probably more complicated than the second example code which is
perfectly safe as well.  The program in both cases needs to keep track
of the length of the string, so why not do it right from the beginning?

Committee Response:

This committee is not addressing lazy programmers. The committee felt the examples given were a little misleading, and believe that
   char *p = malloc (3 * NAME_LEN);
   strcpy_s (p, 3 * NAMELEN, name1);    // there will be a constraint violation if malloc failed
   strcat_s (p, 3 * NAMELEN, name2);
   strcat_s (p, 3 * NAMELEN, name3);

will generate a much safer outcome.
In the first example given, if the malloc failed, there would have been
undefined behavior, as there would have been if
strlen(name1)+strlen(name2)+strlen(name3) > 3*NAMELEN.
However,
with this simple substitution for the safer functions,
there is no undefined behavior. If malloc failed, at the end of this code
fragment, p would be NULL, and no undefined actions would have occurred
(though the implementation defined constraint violation handler would have
been executed three times). If the overall length was too long, then p would
point to memory that had the first byte set to 0. This memory could be
successfully freed later.

If programmers are asked about what they want it is automatic, implicit
memory allocation.  I know that this is contrary to the design decision
of ISO C where malloc et.al. are only used explicitly (and not
implicitly by some interface).  That's OK, but it does not justify
inventing (and this all is pure invention) useless interfaces.

It is clear that those who wish a string call should choose appropriate
programming languages which either have it build in or allow to
implement it easily using OO methods.  C does not fall into this
category.  But if the no-implicit-malloc rule is ignored (as POSIX
does), there are many possibilities.  The GNU C library has many
functions which provide implicit memory management:

   int asprintf (char **s, const char *fmt, ...)
   int vasprintf (char **s, const char *fmt, va_list ap)

These functions work similar to sprintf() but actually allocate the
memory needed for the result with an adequately sized call to malloc().
  For more complicated situations this interface is useful:

   FILE *open_memstream (char **bufp, size_t *sizep)

The function returns a stream which can be used with any of the stream
operations.  The output is accumulated in a memory buffer the runtime
allocates.  Upon fclose() to variables references in the
open_memstream() call are filled with the final string.  This interface
allows to create arbitrarily complex strings.  Some people suggested
more interfaces like strjoin(), which would just append an arbitrary
number of strings, but this is something which can be easily handled
with asprintf() and open_memstream().

In a similar fashion, stdio functions should be extended.  glibc
contains getline()/getdelim() to replace fgets() etc.  These functions
do their own memory handling and there will never be a need again to
arbitrary limit line length (and the resulting errors in the code
handling too long lines).

scanf() with %s etc is another case.  Passing the length means the
programmer has to come with the error handling (and the correct length
value to pass to the function).  This is ignored as well.  The
alternative is to have scanf() allocate the memory.  glibc uses a format
flag ('a' in our case, yes it conflicts with C99).  I.e.,

    char *p;
    scanf(f, "%as", &p)

will cause p to be a freshly allocated buffer.


These are the kind of extensions which, without a OO framework in the
language, make the programmer's life easier and therefore will actually
be used.  None of the proposed interfaces can say that.  They all
require more work to be done or are just plain silly.

Committee Response:

Yes, a proposal along these line would be considered if presented to the committee.
ACTION: Nick Stoughton to develop a proposal to submit for consideration at the next meeting.
-----------------------------------------------

>From Konrad Schwarz

Diagnosed undefined behavior should be flagged by raise(3)'ing an
appropriate signal.

(I think there is a pretty clear trend towards memory protection -- not
necessarily paging support -- in embedded processor designs.)

This is in addition to the other criticims already posted.

Committee Response:

The committee is sympathetic to this idea, but the current (almost conforming) implementations have used other mechanisms, therefore the committee is uneasy about requiring the use of raise(). The current draft has "implementation defined" behavior for what is now expected to be called a "constraint violation". So, an implementation that uses (and documents) raise(3) would be conforming.