Document: WG14 N1106 Date: 07-Mar-05 Austin Group Review of ISO/IEC WDTR 24731 Specification for Secure C Library Functions ============================================ During the Redmond WG 14 meeting, I was asked to request review and comment on the Secure C Library functions by the Austin Group. This report summarizes the comments received. 1. The Austin Group in general applauds WG 14 for focusing attention on the problems of buffer overflow. 2. It is generally felt that the name "Secure" is incorrect, and should be changed. Possible alternates include "Enhanced Library Interfaces" or "Robust Library Interfaces". 3. The term "diagnosed undefined behavior" is confusing, and inappropriate. Some other term should be sought. 4. The approach of simply providing a length argument to a number of string functions may cause as many problems as it solves. It provides no assurance that the programmer provide a sensible value, and may end up obfuscating otherwise clear and safe usage of the original function. Functions that allocate memory (using malloc()) would provide safer, clearer, and more robust interfaces. E.g. strdup() instead of strcpy(). 5. The asctime_s() and related functions do not support localization, and should be dropped in favor of using strftime(). 6. Namespace pollution issues ... the behavior should not be implementation defined if __STDC_WANT_SECURE_LIB is undefined. See P4 para3. The undefined case should be the same as defined as 0 In general, many of the group found this proposed TR to be controversial, and the Austin Group has no strong showing of support for it. ======================================================================= Background ---------- The Austin Group initially looked at the document in late December, 2004, and discussed the document during their Face to Face plenary meeting in January 2005. At that meeting, the following observations were made: * Responses to Draft TR from WG14: "Security" Library Extensions WG 14 have submitted a draft of this TR for CD Registration, and have asked AG for review and comment. The document is titled "Specification for Secure C Library Functions" but does not cover anything related to security. It provides bound checked string functions and other interfaces intended to reduce programmer error. The AG respectfully suggest that the title should be something more like "Enhanced Library Interfaces", or similar. Security can be divided into six basic requirements, or tenets, that help ensure data confidentiality, integrity, and availability. The six security tenets are: * Identification. This deals with user names and how users identify themselves to the system. * Authentication. This deals with passwords, smart cards, biometrics, and so on. Authentication is how users prove to the system that they are who they claim to be. * Access control (also called authorization). This deals with access and the privileges granted to users so that they may perform certain functions on the system. * Confidentiality. This deals with encryption. Confidentiality mechanisms ensure that only authorized people can see data stored on or traveling across the network. * Integrity. This deals with checksums and digital signatures. Integrity mechanisms ensure that data is not garbled, lost, or changed when traveling across the network. * Nonrepudiation. This is a means of providing proof of data transmission or receipt so that the occurrence of a transaction cannot later be denied. This TR deals only with integrity, and only a little of that. Namespace pollution issues ... should not be implementation defined if __STDC_WANT_SECURE_LIB is undefined. See P4 para3. The undefined case should be the same as defined as 0. We should continue to review this document offline. Comments to austin-group-l. Nick Stoughton will deliver the comments to the next meeting of WG14. Mailing deadline (for WG14) is 7 March 2005. Our deadline should be one or two days before that. ACTION AI-2005-01-11: All to review WG14 Security TR and report to austin-group-l by 2005-3-1. ======================================================================== The following comments have been extracted from the messages posted in response to that action item. These responses have been slightly edited for consistency and to remove some of the off-topic side discussions! The entire messages can be found at http://www.opengroup.org/austin/mailarchives/ag/index.html At this stage, these are "raw" comments ... they represent the opinions of some of the approximately 550-600 participants, and have not been officially adopted as an offical opinion of the Austin Group itself. ----------------------------------------------- From: Curtis Smith This proposal seems as though it took a bit of effort to develop. I especially like the behaviour of strncpy_s in that it ensures a nul terminator (when possible) and does not require the writing of extra nuls at the end of the string. However, I don't think [this TR] would be especially useful, and I don't envision it gaining wide acceptance due to the following: awkward names with _s suffix corresponding traditional functions would apparently still be available, so whatever "security" the functions purport to provide can be circumvented anyone interested in using the new functions would probably already be writing code to check buffer sizes people that want to manipulate strings are probably better off using C++ unless they're concerned with speed in time-critical loops in which they don't want to be doing superfluous checks anyway. I think the term "secure" is misleading. Perpetuation of other poor interfaces (e.g., anything to do with time_t) is not a good idea. These changes really don't offer any functionality not already available. If you did want to proceed with the proposal, I think it is missing some functionality that really does aid in robustness. Background: Whenever I am trying to write robust code, especially for libraries that 3rd parties might call, I like to do some rudimentary argument checking for invalid pointers. For many specific operating environments, this is possible, e.g., in Windows, the functions IsBadReadPtr, IsBadWritePtr, and IsBadStringPtr implement this desired functionality (although they use logic inverted from what I suggest below). In operating systems which do not provide such functions, equivalents can usually be written using machine-specific instructions to analyse the accessibilty of certain parts of memory. Even in environments where no such cpu instructions exist, a stub set of functions can be written to ensure that NULL pointers aren't being used. It can be argued that these functions are not robust enough because they don't handle the case where code in parallel suddenly changes the readability or writability of a particular section of memory; nevertheless, their use does indeed provide value [[ edited out suggestion for additional robustness checking functions, available on request ]] ----------------------------------------------- From: Marshall Wiley As an application (rather than OS) developer, these seem like a good start. However, this leaves one problem that I haven't seen addressed anywhere - the last of a supported way to determine how long a va_arg list is, and thus prevent overflow when attempting to process one. The Open VMS HP C library offers a non-portable va_count() routine which returns the number of quadwords (Alpha) in a va_list, which is better than nothing, but there doesn't seem to be anything in the POSIX or C standards to address this problem correctly. A related issue is the apparent lack of a supported way to create a va_list, but I can't claim that is (directly) a security problem. As an API developer, I often need to take an incoming va_list, break out some of the arguments, and pass the remainder to a function such as printf (vprintf). Since I don't know how many arguments to expect, what types they may be, nor how long the arg list actually is, it's virtually impossible to write a supported function that I can guarantee won't run off the end of the list. Today I have to attempt to determine how each OS implements va_list and try to work around this restriction. But since the behavior isn't specified, that implementation can change tomorrow, potentially breaking my code, or worse, bypassing my attempts to provide some degree of security. [[ This led to a long, off topic, discussion on the merits of this proposal, which can be largely summarized as "this will either require hardware support or substantial compiler changes {Nick Mclaren has details}". Most thought it was not a good idea.]] ----------------------------------------------- >From Paul Eggert: Let's vote against that proposal. It's controversial, arguably leads to buggier software, and does not reflect the consensus of the community. Linus Torvalds had this to say about a similar, earlier proposal: The above code is slow, ugly, non-straightforward, unportable and no more secure than the original code was. In short, it is just stupid code. But hey, if you want to advocate stupid code in public, that's your prerogative. But please don't be proud of it. -- The GNU C Library maintainers have rejected similar proposals, for similar reasons. For example, Ulrich Drepper wrote: This is horribly inefficient BSD crap. Using these functions only leads to other errors. Correct string handling means that you always know how long your strings are and therefore you can you memcpy (instead of strcpy). -- Given this history of controversy, I am a bit surprised to see this proposal appear before a standards body. I am not aware of any study demonstrating that the approach embodied by the proposed API leads to safer or more-secure software. On the contrary, when I very-briefly attempted to investigate the matter, I found that it made C code harder to read and to maintain, and even buggier; its use did not catch any bugs in the code I surveyed and its use apparently introduced at least one bug. I surveyed OpenSSH_3.0.2p1, the then-current version. For some details please see: http://sources.redhat.com/ml/libc-alpha/2002-01/msg00096.html http://sources.redhat.com/ml/libc-alpha/2002-01/msg00159.html My understanding is that the proposed functions are mainly useful in environments where developers must take a lot of existing code, much of it of poor quality, and try to limit the scope of some of its faults without necessarily having to understand it thoroughly. The goal is mainly to prevent certain low-level failures, not to improve the code quality or functionality. This is a fairly specialized need, and does not form sufficient justification for introducing these primitives into a widely used standard. Those who have such a need can easily write the functions themselves, and include their implementations as part of their application. A new API like this should reflect consensus. We should not standardize on a new API that is so obviously controversial. [[Clive Feather points out "that this is a Technical Report. That is, it is *NOT* a proposal to change the Standard; it's a document to give some consensus to efforts in this area. I have a list of detailed comments on the document, but I'll just repeat my first one here: it should not be called "Secure functions" or anything like that; the correct name should be something along the lines of "Functions with Bounds Checking and/or Re-entrancy". I don't see the harm, once my issues are fixed, in issuing this as a TR - it gives a point of focus. Either the concepts will be taken up, or people will adjust them to meet real-world concerns, or they'll die. I would take a very different view, however, to bringing them into the C Standard at this point."]] ----------------------------------------------- >From Glenn Seeds While I may not agree with the details of what has been proposed, I applaud the recognition that something needs to be done, and I believe we should support the effort by making constructive comments. I certainly don't agree with a point of view that says "good programmers don't need this stuff". I do agree that any "improvements" must satisfy some key criteria: - Code must not be made less readable by their use. - It should be possible to implement them so that they perform as well as equivalently robust hand-written code in the application. 1) I was confused by the term "secure", which implied something about security (i.e. access control). To avoid confusing others, could we not use the term "robust" instead? 2) The current [POSIX] standard defines 2 methods for obtaining the text corresponding to an error code: strerror, and strerror_r strerror is not thread safe. strerror_r was introduced for this purpose. Unfortunately, strerror_r has a serious usability problem. It requires the caller to supply a fixed buffer for the result, but there is no way to determine how big this buffer needs to be. The only way to make this work in general is to supply an initial buffer, check for overflow, re-allocate, and try again. Repeat until success, or you've used all available memory. Minimal code looks something like this: [program fragment edited to work!] size_t buflen; char *buf; buflen = 100; while (0 != (buf = malloc(buflen))) { if (0 == (strerror_r(errno, buf, buflen)) ) break; free(buf); buflen++; } A possible solution is to add a strerrorlen(errno) to get the required buffer size. The resulting code would be: size_t buflen; char *buf; buflen = 1+strerrorlen(errno); buf = malloc(buflen); strerror_r(errno, buf, buflen); [Additional comments] strerror_s() as defined in the proposal does *not* solve the usability problems identified above. Strerrorlen() is easy to implement efficiently, and does not impair the readabaility of the application. While the malloc() approach used above does have a performance impact, applications are not forced to do that. If simple truncation is acceptable for the application, then the existing strerror_r() is sufficient. ----------------------------------------------- >From Bruce Korb Every new interface introduces new complexity, so throwing in new stuff to standardize needs to be widely regarded as useful. Paul [Eggert] did a reasonably careful code analysis of OpenSSH and did not find that it had any appreciable benefit. "appreciable benefit" to me would mean improved conciseness with equivalent or better verification. I do not see that this interface provides that, either. It is intended to do so, but when you get around to actually coding something up with it, it just fails to provide anything appreciable. So, perhaps instead of starting with the premise, "I think this will help guide programmers into being more careful", instead start with examining the misusages of string copies and providing interfaces that can more helpful. Both of which are outside of POSIX, of course. But, just as a couple of examples: char* asprintf( const char*, ...) -- allocating sprintf function char* strjoin( const char*, ...) -- multi-string "strdup()" ----------------------------------------------- >From Garret Wollman I think in particular the draft's attempts to save some of the fundamentally broken interfaces in C by restricting them to a magic maximum buffer size is very much the wrong way to go. General comments: * __STDC_WANT_SECURE_LIB__ is a horrible botch. [[but its the same technique that POSIX uses ...]] * tmpnam_s() (and functions like it) cannot be used securely; it should not be included. * The whole concept of "diagnosed undefined behavior" as invoked (e.g.) in section 5 does not bear contemplation. [[Clive Feather: The *name* of the concept is foul, but what it's trying to do isn't as bad as all that.]] * The fopen_s() etc. functions are not improvements over the Standard functions. * I don't believe the invention of `rsize_t' and `errno_t' improves anything. In general, new `foo_t' types are almost always a mistake, since C does not have incomplete typedefs. * The improvements to the scanf() family seem reasonable, but I'm dubious as to the value of improving scanf() since it is nearly always the wrong function to use when parsing input. * gets_s() is a pointless interface; fgets() does the useful part of what gets_s() is described as doing. By contrast, an interface like fgetln() would be a meaningful improvement. * getenv_s() does not offer anything over strdup(getenv(...)) except unnecessary complexity. [[strdup is not a C function, its a POSIX one]] * OpenBSD's strlcpy() would be an improvement over the strcpy_s() interface described here, and has the benefit of actually being used in a real implementation, Linux developers notwithstanding. * The 4.4BSD function strsep() is an improvement over the Standard strtok() and strtok_r() interfaces; this draft's strtok_s() is not. * I would agree with the comments others have made regarding strerror_s(). As defined, the whole strerror() family is a horrible interface. In retrospect, it would have been much better to simply standardize "%m" as a format specifier for printf(). * Extending the current time interfaces at this time is inadvisable. They are almost all broken with respect to timezones and localization; this needs to be on a separate standardization track. ----------------------------------------------- >From Ulrich Drepper The proposed safe(r) ISO C library fails to address to issue completely. The problem with the existing interfaces is that the programmer has to put in a lot of additional effort to make sure the program behaves correctly. In many situations a much simpler code, with all kinds of error checking removed, works equally well and therefore is left out. This is the core of the problem. Code is rather written like char *p = malloc (3 * NAME_LEN); strcpy (p, name1); strcat (p, name2); strcat (p, name3); for some fixed (but arbitrarily chosen NAME_LEN) than char *p = malloc (strlen (name1) + strlen (name2) + strlen (name3) + 1); if (p != NULL) { strcpy (p, name1); strcat (p, name2); strcat (p, name3); } Code like the former can be found in almost all software projects. The second code is just so much longer and, if something needs to change, it has to be changed in two places. Programmers are lazy, though, so this is a big deterrent. Proposing to make the life of a programmer even harder is not going to help. But this is exactly what is proposed. Requiring the programming to write size_t len = 3 * NAME_LEN; char *p = NULL; again: char *p2 = realloc (p, len); if (p2 == NULL) abort (); p = p2; if (strcpy_s (p, len, name1) != 0 || strcat_s (p, len, name2) != 0 || strcat_s (p, len name3) != 0) { len += 2; goto again; } will make the situation worst, if it changes anything. Programmers won't put up with the amount of new code needed. The resulting code is probably more complicated than the second example code which is perfectly safe as well. The program in both cases needs to keep track of the length of the string, so why not do it right from the beginning? If programmers are asked about what they want it is automatic, implicit memory allocation. I know that this is contrary to the design decision of ISO C where malloc et.al. are only used explicitly (and not implicitly by some interface). That's OK, but it does not justify inventing (and this all is pure invention) useless interfaces. It is clear that those who wish a string call should choose appropriate programming languages which either have it build in or allow to implement it easily using OO methods. C does not fall into this category. But if the no-implicit-malloc rule is ignored (as POSIX does), there are many possibilities. The GNU C library has many functions which provide implicit memory management: int asprintf (char **s, const char *fmt, ...) int vasprintf (char **s, const char *fmt, va_list ap) These functions work similar to sprintf() but actually allocate the memory needed for the result with an adequately sized call to malloc(). For more complicated situations this interface is useful: FILE *open_memstream (char **bufp, size_t *sizep) The function returns a stream which can be used with any of the stream operations. The output is accumulated in a memory buffer the runtime allocates. Upon fclose() to variables references in the open_memstream() call are filled with the final string. This interface allows to create arbitrarily complex strings. Some people suggested more interfaces like strjoin(), which would just append an arbitrary number of strings, but this is something which can be easily handled with asprintf() and open_memstream(). In a similar fashion, stdio functions should be extended. glibc contains getline()/getdelim() to replace fgets() etc. These functions do their own memory handling and there will never be a need again to arbitrary limit line length (and the resulting errors in the code handling too long lines). scanf() with %s etc is another case. Passing the length means the programmer has to come with the error handling (and the correct length value to pass to the function). This is ignored as well. The alternative is to have scanf() allocate the memory. glibc uses a format flag ('a' in our case, yes it conflicts with C99). I.e., char *p; scanf(f, "%as", &p) will cause p to be a freshly allocated buffer. These are the kind of extensions which, without a OO framework in the language, make the programmer's life easier and therefore will actually be used. None of the proposed interfaces can say that. They all require more work to be done or are just plain silly. ----------------------------------------------- >From Konrad Schwarz Diagnosed undefined behavior should be flagged by raise(3)'ing an appropriate signal. (I think there is a pretty clear trend towards memory protection -- not necessarily paging support -- in embedded processor designs.) This is in addition to the other criticims already posted.