Document number: N2353
Submitter: Martin Sebor
Submission Date: March 18, 2019
Subject: Add strdup and strndup to C2X

Summary

Copying character data is among the most common operations programs perform. To dynamically allocate storage for a copy of a string and initialize the storage with the copy, a C program must take three steps: first determine the length of the string, then allocate storage for the copy, and finally, if the allocation succeeds, copy the string into it. A utility function in a program that often makes string copies might look something like this:

	#include <stdlib.h>
	#include <string.h>

	char *copy_string (const char *str)
	{
	  size_t len = strlen (str);       // step 1
	  char *copy = malloc (len + 1);   // step 2
	  if (copy)
	    strcpy (copy, str);            // step 3
	  return copy;
	}
That's not too bad, although it may not be the most efficient way to implement the function: since the length is already known after step 1, replacing the strcpy call with memcpy might yield slightly better performance. As a data point, not counting tests, the latest GCC source tree contains 534 calls to such functions, the Binutils/GDB tree 760, and the Linux kernel tree 1074.
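A minimal sketch of the memcpy variant mentioned above, under a hypothetical name, copy_string_memcpy. It reuses the length computed in step 1, so the copy does not have to scan for the terminating nul a second time; whether that is measurably faster depends on the implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of copy_string that copies with memcpy,
   reusing the length from step 1 instead of rescanning for the nul. */
char *copy_string_memcpy (const char *str)
{
  size_t len = strlen (str);        /* step 1 */
  char *copy = malloc (len + 1);    /* step 2 */
  if (copy)
    memcpy (copy, str, len + 1);    /* step 3: len + 1 includes the nul */
  return copy;
}
```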

When it isn't known whether the character data is nul-terminated, a C program must first determine whether it is, and only then allocate the required amount of storage. To handle such data, the function above might need to be modified as follows:

	#include <stdlib.h>
	#include <string.h>

	char *copy_array (const char *str, size_t str_size)
	{
	  const char *nul = memchr (str, '\0', str_size);   // step 1
	  size_t len;                                       // still step 1
	  if (nul)                                          // …
	    len = nul - str;                                // …
	  else                                              // …
	    len = str_size;                                 // end of step 1
	  char *copy = malloc (len + 1);                    // step 2
	  if (copy)
	  {
	    memcpy (copy, str, len);                        // step 3
	    copy[len] = '\0';                               // also step 3
	  }
	  return copy;
	}
The function must take an additional argument giving the size of the array pointed to by str, for the case when it doesn't contain a terminating nul character. The code still isn't complicated, but it isn't trivial either: it's more involved than, for example, the strlen function that computes the length of a nul-terminated string, and it offers more opportunities for mistakes. A function like this is used less often than one that copies a string: the latest GCC source tree contains 23 calls to such functions, the Binutils/GDB tree 97, and the Linux kernel 145.
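One way to make copy_array less error-prone is to factor step 1 into its own helper. The sketch below assumes a hypothetical helper named bounded_length that behaves like the POSIX strnlen function, and a hypothetical copy_array_v2 built on it; as in the original, str_size is assumed to be the size of the array pointed to by str.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (behaves like POSIX strnlen): the number of
   characters before the first nul within the first max_size bytes
   of str, or max_size if there is no nul there. */
size_t bounded_length (const char *str, size_t max_size)
{
  const char *nul = memchr (str, '\0', max_size);
  return nul ? (size_t) (nul - str) : max_size;
}

/* Hypothetical rewrite of copy_array in terms of the helper. */
char *copy_array_v2 (const char *str, size_t str_size)
{
  size_t len = bounded_length (str, str_size);   /* step 1 */
  char *copy = malloc (len + 1);                 /* step 2 */
  if (copy)
  {
    memcpy (copy, str, len);                     /* step 3 */
    copy[len] = '\0';                            /* also step 3 */
  }
  return copy;
}
```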

Small, commonly used functions like those above are perfect candidates for inclusion in a library of string utility functions. Their utility has already been recognized by another ISO standard, namely ISO/IEC 9945, also known as IEEE Std 1003.1, 2017 Edition, or POSIX for short, which specifies two functions with just these semantics:

	char *strdup (const char *s);
	char *strndup (const char *s, size_t size);
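To make the semantics concrete, here is a minimal sketch of how the two functions could be written in standard C. The names my_strdup and my_strndup are hypothetical, chosen so they do not clash with the POSIX functions on systems that already provide them; the strndup sketch allocates only as many bytes as the copy needs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup. */
char *my_strdup (const char *s)
{
  size_t n = strlen (s) + 1;         /* include the terminating nul */
  char *p = malloc (n);
  return p ? memcpy (p, s, n) : NULL;
}

/* Hypothetical stand-in for strndup: copies at most size characters,
   stopping at the first nul, and always nul-terminates the copy. */
char *my_strndup (const char *s, size_t size)
{
  size_t len = 0;
  while (len < size && s[len] != '\0')   /* like strnlen (s, size) */
    ++len;
  char *p = malloc (len + 1);
  if (p)
  {
    memcpy (p, s, len);
    p[len] = '\0';
  }
  return p;
}
```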
This is a proposal to incorporate these functions in C2X.

Besides POSIX implementations, the functions are available on other systems, including but not limited to the following.

Since strdup and strndup are commonly needed, widely available, and easy to implement, including as compiler intrinsic functions for efficiency (GCC, for example, provides them among its extensive list of built-in functions; see Other Built-in Functions Provided by GCC), this proposal suggests adding them to C2X.

Size Of Allocated Object

One somewhat unusual aspect of the proposed strndup function that may not be apparent from its specification is worth calling out. In the APPLICATION USAGE section for strndup, POSIX states the following:

Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size.
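Because of this latitude, a portable program that needs the full (size + 1) bytes, for instance to append to the copy later, has to allocate them itself rather than rely on strndup. A minimal sketch, using a hypothetical helper named dup_with_room:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: unlike strndup, which may allocate only
   strnlen(s, size) + 1 bytes, this always allocates size + 1 bytes,
   then copies at most size characters of s, stopping at the first nul. */
char *dup_with_room (const char *s, size_t size)
{
  if (size == SIZE_MAX)              /* size + 1 would wrap around to 0 */
    return NULL;
  char *p = malloc (size + 1);
  if (!p)
    return NULL;
  size_t len = 0;
  while (len < size && s[len] != '\0')
    ++len;
  memcpy (p, s, len);
  p[len] = '\0';                     /* bytes after p[len] are indeterminate */
  return p;
}
```

Appending up to size minus strlen(p) characters to the result of dup_with_room ("ab", 8) stays within the allocation; doing the same to the result of strndup ("ab", 8) may overflow it, because that result may be only 3 bytes long.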

Suggested Change

Add the following subsection just after §7.24.6.3 The strlen function. The text of the subsection below has been adapted from the corresponding description in POSIX.

7.24.6.? The strdup function

Synopsis
	#include <string.h>

	char *strdup(const char *s);
Description

The strdup function creates a copy of the string pointed to by s in a space allocated as if by a call to malloc.

Returns

The strdup function returns a pointer to the first character of the duplicate string. The returned pointer can be passed to free. If no space can be allocated, the strdup function returns a null pointer.

Furthermore, add the following subsection just after the subsection added above. The text of the subsection below has been adapted from the corresponding description in POSIX.

7.24.6.? The strndup function

Synopsis
	#include <string.h>

	char *strndup(const char *s, size_t size);
Description

The strndup function creates a string initialized with no more than size initial characters of the array pointed to by s and up to the first null character, whichever comes first, in a space allocated as if by a call to malloc. If the array pointed to by s does not contain a null character within the first size characters, a null character is appended to the copy of the array.

Returns

The strndup function returns a pointer to the first character of the created string. The returned pointer can be passed to free. If no space can be allocated, the strndup function returns a null pointer.