Improved integration with C arrays and strings

Author: Thorsten Ottosen
Contact: thorsten.ottosen@dezide.com
Organizations: Dezide Aps
Date: 2007-03-08
Number: WG21/N2225 and J16/07-0085
Working Group: Library

1   Introduction

Quite a few libraries start out as C libraries and are then wrapped in a thin C++ layer for convenience and type safety. However, the standard class templates std::vector and std::basic_string do not allow us to wrap a C API seamlessly without a great loss of efficiency. This is very embarrassing.

std::basic_string has already paid the price for C compatibility by storing a terminating null, thus allowing us to get a C compatible string with basic_string::c_str(). However, the compatibility is one way: from C++ to C. When going from C to C++ we are forced to copy the entire buffer.
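To make the cost concrete, the sketch below shows what a round trip between C and C++ takes with today's std::string. It assumes a hypothetical C function make_c_string() that returns a malloc()'ed, null-terminated string; adopt_by_copy() and export_by_copy() are likewise illustrative names, not part of any library. Each direction performs a fresh allocation and an O(n) copy, which is exactly what the proposed members are meant to avoid.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that hands out a malloc()'ed,
// null-terminated string that the caller must free().
char* make_c_string(const char* text) {
    std::size_t n = std::strlen(text) + 1;            // include the terminating null
    char* p = static_cast<char*>(std::malloc(n));
    if (p) std::memcpy(p, text, n);
    return p;
}

// C -> C++: std::string cannot adopt the buffer, so it must copy it.
std::string adopt_by_copy(char* raw) {
    std::string s(raw);   // new allocation + O(n) copy
    std::free(raw);       // the original buffer is simply thrown away
    return s;
}

// C++ -> C: c_str() is cheap, but handing the caller an *owned* buffer
// requires yet another allocation + O(n) copy.
char* export_by_copy(const std::string& s) {
    char* p = static_cast<char*>(std::malloc(s.size() + 1));
    if (p) std::memcpy(p, s.c_str(), s.size() + 1);   // copy includes the null
    return p;
}
```

A caller would write `std::string s = adopt_by_copy(make_c_string("hello"));`, edit `s`, and later pass `export_by_copy(s)` back to C, paying for two extra buffers along the way.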

The same annoyance comes up with heap-allocated arrays: because we are forced to copy a C array into a std::vector object, we pay a high price in performance (an extra allocation plus a copy of every element) if we want to upgrade legacy code to a more maintainable style.

The fix is easy: add two new member functions to std::vector and std::basic_string that give us explicit control over the buffer. This example shows how to take ownership of a C string, manipulate it through the C++ interface, and then give up ownership again:

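//
// Define a string type whose allocator releases memory with free(),
// so that it is compatible with a malloc()'ed C string
//
typedef std::basic_string< char, std::char_traits<char>, malloc_allocator<char> > String;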

//
// Step 1: acquire buffer
//
char* str = get_c_string();
String new_string;
new_string.acquire_buffer( str, std::strlen( str ) );    
assert( str == 0 );

//
// Step 2: manipulate buffer
//
new_string.replace( ... );

//
// Step 3: give up ownership
//
str = new_string.release_buffer();
assert( new_string.empty() );
free( str );

In addition to improving interoperability with C, legacy C++ code will also benefit because we can almost directly replace new'ed arrays/strings with std::vector/std::string.
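The example in the introduction relies on an allocator named malloc_allocator, which the paper does not define. A plausible reading is an allocator whose deallocate() calls std::free, so that a buffer obtained from a C API's malloc() satisfies the precondition of acquire_buffer(). The following is a minimal sketch under that assumption, written against the C++11 allocator requirements; it makes the String typedef usable today, but of course does not supply the proposed acquire_buffer() or release_buffer().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Assumed definition: an allocator backed by std::malloc/std::free, so memory
// it hands out can be released with free() and vice versa.
template <class T>
struct malloc_allocator {
    typedef T value_type;

    malloc_allocator() noexcept {}
    template <class U>
    malloc_allocator(const malloc_allocator<U>&) noexcept {}   // needed for rebinding

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        // malloc(0) may legitimately return a null pointer, so request at least 1 byte.
        void* p = std::malloc(n == 0 ? 1 : n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless: any two instances can free each other's memory.
template <class T, class U>
bool operator==(const malloc_allocator<T>&, const malloc_allocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const malloc_allocator<T>&, const malloc_allocator<U>&) noexcept { return false; }

// The string type from the introduction.
typedef std::basic_string< char, std::char_traits<char>, malloc_allocator<char> > String;
```

Because the allocator is stateless, copying `a` into the container's allocator, as the Effects clauses require, is trivially correct.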

2   Wording

2.1   Synopsis for basic_string

Add the following members to the synopsis of basic_string:

void    acquire_buffer( charT*& newBuffer, size_type length, const Allocator& a = Allocator() );
charT*  release_buffer();

2.2   Specification for basic_string

Add the following specification after the synopsis of basic_string:

void acquire_buffer( charT*& newBuffer, size_type length, const Allocator& a = Allocator() );

  • Preconditions: newBuffer points to a null-terminated buffer holding length characters (not counting the terminating null). a.deallocate(newBuffer,length+1) can be used to reclaim the memory of the buffer.
  • Effects: deallocates the current buffer, copies a into the container's allocator and sets newBuffer as the new buffer.
  • Postconditions: size() == length && size() == capacity() && newBuffer == 0
  • Remarks: if an exception is thrown, newBuffer is deallocated with a.deallocate(newBuffer,length+1).

charT* release_buffer();

  • Effects: releases ownership of the internal buffer and returns it.
  • Postconditions: empty()
  • Throws: Nothing
  • Remarks: get_allocator() may be used to obtain an allocator that can deallocate the buffer.
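Since these members exist only as a proposal, the following toy class sketches one way the specified effects and postconditions could be met. It is not std::basic_string: toy_string is a hypothetical name, it keeps size() == capacity() at all times, and it offers only the operations needed to exercise acquire_buffer() and release_buffer(). Storage for a string of length n is taken to be n + 1 characters, matching the a.deallocate(newBuffer,length+1) call in the Remarks above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical toy container, NOT std::basic_string.
template <class charT, class Allocator = std::allocator<charT> >
class toy_string {
    typedef std::allocator_traits<Allocator> traits;

public:
    typedef std::size_t size_type;

    explicit toy_string(const Allocator& a = Allocator())
        : alloc_(a), data_(nullptr), size_(0) {}
    ~toy_string() { reset(); }
    toy_string(const toy_string&) = delete;
    toy_string& operator=(const toy_string&) = delete;

    // Precondition: newBuffer holds `length` characters followed by a null,
    // and a.deallocate(newBuffer, length + 1) can reclaim it.
    void acquire_buffer(charT*& newBuffer, size_type length,
                        const Allocator& a = Allocator()) {
        reset();               // deallocate the current buffer
        alloc_ = a;            // copy a into the container's allocator
        data_ = newBuffer;     // adopt the buffer without copying
        size_ = length;        // size() == capacity() == length
        newBuffer = nullptr;   // postcondition: newBuffer == 0
    }

    // Gives up ownership and returns the buffer; never throws.
    charT* release_buffer() noexcept {
        charT* p = data_;
        data_ = nullptr;
        size_ = 0;             // postcondition: empty()
        return p;
    }

    size_type size() const { return size_; }
    size_type capacity() const { return size_; }   // this toy never over-allocates
    bool empty() const { return size_ == 0; }
    charT& operator[](size_type i) { return data_[i]; }
    const charT* c_str() const {
        static const charT empty_str[1] = { charT() };
        return data_ ? data_ : empty_str;
    }
    Allocator get_allocator() const { return alloc_; }

private:
    void reset() {
        if (data_) traits::deallocate(alloc_, data_, size_ + 1);   // +1 for the null
        data_ = nullptr;
        size_ = 0;
    }

    Allocator alloc_;
    charT* data_;
    size_type size_;
};
```

In this sketch the caller of release_buffer() must remember the old size() + 1, because std::allocator::deallocate() expects the original allocation size and the emptied string no longer records it.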

2.3   Synopsis for vector

Add the following members to the synopsis of vector:

void  acquire_buffer( T*& newBuffer, size_type length, const Allocator& a = Allocator() );
T*    release_buffer();

2.4   Specification for vector

Add the following specification after the synopsis of vector:

void acquire_buffer( T*& newBuffer, size_type length, const Allocator& a = Allocator() );

  • Preconditions: newBuffer points to a buffer of length elements. a.deallocate(newBuffer,length) can be used to reclaim the memory of the buffer.
  • Effects: deallocates the current buffer, copies a into the container's allocator and sets newBuffer as the new buffer.
  • Postconditions: size() == length && size() == capacity() && newBuffer == 0
  • Remarks: if an exception is thrown, newBuffer is deallocated with a.deallocate(newBuffer,length).

T* release_buffer();

  • Effects: releases ownership of the internal buffer and returns it.
  • Postconditions: empty()
  • Throws: Nothing
  • Remarks: get_allocator() may be used to obtain an allocator that can deallocate the buffer.
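The same approach carries over to vector, with two differences: there is no terminating null, so the buffer holds exactly length elements, and the elements are objects that must be destroyed before their storage is deallocated. The specification does not say whether the acquired elements must already be constructed; the sketch below assumes they are. toy_vector is a hypothetical name and, as in a minimal sketch, capacity() always equals size().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical toy container, NOT std::vector.
template <class T, class Allocator = std::allocator<T> >
class toy_vector {
    typedef std::allocator_traits<Allocator> traits;

public:
    typedef std::size_t size_type;

    explicit toy_vector(const Allocator& a = Allocator())
        : alloc_(a), data_(nullptr), size_(0) {}
    ~toy_vector() { reset(); }
    toy_vector(const toy_vector&) = delete;
    toy_vector& operator=(const toy_vector&) = delete;

    // Assumed precondition: newBuffer holds `length` constructed elements and
    // a.deallocate(newBuffer, length) can reclaim its storage.
    void acquire_buffer(T*& newBuffer, size_type length,
                        const Allocator& a = Allocator()) {
        reset();               // destroy elements, deallocate the current buffer
        alloc_ = a;            // copy a into the container's allocator
        data_ = newBuffer;     // adopt the buffer without copying
        size_ = length;        // size() == capacity() == length
        newBuffer = nullptr;   // postcondition: newBuffer == 0
    }

    // Gives up ownership of the elements and their storage; the caller must
    // destroy and deallocate them with a compatible allocator.
    T* release_buffer() noexcept {
        T* p = data_;
        data_ = nullptr;
        size_ = 0;             // postcondition: empty()
        return p;
    }

    size_type size() const { return size_; }
    size_type capacity() const { return size_; }   // this toy never over-allocates
    bool empty() const { return size_ == 0; }
    T& operator[](size_type i) { return data_[i]; }
    Allocator get_allocator() const { return alloc_; }

private:
    void reset() {
        for (size_type i = 0; i < size_; ++i)
            traits::destroy(alloc_, data_ + i);    // end each element's lifetime
        if (data_) traits::deallocate(alloc_, data_, size_);
        data_ = nullptr;
        size_ = 0;
    }

    Allocator alloc_;
    T* data_;
    size_type size_;
};
```

For trivially destructible types such as int, the caller can skip destruction after release_buffer() and simply deallocate the returned pointer with the original length.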