Document number: N3334=12-0024

Date:

Jeffrey Yasskin <jyasskin@google.com>

Source at http://code.google.com/p/cxx1y-array-string-ref/

Proposing array_ref<T> and string_ref

Overview

References to arrays and strings are very common in C++ programs, but often the callee doesn't care about the exact type of the object that owns the data. 3 things generally happen in this case:

  1. The callee picks a particular type—a vector, say—and insists that callers copy the data if it was originally in another format.
  2. The callee takes two parameters—a pointer and a length—and reduces the readability and safety of calls and loses any helper functions the original type provided.
  3. The callee is rewritten as a template and its implementation is moved to a header file. This can increase flexibility if the author takes the time to code to a weaker iterator concept, but it can also increase compile time and code size, and can even introduce bugs if the author misses an assumption that the argument's contents are contiguous.

Google and LLVM have independently implemented two types to encapsulate this kind of argument. array_ref<T> is implicitly constructible from several standard array-like types and stores a contiguous range of addresses. It provides the subset of operations from std::array<> and std::vector<> that work on immutable sequences. Because array_ref stores pointers rather than a reference to the original array, it also supports a few operations to modify the current view.

string_ref is implicitly constructible from const char* and std::string. It provides most of the member operations from std::string to ease conversion. This paper follows Chromium in extending string_ref to basic_string_ref<charT, traits> (Chromium omits traits). We provide typedefs to parallel the 4 basic_string typedefs.

Both Google's and LLVM's string_ref types extend the interface from std::string to provide some helpful utility functions:

Versions of std::string operations that take string_ref instead also give the standard a way to provide in-place operations on non-null-terminated byte/character sequences:

Note that some of the code samples in this paper don't use full C++11 conventions because they're extracted from code that actually compiles, and current compilers don't yet support the whole C++11 language.

Inventions in this paper

Google's StringPiece provides as_string and ToString methods to convert to std::string. LLVM's StringRef provides both a str() explicit conversion and an implicit operator std::string(), and similarly for ArrayRef to std::vector. Since this paper builds on top of C++11, we provide explicit conversion operators in addition to two named conversions vec() and str().

Google's and LLVM's string_ref types provide a subset of std::string's searching operations, but they do provide 'pos' arguments to specify where to start the search. Because string_ref::substr is much cheaper than string::substr, this paper removes the pos argument entirely.

None of the existing classes have constexpr methods.

Bikeshed!

What do we call these?

std::array_ref:

std::string_ref:

std::array_ref

An array_ref<T> represents an immutable array of size() elements of type T. The storage for the array is *not* owned by the array_ref object, and clients must arrange for the backing store to remain live while the array_ref object is in use.

Implicit conversion operations are provided from types with contiguous iterators like std::vector, std::string, std::array, and primitive arrays. array_ref objects are invalidated by any operation that invalidates their underlying pointers.

One common use for array_ref is when passing arguments to a routine where you want to be able to accept a variety of array types. The usual approach here is to have the client explicitly pass in a pointer and a length, as in:

void MyOldRoutine(const int* elems, int N) {
  for (int i = 0; i < N; i++) { assert(elems[i] == N - i); }
}

Unfortunately, this leads to ugly and error-prone code at the call site:

std::vector<int> my_vector;
for (size_t i = 5; i > 0; --i) { my_vector.push_back(i); }
MyOldRoutine(my_vector.data(), my_vector.size());

std::array<int, 4> my_std_array = {{4, 3, 2, 1}};
MyOldRoutine(my_std_array.data(), my_std_array.size());

int my_array[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
MyOldRoutine(my_array, 10);

int* dyn_array = new int[3];
for (size_t i = 0; i < 3; ++i) { dyn_array[i] = 3 - i; }
MyOldRoutine(dyn_array, 3);

Instead, you can use an array_ref as the argument to the routine:

void MyNewRoutine(std::array_ref<int> a) {
  for (size_t i = 0; i < a.size(); i++) { assert(a[i] == int(a.size() - i)); }
}

This makes the call sites cleaner, for the most part:

std::vector<int> my_vector;
for (int i = 5; i > 0; --i) { my_vector.push_back(i); }
MyNewRoutine(my_vector);

std::array<int, 4> my_std_array = {{4, 3, 2, 1}};
MyNewRoutine(my_std_array);

int my_array[10] = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
MyNewRoutine(my_array);

int* dyn_array = new int[3];
for (size_t i = 0; i < 3; ++i) { dyn_array[i] = 3 - i; }
MyNewRoutine(std::array_ref<int>(dyn_array, 3));

Todo

The existing array_ref classes make the view const. It may be useful to extend that to allow modifications of the referenced array elements, and use array_ref<const T> for immutable views.

Class definition

namespace std {
  template<typename T>
  class array_ref {
    // types
    typedef T value_type;
    typedef const T * pointer;
    typedef const T & reference;
    typedef const T & const_reference;
    typedef implementation_defined const_iterator;
    typedef const_iterator iterator;
    typedef std::reverse_iterator< const_iterator > const_reverse_iterator;
    typedef const_reverse_iterator reverse_iterator;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;

    // construct/copy
    constexpr array_ref();
    constexpr array_ref(const array_ref &);
    array_ref & operator=(const array_ref &);
    constexpr array_ref(const T * array, size_t length);
    array_ref(const vector< T > & v);
    template<size_t N>
    constexpr array_ref(const T(&a)[N]);
    template<size_t N>
    constexpr array_ref(const std::array< T, N > & a);
    constexpr array_ref substr(size_type pos, size_type n=size_type(-1)) const;

    // iterators
    constexpr const_iterator begin() const;
    constexpr const_iterator end() const;
    constexpr const_iterator cbegin() const;
    constexpr const_iterator cend() const;
    const_reverse_iterator rbegin() const;
    const_reverse_iterator rend() const;
    const_reverse_iterator crbegin() const;
    const_reverse_iterator crend() const;

    // capacity
    constexpr size_type size() const;
    constexpr size_type max_size() const;
    constexpr bool empty() const;

    // element access
    constexpr const T & operator[](size_t i) const;
    constexpr const T & at(size_t i) const;
    constexpr const T & front() const;
    constexpr const T & back() const;
    constexpr const T * data() const;

    // Outgoing conversion operators
    explicit operator vector< T >() const;
    vector< T > vec() const;
    template<typename traits, typename Allocator>
    explicit operator basic_string< T, traits, Allocator >() const;
    basic_string< T > str() const;

    // mutators
    void clear();
    void remove_prefix(size_type n);
    void remove_suffix(size_type n);
    void pop_back();
    void pop_front();
  };

  // deducing constructor wrappers
  template<typename T>
  constexpr array_ref< T > make_array_ref(const T * array, size_t length);
  template<typename T, size_t N>
  constexpr array_ref< T > make_array_ref(const T(&a)[N]);
  template<typename T>
  array_ref< T > make_array_ref(const vector< T > & v);
}

Member semantics

deducing constructor wrappers

These functions do the same thing as the constructor with the same signature. They just allow users to avoid writing the iterator type.

types

typedef const T * pointer;
Todo

Should the pointer type be configurable as a template argument?

typedef implementation_defined const_iterator;

random-access, contiguous iterator type

typedef const_iterator iterator;

Because array_ref controls a constant sequence, iterator and const_iterator are the same type.

construct/copy

constexpr array_ref();
Postcondition:

empty() == true

array_ref(const vector< T > & v);
Todo

Arguably, this conversion should be a std::vector conversion operator.

template<size_t N> constexpr array_ref(const std::array< T, N > & a);
Todo

Arguably, this conversion should be a std::array conversion operator.

constexpr array_ref substr(size_type pos, size_type n=size_type(-1)) const;
Todo

See basic_string_ref::substr for interface questions. We want something like this on array_ref, but probably not with this name.

element access

constexpr const T * data() const;
Returns:

A pointer such that [data(),data() + size()) is a valid range. For a non-empty array_ref, data() == &front().

Outgoing conversion operators

These functions provide explicit conversions to selected other contiguous sequence types using those types' iterator-range constructors. We provide both explicit conversion operators for use in variable initialization and short member functions for use in function calls.

The operators are explicit to avoid accidental O(N) operations on type mismatches.

explicit operator vector< T >() const;
Todo

Arguably, this conversion should be a std::vector constructor.

template<typename traits, typename Allocator> explicit operator basic_string< T, traits, Allocator >() const;
Todo

Arguably, this conversion should be a std::basic_string constructor.

mutators

void clear();
Effects:

Resets *this to its default-constructed state.

void remove_prefix(size_type n);
Effects:

Advances the start pointer of this array_ref past n elements without moving the end pointer.

void remove_suffix(size_type n);
Effects:

Moves the end pointer of this array_ref earlier by n elements without moving the start pointer.

void pop_back();
Effects:

remove_suffix(1)

void pop_front();
Effects:

remove_prefix(1)

std::basic_string_ref

A string-like object that points to a sized piece of memory, similar to an array_ref<charT>.

We provide implicit constructors so users can pass in a const char* or a std::string wherever a string_ref is expected.

It is expected that user-defined string-like types will define an implicit conversion to string_ref (or another appropriate instance of basic_string_ref) to interoperate with functions that need to read strings.

Unlike std::strings and string literals, data() may return a pointer to a buffer that is not null-terminated. Therefore it is typically a mistake to pass data() to a routine that takes just a const charT* and expects a null-terminated string.

std::hash specializations shall be provided for string_ref, wstring_ref, u16string_ref, and u32string_ref.

Methods on basic_string_ref<charT> with the same name as methods on array_ref<charT> are intended to have the same semantics unless the difference is noted specifically.

Todo

Should string_ref inherit from array_ref? It seems like a straightforward extension, but there's a risk in that some of basic_string_ref<charT, traits>'s behavior depends on that traits parameter, which isn't propagated to the array_ref<charT> base class. If traits::eq isn't simply operator==(charT), this would cause the behavior to depend on whether a basic_string_ref has been upcasted to array_ref. For the 4 specializations of char_traits<> defined by the standard, this is guaranteed, but it's not necessarily true for user-defined types.

Class definition

namespace std {
  template<typename charT, typename traits>
  class basic_string_ref {
    // types
    typedef charT value_type;
    typedef const charT * pointer;
    typedef const charT & reference;
    typedef const charT & const_reference;
    typedef implementation_defined const_iterator;
    typedef const_iterator iterator;
    typedef std::reverse_iterator< const_iterator > const_reverse_iterator;
    typedef const_reverse_iterator reverse_iterator;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    static constexpr size_type npos = size_type(-1);

    // construct/copy
    constexpr basic_string_ref();
    constexpr basic_string_ref(const basic_string_ref &);
    basic_string_ref & operator=(const basic_string_ref &);
    basic_string_ref(const charT * str);
    template<typename Allocator>
    basic_string_ref(const basic_string< charT, traits, Allocator > & str);
    constexpr basic_string_ref(const charT * str, size_type len);

    // iterators
    constexpr const_iterator begin() const;
    constexpr const_iterator end() const;
    constexpr const_iterator cbegin() const;
    constexpr const_iterator cend() const;
    const_reverse_iterator rbegin() const;
    const_reverse_iterator rend() const;
    const_reverse_iterator crbegin() const;
    const_reverse_iterator crend() const;

    // capacity
    constexpr size_type size() const;
    constexpr size_type max_size() const;
    constexpr bool empty() const;
    constexpr size_type length() const;

    // element access
    constexpr const charT & operator[](size_t i) const;
    const charT & at(size_t i) const;
    constexpr const charT & front() const;
    constexpr const charT & back() const;
    constexpr const charT * data() const;

    // Outgoing conversion operators
    constexpr operator array_ref< const charT >() const;
    template<typename Allocator>
    explicit operator basic_string< charT, traits, Allocator >() const;
    basic_string< charT, traits > str() const;

    // mutators
    void clear();
    void remove_prefix(size_type n);
    void remove_suffix(size_type n);
    void pop_back();
    void pop_front();

    // string operations with the same semantics as std::basic_string
    int compare(basic_string_ref x) const;
    constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
    size_type copy(charT * buf) const;
    size_type find(basic_string_ref s) const;
    size_type find(charT c) const;
    size_type rfind(basic_string_ref s) const;
    size_type rfind(charT c) const;
    size_type find_first_of(basic_string_ref s) const;
    size_type find_first_of(charT c) const;
    size_type find_first_not_of(basic_string_ref s) const;
    size_type find_first_not_of(charT c) const;
    size_type find_last_of(basic_string_ref s) const;
    size_type find_last_of(charT c) const;
    size_type find_last_not_of(basic_string_ref s) const;
    size_type find_last_not_of(charT c) const;

    // new string operations
    bool starts_with(basic_string_ref x) const;
    bool ends_with(basic_string_ref x) const;
  };

  // Common specializations:
  typedef basic_string_ref< char > string_ref;
  typedef basic_string_ref< char16_t > u16string_ref;
  typedef basic_string_ref< char32_t > u32string_ref;
  typedef basic_string_ref< wchar_t > wstring_ref;

  // Comparison operators
  template<typename charT, typename traits>
  bool operator==(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
  template<typename charT, typename traits>
  bool operator!=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
  template<typename charT, typename traits>
  bool operator<(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
  template<typename charT, typename traits>
  bool operator>(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
  template<typename charT, typename traits>
  bool operator<=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);
  template<typename charT, typename traits>
  bool operator>=(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);

  // numeric conversions
  int stoi(const string_ref & str, size_t * idx=0, int base=10);
  long stol(const string_ref & str, size_t * idx=0, int base=10);
  unsigned long stoul(const string_ref & str, size_t * idx=0, int base=10);
  long long stoll(const string_ref & str, size_t * idx=0, int base=10);
  unsigned long long stoull(const string_ref & str, size_t * idx=0, int base=10);
  float stof(const string_ref & str, size_t * idx=0);
  double stod(const string_ref & str, size_t * idx=0);
  long double stold(const string_ref & str, size_t * idx=0);
  int stoi(const wstring_ref & str, size_t * idx=0, int base=10);
  long stol(const wstring_ref & str, size_t * idx=0, int base=10);
  unsigned long stoul(const wstring_ref & str, size_t * idx=0, int base=10);
  long long stoll(const wstring_ref & str, size_t * idx=0, int base=10);
  unsigned long long stoull(const wstring_ref & str, size_t * idx=0, int base=10);
  float stof(const wstring_ref & str, size_t * idx=0);
  double stod(const wstring_ref & str, size_t * idx=0);
  long double stold(const wstring_ref & str, size_t * idx=0);
}

Member semantics

Comparison operators

template<typename charT, typename traits> bool operator==(basic_string_ref< charT, traits > x, basic_string_ref< charT, traits > y);

Note that equality can be implemented faster than comparison.

numeric conversions

Each function stox in this section behaves as if it calls

stox(str.str(), idx, base)
Ideally with less copying.

Todo

Instead of passing the idx parameter, we could define stox_consume(str, base) as
size_t idx;
auto result = stox(str.str(), &idx, base);
str.remove_prefix(idx);
return result;

types

typedef implementation_defined const_iterator;

random-access, contiguous iterator type

typedef const_iterator iterator;

Because string_ref controls a constant sequence, iterator and const_iterator are the same type.

construct/copy

std::string's substring constructor is omitted because string_ref(str).substr(pos, n) is clearer

constexpr basic_string_ref();
Postcondition:

empty() == true

basic_string_ref(const charT * str);
Postcondition:

data() == str, and size() is 0 if str == nullptr or strlen(str) otherwise.

template<typename Allocator> basic_string_ref(const basic_string< charT, traits, Allocator > & str);

Postcondition:

data() == str.data() && size() == str.size()

Todo

Arguably, this conversion should be a std::basic_string conversion operator.

constexpr basic_string_ref(const charT * str, size_type len);
Postcondition:

data() == str && size() == len

capacity

constexpr size_type length() const;

Alias for size() for string compatibility.

element access

constexpr const charT * data() const;
Returns:

A pointer such that [data(),data() + size()) is a valid range. For a non-empty basic_string_ref, data() == &front().

Outgoing conversion operators

constexpr operator array_ref< const charT >() const;
Returns:

An array_ref for this basic_string_ref's sequence of characters.

template<typename Allocator> explicit operator basic_string< charT, traits, Allocator >() const;

Returns:

A new string containing the sequence referenced by *this.

The operator is explicit to avoid accidental O(N) operations on type mismatches.

Unlike array_ref, this operator preserves the traits type.

Todo

Arguably, this conversion should be a std::basic_string constructor.

basic_string< charT, traits > str() const;

Returns:

basic_string<charT, traits>(*this);

Unlike array_ref, this function preserves the traits type.

mutators

void clear();
Effects:

Resets *this to its default-constructed state.

void remove_prefix(size_type n);
Effects:

Advances the start pointer of this basic_string_ref past n elements without moving the end pointer.

void remove_suffix(size_type n);
Effects:

Moves the end pointer of this basic_string_ref earlier by n elements without moving the start pointer.

void pop_back();
Effects:

remove_suffix(1)

void pop_front();
Effects:

remove_prefix(1)

string operations with the same semantics as std::basic_string

Unlike std::string, string_ref provides no whole-string methods with position or length parameters. Instead, use the substr() method to create the character sequence you're actually interested in, and use that.

constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
Todo

It would be nice to have a function that accepts negative arguments and interprets them like Python, Ruby, and JavaScript? However, substr() takes a start and a length, not a start and an end, which prevents this from being a simple extension. Should we add slice() and deprecate substr()?

size_type copy(charT * buf) const;

Copy is more likely than the other methods to be used with an n parameter since it can be the valid length of the character buffer. Still this->copy(buf, n) can be replaced with this->substr(0, n).copy(buf), so I've removed the n from here too.

new string operations

bool starts_with(basic_string_ref x) const;
Returns:

true if x is a prefix of *this.

bool ends_with(basic_string_ref x) const;
Returns:

true if x is a suffix of *this.

Miscellaneous other changes

Certain other pieces of the library should change to make life easier for array_ref<> and string_ref.

Future work

Open questions

Group numeric conversions
Instead of passing the idx parameter, we could define stox_consume(str, base) as
size_t idx;
auto result = stox(str.str(), &idx, base);
str.remove_prefix(idx);
return result;
Class std::array_ref< T >
The existing array_ref classes make the view const. It may be useful to extend that to allow modifications of the referenced array elements, and use array_ref<const T> for immutable views.
Member std::array_ref< T >::array_ref (const std::array< T, N > &a)
Arguably, this conversion should be a std::array conversion operator.
Member std::array_ref< T >::array_ref (const vector< T > &v)
Arguably, this conversion should be a std::vector conversion operator.
Member std::array_ref< T >::operator basic_string< T, traits, Allocator > () const
Arguably, this conversion should be a std::basic_string constructor.
Member std::array_ref< T >::operator vector< T > () const
Arguably, this conversion should be a std::vector constructor.
Member std::array_ref< T >::pointer
Should the pointer type be configurable as a template argument?
Member std::array_ref< T >::substr (size_type pos, size_type n=size_type(-1)) const
See basic_string_ref::substr for interface questions. We want something like this on array_ref, but probably not with this name.
Class std::basic_string_ref< charT, traits >
Should string_ref inherit from array_ref? It seems like a straightforward extension, but there's a risk in that some of basic_string_ref<charT, traits>'s behavior depends on that traits parameter, which isn't propagated to the array_ref<charT> base class. If traits::eq isn't simply operator==(charT), this would cause the behavior to depend on whether a basic_string_ref has been upcasted to array_ref. For the 4 specializations of char_traits<> defined by the standard, this is guaranteed, but it's not necessarily true for user-defined types.
Member std::basic_string_ref< charT, traits >::basic_string_ref (const basic_string< charT, traits, Allocator > &str)
Arguably, this conversion should be a std::basic_string conversion operator.
Member std::basic_string_ref< charT, traits >::operator basic_string< charT, traits, Allocator > () const
Arguably, this conversion should be a std::basic_string constructor.
Member std::basic_string_ref< charT, traits >::substr (size_type pos, size_type n=npos) const
It would be nice to have a function that accepts negative arguments and interprets them like Python, Ruby, and JavaScript? However, substr() takes a start and a length, not a start and an end, which prevents this from being a simple extension. Should we add slice() and deprecate substr()?