ISO/IEC JTC1 SC22 WG21 N3442=12-0132

Date:

Jeffrey Yasskin <jyasskin@google.com>

string_ref: a non-owning reference to a string

Overview

References to strings are very common in C++ programs, but often the callee doesn't care about the exact type of the object that owns the data. 3 things generally happen in this case:

  1. The callee takes std::string and insists that callers copy the data if it was originally owned by another type.
  2. The callee takes two parameters—a char* and a length (or just char* and assumes 0-termination)—and reduces the readability and safety of calls and loses any helper functions the original type provided.
  3. The callee is rewritten as a template and its implementation is moved to a header file. This can increase flexibility if the author takes the time to code to a weaker iterator concept, but it can also increase compile time and code size, and can even introduce bugs if the author misses an assumption that the argument's contents are contiguous.

Google and LLVM have independently implemented a string-reference type to encapsulate this kind of argument. string_ref is implicitly constructible from const char* and std::string. It provides most of the const member operations from std::string to ease conversion. This paper follows Chromium in extending string_ref to basic_string_ref<charT, traits> (Chromium omits traits). We provide typedefs to parallel the 4 basic_string typedefs.

Operations on string_ref apply to the characters in the string, and not the pointers that refer to the characters. This introduces the possibility that the underlying characters might change while a string_ref referring to them is in an associative container, which would break the container, but we believe this risk is worthwhile because it matches existing practice and matches user intentions more often.

Both Google's and LLVM's string_ref types extend the interface from std::string to provide some helpful utility functions:

Versions of std::string operations that take string_ref instead also give the standard a way to provide in-place operations on non-null-terminated byte/character sequences:

Inventions in this paper

Google's StringPiece provides as_string and ToString methods to convert to std::string. LLVM's StringRef provides both a str() explicit conversion and an implicit operator std::string(). Since this paper builds on top of C++11, we provide an explicit conversion operator.

Google's and LLVM's string_ref types provide a subset of std::string's searching operations, but they do provide pos arguments to specify where to start the search. Because string_ref::substr is much cheaper than string::substr, this paper removes the pos argument entirely.

None of the existing classes have constexpr methods.

Bikeshed!

What do we call this class?

Modifications vs std::string

The interface of string_ref is similar to, but not exactly the same as the interface of std::string. In general, we want to minimize differences between std::string and string_ref so that users can go back and forth between the two often. This section justifies the differences whose utility we think overcomes that general rule.

Additions

We should consider adding these methods to std::string in C++17, but we can't modify std::string in a TS, so this paper doesn't propose such changes.

Removals


Wording

?.1 Header <string_ref> synopsis [strings.string_ref.header]

namespace std {
namespace beta {
  template<typename charT, typename traits> class basic_string_ref;
  typedef basic_string_ref<char> string_ref;
  typedef basic_string_ref<char16_t> u16string_ref;
  typedef basic_string_ref<char32_t> u32string_ref;
  typedef basic_string_ref<wchar_t> wstring_ref;

  // numeric conversions
  int stoi(const string_ref& str, size_t* idx=0, int base=10);
  long stol(const string_ref& str, size_t* idx=0, int base=10);
  unsigned long stoul(const string_ref& str, size_t* idx=0, int base=10);
  long long stoll(const string_ref& str, size_t* idx=0, int base=10);
  unsigned long long stoull(const string_ref& str, size_t* idx=0, int base=10);
  float stof(const string_ref& str, size_t* idx=0);
  double stod(const string_ref& str, size_t* idx=0);
  long double stold(const string_ref& str, size_t* idx=0);

  int stoi(const wstring_ref& str, size_t* idx=0, int base=10);
  long stol(const wstring_ref& str, size_t* idx=0, int base=10);
  unsigned long stoul(const wstring_ref& str, size_t* idx=0, int base=10);
  long long stoll(const wstring_ref& str, size_t* idx=0, int base=10);
  unsigned long long stoull(const wstring_ref& str, size_t* idx=0, int base=10);
  float stof(const wstring_ref& str, size_t* idx=0);
  double stod(const wstring_ref& str, size_t* idx=0);
  long double stold(const wstring_ref& str, size_t* idx=0);
}
}

?.2 Class template basic_string_ref [strings.string_ref]

A string-like object that refers to a const sized piece of memory owned by another object.

We provide implicit constructors so users can pass in a const char* or a std::string wherever a string_ref is expected.

It is expected that user-defined string-like types will define an implicit conversion to string_ref (or another appropriate instance of basic_string_ref) to interoperate with functions that need to read strings.

Unlike std::strings and string literals, data() may return a pointer to a buffer that is not null-terminated. Therefore it is typically a mistake to pass data() to a routine that takes just a const charT* and expects a null-terminated string.

namespace std {
namespace beta {
  template<typename charT, typename traits>
  class basic_string_ref {
    public:
    // types
    typedef charT value_type;
    typedef const charT* pointer;
    typedef const charT& reference;
    typedef const charT& const_reference;
    typedef implementation_defined const_iterator;
    typedef const_iterator iterator;
    typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
    typedef const_reverse_iterator reverse_iterator;
    typedef size_t size_type;
    typedef ptrdiff_t difference_type;
    static constexpr size_type npos = size_type(-1);

    // construct/copy
    constexpr basic_string_ref();
    constexpr basic_string_ref(const basic_string_ref&);
    basic_string_ref& operator=(const basic_string_ref&);
    basic_string_ref(const charT* str);
    template<typename Allocator>
    basic_string_ref(const basic_string<charT, traits, Allocator>& str);
    constexpr basic_string_ref(const charT* str, size_type len);
    basic_string_ref(initializer_list<charT> il);
    template<typename Allocator>
    explicit operator basic_string<charT, traits, Allocator>() const;


    // iterators
    constexpr const_iterator begin() const;
    constexpr const_iterator end() const;
    constexpr const_iterator cbegin() const;
    constexpr const_iterator cend() const;
    const_reverse_iterator rbegin() const;
    const_reverse_iterator rend() const;
    const_reverse_iterator crbegin() const;
    const_reverse_iterator crend() const;

    // capacity
    constexpr size_type size() const;
    constexpr size_type length() const;
    constexpr size_type max_size() const;
    constexpr bool empty() const;

    // element access
    constexpr const charT& operator[](size_type pos) const;
    const charT& at(size_t pos) const;
    constexpr const charT& front() const;
    constexpr const charT& back() const;
    constexpr const charT* data() const;

    // modifiers
    void clear();
    void remove_prefix(size_type n);
    void remove_suffix(size_type n);

    // basic_string_ref string operations
    constexpr basic_string_ref substr(size_type pos, size_type n=npos) const;
    int compare(basic_string_ref x) const;
    bool starts_with(basic_string_ref x) const;
    bool ends_with(basic_string_ref x) const;

    size_type find(basic_string_ref s) const;
    size_type find(charT c) const;
    size_type rfind(basic_string_ref s) const;
    size_type rfind(charT c) const;
    size_type find_first_of(basic_string_ref s) const;
    size_type find_first_of(charT c) const;
    size_type find_last_of(basic_string_ref s) const;
    size_type find_last_of(charT c) const;
    size_type find_first_not_of(basic_string_ref s) const;
    size_type find_first_not_of(charT c) const;
    size_type find_last_not_of(basic_string_ref s) const;
    size_type find_last_not_of(charT c) const;
  };

  // Comparison operators
  template<typename charT, typename traits>
  bool operator==(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
  template<typename charT, typename traits>
  bool operator!=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
  template<typename charT, typename traits>
  bool operator<(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
  template<typename charT, typename traits>
  bool operator>(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
  template<typename charT, typename traits>
  bool operator<=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
  template<typename charT, typename traits>
  bool operator>=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);

  // Inserter
  template<class charT, class traits>
  basic_ostream<charT, traits>&
  operator<<(basic_ostream<charT, traits>& os,
             const basic_string_ref<charT,traits>& str);

  // Hashing
  template<> struct hash<string_ref>;
  template<> struct hash<u16string_ref>;
  template<> struct hash<u32string_ref>;
  template<> struct hash<wstring_ref>;
}
}

?.2.1 basic_string_ref constructors, assignment, and conversions [strings.string_ref.cons]

constexpr basic_string_ref();
Postconditions:
empty() == true
constexpr basic_string_ref(const basic_string_ref& other);
Postconditions:
data() == other.data() && size() == other.size()
basic_string_ref& operator=(const basic_string_ref& other);
Postconditions:
data() == other.data() && size() == other.size()
basic_string_ref(const charT* str);
Postconditions:
data() == str, and size() is 0 if str == nullptr or strlen(str) otherwise.
template<typename Allocator> basic_string_ref(const basic_string<charT, traits, Allocator>& str);
Postconditions:
data() == str.data() && size() == str.size()
constexpr basic_string_ref(const charT* str, size_type len);
Postconditions:
data() == str && size() == len
basic_string_ref(initializer_list<charT> il);
Postconditions:
data() == il.begin() && size() == il.size()
template<typename Allocator> explicit operator basic_string<charT, traits, Allocator>() const;
Returns:
basic_string<charT, traits, Allocator>(data(), size()).

[Note: The operator is explicit to avoid accidental O(N) operations on type mismatches. — end note]

?.2.2 basic_string_ref iterator support [strings.string_ref.iterators]

typedef implementation_defined const_iterator;

A random-access, contiguous iterator type.

For a string_ref str, any operation that invalidates a pointer in the range [str.data(), str.data+str.size()) invalidates pointers and iterators returned from str's methods.

typedef const_iterator iterator;

Because string_ref refers to a constant sequence, iterator and const_iterator are the same type.

constexpr const_iterator begin() const;
constexpr const_iterator cbegin() const;
Returns:
An iterator referring to the first character in the string.
constexpr const_iterator end() const;
constexpr const_iterator cend() const;
Returns:
An iterator which is the past-the-end value.
const_reverse_iterator rbegin() const;
const_reverse_iterator crbegin() const;
Returns:
An iterator which is semantically equivalent to reverse_iterator(end()).
const_reverse_iterator rend() const;
const_reverse_iterator crend() const;
Returns:
An iterator which is semantically equivalent to reverse_iterator(begin()).

?.2.3 basic_string_ref capacity [strings.string_ref.capacity]

constexpr size_type size() const;
constexpr size_type length() const;
Returns:
std::distance(begin(), end()), a count of the number of char-like objects currently in the string.
Complexity:
constant time.
constexpr size_type max_size() const;
Returns:
The size of the largest possible string.
Complexity:
constant time.
constexpr bool empty() const;
Returns:
size() == 0

?.2.4 basic_string element access [strings.string_ref.access]

constexpr const charT& operator[](size_type pos) const;
Requires:
pos < size()
Throws:
Nothing
Returns:
*(begin() + pos)
Complexity:
constant time.
constexpr const charT& at(size_type pos) const;
Throws:
out_of_range if pos >= size().
Returns:
operator[](pos)
Complexity:
constant time.
constexpr const charT& front() const;
Requires:
!empty()
Throws:
Nothing
Returns:
operator[](0)
constexpr const charT& back() const;
Requires:
!empty()
Throws:
Nothing
Returns:
operator[](size() - 1)
constexpr const charT* data() const;
Returns:
A pointer p such that p + i == &operator[](i) for each i in [0,size()). The range [data(),data() + size()) must be valid even for empty basic_string_refs..
Complexity:
constant time.
Requires:
The program shall not alter any of the values stored in the character array.

?.2.5 basic_string_ref modifiers [strings.string_ref.modifiers]

void clear();
Effects:
*this = basic_string_ref()
void remove_prefix(size_type n);
Effects:
*this = substr(n, npos);
void remove_suffix(size_type n);
Effects:
*this = substr(0, size() - n)

?.2.6 basic_string_ref string operations [strings.string_ref.ops]

Unlike std::string, string_ref provides no whole-string methods with position or length parameters. Instead, users should use the substr() method to create the character sequence they're actually interested in, and use that.

?.2.6.1 substr [strings.string_ref.ops.substr]
constexpr basic_string_ref substr(size_type pos, size_type len=npos) const;
Throws:
out_of_range if pos > size().
Effects:
Determines the effective length rlen of the string to copy as the smaller of len and size() - pos.
Returns:
basic_string_ref(data()+pos, rlen).
?.2.6.2 comparisons [strings.string_ref.ops.comparison]
int compare(basic_string_ref other) const;
Effects:
Determines the effective length rlen of the strings to compare as the smallest of size() and other.size(). The function then compares the two strings by calling traits::compare(data(), other.data(), rlen).
Returns:
The nonzero result if the result of the comparison is nonzero. Otherwise, returns a value as indicated in the following table:
compare() results
ConditionReturn Value
size() <  str.size()< 0
size() == str.size()  0
size() >  str.size()> 0
bool starts_with(basic_string_ref x) const;
Returns:
true if x is a prefix of *this.
bool ends_with(basic_string_ref x) const;
Returns:
true if x is a suffix of *this.
?.2.6.3 searching [strings.string_ref.ops.search]

All member functions in this section that take a charT argument c return the result of a call to the same-named member function with an argument of basic_string_ref(&c, 1).

size_type find(basic_string_ref str) const;
size_type find(charT c) const;
Effects:
Determines the lowest position xpos, if possible, such that both of the following conditions obtain:
  • xpos + str.size() <= size()
  • traits::eq(at(xpos+I), str.at(I)) for all elements I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().
size_type rfind(basic_string_ref str) const;
size_type rfind(charT c) const;
Effects:
Determines the highest position xpos, if possible, such that both of the following conditions obtain:
  • xpos + str.size() <= size()
  • traits::eq(at(xpos+I), str.at(I)) for all elements I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().
size_type find_first_of(basic_string_ref str) const;
size_type find_first_of(charT c) const;
Effects:
Determines the lowest position xpos, if possible, such that both of the following conditions obtain:
  • xpos <= size()
  • traits::eq(at(xpos+I), str.at(I)) for some element I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().
size_type find_last_of(basic_string_ref str) const;
size_type find_last_of(charT c) const;
Effects:
Determines the highest position xpos, if possible, such that both of the following conditions obtain:
  • xpos <= size()
  • traits::eq(at(xpos+I), str.at(I)) for some element I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().
size_type find_first_not_of(basic_string_ref str) const;
size_type find_first_not_of(charT c) const;
Effects:
Determines the lowest position xpos, if possible, such that both of the following conditions obtain:
  • xpos <= size()
  • traits::eq(at(xpos+I), str.at(I)) for no element I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().
size_type find_last_not_of(basic_string_ref str) const;
size_type find_last_not_of(charT c) const;
Effects:
Determines the highest position xpos, if possible, such that both of the following conditions obtain:
  • xpos <= size()
  • traits::eq(at(xpos+I), str.at(I)) for no element I of the string referenced by str.
Returns:
xpos if the function can determine such a value for xpos. Otherwise, returns npos.
Remarks:
Uses traits::eq().

?.2.7 basic_string_ref non-member comparison operators [strings.string_ref.nonmem.compare]

template<typename charT, typename traits>
bool operator==(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) == 0
Remarks:
Equality can be implemented faster than comparison.
template<typename charT, typename traits>
bool operator!=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) != 0
Remarks:
Inequality can be implemented faster than comparison.
template<typename charT, typename traits>
bool operator<(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) < 0
template<typename charT, typename traits>
bool operator>(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) > 0
template<typename charT, typename traits>
bool operator<=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) <= 0
template<typename charT, typename traits>
bool operator>=(basic_string_ref<charT, traits> x, basic_string_ref<charT, traits> y);
Returns:
x.compare(y) >= 0

?.2.8 basic_string_ref inserter [strings.string_ref.io]

template<class charT, class traits>
basic_ostream<charT, traits>&
operator<<(basic_ostream<charT, traits>& os,
const basic_string_ref<charT,traits>& str);
Effects:
Behaves as a formatted output function (27.7.3.6.1). After constructing a sentry object, if this object returns true when converted to a value of type bool, determines padding as described in 22.4.2.2.2, then inserts the resulting sequence of characters seq as if by calling os.rdbuf()->sputn(seq, n), where n is the larger of os.width() and str.size(); then calls os.width(0).
Returns:
os

?.2.9 basic_string_ref hash [strings.string_ref.hash]

template<> struct hash<string_ref>;
template<> struct hash<u16string_ref>;
template<> struct hash<u32string_ref>;
template<> struct hash<wstring_ref>;
Requires:
the template specializations shall meet the requirements of class template hash

?.3 Numeric conversions [strings.conversions.numeric]

[Note: The functions below mirror C++11's stox() functions on strings. Because string_ref can cheaply change to refer to a smaller substring, we should also add a more convenient interface that advances the string_ref as it parses an integer out, with the rough signature stox_consume(string_ref&, int base=10). However, failure is expected in parsing functions, so it's not appropriate to raise an exception when no integer is available to be parsed. Existing practice is to use a signature like bool stox_consume(string_ref&, T& result, int base=10), but this requires a temporary integral variable to store the result in. If we get an optional<T> type in a TS, a better signature along the lines of optional<T> stox_consume(string_ref&, int base=10) becomes available. To avoid precluding the optional signature, I haven't included any of the consume variants in this paper. — end note]

int stoi(const string_ref& str, size_t* idx=0, int base=10);
long stol(const string_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const string_ref& str, size_t* idx=0, int base=10);
long long stoll(const string_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const string_ref& str, size_t* idx=0, int base=10);
Returns:
stox(string(str), idx, base) where x is the type suffix of the function called.
float stof(const string_ref& str, size_t* idx=0);
double stod(const string_ref& str, size_t* idx=0);
long double stold(const string_ref& str, size_t* idx=0);
Returns:
stox(string(str), idx) where x is the type suffix of the function called.
int stoi(const wstring_ref& str, size_t* idx=0, int base=10);
long stol(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long stoul(const wstring_ref& str, size_t* idx=0, int base=10);
long long stoll(const wstring_ref& str, size_t* idx=0, int base=10);
unsigned long long stoull(const wstring_ref& str, size_t* idx=0, int base=10);
Returns:
stox(wstring(str), idx, base) where x is the type suffix of the function called.
float stof(const wstring_ref& str, size_t* idx=0);
double stod(const wstring_ref& str, size_t* idx=0);
long double stold(const wstring_ref& str, size_t* idx=0);
Returns:
stox(wstring(str), idx) where x is the type suffix of the function called.

Future work outside the scope of a TS