string_view range constructor should be explicit
|Project:||Programming Language C++|
|Reply-to:||James Touton <firstname.lastname@example.org>|
P1989R2 added a new constructor to
basic_string_view that allows for implicit conversion from any contiguous range of the corresponding character type. This implicit conversion relies on the premise that a range of
char is inherently string-like. While that premise holds in some situations, it is hardly universally true, and the implicit conversion is likely to cause problems. This paper proposes making the conversion explicit instead of implicit in order to avoid misleading programmers.
P1391R3 (a precursor to P1989R2) justifies making the conversion implicit with the incorrect notion that "a contiguous range of character[s] is the same platonic thing as a
string_view", despite correctly pointing out that "[ranges] with different [traits types] should not be implicitly convertible". The latter acknowledgment recognizes that there are semantic nuances here beyond the value type, and as a result, no direct conversion is provided from range types having a mismatched traits_type.
One such semantic difference between a string and an arbitrary range of
char is mentioned in P1391R3 (lightly modified for correctness):
    char const t[] = "text";
    std::string_view s1(t);       // s1.size() == 4;
    std::span<char const> tv(t);
    std::string_view s2(tv);      // s2.size() == 5;
s1 and s2 are constructed from equivalent ranges of
const char, but the resulting
string_view objects are different. This is because overload resolution for the array argument selects
string_view's constructor from
const char*, a type which by convention points to a string followed by a null terminator. The terminator is not semantically part of the string, so the resulting
string_view doesn't include it. The span, by contrast, does include the null terminator.
Laudably, P1989R2 recognizes several mechanisms by which a type may indicate that it provides string-like data, and the range constructor is disabled in these cases:
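- The source type is convertible to const char*.
- The source type has a conversion operator to the string view type.
- The source type has a member traits_type, and that type differs from the string view's traits_type.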
The presence of these mechanisms refutes the notion that "a contiguous range of character[s] is the same platonic thing as a
string_view". Nonetheless, it is certainly true that constructing a
string_view from a range of
char is a useful operation, provided that the user knows that the entire range actually constitutes a string. This paper therefore proposes to keep the range constructor, but make it explicit.
Very often, a contiguous range of
char is used as a buffer for storing string data. This does not imply that the entire range constitutes a string:
    extern void get_string(std::span<char> buffer);
    extern void use_string(std::string_view str);

    char buf[200];
    get_string(buf);
    use_string(buf);
This code is representative of quite a lot of real-world code that exists today. The
get_string function fills a portion of a buffer with a null-terminated string, and the
use_string function consumes that string. This code works in C++20, and would also work in C++17 with a minor modification to
get_string to pass the buffer as a pointer and size instead of as a span. This code will continue to work in the presence of P1989R2; the range constructor is disabled because the array is convertible to
const char* (and even if it weren't disabled, overload resolution would prefer the
const char* constructor anyway).
Many code style guidelines emphasize the use of
std::array over raw arrays, so let's make that change:
    extern void get_string(std::span<char> buffer);
    extern void use_string(std::string_view str);

    std::array<char, 200> buf;
    get_string(buf);
    use_string(buf); // oops
The code compiles and runs, and in many cases will appear to work, but where the length of the
string_view parameter used to be inferred from the presence of a null terminator, it is now unavoidably the size of the entire buffer, and unquestionably wrong given that the prior code was correct. If the range constructor were
explicit, this code would generate an error diagnostic.
The same sort of thing can easily happen with
vectors. For instance, an API might require the user to invoke a function that provides an estimate for a buffer size, which the user then allocates before calling another function that fills the buffer. The estimate may return a size greater than that actually needed by the resulting string if calculating the exact size would be expensive:
    extern size_t estimate_string_size();
    extern void get_string(std::span<char> buffer);
    extern void use_string(std::string_view str);

    size_t estimated_size = estimate_string_size();
    std::vector<char> buf(estimated_size);
    get_string(buf);
    use_string(buf); // oops
P1391R3 states: "We think this proposed design is consistent with existing practices of having to be explicit about the size in the presence of embedded nulls[.]" This paper respectfully disagrees.
The intent of P1989R2 is to allow for conversion from a range to a string view. LEWG has already decided that this is a good idea, and this paper concurs. Removing the range constructor would be counter-productive, but keeping it in its current form is also problematic. That leaves us with a couple of options.
Option 1: Make the range constructor explicit
This is the preferred approach of this paper. This approach preserves the functionality gains offered by P1989R2 while making it harder to invoke the conversion by accident. Users who know that the source range actually represents a string can still take advantage of the conversion. Consider the
vector example above, but with
get_string modified to return the number of characters written to the buffer:
    extern size_t estimate_string_size();
    extern size_t get_string(std::span<char> buffer);
    extern void use_string(std::string_view str);

    size_t estimated_size = estimate_string_size();
    std::vector<char> buf(estimated_size);
    size_t actual_size = get_string(buf);
    buf.resize(actual_size);
    use_string(std::string_view(buf)); // ok
Option 2: Make the range constructor conditionally explicit
If the source type defines its own
traits_type, and that type is the same as the string view's
traits_type, then the source range can reasonably be assumed to represent a string, so the conversion could remain implicit in that case and be explicit otherwise. This appears to be a good approach, but does add a small amount of complexity to the specification and may be a more difficult rule to teach than Option 1. This paper is not opposed to Option 2.
Option 3: Make the range constructor explicit and remove the traits_type constraint
This modifies either Option 1 or Option 2 by additionally removing the constraint that the source range's
traits_type (if present) must match the string view's
traits_type. Given that the constructor is already
explicit, the user is already primed to expect that the resulting string view is not semantically equivalent to the source range in every respect. Moreover, the name
traits_type is somewhat generic; there's nothing in that name that implies the traits are string traits.
This change would allow for explicit conversion from a string or string view with dissimilar traits. This paper agrees with
P1391R3 that "strings with different [traits types] should not be implicitly convertible", but an explicit conversion may be sensible. This paper does not attempt to explore the consequences of this design, and so this approach is not recommended.
All modifications are presented relative to N4901.
Modify [string.view.template.general] and the corresponding heading prior to [string.view.cons] paragraph 11:
    template<class R> constexpr explicit basic_string_view(R&& r);