1. Motivation
Performance sensitive code is impacted by the cost of initializing and
manipulating strings. When streaming data into a 
- 
     Pay for extra initialization — resize 
- 
     Pay for extra copies — Populate a temporary buffer, copy it to the string. 
- 
     Pay for extra "bookkeeping" — reserve 
The effects of these unhappy choices can all be measured at scale, yet C++'s hallmark is to leave no room between the language (or in this case the library) and another language.
LEWGI polled on this at the [SAN] Meeting:
We should promise more committee time to [P1072R1], knowing that our time is scarce and this will leave less time for other work?
Unanimous consent
LEWG at the [SAN] Meeting:
We want to solve this problem
SF F N A SA 17 5 0 0 0 
2. Proposal
This proposal addresses the problem by adding 
void resize_default_init(size_type n);
- Effects: Alters the value of
as follows:* this 
- — If
, erases the lastn <= size () elements.size () - n - — If
, appendsn > size () default initialized elements.n - size () 
In order to enable 
3. Examples
3.1. Stamping a Pattern
Consider writing a pattern several times into a string:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; ret . reserve ( pattern . size () * count ); for ( size_t i = 0 ; i < count ; i ++ ) { // SUB-OPTIMAL: // * Writes 'count' nulls // * Updates size and checks for potential resize 'count' times ret . append ( pattern ); } return ret ; } 
Alternatively, we could adjust the output string’s size to its final size,
avoiding the bookkeeping in 
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; const auto step = pattern . size (); // SUB-OPTIMAL: We memset step*count bytes only to overwrite them. ret . resize ( step * count ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( ret . data () + i * step , pattern . data (), step ); } return ret ; } 
With this proposal:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; const auto step = pattern . size (); // GOOD: No initialization ret . resize_default_init ( step * count ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( ret . data () + i * step , pattern . data (), step ); } return ret ; } 
3.2. Interacting with C
Consider wrapping a C API while working in terms of C++'s 
extern "C" { int compress ( void * out , size_t * out_size , const void * in , size_t in_size ); size_t compressBound ( size_t bound ); } std :: string CompressWrapper ( std :: string_view input ) { std :: string compressed ; // Compute upper limit of compressed input. size_t size = compressBound ( input . size ()); // SUB-OPTIMAL: Extra initialization compressed . resize ( size ); int ret = compress ( compressed . begin (), & size , input . data (), input . size ()); if ( ret != OK ) { throw ... some error ... } // Set actual size. compressed . resize ( size ); return compressed ; } 
With this proposal:
extern "C" { int compress ( void * out , size_t * out_size , const void * in , size_t in_size ); size_t compressBound ( size_t bound ); } std :: string CompressWrapper ( std :: string_view input ) { std :: string compressed ; // Compute upper limit of compressed input. size_t size = compressBound ( input . size ()); // GOOD: No initialization compressed . resize_default_init ( size ); int ret = compress ( compressed . begin (), & size , input . data (), input . size ()); if ( ret != OK ) { throw ... some error ... } // Set actual size. compressed . resize ( size ); return compressed ; } 
4. Design Considerations
4.1. Method vs. Alternatives
- 
     It is simple. 
- 
     It matches existing practice. See §6.1 Google, §6.4 Discussion on std-proposals. 
- 
     Uninitialized buffers are not uncommon in C++. See for example new T [ 20 ] make_unique_default_init 
- 
     Dynamic analyzers like Memory Sanitizer [MSan] and Valgrind [Valgrind] can catch uninitialized reads. 
During the [SAN] Meeting LEWG expressed a preference for
implementing this functionality as a new method on 
Method on string vs storage_buffer type:
Strong Method Method Neutral Type Strong Type 9 2 3 2 2 
Several other alternatives were discussed at [SAN] and on the reflector. Please see §5 Alternatives Considered below for more details.
4.2. Tag Type
At the [SAN] Meeting, LEWG showed some support for a tag argument type:
Approval (vote for as many as you find acceptable)
13 Go back to resize_uninitialized 15 Do tag type (default_initialize) for c’tor / resize() 12 Continue with storage_buffer (as per R2 of this paper) 7 Crack open with a lambda 7 RAII separate type 
For example:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { const auto step = pattern . size (); // GOOD: No initialization std :: string ret ( step * count , std :: string :: default_init ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( ret . data () + i * step , pattern . data (), step ); } return ret ; } 
Benefits:
- 
     A constructor that takes a tag type simplifies some use cases, like the example above. 
- 
     Matches existing practice in Boost. See §6.7 Boost. 
Drawbacks:
- 
     Feature creep / complexity — A tag type invites extending the feature through allocator traits to allocators. Indeed, that’s a great idea, and exactly what Boost does. However, default initialization is not enough. There is also "implicit construction" (see [P1010R1], [P0593R2]) and "relocation" (see [P1144R0], [P1029R1]). We need all these wonderful things and not just for basic_string 
- 
     In reflector discussion of make_unique_default_init copy_backward copy count_if count 
4.3. Allocator Interaction
Unlocking the desired optimizations requires some change to 
This restriction should be acceptable because 
This Clause describes components for manipulating sequences of any non-array trivial standard-layout (6.7) type. Such types are called char-like types, and objects of char-like types are called char-like objects or simply characters.
Removing calls to false,
which should be the case in practice.
Along the way, this proposal clarifies ambiguous language in [string.require] and [container.requirements.general] by:
- 
     Stating explicitly that basic_string 
- 
     Introducing the term "allocator-aware" earlier in [container.requirements.general]. 
- 
     Not attempting to mention other "components" in [container.requirements.general]. The other allocator aware containers (just basic_string regex :: match_results 
libstdc++ and msvc allow strings of non-trivial type. That might force
those libraries to continue to support 
4.4. Bikeshedding
What do we want to call this method?
- 
     resize_default_init make_unique_default_init charT basic_string 
- 
     resize_uninitialized uninitialized uninitialized_copy 
5. Alternatives Considered
5.1. Standalone Type: storage_buffer 
   In [P1072R1], we considered 
At the [SAN] Meeting, this approach received support from LEWGI in light of the [post-Rapperswil] email review indicating support for a distinct type. This approach was rejected by the larger LEWG room in San Diego Meeting Diego.
The proposed type would be move-only.
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: storage_buffer < char > tmp ; const auto step = pattern . size (); // GOOD: No initialization tmp . prepare ( step * count + 1 ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( tmp . data () + i * step , pattern . data (), step ); } tmp . commit ( step * count ); return std :: string ( std :: move ( tmp )); } 
For purposes of the container API, 
std :: storage_buffer < char > buf ; buf . prepare ( 100 ); * fill in data * buf . commit ( 50 ); std :: string a ( buf . begin (), buf . end ()); std :: string b ( std :: move ( buf )); assert ( a == b ); 
Benefits:
- 
     By being move-only, we would not have the risk of copying types with trap representations (thereby triggering UB). 
- 
     Uninitialized data is only accessible from the storage_buffer basic_string storage_buffer basic_string commit basic_string 
Drawbacks:
- 
     storage_buffer string vector 
- 
     basic_string storage_buffer 
5.2. Externally-Allocated Buffer Injection
In [P1072R1], we considered that 
- 
     Not critical 
- 
     Not constrained in the future 
- 
     Overly constraining to implementers. Allowing users to provide their own buffers runs into the "offset problem". Consider an implementation that stores its size capacity sizeof ( container ) == sizeof ( void * ) class container { struct Rep { size_t size ; size_t capacity ; }; Rep * rep_ ; }; If using a Rep 
- 
     Introducing new pitfalls. It would be easy to mix new [] allocator :: allocate 
6. Related Work
6.1. Google
Google has a local extension to 
- 
     [Abseil] uses this to avoid bookkeeping overheads in StrAppend StrCat 
- 
     
     - 
       In decompression, the final size of the output buffer is known before the contents are ready. 
- 
       During compression, an upperbound on the final compressed size is known, allowing data to be efficiently added to the output buffer (eliding append 
 
- 
       
- 
     [Protobuf] avoids extraneous copies or initialization when the size is known before data is available (especially during parsing or serialization). 
6.2. MongoDB
MongoDB has a string builder that could have been implemented in terms of 
E.g.: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/unicode/string.h
/** * Strips diacritics and case-folds the utf8 input string, as needed to support * options. * * The options field specifies what operations to *skip*, so kCaseSensitive * means to skip case folding and kDiacriticSensitive means to skip diacritic * striping. If both flags are specified, the input utf8 StringData is returned * directly without any processing or copying. * * If processing is performed, the returned StringData will be placed in * buffer. buffer’s contents (if any) will be replaced. Since we may return * the input unmodified the returned StringData’s lifetime is the shorter of * the input utf8 and the next modification to buffer. The input utf8 must not * point into buffer. */ static StringData caseFoldAndStripDiacritics ( StackBufBuilder * buffer , StringData utf8 , SubstrMatchOptions options , CaseFoldMode mode ); 
(Comments re-wrapped.)
6.3. VMware
VMware has an internal string builder implementation that avoids 
6.4. Discussion on std-proposals
This topic was discussed in 2013 on std-proposals in a thread titled "Add
basic_string::resize_uninitialized (or a similar mechanism)":
  https://groups.google.com/a/isocpp.org/forum/#!topic/std-proposals/XIO4KbBTxl0
6.5. DynamicBuffer
The [N4734] (the Networking TS) has dynamic buffer types.
6.6. P1020R1
See also [P1020R1] "Smart pointer creation functions for default initialization". Adopted in San Diego.
6.7. Boost
Boost provides a related optimization for vector-like containers, introduced in [SVN r85964] by Ion Gaztañaga.
E.g.: boost/container/vector.hpp:
//! <b>Effects</b>: Constructs a vector that will use a copy of allocator a //! and inserts n default initialized values. //! //! <b>Throws</b>: If allocator_type’s allocation //! throws or T’s default initialization throws. //! //! <b>Complexity</b>: Linear to n. //! //! <b>Note</b>: Non-standard extension vector ( size_type n , default_init_t ); vector ( size_type n , default_init_t , const allocator_type & a ) ... void resize ( size_type new_size , default_init_t ); ... 
These optimizations are also supported in Boost Container’s 
7. Wording
Wording is relative to [N4778] with [P1148R0] applied.
Motivation for some of these edits can be found in §4.3 Allocator Interaction.
7.1. [basic.string]
In [basic.string] [20.3.2], in the synopsis, add 
...
    // 23.3.2.4, capacity
    size_type size() const noexcept;
    size_type length() const noexcept;
    size_type max_size() const noexcept;
    void resize(size_type n, charT c);
    void resize(size_type n);
    void resize_default_init(size_type n);
    size_type capacity() const noexcept;
    void reserve(size_type res_arg);
    void shrink_to_fit();
    void clear() noexcept;
    [[nodiscard]] bool empty() const noexcept;
   
   In [string.require] [20.3.2.1] clarify that 
In every specialization
, the typebasic_string < charT , traits , Allocator > shall name the same type asallocator_traits < Allocator >:: value_type charT . Every object of type, and the typeuses an object of typebasic_string < charT , traits , Allocator > to allocate and free storage for the containedAllocator objects as needed. The Allocator object used is obtained as described in [container.requirements.general]. In every specializationcharT basic_string < charT , traits , Allocator > shall satisfy the character traits requirements ([char.traits]). [Note: The program is ill-formed iftraits is not the same type astraits :: char_type . — end note]charT 
is an allocator-aware container as described in [container.requirements.general], except that it is implementation defined whetherbasic_string andallocator_traits :: construct are used to construct and destroy the containedallocator_traits :: destroy objects.charT 
References, pointers, and iterators referring to the elements of a
sequence may be invalidated by the following uses of thatbasic_string object:basic_string 
...
In [basic.string.capacity] [20.3.2.4]:
void resize(size_type n, charT c);
- Effects: Alters the value of
as follows:* this 
- — If
, erases the lastn <= size () elements.size () - n - — If
, appendsn > size () copies ofn - size () .c void resize(size_type n);
- Effects: Equivalent to
.resize ( n , charT ()) void resize_default_init(size_type n);
- Effects: Alters the value of
as follows:* this 
- — If
, erases the lastn <= size () elements.size () - n - — If
, appendsn > size () default initialized elements.n - size () size_type capacity() const noexcept;...
7.2. [container.requirements.general]
In [container.requirements.general] [21.2.1] clarify the ambiguous "components affected by this subclause" terminology in p3. Just say "allocator-aware containers".
- All of the containers defined in this Clause except
are allocator-aware containers.array For the components affected by this subclause that declare an allocator_typeObjectsobjectsstored inthese componentsallocator-aware containers, except as noted, shall be constructed using the functionand destroyed using the functionallocator_traits < allocator_type >:: rebind_traits < U >:: construct (19.10.9.2).allocator_traits < allocator_type >:: rebind_traits < U >:: destroy 
We can then simplify p15:
All of the containers defined in this Clause and in 20.3.2 exceptAllocator-aware containers meet the additional requirements described in Table 67.meet the additional requirements of an allocator-aware container, asarray 
8. Questions for LEWG
- 
     Is LEWG satisfied with this approach? 
9. History
9.1. R1 → R2
Applied feedback from [SAN] Meeting reviews.
- 
     Reverted design to "option A" proposed in [P1072R0]. 
- 
     Switched from resize_uninitialized resize_default_init 
- 
     Added discussion of alternatives considered. 
- 
     Specified allocator interaction. 
- 
     Added wording. 
9.2. R0 → R1
Applied feedback from LEWG [post-Rapperswil] Email Review:
- 
     Shifted to a new vocabulary types: storage_buffer storage_node - 
       Added presentation of storage_buffer 
- 
       Added presentation of storage_node 
 
- 
       
- 
     Added discussion of design and implementation considerations. 
- 
     Most of [P1010R1] Was merged into this paper. 
10. Acknowledgments
Thanks go to Arthur O’Dwyer for help with wording and proof
reading, to Jonathan Wakely for hunting down the language that
makes