ISO/IEC JTC1 SC22 WG21 N3873

Date: 2014-01-20

Thomas Köppe <tkoeppe@google.com>

Improved insertion interface for unique-key maps

Contents

  1. Overview
  2. Motivation and scope
  3. Impact on the standard
  4. Design Decisions
    1. Non-solutions
    2. Existing solutions
  5. Discussion
  6. Technical specifications
  7. Alternative considerations
    1. An insert-like interface
    2. Adapt the existing standard
  8. Acknowledgements

Overview

The existing interface of unique-keyed map containers (std::map, std::unordered_map) is slightly underspecified, which makes certain container mutations more complicated to write than necessary. This proposal is to add new specialized algorithms:

Motivation and scope

The existing insert() and emplace() interfaces are underspecified. Consider the following example:

std::map<std::string, std::unique_ptr<Foo>> m; m["foo"]; std::unique_ptr<Foo> p(new Foo); auto res = m.emplace("foo", std::move(p));

What is the value of p? It is currently unspecified whether p has been moved-from. (The answer is that it depends on the library implementation.)

The remainder of this proposal is motivated by the problem of inserting p so that it does not get moved-from when the key already exists, and of overwriting it unconditionally. We will see that all possible solutions are less than perfect: All of them require boilerplate, some are inefficient, and none are elegant. Under the proposal, we could write:

auto res1 = m.emplace_stable("foo", std::move(p)); assert(p && !res1.second); auto res2 = m.emplace_or_update("foo", std::move(p)); assert(!p && res2.second);

Impact on the standard

This is purely an extension of the standard library. Two new functions have to be added to [map.special], and also to a new section “specialized algorithms” [unord.maps.special] for unordered maps. There is no interference with existing code.

Design Decisions

Non-solutions

First, let us consider some pieces of code that may look like a solution, but in fact are not.

m.emplace("foo", std::move(p)); // #1 m.insert(std::make_pair("foo", std::move(p))); // #2 m.insert(std::pair<std::string &&, std::unique_ptr<Foo> &&>("foo", std::move(p))); // #3 m.insert(std::forward_as_tuple("foo", std::move(p)); // #4

Both cases #1 and #2 typically create value-pair objects which unconditionally take ownership, all before looking up the key. The case #3 and #4 are more insidious, because it looks like they might work. But it is in fact perfectly valid for the implementation to construct a new, intermediate value_type object internally, which unconditionally takes ownership, before comparing keys.

The signature of insert is template <typename P> insert(P &&), which suggests to the user that it is possible to pass strictly by reference (as in cases #3 and #4 above). The problem is that an implementation is not prohibited from constructing further objects, which may move-from the original arguments..

(Concretely, the behaviour of an implementation may come down to whether an internal auxiliary member template is of the form template <typename _Pair> void __internal_insert(_Pair &&), in which case there is no ownership transfer, or of the form void __internal_insert(value_type &&), in which case a temporary value-pair is constructed which takes ownership. These differences have been observed between GCC 4.6 and GCC 4.8.)

Let us consider how we can program the desired algorithms with the existing library, and examine the relative merits and shortcomings of each case. We continue using the code of the initial example. In each case, we use the term lookup to refer either to the tree traversal in the ordered map or the hashing and bucket lookup in the unordered map.

Existing solutions

Stable insertion problem: insert the unique pointer and guarantee that it still owns the pointee if the key already exists.

Insert-or-update problem: Set the map entry for a given key, unconditionally on whether they key already exists.

find()

auto it = m.find("foo"); if (it == m.end()) {     it = m.emplace("foo", std::move(p)).first; } // insert-or-update problem only else {     it->second = std::move(p)); }

This solution performs the lookup twice when the key does not exist. It is the most general in terms of the various type requirements. Information on whether the key already existed is implicit in the branching.

lower_bound() and hint

auto it = m.lower_bound("foo"); if (it == m.end() || it->first != "foo") {     it = m.emplace_hint(it, "foo", std::move(p)).first; } // insert-or-update problem only else {     it->second = std::move(p)); }

This solution is only available for the ordered map. It is near-perfect for that case, though, except perhaps for the very insignificant fact that slightly more key comparisons than strictly necessary are being performed. Information on whether the key already existed is implicit in the branching.

operator[]

m["foo"] = std::move(p);   // insert-or-update problem only

For the insert-or-update problem only, this code is perhaps the most natural-looking one. It requires that the mapped_type be default-constructible, and it always default-constructs an element if the key does not exist before assigning the given value. There is no information on whether the key already existed.

Discussion

With the introduction of move semantics and move-only types in C++11, the standard library containers have evolved naturally to support such types, and by and large this evolution has been seamless. The insertion interface for unique-key maps is one of the few which do not extend naturally. To achieve the desired functionality, a programmer has to wrap the interface, and each of the possible ways of doing so has limitations, costs, or both.

The acceptance and sucess of C++ and its standard library is largely due to the belief of its users that they could not generally do a better job by writing code “by hand” (say, in C). It achieves this by being unconstrained, by not imposing arbitrary restrictions and limitations on what a programmer can do where they are not necessary, and by not imposing unwarranted cost (don't pay for what you don't use). The existing map insertion interface is something that can be done better, and in this spirit this proposal is to plug this hole. The proposed interface allows seamless use of move-only types in maps, can easily be implemented so as to not require any redundant lookup or comparison operations, and returns information which already comes out for free (the iterator and the boolean). In fact, the last two points apply equally well to pre-C++11 copyable types, and the new interface offers a previously unavailable guarantee to not make unneeded copies.

This proposal introduces one idiosyncrasy: Traditionally, we teach that we should universally regard a (potentially) moved-from object as being in an indeterminate, unusable state. The proposed emplace_stable interface has different semantics. However, we feel that this peculiarity is justified. The unique-key maps are already peculiar containers, in the sense that they are the only containers whose value_type has a non-trivial microstructure. Alternatives which would preserve the traditional notion of moving-from objects might consist of giving the object back optionally if it was not used, but this is too far a departure from the existing library.

We do not consider the “satellite-less” associative containers std::{unordered_,}{multi,}set, just as we do not consider movable key types. The notion of a key that only exists as a unique object is rather more unusual; for example, one cannot easily search for something by value which is unique. (Rather, heterogeneous and transparent comparators seem to be the more appropriate notion in that case.) Operations on move-only keys will already require specialized code, and thus present less of an opportunity for improving the existing interface. We are however perfectly open to suggestions to extend the proposal to cover move-only keys, should this turn out to be desirable.

Technical specifications

Add the following to [maps.special], and to a new subsection under [unord.maps], named “unordered_map specialized algorithms [unord.maps.special]”:

template <typename M> pair<iterator, bool> emplace_stable(const key_type & k, M && obj) template <typename M> pair<iterator, bool> emplace_stable(const_iterator hint, const key_type & k, M && obj)

Effects: If the key k already exists in the map, there is no effect, no dynamic allocations are performed and no exceptions are thrown. Otherwise, inserts the element constructed from the arguments as value_type(k, std::forward<M>(obj)) into the map. The bool part of the return value is true if and only if the insertion took place, and the iterator part points to the element of the map whose key is equivalent to k. The hint iterator provides an insertion hint.

Complexity: The same as emplace and emplace_hint, respectively.

template <typename M> pair<iterator, bool> emplace_or_update(const key_type & k, M && obj) template <typename M> pair<iterator, bool> emplace_or_update(const_iterator hint, const key_type & k, M && obj)

Effects. If the key comparing equal to k does not exist in the map, inserts the element constructed as value_type(k, std::forward<M>(obj)). If the key already exists, the mapped part of the value is assigned std::forward<M>(obj). The bool part of the return value is true if and only if the key already existed prior to the operation, and the iterator part points to the inserted or updated element. The hint iterator provides an insertion hint.

Complexity: The same as emplace and emplace_hint, respectively.

emplace_or_update is optional

Strictly speaking, once a reliable emplace_stable exists, emplace_or_update could be written generically and efficiently as follows:

auto it = m.emplace_stable("foo", std::move(p)); if (!it.second) {     it.first = std::move(p);  // valid move } //  equivalent to m.emplace_or_update(std::make_pair("foo", std::move(p)))

We propose the addition of emplace_or_update regardless, so as to provide a readable solution for both problems, without making one inferior to the other.

Implementation notes

The implementation of the stable interface is straight-forward: construction of the element node must simply be deferred until after the key lookup. Until such time, the arguments may be passed along by reference.

For emplace_or_update, the implementation should first look up the key and then either construct the full value_type or forward-assign the mapped element.

Alternative considerations

An insert-like interface

We considered a function like “insert_stable” with a signature like value_type &&, or like the existing template signature of insert. All we require for stable insertion is the ability to pass the arguments by reference, so that we can guarantee that they will not be consumed if the element cannot be inserted.

However, this would be awkward to use. The common-place construction insert_stable(std::make_pair("foo", std::move(p))) would do the wrong thing and construct a temporary. The correct idiom would be insert_stable(std::forward_as_tuple("foo", std::move(p))), but there would be no diagnostic if this were done wrong. By separating the key and mapped parts, we embrace the peculiar nature of the map's value type and make it an explicit, deliberate part of the interface. We believe that this aids readability and teachability, as well as making the intended use easy to get right.

Adapt the existing standard

It is worth considering whether less intrusive changes to the standard can achieve the goals of this proposal. This raises the issue that the current specification is unclear, both the “Associative containers” requirements ([associative.reqmts, unord.req]) and the “{unordered_,}map modifiers” ([map.modifiers, unord.map.modifiers]). For example, [unord.map.modifiers] says:

Effects: Inserts obj converted to value_type if and only if there is no element in the container with key equivalent to the key of value_type(obj).

This does not place any conditions on whether obj is converted or not, only on whether the converted value is inserted. The corresponding text for the ordered map is very different; here is only one excerpt (from [map.modifiers]):

Otherwise [if P is not a reference] x is considered to be an rvalue as it is converted to value_type and inserted into the map.

The problem with this formulation is that it is unclear what “is converted and inserted into the map” means – does the conversion happen regardless of the success of the insertion or not?

This only leaves us with emplace. This function's semantics are not specified as part of the containers, but only as part of the container requirements [associative.reqmts, unord.req]. The language is again unclear; in both cases the wording is:

Effects: Inserts a T object t constructed with std::forward<Args>(args)... if and only if there is no element in the container with key equivalent to the key of t.

Once again, the conflation of “construct” and “insert” make the specification too imprecise.

In all three cases, the standard leaves it unspecified whether further conversions and object constructions are allowed to happen. We believe that the standard wording could benefit from a clarification, regardless of this proposal.

It is perhaps possible, although we do not have a good suggestion for how to do it, to clarify the existing language and at the same time achieve the goal of this proposal. The fact that the map insert signature has a universal reference, template <typename P> insert(P &&) (as opposed to, say, the corresponding vector interface insert(value_type &&)) suggests that there is some intention in the existing standard to allow the user to pass arguments strictly by reference: the function call expression itself does not construct any objects as a side effect (unlike in the vector interface, where a temporary may be constructed). If some kind of rule were added that implementations must not construct value objects internally before checking the key, then perhaps insert could be given the semantics of the proposed emplace_stable, and emplace_or_update and “insert_or_update” could be derived from it as described above.

However, we do not currently have a suggestion for such a change that we would be comfortable with. Changes would have to be made to the container-specific description of insert, as well as to the general associative or unordered container requirements. More importantly yet, if we merely clarified insert and/or emplace to have stable semantics (bearing in mind that for insert to behave stably requires forward_as_tuple and not make_pair!), it would still be difficult for users to rely on the new behaviour, since it would not be easy to detect whether an implementation was conforming, and a non-conforming implementation would silently produce different behaviour.

Acknowledgements

Many thanks to Geoffrey Romer, Jeffrey Yasskin, Billy Donahue and Stuart Taylor for providing invaluable feedback and discussion.