Significant changes since P0652R2 are marked with blue.
There are many use cases where concurrent modification of an unordered associative container is required. For example, the popular web service Memcached simply caches objects in memory. We can also use such a container to collect statistics, to store connection metadata, or as a lightweight key-value storage. Some big companies, like Facebook, have one in their standard library. There were attempts to add such containers/data structures in the past (N3425, N3732, and others).
This paper is another attempt to deal with the problem. This time we are trying to keep the interface familiar to users and hard to misuse, but still functional.
A reference implementation is available at https://github.com/BlazingPhoenix/concurrent-hash-map.
When users reach for a concurrent associative data structure, they are looking for performance and scalability. The fastest known implementations rely on open addressing (see the MemC3 paper), so the interface of this proposal allows an open-addressing implementation under the hood.
In C++17 the std::shared_ptr::unique() function was deprecated because users were misusing it. Users were hoping that the result of the function was still accurate at the point where they tried to use it. However, as soon as the result is returned from the function it can become stale, because someone may modify the value from another thread.

We followed that decision and removed all the functions that are useless or dangerous in concurrent environments: size(), count(), empty(), bucket_count(), and so forth.
Iterators are problematic in a concurrent container. If iterators do no synchronization, the user cannot safely dereference them at all. If iterators do synchronize, performance suffers and deadlocks become possible. For example, if the first thread iterates from the beginning of the container to the end while the second thread iterates from the end to the beginning, a deadlock happens in the middle, where the iterators meet.
It is possible to drop the open addressing idea and make the iterators have shared ownership of buckets. In that case, an iterator may keep its bucket alive. This seems implementable and usable, but it significantly affects performance by adding multiple additional atomic operations and additional indirections. We tried to stick to this idea for a long time and to minimize the performance impact. Finally, we created a list of use cases for concurrent associative data structures and found out that in all of those use cases iterators are either useless or kill the performance of the whole class (and so are also useless). Instead, we came up with the idea of an unsynchronized view/policy.
This is the killer feature of this proposal that attempts to fix all the limitations from above and provide a much more useful interface.
The idea is the following: multiple operations on unordered containers make sense only if the container is not concurrently modified. A user may take the responsibility that no one is modifying the container at this point and gain access to all the operations and iterators.
Such an approach allows initializing/preparing the data for the container without additional synchronization overhead. It also allows advanced usage:
- Multiple threads use concurrent_unordered_map for reads and writes simultaneously.
- Multiple threads use const unordered_map_view on the same concurrent_unordered_map simultaneously (no modifications through the concurrent_unordered_map interface are allowed!).
- A single thread uses unordered_map_view (no modifications are allowed from other threads!).
- Multiple threads use const concurrent_unordered_map& for reads while multiple threads use const unordered_map_view on the same concurrent_unordered_map simultaneously (ineffective: use multiple const unordered_map_view instead).
Note that unordered_map_view does not provide the std::unordered_map iterator invalidation guarantees: this is a consequence of allowing open addressing as an underlying implementation.
This is a very risky decision because it opens new ways to deadlock or break the container (users may recursively visit the same value; users may call heavy functions that degrade the overall performance of the container; users may call parallel functions of the standard library that potentially use the same mutexes as the container implementation...).
However, there are many use cases where a value must be updated depending on the value already in the container. Without visitation there is no way to do that safely, as all the functions return values by copy. See examples B and C.
We have also added an ability to visit all elements of the container without locking the whole container. In this case, the contents can change concurrently during the iteration. Such functionality can be useful for debugging.
Some implementations (for example Intel TBB) provide a possibility to get an item from a container together with an appropriate lock guard. There are two possible ways to implement it:
- make mapped_type some lock-guarded type (for example boost::synchronized_value);
- use reference counting, so that we can get thread-safe shared ownership of the collection's objects without a risk of getting deadlocks.
Based on the considerations above, we decided not to propose such an interface, because it decreases performance and gives the user a very simple way to create a deadlock.
namespace std {
template <class Key,
class T,
class Hash = hash<Key>,
class Pred = equal_to<Key>,
class Allocator = allocator<pair<const Key, T>> >
class concurrent_unordered_map;
template <class Key,
class T,
class Hash = hash<Key>,
class Pred = equal_to<Key>,
class Allocator = allocator<pair<const Key, T>> >
void swap(concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& lhs,
          concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& rhs);
}
Class template concurrent_unordered_map

namespace std {
template <class Key,
class T,
class Hash = hash<Key>,
class Pred = equal_to<Key>,
class Allocator = allocator<pair<const Key, T>> >
class concurrent_unordered_map {
public:
// type aliases
using key_type = Key;
using mapped_type = T;
using value_type = pair<const Key, T>;
using hasher = Hash;
using key_equal = Pred;
using allocator_type = Allocator;
using size_type = implementation-defined;
class unordered_map_view;
// construct/copy/assign/destroy
concurrent_unordered_map();
explicit concurrent_unordered_map(size_type n);
concurrent_unordered_map(size_type n, const Hash& hf,
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
template <typename InputIterator>
concurrent_unordered_map(InputIterator first, InputIterator last,
size_type n = implementation-defined,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
concurrent_unordered_map(const Allocator&);
concurrent_unordered_map(initializer_list<value_type> il,
size_type n = implementation-defined,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
concurrent_unordered_map(concurrent_unordered_map&& other) noexcept;
concurrent_unordered_map(concurrent_unordered_map&& other, const Allocator&);
concurrent_unordered_map& operator=(concurrent_unordered_map&& other) noexcept;
concurrent_unordered_map& operator=(initializer_list<value_type>il);
~concurrent_unordered_map();
// members observers
allocator_type get_allocator() const;
hasher hash_function() const;
key_equal key_eq() const;
// visitation
template <typename Visitor>
void visit(const key_type& key, Visitor& f);
template <typename Visitor>
void visit(const key_type& key, Visitor& f) const;
template <typename Visitor>
void visit_all(Visitor& f);
template <typename Visitor>
void visit_all(Visitor& f) const;
template<typename K, typename Visitor, typename... Args>
bool visit_or_emplace(K&& key, Visitor& f, Args&&... args);
// access
optional<mapped_type> find(const key_type& key) const;
template<typename... Args>
mapped_type find(const key_type& key, Args&&... args) const;
// modifiers
template<typename K, typename... Args>
bool emplace(K&& key, Args&&... args);
template<typename K, typename... Args>
bool insert_or_assign(K&& key, Args&&... args);
template<typename... Args>
size_type update(const key_type& key, Args&&... args);
size_type erase(const key_type& key);
template<class H2, class P2>
void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>& source);
template<class H2, class P2>
void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
void swap(concurrent_unordered_map&)
noexcept(allocator_traits<Allocator>::is_always_equal::value &&
is_nothrow_swappable_v<Hash> &&
is_nothrow_swappable_v<Pred>);
void clear() noexcept;
// view retrieval
unordered_map_view make_unordered_map_view(bool lock = false) const noexcept;
};
}
The concurrent_unordered_map class is an associative data structure that provides efficient key-value storage.
Concurrent member function calls on the same instance of concurrent_unordered_map do not introduce data races.
key_type satisfies Cpp17MoveConstructible requirements.
mapped_type satisfies Cpp17MoveConstructible requirements.
Unless specified otherwise, all the member functions of concurrent_unordered_map have the following additional requirements and remarks:

- Hash, Pred, and Allocator do not introduce data races [Note: It is safe to use the same instance of Hash, Pred, or Allocator concurrently. - end note].
- Key and T do not introduce data races for the following functions: all the constructors (including default, move and copy constructors); copy and move assignments; destructor [Note: It is safe to concurrently use different instances of Key or T for copying, moving, etc. - end note].
- Concurrent calls to these functions on the same instance of concurrent_unordered_map do not introduce data races [Note: It is safe to concurrently use the same instance of concurrent_unordered_map. - end note].
- Calls to functions of concurrent_unordered_map that successfully modify keys or values synchronize with calls to functions successfully reading the value or key of the same keys [Note: Modifications of the key or value in one thread are visible to the thread that reads (or modifies) the same key or value. - end note].

[Note: Implementations are encouraged to implement concurrent reads of the same key or value without contention. - end note]
A destructor call for an unordered_map_view referencing an instance of concurrent_unordered_map synchronizes with calls to functions successfully reading a value or key of the same instance of concurrent_unordered_map [Note: The destructor of the unordered_map_view makes all the changes done through that unordered_map_view visible to all the threads that use the concurrent_unordered_map. - end note].
concurrent_unordered_map();
explicit concurrent_unordered_map(size_type n);
concurrent_unordered_map(size_type n, const Hash& hf,
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
template <typename InputIterator>
concurrent_unordered_map(InputIterator first, InputIterator last,
size_type n = implementation-defined,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
concurrent_unordered_map(const Allocator&);
concurrent_unordered_map(initializer_list<value_type> il,
size_type n = implementation-defined,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& a = allocator_type());
Effects: Constructs a concurrent_unordered_map with effects analogous to those of the corresponding unordered_map constructor.

concurrent_unordered_map(concurrent_unordered_map&& other) noexcept;
concurrent_unordered_map(concurrent_unordered_map&& other, const Allocator&);
Effects: Constructs a concurrent_unordered_map with the content of other, leaving other in a valid but unspecified state.
Remarks: The value of other before the constructor call may differ from the value of *this after the constructor call only in case of concurrent access to other.

concurrent_unordered_map& operator=(concurrent_unordered_map&& other) noexcept;
Effects: Move-assigns the content of other to *this, leaving other in a valid but unspecified state.
Remarks: The value of other before the operator call may differ from the value of *this after the call only in case of concurrent access to other or concurrent access to *this.

concurrent_unordered_map& operator=(initializer_list<value_type> il);
Effects: Assigns the content of il to *this.
Remarks: The value of il before the operator call may differ from the value of *this after the call only in case of concurrent access to *this.

~concurrent_unordered_map();
allocator_type get_allocator() const; hasher hash_function() const; key_equal key_eq() const;
Requires: allocator_type, hasher and key_equal do not introduce data races.
Returns: copies of the allocator_type, hasher and key_equal objects respectively.

template <typename Visitor> void visit(const key_type& key, Visitor& f);
Requires: is_invocable_v<Visitor, mapped_type&> is true.
Effects: Invokes f on the value stored with a key equivalent to key, if any.
Remarks: Accesses and modifications performed by f do not introduce data races.

template <typename Visitor> void visit(const key_type& key, Visitor& f) const;
Requires: is_invocable_v<Visitor, const mapped_type&> is true.
Effects: Invokes f on the value stored with a key equivalent to key, if any.
Remarks: Accesses performed by f do not introduce data races.

template <typename Visitor> void visit_all(Visitor& f);
Requires: is_invocable_v<Visitor, const key_type&, mapped_type&> is true.
Effects: Invokes f on the key and value pairs stored in *this.
Remarks: Invocations of f do not introduce data races. It is guaranteed to visit all the keys of *this only if there are no concurrent modifications of *this. [Note: Invocation of f on some key and value does not prevent modifications of other keys and values in *this. - end note]

template <typename Visitor> void visit_all(Visitor& f) const;
Requires: is_invocable_v<Visitor, const key_type&, const mapped_type&> is true.
Effects: Invokes f on the key and value pairs stored in *this.
Remarks: Invocations of f do not introduce data races. It is guaranteed to visit all the keys of *this only if there are no concurrent modifications of *this. [Note: Invocation of f on some key and value does not prevent modifications of other keys and values in *this. - end note]

template<typename K, typename Visitor, typename... Args> bool visit_or_emplace(K&& key, Visitor& f, Args&&... args);
Requires: is_invocable_v<Visitor, mapped_type&> && is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> is true.
Effects: Invokes f on the value stored with the key equivalent to key if such a key exists in *this. Otherwise stores key_type(std::forward<K>(key)) and mapped_type(std::forward<Args>(args)...).
Remarks: Construction of key_type and mapped_type and accesses or modifications of the arguments passed into f do not introduce data races.

optional<mapped_type> find(const key_type& key) const;
Requires: is_copy_constructible_v<mapped_type> is true.
Returns: an empty optional if there is no key equivalent to key in *this, otherwise an optional holding a copy of the mapped_type for that key.

template<typename... Args> mapped_type find(const key_type& key, Args&&... args) const;
Requires: is_copy_constructible_v<mapped_type> && is_constructible_v<mapped_type, Args&&...> is true.
Returns: mapped_type(std::forward<Args>(args)...) if there is no key equivalent to key in *this, otherwise a copy of the mapped_type for that key.

template<typename K, typename... Args> bool emplace(K&& key, Args&&... args);
Requires: is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> is true.
Returns: true if the key key was not in the container and the function emplaced it, false otherwise.

template<typename K, typename... Args> bool insert_or_assign(K&& key, Args&&... args);
Requires: is_constructible_v<key_type, K&&> && is_constructible_v<mapped_type, Args&&...> && is_move_assignable_v<mapped_type> is true.
Returns: true if the key key was not in the container and the function emplaced it, false if the key was in the container and the mapped_type for that key was replaced by move-assigning mapped_type(std::forward<Args>(args)...).

template<typename... Args> size_type update(const key_type& key, Args&&... args);
Requires: is_constructible_v<mapped_type, Args&&...> && is_move_assignable_v<mapped_type> is true.
Returns: 0 if the key was not in the container, 1 if the key was in the container and the mapped_type for that key was replaced by move-assigning mapped_type(std::forward<Args>(args)...).

size_type erase(const key_type& key);
Returns: 0 if the key was not in the container, 1 if the key was in the container and was erased.

template<class H2, class P2> void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>& source);
template<class H2, class P2> void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
Effects: Merges source into *this. For each key and value pair from source the following rules apply: if *this already has the key from source, the key and value are left in source; otherwise, the key and value are moved into *this.
[Note: If new values are being concurrently inserted into source during this operation, then at the end of this function invocation source may have keys that do not exist in *this. - end note]

void swap(concurrent_unordered_map& other)
noexcept(allocator_traits<Allocator>::is_always_equal::value &&
is_nothrow_swappable_v<Hash> &&
is_nothrow_swappable_v<Pred>);
Effects: Swaps the contents of other and *this.
[Note: If new values are being concurrently inserted into other or *this during this operation, then at the end of this function invocation other may not be equal to the *this from before the invocation, and *this may not be equal to the other from before the invocation. - end note]

void clear() noexcept;
Effects: Erases all the elements of *this.
[Note: If new values are being concurrently inserted into *this during this operation, then at the end of this function invocation *this may contain some keys. - end note]

unordered_map_view make_unordered_map_view(bool lock = false) const noexcept;
Effects: If lock is true, any concurrent access to *this through member functions and any subsequent attempt to invoke this->make_unordered_map_view(true) blocks as long as the returned view is not destroyed.
Returns: an unordered_map_view referencing *this.

Class concurrent_unordered_map::unordered_map_view

template <class Key, class T, class Hash, class Pred, class Allocator>
class concurrent_unordered_map<Key, T, Hash, Pred, Allocator>::unordered_map_view {
concurrent_unordered_map<Key, T, Hash, Pred, Allocator>& delegate; // exposition only
public:
// types:
using key_type = Key;
using mapped_type = T;
using value_type = pair<const Key, T>;
using hasher = Hash;
using key_equal = Pred;
using allocator_type = Allocator;
using pointer = typename allocator_traits<Allocator>::pointer;
using const_pointer = typename allocator_traits<Allocator>::const_pointer;
using reference = value_type&;
using const_reference = const value_type&;
using size_type = implementation-defined;
using difference_type = implementation-defined;
using iterator = implementation-defined;
using const_iterator = implementation-defined;
using local_iterator = implementation-defined;
using const_local_iterator = implementation-defined;
// construct/copy/destroy
unordered_map_view() = delete;
unordered_map_view(unordered_map_view&&) noexcept = delete;
unordered_map_view(const unordered_map_view&) noexcept = delete;
unordered_map_view& operator=(unordered_map_view&&) noexcept = delete;
unordered_map_view& operator=(const unordered_map_view&) noexcept = delete;
~unordered_map_view();
// iterators:
iterator begin() noexcept;
const_iterator begin() const noexcept;
iterator end() noexcept;
const_iterator end() const noexcept;
const_iterator cbegin() const noexcept;
const_iterator cend() const noexcept;
// capacity:
bool empty() const noexcept;
size_type size() const noexcept;
size_type max_size() const noexcept;
// modifiers:
template<typename... Args>
pair<iterator, bool> emplace(Args&&... args);
// We considered having only the forwarding-reference variant of insert to avoid overloading the interface
template<class P> pair<iterator, bool> insert(P&& x);
template<class InputIterator>
void insert(InputIterator first, InputIterator last);
void insert(initializer_list<value_type> il);
template <typename... Args>
pair<iterator, bool> try_emplace(const key_type& k, Args&&... args);
template <typename... Args>
pair<iterator, bool> try_emplace(key_type&& k, Args&&... args);
template <class M>
pair<iterator, bool> insert_or_assign(const key_type& k, M&& obj);
template <class M>
pair<iterator, bool> insert_or_assign(key_type&& k, M&& obj);
iterator erase(iterator position);
iterator erase(const_iterator position);
size_type erase(const key_type& k);
iterator erase(const_iterator first, const_iterator last);
void swap(concurrent_unordered_map&)
noexcept(allocator_traits<Allocator>::is_always_equal::value &&
is_nothrow_swappable_v<Hash> &&
is_nothrow_swappable_v<Pred>);
void clear() noexcept;
template<class H2, class P2>
void merge(concurrent_unordered_map<Key, T, H2, P2, Allocator>&& source);
template<class H2, class P2>
void merge(concurrent_unordered_multimap<Key, T, H2, P2, Allocator>&& source);
// observers:
allocator_type get_allocator() const;
hasher hash_function() const;
key_equal key_eq() const;
// map operations:
iterator find(const key_type& k);
const_iterator find(const key_type& k) const;
size_type count(const key_type& k) const;
pair<iterator, iterator> equal_range(const key_type& k);
pair<const_iterator, const_iterator> equal_range(const key_type& k) const;
// element access:
mapped_type& operator[](const key_type& k);
mapped_type& operator[](key_type&& k);
const mapped_type& at(const key_type& k) const;
mapped_type& at(const key_type& k);
// bucket interface:
size_type bucket_count() const;
size_type max_bucket_count() const;
size_type bucket_size(size_type n);
size_type bucket(const key_type& k) const;
local_iterator begin(size_type n);
const_local_iterator begin(size_type n) const;
local_iterator end(size_type n);
const_local_iterator end(size_type n) const;
const_local_iterator cbegin(size_type n) const;
const_local_iterator cend(size_type n) const;
// hash policy:
void rehash(size_type n);
float load_factor() const noexcept;
};
}
The unordered_map_view class refers to a concurrent_unordered_map and provides an interface to it that satisfies the unordered associative container requirements [unord.req], except the iterator invalidation requirements, the load_factor functions, the size() complexity requirements, and the bucket and node operations.
[ Note: Concurrent non-const member function calls on instances of unordered_map_view referencing the same concurrent_unordered_map introduce data races. - end note ]
[ Note: Concurrent member function calls on a concurrent_unordered_map instance A and on an unordered_map_view referencing the instance A introduce data races. - end note ]
~unordered_map_view();
Effects: If *this was created by make_unordered_map_view(true), resumes execution of all the operations waiting on the concurrent_unordered_map.

Example A

#include <concurrent_unordered_map>
#include <chrono>
#include <string_view>
#include <memory>
using namespace std;
void process_user(string_view name, shared_ptr<const user_t> user);
auto get_new_user();
auto get_request();
int main() {
concurrent_unordered_map<string_view, shared_ptr<user_t> > users;
// single threaded fill
read_users_from_file(users.make_unordered_map_view());
constexpr unsigned threads_count = 10;
while(1) {
// concurrent work
std::atomic<int> b{threads_count * 100500};
thread threads[threads_count];
for (auto& t: threads) {
// processing users
t = thread([&users, &b]() {
while (--b > 0) {
auto [user_name, data] = co_await get_request();
shared_ptr<const user_t> user = users.find(user_name, shared_ptr<const user_t>{});
if (!user) continue;
process_user(user_name, user);
}
});
}
// accepting users
auto start = std::chrono::system_clock::now();
while (--b > 0) {
auto [new_user_name, user] = co_await get_new_user();
users.emplace(new_user_name, user);
if (std::chrono::system_clock::now() - start > std::chrono::seconds(5)) {
// debug print rough contents each 5 seconds
users.visit_all([] (const string_view& name, const shared_ptr<user_t>& user) {
cout << name << " " << user->status() << endl;
});
start = std::chrono::system_clock::now();
}
}
for (auto& t: threads) {
t.join();
}
// single threaded processing
auto unsafe_users = users.make_unordered_map_view();
count_statistics(unsafe_users);
dump_to_file(unsafe_users);
cleanup(unsafe_users);
}
}
Example B

#include <concurrent_unordered_map>
#include <algorithm>
int main() {
using namespace std;
using event_id = ...;
struct event_data {
event_data(const event_data&) = delete;
event_data& operator=(const event_data&) = delete;
...
};
concurrent_unordered_map<event_id, unique_ptr<event_data> > events;
// Getting unique events.
auto event_generators = get_event_generators();
for_each(execution::par_unseq, event_generators.begin(), event_generators.end(), [&events](auto& g) {
while (1) {
auto [event_name, data] = co_await g.get_event();
if (event_name.empty()) {
break; // no more events
}
events.visit_or_emplace(event_name, [&data](unique_ptr<event_data>& v){
if (!v || v->priority() < data->priority()) {
std::swap(data, v);
}
}, unique_ptr<event_data>{});
}
});
auto v = events.make_unordered_map_view();
for_each(execution::par_unseq, v.begin(), v.end(), [](auto& e) {
process(e.first, std::move(e.second));
});
}
Example C

#include <concurrent_unordered_map>
#include <utility>
int main() {
using namespace std;
using id_t = ...;
using use_count_t = size_t;
concurrent_unordered_map<id_t, use_count_t> stats;
constexpr unsigned threads_count = 10;
thread threads[threads_count];
for (auto& t: threads) {
t = thread([&stats]() {
while (1) {
auto [id, data] = co_await get_something();
stats.visit_or_emplace(
id,
[](auto& v){ ++v; },
0
);
process_stuff(id, data);
}
});
}
for (auto& t: threads) {
t.join();
}
process_stats(stats.make_unordered_map_view());
}
| Write fraction | Thread count | boost::synchronized_value<std::unordered_map>, ms | std::concurrent_unordered_map prototype, ms | Sharded unordered map, ms | libcuckoo cuckoohash_map, ms | folly::ConcurrentHashMap, ms |
|---|---|---|---|---|---|---|
| 1/1 | 1 | 358 | 324 | 413 | 332 | 373 |
| 1/1 | 2 | 2034 | 563 | 1347 | 444 | 724 |
| 1/1 | 4 | 4236 | 678 | 2047 | 619 | 1412 |
| 1/1 | 8 | 15234 | 872 | 3382 | 703 | 2760 |
| 1/1 | 16 | 32094 | 1248 | 5593 | 1370 | 5889 |
| 1/1 | 32 | 73701 | 2123 | 8669 | 1585 | 13838 |
| 1/1 | 64 | 165842 | 5624 | 12076 | 7264 | 26686 |
| 1/5 | 1 | 274 | 178 | 294 | 238 | 333 |
| 1/5 | 2 | 1157 | 169 | 805 | 195 | 347 |
| 1/5 | 4 | 2565 | 189 | 1075 | 222 | 380 |
| 1/5 | 8 | 7562 | 400 | 1550 | 262 | 620 |
| 1/5 | 16 | 17829 | 558 | 2730 | 488 | 1368 |
| 1/5 | 32 | 40953 | 622 | 4826 | 598 | 2771 |
| 1/5 | 64 | 91353 | 1902 | 6925 | 3423 | 5332 |
| 1/20 | 1 | 199 | 146 | 257 | 189 | 192 |
| 1/20 | 2 | 694 | 152 | 426 | 193 | 196 |
| 1/20 | 4 | 1425 | 216 | 622 | 260 | 271 |
| 1/20 | 8 | 4483 | 167 | 1212 | 279 | 264 |
| 1/20 | 16 | 12908 | 200 | 1341 | 292 | 337 |
| 1/20 | 32 | 30692 | 310 | 2219 | 407 | 620 |
| 1/20 | 64 | 70597 | 703 | 3266 | 2795 | 1535 |