| Document number: | P1661R1 | |
|---|---|---|
| Date: | 2019-07-17 | |
| Audience: | Library Working Group | |
| Reply-to: | Tomasz KamiĆski <tomaszkam at gmail dot com> | 
This paper proposed to remove the precalculated hash lookup interface (as proposed in P0920R2: Precalculated hash values in lookup), as this functionality can be implemented by the user in less error-prone manner, via heterogeneous lookup for unordered containers (P0919R3).
[ Note: The implementation of hashers from this paper are included only to illustrate how precalculated hash lookup can be implemented. They are not proposed for standardization. ]
Initial revision.
Contrary to information presented in the P0920R2: Precalculated hash values in lookup, the C++20 is already providing efficient way to precalculate hash value for multiple lookups, via heterogeneous lookup for unordered containers (P0919R3).
To illustrate, the following code provides a type-safe implementation for the std::unordered_map
   with std::string key:
template<typename Key>
class PrehashedStringKey
{
public:
  using key_type = Key;
  explicit PrehashedStringKey(std::size_t h, key_type v)
   : key_(std::move(v)), hash_(h)
  {}
  std::size_t hash() const
  { return hash_; }
  key_type const& key() const&
  { return key_; }
  key_type&& key() &&
  { return std::move(key_); }
  friend bool operator==(PrehashedStringKey const& lhs, std::string_view rhs)
  { return lhs.key() == rhs; }
  template<typename U>
  friend bool operator==(PrehashedStringKey const& lhs, PrehashedStringKey<U> const& rhs)
  { return lhs.key() == rhs.key(); }
private:
  key_type key_;
  std::size_t hash_;
};
struct PrecalculationEnabledStringHasher
{
   using transparent_key_equal = std::equal_to<>;
   template<typename T>
   using prehashed_key_type = PrehashedStringKey<T>;
   std::size_t operator()(std::string_view sv) const
   { return std::hash<std::string_view>{}(sv); }
   template<typename T>
   std::size_t operator()(prehashed_key_type<T> const& v) const
   { return v.hash(); }
   template<typename T>
     requires std::is_convertible_v<T, std::string_view>
   prehashed_key_type<T> precompute(T t) const
   {
     std::size_t hash = operator()(std::string_view(t));
     return prehashed_key_type<T>(hash, std::move(t));
   }
};
The above interface introduces a precompute function that bundles given key value (regardless if it is std::string, std::string_view, or char const*),
   that can be used with multiple unordered_map that uses the same hasher. To illustrate the example the from Motivation section of P0920R1,
   will now look as follows:
std::array<std::unordered_map<std::string, int, PrecalculationEnabledStringHasher>, array_size> maps;
void update(const std::string& user)
{
  const auto hashedUser = maps.front().hash_function().precompute(user);
  for(auto& m : maps) {
    auto it = m.find(hashedUser);
    // ...
  }
}
The obvious advantage of the above solution is that it eliminates the possibility of pairing key with incorrect hash
   value when passed to find function. For example, the following code contains undefined behaviour, due
   paring of user2 with hash1:
std::array<std::unordered_map<std::string, int>, array_size> maps;
void update(const std::string& user1, const std::string& user2)
{
  const auto hash1 = maps.front().hash_function()(user1);
  const auto hash1 = maps.front().hash_function()(user2);
  for(auto& m : maps) {
    auto it1 = m.find(user1, hash1);
    auto it2 = m.find(user2, hash1); // undefined behaviour
    // ...
  }
}
However, such kind of error is not possible in case of PrehashedStringKey
   as we are bundling the key value with the appropriate hash:
std::array<std::unordered_map<std::string, int, PrecalculationEnabledStringHasher>, array_size> maps;
void update(const std::string& user)
{
  const auto hashedUser1 = maps.front().hash_function().precompute(user1);
  const auto hashedUser2 = maps.front().hash_function().precompute(user2);
  for(auto& m : maps) {
    auto it1 = m.find(hashedUser1);
    auto it2 = m.find(hashedUser2);
    // ...
  }
}
The difference in the above interface can be compared with the difference between using a range, versus passing iterator/sentinel as separate arguments.
Similarly to above, the currently proposed interface does not provide type-safety for key —
   it is possible to accidentally pass a hash of unrelated type to the find function,
   and thus causing undefined behaviour.
To illustrate, lets consider following example of correct code:
std::unordered_map<std::chrono::milliseconds, int> m1;
std::unordered_map<std::chrono::milliseconds, int> m2;
void update(std::chrono::seconds secs)
{
   const auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash);
      // ...
   }
}
This code will silently break (undefined behaviour) in situation when one of the maps
   would be changed to use std::chrono::seconds as key:
std::unordered_map<std::chrono::milliseconds, int>> m1;
std::unordered_map<std::chrono::seconds, int>> m2;
void update(std::chrono::seconds secs)
{
   vconst auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash); // undefined behaviour
      // ...
   }
}
[ Note: That above code still compiles, as std::chrono::seconds are convertible to keys of both
     m1 (std::chrono::milliseconds) and m2 (std::chrono::seconds). ]
However, in case if wrapper similar to PrecomptedStringHash would be used (see Implementability section),
   such mistakes would be reported as compilation error:
std::unordered_map<std::chrono::milliseconds, int, PrecalculationEnabledHasherFor<std::chrono::milliseconds>> m1;
std::unordered_map<std::chrono::seconds, int, PrecalculationEnabledHasherFor<std::chrono::seconds>> m2;
void update(std::chrono::seconds secs)
{
   const auto hashed = m1.hash_function().precompute(secs);
   {
      auto it1 = m1.find(hashed);
      // ...
   }
   {
      auto it2 = m2.find(hashed); // ill-formed
      // ...
   }
}
Another drawback of current interface, is that it allows silently mixing of unordered_map instances,
   that use  different hashers. This applies both for the type of the hasher and its state.
To illustrate, following code contains undefined behavior, due hash mismatch:
std::unordered_map<std::string, int, MurmurHash> m1;
std::unordered_map<std::string, int, CityHash> m2;
void update(std::string const& secs)
{
   const auto hash = m1.hash_function()(secs);
   {
      auto it1 = m1.find(secs, hash);
      // ...
   }
   {
      auto it2 = m2.find(secs, hash); // undefined behavior
      // ...
   }
}
Above problem, can be immediately addressed, by tagging the key/hash pair produced by the hasher with the hash type (see Implementability section).
std::unordered_map<std::string, int, PrecalculationEnabledHasher<MurmurHash>> m1;
std::unordered_map<std::string, int, PrecalculationEnabledHasher<CityHash>> m2;
void update(std::string const& secs)
{
   const auto hashedKey = m1.hash_function()(secs);
   {
      auto it1 = m1.find(hashedKey);
      // ...
   }
   {
      auto it2 = m2.find(hashedKey); // ill-formed
      // ...
   }
}
However, all of above does not address the possibility of using hasher of the same type, but having different state. [ Note: This is not an issue is all hasher use the same program-wide state. ]
One of the possible options is to introduce a check to the find method,
   that will rehash the object and compare it with the provided hash value. However, such
   solution eliminates are performance gains of this feature, making it more suited for the debug builds.
Use of the PrehashedKey approach, provides alternative solutions.
   Firstly, we can store the state of the hasher (salt) in key/hash bundle, and validating it inside
   operator():
template<typename T>
std::size_t operator()(prehashed_key_type<T> const& v) const
{
  if (v.salt() != hash_.salt())
    report_error();
  return v.hash();
}
Above solution is reducing performance impact (comparing state of hasher is usually less expensive than rehashing of a key), and can be eliminated in case of stateless hasher object.
Secondly, we can mitigate hasher state mismatch errors, by using different hasher (and PrehashedKey) for each
   container definition. This can be achieved by adding an otherwise unused tag template parameter to
   PrecalculationEnabledHasher and PrehashedKey.
Examples presented above show that the "power user" that need to squeeze the extra performance from
   lookup of multiple unordered_map with the same hashes (type and state) are already able to achieve
   that goal via custom hasher.
In addition, with this customized implementation, each user can take a different approach regarding the unavoidable safety versus speed trade-off for the solution, e.g.:
precompute method,prehashed_key,making the solution more fitting to the specific use case.
Finally, the hasher based solution makes it possible to localize the usage error-prone
   interface to only specific part of the code. This is not possible if the lookup mechanism
   is included as part of the unordered_map interface.
In light of the above author believes that dedicated precalculated lookup interface shall be removed from the standard, in favor of custom user-provided implementation, that suits their need.
Revert the changes introduced in by P0920R2.
Below you may find the general implementation of the hash wrapper, that allow to extend any existing hasher with precalculated hash lookup functionality.
template<typename Key, typename Hasher>
struct PrehashedKey
{
  using key_type = Key;
  template<typename U>
  explicit PrehashedKey(std::size_t h, U&& u)
   : key_(std::forward<U>(u)), hash_(h)
  {}
  std::size_t hash() const
  { return hash_; }
  key_type const& key() const&
  { return key_; }
  key_type&& key() &&
  { return std::move(key_); }
private:
  key_type key_;
  std::size_t hash_;
};
template<typename Equal, typename Hasher>
class PrecalculationEnabledHashEqual
{
  [[no_unique_address]] Equal equal;
public:
  using is_transparent = void;
  template<typename T>
  using prehashed_key_type = PrehashedKey<T, Hasher>;
  PrecalculationEnabledHashEqual() = default;
  PrecalculationEnabledHashEqual(Equal e) : equal(std::move(e)) {}
  template<typename T, typename U>
    requires std::is_invocable_v<Equal const&, T const&, U const&>
  decltype(auto) operator()(T const& lhs, U const& rhs) const
  {
     return equal(lhs, rhs);
  }
  template<typename T, typename U>
    requires std::is_invocable_v<Equal const&, T const&, U const&>
  decltype(auto) operator()(prehashed_key_type<T> const& lhs, U const& rhs) const
  {
     return equal(lhs.key(), rhs);
  }
  template<typename T, typename U>
    requires std::is_invocable_v<Equal const&, T const&, U const&>
  decltype(auto) operator()(T const& lhs, prehashed_key_type<U> const& rhs) const
  {
     return equal(lhs, rhs.key());
  }
  template<typename T, typename U>
    requires std::is_invocable_v<Equal const&, T const&, U const&>
  decltype(auto) operator()(prehashed_key_type<T> const& lhs, prehashed_key_type<U> const& rhs) const
  {
     return equal(lhs.key(), rhs.key());
  }
};
template<typename Hasher>
struct HasherEquailty
{ using type = std::equal_to<>; };
template<typename Hasher>
  requires (requires () { typename Hasher::transparent_key_equal; })
struct HasherEquailty<Hasher>
{ using type = typename Hasher::transparent_key_equal; };
template<typename Hasher>
class PrecalculationEnabledHasher
{
  [[no_unique_address]] Hasher hasher;
public:
   using transparent_key_equal = PrecalculationEnabledHashEqual<HasherEquailty<Hasher>, Hasher>;
   template<typename T>
   using prehashed_key_type = PrehashedKey<T, Hasher>;
   PrecalculationEnabledHasher() = default;
   PrecalculationEnabledHasher(Hasher h) : hasher(std::move(h)) {}
   template<typename T>
     requires std::is_invocable_r_v<std::size_t, Hasher const&, T const&>
   std::size_t operator()(T const& t) const
   { return hasher(t); }
   template<typename T>
   std::size_t operator()(prehashed_key_type<T> const& v) const
   { return v.hash(); }
   template<typename T>
     requires std::is_invocable_r_v<std::size_t, Hasher const&, std::decay_t<T> const&>
   prehashed_key_type<std::decay_t<T>> precompute(T&& t) const
   {
     std::size_t hash = operator()(std::as_const(t));
     return prehashed_key_type<std::decay_t<T>>(hash, std::forward<T>(t));
   }
};
template<typename T>
using PrecalculationEnabledHasherFor = PrecalculationEnabledHasher<std::hash<T>>;
With above code, the PrecalculationEnabledStringHasher is equivalent to PrecalculationEnabledHasherFor<std::string_view>.
Special thanks and recognition goes to Sabre (http://www.sabre.com) for supporting the production of this proposal and author's participation in standardization committee.