一致性 Hash

Problem of simple hashing

  1. In a large-scale distributed system, data does not fit on a single server. They are “distributed” across many machines. This is called horizontal scaling.
  2. To build such a system with predictable performance, it is important to distribute the data evenly across those servers. Simple hashing: serverIndex = hash(key) % N, where N is the size of the server pool
  3. This approach works well when the size of the cluster is fixed, and the data distribution is even. But when new servers get added to meet new demand, or when existing servers get removed, it triggers a storm of misses and a lot of objects to be moved.
  4. Consistent hashing is an effective technique to mitigate this issue. The goal of consistent hashing is simple. We want almost all objects to stay assigned to the same server even as the number of servers changes.

Algorithms

  1. As shown in the diagram, using a hash function, we hash each server by its name or IP address, and place the server onto the ring. Next, we hash each object by its key with the same hashing function.
  2. To locate the server for a particular object, we go clockwise from the location of the object key on the ring until a server is found. Continue with our example, key 0 is on server 0, key 1 is on server 1.

Add a server

Remove a server

Virtual Nodes

Examples

Reference