Consistent Hash
一致性 Hash
Problem of simple hashing
- In a large-scale distributed system, data does not fit on a single server. They are “distributed” across many machines. This is called horizontal scaling.
- To build such a system with predictable performance, it is important to distribute the data evenly across those servers. Simple hashing:
serverIndex = hash(key) % N, where N is the size of the server pool - This approach works well when the size of the cluster is fixed, and the data distribution is even. But when new servers get added to meet new demand, or when existing servers get removed, it triggers a storm of misses and a lot of objects to be moved.
- Consistent hashing is an effective technique to mitigate this issue. The goal of consistent hashing is simple. We want almost all objects to stay assigned to the same server even as the number of servers changes.
Algorithms
- As shown in the diagram, using a hash function, we hash each server by its name or IP address, and place the server onto the ring. Next, we hash each object by its key with the same hashing function.
- To locate the server for a particular object, we go clockwise from the location of the object key on the ring until a server is found. Continue with our example, key 0 is on server 0, key 1 is on server 1.
Add a server
Remove a server
Virtual Nodes
Examples
- Couchbase automated data partitioning
- OpenStack’s Object Storage Service Swift
- Partitioning component of Amazon’s storage system Dynamo
- Data partitioning in Apache Cassandra
- Data partitioning in Voldemort
- Akka’s consistent hashing router
- Riak, a distributed key-value database
- Gluster, a network-attached storage file system
- Akamai content delivery network
- Discord chat application
- Chord algorithm
Reference
Linked Mentions
-
No backlinks found.