Consistent Hashing

I've written previously on the value of Feature Flags. This time around I would like to talk about an algorithm that's central to their implementation, and very important in distributed systems generally, known as Consistent Hashing.

The problem

In order to fully understand the benefits of Consistent Hashing we will first take a look at the naive alternative of hashing with modulus. What follows is a straightforward implementation of this naive approach.

final case class NodeRing(nodes: String*) {
  // Hash the key, take the absolute value, and use modulus to index into the node list.
  def getNode(id: String): Option[String] = {
    nodes.drop(Crc32(id).abs % nodes.length).headOption
  }
}

Effectively all this algorithm does is take a string, hash it, take the absolute value of the hash and then apply a modulus of the number of nodes to produce the index of the node this key maps to, finally returning the node at that index.

Note: The method Crc32() is documented in note [1]

Now that we have this hashing algorithm, let's leverage it to load balance between some distributed memcached nodes. The following snippet creates a pool of memcached nodes and resolves a few memcache keys to their corresponding nodes.

val memcacheRing = NodeRing(  
  "10.10.10.1:11211",
  "10.10.10.2:11211",
  "10.10.10.3:11211",
  "10.10.10.4:11211"
)
val josephsNode   = memcacheRing.getNode("joseph")  
val isaiahsNode   = memcacheRing.getNode("isaiah")  
val carolinasNode = memcacheRing.getNode("carolina")  

If we were to evaluate this code the values of the three node resolutions would be as follows:

val josephsNode   = Some("10.10.10.3:11211")  
val isaiahsNode   = Some("10.10.10.4:11211")  
val carolinasNode = Some("10.10.10.2:11211")  

Now if we were to represent the node ring and the hashing results visually we would end up with a representation like the following.

[Figure: naive hashing]

So we can clearly see that the key "joseph" maps to the memcached node 10.10.10.3:11211, "isaiah" maps to 10.10.10.4:11211 and "carolina" maps to 10.10.10.2:11211.

That's all very straightforward and, to be honest, fairly uninteresting. However, where it gets interesting is when we need to scale up by adding a new node to the memcached pool. Let's go ahead and do this in code by adding a new node to the ring.

val memcacheRing = NodeRing(  
  "10.10.10.1:11211",
  "10.10.10.2:11211",
  "10.10.10.3:11211",
  "10.10.10.4:11211",
  "10.10.10.5:11211"
)
val josephsNode   = memcacheRing.getNode("joseph")  
val isaiahsNode   = memcacheRing.getNode("isaiah")  
val carolinasNode = memcacheRing.getNode("carolina")  

Now just like before we'll evaluate the code and see what happened.

val josephsNode   = Some("10.10.10.3:11211")  
val isaiahsNode   = Some("10.10.10.2:11211")  
val carolinasNode = Some("10.10.10.1:11211")  

Just like before we'll also visualize the results.

[Figure: updated naive hashing]

A quick glance at the result shows that we have invalidated most of our cache, as every key we were interested in now maps to a new memcached node. This becomes problematic at scale, when very large memcached pools effectively lose the majority of their cache and services start hammering your database of choice. At scale this can result in a common failure mode known as a Cascading Failure: your cache becoming invalidated causes the rest of your services to overwhelm your main data persistence layer. This is also known as the Thundering Herd problem.
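To see that this isn't just bad luck with three hand-picked keys, here's a rough sketch that runs a larger synthetic keyspace through both rings and counts how many keys change nodes. It assumes the NodeRing and Crc32 definitions above, and the key format is made up purely for illustration.

// Rough sketch: measure how many keys move when the naive ring grows
// from 4 nodes to 5. Assumes the NodeRing and Crc32 definitions above.
val before = NodeRing(
  "10.10.10.1:11211", "10.10.10.2:11211",
  "10.10.10.3:11211", "10.10.10.4:11211"
)
val after = NodeRing(
  "10.10.10.1:11211", "10.10.10.2:11211",
  "10.10.10.3:11211", "10.10.10.4:11211",
  "10.10.10.5:11211"
)

// Synthetic keyspace, purely for illustration.
val keys    = (1 to 10000).map(i => s"key-$i")
val moved   = keys.count(k => before.getNode(k) != after.getNode(k))
val percent = moved * 100.0 / keys.length
// With modulus hashing roughly 4 out of every 5 keys change nodes,
// so expect `percent` to land somewhere near 80%.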

That solution though

Now that we understand the problem caused by the naive implementation, we can begin searching for a better solution. Like most things in computer science, and especially in distributed computing, this problem has been well researched and understood for over a decade; luckily for us, Consistent Hashing is no different [2][3][4].

Consistent Hashing starts off as a sparse ring data structure; you can think of it as a SortedMap[K, V] where K is the offset in the ring, represented as an Int, and V is the node value at that position of the ring. Unlike our previous naive implementation, Consistent Hashing has N entries in the ring per node.

A node's N value is commonly referred to as the node's weight and corresponds to the probability of that node being selected when a key is hashed through the ring. The probability of a given node being selected is P = W / S, where W is the weight of the node and S is the sum of every node's weight in the ring.
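As a concrete (if hypothetical) illustration of that structure, here is roughly what the SortedMap could look like for a ring of two nodes, each with a weight of 2, matching the configuration used in the figures below. The Int offsets are made-up placeholder values rather than real CRC32 results.

import scala.collection.immutable.TreeMap

// Hypothetical ring: 2 nodes, each with weight 2, so 4 entries in total.
// The keys are placeholder offsets, not actual hash values.
val exampleRing = TreeMap(
  190000000  -> "10.10.10.1:11211",
  550000000  -> "10.10.10.2:11211",
  1200000000 -> "10.10.10.1:11211",
  1900000000 -> "10.10.10.2:11211"
)

// Each node owns 2 of the 4 entries, so P = W / S = 2 / 4 = 0.5:
// both nodes are equally likely to be selected for a uniformly hashed key.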

To visualize this, let's start off with a simple hash ring that contains only 2 nodes, each with a weight of 2.

[Figure: initial hash ring]

Here we can see that this ring contains 2 memcached nodes, 10.10.10.1:11211 and 10.10.10.2:11211, and that they each have a weight of 2, as they each appear in the ring twice.

Now, in order to map a memcached key to its corresponding memcached node, we apply the hashing function to the key, map the resulting hash to a position on the ring, and slide along the ring clockwise until we run into the next node. The first node we run into is the node that key maps to. We demonstrate this algorithm visually with 4 keys in the next figure.

[Figure: populated hash ring]

Here we can see clearly that the key "Isaiah" maps to the memcached node 10.10.10.1:11211, "Robert" maps to 10.10.10.2:11211, "Joseph" maps to 10.10.10.1:11211 and "Carolina" maps to 10.10.10.2:11211.

Now just like in our previous naive implementation, the rubber only hits the road when we add more nodes in order to scale up. So, let's go ahead and add 2 more nodes with the same weight and see what happens to the results.

[Figure: updated hash ring]

Wow, that's a huge improvement over our naive implementation: in this case we've only invalidated one of the keys we're interested in. The key "Robert" used to map to 10.10.10.2:11211 but now maps to 10.10.10.3:11211; the 3 other keys we were interested in haven't been reallocated, and we can visually observe that our keyspace has been evenly partitioned [5].

We can implement this algorithm in roughly 40 lines of Scala, as demonstrated by the following code snippet.

import scala.collection._
import scala.collection.immutable.TreeMap

final case class HashRingNode(value: String, weight: Int)

final case class HashRing(ring: SortedMap[Int, String]) {

  def get(value: String): Option[String] = {
    // Hash the key and walk clockwise: take the first ring entry at or after
    // the hash, wrapping around to the start of the ring if we fall off the end.
    ring
      .from(Crc32(value))
      .headOption
      .orElse(ring.headOption)
      .map(_._2)
  }

}

object HashRing {

  def apply(nodes: HashRingNode*): HashRing = {
    // Give every node `weight` positions on the ring by hashing the node
    // value suffixed with the replica index (1, 2, ..., weight).
    val entries = for {
      node    <- nodes
      replica <- 1 to node.weight
    } yield Crc32(node.value + replica) -> node.value
    HashRing(TreeMap(entries: _*))
  }

}

The use of CRC32 as the hashing algorithm is entirely optional and you can freely replace it with an algorithm that has a better hashing distribution.
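For instance, libketama [4] bases its ring points on MD5. A minimal sketch of what an MD5-backed replacement could look like, assuming you keep the same String => Int signature as Crc32, is shown below.

import java.security.MessageDigest

// Sketch of an MD5-based hash with the same String => Int shape as Crc32.
// Packs the first 4 bytes of the digest into an Int, similar in spirit to
// how libketama derives ring points.
object Md5Hash {
  def apply(str: String): Int = {
    val digest = MessageDigest.getInstance("MD5").digest(str.getBytes("UTF-8"))
    ((digest(0) & 0xFF) << 24) |
      ((digest(1) & 0xFF) << 16) |
      ((digest(2) & 0xFF) << 8) |
      (digest(3) & 0xFF)
  }
}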

For practical use you're going to want a higher weight per node than we used in the visual demonstration above. Weights in the >100 range are encouraged, but for the following code demos we'll default to a weight of 80.

So let's go ahead and create a memcached pool of 4 nodes and hash some keys through it to see what results we get.

val memcacheRing = HashRing(  
  HashRingNode("10.10.10.1:11211", 80),
  HashRingNode("10.10.10.2:11211", 80),
  HashRingNode("10.10.10.3:11211", 80),
  HashRingNode("10.10.10.4:11211", 80)
)
val josephsNode   = memcacheRing.get("joseph")
val isaiahsNode   = memcacheRing.get("isaiah")
val carolinasNode = memcacheRing.get("carolina")
val robertsNode   = memcacheRing.get("robert")

The surface area of this code looks very similar to the examples we used earlier to demonstrate the naive hashing approach, with the main difference being that we now bind a weight to each node.

So let's evaluate this code and check out the values we get.

val josephsNode   = Some("10.10.10.2:11211")  
val isaiahsNode   = Some("10.10.10.3:11211")  
val carolinasNode = Some("10.10.10.3:11211")  
val robertsNode   = Some("10.10.10.3:11211")  

Here we see that in this particular case the majority of the keys map to 10.10.10.3:11211 and only the key "joseph" maps to 10.10.10.2:11211. This appears to be an unbalanced result, but over the full keyspace the distribution is actually fairly even.
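We can sanity check that claim with a rough sketch that hashes a larger synthetic keyspace through the ring and counts how many keys land on each node (the key format is made up purely for illustration).

// Rough sketch: spread a synthetic keyspace across the 4-node ring above
// and count the keys per node. At a weight of 80 each node should end up
// with roughly a quarter of the keys, give or take a few percent.
val counts = (1 to 100000)
  .flatMap(i => memcacheRing.get(s"key-$i"))
  .groupBy(identity)
  .map { case (node, keys) => node -> keys.size }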

Now let's add a new memcached node to the code.

val memcacheRing = HashRing(  
  HashRingNode("10.10.10.1:11211", 80),
  HashRingNode("10.10.10.2:11211", 80),
  HashRingNode("10.10.10.3:11211", 80),
  HashRingNode("10.10.10.4:11211", 80),
  HashRingNode("10.10.10.5:11211", 80)
)
val josephsNode   = memcacheRing.get("joseph")
val isaiahsNode   = memcacheRing.get("isaiah")
val carolinasNode = memcacheRing.get("carolina")
val robertsNode   = memcacheRing.get("robert")

Next we observe the results.

val josephsNode   = Some("10.10.10.2:11211")  
val isaiahsNode   = Some("10.10.10.3:11211")  
val carolinasNode = Some("10.10.10.3:11211")  
val robertsNode   = Some("10.10.10.5:11211")  

The results we demonstrated visually are consistent with the results of evaluating this code. Here we can see that we have proportionally reallocated our keyspace while invalidating only a small subset of it in the cache.
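As a final sanity check, here's a rough sketch that measures the reallocation over a larger synthetic keyspace, rebuilding the 4-node and 5-node rings under hypothetical names so the two can be compared side by side.

// Rough sketch: measure how much of a synthetic keyspace is reallocated
// when the ring grows from 4 to 5 equally weighted nodes.
val ringOfFour = HashRing(
  HashRingNode("10.10.10.1:11211", 80),
  HashRingNode("10.10.10.2:11211", 80),
  HashRingNode("10.10.10.3:11211", 80),
  HashRingNode("10.10.10.4:11211", 80)
)
val ringOfFive = HashRing(
  HashRingNode("10.10.10.1:11211", 80),
  HashRingNode("10.10.10.2:11211", 80),
  HashRingNode("10.10.10.3:11211", 80),
  HashRingNode("10.10.10.4:11211", 80),
  HashRingNode("10.10.10.5:11211", 80)
)

val keys  = (1 to 10000).map(i => s"key-$i")
val moved = keys.count(k => ringOfFour.get(k) != ringOfFive.get(k))
// With consistent hashing only the keys the new node takes over should move,
// so expect `moved` to be roughly a fifth of the keyspace rather than ~80%.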

Conclusion

Consistent Hashing is a valuable tool to leverage when building distributed systems that scale up and down, and thanks to the same properties it's also useful for any system where invalidating the allocated buckets on every change is undesirable.

Notes

[1] Crc32() is implemented as follows:

import java.util.zip._

object Crc32 {
  def apply(str: String): Int = {
    val crc = new CRC32()
    crc.update(str.getBytes("UTF-8"))
    // CRC32 yields a Long; truncating to Int can produce negative values,
    // which is why NodeRing takes the absolute value before the modulus.
    crc.getValue.toInt
  }
}

[2] Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

[3] Web Caching with Consistent Hashing

[4] libketama - a consistent hashing algo for memcache clients

[5] You're only likely to get an even distribution if each node has a large enough weight (think hundreds) and you're using a hashing function with a reasonably uniform distribution. Never expect an even distribution with very small weights.