博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
[zz] Consistent Hashing Ring
阅读量:6242 次
发布时间:2019-06-22

本文共 8961 字,大约阅读时间需要 29 分钟。

Building a Consistent Hashing Ring – Part 1

“Consistent Hashing” is a term used to describe a process where data is distributed using a hashing algorithm to determine its location. Using only the hash of the id of the data you can determine exactly where that data should be. This mapping of hashes to locations is usually termed a “ring”.

Probably the simplest hash is just a modulus of the id. For instance, if all ids are numbers and you have two machines you wish to distribute data to, you could just put all odd numbered ids on one machine and even numbered ids on the other. Assuming you have a balanced number of odd and even numbered ids, and a balanced data size per id, your data would be balanced between the two machines.

Since data ids are often textual names and not numbers, like paths for files or URLs, it makes sense to use a “real” hashing algorithm to convert the names to numbers first. Using MD5 for instance, the hash of the name ‘mom.png’ is ’4559a12e3e8da7c2186250c2f292e3af’ and the hash of ‘dad.png’ is ’096edcc4107e9e18d6a03a43b3853bea’. Now, using the modulus, we can place ‘mom.jpg’ on the odd machine and ‘dad.png’ on the even one. Another benefit of using a hashing algorithm like MD5 is that the resulting hashes have a known even distribution, meaning your ids will be evenly distributed without worrying about keeping the id values themselves evenly distributed.

Here is a simple example of this in action:

from hashlib import md5 from struct import unpack_from NODE_COUNT = 100 DATA_ID_COUNT = 10000000 node_counts = [0] * NODE_COUNT for data_id in xrange(DATA_ID_COUNT): data_id = str(data_id) # This just pulls part of the hash out as an integer hsh = unpack_from('>I', md5(data_id).digest())[0] node_id = hsh % NODE_COUNT node_counts[node_id] += 1 desired_count = DATA_ID_COUNT / NODE_COUNT print '%d: Desired data ids per node' % desired_count max_count = max(node_counts) over = 100.0 * (max_count - desired_count) / desired_count print '%d: Most data ids on one node, %.02f%% over' % \ (max_count, over) min_count = min(node_counts) under = 100.0 * (desired_count - min_count) / desired_count print '%d: Least data ids on one node, %.02f%% under' % \ (min_count, under)
100000: Desired data ids per node 100695: Most data ids on one node, 0.69% over 99073: Least data ids on one node, 0.93% under

So that’s not bad at all; less than a percent over/under for distribution per node. In the next part of this series we’ll examine where modulus distribution causes problems and how to improve our ring to overcome them.

 

Building a Consistent Hashing Ring – Part 2

In Part 1 of this series, we did a simple test of using the modulus of a hash to locate data. We saw very good distribution, but that’s only part of the story. Distributed systems not only need to distribute load, but they often also need to grow as more and more data is placed in it.

So let’s imagine we have a 100 node system up and running using our previous algorithm, but it’s starting to get full so we want to add another node. When we add that 101st node to our algorithm we notice that many ids now map to different nodes than they previously did. We’re going to have to shuffle a ton of data around our system to get it all into place again.

Let’s examine what’s happened on a much smaller scale: just 2 nodes again, node 0 gets even ids and node 1 gets odd ids. So data id 100 would map to node 0, data id 101 to node 1, data id 102 to node 0, etc.. This is simply node = id % 2. Now we add a third node (node 2) for more space, so we want node = id % 3. So now data id 100 maps to node id 1, data id 101 to node 2, and data id 102 to node 0. So we have to move data for 2 of our 3 ids so they can be found again.

Let’s examine this at a larger scale:

from hashlib import md5 from struct import unpack_from NODE_COUNT = 100 NEW_NODE_COUNT = 101 DATA_ID_COUNT = 10000000 moved_ids = 0 for data_id in xrange(DATA_ID_COUNT): data_id = str(data_id) hsh = unpack_from('>I', md5(str(data_id)).digest())[0] node_id = hsh % NODE_COUNT new_node_id = hsh % NEW_NODE_COUNT if node_id != new_node_id: moved_ids += 1 percent_moved = 100.0 * moved_ids / DATA_ID_COUNT print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
9900989 ids moved, 99.01%

Wow, that’s severe. We’d have to shuffle around 99% of our data just to increase our capacity 1%! We need a new algorithm that combats this behavior.

This is where the “ring” really comes in. We can assign ranges of hashes directly to nodes and then use an algorithm that minimizes the changes to those ranges. Back to our small scale, let’s say our ids range from 0 to 999. We have two nodes and we’ll assign data ids 0-499 to node 0 and 500-999 to node 1. Later, when we add node 2, we can take half the data ids from node 0 and half from node 1, minimizing the amount of data that needs to move.

Let’s examine this at a larger scale:

from bisect import bisect_left from hashlib import md5 from struct import unpack_from NODE_COUNT = 100 NEW_NODE_COUNT = 101 DATA_ID_COUNT = 10000000 node_range_starts = [] for node_id in xrange(NODE_COUNT): node_range_starts.append(DATA_ID_COUNT / NODE_COUNT * node_id) new_node_range_starts = [] for new_node_id in xrange(NEW_NODE_COUNT): new_node_range_starts.append(DATA_ID_COUNT / NEW_NODE_COUNT * new_node_id) moved_ids = 0 for data_id in xrange(DATA_ID_COUNT): data_id = str(data_id) hsh = unpack_from('>I', md5(str(data_id)).digest())[0] node_id = bisect_left(node_range_starts, hsh % DATA_ID_COUNT) % NODE_COUNT new_node_id = bisect_left(new_node_range_starts, hsh % DATA_ID_COUNT) % NEW_NODE_COUNT if node_id != new_node_id: moved_ids += 1 percent_moved = 100.0 * moved_ids / DATA_ID_COUNT print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
4901707 ids moved, 49.02%

Okay, that is better. But still, moving 50% of our data to add 1% capacity is not very good. If we examine what happened more closely we’ll see what is an “accordion effect”. We shrunk node 0′s range a bit to give to the new node, but that shifted all the other node’s ranges by the same amount.

We can minimize the change to a node’s assigned range by assigning several smaller ranges instead of the single broad range we were before. This can be done by creating “virtual nodes” for each node. So 100 nodes might have 1000 virtual nodes. Let’s examine how that might work.

from bisect import bisect_left from hashlib import md5 from struct import unpack_from NODE_COUNT = 100 DATA_ID_COUNT = 10000000 VNODE_COUNT = 1000 vnode_range_starts = [] vnode2node = [] for vnode_id in xrange(VNODE_COUNT): vnode_range_starts.append(DATA_ID_COUNT / VNODE_COUNT * vnode_id) vnode2node.append(vnode_id % NODE_COUNT) new_vnode2node = list(vnode2node) new_node_id = NODE_COUNT NEW_NODE_COUNT = NODE_COUNT + 1 vnodes_to_reassign = VNODE_COUNT / NEW_NODE_COUNT while vnodes_to_reassign > 0: for node_to_take_from in xrange(NODE_COUNT): for vnode_id, node_id in enumerate(new_vnode2node): if node_id == node_to_take_from: new_vnode2node[vnode_id] = new_node_id vnodes_to_reassign -= 1 if vnodes_to_reassign <= 0: break if vnodes_to_reassign <= 0: break moved_ids = 0 for data_id in xrange(DATA_ID_COUNT): data_id = str(data_id) hsh = unpack_from('>I', md5(str(data_id)).digest())[0] vnode_id = bisect_left(vnode_range_starts, hsh % DATA_ID_COUNT) % VNODE_COUNT node_id = vnode2node[vnode_id] new_node_id = new_vnode2node[vnode_id] if node_id != new_node_id: moved_ids += 1 percent_moved = 100.0 * moved_ids / DATA_ID_COUNT print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
90108 ids moved, 0.90%

There we go, we added 1% capacity and only moved 0.9% of existing data. The vnode_range_starts list seems a bit out of place though. It’s values are calculated and never change for the lifetime of the cluster, so let’s optimize that out.

from bisect import bisect_left from hashlib import md5 from struct import unpack_from NODE_COUNT = 100 DATA_ID_COUNT = 10000000 VNODE_COUNT = 1000 vnode2node = [] for vnode_id in xrange(VNODE_COUNT): vnode2node.append(vnode_id % NODE_COUNT) new_vnode2node = list(vnode2node) new_node_id = NODE_COUNT vnodes_to_reassign = VNODE_COUNT / (NODE_COUNT + 1) while vnodes_to_reassign > 0: for node_to_take_from in xrange(NODE_COUNT): for vnode_id, node_id in enumerate(vnode2node): if node_id == node_to_take_from: vnode2node[vnode_id] = new_node_id vnodes_to_reassign -= 1 if vnodes_to_reassign <= 0: break if vnodes_to_reassign <= 0: break moved_ids = 0 for data_id in xrange(DATA_ID_COUNT): data_id = str(data_id) hsh = unpack_from('>I', md5(str(data_id)).digest())[0] vnode_id = hsh % VNODE_COUNT node_id = vnode2node[vnode_id] new_node_id = new_vnode2node[vnode_id] if node_id != new_node_id: moved_ids += 1 percent_moved = 100.0 * moved_ids / DATA_ID_COUNT print '%d ids moved, %.02f%%' % (moved_ids, percent_moved)
90369 ids moved, 0.90%

There we go. In the next part of this series, will further examine the algorithm’s limitations and how to improve on it.

转载地址:http://fkpia.baihongyu.com/

你可能感兴趣的文章
组策略导入导出secedit
查看>>
Windows Phone 7.5 - Local SQL Database(简介)
查看>>
微软宣布Entity Framework 5的性能有了显著提升
查看>>
SPSS中八类常用非参数检验之二:二项分布(Binomial)检验
查看>>
mysql字段类型范围说明:int、bigint、smallint、tinyint,char、varchar、nvarchar
查看>>
php简单对象与数组的转换函数代码(php多层数组和对象的转换)
查看>>
C# Socket编程(5)使用TCP Socket
查看>>
SQL SERVER IN参数化处理
查看>>
Python MongoDB Spatial Query
查看>>
NetBeans IDE 7.4 Beta版本build JavaFX时生成的可执行jar包执行时找不到依赖的jar包
查看>>
笔记本wifi热点设置好后,手机连上但不能上网问题
查看>>
Run ASP.NET MVC site on mac (mono/xamarin studio)
查看>>
win8.1安装驱动出现“文件的哈希值不在指定的目录”的解决办法[zz]
查看>>
CRM 常用SQL 脚本
查看>>
备忘录--关于线程和IO知识
查看>>
【iCore3 双核心板】例程八:定时器PWM实验——呼吸灯
查看>>
jquery tmpl 详解
查看>>
docker学习笔记4:利用docker hub上的mysql镜像创建mysql容器
查看>>
【Xamarin开发 Android 系列 3】循序渐进的学习顺序
查看>>
自定义列表dl的使用原因和场合
查看>>