Listing 2. findNode.py
# From the start node, find the node responsible
# for the target key
def findNode(start, key):
current=start
while distance(current.id, key) > \
distance(current.next.id, key):
current=current.next
return current
# Find the responsible node and get the value for
# the key
def lookup(start, key):
node=findNode(start, key)
return node.data[key]
# Find the responsible node and store the value
# with the key
def store(start, key, value):
node=findNode(start, key)
node.data[key]=value
This DHT design is simple but entirely sufficient to serve the
purpose of a distributed hash table. Given a static network of nodes
with perfect uptime, you can start with any node and key and
find the node responsible for that key. An important thing to keep in
mind is that although the example code so far looks like a fairly normal,
doubly linked list, this is only a simulation of a DHT. In a real DHT,
each node would be on a different machine, and all calls to them would
need to be communicated over some kind of socket protocol.
In order to make the current design more useful, it would be nice to
account for nodes joining and leaving the network, either intentionally
or in the case of failure. To enable this feature, we must establish a
join/leave protocol for our network. The first step in the Chord join
protocol is to look up the successor of the new node’s ID using the
normal lookup protocol. The new node should be inserted between the
found successor node and that node’s predecessor. The new node
is responsible for some portion of the keys for which the predecessor node
was responsible. In order to ensure that all lookups work
without fail, the appropriate portion of keys should be copied to the new
node before the predecessor node changes its next node pointer to point
to the new node.
Leaves are very simple; the leaving node copies all of
its stored information to its predecessor. The predecessor then changes
its next node pointer to point to the leaving node’s successor. The join
and leave code is similar to the code for inserting and removing
elements from a normal linked list, with the added requirement of
migrating data between the joining/leaving nodes and their neighbors. In
a normal linked list, you remove a node to delete the data
it’s holding. In a DHT, the insertion and removal of
nodes is independent of the insertion and removal of data. It might be
useful to think of DHT nodes as similar to the periodically readjusting
buckets used in persistent hash table implementations, such as Berkeley DB.
Allowing the network to have dynamic members while ensuring
that storage and lookups still function properly certainly is an
improvement to our design. However, the performance is terrible—O(n)
with an expected performance of n/2. Each node traversed
requires communication with a different machine on the network, which
might require the establishment of a TCP/IP connection, depending on
the chosen transport. Therefore, n/2 traversed nodes can be quite slow.
In order to achieve better performance, the Chord design adds a layer to
access O(log n) performance. Instead of storing a pointer to the next
node, each node stores a “finger table” containing the
addresses of k
nodes. The distance between the current node’s ID and the IDs of the
nodes in the finger table increases exponentially. Each traversed node
on the path to a particular key is closer logarithmically than the last,
with O(log n) nodes being traversed overall.
In order for logarithmic lookups to work, the finger table needs to be
kept up to date. An out-of-date finger table doesn’t break lookups as long
as each node has an up-to-date next pointer, but lookups are
logarithmic only if the finger table is correct. Updating the finger table
requires that a node address is found for each of the k slots in the
table. For any slot x, where x
is 1 to k, finger[x] is determined by
taking the current node’s ID and looking up the node responsible for the
key (id+2(x-1)) mod
(2k) (Listing 3). When doing lookups, you now have
k nodes to choose from at each hop, instead of only one at each. For each
node you visit from the starting node, you follow the entry in the finger
table that has the shortest distance to the key (Listing 4).
Orginalpost: Distributed Hash Tables, Part I
Related posts:
