Distrubuted hash tables
Distributed hash tables (DHT) are a decentralized distributed system that provides a lookup similar to a hash table. Hash pairs are stored in the DHT and any node connected to the network can efficiently retrieve the value associated with the given name. Responsibility for maintaining the mapping from names to values is distributed among the nodes themselves. Any change in the set of participants causes minimal amounts of disruption to the service as a whole. This allows DHTs to scale to extremely large numbers of nodes and allows them to handle continual node arrivals, departures and failures.
DHTs form an infrastructure that can be used to build more complex services, such as distributed file systems, peer-to-peer file sharing and content distribution systems, cooperative web caching, multicast, anycast, domain name services, and instant messaging. Applications that use DHTs include BitTorrent and eMule.
The research to use DHTs began and was spurred on by the creation of P2P networks such as Napster, Gnutella and Freenet. These services used resources distributed across the Internet to provide a single useful application. More specifically, they took advantage of the increased Bandwidth and hard disk capacity to provide a file sharing service. These services had a different approach as to how they found data that was contained within a peer. Napster utilized a central index server where each node would broadcast its list of locally held files everytime it joined the network. This central component left the system vulnerable to attacks and eventually lawsuits. Gnutella used a flooding query model where each search would broadcast a message to every machine on the network. This eliminated the possibility of a single point of failure, but was not as efficient. Freenet used a heuristic key based routing scheme where each file was associated with a key, then files and similar keys were grouped together on similar nodes.
DHTs contain the following characteristics:
- Decentralization - Each node collectively forms the system without any central coordination.
- Scalability - The system should function efficiently not matter how many new nodes join.
- Fault tolerance - The system should be reliable even when nodes join, leave or fail.