In addition to what everyone else has said, you can get some of the locality back in a separate chaining scenario by unrolling the linked list.
Assuming a C-esque language, separate chaining might naively be implemented with a structure like this:
struct hash_node {
    unsigned hash_value;      /* cached hash of the key */
    void* data;               /* pointer to the stored record */
    struct hash_node* next;
};
There are plenty of variants on this. If you have some control over the data that is being stored in the hash table, for example, you might be able to store the "next" pointers in there. If fast removal is important, you might use a doubly-linked list. You get the idea.
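As a concrete sketch of the "store the next pointers in the data" variant: here the chain link is intrusive, living inside the record itself, so no separate node allocation is needed per entry. The `my_record` type, its fields, and `bucket_push` are made-up names for illustration, not part of any standard layout:

```c
#include <stddef.h>

/* Intrusive chaining: the chain link lives inside the stored record.
 * All names here are illustrative. */
struct my_record {
    unsigned hash_value;      /* cached hash, avoids rehashing on lookup */
    struct my_record* next;   /* the "next" pointer stored in the data */
    int key;
    int payload;
};

/* Push a record onto the front of a bucket's chain. */
void bucket_push(struct my_record** bucket, struct my_record* rec)
{
    rec->next = *bucket;
    *bucket = rec;
}
```

One allocation per record instead of two, and the hash value sits next to the record's own fields, which tends to help cache behaviour.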
However, if you expect hash slots to collide far more often than hash values, this is an alternative:
#define HASH_UNROLL 4

struct hash_node {
    unsigned hash_values[HASH_UNROLL];
    void* data[HASH_UNROLL];
    struct hash_node* next;
};
The hash nodes are a little bigger, but all of the hash values to check are in the same cache line. If you're particularly lucky, the compiler may even turn this into a SIMD operation for you.
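A lookup over such a chain might be sketched like this (the function name `chain_find` is illustrative, and it assumes empty slots hold a NULL data pointer; a real table would also compare keys on a hash match, since distinct keys can share a hash value):

```c
#include <stddef.h>

#define HASH_UNROLL 4

struct hash_node {
    unsigned hash_values[HASH_UNROLL];
    void* data[HASH_UNROLL];
    struct hash_node* next;
};

/* Walk the unrolled chain looking for a matching hash value.
 * Sketch only: assumes empty slots have data[i] == NULL, and omits
 * the key comparison a full implementation would need. */
void* chain_find(const struct hash_node* node, unsigned hash_value)
{
    for (; node != NULL; node = node->next) {
        /* All HASH_UNROLL hash values sit together, so this inner
         * loop typically touches only one cache line per node. */
        for (int i = 0; i < HASH_UNROLL; i++) {
            if (node->data[i] != NULL && node->hash_values[i] == hash_value)
                return node->data[i];
        }
    }
    return NULL;
}
```

The fixed-size inner loop is exactly the shape a compiler can unroll or vectorise.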
If the hash table is stored on disk, variations of this technique can improve locality much more than open addressing, at the cost of using extra space. See, for example, extendible hashing.
Some additional tradeoffs are:
- Resizing a hash table can be faster with separate chaining if worst-case performance (as opposed to amortised) is an issue.
- It's much simpler to implement deletion with separate chaining, although "ease of implementation" is not necessarily your biggest concern. Look up Robin Hood hashing for details on how to arrange an open addressing hash table to support efficient deletion.
- It's much simpler to make a separate chaining-based hash table concurrent, since you can lock each chain separately. Of course, there are concurrent variants of open addressed hash tables, such as hopscotch hashing (a variant on cuckoo hashing), which seem to perform well in practice.
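The per-chain locking point can be sketched with the naive `hash_node` from the top of the answer and POSIX mutexes. The `hash_table` struct and `table_find` are illustrative names, and this is a minimal sketch, not a production design (key comparison and lock striping are omitted):

```c
#include <pthread.h>
#include <stddef.h>

struct hash_node {
    unsigned hash_value;
    void* data;
    struct hash_node* next;
};

struct hash_table {
    size_t n_buckets;
    struct hash_node** buckets;
    pthread_mutex_t* locks;   /* one mutex per bucket (could be striped) */
};

/* Only the target chain's mutex is held during the search, so threads
 * working on different buckets never contend with each other. */
void* table_find(struct hash_table* t, unsigned h)
{
    size_t b = h % t->n_buckets;
    void* result = NULL;
    pthread_mutex_lock(&t->locks[b]);
    for (struct hash_node* n = t->buckets[b]; n != NULL; n = n->next) {
        if (n->hash_value == h) {
            result = n->data;
            break;
        }
    }
    pthread_mutex_unlock(&t->locks[b]);
    return result;
}
```

With open addressing there is no such natural per-chain boundary, which is why its concurrent variants need cleverer schemes.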