In addition to what everyone else has said, you can get some of the locality back in a separate chaining scenario by unrolling the linked list.
Assuming a C-esque language, separate chaining might naively be implemented with a structure like this:
struct hash_node {
    unsigned hash_value;      /* cached hash of the key */
    void* data;               /* pointer to the stored record */
    struct hash_node* next;
};
There are plenty of variants on this. If you have some control over the data that is being stored in the hash table, for example, you might be able to store the "next" pointers in there. If fast removal is important, you might use a doubly-linked list. You get the idea.
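As a concrete sketch of the "store the next pointers in the data" variant: here the chain link is intrusive, living inside the record itself, so no separate node allocation is needed per entry. The `my_record` type, its fields, and `bucket_push` are made-up names for illustration, not part of any standard layout:

```c
#include <stddef.h>

/* Intrusive chaining: the chain link lives inside the stored record.
 * All names here are illustrative. */
struct my_record {
    unsigned hash_value;      /* cached hash, avoids rehashing on lookup */
    struct my_record* next;   /* the "next" pointer stored in the data */
    int key;
    int payload;
};

/* Push a record onto the front of a bucket's chain. */
void bucket_push(struct my_record** bucket, struct my_record* rec)
{
    rec->next = *bucket;
    *bucket = rec;
}
```

One allocation per record instead of two, and the hash value sits next to the record's own fields, which tends to help cache behaviour.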
However, if you expect hash slots to collide far more often than hash values, this is an alternative:
#define HASH_UNROLL 4

struct hash_node {
    unsigned hash_values[HASH_UNROLL];
    void* data[HASH_UNROLL];
    struct hash_node* next;
};
The hash nodes are a little bigger, but all of the hash values to check are in the same cache line. If you're particularly lucky, the compiler may even turn this into a SIMD operation for you.
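A lookup over such a chain might be sketched like this (the function name `chain_find` is illustrative, and it assumes empty slots hold a NULL data pointer; a real table would also compare keys on a hash match, since distinct keys can share a hash value):

```c
#include <stddef.h>

#define HASH_UNROLL 4

struct hash_node {
    unsigned hash_values[HASH_UNROLL];
    void* data[HASH_UNROLL];
    struct hash_node* next;
};

/* Walk the unrolled chain looking for a matching hash value.
 * Sketch only: assumes empty slots have data[i] == NULL, and omits
 * the key comparison a full implementation would need. */
void* chain_find(const struct hash_node* node, unsigned hash_value)
{
    for (; node != NULL; node = node->next) {
        /* All HASH_UNROLL hash values sit together, so this inner
         * loop typically touches only one cache line per node. */
        for (int i = 0; i < HASH_UNROLL; i++) {
            if (node->data[i] != NULL && node->hash_values[i] == hash_value)
                return node->data[i];
        }
    }
    return NULL;
}
```

The fixed-size inner loop is exactly the shape a compiler can unroll or vectorise.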
If the hash table is stored on disk, variations of this technique can improve locality much more than open addressing, at the cost of using extra space. See, for example, extendible hashing.
Some additional tradeoffs are:
- Resizing a hash table can be faster with separate chaining if worst-case performance (as opposed to amortised) is an issue.
- It's much simpler to implement deletion with separate chaining, although "ease of implementation" is not necessarily your biggest concern. Look up Robin Hood hashing for details on how to arrange an open addressing hash table to support efficient deletion.
- It's much simpler to make a separate chaining-based hash table concurrent, since you can lock each chain separately. Of course, there are concurrent variants of open addressed hash tables, such as hopscotch hashing (a variant on cuckoo hashing), which seem to perform well in practice.
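The per-chain locking point can be sketched with the naive `hash_node` from the top of the answer and POSIX mutexes. The `hash_table` struct and `table_find` are illustrative names, and this is a minimal sketch, not a production design (key comparison and lock striping are omitted):

```c
#include <pthread.h>
#include <stddef.h>

struct hash_node {
    unsigned hash_value;
    void* data;
    struct hash_node* next;
};

struct hash_table {
    size_t n_buckets;
    struct hash_node** buckets;
    pthread_mutex_t* locks;   /* one mutex per bucket (could be striped) */
};

/* Only the target chain's mutex is held during the search, so threads
 * working on different buckets never contend with each other. */
void* table_find(struct hash_table* t, unsigned h)
{
    size_t b = h % t->n_buckets;
    void* result = NULL;
    pthread_mutex_lock(&t->locks[b]);
    for (struct hash_node* n = t->buckets[b]; n != NULL; n = n->next) {
        if (n->hash_value == h) {
            result = n->data;
            break;
        }
    }
    pthread_mutex_unlock(&t->locks[b]);
    return result;
}
```

With open addressing there is no such natural per-chain boundary, which is why its concurrent variants need cleverer schemes.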