
I'm looking for a persistent data structure similar to an array (but immutable) that supports fast indexing, append, prepend, and iteration (with good locality).

Clojure provides a persistent vector, but it only supports fast append. Scala's Vector has effectively constant-time append and prepend, but I can't figure out how it's implemented: it's based on the same data structure as Clojure's vector (a bit-mapped vector trie), and as far as I understand, a bit-mapped vector trie can't support fast prepend without some tricks.
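For context, here is a minimal sketch of a plain bit-mapped vector trie of the kind the question refers to. It is written in Python, and all names (`TrieVector`, `BITS`, `_push`, `_path`) are illustrative assumptions, not Clojure's or Scala's actual code; it supports only indexing and append via path copying and omits Clojure's tail buffer. It shows why append is cheap (only the rightmost root-to-leaf path is copied), while prepend is not: shifting every element's index by one would change the path to every element.

```python
# Illustrative bit-mapped vector trie (no tail buffer, unlike Clojure's vector).
BITS = 5              # 5 bits of the index per level -> branching factor 32
WIDTH = 1 << BITS
MASK = WIDTH - 1


def _path(shift, x):
    """Build a fresh path, shift // BITS levels deep, ending in the leaf (x,)."""
    node = (x,)
    while shift > 0:
        node = (node,)
        shift -= BITS
    return node


def _push(node, shift, i, x):
    """Return a copy of `node` with x stored at index i, copying only one path."""
    if shift == 0:
        return node + (x,)                    # copy the leaf with x appended
    idx = (i >> shift) & MASK
    if idx < len(node):                       # descend into the last child
        return node[:idx] + (_push(node[idx], shift - BITS, i, x),)
    return node + (_path(shift - BITS, x),)   # start a new rightmost child


class TrieVector:
    __slots__ = ("size", "shift", "root")

    def __init__(self, size=0, shift=0, root=()):
        self.size = size    # number of elements
        self.shift = shift  # BITS * (number of internal levels above the leaves)
        self.root = root    # nested tuples; leaf tuples hold the elements

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        node, shift = self.root, self.shift
        while shift > 0:                      # consume BITS bits of i per level
            node = node[(i >> shift) & MASK]
            shift -= BITS
        return node[i & MASK]

    def append(self, x):
        i = self.size
        if i == WIDTH << self.shift:          # trie is full: add a new root level
            root = (self.root, _path(self.shift, x))
            return TrieVector(i + 1, self.shift + BITS, root)
        return TrieVector(i + 1, self.shift, _push(self.root, self.shift, i, x))
```

`v.append(x)` returns a new vector and leaves `v` unchanged; old and new versions share every node off the copied path.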

I'm not interested in a ready-to-use implementation, but in a description of how to implement such a data structure myself.

Raphael
Tvaroh

2 Answers


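The obvious candidate is a persistent balanced binary tree. All the operations you listed can be performed in $O(1)$ or $O(\lg n)$ time using path copying: an update copies only the nodes on the path from the root to the affected position and shares the rest of the tree with the old version. For details on how to achieve these running times, see Chris Okasaki's book referenced below or my answer here.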
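A minimal sketch of the path-copying idea, assuming a size-augmented AVL tree as the balanced tree (one possible choice among several; all names here are hypothetical). In this sketch every operation, including append and prepend, is $O(\lg n)$; it does not attempt the $O(1)$ bounds.

```python
class Node:
    """Immutable AVL node augmented with its subtree size (used for indexing)."""
    __slots__ = ("left", "value", "right", "height", "size")

    def __init__(self, left, value, right):
        self.left, self.value, self.right = left, value, right
        self.height = 1 + max(_height(left), _height(right))
        self.size = 1 + _size(left) + _size(right)


def _height(n):
    return n.height if n is not None else 0


def _size(n):
    return n.size if n is not None else 0


def _rotate_right(n):
    l = n.left
    return Node(l.left, l.value, Node(l.right, n.value, n.right))


def _rotate_left(n):
    r = n.right
    return Node(Node(n.left, n.value, r.left), r.value, r.right)


def _balance(left, value, right):
    """Build a new node, rotating if the subtree heights differ by 2."""
    if _height(left) > _height(right) + 1:
        if _height(left.left) < _height(left.right):
            left = _rotate_left(left)
        return _rotate_right(Node(left, value, right))
    if _height(right) > _height(left) + 1:
        if _height(right.right) < _height(right.left):
            right = _rotate_right(right)
        return _rotate_left(Node(left, value, right))
    return Node(left, value, right)


def insert_at(t, i, x):
    """New tree with x inserted before position i; copies one root-to-leaf path."""
    if t is None:
        return Node(None, x, None)
    k = _size(t.left)
    if i <= k:
        return _balance(insert_at(t.left, i, x), t.value, t.right)
    return _balance(t.left, t.value, insert_at(t.right, i - k - 1, x))


def append(t, x):
    return insert_at(t, _size(t), x)


def prepend(t, x):
    return insert_at(t, 0, x)


def get(t, i):
    """Descend using subtree sizes to find the i-th element."""
    while t is not None:
        k = _size(t.left)
        if i < k:
            t = t.left
        elif i == k:
            return t.value
        else:
            i -= k + 1
            t = t.right
    raise IndexError(i)


def iterate(t):
    """In-order traversal with an explicit stack."""
    stack = []
    while stack or t is not None:
        while t is not None:
            stack.append(t)
            t = t.left
        t = stack.pop()
        yield t.value
        t = t.right
```

The empty sequence is `None`; each call returns a new root and the previous version stays valid.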

Of course, as a variant, each leaf of such a tree could itself hold an immutable array (a run of consecutive values). This makes updating a single value less efficient, but it may suit your situation well if you never modify existing values and only append and prepend. Your vector is then a sequence of immutable arrays, organized as a balanced binary tree with the arrays in its leaves. This allows fast indexing (logarithmic in the number of leaves), fast append and prepend, and fast iteration. The worst-case asymptotic complexity is no better, but performance in practice may be significantly better, since there are fewer tree nodes per element and consecutive elements sit next to each other in memory.
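A sketch of this chunked variant, again assuming an AVL tree. For brevity, every node (not only the leaves) carries an immutable chunk of up to `CHUNK` elements, which simplifies the leaf-only layout described above; `CHUNK` and all function names are hypothetical. Append and prepend copy one chunk plus the path to it, and iteration walks whole chunks.

```python
CHUNK = 32  # hypothetical maximum number of elements per chunk


class Node:
    """Immutable AVL node holding a tuple chunk; size counts elements, not nodes."""
    __slots__ = ("left", "chunk", "right", "height", "size")

    def __init__(self, left, chunk, right):
        self.left, self.chunk, self.right = left, chunk, right
        self.height = 1 + max(_height(left), _height(right))
        self.size = len(chunk) + _size(left) + _size(right)


def _height(n):
    return n.height if n is not None else 0


def _size(n):
    return n.size if n is not None else 0


def _rotate_right(n):
    l = n.left
    return Node(l.left, l.chunk, Node(l.right, n.chunk, n.right))


def _rotate_left(n):
    r = n.right
    return Node(Node(n.left, n.chunk, r.left), r.chunk, r.right)


def _balance(left, chunk, right):
    """Build a new node, rotating if the subtree heights differ by 2."""
    if _height(left) > _height(right) + 1:
        if _height(left.left) < _height(left.right):
            left = _rotate_left(left)
        return _rotate_right(Node(left, chunk, right))
    if _height(right) > _height(left) + 1:
        if _height(right.right) < _height(right.left):
            right = _rotate_right(right)
        return _rotate_left(Node(left, chunk, right))
    return Node(left, chunk, right)


def push_right(t, x):
    """Append x: grow the rightmost chunk if it has room, else add a new node."""
    if t is None:
        return Node(None, (x,), None)
    if t.right is None and len(t.chunk) < CHUNK:
        return Node(t.left, t.chunk + (x,), None)   # copies one chunk
    return _balance(t.left, t.chunk, push_right(t.right, x))


def push_left(t, x):
    """Prepend x: mirror image of push_right."""
    if t is None:
        return Node(None, (x,), None)
    if t.left is None and len(t.chunk) < CHUNK:
        return Node(None, (x,) + t.chunk, t.right)
    return _balance(push_left(t.left, x), t.chunk, t.right)


def get(t, i):
    """Descend by subtree sizes, then index into the chunk."""
    while t is not None:
        k = _size(t.left)
        if i < k:
            t = t.left
        elif i < k + len(t.chunk):
            return t.chunk[i - k]
        else:
            i -= k + len(t.chunk)
            t = t.right
    raise IndexError(i)


def iterate(t):
    """In-order traversal yielding each chunk's elements in sequence."""
    stack = []
    while stack or t is not None:
        while t is not None:
            stack.append(t)
            t = t.left
        t = stack.pop()
        yield from t.chunk
        t = t.right
```

The tree now has roughly one node per `CHUNK` elements, so its height is smaller and iteration mostly scans contiguous tuples.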

The standard reference is Chris Okasaki's 1998 book "Purely Functional Data Structures".

D.W.

I have described one implementation of such a data structure in my article about incremental regular expression matching; see http://jkff.info/articles/ire/#ropes-strings-with-fast-concatenation and the text before and after that section.

It's a variant of a constant-height tree (like B-trees or 2-3 trees). Basically, it's a (2,3) tree whose leaves are (N, 2N-1) arrays, which avoids per-element overhead. (An (N, 2N-1) array is an array whose length is in the range N..2N-1.) A larger N gives you smaller overhead but linearly increases the cost of splitting and concatenation. Operations such as indexing, splitting and concatenation work very much as they do in 2-3 trees, generalized to (N, 2N-1) arrays at the leaf level.
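A minimal sketch of such a tree, covering indexing and concatenation (append and prepend are concatenation with a one-element leaf); splitting is omitted. This is an independent illustration, not the code from the linked article; `N = 4`, `Leaf`, `Branch`, `concat` and the other names are hypothetical. As a simplification, a tree that consists of a single leaf may hold fewer than N items.

```python
N = 4  # hypothetical minimum leaf length; leaves hold N..2N-1 items


class Leaf:
    __slots__ = ("items", "size")
    height = 0

    def __init__(self, items):
        self.items = tuple(items)
        self.size = len(self.items)


class Branch:
    __slots__ = ("children", "size", "height")

    def __init__(self, children):
        self.children = tuple(children)  # 2 or 3 subtrees of equal height
        self.size = sum(c.size for c in self.children)
        self.height = self.children[0].height + 1


EMPTY = Leaf(())


def _join(a, b):
    """Concatenate a and b; return 1 or 2 nodes of height max(a.height, b.height)."""
    if a.height == b.height:
        if a.height > 0:
            return [a, b]                     # two valid subtrees become siblings
        items = a.items + b.items             # leaf level: copy O(N) items
        if len(items) <= 2 * N - 1:
            return [Leaf(items)]
        mid = len(items) // 2                 # both halves land in N..2N-1
        return [Leaf(items[:mid]), Leaf(items[mid:])]
    if a.height > b.height:                   # descend a's right spine
        kids = a.children[:-1] + tuple(_join(a.children[-1], b))
    else:                                     # descend b's left spine
        kids = tuple(_join(a, b.children[0])) + b.children[1:]
    if len(kids) <= 3:
        return [Branch(kids)]
    return [Branch(kids[:2]), Branch(kids[2:])]  # 4 children: split 2 + 2


def concat(a, b):
    parts = _join(a, b)
    return parts[0] if len(parts) == 1 else Branch(parts)


def append(t, x):
    return concat(t, Leaf((x,)))


def prepend(t, x):
    return concat(Leaf((x,)), t)


def index(t, i):
    """Descend by child sizes, then index into the leaf array."""
    if not 0 <= i < t.size:
        raise IndexError(i)
    while t.height > 0:
        for c in t.children:
            if i < c.size:
                t = c
                break
            i -= c.size
    return t.items[i]


def iterate(t):
    if t.height == 0:
        yield from t.items
    else:
        for c in t.children:
            yield from iterate(c)
```

Concatenation walks one spine down to the height of the shorter tree, allocating a constant number of nodes per level, plus one O(N) leaf merge when the heights are equal at the leaf level; this leaf copy is where the linear dependence on N comes from.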

D.W.
jkff