
How can you shuffle the bytes of a file on disk (working with bytes for simplicity) using only a small amount of memory, $O(\log n)$, and preferably in place?

If the file has size $2^m$, we can first split it into 2-byte chunks and shuffle each one individually. We then shuffle-merge each pair of neighboring chunks and repeat this recursively. Merging can be done in constant memory by reading from the start of one chunk and swapping bytes with the other.

But if the file size is not $2^m$, the above doesn't really work. For a 3-byte file the chunks are [1, 2] and [3], so the last byte can never end up in the middle position.

[EDIT] This doesn't generate all permutations, as Yuval Filmus points out.

[EDIT 2] The paper "Random permutations on distributed, external and hierarchical memory" seems like a close match. Other references and ideas are welcome.

simonzack

2 Answers


The algorithm you suggest doesn't result in a uniform permutation. An in-place algorithm which works for every file size is the Fisher–Yates shuffle.
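Here is a minimal sketch, assuming Python and a file opened in `r+b` mode, of Fisher–Yates run directly against the file. The function name `fisher_yates_file` and the optional `rng` parameter are hypothetical. Apart from the RNG state, it keeps only two file positions and two bytes in memory. However, every swap is a random seek, so it tends to be slow on disk.

```python
import os
import random


def fisher_yates_file(path, rng=None):
    """Shuffle the bytes of the file at `path` in place (hypothetical helper).

    Memory use beyond the RNG state: two positions and two bytes.
    """
    if rng is None:
        rng = random.Random()
    n = os.path.getsize(path)
    with open(path, "r+b") as f:
        # Walk i from the last byte down, swapping it with a uniform j in [0, i].
        for i in range(n - 1, 0, -1):
            j = rng.randrange(i + 1)
            if i == j:
                continue
            f.seek(i)
            byte_i = f.read(1)
            f.seek(j)
            byte_j = f.read(1)
            # Write each byte into the other's position.
            f.seek(j)
            f.write(byte_i)
            f.seek(i)
            f.write(byte_j)
```

Passing a seeded `random.Random(seed)` as `rng` makes a run reproducible, which is handy for testing.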

Yuval Filmus

Raw idea:

  1. Divide your file into blocks that fit in the available memory, and shuffle each block with Fisher–Yates.

  2. Shuffle-merge the blocks in pairs: when $N_A$ bytes remain in block A and $N_B$ bytes remain in block B, take the next byte from block A with probability $N_A/(N_A+N_B)$, otherwise from block B. For now, don't try to do this in place; write the result to another file.

  3. Repeat, doubling the block size at each step, until only one block remains. That block is your shuffled file.

  4. To make it in place, note that this is in fact a merge sort that uses random choices instead of comparisons. So adapt an in-place merge sort algorithm, using the shuffling merge instead of the sorting merge for the merge step. (IIRC, you'll need $O(\log N)$ file positions for that, but I doubt this will dominate the three disk-sector buffers needed somewhere in the system to make it work.)
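Below is a minimal Python sketch of steps 1 to 3 in their out-of-place form. Each merge pass writes to a temporary file, which then replaces the original. The in-place merge from step 4 is not attempted. The names `external_shuffle` and `shuffle_merge` and the `block_size` parameter (standing in for "the available memory") are hypothetical.

The merge rule in step 2 gives every particular interleaving of two runs the same probability. Along any fixed interleaving, the numerators multiply to $N_A!\,N_B!$ and the denominators to $(N_A+N_B)!$:

$$\frac{N_A!\,N_B!}{(N_A+N_B)!} = \binom{N_A+N_B}{N_A}^{-1}$$

Because the rule does not require the two runs to have equal length, a short leftover block is handled, including the 3-byte case from the question.

```python
import os
import random
import tempfile


def shuffle_merge(src_a, n_a, src_b, n_b, dst, rng):
    """Write a uniformly random interleaving of two byte streams to dst.

    The next byte comes from A with probability n_a / (n_a + n_b), where
    n_a and n_b count the bytes still remaining in each stream.
    """
    while n_a or n_b:
        if rng.randrange(n_a + n_b) < n_a:
            dst.write(src_a.read(1))
            n_a -= 1
        else:
            dst.write(src_b.read(1))
            n_b -= 1


def external_shuffle(path, block_size, rng=None):
    """Shuffle the bytes of `path` (hypothetical helper, not in place).

    Step 1 holds one block_size block in memory; steps 2-3 stream bytes.
    """
    if rng is None:
        rng = random.Random()
    n = os.path.getsize(path)
    # Step 1: shuffle each memory-sized block (random.shuffle is Fisher-Yates).
    with open(path, "r+b") as f:
        for start in range(0, n, block_size):
            f.seek(start)
            buf = bytearray(f.read(block_size))
            rng.shuffle(buf)
            f.seek(start)
            f.write(buf)
    # Steps 2-3: merge neighboring runs in pairs, doubling the run width each pass.
    width = block_size
    while width < n:
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        with open(path, "rb") as a, open(path, "rb") as b, os.fdopen(fd, "wb") as out:
            for start in range(0, n, 2 * width):
                n_a = min(width, n - start)            # left run (may be short at the end)
                n_b = min(width, n - start - n_a)      # right run (may be empty)
                a.seek(start)
                b.seek(start + n_a)
                shuffle_merge(a, n_a, b, n_b, out, rng)
        os.replace(tmp_path, path)                     # the merged file becomes the input
        width *= 2
```

Reading one byte at a time relies on Python's buffered I/O. A practical version would refill explicit buffers for each run.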

AProgrammer