I've got the following theoretical problem which puzzles me a bit:

I can obtain a string of n bytes (as octets, one byte = one octet = eight bits) of random data. I need to preserve the randomness while reducing the base from 256 to x where x is below 256 (and not 0, 1, 2, 4, 8, 16, 32, 64 or 128).

As I want to preserve the randomness, I don't want to cut off (waste) any information from this string until I've obtained the number of chunks I need. The reason is that randomness can be a limited resource on the computer.

I had the idea to do this for base 64, which is simple because I can just create 4 numbers out of every three bytes (by shifting bits, for example: encode64()). But how do I do this with a base like 254, for example? I cannot cut off at bit boundaries here, can I?

Do I perhaps need to build, out of the bits, a number large enough to be expressed in both bases? (This is one of the ideas I have so far.)

It would be great to get some feedback. I normally paint pictures with such problems; however, I just discovered this website yesterday, and I normally use Stack Overflow, so I thought I'd give it a try :D

If you're interested in some non-theoretical background to my question, see "What is the meaning of the term “simple string” for the SALT string in Unix crypt using SHA-256 and SHA-512?". It might give you an idea of why I don't want to lose any information bits from the random source.

hakre

1 Answer

You can use "arithmetic decoding". Interpret your random data as a random bit stream which encodes a random number between $0$ and $1$. Then write this number in base $B$.
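A minimal sketch of this base-conversion idea, assuming the whole byte string is available up front (a true streaming arithmetic decoder is more involved). The name `fraction_digits` and its parameters are made up for illustration. The $n$ bytes are read as the fraction $N / 256^n$, and its base-$B$ digits are peeled off by repeated multiplication, using only integer arithmetic:

```python
def fraction_digits(data: bytes, base: int, count: int) -> list[int]:
    """Return the first `count` base-`base` digits of the fraction N / 256**n,
    where N is `data` read as a big-endian integer and n = len(data)."""
    if base < 2:
        raise ValueError("base must be at least 2")
    denom = 256 ** len(data)            # the fraction is num / denom, in [0, 1)
    num = int.from_bytes(data, "big")
    digits = []
    for _ in range(count):
        num *= base                     # shift one base-`base` digit to the left
        digit, num = divmod(num, denom) # integer part is the next digit
        digits.append(digit)
    return digits
```

With a finite input the digits are only approximately uniform: the first $k$ digits deviate from uniform by a relative amount on the order of $B^k / 256^n$, so in practice one would request somewhat fewer than $8n / \log_2 B$ digits.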

A much simpler method is "rejection sampling". Suppose for example that $128 < B < 256$. Given a random byte $x$, if $0 \leq x < B$ then output $x$, otherwise reject it and try the next byte. If $B$ is close to $256$ then this is pretty efficient, since each byte is accepted with probability $B/256$. (To get higher efficiency, try the same trick with some power of $B$, i.e. output several digits at once.)
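The single-byte version of rejection sampling can be sketched as follows. The function name `rejection_digits` is hypothetical, and the sketch covers only the one-digit-per-byte case, not the power-of-$B$ refinement:

```python
def rejection_digits(data: bytes, base: int) -> list[int]:
    """Keep every byte x with 0 <= x < base and discard the rest.
    Each kept value is uniform on 0..base-1 if the input bytes are uniform."""
    if not 1 <= base <= 256:
        raise ValueError("base must be between 1 and 256")
    return [x for x in data if x < base]
```

For $B = 254$ only the byte values 254 and 255 are rejected, so on average fewer than 1% of the input bytes are discarded. The number of digits produced is not fixed in advance, so the caller has to keep feeding bytes until enough digits have been collected.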

Yuval Filmus