Unique numerical encodings of lists of integers

Question

I am a computer programmer and my project is to take an arbitrary length ordered list of integers and generate another integer which is a reversible encoding of the list.

I did find a solution, but I wonder if there is a better solution in terms of efficiency. If mathematicians offer other solutions, I can evaluate them for efficiency. I'll represent my current solution as code, for clarity.

Imagine that we want to treat a group of numbers like a hierarchical namespace, analogous to subdomain.domain.tld or 127.99.0.1 or /directory/.../subdirectory/filename

In each of the examples above, we use a separator character. But we want our output to be numbers, composed entirely of digits.

Our trick: convert the numbers to octal and use "9" as the separator.

>>> lst = [127, 99, 0, 1]
>>> octnums = [oct(x) for x in lst] # encode in octal
>>> octnums
['0o177', '0o143', '0o0', '0o1']
>>> as_str = "9".join(octnum[2:] for octnum in octnums)
>>> int(as_str)
17791439091

This is a unique and reversible representation of that list of numbers as an integer.
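To make the reversibility concrete, here is a minimal sketch of the round trip. The function names `octal9_encode` and `octal9_decode` are made up for illustration, and the sketch assumes a non-empty list of non-negative integers. One subtlety: if the first item is 0, `int()` drops the leading zero, so the decoder treats an empty first field as 0.

```python
def octal9_encode(lst):
    """Join the octal digits of each item, using '9' as the separator."""
    if not lst or any(x < 0 for x in lst):
        raise ValueError("expects a non-empty list of non-negative integers")
    return int("9".join(format(x, "o") for x in lst))


def octal9_decode(n):
    """Split the decimal digits on '9' and read each field as octal."""
    # An empty field can only be the first one, left behind when int()
    # dropped the leading "0" of a list whose first item was 0.
    return [int(field, 8) if field else 0 for field in str(n).split("9")]


print(octal9_encode([127, 99, 0, 1]))   # 17791439091
print(octal9_decode(17791439091))       # [127, 99, 0, 1]
```

Decoding is a single pass over the digits, which is why the scheme is cheap in CPU terms.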

Note that the digit "8" and the string "99" will never appear in these numbers, so they form a proper subset of the integers.

Octal is selected because the language has a built-in function for it.

Is there a better way: one that uses either integer-space or CPU time more efficiently?

No. Your approach is darn near optimal - it’s only a constant worse than just writing down the numbers next to each other. You could try fancier things like encoding the lengths of the numbers first and then writing the numbers out, but the improvement if any would be marginal. You may be able to do better if you knew more about likely sizes/properties of the numbers. — Eric, Jul 10 '21 at 16:15
Your approach may waste a lot of integers if the typical list entries are big. Indeed, if you encode a single 10-octal-digit integer as 10 decimal digits, you use only $8^{10}/10^{10}\approx 11\%$ of the available integers. On the other hand, there are of course versions that use integer-space perfectly (e.g., start with perfectly encoding pairs of integers a la Cantor and then recursively expand that method; you will obtain a bijection between the set of all finite lists of integers and the set of integers) - but they are probably not so CPU efficient. — Hagen von Eitzen, Jul 10 '21 at 16:28
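The density figure in that comment can be checked with a quick calculation: the share of 10-digit decimal strings that use only the octal digits 0 to 7. The variable names are illustrative only.

```python
octal_strings = 8 ** 10      # length-10 strings over the digits 0-7
decimal_strings = 10 ** 10   # length-10 strings over the digits 0-9

density = octal_strings / decimal_strings
print(f"{density:.1%}")      # 10.7%
```

The share shrinks by a factor of 0.8 for every additional digit, so larger entries waste proportionally more of the integer space.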

score 1 · Accepted Answer · answered Jul 10 '21 at 16:30

I think when it comes to the efficiency question, some other stackexchange sites might be better suited. The encoding you have given is very efficient and it's very clear how the decoding can be done. But since it is only injective and not bijective, the decoding function does have to (in general) inspect the whole number in order to recognize whether the input was a valid code or not.

There is this way to encode lists, which gives a bijection between lists and numbers; most likely very inefficient due to the involvement of prime numbers.

Another bijective encoding $\mathsf{encode} : \mathsf{List}(\mathbb{N}) \to \mathbb{N}$ uses the Cantor pairing function $\pi : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ and is defined by recursion on the list: $$ \mathsf{encode}~ [\,] := 0 \hspace{4em} \mathsf{encode}~ (x ::L) := \pi(x, \mathsf{encode} ~ L) + 1 $$
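A minimal Python sketch of this recursion follows. The answer does not spell out $\pi$, so the sketch assumes the usual form $\pi(x, y) = (x+y)(x+y+1)/2 + y$; the function names are hypothetical, and the recursion is unrolled into a loop over the reversed list to avoid Python's recursion limit. `math.isqrt` keeps the inverse exact for arbitrarily large integers.

```python
from math import isqrt


def pair(x, y):
    """Cantor pairing (assumed form): (x + y)(x + y + 1) / 2 + y."""
    s = x + y
    return s * (s + 1) // 2 + y


def unpair(z):
    """Inverse of pair(): recover (x, y) from z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # largest w with w(w+1)/2 <= z
    y = z - w * (w + 1) // 2
    return w - y, y


def cantor_encode(lst):
    """encode [] = 0; encode (x :: L) = pair(x, encode L) + 1."""
    code = 0
    for x in reversed(lst):           # build from the tail of the list
        code = pair(x, code) + 1
    return code


def cantor_decode(code):
    """Peel off one head element per step until the empty-list code 0."""
    out = []
    while code > 0:
        x, code = unpair(code - 1)
        out.append(x)
    return out


print(cantor_encode([127, 99, 0, 1]))                 # 16247723
print(cantor_decode(cantor_encode([127, 99, 0, 1])))  # [127, 99, 0, 1]
```

Every natural number decodes to exactly one list, so no validity check is needed. The trade-off is that each pairing step roughly squares the running code, so its digit count about doubles with every additional list element.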

Thanks. Something like the Cantor Tuple Function is exactly what I was looking for. It remains to be seen how it performs in practice, but it's much more elegant than what I had originally. — Paul Prescod, Jul 11 '21 at 16:52
