How is the Varshamov-Tenengolts code decoded?

Question

For $0 \leq a \leq n$ the VT code $VT_a(n)$ consists of all tuples $(x_1,x_2,\ldots,x_n) \in \{0,1\}^n$ such that $\sum_{i=1}^{n} i x_i \equiv a \pmod{n+1}$.

For example $VT_0(4) = \{ 0000,1001,0110,1111 \}$
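
As a quick sanity check of the definition, here is a minimal brute-force sketch that enumerates $VT_a(n)$ for small $n$. The function name `vt_code` is just an illustrative choice, and the approach is exponential in $n$, so it is only meant for toy sizes.

```python
from itertools import product

def vt_code(n, a=0):
    """Return all binary strings x of length n with sum(i * x_i) = a (mod n+1)."""
    words = []
    for bits in product((0, 1), repeat=n):
        # Positions are 1-indexed in the definition, hence start=1.
        checksum = sum(i * b for i, b in enumerate(bits, start=1))
        if checksum % (n + 1) == a % (n + 1):
            words.append("".join(map(str, bits)))
    return words

print(vt_code(4))  # ['0000', '0110', '1001', '1111']
```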

Say someone has $x = 1001$, deletes the first zero, and transmits $x' = 101$. Is the claim that a receiver who only has access to the string $101$ will be able to decode it back to $1001$?
Searching the net, what I see being called the "decoding algorithm" seems to assume that the receiver at least has access to the sums $\sum_{i=1}^{n} ix_i$ and $\sum_{i=1}^{n-1} ix'_i$, and also knows things like the number of 1s and 0s to the left and right of the deleted position. This doesn't make sense to me, since I thought the idea is that the receiver doesn't know anything about $x$!

score 4 · Answer 1 · answered Jun 05 '15 at 01:24

Right, the code $VT_0(4)$ has Levenshtein distance (edit distance) of 4: to get from one codeword to another you need at least 2 deletions and 2 insertions. Therefore, the code can correct one deletion. Indeed, if 101 was received, the only way to get this message, assuming one deletion, is if 1001 was sent.
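
To see the single-deletion claim concretely, the following small sketch lists, for each codeword of $VT_0(4)$, every word obtainable by deleting one bit. The helper name `deletion_ball` is an illustrative choice. Since the resulting sets are pairwise disjoint, a received word of length 3 identifies its codeword uniquely.

```python
def deletion_ball(word):
    """All strings obtained from `word` by deleting exactly one character."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}

code = ["0000", "1001", "0110", "1111"]  # VT_0(4) from the question
balls = {c: deletion_ball(c) for c in code}

for c in code:
    print(c, sorted(balls[c]))

# No received word belongs to the deletion ball of two different codewords.
disjoint = all(
    balls[c1].isdisjoint(balls[c2])
    for c1 in code for c2 in code if c1 != c2
)
print(disjoint)  # True
```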

Decoding can be done in several ways:

The naive approach: since the edit distance is large, if only a single deletion occurred there is only a single codeword that could have been sent. To decode, we can just go over all the codewords until we find the one that leads to the received word. More efficiently, we can start with the received word and try inserting 0 or 1 at every possible position until we get a codeword. We are guaranteed to get only one codeword. However, this may still take a lot of time/computation.
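
The insertion variant of the naive approach can be sketched as follows. This is only an illustration under the assumption that the receiver knows $n$ and $a$ (which are part of the code's definition, not of the message); the names `vt_checksum` and `naive_decode` are hypothetical.

```python
def vt_checksum(bits):
    """Weighted sum of the bits with 1-indexed positions."""
    return sum(i * b for i, b in enumerate(bits, start=1))

def naive_decode(received, n, a=0):
    """Try every single-bit insertion into `received`; return the codewords reached."""
    y = [int(c) for c in received]
    if len(y) != n - 1:
        raise ValueError("expected a word of length n - 1")
    candidates = set()
    for pos in range(n):          # insertion point: before y[pos], or at the end
        for bit in (0, 1):
            x = y[:pos] + [bit] + y[pos:]
            if vt_checksum(x) % (n + 1) == a % (n + 1):
                candidates.add("".join(map(str, x)))
    return candidates

print(naive_decode("101", 4))  # {'1001'}
```

Different insertion points can produce the same word (inserting a 0 at either place inside a run of zeros, for instance), which is why the candidates are collected in a set.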

The more efficient approach is based on an observation that appeared in Levenshtein's original paper[1] and was then re-explained in Sloane's manuscript on single-deletion-correcting codes[2]. I'll repeat the essence of it, but you can find it there as well.

Assume each of the codewords $x=x_1\ldots x_n$ satisfies $$ \sum_{i=1}^n ix_i \equiv a \pmod{n+1}$$
If there was only a single deletion, we get the word $y=y_1\ldots y_{n-1}$. We can compute the value $$a' = \sum_{i=1}^{n-1} i y_i$$
The key observation is the following. Let's assume the bit at index $j$ was deleted. That is, $y_1=x_1, \ldots, y_{j-1}=x_{j-1}$ and $y_j=x_{j+1}, \ldots, y_{n-1}=x_n$. Let $L_0=|\{ i<j \mid x_i=0 \}|$ and $L_1=|\{ i<j \mid x_i=1 \}|$ be the number of $0$s and $1$s to the left of the deleted position, and similarly let $R_0=|\{ i>j \mid x_i=0 \}|$ and $R_1=|\{ i>j \mid x_i=1 \}|$ be the counts to its right. Now let's look at $a'-a \bmod (n+1)$. (Below, all computations are mod $n+1$.)

3.1. Let $w=L_1+R_1$ be the weight of $y$ (the number of $1$'s).

3.2. If $x_j=0$, then $$\begin{align*} a'-a &= \sum_{i=j}^n i(y_i-x_i) = \sum_{i=j}^n i(x_{i+1}-x_i) \\ &= jx_{j+1} + (j+1)(x_{j+2}-x_{j+1}) + \dotsb + (n-1)(x_n - x_{n-1}) - nx_n\\ &=-x_{j+1} - x_{j+2} - \dotsb - x_n = -R_1 \end{align*}$$ (note that $0 \le R_1 \le n-1$, so knowing it mod $n+1$ is good enough; I also formally set $y_n=x_{n+1}=0$, which adds nothing to the sums but makes every term well defined)

3.3. But if $x_j=1$, then, using $j = L_0+L_1+1$, $$\begin{align*} a'-a &= \sum_{i=j}^n i(y_i-x_i) = \sum_{i=j}^n i(x_{i+1}-x_i) \\ &= j(x_{j+1}-1) + (j+1)(x_{j+2}-x_{j+1}) + \dotsb + (n-1)(x_n - x_{n-1}) - nx_n\\ &= -R_1 -j = -R_1 - L_0 -L_1 -1 = -w -L_0 -1 \end{align*}$$

3.4. Forget about the minus sign (that is, look at $a-a'$ instead). If we get something which is $\le w$ then we must be in the first case (since $R_1 \le w$, whereas the second case gives $w+1+L_0 > w$), and we learn two things: (1) that a zero was deleted, and (2) the value of $R_1$, the number of ones to the right of the deleted position, which means that we know where to put the zero back: to the right of the deleted zero there are exactly $R_1$ ones!
If what we get from $a-a'$ is $>w$, we know we are in the second case. Now we learn that (1) a one was deleted from the codeword, and (2) the value $w+1+L_0$ (which is at most $n$, so nothing is lost mod $n+1$); since we know $w$, we get $L_0$, the number of zeroes to the left of the deleted position, and we can restore the deleted 1: to the left of the deleted one there are exactly $L_0$ zeroes.
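
Putting the two cases together, a decoder along these lines might look like the sketch below. It assumes the receiver knows $n$ and $a$, that exactly one bit was deleted, and that counts are taken as "ones to the right" for a deleted zero and "zeroes to the left" for a deleted one, as in the derivation. The name `vt_decode` is a hypothetical choice and not taken from the cited references.

```python
def vt_decode(received, n, a=0):
    """Recover the codeword of VT_a(n) from `received`, assuming one deletion."""
    y = [int(c) for c in received]
    if len(y) != n - 1:
        raise ValueError("expected a word of length n - 1")
    w = sum(y)                                           # weight of y
    a_prime = sum(i * b for i, b in enumerate(y, start=1))
    d = (a - a_prime) % (n + 1)                          # a - a' mod (n+1)
    if d <= w:
        # A zero was deleted and exactly d ones lie to its right:
        # scan from the right end until d ones have been passed.
        pos, ones = len(y), 0
        while ones < d:
            pos -= 1
            ones += y[pos]
        x = y[:pos] + [0] + y[pos:]
    else:
        # A one was deleted and exactly d - w - 1 zeroes lie to its left:
        # scan from the left end until that many zeroes have been passed.
        zeros_left = d - w - 1
        pos, zeros = 0, 0
        while zeros < zeros_left:
            zeros += 1 - y[pos]
            pos += 1
        x = y[:pos] + [1] + y[pos:]
    return "".join(map(str, x))

print(vt_decode("101", 4))  # 1001
```

For the example in the question, $y = 101$ gives $w = 2$ and $a' = 1 + 3 = 4$, so $a - a' \equiv 1 \pmod 5$. Since $1 \le w$, a zero was deleted with exactly one $1$ to its right, which yields $1001$. The receiver only needs $y$, $n$ and $a$; the counts are deduced from the checksum rather than known in advance.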
