Given a string $(a_0, a_1, \ldots, a_n)$, I want to find the length of the longest common prefix of the substrings $(a_0, a_1, \ldots, a_{n-1})$ and $(a_1, a_2, \ldots, a_n)$. I know this has at least $O(n)$ complexity. Let's call this operation the prefix-suffix match.

I want to calculate the prefix-suffix match length for all suffixes of a string. The naive algorithm, which ignores the fact that all these strings are related, has $O(n^2)$ complexity. My question is: can this be done with better complexity?
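For reference, here is a minimal sketch of the naive quadratic baseline described above. The function names `lcp_with_shift` and `naive_all_suffixes` are hypothetical, chosen only for illustration, and the code assumes the string is a Python `str` (or any indexable sequence).

```python
def lcp_with_shift(s):
    """Length of the longest common prefix of s[:-1] and s[1:]."""
    length = 0
    # Position j of s[:-1] is s[j]; position j of s[1:] is s[j + 1].
    while length + 1 < len(s) and s[length] == s[length + 1]:
        length += 1
    return length


def naive_all_suffixes(s):
    """Prefix-suffix match length of every suffix, recomputed from scratch.

    Each suffix is handled independently, so the worst case
    (for example a string of one repeated symbol) is quadratic.
    """
    return [lcp_with_shift(s[i:]) for i in range(len(s))]
```

For example, `naive_all_suffixes("aaabbc")` gives one value per suffix, starting with the match length of the whole string.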

Note that what I want is similar to the LCP array of a slightly modified suffix array, in which the suffixes are sorted by length instead of lexicographically.

Satvik

1 Answer

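Expanding on reinierpost's comment, here is how to do it in linear time. Every word $w$ can be decomposed as $w = a_1^{k_1} a_2^{k_2} \cdots a_l^{k_l}$, where $a_i \neq a_{i+1}$ and $k_i \geq 1$. This is just the decomposition into runs. The LCP of $w$ and its shift is simply $a_1^{k_1-1}$: the two strings agree at position $j$ exactly when $w_j = w_{j+1}$, which first fails (or the shift runs out) at the last symbol of the first run.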

In view of this, here is how to compute the answer for all suffixes of a word $w$ in linear time. We go over the array backwards, populating an array $K$ using the following algorithm:

  1. $K_n = 1$.
  2. For $i = n-1,\ldots,0$:
    1. If $w_i = w_{i+1}$, $K_i = K_{i+1} + 1$, otherwise $K_i = 1$.

The answer for the suffix starting at $w_i$ is $w_i^{K_i-1}$, so the requested length is $K_i - 1$.
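The backward pass above can be sketched in Python as follows. This is only an illustration under some assumptions: the function name `prefix_suffix_match_lengths` is hypothetical, the word is 0-indexed, and the function returns the lengths $K_i - 1$ rather than the strings $w_i^{K_i-1}$.

```python
def prefix_suffix_match_lengths(w):
    """For each i, the LCP length of the suffix w[i:] and its shift w[i+1:].

    Runs in linear time: one backward pass computing run lengths.
    """
    n = len(w)
    if n == 0:
        return []
    # K[i] = length of the run of equal symbols that starts at position i.
    # The last position always starts a run of length 1.
    K = [1] * n
    for i in range(n - 2, -1, -1):
        if w[i] == w[i + 1]:
            K[i] = K[i + 1] + 1
        # otherwise K[i] stays 1: a new run starts at i
    # The suffix at i matches its own shift on the first K[i] - 1 symbols.
    return [k - 1 for k in K]


print(prefix_suffix_match_lengths("aaabbc"))  # [2, 1, 0, 1, 0, 0]
```

In this example the run lengths are `K = [3, 2, 1, 2, 1, 1]`, so the suffix `aaabbc` shares the prefix `aa` with its shift `aabbc`, and the suffix `bbc` shares `b` with `bc`.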

Yuval Filmus