10

I'm looking for an efficient algorithm to find the longest repeated pattern in a string.

For example, consider the following string of numbers:

5431428571428571428571428571427623874534.

As you can see, 142857142857 is the longest pattern which is repeated for a couple of times (at least twice) in this string.

The repeated string should not contain any re any idea rather than brute-force?

Juho
  • 22,905
  • 7
  • 63
  • 117

1 Answers1

16

The problem is surprisingly non-trivial. First, two brute force algorithms. A square ("repeated pattern") is given by its length $\ell$ and position $p$, and takes time $O(\ell)$ to verify. If we go over all $\ell$ and $p$, we obtain an $O(n^3)$ algorithm. We can improve on that by first looping over $\ell$, and then scanning the string with two running pointers at a distance of $\ell$. In this way, one can verify whether a square of length $2\ell$ exists in linear time, giving a total running time of $O(n^2)$.

Kolpakov and Kucherov developed an algorithm for finding all maximal repeats in a word in time $O(n)$ [1], and their algorithm can be used to find all maximal squares in time $O(n)$. A repeat is a subword of the form $w^kx$, where $k \geq 2$ and $x$ is a proper prefix of $w$. The largest square contained in that repeat is $(w^{\lfloor k/2 \rfloor})^2$. Using this formula, given all maximal repeats in a word (of which there are only $O(n)$ many), one can find the largest square.


[1] Kolpakov, R., & Kucherov, G. (1999). Finding maximal repetitions in a word in linear time. In Foundations of Computer Science, 1999. 40th Annual Symposium on (pp. 596-604). IEEE.

xskxzr
  • 7,613
  • 5
  • 24
  • 47
Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514