Given a string $S$, I want to find the prefix string $P$ of shortest length, such that the original string $S$ can be generated by concatenating copies of $P$ (where overlapping is allowed).

For example, if $S = atgatgatatgat$, I want to find $P = atgat$; $P$ is the smallest prefix of $S$ that can be concatenated (in this case three times, starting at indices $\{0,3,8\}$ of $S$, where the first and second copies overlap but the second and third copies do not overlap) to equal $S$.
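
To make the covering condition concrete, here is a minimal Python sketch that checks whether a candidate string covers $S$ and reports where its copies start. The function name `cover_positions` is made up for illustration and is not part of the question.

```python
from typing import List, Optional


def cover_positions(s: str, p: str) -> Optional[List[int]]:
    """Return the start indices of all occurrences of p in s if those
    occurrences cover every character of s; otherwise return None.

    Hypothetical helper, written only to illustrate the definition.
    """
    if not p:
        return None
    # All positions where a copy of p begins (copies may overlap).
    starts = [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]
    covered_to = 0  # s[:covered_to] is covered by the copies seen so far
    for i in starts:
        if i > covered_to:
            return None  # position covered_to lies in a gap
        covered_to = i + len(p)
    return starts if covered_to == len(s) else None


print(cover_positions("atgatgatatgat", "atgat"))  # [0, 3, 8]
print(cover_positions("atgatgatatgat", "atga"))   # None (index 7 is uncovered)
```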

Obviously, there is an $\mathcal{O}(n^2)$ algorithm that checks each prefix of $S$ in turn, but a colleague mentioned it might be possible to do it in $\mathcal{O}(n \log n)$. I'm thinking of using suffix arrays for different prefixes of $S$, but I haven't been able to get further than that.
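
For reference, one way the quadratic baseline could look is sketched below. It assumes a Z-array (the length of the longest common prefix of $S$ and each of its suffixes) is computed once in linear time, so that each prefix can be tested with a single linear scan. The names `z_array` and `shortest_cover_quadratic` are illustrative only.

```python
def z_array(s: str) -> list:
    """z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    l = r = 0  # [l, r) is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def shortest_cover_quadratic(s: str) -> str:
    """Try every prefix length k in increasing order; O(n) work per k."""
    n = len(s)
    z = z_array(s)
    for k in range(1, n + 1):
        covered_to = 0  # s[:covered_to] is covered so far
        for i in range(n - k + 1):
            if z[i] >= k:  # the prefix of length k occurs at position i
                if i > covered_to:
                    break  # gap: this prefix cannot cover s
                covered_to = i + k
        if covered_to == n:
            return s[:k]
    return s  # only reached for the empty string


print(shortest_cover_quadratic("atgatgatatgat"))  # atgat
```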

Robert Lee

1 Answer

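What you are looking for is called the quasiperiod of a string; a substring whose (possibly overlapping) occurrences cover all of $S$ is called a cover of $S$. If $S$ has a quasiperiod of length $|S|$, it is called superprimitive and cannot be covered by any proper substring.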

A method for computing it in $O(n)$ time is given in "An On-Line String Superprimitivity Test" by Dany Breslauer. You might also be interested in "Of Periods, Quasiperiods, Repetitions and Covers" by Alberto Apostolico and Dany Breslauer.
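
As a rough illustration of how a linear-time computation can work, here is a sketch built on the border array (the KMP prefix function). It is not taken from either paper and may differ from Breslauer's presentation. It relies on the fact that the shortest cover of a prefix is either the prefix itself or the shortest cover of its longest border. The names `prefix_function`, `shortest_cover`, `cover` and `reach` are chosen here for illustration.

```python
def prefix_function(s: str) -> list:
    """pi[i] = length of the longest proper border of s[:i + 1]."""
    n = len(s)
    pi = [0] * n
    k = 0
    for i in range(1, n):
        while k and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def shortest_cover(s: str) -> str:
    """Shortest cover of s in O(n) time (sketch; assumptions as stated above)."""
    n = len(s)
    if n == 0:
        return ""
    pi = prefix_function(s)
    cover = [0] * (n + 1)  # cover[i] = length of the shortest cover of s[:i]
    reach = [0] * (n + 1)  # reach[c] = longest prefix length seen whose shortest cover has length c
    for i in range(1, n + 1):
        b = pi[i - 1]   # longest border of s[:i]
        c = cover[b]    # the only candidate shorter than i
        if b > 0 and reach[c] >= i - c:
            # The candidate already covers s[:reach[c]], and its copy
            # ending at position i starts at or before reach[c].
            cover[i] = c
            reach[c] = i
        else:
            cover[i] = i  # s[:i] is superprimitive
            reach[i] = i
    return s[:cover[n]]


print(shortest_cover("atgatgatatgat"))  # atgat
```

A string is superprimitive exactly when `shortest_cover(s) == s`.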

orlp