According to Wikipedia, both the bad character table and the good suffix table can be created in $O(n)$ time, where $n$ is the length of the pattern. It is pretty obvious how the bad character table can be computed in linear time, but I don't understand how the good suffix table can be. I thought about it a lot and could not come up with an $O(n)$ algorithm to generate that table. I also looked at the Java implementation provided on Wikipedia, but I think it runs in $O(n^2)$ time. For example, look at the following two lines taken from the makeOffsetTable method.

for (int i = needle.length - 1; i >= 0; --i) {
    if (isPrefix(needle, i + 1)) {

The outer for loop iterates needle.length (i.e. $n$) times, and within the loop body isPrefix is called, which contains a loop that starts with the following header.

for (int i = p, j = 0; i < needle.length; ++i, ++j) {

Just to make it easier on the eyes, here are the two loops shown together.

for (int i = needle.length - 1; i >= 0; --i) {
    for (int i2 = i + 1, j = 0; i2 < needle.length; ++i2, ++j) {

Since j does not affect the number of iterations, I can drop it from the inner loop to simplify further.

for (int i = needle.length - 1; i >= 0; --i) {
    for (int i2 = i + 1; i2 < needle.length; ++i2) {

What I see here is that with each successive iteration of the outer loop, the inner loop gains one additional iteration. So the inner loop runs $0, 1, 2, \ldots, (n-1)$ times, which adds up to $(n-1)n/2$, which is $O(n^2)$, not linear.
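To make the count concrete, here is a minimal sketch that tallies character comparisons for the two loops above. It assumes the body of the inner loop compares `needle[i2]` with `needle[j]` and exits at the first mismatch, which is my reading of isPrefix and is not shown in the excerpt. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class PrefixCheckCount {
    // Counts character comparisons made by the inner (isPrefix-style) loop
    // over all iterations of the outer loop.
    // Assumption: the inner loop stops at the first mismatch.
    static long countComparisons(char[] needle) {
        long comparisons = 0;
        for (int i = needle.length - 1; i >= 0; --i) {
            for (int i2 = i + 1, j = 0; i2 < needle.length; ++i2, ++j) {
                ++comparisons;
                if (needle[i2] != needle[j]) {
                    break; // mismatch: the suffix is not a prefix
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            char[] needle = new char[n];
            Arrays.fill(needle, 'a'); // worst case: no early exit ever happens
            System.out.println(n + " -> " + countComparisons(needle));
        }
    }
}
```

For a needle of $n$ identical characters the early exit never triggers, so the count is exactly $(n-1)n/2$ (45 for $n=10$). For a needle of distinct characters every check fails on its first comparison, so the count is only $n-1$. The quadratic behaviour is therefore a worst case, not the typical case.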

So, the question is: does it really take more than linear time to compute the good suffix table, or am I making a mistake in my runtime calculation?

nlogn

1 Answer

The implementation provided on Wikipedia is $O(n^2)$. But these tables can actually be built in $O(n)$ time via the Z-algorithm. Since you can easily find the Z-algorithm on the internet, I only provide a sketch here.

Let $s[i,j]$ denote the substring of $s$ from the $i$-th position to the $j$-th position. As in the $O(n^2)$ code, for each $i$ we want to find the largest value $z_i$ satisfying $s[i-z_i+1,i]=s[n-z_i+1,n]$, that is, the longest substring ending at position $i$ that is also a suffix of $s$.

Now compute $z_i$ from back to front. Let $l$ be the smallest number such that $s[l,r]=s[n-r+l,n]$ for some $r>i$. If $l\le i$, we have $s[l,i]=s[n-r+l,n-r+i]$, which means we can reuse information from $z_{n-r+i}$. If $z_{n-r+i}\le i-l$, then $z_i=z_{n-r+i}$. Otherwise $s[l,i]=s[n-i+l,n]$ and $z_i>i-l$. In this case (and also when $l>i$), we find the real $z_i$ by brute force, but at the same time we get a smaller $l$, which provides more information for future computation. Since $l$ is non-increasing and every successful brute-force comparison decreases it, the algorithm runs in $O(n)$.
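The sketch above can be written out roughly as follows. This is a minimal illustration under a few assumptions of mine: indices are 0-based, `f` plays the role of $r$, and `g` is the position just left of the window (so $l$ corresponds to `g + 1`). It computes only the $z_i$ values, not the full good suffix table. The class name and the naive reference method are hypothetical and included for cross-checking.

```java
final class SuffixZ {
    // z[i] = length of the longest substring of s ending at index i
    // that is also a suffix of s. Runs in O(n).
    static int[] suffixLengths(String s) {
        int n = s.length();
        int[] z = new int[n];
        if (n == 0) {
            return z;
        }
        z[n - 1] = n;  // the whole string is a suffix of itself
        int f = n - 1; // right end of the current matching window
        int g = n - 1; // index just left of the window; never increases
        for (int i = n - 2; i >= 0; --i) {
            int mirror = i + n - 1 - f; // position aligned with i inside the suffix
            if (i > g && z[mirror] < i - g) {
                // The mirrored match fits strictly inside the window: copy it.
                z[i] = z[mirror];
            } else {
                // Extend by brute force; each successful comparison lowers g.
                if (i < g) {
                    g = i;
                }
                f = i;
                while (g >= 0 && s.charAt(g) == s.charAt(g + n - 1 - f)) {
                    --g;
                }
                z[i] = f - g;
            }
        }
        return z;
    }

    // Quadratic reference implementation taken directly from the definition.
    static int[] suffixLengthsNaive(String s) {
        int n = s.length();
        int[] z = new int[n];
        for (int i = 0; i < n; ++i) {
            int k = 0;
            while (k <= i && s.charAt(i - k) == s.charAt(n - 1 - k)) {
                ++k;
            }
            z[i] = k;
        }
        return z;
    }
}
```

For example, `SuffixZ.suffixLengths("abcab")` gives `[0, 2, 0, 0, 5]`: only the "ab" ending at index 1 and the whole string match a suffix. Because `g` only ever moves left and each position causes at most one failed comparison, the total number of comparisons is linear in $n$.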

aaaaajack