Let's say I have the following string:
XXXXXAYYYYBYYYYCXXXXXDYYYY
We can see that the substring XXXXX is 5 characters long and is repeated 2 times, so it covers 10 characters. This is the longest repeated substring.
However, the substring YYYY, which is 4 characters long, is repeated 3 times and therefore covers 12 characters.
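To make the coverage numbers concrete, here is a minimal sketch of how I compute them. The class name `CoverageDemo` and the helper names are just for illustration, and it assumes occurrences are counted left to right without overlapping:

```java
class CoverageDemo {
    // Counts non-overlapping occurrences of sub in text, scanning left to right.
    static int countNonOverlapping(String text, String sub) {
        if (sub.isEmpty()) {
            return 0; // an empty substring covers nothing
        }
        int count = 0;
        // After each match, resume the search just past the end of that match.
        for (int i = text.indexOf(sub); i >= 0; i = text.indexOf(sub, i + sub.length())) {
            count++;
        }
        return count;
    }

    // Coverage = substring length * number of non-overlapping occurrences.
    static int coverage(String text, String sub) {
        return sub.length() * countNonOverlapping(text, sub);
    }

    public static void main(String[] args) {
        String s = "XXXXXAYYYYBYYYYCXXXXXDYYYY";
        System.out.println(coverage(s, "XXXXX")); // prints 10
        System.out.println(coverage(s, "YYYY"));  // prints 12
    }
}
```

So YYYY beats XXXXX on coverage even though it is shorter.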
I want to find the repeated substring of length at least 2 with the highest coverage, counting only occurrences that don't overlap.
Is there an efficient algorithm that does so?
This is the algorithm I'm currently using (written in Java), which is naive and slow:
String highestCoveringRepeatedSubstring(String string) {
    int highestCoverage = 0;
    String highestCoveringRepeatedSubstring = "";
    // Only substrings of length at least 2 are candidates.
    for (int length = 2; length < string.length(); length++) {
        for (int index = 0; index < string.length() - length; index++) {
            String substring = string.substring(index, index + length);
            // Count non-overlapping occurrences from index onward.
            int count = 0;
            for (int i = index; i >= 0; i = string.indexOf(substring, i + length)) {
                count++;
            }
            // >= lets a later (longer) candidate win ties on coverage.
            if (count > 1 && length * count >= highestCoverage) {
                highestCoverage = length * count;
                highestCoveringRepeatedSubstring = substring;
            }
        }
    }
    return highestCoveringRepeatedSubstring;
}