6

This looks like quite the challenge; given a pattern $P$ (of length $n$) and a string $S$ (of length $m$), how would you check whether the string matches the pattern? For instance:

  • If $P$ = "xyx" and $S$ = "foobarfoo" then $S$ matches $P.$
  • If $P$ = "acca" and $S$ = "carbuscarbus" then $S$ does not match $P.$

My thoughts so far: This looks like a dynamic programming problem. We can define a boolean

$M(i, j)$ = True iff pattern $P[i:n]$ matches $S[j:m]$

We then need $M(0, 0).$ Note here that the substring notation is $P[i:n] = P_iP_{i+1}...P_n$.

I couldn't think of an efficient recurrence beyond this. It seems like I would have to introduce an extra mapping (from a substring of $P$ to a substring of $S$, indicating that the earlier "matches" the latter; for example in the first example, ("x", "foo") would be part of the mapping. You then replace "x" by "foo" in the substring of P). This would make it $O(2^n)$ or worse though.

If you don't see the intuitive meaning, here's a more rigorous definition: S matches P iff there exists a mapping $$T : P[i:j] \rightarrow S[a:b]$$ for $i, j \in [1, 2 ...n]$ and $a, b \in [1, 2 ...m]$, such that replacing $P[i:j]$ with $S[a:b]$ transforms P into S. For example, in the first case where $P$ = "xyx" and $S$ = "foobarfoo", $T$ is as follows:

  • T('x') = 'foo'. {Here $i = 1, j = 1, a = 1, b = 3$}
  • T('y') = 'bar'. {Here $i = 2, j = 2, a = 4, b = 6$}

Replacing all instances (in $P$) of 'x' with 'foo' and 'y' with 'bar', gives us 'foobarfoo' i.e. $S$.

Thus, $S$ matches $P.$

Evil
  • 9,525
  • 11
  • 32
  • 53
Mathguy
  • 411
  • 5
  • 14

1 Answers1

5

There are various algorithms for pattern matching in string.

Exact String matching algorithms

  1. Brute Force
  2. Rabin Karp
  3. Boyer-Moree
  4. KMP
  5. Aho Corasick etc.

Approximate String Matching Algorithm

Approximate String matching is applied in text searching, pattern recognition and signal processing applications. For a text T[1..n] and pattern P[1...m], we are supposed to find all the occurrences of pattern in the text whose edit distance to the pattern is at most K. The edit distance between two strings is defined as minimum number of character insertion, deletion and replacements needed to make them equal.

Approximate string matching problem is solved with the help of dynamic programming.

You can have a look through all these algorithms. These algorithms will take optimum complexity like O(m),O(n+m) (in case of KMP for example).

D.W.
  • 167,959
  • 22
  • 232
  • 500
Anjo
  • 372
  • 1
  • 9