Questions tagged [exact-string-matching]

24 questions
8
votes
2 answers

Minimal regular expression that matches a given set of words

I have a dictionary-like regular expression, an "or chain" of words, word1|word2|word3|... Unfortunately, the chain is too large. I'd like to find the minimal regular expression that is equivalent. How can I do that? You should think of this…
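One common way to shrink such an "or chain" is to factor shared prefixes through a trie. The sketch below illustrates that idea only: it is not guaranteed to produce a minimal expression, and the function name `trie_regex` is a hypothetical choice made for this example.

```python
import re


def trie_regex(words):
    """Build a prefix-factored regex equivalent to word1|word2|...

    Assumption: this only merges common prefixes; it does not
    claim to yield the minimal equivalent expression.
    """
    # Build a trie; the empty-string key marks "a word ends here".
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}

    def emit(node):
        ends_here = "" in node
        alternatives = [
            re.escape(ch) + emit(child)
            for ch, child in sorted(node.items())
            if ch != ""
        ]
        if not alternatives:
            return ""
        if len(alternatives) == 1 and not ends_here:
            return alternatives[0]
        group = "(?:" + "|".join(alternatives) + ")"
        # If a word may also end at this node, the rest is optional.
        return group + "?" if ends_here else group

    return emit(trie)


pattern = trie_regex(["word1", "word2", "word3"])
print(pattern)  # word(?:1|2|3)
```

Suffix sharing (for example via a minimized DFA) could shrink the expression further; that step is left out here.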
7
votes
3 answers

Is there a data structure for efficiently searching a string that contains a given substring?

This question arose from a practical problem: given a set of texts, find one that contains a given string (not word). Let $S$ be a set of $n$ strings, and $l$ the length of the longest string in $S$. What will be the best data structure to…
7
votes
1 answer

How does Galil's rule work in the Boyer-Moore algorithm?

I would like to know how the Boyer-Moore text searching algorithm with Galil's rule works. I tried to search for it, but I couldn't understand the information I found, for example this Wikipedia page. And why with this rule we go to a linear time…
6
votes
1 answer

String matching algorithm - check if a string matches a pattern

This looks like quite the challenge; given a pattern $P$ (of length $n$) and a string $S$ (of length $m$), how would you check whether the string matches the pattern? For instance: If $P$ = "xyx" and $S$ = "foobarfoo" then $S$ matches $P$. If $P$…
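The excerpt is cut off, so the exact matching rules are not visible here. The backtracking sketch below assumes that each pattern letter stands for a non-empty substring and that repeated letters must map to the same substring; it does not require different letters to map to different substrings. The name `matches` and its `binding` parameter are hypothetical.

```python
def matches(pattern, s, binding=None):
    """Return True if s can be split so that each pattern letter
    maps consistently to a non-empty substring (assumed semantics)."""
    binding = {} if binding is None else binding
    if not pattern:
        return not s  # success only if the whole string was consumed
    var, rest = pattern[0], pattern[1:]
    if var in binding:
        word = binding[var]
        return s.startswith(word) and matches(rest, s[len(word):], binding)
    # Try every non-empty prefix of s as the value of this letter.
    for end in range(1, len(s) + 1):
        binding[var] = s[:end]
        if matches(rest, s[end:], binding):
            return True
        del binding[var]
    return False


print(matches("xyx", "foobarfoo"))  # True (x = "foo", y = "bar")
```

This brute-force search is exponential in the worst case; it is meant to pin down the problem statement rather than to solve it efficiently.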
4
votes
1 answer

Runtime of good suffix table creation in Boyer-Moore algorithm

According to Wikipedia, both the bad character table and the good suffix table can be created in $O(n)$ time, where $n$ is the length of the pattern. It is pretty obvious how the bad character table can be computed in linear time, but I don't understand how…
nlogn
4
votes
4 answers

Substring in an infinite sequence of numbers

I have an infinite sequence of numbers, starting from 1, and need to find the starting position of a given substring of numbers. Example: 1234567891011121314151617181920 ... S = 141 Result: 18 All I can think of is to convert the sequence to a string and find…
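The naive approach hinted at in the excerpt can be sketched without materializing the whole sequence: append one number at a time and keep only the tail that could still be part of a match. The function name `first_position` is hypothetical, and positions are 1-based to agree with the example above.

```python
from itertools import count


def first_position(pattern: str) -> int:
    """Return the 1-based position of the first occurrence of `pattern`
    in the concatenation 123456789101112... (naive streaming search)."""
    if not pattern.isdigit():
        raise ValueError("pattern must be a non-empty digit string")
    keep = len(pattern) - 1  # tail length that can overlap a later match
    buffer = ""
    offset = 0  # number of digits already dropped from the front
    for n in count(1):
        buffer += str(n)
        idx = buffer.find(pattern)
        if idx != -1:
            return offset + idx + 1
        if len(buffer) > keep:
            offset += len(buffer) - keep
            buffer = buffer[-keep:] if keep else ""


print(first_position("141"))  # 18
```

The loop terminates because every digit string eventually occurs in this sequence, but the running time grows with the size of the first number involved in the match, so this is a baseline rather than an efficient solution.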
fryme
3
votes
3 answers

Complexity of string comparison vs whitespace-trimmed string comparison

I recently worked on an algorithm which, among other things, checks strings for equality using the classic builtin equality operator: str1 == str2 (I think it should be irrelevant to the question, but I faced this issue in C++, and str1 and str2…
3
votes
1 answer

Why does the exact string matching brute force algorithm not compare index 1 of P with index 1 of S in the first round of the for loop?

In my ADS course we were given this pseudocode for the "exact string matching brute force" algorithm: 1 ESM-BF(P, S) 2 m = length(P), n = length(S) 3 k = 0 # number of matches 4 for j = 1, ..., n-m+1 do 5 i = 1 6 while i ≤ m and P[i]…
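The pseudocode in the excerpt is cut off inside the `while` condition, so the comparison and the counting step below are assumptions based on the standard brute-force matcher, not a reconstruction of the course's version. The sketch uses 0-based indices, so the first round compares `p[0]` with `s[0]`; the name `count_matches` is hypothetical.

```python
def count_matches(p: str, s: str) -> int:
    """Count the occurrences of pattern p in text s by brute force."""
    m, n = len(p), len(s)
    k = 0  # number of matches
    for j in range(n - m + 1):  # j is the candidate shift into s
        i = 0
        # Advance while the pattern and the shifted text agree.
        while i < m and p[i] == s[j + i]:
            i += 1
        if i == m:  # the whole pattern matched at shift j
            k += 1
    return k


print(count_matches("aa", "aaaa"))  # 3
```

With 1-based indices, as in the pseudocode, the text character compared with `P[i]` at shift `j` would be `S[j+i-1]`.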
ilam engl
3
votes
1 answer

Is there any neutral element for the cryptographic hash function SHA256? (or its variants)

My question is the following: Is it possible to compute a string given that after applying a SHA256 function the result is the same string? Edit for clarification: If my string A is a neutral element of SHA256, then: A == SHA256(A) is true. Does A…
2
votes
0 answers

Bad character rule in the Apostolico–Giancarlo algorithm

In the paper "Tight bounds on the complexity of the Apostolico-Giancarlo algorithm" by Crochemore and Lecroq, the authors prove that the algorithm performs no more than $1.5n$ character comparisons in the processing stage. If I understand their proof…
2
votes
0 answers

Why is the second while loop in KMP not a conditional statement?

When building the partial match table for KMP: void buildBackTable() { int i = 0, j = -1; b[0] = -1; while (i < m) { while (j >= 0 && P[i] != P[j]) j = b[j]; //Why is this a while loop!? i++; j++; b[i] = j; } }…
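For experimenting with the question, here is a direct Python transcription of the routine quoted above. It is only a sketch: the table `b` is returned instead of stored globally, and the name `build_back_table` is a hypothetical adaptation of the original.

```python
def build_back_table(p: str) -> list:
    """Build the KMP back table b for pattern p, where b[0] = -1 and
    b[i] is the length of the longest proper border of p[:i]."""
    m = len(p)
    b = [0] * (m + 1)
    i, j = 0, -1
    b[0] = -1
    while i < m:
        # Follow border links until p[i] extends a border (or j is -1).
        while j >= 0 and p[i] != p[j]:
            j = b[j]
        i += 1
        j += 1
        b[i] = j
    return b


print(build_back_table("abab"))    # [-1, 0, 0, 1, 2]
print(build_back_table("aabaaa"))  # [-1, 0, 1, 0, 1, 2, 2]
```

On "aabaaa" the inner loop runs twice in a row at `i = 2` (`j` goes from 1 to 0 to -1), which is one case where a single conditional fallback would leave `j` too large.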
2
votes
0 answers

How to build a minimal string matching DFA with limited memory?

I am working on finite state machine pruning, a problem that requires me to build finite state machines (in the manner of the Aho-Corasick algorithm) to match an evolving input string against a set of suffixes. If a match occurs, the search is…
fuz
2
votes
1 answer

Information-theoretic lower bound for succinct string dictionary of the Unicode Name property

Background The literature on succinct data structures refers often to the “information-theoretic lower bound” of encoding data, i.e., the minimum number of bits needed to store the data – a concept related to information-theory entropy. For…
2
votes
1 answer

What is the SetHorspool string searching algorithm and how is it implemented?

What is the SetHorspool string searching algorithm with pseudo-code so it can be easily implemented in a language of choice? This has been implemented in 2 libraries I have come…
2
votes
2 answers

Sliding Window Dictionary String Matching

Consider the following problem. We are given a set of patterns (strings) $\Pi = \{\pi_i\}$, a text $s$, and a window length $k$. We want a list of all shifts $0 \le i \le |s|-k$ such that every pattern in $\Pi$ is contained in the substring…
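The problem statement is truncated, so the sketch below assumes that the substring in question is the length-$k$ window of $s$ starting at shift $i$. It is a naive baseline that rescans every window, useful mainly for checking a faster solution; the name `window_shifts` is hypothetical.

```python
def window_shifts(patterns, s, k):
    """Return all shifts i with 0 <= i <= len(s) - k such that every
    pattern occurs inside the window s[i:i+k] (assumed semantics)."""
    return [
        i
        for i in range(len(s) - k + 1)
        if all(p in s[i:i + k] for p in patterns)
    ]


print(window_shifts({"ab", "bc"}, "abcab", 3))  # [0]
```

Each window is searched once per pattern here; a multi-pattern automaton over $s$ combined with a sliding count of patterns seen would avoid the repeated scans.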