Questions tagged [string-matching]

Use for generic string matching, which may be exact substring matching (though then prefer the tag exact-string-matching), matching against a regular expression, or approximate matching (e.g. finding substrings within a given Levenshtein distance of a reference string).

79 questions
8
votes
3 answers

What is the expected time complexity of checking equality of two arbitrary strings?

The simple (naive?) answer would be O(n), where n is the length of the shorter string, because in the worst case you must compare every pair of characters. So far so good. I think we can all agree that checking equality of two equal-length strings…
8
votes
1 answer

How to perform orthogonal check on two circular binary strings?

Say we have two circular binary strings $a = a_0a_1 \dots a_{n-1}$ and $b = b_0b_1 \dots b_{n-1}$ with arbitrary starting points, and define $a$ and $b$ to be orthogonal if $\sum_{i=0}^{n-1}a_ib_i = 0$. Is there an $O(n \log n)$ algorithm that can tell a rotation of such…
Yiqun Sun
6
votes
1 answer

Find the 'best' longest common subsequence

I am writing a program that computes and displays diffs. I implemented Myers' algorithm, which computes the LCS between 2 subsequences (seq1 and seq2); its output is one of the possible LCS and a partition of seq1 and seq2, one projection of which is…
5
votes
1 answer

Find all substrings that fit the mask with asterisks

There is a problem. Given string $text$ containing only letters and string $mask$ containing letters and asterisks (*), where asterisk means substitution of zero or more letters, find all substrings of $text$ that fit $mask$. There is an example:…
Elman
4
votes
2 answers

NFA models with characters on nodes, not edges

I am attempting to understand the inner workings of the open source string matching library Hyperscan. It takes a multiple-engine approach to the problem of generating string matches, and I'm still in the early stages of following through the…
Daniel Martin
4
votes
0 answers

Remove contiguous 5th powers (5-fold repetitions) from list of 'a's and 'b's?

Given a list of characters in $\{a,b\}$, for example $abababababa$, what is the most efficient way to remove all 5th powers in a way that makes the string as short as possible? (This example would reduce to a since the $(ab)^5$ cancels.) By 5th…
4
votes
1 answer

Please help me understand the algorithm for building the KMP failure function

I am struggling to grasp the algorithm for building the KMP failure function. The bulk of what is making my understanding incomplete concerns the line length=PI[length-1]. The pseudocode for the algorithm is below. Here are my questions: 1.)…
B_math
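For readers skimming this entry, here is a minimal Python sketch of the standard KMP failure-function construction, showing where a fallback step like length=PI[length-1] fits. The function name `build_failure` and the variable names are illustrative assumptions, not the asker's pseudocode, which is not reproduced in the excerpt.

```python
def build_failure(pattern: str) -> list:
    """Return pi, where pi[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it."""
    pi = [0] * len(pattern)
    length = 0  # length of the border matched so far
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The current border extends by one character.
            length += 1
            pi[i] = length
            i += 1
        elif length > 0:
            # Mismatch: retry with the next shorter border of the
            # matched prefix, without advancing i.
            length = pi[length - 1]
        else:
            # No border can be extended; pi[i] stays 0.
            i += 1
    return pi


print(build_failure("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

The fallback works because any border of pattern[:i + 1] must extend a border of pattern[:i], and the borders of the currently matched prefix are exactly pi[length - 1], pi[pi[length - 1] - 1], and so on.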
4
votes
3 answers

Sorting array of strings (with repetitions) according to a given ordering

We get two arrays: ordering = ["one", "two", "three"] and input = ["zero", "one", "two", "two", "three", "three", "three", "four"]; We want to find the array output so that output = ["one", "two", "two", "three", "three", "three", "zero",…
Pe Wu
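One possible reading of this problem can be sketched with a stable sort keyed on each string's rank in the ordering. The sketch assumes that strings absent from the ordering go to the end in their original relative order; the excerpt's expected output is truncated, so that rule is an assumption, and `sort_by_ordering` is a hypothetical name.

```python
def sort_by_ordering(ordering: list, items: list) -> list:
    """Sort items by their position in ordering; unknown items go last."""
    rank = {s: i for i, s in enumerate(ordering)}
    unknown = len(ordering)  # shared rank for items not in ordering (assumption)
    # sorted() is stable, so equal-rank items keep their input order.
    return sorted(items, key=lambda s: rank.get(s, unknown))


ordering = ["one", "two", "three"]
items = ["zero", "one", "two", "two", "three", "three", "three", "four"]
print(sort_by_ordering(ordering, items))
```

Building the rank dictionary takes time linear in the ordering, after which the sort makes O(n log n) key comparisons. A counting-based approach would avoid the sort entirely when the ordering is short.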
3
votes
3 answers

By what criteria is the base value selected in the Rabin-Karp algorithm?

In the Rabin-Karp algorithm the rolling hash is calculated as follows: $H_1 = c_1 a^{k-1} + c_2 a^{k-2} + c_3 a^{k-3} + \dots + c_k a^0$, where $a$ is a constant. On what basis is $a$ selected? In Cormen they use the value 10, and in some other places it is 26. By…
Navjot Singh
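To make the formula in this excerpt concrete, here is a minimal Python sketch of a polynomial hash and its rolling update. The excerpt does not mention a modulus; reducing modulo a prime is the usual practice and is an assumption here, as are the specific values 256 and 1_000_000_007 and the function names `poly_hash` and `roll`.

```python
def poly_hash(window: str, base: int, mod: int) -> int:
    """Compute c1*base^(k-1) + c2*base^(k-2) + ... + ck*base^0 (mod mod)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod  # Horner's rule
    return h


def roll(h: int, out_ch: str, in_ch: str, k: int, base: int, mod: int) -> int:
    """Slide a length-k window one step: drop out_ch, append in_ch."""
    top = pow(base, k - 1, mod)        # weight of the leading character
    h = (h - ord(out_ch) * top) % mod  # remove the leading term
    return (h * base + ord(in_ch)) % mod


BASE, MOD = 256, 1_000_000_007  # illustrative choices, not from the question
h = poly_hash("abc", BASE, MOD)
print(roll(h, "a", "d", 3, BASE, MOD) == poly_hash("bcd", BASE, MOD))  # True
```

In this sketch the base only needs to be at least the alphabet size so that distinct characters get distinct digits, which is one common explanation for seeing 10 with digit strings and 26 with lowercase letters.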
3
votes
3 answers

Complexity of string comparison vs whitespace-trimmed string comparison

I recently worked on an algorithm which, among other things, checks strings for equality using the classic builtin equality operator: str1 == str2 (I think it should be irrelevant to the question, but I faced this issue in C++, and str1 and str2…
3
votes
1 answer

Calculating the string similarity of an optimal alignment

Description of the algorithm's behavior: I have two strings s1 and s2, with $len\_s1 \le len\_s2$. I would like to find the substring of s2 that has the biggest similarity to s1. The following alignments are possible: [s2[:i] for i in range(len_s1)] +…
3
votes
0 answers

Intellij string search and highlight algorithm

I'm searching for an algorithm that takes two strings, a query and a string that is to be searched for the query. The algorithm should result in a 'found' when the string contains the characters of the query in the right order but with any amount of…
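A minimal Python sketch of one way to read this requirement: treat the query as a subsequence of the searched string and greedily record the earliest matching positions, which could then drive highlighting. The excerpt is truncated, so the assumption that arbitrary characters may sit between matched characters is mine, and `match_positions` is a hypothetical name rather than anything IntelliJ exposes.

```python
from typing import List, Optional


def match_positions(query: str, text: str) -> Optional[List[int]]:
    """Return the indices in text matching query's characters in order,
    or None if query is not a subsequence of text."""
    positions = []
    start = 0
    for ch in query:
        idx = text.find(ch, start)  # earliest occurrence at or after start
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


print(match_positions("abc", "xaxbxc"))  # [1, 3, 5]
print(match_positions("ba", "ab"))       # None
```

The greedy scan runs in time linear in the length of the searched string. It finds some valid match, not necessarily the one a human would find most natural to highlight.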
3
votes
1 answer

Given a list of strings, find every pair $(x,y)$ where $x$ is a substring of $y$. Possible to do better than $O(n^2)$?

Consider the following algorithmic problem: Given a list of strings $L = [s_1, s_2, \dots, s_n]$, we want to know all pairs $(x,y)$ where $x$ is a substring of $y$. We can assume all strings are of length at most $m$, where $m \ll n$ and are all…
3
votes
1 answer

One-to-many matching in bipartite graphs?

Consider having two sets $L$ (left) and $R$ (right). $R$ nodes have a capacity limit. Each edge $e$ has a cost $w(e)$. I want to map each of the $L$ vertices to one node from $R$ (one-to-many matching), with minimum total edge-costs. Each vertex in…
mcqueenvh
3
votes
1 answer

Find shortest prefix to generate original string by overlapping

Given a string $S$, I want to find the prefix string $P$ of shortest length, such that the original string $S$ can be generated by concatenating copies of $P$ (where overlapping is allowed). For example, if $S = atgatgatatgat$, I want to find $P =…