Questions tagged [strings]
Questions about sequences of symbols, sets of such sequences, and their properties and uses.
512 questions
43 votes, 2 answers
Efficient data structures for building a fast spell checker
I'm trying to write a spell-checker that should work with a fairly large dictionary. I want an efficient way to index my dictionary data so that I can use the Damerau-Levenshtein distance to determine which words are closest to the misspelled…
Charles Menguy
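The spell-checker question above hinges on the Damerau-Levenshtein distance. A minimal Python sketch of that distance follows, assuming the common restricted variant ("optimal string alignment", OSA); the unrestricted distance needs an extra table indexed by alphabet symbol. The helper `closest_words` is hypothetical: a linear-scan baseline that any dictionary index would have to beat. Note that OSA is not a metric: `osa_distance("ca", "abc")` is 3 although "ca" to "ac" to "abc" costs only 2, which matters for index structures that rely on the triangle inequality.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein ("optimal string alignment") distance.

    Counts insertions, deletions, substitutions and swaps of two adjacent
    characters, but never edits the same substring twice.
    """
    n, m = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[n][m]


def closest_words(word: str, dictionary, max_dist: int = 2) -> list:
    """Hypothetical baseline: (distance, entry) pairs within max_dist, closest first."""
    scored = ((osa_distance(word, entry), entry) for entry in dictionary)
    return sorted(pair for pair in scored if pair[0] <= max_dist)
```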
32 votes, 5 answers
Finding interesting anagrams
Say that $a_1a_2\ldots a_n$ and $b_1b_2\ldots b_n$ are two strings of the same length. An anagramming of two strings is a bijective mapping $p:[1\ldots n]\to[1\ldots n]$ such that $a_i = b_{p(i)}$ for each $i$.
There might be more than one…
Mark Dominus
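The anagramming definition above can be made concrete with a short Python sketch that builds one such bijection, assuming 0-based indices instead of the question's $[1\ldots n]$. It matches each $a_i$ to the earliest unused position of the same character in $b$, so it produces just one of the possibly many valid mappings; the question's notion of an "interesting" anagram is cut off in the excerpt and is not modelled here.

```python
from collections import defaultdict, deque


def anagramming(a: str, b: str):
    """Return one bijection p (0-based list) with a[i] == b[p[i]], or None.

    Each a[i] takes the earliest still-unused position in b holding the same
    character; the mapping is injective on n positions, hence a bijection.
    """
    if len(a) != len(b):
        return None
    positions = defaultdict(deque)  # character -> unused positions in b
    for j, ch in enumerate(b):
        positions[ch].append(j)
    p = []
    for ch in a:
        if not positions[ch]:
            return None  # a has more copies of ch than b does
        p.append(positions[ch].popleft())
    return p
```

For example, `anagramming("listen", "silent")` maps l, i, s, t, e, n to positions 2, 1, 0, 5, 3, 4 of "silent".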
29 votes, 2 answers
Longest Repeated (Scattered) Subsequence in a String
Informal Problem Statement:
Given a string, e.g. $ACCABBAB$, we want to colour some letters red and some letters blue (and some not at all), such that reading only the red letters from left to right yields the same result as reading only the blue…
Sekti
28 votes, 1 answer
Is there a 'string stack' data structure that supports these string operations?
I'm looking for a data structure that stores a set of strings over a character set $\Sigma$, capable of performing the following operations. We denote $\mathcal{D}(S)$ as the data structure storing the set of strings $S$.
Add-Prefix-Set on…
Alex ten Brink
27 votes, 2 answers
Efficient map data structure supporting approximate lookup
I'm looking for a data structure that supports efficient approximate lookups of keys (e.g., Levenshtein distance for strings), returning the closest possible match for the input key. The best-suited data structures I've found so far are…
merijn
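The asker's own candidate structures are cut off in the excerpt above. Purely as an illustration, one well-known structure for approximate lookup under a metric is the BK-tree (Burkhard-Keller tree), sketched below in Python with plain Levenshtein distance, which is a true metric. The class and method names are hypothetical. Search prunes whole subtrees using the triangle inequality, so the distance function must satisfy it.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute) with a two-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or free match
        prev = cur
    return prev[-1]


class BKTree:
    """Burkhard-Keller tree: each child is filed under its distance to the parent."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # a node is a pair (word, {edge_distance: child_node})

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query: str, max_dist: int) -> list:
        """All (distance, word) pairs with distance <= max_dist, closest first."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            # Triangle inequality: a match below edge e needs |d - e| <= max_dist.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

For example, after adding "book", "books", "cake", "boo", "cape" and "cart", `tree.search("bok", 1)` returns `[(1, 'boo'), (1, 'book')]`; the closest match is simply the first entry.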
25 votes, 1 answer
Compression of domain names
I am curious how one might compress the domain of an arbitrary IDN hostname (as defined by RFC 5890) as compactly as possible, and I suspect this could become an interesting challenge. A Unicode host or domain name (U-label) consists of a string of Unicode…
eggyal
21 votes, 1 answer
Does every large enough string have repeats?
Let $\Sigma$ be some finite set of characters of fixed size. Let $\alpha$ be some string over $\Sigma$. We say that a nonempty substring $\beta$ of $\alpha$ is a repeat if $\beta = \gamma \gamma$ for some string $\gamma$.
Now, my question is whether…
Alex ten Brink
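The definition of a repeat above is easy to test directly. The Python sketch below checks a string for a repeat $\gamma\gamma$ and runs a brute-force search for the longest repeat-free string over a given alphabet, up to a chosen length limit (the function names are hypothetical). A brute-force search only explores small lengths and proves nothing beyond them; for a two-letter alphabet it stops at length 3, while for three letters it reaches any limit, consistent with Thue's classical construction of infinite square-free words.

```python
def has_repeat(s: str) -> bool:
    """True if s has a nonempty substring of the form gamma + gamma."""
    n = len(s)
    for half in range(1, n // 2 + 1):              # |gamma|
        for start in range(n - 2 * half + 1):
            if s[start:start + half] == s[start + half:start + 2 * half]:
                return True
    return False


def ends_with_repeat(t: str) -> bool:
    """True if some suffix of t is a repeat gamma + gamma."""
    return any(t[-h:] == t[-2 * h:-h] for h in range(1, len(t) // 2 + 1))


def longest_repeat_free(alphabet: str, limit: int) -> int:
    """Brute force: greatest length <= limit of a repeat-free string over alphabet.

    A repeat-free string extended by one character can only gain a repeat
    that ends at the new character, so checking suffixes is enough.
    """
    level, length = [""], 0
    while length < limit:
        level = [s + c for s in level for c in alphabet if not ends_with_repeat(s + c)]
        if not level:
            break
        length += 1
    return length
```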
19 votes, 1 answer
How does the runtime of the Ukkonen's algorithm depend on the alphabet size?
I am concerned with the asymptotic running time of Ukkonen's algorithm, perhaps the most popular algorithm for constructing suffix trees in linear (?) time.
Here is a citation from the book "Algorithms on strings, trees and…
Mikhail Dubov
17 votes, 3 answers
dynamic programming exercise on cutting strings
I have been working on the following problem from this book.
A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of…
Mark
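The cutting exercise above is cut off before it lists its inputs, so the Python sketch below assumes the usual formulation: a string of length n must be split at a given set of positions, splitting a piece of length L costs L, and the goal is the cheapest order of cuts. The function name is hypothetical. The recurrence tries every cut as the first one made inside each interval between neighbouring cut points, which gives O(k^3) time for k cuts.

```python
from functools import lru_cache


def min_cut_cost(n: int, cuts) -> int:
    """Cheapest total cost of making every cut, where splitting a piece of length L costs L.

    Assumed formulation: `cuts` lists positions 0 < c < n at which the string
    of length n must be split, in any order.
    """
    points = [0] + sorted(cuts) + [n]

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Cheapest way to make all cuts strictly between points[i] and points[j].
        if j - i <= 1:
            return 0  # no cut inside this piece
        length = points[j] - points[i]  # whichever cut comes first copies this piece
        return length + min(best(i, k) + best(k, j) for k in range(i + 1, j))

    return best(0, len(points) - 1)
```

For instance, `min_cut_cost(20, [3, 10])` gives 30: cutting at 10 first costs 20 + 10, whereas cutting at 3 first costs 20 + 17.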
15 votes, 2 answers
Why is the base used to compute hashes in Rabin–Karp always primes?
The Rabin–Karp string matching algorithm requires a hash function which can be computed quickly. A common choice is
$$ h(x_0\ldots x_n) = \sum_{i=0}^n b^i x_i, $$
where $b$ is prime (all computations are modulo $2^w$, where $w$ is the width of a…
Saurabh Jain
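The hash formula above can be computed directly, and a rolling update makes Rabin-Karp fast. The Python sketch below assumes $w = 64$; the function names are hypothetical. With the lowest power on $x_0$, as in the formula, sliding the window right drops $x_0$ and divides every remaining term by $b$, and modulo $2^w$ that division exists only when $b$ is odd. Every prime $b > 2$ is odd. This does not claim to explain why $b$ is chosen prime, only one concrete place where its parity matters. Reversing the exponent order (Horner's rule) avoids the inverse altogether.

```python
W = 64                # assumed word width w
MOD = 1 << W          # all arithmetic is modulo 2^w
MASK = MOD - 1


def poly_hash(xs, b: int) -> int:
    """h(x_0 ... x_n) = sum_i b^i * x_i (mod 2^w), straight from the definition."""
    h, power = 0, 1
    for x in xs:
        h = (h + power * x) & MASK
        power = (power * b) & MASK
    return h


def rolling_hashes(xs, m: int, b: int) -> list:
    """Hash of every length-m window of xs, each obtained from the previous in O(1).

    Needs b to be invertible modulo 2^w, i.e. odd; pow(b, -1, MOD) raises
    ValueError for even b.
    """
    if m <= 0 or m > len(xs):
        return []
    b_inv = pow(b, -1, MOD)         # inverse of b modulo 2^w (Python 3.8+)
    top = pow(b, m - 1, MOD)        # weight b^(m-1) of the newest element
    h = poly_hash(xs[:m], b)
    hashes = [h]
    for i in range(m, len(xs)):
        # Remove the oldest term (weight b^0), divide by b, add the newest term.
        h = ((h - xs[i - m]) * b_inv + top * xs[i]) & MASK
        hashes.append(h)
    return hashes
```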
14 votes, 2 answers
Comparison between Aho-Corasick algorithm and Rabin-Karp algorithm
I am working on string searching algorithms that support multiple pattern search. I found two algorithms that seem like the strongest candidates in terms of running time, namely Aho-Corasick and Rabin-Karp. However, I could not find any…
Hawk
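To make the Aho-Corasick side of that comparison concrete, here is a compact Python sketch, assuming nonempty patterns. It builds a trie of the patterns, adds failure links in breadth-first order, and then scans the text once, so the scan takes time linear in the text length plus the number of reported matches. Storing transitions in dictionaries is an implementation choice for the sketch rather than part of the algorithm, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Aho-Corasick: trie edges, failure links and per-state output lists."""
    goto = [{}]   # goto[state][char] -> child state (trie edges only)
    fail = [0]    # failure link: longest proper suffix that is also a trie path
    out = [[]]    # patterns that end when this state is reached
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first order guarantees shallower states are finished first.
    queue = deque(goto[0].values())   # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text, patterns):
    """Yield (start, pattern) for every occurrence of every pattern in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, pat
```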
13 votes, 7 answers
How to check if two strings are permutations of each other using O(1) additional space?
Given two strings, how can you check whether they are permutations of each other using O(1) space? Modifying the strings is not allowed in any way.
Note: O(1) space in relation to both the string length AND the size of the alphabet.
Teodor Dyakov
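One straightforward approach that meets the space bound, at the price of quadratic time, is sketched below in Python, assuming the usual word-RAM convention that a counter up to n fits in O(1) space. The reasoning: if the lengths match and every character of the first string occurs equally often in both, those characters already account for all positions of the second string, so it contains nothing else. The function name is hypothetical.

```python
def is_permutation(a: str, b: str) -> bool:
    """Check whether b is a rearrangement of a without modifying either string.

    Only a couple of integer counters are live at any time, independent of the
    string lengths and of the alphabet size; the cost is O(n^2) time.
    """
    if len(a) != len(b):
        return False
    for ch in a:
        # str.count scans the string and returns a single integer.
        if a.count(ch) != b.count(ch):
            return False
    return True
```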
13 votes, 5 answers
Word Frequency with Ordering in O(n) Complexity
During an interview for a Java developer position, I was asked the following:
Write a function that takes two params:
- a String representing a text document, and
- an integer providing the number of items to return.
Implement the function such…
user2712937
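The interview task is cut off above, so the following is only a sketch under stated assumptions, written in Python rather than Java: it returns the k most frequent words with their counts, treats words as lowercase runs of letters, digits or underscores, and orders ties by first appearance. The O(n) bound relies on expected O(1) hashing plus a bucket sort, which works because no word can occur more often than there are words. The function name is hypothetical.

```python
import re
from collections import Counter


def top_words(text: str, k: int) -> list:
    """Up to k (word, count) pairs, most frequent first, in expected O(n) time.

    Assumptions: words are lowercase runs of \\w characters, and ties keep
    the order in which the words first appear in the text.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))  # hashing: expected O(n)
    # Bucket sort by frequency instead of a comparison sort.
    buckets = [[] for _ in range(max(counts.values(), default=0) + 1)]
    for word, c in counts.items():  # Counter keeps first-appearance order
        buckets[c].append(word)
    result = []
    for c in range(len(buckets) - 1, 0, -1):
        for word in buckets[c]:
            if len(result) == k:
                return result
            result.append((word, c))
    return result
```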
12 votes, 1 answer
Finding the longest repeating subsequence
Given a string $s$, I would like to find the longest repeating (at least twice) subsequence. That is, I would like to find a string $w$ which is a subsequence (not necessarily contiguous) of $s$ such that $w=w' \cdot w' $. That is, $w$ is a…
Dan D-man
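For the formulation $w = w' \cdot w'$ above, one simple cubic-time approach is sketched below in Python (function names hypothetical, and not necessarily what the answer proposes). Every position of the first copy of $w'$ precedes every position of the second, so some split point $k$ puts the first copy in $s[:k]$ and the second in $s[k:]$; the best $w'$ for that split is a longest common subsequence of the two halves.

```python
def lcs(x: str, y: str) -> str:
    """One longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of the suffixes x[i:] and y[j:].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            L[i][j] = L[i + 1][j + 1] + 1 if x[i] == y[j] else max(L[i + 1][j], L[i][j + 1])
    # Walk the table forwards to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def longest_square_subsequence(s: str) -> str:
    """Longest w = w' + w' that is a subsequence of s, in O(n^3) time."""
    best = ""
    for k in range(1, len(s)):
        half = lcs(s[:k], s[k:])  # first copy in s[:k], second in s[k:]
        if len(half) > len(best):
            best = half
    return best + best
```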
12 votes, 1 answer
Edit distance of list with unique elements
Levenshtein edit distance between lists is a well-studied problem. But I can't find much on possible improvements if it is known that no element occurs more than once in each list. Let's also assume that the elements are…
user362178
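As one illustration of how uniqueness can help, the Python sketch below handles a simpler distance than the one asked about: edits restricted to insertions and deletions, with no substitutions. That distance equals $n + m - 2\,\mathrm{LCS}$, and when neither list repeats an element, the LCS is the longest increasing subsequence of the positions in the second list of the first list's shared elements, which takes $O(n \log n)$ time. This reduction does not carry over directly to full Levenshtein distance with substitutions. Hashable elements are assumed, and the function name is hypothetical.

```python
from bisect import bisect_left


def indel_distance_unique(a: list, b: list) -> int:
    """Insert/delete-only edit distance between lists without repeated elements.

    With no repeats, LCS(a, b) is the longest increasing subsequence of the
    b-positions of a's shared elements, so the distance len(a) + len(b) - 2*LCS
    takes O(n log n) time. Elements must be hashable.
    """
    pos_in_b = {x: j for j, x in enumerate(b)}
    seq = [pos_in_b[x] for x in a if x in pos_in_b]
    tails = []  # tails[k] = smallest last value of an increasing run of length k + 1
    for p in seq:
        k = bisect_left(tails, p)
        if k == len(tails):
            tails.append(p)
        else:
            tails[k] = p
    return len(a) + len(b) - 2 * len(tails)
```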