13

Given two strings how can you check if they are a permutation of each other using O(1) space? Modifying the strings is not allowed in any way.
Note: O(1) space in relation to both the string length AND the size of the alphabet.

Teodor Dyakov
  • 1,341
  • 1
  • 13
  • 22

7 Answers7

12

Denote the arrays by $A,B$, and suppose they are of length $n$.

Suppose first that the values in each array are distinct. Here is an algorithm that uses $O(1)$ space:

  1. Compute the minimum values of both arrays, and check that they are identical.

  2. Compute the second minimum values of both arrays, and check that they are identical.

  3. And so on.

Computing the minimum value of an array clearly uses $O(1)$ space. Given the $k$th smallest element, we can find the $(k+1)$st smallest element by finding the minimal value larger than the $k$th smallest element (here we use the fact that all elements are distinct).

When elements are allowed to repeat, we modify the algorithm as follows:

  1. Compute the minimum values $m_{A,1},m_{B,1}$ of both arrays, count how many times each appear, and verify the $m_{A,1} = m_{B,1}$ and that the counts are identical.

  2. Compute the minimum values $m_{A,2},m_{B,2}$ larger than $m_{A,1},m_{B,1}$ in the two arrays (respectively), and count how many times each appear. Verify that $m_{A,2} = m_{B,2}$, and that the counts are identical.

  3. And so on.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514
7

The naive approach would be building histograms of both strings and checking whether they are the same. Since we are not allowed to store such a data structure (whose size would be linear to the size of the alphabet) that could be computed in one pass, we need to count the occurences of each possible symbol after the other:

function count(letter, string)
    var count := 0
    foreach element in string
        if letter = element
            count++
    return count

function samePermutation(stringA, stringB)
    foreach s in alphabet
        if count(s, stringA) != count(s, stringB)
            return false
    return true

This does of course assume that the counts and iterator indices are integers of constant size, instead of being dependent on the length of the strings.

Bergi
  • 610
  • 3
  • 12
2

Define some function f(c) which maps some character c to a unique prime number (a = 2, b = 3, c = 5, etc).

set checksum = 1
set count = 0 <-- this is probably not even necessary, but it's another level of check
for character c in string 1
    checksum = checksum * f(c)
    count = count + 1
for character c in string 2
    checksum = checksum / f(c)
    count = count = 1

permutation = count == 0 and checksum == 1

Just declaring that you can use a prime number mapping function is a bit handwavey, and most likely where a problem would arise keeping $\textit{O}\textbf{(1)}$ space.

Turksarama
  • 129
  • 2
0

You can do this is O(nlogn). Sort the two strings, and compare them index by index. If they differ anywhere, they're not permutations of each other.

For an O(n) solution, hashing could be used. This hashing function would work, and e for any letter would be its ascii value. If the two hashes of the strings differ, they're not permutations of each other.

The hashing function in the link:

One potential candidate might be this. Fix a odd integer R. For each element e you want to hash compute the factor (R + 2*e). Then compute the product of all these factors. Finally divide the product by 2 to get the hash.

The factor 2 in (R + 2e) guarantees that all factors are odd, hence avoiding that the product will ever become 0. The division by 2 at the end is because the product will always be odd, hence the division just removes a constant bit.

E.g. I choose R = 1779033703. This is an arbitrary choice, doing some experiments should show if a given R is good or bad. Assume your values are [1, 10, 3, 18]. The product (computed using 32-bit ints) is

(R + 2) * (R + 20) * (R + 6) * (R + 36) = 3376724311 Hence the hash would be

3376724311/2 = 1688362155.

Using double hashing (or for overkill even more) by changing the value of R would successfully identify them as permutations with very high probability.

Indo Ubt
  • 221
  • 1
  • 12
0

Let's say you have two strings called s and t.

You could use heuristics to make sure that they are not unequal.

  1. s.length == t.length
  2. sum of chars of s == sum of chars in t
  3. [the same as in 2. but with xor instead of sum]

After this you can easily run an algorithm to prove that the string are equal.

  1. sort one string to be equal to the other and compare (O(n^2))
  2. sort both and compare (O(2n log(n))
  3. check for each char in s if there are the same amounts in both strings (O(n^2))

Of course you can't sort that fast if you are not allowed to use additional space. So it does not matter which algorithm you choose - each algorithm will need will run in O(n^2) time when there is only O(1) space and if the heuristic was unable to prove that they can't be equal.

0

In C-style code for the whole routine:

for (int i = 0; i < n; i++) {
   int k = -1;
   next: for (int j = 0; j <= i; j++)
       if (A[j] == A[i]) {
          while (++k < n)
              if (B[k] == A[i])
                  continue next;
          return false; // note at this point j == i
       }
}
return true; 

Or in very verbose pseudo code (using 1-based indexing)

// our loop invariant is that B contains a permutation of the letters
// in A[1]..A[i-1]
for i=1..n
   if !checkLetters(A, B, i)
      return false
return true

where the function checkLetters(A,B,i) checks that if there are M copies of A[i] in A[1]..A[i], then there are at least M copies of A[i] in B:

checkLetters(A,B,i)
    k = 0 // scan index into B
    for j=1..i
      if A[j] = A[i]
         k = findNextValue(B, k+1, A[i])
         if k > n
            return false
    return true

and function findNextValue searches in B for a value starting from an index, and returns the index where it was found (or n+1 if not found).

The idea here is that we have already validated that we have a permutation of the characters before i in the outer loop. In the j loop, we proceed through all the characters that match A[i], and we need to be able to find unique indices in B that match for them (including the i-th position). Thus for each of those, we need to move our scan forward through B. Time is O($n^2$).

MotiNK
  • 553
  • 2
  • 12
0

I think this is the most simplest algorithm (with $O(n^3$) time, $n$ length of strings)

Loop through string1 and string2, for every character check how often it can be found in string1 and string2. I a character is more often in one string than in the other one, it is not a permutation. If the frequencies of all characters are equal then the strings are permutations of each other.

Here is a piece of python to make this precise

s1="abcaba"
s2="aadbba"

def check_if_permutations(string1, string2):
  for string in [string1, string2]:
    # string references string1 
    #  string2, it is not a copy
    for char in string:
      count1=0
      for char1 in string1:
        if  char==char1:
          count1+=1
      count2=0
      for char2 in string2:
        if  char==char2:
          count2+=1
      if count1!=count2:
        print('unbalanced character',char)
        return()
  print ("permutations")
  return()

check_if_permutations(s1,s2)

The program needs some pointers to strings (string, string1, string2, char, char1, char2) and variables of size $O(\log n)$ for counting (count1, count2). It needs to check if characters are equal or not but it dows not need any order on these characters. Maybe it needs some variables for small integers (e.g. to hold boolean values or to represent the postition of string in [string1, string2].

Of course you don't even need the count variables but can use pointers.

s1="abcaba"
s2="aadbba"

def check_if_permutations(string1, string2):
  for string in [string1, string2]:
    # string references one of string1 
    # or string2, it is not a copy
    for char in string:
      # p1 and p2 should be views as pointers
      p1=0
      p2=0
      while (p1<len(string1)) and (p2<len(string2)):
        # p1>=len(string1): p1 points to beyond end of string
        while (p1<len(string1)) and (string1[p1]!=char) :
          p1+=1
        while(p2<len(string2)) and (string2[p2]!=char):
          p2+=1
        if (p1<len(string1)) != (p2<len(string2)):
          print('unbalanced character',char)
          return()
        p1+=1
        p2+=1
  print ("permutations")
  return()

check_if_permutations(s1,s2)

This second program needs similar variables as the first one except that it does not need the $O(\log(n))$-size variables for holding the count values.

So it actually does not depend of $n$ or the size of the alphabet.

miracle173
  • 552
  • 2
  • 16