3

In the Rabin-Karp algorithm the rolling hash is calculated as follows:

$H_1 = c_1 a^{k-1} + c_2 a^{k-2} + c_3 a^{k-3} + \dots + c_k a^0$

where a is a constant. On what basis is a selected? In Cormen the value 10 is used, and in some other places it is 26. From what I have seen, a = 10 is used when the string consists of digits and a = 26 when it consists of English letters. Is it not possible to compute the hash value using a = 10 for English characters? If not, what flaws does that introduce, or is my assumption wrong? I would appreciate help.

Bulat
Navjot Singh

3 Answers

4

You can use any value you want, but it is best to choose a value that is relatively prime to the modulus, as that reduces the number of hash collisions.

For efficiency you might want to choose a value that is small (like 3), as that may make some computations faster.
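To make the role of the constant concrete, here is a minimal sketch of a Rabin-Karp search in which the base `a` and the modulus `m` are plain parameters. The function name `rabin_karp` and the default values (`a=256`, `m=1_000_000_007`) are assumptions chosen for illustration, not values taken from any textbook. Whatever base is used, the result stays correct because every hash hit is confirmed by a direct comparison; the base only affects how often spurious hits occur.

```python
def rabin_karp(text, pattern, a=256, m=1_000_000_007):
    """Return all shifts at which pattern occurs in text (illustrative helper)."""
    k, n = len(pattern), len(text)
    if k == 0 or k > n:
        return []
    high = pow(a, k - 1, m)  # weight a^(k-1) of the leading character
    hp = ht = 0
    for i in range(k):
        hp = (hp * a + ord(pattern[i])) % m  # hash of the pattern
        ht = (ht * a + ord(text[i])) % m     # hash of the first window
    matches = []
    for s in range(n - k + 1):
        # Equal hashes only suggest a match; compare the strings to rule out a collision.
        if hp == ht and text[s:s + k] == pattern:
            matches.append(s)
        if s + k < n:
            # Roll the window: drop text[s], append text[s + k].
            ht = ((ht - ord(text[s]) * high) * a + ord(text[s + k])) % m
    return matches


print(rabin_karp("abracadabra", "abra"))        # [0, 7]
print(rabin_karp("abracadabra", "abra", a=3))   # [0, 7], a small base also works
```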

D.W.
4
  1. Hash values, by definition, may collide.
  2. This hash is computed (implicitly) modulo 2^N if you use N-bit integers.
  3. The recommended value of "a" is any large prime number, e.g. 123456791.

With a=10 it will definitely still work, although you can easily build a hash collision. With a=123456791 it may take more time to build a hash collision, but collisions still exist. As D.W. said, it's better to have "a" coprime to 2^N and to the hash size, so neither a=10 nor a=26 is the best choice, especially if you compute hashes of strings longer than N chars: for an even "a", a^N is divisible by 2^N, so characters N or more positions before the last one no longer affect the hash.
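The effect of an even base with N-bit arithmetic can be checked with a small experiment. This is only a sketch: the helper name `poly_hash` and the choice N = 8 are assumptions made to keep the numbers small. Two strings of N + 1 characters that differ only in their first character are hashed with an even base (10) and with an odd base (11).

```python
def poly_hash(s, a, bits):
    """Polynomial hash of s with base a, reduced modulo 2**bits (illustrative helper)."""
    mask = (1 << bits) - 1
    h = 0
    for ch in s:
        h = (h * a + ord(ch)) & mask  # keep only the low `bits` bits
    return h


N = 8
s1 = "A" + "x" * N  # N + 1 characters
s2 = "Z" + "x" * N  # differs from s1 only in the first character

# 10**8 is divisible by 2**8, so the first character's weight is 0 mod 2**8.
print(poly_hash(s1, 10, N) == poly_hash(s2, 10, N))  # True: collision

# 11**8 is odd, so the first character still influences the hash.
print(poly_hash(s1, 11, N) == poly_hash(s2, 11, N))  # False
```

With the even base the leading character is lost entirely, so any two strings sharing their last N characters collide; with the odd base that particular failure does not occur, although other collisions remain possible.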

One more note: this formula is a detail of the rolling hash, not of Rabin-Karp itself. The Rabin-Karp string search algorithm can be used with an arbitrary rolling hash; moreover, AFAIR, their own paper described the algorithm using CRC as the rolling hash.

Bulat
0

If the hash function uses base 10 with letter codes, different strings can get the same hash value, so the hash reports matches that are not real.

Text : ABAAAK

Pattern : ABA

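Let's compute the hash of the pattern P using the value 10 and the ASCII codes of the letters (A = 65, B = 66, K = 75):

$ABA = 65 \cdot 10^2 + 66 \cdot 10^1 + 65 \cdot 10^0 = 7225$

We find a match at shift 0, where $ABA = ABA = 7225$. But the hash values also match at shift 3, because $AAK = 65 \cdot 10^2 + 65 \cdot 10^1 + 75 \cdot 10^0 = 7225$, even though AAK is not the pattern.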

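This is the flaw of a base-10 hash function when matching strings of uppercase letters: the letter codes span 26 different values, which is more than the base, so a change in one position can be cancelled by a change in the next. If you use base 26 for this problem, $ABA = 45721 \neq AAK = 45705$.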
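The arithmetic can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the example: characters are mapped to their ASCII codes with `ord`, no modulus is applied, and `poly_value` is a hypothetical helper name.

```python
def poly_value(s, a):
    """Evaluate c1*a^(k-1) + ... + ck*a^0 using ASCII codes, with no modulus."""
    h = 0
    for ch in s:
        h = h * a + ord(ch)  # Horner's rule
    return h


for a in (10, 26):
    print(a, poly_value("ABA", a), poly_value("AAK", a))
# 10 7225 7225
# 26 45721 45705
```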

Please try it yourself.

molamola