Suppose that you had two trees.

Our goal is to convert the two trees into two integers such that two trees are the same if and only if the two integers are the same.

Suppose that we have a function named leaf_hash() which converts the leaves of the trees into integers such that, for any two leaves x1 and x2, we have x1 == x2 if and only if leaf_hash(x1) == leaf_hash(x2).

I want to convert trees, each having many nodes, into integers so that two trees can be compared for equality.

2 Answers

What you are looking for may be an encoding rather than a hash.

This can be done in a few steps, although the resulting integers can be very large.

  • given two integers $a$ and $b$, let $\langle a, b\rangle$ denote an encoding of the pair $(a, b)$. The expression $\langle a, b\rangle = \dfrac{(a+b)(a+b+1)}2 + a + 1$ is suitable, since it defines a bijection from $\mathbb{N}^2$ to $\mathbb{N} \setminus \{0\}$, which leaves $0$ free to encode the empty list below.
  • given a list $\ell = (a_1, …, a_n)$ of integers, define $\langle \ell \rangle = 0$ if $n = 0$ or recursively $\langle \ell \rangle = \langle a_1, \langle a_2, …, a_n\rangle \rangle$.
  • given a tree $t$, define $\langle t\rangle = 0$ if $t$ is the empty tree, or, if $t = \text{Node}(a, t_1, t_2, …, t_n)$, $\langle t \rangle = \langle a, \langle t_1\rangle, \langle t_2 \rangle, …, \langle t_n\rangle\rangle$.

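This is indeed a bijection from trees to integers. In particular it is injective, so it can be used to test equality between two trees. For example, if $t = \text{Node}(5, \text{Leaf}(3), \text{Leaf}(2))$, where $\text{Leaf}(x)$ denotes $\text{Node}(x)$ with no children, then $\langle \text{Leaf}(3)\rangle = \langle 3, 0\rangle = 10$, $\langle \text{Leaf}(2)\rangle = \langle 2, 0\rangle = 6$, and $\langle t\rangle = \langle 5, \langle 10, \langle 6, 0\rangle\rangle\rangle = 286909$.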
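The three definitions above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the answer: a tree is represented as `None` (empty) or a tuple `(label, [children])`, labels are non-negative integers, and the names `pair`, `encode_list`, `encode_tree` and `leaf` are made up for this sketch.

```python
def pair(a: int, b: int) -> int:
    """Shifted Cantor pairing (a+b)(a+b+1)/2 + a + 1; the result is always >= 1."""
    return (a + b) * (a + b + 1) // 2 + a + 1


def encode_list(items) -> int:
    """Encode a list of non-negative integers; the empty list maps to 0."""
    code = 0
    # Build <a_1, <a_2, ..., a_n>> from the innermost pair outwards.
    for x in reversed(items):
        code = pair(x, code)
    return code


def encode_tree(tree) -> int:
    """Encode a tree given as None (empty tree) or (label, [children])."""
    if tree is None:
        return 0
    label, children = tree
    return encode_list([label] + [encode_tree(c) for c in children])


def leaf(label: int):
    """Hypothetical helper: a leaf is a node with no children."""
    return (label, [])


t = (5, [leaf(3), leaf(2)])
print(encode_tree(t))  # 286909
```

Python integers have arbitrary precision, so the rapid growth of the codes is not a correctness problem here, only a cost problem.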

Nathaniel

Hash codes will not be identical "if and only if the data is identical". There are more possible trees than possible hash codes, so there will be many trees with the same hash code. What you need to achieve: Two trees that compare equal have the same hash code, and two trees with different hash codes don't compare equal.
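As a rough illustration of that contract, the sketch below uses the hash only as a quick rejection test and falls back to a full comparison when the hashes agree. It assumes, for the moment, that two trees are equal when they have the same labels and the same shape, that a tree is a tuple `(label, [children])` with integer labels, and that Python's built-in `hash()` stands in for whatever hash function is actually used. The names `tree_hash` and `trees_equal` are made up for this example.

```python
def tree_hash(tree) -> int:
    """Fixed-width hash of a tree given as (label, [children]); child order matters."""
    label, children = tree
    return hash((label, tuple(tree_hash(c) for c in children)))


def trees_equal(t1, t2) -> bool:
    """Compare two trees, using the hash only to reject quickly."""
    if tree_hash(t1) != tree_hash(t2):
        return False  # different hash codes: the trees cannot be equal
    return t1 == t2   # same hash code: a full comparison is still required


a = (5, [(3, []), (2, [])])
b = (5, [(3, []), (2, [])])
c = (5, [(2, []), (3, [])])
print(trees_equal(a, b), trees_equal(a, c))
```

In practice the hash would be computed once and cached with the tree, so that the rejection test is cheaper than the comparison it avoids.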

First question: When do you consider two trees equal? A tree is really just a sorted list, with some structure to find things quickly, and to add and remove items quickly (faster than in a sorted list stored in an array). So I would consider two trees equal if they are each a sorted list of items with equal hash codes. The tree structure can be different, but we usually don't care about that.

You need an algorithm that calculates the same hash value regardless of the actual tree structure. The brute-force approach is to compute one hash code from the hash values of all elements, visited in sorted order. A cheaper alternative is to hash only a few sample elements, for example those at indices k/8, 3k/8, 5k/8 and 7k/8 of the sorted order, where k is the number of elements.
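Both variants might look roughly like the following for a binary search tree. This is a sketch under assumptions of my own: the `Node` class, the multiplier 31, the 64-bit mask, and the use of Python's `hash()` for the per-element hash are all illustrative choices, not part of the answer.

```python
from typing import Iterator, Optional


class Node:
    """Hypothetical binary search tree node."""

    def __init__(self, key: int,
                 left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.key = key
        self.left = left
        self.right = right


def in_order(node: Optional[Node]) -> Iterator[int]:
    """Yield the keys in sorted order, whatever the shape of the tree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


def content_hash(root: Optional[Node]) -> int:
    """Brute force: combine the hashes of all keys in sorted order."""
    h = 0
    for key in in_order(root):
        h = (h * 31 + hash(key)) & 0xFFFFFFFFFFFFFFFF
    return h


def sampled_hash(root: Optional[Node]) -> int:
    """Hash the size plus the keys at indices k/8, 3k/8, 5k/8 and 7k/8."""
    # Listing all keys is O(k); a tree that stores subtree sizes could
    # fetch the four samples by index without a full traversal.
    keys = list(in_order(root))
    k = len(keys)
    sample = tuple(keys[i * k // 8] for i in (1, 3, 5, 7)) if k else ()
    return hash((k, sample))


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(content_hash(balanced) == content_hash(chain))  # True
print(sampled_hash(balanced) == sampled_hash(chain))  # True
```

The sampled version produces more collisions, since trees that differ only in unsampled elements hash alike, so it trades hash quality for speed.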

gnasher729