Let $X=\{x_1,x_2,x_3,\dots,x_n\}$ be a set of $n$ distinct keys. I read some posts about randomly built binary search trees, and the following question came to mind. Suppose we generate a random permutation $X'$ of $X$ and insert the first half of $X'$ into an empty binary search tree $T$. We then sort the second half of $X'$ and insert its elements in sorted order. Can we conclude that the expected height of $T$ is $O(\log n)$? I suspect the answer is yes, because the expected height of $T$ after inserting the first half is $O(\log \frac{n}{2})$, but I can't convince myself that the expected height remains $O(\log n)$ after inserting the second half.

ErroR

1 Answer

Suppose for simplicity that $n$ is even and let $S$ be the sorted list of elements. Let the elements that end up in the first half of your permutation be "red", while the others are "blue".

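Consider a leaf $v$ of the tree $T$ built on the first half of the elements in the permutation, and notice that $v$ stores some red element $x$. After inserting the second half of the elements, the height of the subtree of $T$ rooted at $v$ is upper bounded by the maximum number of contiguous blue elements immediately preceding or following $x$ in $S$. Indeed, every blue element that ends up below $v$ belongs to one of these two runs, and each run, being inserted in sorted order, forms a single path. The same reasoning applies to blue elements attached to an empty child slot of an internal node: they all come from one run and form a single path.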

Therefore, we just need to bound the length of the longest run of blue elements in $S$.

To this aim, let $\ell=\lceil 2 \log n \rceil-1$ (logarithms are base 2) and fix an element $y$ of $S$ that is followed by at least $\ell$ elements (we fix $y$ before knowing whether it will be red or blue). For $i=0,\dots,\ell$, let $Y_i$ be the event "the $i$-th element following $y$ is blue", where $Y_0$ is the event that $y$ itself is blue. We want to estimate the probability that $y$ is the first of $\ell+1$ consecutive blue elements. Formally, we want to upper bound the probability of the event $\bigcap_{i=0}^\ell Y_i$. Since exactly half of the elements are blue, $\Pr(Y_i)=\frac{1}{2}$, and conditioning on earlier elements being blue can only lower the probability that the next one is blue. Hence

$$ \Pr\left(\bigcap_{i=0}^\ell Y_i\right) = \prod_{i=0}^\ell \Pr(Y_i \mid Y_0, \dots, Y_{i-1}) \le \prod_{i=0}^\ell \Pr(Y_i) = \frac{1}{2^{\ell+1}} \le \frac{1}{2^{2\log n}} = \frac{1}{n^2}. $$
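
The step that replaces each conditional probability by $\Pr(Y_i)$ can be checked directly. A short derivation, assuming (as in the setup above) that exactly $n/2$ of the $n$ elements are blue and that every choice of the blue set is equally likely: once the $i$ elements before it in the window are known to be blue, only $n/2-i$ blue elements remain among the other $n-i$ positions, so for $i \le n/2$

$$ \Pr(Y_i \mid Y_0, \dots, Y_{i-1}) = \frac{n/2-i}{n-i} = \frac{1}{2} - \frac{i}{2(n-i)} \le \frac{1}{2} = \Pr(Y_i), $$

and the conditional probability is $0$ for $i > n/2$, since no blue elements are left.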

Taking the union bound over all possible elements $y$ (i.e., over all possible starting positions of the run), we can conclude that the probability that there exists some run of at least $2\log n$ blue elements in $S$ is at most $n \cdot \frac{1}{n^2} = \frac{1}{n}$.
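
The bound on the longest blue run can be sanity-checked empirically. The following is only a minimal sketch, not part of the proof: the function names `longest_blue_run` and `estimate_failure_rate` are hypothetical, logarithms are taken base 2, and the colouring is modelled by picking a uniformly random half of the sorted positions to be blue.

```python
import math
import random


def longest_blue_run(n, rng):
    """Colour a uniformly random half of n sorted positions blue
    and return the length of the longest run of consecutive blue positions."""
    blue = [False] * n
    for i in rng.sample(range(n), n // 2):
        blue[i] = True
    best = cur = 0
    for is_blue in blue:
        cur = cur + 1 if is_blue else 0  # extend the current run or reset it
        best = max(best, cur)
    return best


def estimate_failure_rate(n, trials, seed=0):
    """Fraction of trials in which some blue run has length >= 2*log2(n)."""
    rng = random.Random(seed)
    threshold = 2 * math.log2(n)
    bad = sum(longest_blue_run(n, rng) >= threshold for _ in range(trials))
    return bad / trials
```

For example, `estimate_failure_rate(1024, 1000)` returns the observed fraction of bad colourings, which can be compared against the bound $\frac{1}{n}$.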

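As a consequence, with probability at least $1-\frac{1}{n}$ the blue elements increase the height of the tree by at most $2 \log n$. The tree built on the red elements is a random binary search tree on $\frac{n}{2}$ keys, whose height is $O(\log n)$ both in expectation and with high probability, so the height of $T$ is $O(\log n)$ with high probability. Since the height never exceeds $n$, the event of probability at most $\frac{1}{n}$ in which a longer blue run exists contributes at most $n \cdot \frac{1}{n} = 1$ to the expectation, and therefore the expected height of $T$ is $O(\log n)$ as well.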
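
The whole process can also be simulated. This is only an illustrative sketch under stated assumptions: the keys are taken to be $0,\dots,n-1$, height is counted in nodes on the longest root-to-leaf path, and the helper names `build_height` and `half_random_half_sorted` are hypothetical.

```python
import random


def build_height(keys):
    """Insert distinct keys into an initially empty, unbalanced BST in the
    given order and return its height, counted in nodes (0 for no keys)."""
    left, right = {}, {}  # child maps: key of the parent -> key of the child
    root = None
    height = 0
    for k in keys:
        if root is None:
            root, height = k, 1
            continue
        node, depth = root, 1
        while True:
            children = left if k < node else right
            nxt = children.get(node)
            depth += 1
            if nxt is None:
                children[node] = k  # attach the new key as a leaf
                break
            node = nxt
        height = max(height, depth)
    return height


def half_random_half_sorted(n, rng):
    """Random permutation of 0..n-1 whose second half is then sorted."""
    perm = list(range(n))
    rng.shuffle(perm)
    half = n // 2
    return perm[:half] + sorted(perm[half:])
```

Calling `build_height(half_random_half_sorted(n, random.Random(seed)))` for growing `n` and several seeds lets one compare the observed heights against a multiple of $\log n$.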

Steven