Question

Prove that building a binary heap takes at most $2n-2$ comparisons between elements, where $n$ is the total number of nodes.


Pseudocode

BUILD-MAX-HEAP(A)
    heap-size[A] <- length[A]
    for i <- ⌊length[A]/2⌋ downto 1
        do MAX-HEAPIFY(A, i)

MAX-HEAPIFY(A, i)
    l <- LEFT(i)
    r <- RIGHT(i)
    if l <= heap-size[A] and A[l] > A[i]
        then largest <- l
        else largest <- i
    if r <= heap-size[A] and A[r] > A[largest]
        then largest <- r
    if largest ≠ i
        then exchange A[i] with A[largest]
             MAX-HEAPIFY(A, largest)
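To experiment with the bound, the pseudocode can be sketched in Python with a comparison counter. This is only an illustration: the function name `build_max_heap`, the 0-based indexing, and the iterative form of MAX-HEAPIFY are my own choices, not part of the original pseudocode.

```python
def build_max_heap(a):
    """Turn list `a` into a max-heap in place.

    Returns the number of comparisons between elements.
    Indices are 0-based here, unlike the 1-based pseudocode.
    """
    n = len(a)
    comparisons = 0

    def max_heapify(i):
        nonlocal comparisons
        # Iterative version of the recursive MAX-HEAPIFY.
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < n:
                comparisons += 1  # A[l] > A[i]
                if a[left] > a[i]:
                    largest = left
            if right < n:
                comparisons += 1  # A[r] > A[largest]
                if a[right] > a[largest]:
                    largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # Last internal node down to the root.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(i)
    return comparisons


data = list(range(1, 10))
count = build_max_heap(data)
print(count, "comparisons; claimed limit:", 2 * len(data) - 2)
```

Counting only happens when the child index is inside the heap, which mirrors the short-circuit `and` in the pseudocode.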

What I tried

I know that the number of comparisons between elements in binary heap build is at most:

$$ \sum_{i=0}^{\lfloor \log_{2}{n} \rfloor} \bigg \lceil \frac{n}{2^{i+1}} \bigg \rceil \cdot 2i $$

where $n \in \mathbb{N}$.

This holds because the number of nodes of height $i$ is at most $\bigg \lceil \frac{n}{2^{i+1}} \bigg \rceil$, each such node needs at most $2i$ comparisons (two per level it sinks), and the height of the tree is $\lfloor \log_{2}{n} \rfloor$.

I want to claim that: $$ \sum_{i=0}^{\lfloor \log_{2}{n} \rfloor} \bigg \lceil \frac{n}{2^{i+1}} \bigg \rceil \cdot 2i \le 2n-2 $$

However, there exists a counterexample $n = 9$.
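The counterexample can be checked numerically. The sketch below evaluates the sum exactly as written above; the helper name `sum_bound` is made up for this illustration.

```python
def sum_bound(n):
    """Evaluate sum_{i=0}^{floor(log2 n)} ceil(n / 2^(i+1)) * 2i."""
    height = n.bit_length() - 1  # floor(log2(n)) for n >= 1
    total = 0
    for i in range(height + 1):
        nodes = -(-n // 2 ** (i + 1))  # integer ceiling division
        total += nodes * 2 * i
    return total


print(sum_bound(9), 2 * 9 - 2)  # 20 16
```

Note that the sum is only an upper bound on the comparison count, so the sum exceeding $2n-2$ does not by itself refute the $2n-2$ bound.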

I have 2 questions:

  • Is this claim true?
  • How can I deduce that? (I tried)

A similar question has been asked here, but I didn’t understand the solution, which also assumed the tree is full.

Kenneth Kho
Lior

1 Answer

The first two proofs only work for a fully populated heap of height $h$, which contains $n=2^h-1$ items. The third proof also works for partial heaps. Throughout this answer, I will use two functions called BuildCost and MergeCost. The build cost is the cost of constructing a heap of height $h$ from scratch. The merge cost is the cost of merging two heaps of height $h$ and one unsorted item into a single heap of height $h+1$.

Your approach, revisited

I used h-1 and n+1 instead of floor/ceiling, so my proof is only valid for a fully populated heap as described in the intro. All divisions and logarithms are exact in my solution. A full heap of height $h$ has $2^h/2^{i+1}$ nodes at height $i$ (leaves at $i=0$), so:

$$ \begin{align*} \text{BuildCost}(h) &= \sum_{i=0}^{h-1}\frac{2^{h}}{2^{i+1}}\cdot\text{MergeCost}(i) \\ &= \sum_{i=0}^{h-1}\frac{2^{h}}{2^{i+1}}\cdot2i \\ &= 2^h\sum_{i=0}^{h-1}\frac{i}{2^i} \\ &= (n+1)\sum_{i=0}^{h-1}\frac{i}{2^i} \end{align*} $$

At this point, we can use the identity:

$$\sum_{m=0}^M\frac{m}{2^m} = 2-\frac{M+2}{2^M}$$

to solve the build cost as:

$$ \begin{align*} \text{BuildCost}(h) &= 2^h\sum_{i=0}^{h-1}\frac{i}{2^i} \\ &= 2^h\left(2 - \frac{h-1+2}{2^{h-1}} \right)\\ &= 2\cdot2^h - 2h -2 \\ &= 2(n+1) - 2h - 2 \\ &= 2n - 2h \\ \end{align*} $$
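As a sanity check, the closed form can be compared against the sum for small heights. This sketch assumes a full heap with $2^{h-1-i}$ nodes at height $i$ (leaves at $i=0$) and $2i$ comparisons per node; the function names are invented for the illustration.

```python
from fractions import Fraction


def build_cost_sum(h):
    """Sum over heights i of (nodes at height i) * 2i for a full heap."""
    return sum(2 ** (h - 1 - i) * 2 * i for i in range(h))


def identity_lhs(M):
    """Left-hand side of sum_{m=0}^{M} m / 2^m, computed exactly."""
    return sum(Fraction(m, 2 ** m) for m in range(M + 1))


for h in range(1, 11):
    n = 2 ** h - 1
    assert build_cost_sum(h) == 2 * n - 2 * h

for M in range(0, 11):
    assert identity_lhs(M) == 2 - Fraction(M + 2, 2 ** M)

print(build_cost_sum(3))  # 8
```

For $h=3$ ($n=7$) both sides give 8, which matches $2n-2h = 14-6$.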

Alternative: proof by induction

Here is a proof by induction, which confirms the results above. I like it better because it doesn't rely on a non-obvious identity like the other proof.

Induction hypothesis:

$$ \begin{align*} \text{BuildCost}(h) &= 2n - 2h\\ \text{MergeCost}(h) &= 2h \\ n &= 2^h-1 \end{align*} $$

The base case is a heap of height one, which requires zero comparisons to build:

$$ \text{BuildCost}(1) = 2(2^1-1) - 2\cdot1 = 0 $$

The induction step is:

$$ \begin{align*} \text{BuildCost}(h+1) &= 2(2^{h+1}-1) - 2(h+1)\\ &= 4\cdot2^h -2h -4 \\ &= 4\cdot2^h -4h -4 &+& 2h \\ &= 2(2\cdot2^h -2h -2) &+& 2h \\ &= 2\cdot(2(2^h-1)-2h) &+& 2h \\ &= 2\cdot\text{BuildCost}(h) &+& \text{MergeCost}(h) \end{align*} $$
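The recurrence from the induction step can also be unrolled directly and compared with the closed form. A minimal sketch, where `build_cost_recursive` is a hypothetical helper that builds two heaps of height $h$ and merges them for $2h$ comparisons:

```python
def build_cost_recursive(h):
    """BuildCost(1) = 0; BuildCost(h+1) = 2 * BuildCost(h) + 2h."""
    cost = 0  # a single node needs no comparisons
    for height in range(1, h):
        cost = 2 * cost + 2 * height  # two sub-heaps plus one merge
    return cost


for h in range(1, 16):
    n = 2 ** h - 1
    assert build_cost_recursive(h) == 2 * n - 2 * h

print([build_cost_recursive(h) for h in range(1, 5)])  # [0, 2, 8, 22]
```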

Extension to partial heaps

The build cost for a full heap is 2n-2h. We can think of the -2h term as a "merge budget" so that when we merge the heap, we're still below the 2n limit. If we merge two heaps of the same height, both sides bring a budget of 2h and the unsorted item brings a budget of 2, so we end up with 4h+2 of which the merge itself consumes 2h. What remains is 2(h+1) and this is exactly the merge budget that we want to pass up the tree. So far, this paragraph is a repeat of the induction step.

Now let's do the induction step again, but this time the left child has height $h$ and the right child has height $h-1$. The left child has $n_l$ items (fully populated or not) and a build cost of $2n_l-2h$. The right child has $n_r$ items (fully populated or not) and a build cost of $2n_r-2(h-1)$. My working hypothesis is that all heaps of height $h$ have a build cost of $2n-2h$, regardless of whether they are fully populated. This is false, but for now let's pretend it's true.

$$ \begin{align*} \text{BuildCost} &= \text{BuildCost}(h) + \text{BuildCost}(h-1) + \text{MergeCost}(h)\\ &= 2n_l - 2h + 2n_r - 2(h-1) + 2h \\ &= 2n_l + 2n_r + 2 - 2h \\ &= 2(n_l + n_r + 1) - 2h \\ &\neq 2(n_l + n_r + 1) - 2(h+1) \end{align*} $$

The merged heap has $n_l+n_r+1$ items and height $h+1$. According to my working hypothesis, it should have a build cost of $2(n_l+n_r+1)-2(h+1)$. This is wrong: the actual build cost requires two extra comparisons. Using the terminology of "merge budget" again, we can clearly see what went wrong. The shorter child's merge budget was only $2(h-1)$, yet the merge required $2h$ comparisons in the worst case (recursion into the taller child), so we're now two comparisons short of a balanced budget.

In conclusion, a heap of $n$ items has a build cost of:

$$\text{BuildCost} = 2n-2h+2k $$

where $h$ is the height and $k$ is the number of asymmetric mergers (left child one taller than right child) that occurred during construction of the heap. Now we have to prove that $k<h$ or, in other words, that there is at most one asymmetric merger per (non-leaf) layer. This is indeed true. Here is a small example of a heap with $n=10$, $h=4$. The letter x denotes a missing leaf (compared to a fully populated heap).

         1
    2         3
 4     5    6   7
8 9  10 x  x x x x

Since we populate the heap breadth first and from left to right, there can be only one partially populated node per layer. In this example, only nodes 1 and 5 have children of unequal height, so k=2. Here's another example, with n=8, where nodes 1, 2, and 4 have children of unequal height, so k=3. That's the maximum asymmetry we can achieve in a heap of height 4.

         1
    2         3
 4     5    6   7
8 x   x x  x x x x
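The two diagrams can be reproduced mechanically. The sketch below lists the nodes whose left subtree is exactly one level taller than the right subtree, using 1-based heap indices as in the diagrams; `subtree_levels` and `asymmetric_nodes` are names made up for this illustration.

```python
def subtree_levels(i, n):
    """Number of levels in the subtree rooted at 1-based index i (0 if i > n)."""
    levels = 0
    while i <= n:
        levels += 1
        i *= 2  # the leftmost path is the longest in a heap-shaped tree
    return levels


def asymmetric_nodes(n):
    """Nodes whose left child is one level taller than the right child."""
    return [
        i
        for i in range(1, n + 1)
        if subtree_levels(2 * i, n) == subtree_levels(2 * i + 1, n) + 1
    ]


print(asymmetric_nodes(10))  # [1, 5]
print(asymmetric_nodes(8))   # [1, 2, 4]
```

In both cases the number of asymmetric nodes stays below the height of 4.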

But wait, there's more. We can find the asymmetric mergers from the binary representation of the number of missing leaves. In the n=10 example, we had 5 missing leaves. 5 is 101 in binary, which tells us there's an asymmetric merger in the top and bottom layer but not in the middle layer (we ignore the leaf layer here, so top/middle/bottom refers to the non-leaf layers). The n=8 example had 7 missing leaves, which is 111 in binary. I think you can see a pattern. Finally, the build cost of an arbitrary heap is:

$$\text{BuildCost} = 2n-2h+2\cdot\text{PopCount}(2^h-1-n) \le 2n-2 $$

The inequality holds because fewer than $2^{h-1}$ leaves can be missing, so $2^h-1-n$ has at most $h-1$ one-bits.
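The final formula is easy to evaluate for any $n$. A short sketch, assuming the height $h$ counts levels (so $h$ is the bit length of $n$); the name `build_cost_bound` is hypothetical.

```python
def build_cost_bound(n):
    """2n - 2h + 2 * PopCount(2^h - 1 - n), with h = number of levels."""
    h = n.bit_length()
    missing_leaves = 2 ** h - 1 - n
    k = bin(missing_leaves).count("1")  # population count
    return 2 * n - 2 * h + 2 * k


print(build_cost_bound(10))  # 16
print(build_cost_bound(8))   # 14

# The formula never exceeds 2n - 2 in the tested range.
assert all(build_cost_bound(n) <= 2 * n - 2 for n in range(1, 10_000))
```

For $n=8$ the formula hits $2n-2 = 14$ exactly, which is the case of maximum asymmetry from the second diagram.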

Rainer P.