
The difference between the average codeword length and the entropy tells you how far a code is from optimal. The optimal case is when the average length equals the entropy. For example, if the average length is 1 and the entropy is 0.72, then 1 − 0.72 = 0.28, which I read as 28% worse than optimal.

If the average length and the entropy are both 1, the compression is optimal.

But what does it mean if the difference is a negative value?

Note: the entropy is $-\sum_i p_i \log_2 p_i$ and the average length is $\sum_i p_i \ell_i$, where $p_i$ is the probability of the $i$th symbol and $\ell_i$ is the length of its codeword in bits.
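
As an illustration (not part of the original question), here is a minimal Python sketch that computes the entropy, the average codeword length, and their difference; the probabilities and codeword lengths are made-up example values.

```python
# Minimal sketch: entropy vs. average codeword length for an example code.
# The probabilities and codeword lengths below are made-up illustrative values.
import math

probs   = [0.4, 0.3, 0.3]   # assumed symbol probabilities
lengths = [1, 2, 2]         # assumed codeword lengths, in bits

entropy    = -sum(p * math.log2(p) for p in probs)       # -sum p_i * lg(p_i)
avg_length = sum(p * l for p, l in zip(probs, lengths))  # sum p_i * l_i

print(f"entropy        = {entropy:.3f} bits/symbol")
print(f"average length = {avg_length:.3f} bits/symbol")
print(f"difference     = {avg_length - entropy:.3f} bits/symbol")
```

With a valid prefix code and correctly normalized probabilities, the difference cannot be negative, as the answer below explains.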

Dave Clarke

1 Answer


Shannon's source coding theorem shows that you cannot compress data below its entropy, even if you encode chunks of symbols at a time. For the specific case of prefix-free codes (indeed, for all uniquely decodable codes), this follows from Kraft's inequality, which states that the codeword lengths $\ell_i$ of a uniquely decodable code satisfy $$ \sum_i 2^{-\ell_i} \leq 1. $$ Furthermore, every code can be "improved" (by only decreasing its codeword lengths) so that equality holds.

Suppose that the probability of the $i$th symbol is $p_i$. The average codeword length is $E = \sum_i p_i \ell_i$. We want to minimize $E$ subject to the constraint $\sum_i 2^{-\ell_i} = 1$, where the $\ell_i \geq 0$ are integers. Relaxing integrality, we can find the optimum using Lagrange multipliers: it satisfies $p_i = \lambda 2^{-\ell_i} \log_e 2$ for some $\lambda$. Summing over $i$ and using $\sum_i p_i = 1$ together with the constraint gives $\lambda \log_e 2 = 1$, and so $2^{-\ell_i} = p_i$. The value of $E$ at this optimum is therefore a lower bound for every valid code: $$ E \geq \sum_i p_i (-\log_2 p_i). $$ The right-hand side is exactly the entropy of the distribution.
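
To see the bound in action, here is a small Python sketch (my own illustration, not from the answer) that builds a binary Huffman code for an arbitrary example distribution, checks that Kraft's inequality holds with equality, and confirms that the average length is at least the entropy.

```python
# Sketch only: binary Huffman coding via a min-heap; the probabilities below
# are arbitrary example values chosen for illustration.
import heapq, itertools, math

def huffman_lengths(probs):
    """Return the codeword lengths of a binary Huffman code for `probs`."""
    counter = itertools.count()                     # tie-breaker for the heap
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)          # two least likely subtrees
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:                     # merged symbols gain one bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), syms1 + syms2))
    return lengths

probs   = [0.4, 0.3, 0.2, 0.1]                      # example distribution
lengths = huffman_lengths(probs)

kraft   = sum(2 ** -l for l in lengths)             # Huffman codes are "full"
entropy = -sum(p * math.log2(p) for p in probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))

print("codeword lengths :", lengths)
print("Kraft sum        :", kraft)                  # equals 1
print(f"entropy          : {entropy:.4f}")
print(f"average length   : {avg_len:.4f}  (>= entropy)")
```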

This argument immediately gives a criterion for when Huffman's code reaches the entropy: this happens if $-\log_2 p_i$ is always an integer, that is, all probabilities are of the form $1/2^k$ (for integer $k$).
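
For instance (an illustrative dyadic distribution, not taken from the answer): with probabilities $\tfrac12, \tfrac14, \tfrac18, \tfrac18$, Huffman coding assigns codeword lengths $1, 2, 3, 3$, so the average length is $\tfrac12 \cdot 1 + \tfrac14 \cdot 2 + \tfrac18 \cdot 3 + \tfrac18 \cdot 3 = 1.75$ bits per symbol, which is exactly the entropy, since $\ell_i = -\log_2 p_i$ for every symbol.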

Yuval Filmus