Shannon's source coding theorem shows that you cannot compress data to fewer bits per symbol, on average, than its entropy, even if you encode several symbols at a time. For prefix-free codes (and, more generally, uniquely decodable codes), this follows from Kraft's inequality, which states that the codeword lengths $\ell_i$ of any uniquely decodable code satisfy
$$ \sum_i 2^{-\ell_i} \leq 1, $$
and furthermore every code can be "improved" (by only decreasing its codeword lengths) so that equality holds.
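As a quick numerical illustration, here is a minimal Python sketch (the names `kraft_sum` and `canonical_code`, and the example lengths, are my own, not from the text) that checks Kraft's inequality for a list of codeword lengths and assigns binary codewords to lengths that satisfy it, producing a prefix-free code:

```python
def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality for the given codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

def canonical_code(lengths):
    """Assign binary codewords to the given lengths (which must satisfy
    Kraft's inequality); the resulting code is prefix-free."""
    code = []
    next_word = 0    # next available codeword, as an integer at the current length
    prev_len = 0
    for l in sorted(lengths):
        next_word <<= (l - prev_len)   # extend to the new length
        code.append(format(next_word, f'0{l}b'))
        next_word += 1
        prev_len = l
    return code

lengths = [1, 2, 3, 3]           # Kraft sum = 1/2 + 1/4 + 1/8 + 1/8 = 1
assert abs(kraft_sum(lengths) - 1.0) < 1e-12
print(canonical_code(lengths))   # ['0', '10', '110', '111']
```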
Suppose that the probability of the $i$th symbol is $p_i$. The average codeword length is $E = \sum_i p_i \ell_i$.
We want to minimize $E$ subject to the constraint $\sum_i 2^{-\ell_i} = 1$, where the $\ell_i \geq 0$ are integers. Relaxing integrality, we can find the optimum using Lagrange multipliers: it satisfies $p_i = \lambda 2^{-\ell_i} \ln 2$ for some $\lambda$. Summing over $i$ and using $\sum_i p_i = 1$ together with the constraint gives $\lambda \ln 2 = 1$, hence $2^{-\ell_i} = p_i$, and so
$$ E \geq \sum_i p_i (-\log_2 p_i). $$
The right-hand side is simply the entropy of the distribution.
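To see the bound in action, here is a short Python sketch (the distribution and the code lengths are made up for illustration) that computes the entropy of a distribution and the average length of one particular prefix code satisfying Kraft's inequality:

```python
import math

def entropy(p):
    """Shannon entropy in bits: sum_i p_i * (-log2 p_i)."""
    return sum(-q * math.log2(q) for q in p if q > 0)

def avg_length(p, lengths):
    """Average codeword length E = sum_i p_i * l_i."""
    return sum(q * l for q, l in zip(p, lengths))

p = [0.4, 0.3, 0.2, 0.1]
lengths = [1, 2, 3, 3]            # satisfies Kraft's inequality with equality
print(entropy(p))                 # ≈ 1.846 bits
print(avg_length(p, lengths))     # = 1.9 >= entropy, as the bound requires
```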
This argument immediately gives a criterion for when Huffman's code reaches the entropy: this happens exactly when every $-\log_2 p_i$ is an integer, that is, when all probabilities are of the form $1/2^k$ (for integer $k$).
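As a sanity check of that criterion, the following Python sketch (where `huffman_lengths` is a toy implementation of Huffman's algorithm written for this example, not a library routine) computes Huffman codeword lengths for a dyadic distribution and confirms that the average length equals the entropy:

```python
import heapq, math

def huffman_lengths(p):
    """Codeword lengths produced by Huffman's algorithm (assumes len(p) >= 2)."""
    # Each heap entry: (probability, unique tiebreaker, symbol indices in the subtree)
    heap = [(q, i, [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:            # every symbol in the merged subtree moves one bit deeper
            lengths[i] += 1
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return lengths

p = [1/2, 1/4, 1/8, 1/8]              # all probabilities of the form 1/2^k
lengths = huffman_lengths(p)          # -> [1, 2, 3, 3]
avg = sum(q * l for q, l in zip(p, lengths))
H = sum(-q * math.log2(q) for q in p)
print(avg, H)                         # both 1.75: the Huffman code meets the entropy exactly
```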