The entropy of an ensemble $X=(x,A_x,P_x)$ is defined as
$H(X) = \sum_{x \in A_x} P(x) \log{\frac{1}{P(x)}},$
where $x$ is the value the random variable takes, $A_x$ is the alphabet of possible values, and $P_x$ is the set containing the probability of each element of the alphabet $A_x$.
However, what happens when I don't have $P_x$, but only samples drawn from $P_x$?
More concretely, suppose I have only $N$ rolls of a biased die. Can I construct an empirical $P_x$ by counting these rolls and use it to calculate $H(X)$?
I would like to know the rigorous information-theoretic way of calculating the entropy of an ensemble when I only have samples from $P_x$. Please note that I am assuming the alphabet $A_x$ is known.
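For reference, here is a minimal sketch (in Python) of the empirical, plug-in estimate I have in mind; the die bias and $N = 1000$ are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: N rolls of a biased six-sided die
rng = np.random.default_rng(0)
alphabet = np.arange(1, 7)                              # known alphabet A_x
true_p = np.array([0.3, 0.25, 0.2, 0.1, 0.1, 0.05])     # made-up bias
rolls = rng.choice(alphabet, size=1000, p=true_p)

# Empirical P_x from counts (maximum-likelihood / plug-in estimate)
counts = np.array([np.sum(rolls == a) for a in alphabet])
p_hat = counts / counts.sum()

# Plug-in entropy estimate, treating 0 log 0 as 0 for unseen symbols
nonzero = p_hat > 0
H_hat = -np.sum(p_hat[nonzero] * np.log2(p_hat[nonzero]))  # in bits
print(H_hat)
```

Is this estimate the right quantity to compute, or only an approximation whose quality depends on $N$?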
Thank you,