The entropy of an ensemble $X=(x,A_x,P_x)$ is defined as
$H(X) = \sum_{x \in A_x} P(x) \log{\frac{1}{P(x)}},$
where $x$ is the value the random variable takes, $A_x$ is the alphabet of possible values, and $P_x$ is the set containing the probability of each element of the alphabet $A_x$.
However, what happens when I don't have $P_x$, but only samples drawn from $P_x$?
More concretely, suppose I have only $N$ rolls of a biased die. Can I construct an empirical $P_x$ by counting these rolls and use it to calculate $H(X)$?
I would like to know the rigorous information-theoretic way of calculating the entropy of an ensemble when I only have samples from $P_x$. Please note that I am assuming the alphabet $A_x$ is known.
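For reference, here is a minimal sketch (in Python) of the empirical, plug-in estimate I have in mind; the die bias and $N = 1000$ are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: N rolls of a biased six-sided die
rng = np.random.default_rng(0)
alphabet = np.arange(1, 7)                              # known alphabet A_x
true_p = np.array([0.3, 0.25, 0.2, 0.1, 0.1, 0.05])     # made-up bias
rolls = rng.choice(alphabet, size=1000, p=true_p)

# Empirical P_x from counts (maximum-likelihood / plug-in estimate)
counts = np.array([np.sum(rolls == a) for a in alphabet])
p_hat = counts / counts.sum()

# Plug-in entropy estimate, treating 0 log 0 as 0 for unseen symbols
nonzero = p_hat > 0
H_hat = -np.sum(p_hat[nonzero] * np.log2(p_hat[nonzero]))  # in bits
print(H_hat)
```

Is this estimate the right quantity to compute, or only an approximation whose quality depends on $N$?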
Thank you,