In most introductory textbooks on information theory, the entropy of a discrete random variable (r.v.) is defined as
$$H(X) \triangleq -\sum_x p(x)\log p(x)=-\mathbb E[\log p(X)],$$
where $p$ is the pmf of $X$; while that of a continuous random variable is given by
$$h(X)\triangleq -\int f(x)\log f(x) dx=-\mathbb E[\log f(X)],$$
where $f$ is the pdf of $X$.
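For concreteness, here is a small numerical sketch of the two definitions (my own illustration, assuming NumPy and SciPy are available): the discrete entropy of a toy pmf and the differential entropy of a standard Gaussian, both in nats.

```python
# Illustration of H(X) for a pmf and h(X) for a pdf (everything in nats).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Discrete case: H(X) = -sum_x p(x) log p(x) for a toy pmf on {0, 1, 2}.
p = np.array([0.5, 0.25, 0.25])
H = -np.sum(p * np.log(p))                       # ~ 1.0397 nats

# Continuous case: h(X) = -int f(x) log f(x) dx for X ~ N(0, 1).
f = norm(0.0, 1.0).pdf
h_numeric = -quad(lambda x: f(x) * np.log(f(x)), -10, 10)[0]
h_closed = 0.5 * np.log(2 * np.pi * np.e)        # known closed form, ~ 1.4189

print(H, h_numeric, h_closed)
```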
Question: Is there a unified definition of entropy for an arbitrary random variable?
My question is motivated by Robert M. Gray's book, "Entropy and Information Theory." In that book, he provides a unified definition of divergence and hence of mutual information:
Given a probability space $(\Omega, \mathcal B, P)$ and another probability measure $M$ on the same space, define the divergence of $P$ with respect to $M$ by
$$D(P\, \Vert\, M)\triangleq \sup_{\mathcal Q} \sum_{Q\in \mathcal Q}P(Q)\log\frac{P(Q)}{M(Q)},$$
where the supremum is over all finite measurable partitions $\mathcal Q$ of $\Omega.$
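To see how the supremum over finite partitions behaves, here is a rough sketch (my own, not code from Gray's book; it assumes SciPy) that refines a partition of the real line and compares the partition sum against the known closed form $D(\mathcal N(0,1)\,\Vert\,\mathcal N(1,1)) = 1/2$ nat.

```python
# Partition-based divergence: refine a finite partition of R and watch the
# sum P(Q) log(P(Q)/M(Q)) approach the usual KL divergence (0.5 nats here).
import numpy as np
from scipy.stats import norm

P = norm(0.0, 1.0)   # measure P
M = norm(1.0, 1.0)   # reference measure M; KL(P || M) = 0.5 nats

def partition_divergence(n_cells, lo=-8.0, hi=8.0):
    """Partition = n_cells equal intervals on [lo, hi] plus the two tails."""
    edges = np.concatenate(([-np.inf], np.linspace(lo, hi, n_cells + 1), [np.inf]))
    pQ = np.diff(P.cdf(edges))
    mQ = np.diff(M.cdf(edges))
    mask = pQ > 0                      # 0 * log(0/m) = 0 by convention
    return np.sum(pQ[mask] * np.log(pQ[mask] / mQ[mask]))

for n in (2, 8, 32, 128, 512):
    print(n, partition_divergence(n))  # increases toward 0.5 as cells shrink
```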
For any two random variables $X$ and $Y$, define $$I(X;Y)\triangleq D(P_{XY}\Vert P_X\times P_Y),$$ where $P_{XY}$ and $P_X\times P_Y$ are the joint distribution and product distribution of $X$ and $Y$, respectively.
As I understand it, the nice thing about this definition of mutual information is that it works for arbitrary random variables, and it reduces to the usual definition of $I(X;Y)$ when $X$ and $Y$ are both discrete or both continuous r.v.'s.
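As a sanity check on that claim, the following sketch (my own illustration, assuming NumPy) quantizes a jointly Gaussian pair with correlation $\rho$ and computes the discrete mutual information of the quantized pair; as the quantizers get finer it approaches the closed form $-\tfrac12\log(1-\rho^2)$.

```python
# Plug-in estimate of I(X;Y) for jointly Gaussian X, Y via finite quantizers.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n = 1_000_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def plugin_mi(x, y, bins):
    """Quantize X and Y into equal-width cells and compute the discrete MI."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))

print(-0.5 * np.log(1 - rho**2))       # ~ 0.5108 nats (closed form)
for b in (4, 16, 64):
    print(b, plugin_mi(x, y, b))       # approaches the closed form
```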
Gray then goes on to define entropy in terms of the mutual information defined above:
$$H(X)\triangleq I(X;X).$$
For a discrete r.v., this also reduces to the regular definition of entropy given at the very beginning of this question. Perfect. However, if $X$ is a continuous r.v., say Gaussian, then I think this definition gives $H(X)=\infty,$ since it implies that
$$H(X)=\sup_q H(q(X)),$$ where the supremum is over all finite quantizers $q$ of $X$, and refining the quantizer drives $H(q(X))$ to infinity. So it appears inconsistent with $h(X)$, the usual finite (differential) entropy, doesn't it? Hence the question.
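Here is a quick numerical illustration of that blow-up (again my own sketch, assuming SciPy): for a uniform quantizer with cell width $\Delta$, $H(q(X)) \approx h(X) - \log\Delta$, which diverges as $\Delta \to 0$.

```python
# Entropy of a uniformly quantized N(0,1): H(q(X)) ~ h(X) - log(delta).
import numpy as np
from scipy.stats import norm

X = norm(0.0, 1.0)
h_X = 0.5 * np.log(2 * np.pi * np.e)        # differential entropy, ~ 1.4189 nats

for delta in (1.0, 0.1, 0.01, 0.001):
    edges = np.arange(-10.0, 10.0 + delta, delta)
    p = np.diff(X.cdf(edges))
    p = p[p > 0]
    H_q = -np.sum(p * np.log(p))            # entropy of the quantized variable
    print(delta, H_q, h_X - np.log(delta))  # agree increasingly well as delta shrinks
```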
How do we reconcile this inconsistency? Does it make more sense to accept that the entropy of a continuous r.v. is infinite, avoid using it, and work only with its mutual information with other random variables? And when we do need an entropy-like quantity, do we fall back on the differential entropy $h(X)$, treated as something defined differently from the entropy $H(X)$?