I am trying to figure out whether convergence in KL divergence of a sequence of pdfs to a scaled version of the Gaussian pdf implies the existence of a subsequence that converges in $L^1$. In other words, let $(p_k)_{k\in \mathbb{N}}$ be a sequence of probability density functions (I also know that they are continuous and uniformly bounded) and let $p_Z$ be the pdf of the standard Gaussian. Suppose that, for some $C > 0$, $$ D_\text{KL}(p_k \| C p_Z) \rightarrow 0 \quad \text{ as } k \to \infty.$$ I feel like this does not imply that there exists a pdf $p^\ast$ for which $D_\text{KL}(p_k\|p^\ast) \rightarrow 0$ as $k\to\infty$ or $\|p_k-p^\ast\|_{L^1(\mathbb{R}^n)} \rightarrow 0$ as $k\to \infty$, but is it possible to prove that there exist subsequences with one of these properties?

It is clear to me that $Cp_Z$ is not a pdf, but the KL divergence is still well-defined for this scaled version. If $Cp_Z$ were a pdf, I could apply Pinsker's inequality, which does not hold in this case. Thankful for any help!

Mittens
  • 46,352
mathxxx
  • 137

1 Answer

Let $\phi$ be the density of the standard normal $N(0,1)$ distribution. Let $\{t\}=t-\lfloor t\rfloor$ denote the fractional part function. Throughout, $m$ is the Lebesgue measure on the real line.

Define $$f_n(t)=c_n(1-\{nt\})\phi(t),$$ where $c_n$ is a normalizing factor such that $\int f_n=1$. Since $t\mapsto 1-\{t\}$ is measurable, bounded, and 1-periodic, Fejér's lemma gives $$c^{-1}_n=\int (1-\{nt\})\phi(t)\,dt\xrightarrow{n\rightarrow\infty}\int^1_0(1-t)\,dt=\frac12.$$ Thus, $c_n\xrightarrow{n\rightarrow\infty}2$.
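As a quick numerical sanity check of the normalizing constant, here is a minimal sketch using a midpoint rule aligned with the periods of $\{nt\}$. The function name `normalizer` and the parameters `cutoff` and `m` are illustrative choices, not part of the answer. The substitution $t\mapsto -t$ suggests that $c_n=2$ for every $n$ (since $1-\{-nt\}=\{nt\}$ almost everywhere and $\phi$ is even), and the sketch is consistent with that.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normalizer(n, cutoff=8.0, m=40):
    """Approximate c_n = 1 / integral of (1 - {n t}) * phi(t) dt.

    Midpoint rule on [-cutoff, cutoff] with m nodes per period of {n t}.
    cutoff and m are illustrative assumptions, not taken from the answer.
    """
    h = 1.0 / (n * m)  # spacing between quadrature nodes
    mass = 0.0
    for j in range(-int(cutoff * n), int(cutoff * n)):  # period [j/n, (j+1)/n)
        for i in range(m):
            frac = (i + 0.5) / m      # value of {n t} at the node t below
            t = (j + frac) / n
            mass += (1.0 - frac) * phi(t)
    return 1.0 / (mass * h)


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(n, normalizer(n))
```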

For any $g\in L_\infty$ we have $g\phi\in L_1$, and so, by Fejér's lemma again, $$\int f_n g\xrightarrow{n\rightarrow\infty}\int \phi g.$$ Thus, not only does $\mu_n=f_n\,dm$ converge to $\mu=\phi\,dm$ weakly in $\sigma(\mathcal{M}(\mathbb{R}),\mathcal{C}_b(\mathbb{R}))$, but $f_n$ also converges weakly to $\phi$ in $\sigma(L_1(m),L_\infty(m))$.
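The convergence $\int f_n g\to\int\phi g$ used above can be spelled out as follows, using only $c_n\to2$ and Fejér's lemma applied to the integrable function $g\phi$: $$\int f_n g\,dm=c_n\int(1-\{nt\})\,g(t)\phi(t)\,dt\xrightarrow{n\rightarrow\infty}2\cdot\Big(\int_0^1(1-t)\,dt\Big)\int g\phi\,dm=\int \phi g\,dm.$$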

Now, $$KL(f_n|\phi)=c_n\int\log(1-\{nt\})\,(1-\{nt\})\phi(t)\,dt+\log(c_n)\xrightarrow{n\rightarrow\infty}-\frac12+\log(2)$$ by another application of Fejér's lemma, since $h(t)=\log(1-\{t\})(1-\{t\})$ is measurable, bounded, and 1-periodic with $\int_0^1h(t)\,dt=-\frac14$. Then $$\int\log\Big(\frac{f_n}{c\phi}\Big)f_n\,dm=KL(f_n|\phi)-\log c\xrightarrow{n\rightarrow\infty}0$$ for $c=e^{\log2-1/2}=2e^{-1/2}$. However, no subsequence of $f_n$ converges pointwise almost everywhere, and thus no subsequence of $f_n$ converges in $L_1(m)$; in particular, $f_n$ does not converge to $\phi$ in $L_1(m)$.
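The limits above can be checked numerically. The sketch below is an illustration under stated assumptions (the name `sawtooth_stats`, the truncation `cutoff`, and the resolution `m` are hypothetical choices). It approximates $c_n$, $KL(f_n|\phi)$, and $\|f_n-\phi\|_{L_1(m)}$ with a midpoint rule. For the $L_1$ distance, Fejér's lemma suggests the limit $\int_0^1|2(1-t)-1|\,dt=\frac12$, which stays bounded away from $0$.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def sawtooth_stats(n, cutoff=8.0, m=200):
    """Return (c_n, KL(f_n | phi), ||f_n - phi||_L1) for f_n = c_n (1 - {n t}) phi.

    Midpoint rule on [-cutoff, cutoff] with m nodes per period of {n t};
    cutoff and m are illustrative assumptions.
    """
    h = 1.0 / (n * m)
    nodes = []  # pairs (1 - {n t}, phi(t)) at each quadrature node
    for j in range(-int(cutoff * n), int(cutoff * n)):
        for i in range(m):
            frac = (i + 0.5) / m  # value of {n t} at t = (j + frac) / n
            nodes.append((1.0 - frac, phi((j + frac) / n)))
    c = 1.0 / (h * sum(w * p for w, p in nodes))
    # KL(f_n | phi) = integral of f_n * log(f_n / phi), with f_n / phi = c * w
    kl = h * sum(c * w * math.log(c * w) * p for w, p in nodes)
    l1 = h * sum(abs(c * w - 1.0) * p for w, p in nodes)
    return c, kl, l1


if __name__ == "__main__":
    print("limit of KL:", math.log(2.0) - 0.5)
    print("log of c = 2 / sqrt(e):", math.log(2.0 * math.exp(-0.5)))
    for n in (1, 4, 10):
        print(n, sawtooth_stats(n))
```

In this sketch the KL value matches $\log c$ for $c=2e^{-1/2}$, while the $L_1$ distance to $\phi$ does not shrink.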


Final comments:

  • This example also shows that even if a sequence of densities $f_n$ converges weakly to another probability density $f$, $KL(f_n|f)$ may not converge to $KL(f|f)=0$.
  • If $\mu_n$ and $\mu$ are probability measures on a space $(\Omega,\mathscr{F})$ with $\mu_n\ll \mu$ for all $n$, then Pinsker's inequality states that $$\|\mu_n-\mu\|_{TV}\leq \sqrt{2\,KL(\mu_n|\mu)},$$ where $\|\mu_n-\mu\|_{TV}=|\mu_n-\mu|(\Omega)$ stands for the total variation of $\mu_n-\mu$. (With the convention $\sup_{A\in\mathscr{F}}|\mu_n(A)-\mu(A)|$, which is half as large, the bound reads $\sqrt{\frac12KL(\mu_n|\mu)}$.) Therefore, if $KL(\mu_n|\mu)\xrightarrow{n\rightarrow\infty}0$, the sequence $\mu_n$ converges to $\mu$ in total variation. This is much stronger than weak convergence of measures. In particular, if $\mu_n$ and $\mu$ are Borel probability measures on the real line, $\mu_n\ll\mu\ll m$, and $f_n=\frac{d\mu_n}{dm}$, $f=\frac{d\mu}{dm}$, then $$\|\mu_n-\mu\|_{TV}=\|f_n-f\|_{L_1(m)}\xrightarrow{n\rightarrow\infty}0.$$ This is perhaps the result that the OP is most interested in.
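
To illustrate Pinsker's inequality on a case with closed forms, here is a small sketch for shifted Gaussians. This example is not from the discussion above; the function names are hypothetical. It uses $KL(N(a,1)|N(0,1))=a^2/2$ and the fact that the two densities cross at $a/2$, which gives an $L_1$ distance of $2\,\mathrm{erf}\big(|a|/(2\sqrt2)\big)$. With the total variation taken as the $L_1$ distance of the densities, the bound reads $\|f-g\|_{L_1}\le\sqrt{2\,KL}$, which is the same statement as $\sup_A|\mu(A)-\nu(A)|\le\sqrt{\frac12KL}$.

```python
import math


def kl_shifted_gaussian(a):
    """KL(N(a, 1) | N(0, 1)) = a^2 / 2."""
    return 0.5 * a * a


def l1_shifted_gaussian(a):
    """L1 distance between the N(a, 1) and N(0, 1) densities.

    The densities cross at a / 2, so the distance is
    2 * (2 * Phi(|a| / 2) - 1) = 2 * erf(|a| / (2 * sqrt(2))).
    """
    return 2.0 * math.erf(abs(a) / (2.0 * math.sqrt(2.0)))


if __name__ == "__main__":
    for a in (0.1, 0.5, 1.0, 2.0):
        bound = math.sqrt(2.0 * kl_shifted_gaussian(a))  # Pinsker bound on the L1 distance
        print(a, l1_shifted_gaussian(a), bound)
```

As $a\to0$ both sides go to $0$, which is the mechanism by which $KL\to0$ forces $L_1$ convergence of the densities.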
Mittens
  • 46,352
  • Thanks so much for your effort! I was trying to imagine what these functions $f_n$ would look like, and they are not continuous. I totally get the point that in this case we don't have $L_1$ convergence of $f_n$ to $\phi$. Actually, my goal is to derive that if $D_\text{KL}(p_n | p_Z) \rightarrow L$ for continuous $p_n$, then there exists some $p^\ast$ with $D_\text{KL}(p_n | p^\ast) \rightarrow 0$. – mathxxx Feb 10 '25 at 11:57
  • So in case $\lim_{n\to \infty}\mathbb{E}_{p_n}(x) = K > 0$ is fulfilled, we can simply take $p^\ast = \mathcal{N}(M(L),1)$ for some $M$ depending on $L$. But if this is not the case, I have no idea at the moment. I feel like it should not be that complicated. – mathxxx Feb 10 '25 at 12:00
  • @mathxxx: You get a very similar effect if you use the factor $(1+\cos(2\pi nx))$ instead. The decoupling in the integration as $n\rightarrow\infty$ comes from the $1$-periodicity of $(1+\cos(2\pi x))$. Your $p_n$ will now be continuous (infinitely differentiable to boot), and the same thing happens. The point is that $KL$ is very restrictive, for it gives a strong sense of convergence (total variation). – Mittens Feb 10 '25 at 12:31
  • I see, one probably needs to restrict the class of functions to ones with bounded derivatives or something similar. But what about weak L1 convergence of a subsequence to some pdf. Is this guaranteed under the setting stated above? – mathxxx Feb 10 '25 at 14:22
  • @mathxxx: KL is typically used in the case where $KL(P_n|Q)\xrightarrow{n\rightarrow\infty}0$ for in this case, as I stated in my posting, $P_n$ converges to $Q$ in total variation; thus, if in addition $P_n$ has density w.r.t. $Q$ or $P_n$ and $Q$ have densities w.r.t. a third $\sigma$-finite measure, then the densities converge in $L_1$ (Pinsker's inequality). – Mittens Feb 10 '25 at 14:47
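
The smooth variant suggested in the comments, $p_n=c_n(1+\cos(2\pi nx))\phi(x)$, can be checked the same way. This is a minimal sketch; `cosine_stats`, `cutoff`, and `m` are illustrative names and parameters, not from the thread. Here Fejér's lemma suggests $c_n\to1$, $KL(p_n|\phi)\to\int_0^1(1+\cos2\pi x)\log(1+\cos2\pi x)\,dx=1-\log2$, and $\|p_n-\phi\|_{L_1}\to\int_0^1|\cos2\pi x|\,dx=\frac2\pi$.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cosine_stats(n, cutoff=8.0, m=200):
    """Return (c_n, KL(p_n | phi), ||p_n - phi||_L1) for p_n = c_n (1 + cos(2 pi n x)) phi.

    Midpoint rule on [-cutoff, cutoff] with m nodes per period;
    cutoff and m are illustrative assumptions.
    """
    h = 1.0 / (n * m)
    nodes = []  # pairs (1 + cos(2 pi n x), phi(x)) at each quadrature node
    for j in range(-int(cutoff * n), int(cutoff * n)):
        for i in range(m):
            frac = (i + 0.5) / m  # n x = j + frac, so cos(2 pi n x) = cos(2 pi frac)
            w = 1.0 + math.cos(2.0 * math.pi * frac)
            nodes.append((w, phi((j + frac) / n)))
    c = 1.0 / (h * sum(w * p for w, p in nodes))
    # Skip nodes where the factor vanishes: u * log(u) is taken as 0 at u = 0.
    kl = h * sum(c * w * math.log(c * w) * p for w, p in nodes if w > 0.0)
    l1 = h * sum(abs(c * w - 1.0) * p for w, p in nodes)
    return c, kl, l1


if __name__ == "__main__":
    print("suggested limits:", 1.0, 1.0 - math.log(2.0), 2.0 / math.pi)
    for n in (1, 3, 10):
        print(n, cosine_stats(n))
```

So the densities are smooth, the KL divergence to $\phi$ has a nonzero limit, and the $L_1$ distance to $\phi$ again stays away from $0$.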