Introduction. In a previous post, Definitions of the total variation distance, I asked about the equivalences among different definitions of the "total variation distance" $d_{TV}$:
I found the following definitions of the total variation distance $d_{TV}$ between two probability distributions (also called probability measures) $P$ and $Q$ on $\mathcal{A}$ (please note that I tried to use a consistent notation in the following definitions!):
\begin{align} &{\color{red}{\textbf{Definition 1.}}}\quad{\color{blue}{\textbf{"Definition 2.4" on page 84, in Tsybakov (2009)}}} \\&\hspace{10ex} d_{TV}(P,Q) = \sup_{A \in \mathcal{A}} \left| P(A)-Q(A) \right| = \sup_{A \in \mathcal{A}} \left| \int_{A} (p-q)\,d\nu \right| \\ \\ &{\color{red}{\textbf{Definition 2.}}}\quad{\color{blue}{\textbf{"2.1 Definition" on page 5, in Strasser (1985)}}} \\&\hspace{10ex}d_{TV}(P,Q) = \left\Vert P-Q\right\Vert = \sup \{\left| P(A)-Q(A) \right| : {A \in \mathcal{A}} \} \\ \\ &{\color{red}{\textbf{Definition 3.}}}\quad{\color{blue}{\textbf{"4.1. Total Variation Distance" on page 47, in Levin and Peres (2017)}}} \\&\hspace{10ex}d_{TV}(P,Q) = \left\Vert P-Q\right\Vert = \max_{A \in \mathcal{A}} \left| P(A)-Q(A) \right| \\ \\ &{\color{red}{\textbf{Definition 4.}}}\quad{\color{blue}{\textbf{On page 22, in Villani (2008)}}}\\&\hspace{10ex}d_{TV}(P,Q) = \left\Vert P-Q\right\Vert = 2 \inf \left\{ \mathbb{E} [\mathbf{1}_{X \neq Y}]; \,\text{law}(X)=P, \text{law}(Y)=Q \right\} \\ \end{align}
I received very nice replies and things are getting clearer. However, I am not able to figure out how to show what was suggested by @John Dawkins in his answer, i.e. how to show that Definition 2 and Definition 3 are equivalent by using the densities of $P$ and $Q$:
One can take the measure $\nu$ in the first definition to be $P+Q$. Then $P\ll\nu$ so (Radon-Nikodym) there is a density $p$ such that $P(A)=\int_Ap\,d\nu$ for all $A\in\mathcal A$. Likewise, there's a density $q$ such that $Q(A)=\int_A q\,d\nu$.
There is another (equivalent) definition based on these densities: $\|P-Q\|={1\over 2}\int|p-q|\,d\nu$.
Using this you can show that the supremum in definition 2 is attained at $A=\{x: p(x) > q(x)\}$. This shows that definitions 2 and 3 are equivalent.
In Definition of the total variation distance: $ V(P,Q) = \frac{1}{2} \int |p-q|d\nu$?, I think I found something closely related to the suggestion of @John Dawkins, namely a proof of the following equality in terms of the densities of $P$ and $Q$ (the proof uses the set $B=\{p\geq q\}$ together with an arbitrary set $A \in \mathcal{A}$): \begin{equation} \frac{1}{2} \int \left| p-q \right| d\nu = \sup_{A \in \mathcal{A}} \left| \int_{A} (p-q) d\nu \right| \end{equation} But a step is still missing to show that Definition 2 and Definition 3 of my post are equivalent, namely (I guess) the following: \begin{equation} \sup_{A \in \mathcal{A}} \left| \int_{A} (p-q) d\nu \right| = \max_{A \in \mathcal{A}} \left| \int_{A} (p-q) d\nu \right| \end{equation}
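A possible sketch of why the supremum might be attained, assuming $p$ and $q$ are the densities of $P$ and $Q$ with respect to $\nu = P+Q$ as in the quoted answer (the name $A^*$ is introduced here only for illustration, and this may not be exactly the argument @John Dawkins had in mind). Let $A^* = \{x : p(x) > q(x)\}$, which belongs to $\mathcal{A}$ because $p$ and $q$ are measurable. Since $\int (p-q)\,d\nu = P(\Omega)-Q(\Omega) = 0$, one would get:
\begin{align} \int_{A^*} (p-q)\,d\nu &= \int_{(A^*)^c} (q-p)\,d\nu = \frac{1}{2}\int |p-q|\,d\nu, \\ \int_{A} (p-q)\,d\nu &\le \int_{A \cap A^*} (p-q)\,d\nu \le \int_{A^*} (p-q)\,d\nu \quad \text{for every } A \in \mathcal{A}, \\ \int_{A} (q-p)\,d\nu &\le \int_{A \cap (A^*)^c} (q-p)\,d\nu \le \int_{(A^*)^c} (q-p)\,d\nu \quad \text{for every } A \in \mathcal{A}. \end{align}
If these steps are right, then $|P(A)-Q(A)| \le P(A^*)-Q(A^*)$ for every $A \in \mathcal{A}$, so the supremum would be attained at $A^*$ and could be written as a maximum.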
Question. Do you have any suggestion or reference about how to show that definitions 2 and 3 are equivalent, by using the suggestion/path proposed by @John Dawkins?
Note. I think that an alternative way to show that definitions 2 and 3 are equivalent might be simply to use what angryavian kindly pointed out in his answer to my previous post (Definitions of the total variation distance), i.e.:
- If $\Omega$ is finite, then any $\sigma$-algebra $\mathcal{A}$ is finite, since the power set is finite.
- If $\mathcal{A}$ is finite, then $\max_{A \in \mathcal{A}}$ exists and is equivalent to $\sup_{A \in \mathcal{A}}$.
However, it would be very interesting to explore the path proposed by @John Dawkins, as well!
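As a numerical sanity check of the finite case, here is a minimal Python sketch. It assumes a finite $\Omega$ with the power set as $\mathcal{A}$ and uses probability mass functions (densities with respect to counting measure rather than $P+Q$). The function names and the example vectors `p` and `q` are made up for illustration. It compares the brute-force maximum over all subsets with $\frac{1}{2}\sum|p-q|$ and with the value at the candidate set $\{p > q\}$.

```python
from itertools import combinations


def tv_bruteforce(p, q):
    """Maximum of |P(A) - Q(A)| over all subsets A of {0, ..., n-1}."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            diff = abs(sum(p[i] - q[i] for i in subset))
            best = max(best, diff)
    return best


def tv_half_l1(p, q):
    """Half of the L1 distance between the two mass functions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_at_candidate(p, q):
    """|P(A*) - Q(A*)| for the candidate set A* = {x : p(x) > q(x)}."""
    return abs(sum(pi - qi for pi, qi in zip(p, q) if pi > qi))


# Hypothetical example distributions on a four-point sample space.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.25, 0.25, 0.25, 0.25]

print(tv_bruteforce(p, q), tv_half_l1(p, q), tv_at_candidate(p, q))
```

For this example the three printed values should agree up to floating-point rounding. The brute-force loop visits all $2^n$ subsets, so it is only meant for very small sample spaces.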