
Total variation distance is a measure for comparing two probability distributions (viewing them as vectors in a finite-dimensional space whose basis corresponds to the sample space $\Omega$). I know a distance measure needs to obey the triangle inequality, identical distributions should have distance $0$, orthogonal vectors should have maximal distance, and everything else should lie between these two extremes. I don't understand why the $L^1$ norm is chosen for measuring the distance between these vectors (probability distributions), or why it is defined exactly the way it is: $$TV(P_1,P_2) = \frac{1}{2}\sum_{x \in \Omega} \left|P_1(x)-P_2(x)\right|$$
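For concreteness, here is a minimal numeric sketch of that formula (a toy example; the distributions `p1` and `p2` are made up for illustration):

```python
# Toy distributions on a three-point sample space {a, b, c}.
p1 = {"a": 0.5, "b": 0.5, "c": 0.0}
p2 = {"a": 0.0, "b": 0.5, "c": 0.5}

# TV(P1, P2) = (1/2) * sum over x of |P1(x) - P2(x)|
tv = 0.5 * sum(abs(p1[x] - p2[x]) for x in p1)
print(tv)  # 0.5
```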

am_rf24
  • Yes, that's it, but here $f,g$ are probability distributions. I don't understand why $L^1$ is used and where the $1/2$ comes from. – am_rf24 Oct 30 '19 at 20:20
  • The total variation distance of two probability measures is usually defined via a supremum over events: https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures – amsmath Oct 30 '19 at 20:22
  • "When the set($\omega$) is countable, the total variation distance is related to the $L^1$ norm by the identity." So, this follows by doing some math. But what is the intuitive explanation for this? – am_rf24 Oct 30 '19 at 20:36
  • See Proposition 4.2 in https://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf – amsmath Oct 30 '19 at 20:42

3 Answers


The TV distance is measuring exactly what you want: the maximal difference between the probabilities that two distributions $P$ and $Q$ assign to any event. It is hence defined as $$TV(P,Q)=\sup_{A\subseteq\Omega}|P(A)-Q(A)|$$

Now, as is shown in Proposition 4.2 of the Levin–Peres notes linked in the comments above, your last equation $$TV(P,Q)=\frac{1}{2}\|P-Q\|_1$$ is true (in countable probability spaces). I will not redo the proof here; you can get a glimpse from the characterizations am_rf24 writes about in his answer. But I can give you some intuition: although the definition of the TV distance seems to closely resemble the definition of the infinity norm on vectors, it is actually subtly different. Note that the TV distance is defined over events (i.e. subsets) of $\Omega$, while the infinity norm is over elements of $\Omega$. So, to conclude: other norms could have been chosen, but because $\|\cdot\|_1$ is a norm, we luckily get for free that the TV distance is a metric.
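To see the identity concretely, here is a small Python sketch (toy distributions of my own, not part of the proof) that brute-forces the supremum over all events and compares it with $\frac{1}{2}\|P-Q\|_1$:

```python
from itertools import chain, combinations

# Toy distributions on a four-point sample space.
omega = ["a", "b", "c", "d"]
p = {"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}
q = {"a": 0.3, "b": 0.1, "c": 0.4, "d": 0.2}

def prob(dist, event):
    """P(A): total mass a distribution assigns to the event A."""
    return sum(dist[x] for x in event)

# sup over all events A of |P(A) - Q(A)|, enumerating all 2^4 subsets.
events = chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))
tv_sup = max(abs(prob(p, a) - prob(q, a)) for a in events)

# (1/2) * ||P - Q||_1
tv_l1 = 0.5 * sum(abs(p[x] - q[x]) for x in omega)

print(tv_sup, tv_l1)  # both 0.3 (up to floating-point rounding)
```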

(Another remark: the orthogonality criterion you mention is not really a thing for a norm, as you can have norms without having a scalar product ;) )

JTB
sani

The two observations needed for characterizing the total variation distance via the two definitions are:

1. $\max_{A \subseteq \Omega}\big(P_1(A)-P_2(A)\big) = \max_{A \subseteq \Omega}\big(P_2(\Omega \setminus A)-P_1(\Omega \setminus A)\big)$;
2. since $P_1(\Omega) = P_2(\Omega) = 1$, every $A \subseteq \Omega$ and its complement $B = \Omega \setminus A$ satisfy $P_1(A) - P_2(A) = P_2(B) - P_1(B)$, so the two maxima above share the common value $\max_{A \subseteq \Omega}\big(P_1(A) - P_2(A)\big)$.

Now, split the sum according to the sign of $P_1(x)-P_2(x)$, and note that the event $A=\{x : P_1(x)>P_2(x)\}$ is exactly the one maximizing $P_1(A)-P_2(A)$ (adding any other point can only shrink the difference):
$$\frac{1}{2} \sum_{x}\left|P_1(x)-P_2(x)\right| = \frac{1}{2}\sum_{x:\,P_1(x)>P_2(x)}\big(P_1(x)-P_2(x)\big) + \frac{1}{2}\sum_{x:\,P_2(x)>P_1(x)}\big(P_2(x)-P_1(x)\big)$$
$$= \frac{1}{2}\max_{A \subseteq \Omega}\big(P_1(A)-P_2(A)\big) + \frac{1}{2} \max_{B \subseteq \Omega}\big(P_2(B)-P_1(B)\big) = \max_{A \subseteq \Omega}\left|P_1(A)-P_2(A)\right|.$$
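As a quick sanity check (a toy example of my own, not part of the original argument): take $\Omega=\{1,2,3\}$ with $P_1=(0.5,\,0.3,\,0.2)$ and $P_2=(0.2,\,0.3,\,0.5)$. Then $\{x: P_1(x)>P_2(x)\}=\{1\}$ and $\{x: P_2(x)>P_1(x)\}=\{3\}$, so the right-hand side is $\frac{1}{2}(0.3)+\frac{1}{2}(0.3)=0.3$, which is exactly $\max_{A \subseteq \Omega}|P_1(A)-P_2(A)|$, attained at $A=\{1\}$ (or at $A=\{3\}$).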

am_rf24
  • Hi, could you suggest a reference for the proof you have provided? Thanks! :-) Indeed, I just asked a question about your proof: https://math.stackexchange.com/questions/4771237/total-variation-distance-max-a-subseteq-mathcala-left-pa-qa-righ?noredirect=1#comment10135327_4771237 – Ommo Sep 18 '23 at 16:49

I can provide a little intuition as to why the relation between the $TV$ distance and the $L_1$ norm holds. For the sake of simplicity, assume $P_1$ and $P_2$ are probability mass functions with finite support. We can make the following claim:

In the notation of sani's answer, for every element $a$ in the optimal subset $A^\ast$, it must be that $P_1(a) \geq P_2(a)$. Otherwise, removing $a$ would increase $P_1(A^\ast) - P_2(A^\ast)$, so $a$ cannot be a part of $A^\ast$.

Using the above claim, it follows that $P_1(b) \leq P_2(b)$ for all $b$ in the complement $(A^\ast)'$. Let $p_1$ and $p_2$ denote $P_1(A^\ast)$ and $P_2(A^\ast)$, respectively; then $TV(P_1,P_2) = p_1 - p_2 \geq 0$. Since the sign of $P_1(x) - P_2(x)$ is constant on $A^\ast$ and on $(A^\ast)'$, the $L_1$ norm collapses to differences over these two sets and can be written as
\begin{align}
L_1(P_1,P_2) & = \left| P_1(A^\ast) - P_2(A^\ast) \right| + \left| P_1((A^\ast)') - P_2((A^\ast)') \right| \\
& = \big( P_1(A^\ast) - P_2(A^\ast) \big) - \big( P_1((A^\ast)') - P_2((A^\ast)') \big) \\
& = (p_1 - p_2) + \big( (1-p_2) - (1-p_1) \big) \\
& = 2(p_1 - p_2) = 2\,TV(P_1,P_2).
\end{align}
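Here is a short Python sketch of this argument (the distributions are illustrative, and `a_star` is the claimed maximizer built directly from the sign of $P_1 - P_2$):

```python
# Toy finite-support distributions.
p1 = {"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}
p2 = {"a": 0.3, "b": 0.1, "c": 0.4, "d": 0.2}

# The claimed optimal event: keep exactly the points where P1(x) >= P2(x).
a_star = {x for x in p1 if p1[x] >= p2[x]}  # here {"b", "d"}

# TV(P1, P2) = P1(A*) - P2(A*)
tv = sum(p1[x] for x in a_star) - sum(p2[x] for x in a_star)

# L1(P1, P2) = sum over x of |P1(x) - P2(x)|
l1 = sum(abs(p1[x] - p2[x]) for x in p1)

print(tv, l1 / 2)  # both 0.3: L1 = 2 * TV, as derived above
```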

Kamal Saleh