On the Derivation of Judea Pearl's Front-Door Adjustment Formula in The Book of Why

Question

I have a number of related questions about the derivation of the front-door adjustment formula as given on page 236. Here is the derivation. I would have typed it up, but the diagrams at the far right would have been a pain to include.

There is a typo in Line 4, caught in the errata. It should be

$$=\sum_t P(c|\operatorname{do}(t){\color{red})}\,P(t|s). $$

For additional background, the rules of the do-calculus are as follows:

Rule 1. Assume the variable set $Z$ blocks all paths from $W$ to $Y$ after we have deleted all arrows leading into $X.$ Then $$P(Y|\operatorname{do}(X),Z,W)=P(Y|\operatorname{do}(X),Z). $$

Rule 2. If $Z$ blocks all back-door paths from $X$ to $Y,$ then $$P(Y|\operatorname{do}(X),Z)=P(Y|X,Z). $$

Rule 3. If there are no causal paths from $X$ to $Y,$ then $$P(Y|\operatorname{do}(X))=P(Y). $$

My questions are as follows:

Given the invocation of the Probability Axioms in Line 1, it is not difficult to follow the validity of the same invocation in Line 5. However, which axiom is being used here? Where can I find a discussion of it?

In Lines 2, 3, 4, 6, and 7, Pearl invokes Rule 2 or 3. Next to the invocation is a diagram, which is supposed to be some subset of the original at the top, involving a stereotypical confounding unobserved variable situation. Why can Pearl just delete edges at will? That is, how come the expressions are equivalent while he's manipulating the diagram right and left?

The final result has $s$ and $s'$ in it, but the version of the Front-Door Adjustment Formula on page 227 does not: $$P(Y|\operatorname{do}(X))=\sum_z P(Z=z|X) \sum_x P(Y|X=x,Z=z) P(X=x). \quad \text{(7.1)} $$ Here $z$ is like $t$ in the formula above, as well as $x\to s$ and $y\to c.$ How has he proved the formula on page 227? Wouldn't he have to collapse $s'\to s$ to finish?

Thank you for your time!

score 3 · Accepted Answer · edited Feb 25 '20 at 19:26

Adrian, here are the answers:

1 - Here we are using the Law of Total Probability, that is, $p(y)=\sum_x{p(y|x)p(x)}$.

2 - Every valid manipulation of the causal expression has to preserve the meaning of that expression. For instance, the equality $p(t|do(s)) = p(t|s)$ is, in general, not true. It is only true if there are no backdoor paths between $t$ and $s$. Thus, to know whether you can substitute the $do(s)$ operator with regular conditioning on $s$, you need to check whether this holds in your model. That's why you need to check the auxiliary graphs: they provide the test of whether the substitution is valid. So for instance, in the case of $p(t|do(s)) = p(t|s)$, the test is to: (i) delete the outgoing arrows from $s$; and (ii) check whether $t$ and $s$ are separated in the resulting graph. If they are, then the manipulation is valid. This is what Rule 2 is saying. Therefore, you can't delete arrows arbitrarily. These modifications of the graph are done to check the conditions that license the manipulation.
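
The graphical test in (i) and (ii) can be sketched in a few lines of Python. This is only an illustration: it assumes the usual front-door diagram $U \to S$, $U \to C$, $S \to T$, $T \to C$ (with $U$ unobserved), and the helper names `ancestors`, `d_separated` and `without_outgoing` are made up for this sketch. Separation is checked with the standard moral-graph criterion rather than any particular library.

```python
from collections import deque

def ancestors(edges, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(edges, x, y, given=frozenset()):
    """True if x and y are d-separated given `given` (moral-graph criterion)."""
    keep = ancestors(edges, {x, y} | set(given))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize the ancestral graph: drop directions and link co-parents.
    nbrs = {v: set() for v in keep}
    parents = {}
    for a, b in sub:
        nbrs[a].add(b)
        nbrs[b].add(a)
        parents.setdefault(b, set()).add(a)
    for ps in parents.values():
        for p in ps:
            nbrs[p] |= ps - {p}
    # Remove the conditioning set, then look for any remaining path x .. y.
    blocked = set(given)
    queue, seen = deque([x]), {x}
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in nbrs[v] - blocked - seen:
            seen.add(w)
            queue.append(w)
    return True

def without_outgoing(edges, node):
    """Step (i): delete every arrow that leaves `node`."""
    return [(a, b) for a, b in edges if a != node]

# Assumed front-door diagram; U is the unobserved confounder.
EDGES = [("U", "S"), ("U", "C"), ("S", "T"), ("T", "C")]

# Is p(t|do(s)) = p(t|s) licensed?  Step (ii) on the graph from step (i).
print(d_separated(without_outgoing(EDGES, "S"), "S", "T"))
# Is p(c|do(s)) = p(c|s) licensed?  The path S <- U -> C stays open.
print(d_separated(without_outgoing(EDGES, "S"), "S", "C"))
# Is p(c|do(t), s) = p(c|t, s) licensed?  Here S is conditioned on.
print(d_separated(without_outgoing(EDGES, "T"), "T", "C", {"S"}))
```

Under the assumed diagram, the first and third checks succeed and the second fails, which matches the idea that the auxiliary graph is a test to be passed, not an edit made at will.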

3 - The formula is not proved on page 227; you can find its proof here. Regarding $s$ versus $s'$, this is for notation purposes, and it is useful to keep them distinct, because we have two different operations being performed with $s$ here. Let's rewrite the estimand as,

$$p(c|do(s))= \sum_{t}p(t|s)\sum_{s'}p(c|t, s')p(s')$$

Thus, the first $s$ in $p(t|s)$ stands for the same $s$ as in the $do(s)$ expression. That's the value $s$ you are setting the variable $S$ to, say, $S = 1$. The term involving $s'$ stands for summing over all possible values of $S$. You could have written that as $\sum_{s}p(c|t, s)p(s)$, but this notation without primes could lead to ambiguity, since we would then have the same symbol for the specific value of $s$ in the expression $p(t|s)$. Thus, the primes are added for clarity, so the reader understands these are different values.
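
To see the two roles of $s$ and $s'$ in action, here is a small numerical sketch. It assumes a binary model with the front-door structure $U \to S$, $U \to C$, $S \to T$, $T \to C$; all probabilities are made-up numbers, and the function names are hypothetical. The estimand is computed from the observational joint over $(S, T, C)$ only, then compared with $p(c|do(s))$ computed from the full model, where $U$ is known.

```python
from itertools import product

# Hypothetical parameters (not from the book); all variables are binary.
P_U = {0: 0.6, 1: 0.4}
P_S1_GIVEN_U = {0: 0.2, 1: 0.7}                      # P(S=1 | U=u)
P_T1_GIVEN_S = {0: 0.1, 1: 0.8}                      # P(T=1 | S=s)
P_C1_GIVEN_TU = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}           # P(C=1 | T=t, U=u)

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

# Observational joint over the observed variables, with U summed out.
JOINT = {}
for u, s, t, c in product((0, 1), repeat=4):
    p = (P_U[u] * bern(P_S1_GIVEN_U[u], s)
         * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c))
    JOINT[(s, t, c)] = JOINT.get((s, t, c), 0.0) + p

def prob(**fixed):
    """Observational probability of an event such as prob(s=1, t=0)."""
    total = 0.0
    for (s, t, c), p in JOINT.items():
        values = {"s": s, "t": t, "c": c}
        if all(values[k] == v for k, v in fixed.items()):
            total += p
    return total

def front_door(c, s):
    """sum_t p(t|s) sum_{s'} p(c|t,s') p(s'); s is fixed, s2 plays the role of s'."""
    total = 0.0
    for t in (0, 1):
        p_t_given_s = prob(s=s, t=t) / prob(s=s)
        inner = sum(prob(s=s2, t=t, c=c) / prob(s=s2, t=t) * prob(s=s2)
                    for s2 in (0, 1))
        total += p_t_given_s * inner
    return total

def truth(c, s):
    """p(c|do(s)) from the full model: S is set to s, U keeps its distribution."""
    return sum(P_U[u] * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c)
               for u, t in product((0, 1), repeat=2))

print(front_door(1, 1), truth(1, 1), prob(s=1, c=1) / prob(s=1))
```

In this sketch the first two printed numbers agree up to floating-point error, while the third, the plain conditional $p(c|s)$, differs from them because of the confounder. Note that `s` appears only in `p_t_given_s`, whereas `s2` is a dummy summation index.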

Hope these clarifications help!

Addendum:

Regarding the law of total probability, one way to think about it is to remember that interventions define a new probability distribution, and the law of total probability only holds when using the same probability distribution throughout. So let's define the post-intervention distribution after intervening on $S$ as $P^*(\cdot)$, that is, $P^*(\cdot) := P(\cdot|do(s))$. Thus the law of total probability on $P^*$ states that,

$$ P^*(c) = \sum_{t}P^*(c|t)P^*(t) $$

If we now recall that $P^*(\cdot) := P(\cdot|do(s))$, we have,

$$ P(c|do(s)) = \sum_{t}P(c|t, do(s))P(t|do(s)) $$
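
A short numerical sketch may make the "same distribution" point concrete. It assumes a binary model $U \to S$, $U \to C$, $S \to T$, $T \to C$ with made-up numbers, and the function names are hypothetical. The decomposition is exact when every factor comes from $P^*$, and it breaks when $P^*(c|t)$ is paired with the observational $P(t)$, which is one possible reading of the alternative expression raised in the comments.

```python
from itertools import product

# Hypothetical parameters (not from the book); all variables are binary.
P_U = {0: 0.6, 1: 0.4}
P_S1_GIVEN_U = {0: 0.2, 1: 0.7}                      # P(S=1 | U=u)
P_T1_GIVEN_S = {0: 0.1, 1: 0.8}                      # P(T=1 | S=s)
P_C1_GIVEN_TU = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.4, (1, 1): 0.9}           # P(C=1 | T=t, U=u)

def bern(p1, value):
    """Probability of `value` for a binary variable with P(1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def p_star(s):
    """P*(t, c) := P(t, c | do(S=s)): S is held at s, U keeps its distribution."""
    dist = {}
    for u, t, c in product((0, 1), repeat=3):
        p = P_U[u] * bern(P_T1_GIVEN_S[s], t) * bern(P_C1_GIVEN_TU[(t, u)], c)
        dist[(t, c)] = dist.get((t, c), 0.0) + p
    return dist

def p_star_c(c, s):
    """P*(c), obtained by summing t out of P*(t, c) directly."""
    ps = p_star(s)
    return ps[(0, c)] + ps[(1, c)]

def total_probability_in_p_star(c, s):
    """sum_t P*(c|t) P*(t): every factor comes from the same distribution P*."""
    ps = p_star(s)
    total = 0.0
    for t in (0, 1):
        p_t = ps[(t, 0)] + ps[(t, 1)]                # P*(t)
        total += (ps[(t, c)] / p_t) * p_t            # P*(c|t) P*(t)
    return total

def mixed_measures(c, s):
    """sum_t P*(c|t) P(t): pairs P* with the observational P(t)."""
    ps = p_star(s)
    total = 0.0
    for t in (0, 1):
        p_t_star = ps[(t, 0)] + ps[(t, 1)]
        p_t_obs = sum(P_U[u] * bern(P_S1_GIVEN_U[u], s2) * bern(P_T1_GIVEN_S[s2], t)
                      for u, s2 in product((0, 1), repeat=2))
        total += (ps[(t, c)] / p_t_star) * p_t_obs
    return total

print(p_star_c(1, 1), total_probability_in_p_star(1, 1), mixed_measures(1, 1))
```

With these numbers the first two values coincide and the third does not, which is the reason the $do(s)$ has to appear in both factors of the sum.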

Regarding the formula on page 227, the notation is indeed not ideal. The best notation uses the primes to avoid this confusion. On that page, the big $X$ is indeed standing for the little $s,$ and the little $x$ for the $s'.$

So the two formulas are indeed equivalent (barring, of course, the different notation and different names for the variables). The main source of confusion in the second formula is that the symbol "capital $X$" is being used to denote both an instantiation value (inside the $do(X)$) and the random variable (when we write $X = x$).
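
As an illustration, here is one way formula (7.1) could be written so that the intervened value and the summation index are visibly different. The symbols $y$ and $x'$ are introduced only for this rewriting and do not appear in the book's version:

$$P(Y=y|\operatorname{do}(X=x))=\sum_z P(Z=z|X=x)\sum_{x'}P(Y=y|X=x',Z=z)\,P(X=x').$$

Renaming $y\to c,$ $z\to t,$ $x\to s,$ and $x'\to s'$ turns this into $p(c|do(s))=\sum_{t}p(t|s)\sum_{s'}p(c|t,s')p(s'),$ so no collapsing of $s'$ into $s$ is needed.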

Regarding the first formula, with lower case letters only, that's a somewhat standard notation (see Causality Chapter 1), in which we usually use $P(y|x)$ as a shorthand notation for $P(Y=y|X=x)$.

PS: these questions are probably better suited to Cross Validated.

Carlos: Thanks much for your answer! I still have a few questions. Re:

Why isn't the expression this instead: $$P(c|\operatorname{do}(s))=\sum_t P((c|\operatorname{do}(s))|t)\,P(t)?$$

So, e.g., in Line 2 of the derivation, what would $Z$ be?

I'm afraid you haven't answered this one at all. I know why someone would put $s'$ in there instead of $s.$ What I'm asking is this: it doesn't appear to me that he has proved the Front-Door Formula, because the Front-Door Formula doesn't have different $s$'s in it, at least not the version on page 227. Is he just being sloppy? — Adrian Keister, Feb 25 '20 at 18:43
Hi Adrian. You can think of $do(s)$ as a different probability measure, one that holds in the intervention world. The law of total probability only holds when using the same probability measure. Maybe this helps: instead of $P(\cdot|do(s))$ let's call this interventional probability $P^*$, that is, define $P^*(\cdot) := P(\cdot|do(s))$. Then the law of total probability states $P^*(c) = \sum_{t}{P^*(c|t)P^*(t)}$. — Carlos Cinelli, Feb 25 '20 at 18:52
It looks to me like the following matchup in the formulae on pages 227 and 236 holds: \begin{align}c&\to Y\\ t&\to Z=z\\ s'&\to X=x\\ s&\to X\end{align} This looks really sloppy to me. What's being said here? That is, the two formulae \begin{align}P(c|\operatorname{do}(s))&=\sum_{s'}\sum_tP(c|t,s') P(s') P(t|s)\\ P(Y|\operatorname{do}(X))&=\sum_x\sum_z P(Y|Z=z,X=x)P(X=x)P(Z=z|X). \end{align} are equivalent. — Adrian Keister, Feb 25 '20 at 18:54
@AdrianKeister regarding the formula on page 227, the notation is indeed not ideal. The best notation is using the primes to avoid these confusions. In that page, the big X is indeed standing for the little s, and the little x for the s'. — Carlos Cinelli, Feb 25 '20 at 18:57
@AdrianKeister yes, the two formulas are equivalent (barring, of course, the different notation and different names of the variables). The main source of confusion in the second formula is that the symbol "capital X" is being used to denote both an instantiation value (inside the do(X)) and the random variable (when we write X = x). Regarding the first formula, with lower case letters, that's standard notation, in which we usually use P(y|x) as a shorthand for P(Y=y|X=x). — Carlos Cinelli, Feb 25 '20 at 19:02
Great, thanks! If you fold these comments back into your answer, I'll accept. Pity I can't give you the 100 bounty I had on this problem! — Adrian Keister, Feb 25 '20 at 19:03
@AdrianKeister thanks Adrian, just added these comments in the answer. — Carlos Cinelli, Feb 25 '20 at 19:10
