Causal Inference A Primer Study Question

Question

I am reading Pearl's Causal Inference book and attempted to solve study question 1.2.4. Here is the entire problem:

In an attempt to estimate the effectiveness of a new drug, a randomized experiment is conducted. In all, 50% of the patients are assigned to receive the new drug and 50% to receive a placebo. A day before the actual experiment, a nurse hands out lollipops to some patients who show signs of depression, mostly among those who have been assigned to treatment the next day (i.e., the nurse’s round happened to take her through the treatment-bound ward). Strangely, the experimental data revealed a Simpson’s reversal: Although the drug proved beneficial to the population as a whole, drug takers were less likely to recover than nontakers, among both lollipop receivers and lollipop nonreceivers. Assuming that lollipop sucking in itself has no effect whatsoever on recovery, answer the following questions:

1. Is the drug beneficial to the population as a whole or harmful?

2. Does your answer contradict our gender example, where sex-specific data was deemed more appropriate?

3. Draw a graph (informally) that more or less captures the story.

4. How would you explain the emergence of Simpson’s reversal in this story?

5. Would your answer change if the lollipops were handed out (by the same criterion) a day after the study?

[Hint: Use the fact that receiving a lollipop indicates a greater likelihood of being assigned to drug treatment, as well as depression, which is a symptom of risk factors that lower the likelihood of recovery.]

Given the hint, here is my attempt at forming some assumptions that might have produced the observed data:

Lollipop receivers consist of depressed patients (majority) and patients without depression. So depression affects lollipop receipt.
Lollipop receivers consist of treated patients (majority) and untreated patients. So treatment affects lollipop receipt.
However, depression and treatment assignment cannot be related because treatment assignment was randomized. This means that the number of depressed patients in the treatment group should be similar to the number of depressed patients in the placebo group.
Treatment affects recovery.
Depression affects recovery.
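
In symbols, the five assumptions above amount to the following factorization of the joint distribution. The letters are shorthand introduced here, not the book's notation: $X$ is treatment assignment, $D$ depression, $L$ lollipop receipt and $Y$ recovery.

$$P(x, d, l, y) = P(x)\,P(d)\,P(l \mid x, d)\,P(y \mid x, d)$$

The corresponding graph would be $X \to L \leftarrow D$, $X \to Y$ and $D \to Y$, with no arrow between $X$ and $D$ (randomization) and no arrow from $L$ to $Y$ (the problem states that lollipops have no effect on recovery).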

I tried to come up with a concrete extreme case that would be consistent with the problem's description and my assumptions.

One extreme scenario in which Simpson's reversal could arise is the following:

1. Within the lollipop group, most treated patients are depressed and none of the untreated patients are. Since the untreated patients are then in better health than the treated ones, the placebo group can have a higher recovery rate than the treatment group within the lollipop group, depending on how severe the depression of the treated patients is.
2. If 1 holds, then within the non-lollipop group all the treated patients must be non-depressed and the majority of the untreated patients must be depressed. In this case the untreated patients should have a lower recovery rate, since they are both untreated and in worse health, which contradicts Simpson's reversal.

I have been trying different scenarios and creating contingency tables but still can't find a satisfying concrete example.

Could someone please help me?

LudvigH · Answer 1 · 2021-01-29T17:07:39.627

I got stumped on the same question. I concluded that the lollipops generate collider bias, but could not find a concrete distribution that shows the Simpson's reversal in question. So I searched on Cross Validated (I forgot to search for the question here on math.SE, though).

The answer https://stats.stackexchange.com/a/397718/28436 by Carlos Cinelli was provided in R code:


set.seed(10)
n     <- 1e6
x     <- rbinom(n, 1, 0.5)
dep   <- rbinom(n, 1, 0.5)
lolli <- rbinom(n, 1, 0.5*x + 0.5*dep)
y     <- rbinom(n, 1, 0.1*x + 0.8*dep)
mean(y[x==1]) - mean(y[x==0]) # 0.1
mean(y[x==1 & lolli==1]) - mean(y[x==0 & lolli==1]) # -0.17
mean(y[x==1 & lolli==0]) - mean(y[x==0 & lolli==0]) # -0.17

It is quite a nice, simple answer. :)
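
The simulated differences can also be checked exactly by enumerating the four (x, dep) cells instead of sampling. This is only a sketch that reuses the parameters of the simulation; the helper `risk_diff` and the `cells` table are names made up for this illustration.

```r
# Exact counterpart of the simulation: enumerate the (x, dep) cells.
# Assumes the same model: x, dep ~ Bernoulli(0.5), independent.
cells <- expand.grid(x = 0:1, dep = 0:1)
cells$p_cell  <- 0.25                              # P(x, dep)
cells$p_lolli <- 0.5 * cells$x + 0.5 * cells$dep   # P(lolli = 1 | x, dep)
cells$p_y     <- 0.1 * cells$x + 0.8 * cells$dep   # P(y = 1 | x, dep)

# E[y | x = 1, S] - E[y | x = 0, S], where w = P(S | x, dep).
# Uses the fact that y depends on lolli only through x and dep.
risk_diff <- function(w) {
  wt <- cells$p_cell * w
  ey <- function(xv) {
    keep <- cells$x == xv
    sum(wt[keep] * cells$p_y[keep]) / sum(wt[keep])
  }
  ey(1) - ey(0)
}

rd_all     <- risk_diff(rep(1, nrow(cells)))   # whole population: 0.1
rd_lolli   <- risk_diff(cells$p_lolli)         # lollipop receivers: -1/6
rd_nololli <- risk_diff(1 - cells$p_lolli)     # non-receivers: -1/6
print(c(rd_all, rd_lolli, rd_nololli))
```

The exact values 0.1, -1/6 and -1/6 agree with the simulated 0.1, -0.17 and -0.17 up to sampling noise.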

EDIT: After the question in the comments, I have come to think that the problem as stated cannot be solved. M-bias between depression and drug assignment implies that they must be negatively correlated among both lollipop receivers and non-receivers. But to get the reversal with a positive drug effect and a negative depression effect, the two would have to be positively correlated. The only ways I have gotten the reversal are either to give depression a positive effect on the outcome, or to give drug assignment or depression a negative effect on lollipop assignment. You may un-accept this answer in the hope that someone with a more complete answer comes by.
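
A sketch of why the reversal seems out of reach with the hint's signs, under the added assumption (not part of the problem) that recovery follows an additive model like the one in the R code, $E[Y \mid x, d] = c + a\,x + b\,d$, with $X$ the drug, $D$ depression, $L$ the lollipop and $Y$ recovery. Because $Y$ depends on $L$ only through $X$ and $D$, the drug-versus-placebo difference inside lollipop stratum $l$ is

$$E[Y \mid X=1, L=l] - E[Y \mid X=0, L=l] = a + b\,\Delta_l, \qquad \Delta_l = P(D=1 \mid X=1, L=l) - P(D=1 \mid X=0, L=l).$$

$\Delta_l$ has the same sign as $\operatorname{Cov}(X, D \mid L=l)$, since for binary variables that covariance equals $P(X=1 \mid l)\,P(X=0 \mid l)\,\Delta_l$. Randomization gives $\operatorname{Cov}(X, D) = 0$, so the law of total covariance reads

$$0 = E[\operatorname{Cov}(X, D \mid L)] + \operatorname{Cov}(E[X \mid L],\, E[D \mid L]).$$

If lollipop receivers are more likely both to be treated and to be depressed, the second term is positive, so $\Delta_l < 0$ in at least one stratum. With $a > 0$ and $b < 0$, that stratum has $a + b\,\Delta_l > a > 0$, so the drug still looks beneficial there and a reversal in both strata cannot occur. In the R code above the reversal works because $b = 0.8 > 0$ and $\Delta_l = -1/3$ in both strata, giving $0.1 - 0.8/3 \approx -0.17$.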

edited Jan 29 '21 at 17:07

answered Jan 15 '20 at 10:23

LudvigH

I'm somewhat confused: if $y \sim \mathrm{ber}(0.1x+0.8dep)$, then that's saying that you're more likely to recover if you are depressed? But the clue states "a lollipop indicates a greater likelihood of being assigned to drug treatment, as well as depression, which is a symptom of risk factors that lower the likelihood of recovery." So how does that work? – Pavan Sangha Jan 29 '21 at 12:58
good point. I didn't think of it before, since it was not in the problem main text. I'll play around a little and see if I can get it to work myself... – LudvigH Jan 29 '21 at 15:26
I can't solve it. It seems to me the coefficient for $dep$ must be positive, or either $dep$ or $drug$ must have a negative coefficient for $lolly$. – LudvigH Jan 29 '21 at 17:09
I tried offsetting it, in which case it needs to be $\alpha + \beta\,drug - \alpha\,dep$, which ensures it doesn't go above 1 or below 0 for appropriate values of $\alpha,\beta$. I guessed values initially, but they violated the other conditions on the lollipop. So I tried something like $0.2 + 0.7\,drug - 0.2\,dep$, which didn't work. I'm wondering if we can put in arbitrary values, calculate the probabilities and solve for the parameters? – Pavan Sangha Jan 29 '21 at 17:12
I started doing exactly that, but gave up due to the tedious algebra. It is definitely doable if you have the grit. Please tell me your results if you do! – LudvigH Jan 31 '21 at 09:18
Here is the mess in a colab.... https://colab.research.google.com/drive/1MpPcCoKwDUP-244jtaWexarRGDLVovw9?usp=sharing – LudvigH Jan 31 '21 at 11:17
