I am reading Pearl's Causal Inference book and attempted to solve study question 1.2.4. Here is the entire problem:
In an attempt to estimate the effectiveness of a new drug, a randomized experiment is conducted. In all, 50% of the patients are assigned to receive the new drug and 50% to receive a placebo. A day before the actual experiment, a nurse hands out lollipops to some patients who show signs of depression, mostly among those who have been assigned to treatment the next day (i.e., the nurse’s round happened to take her through the treatment-bound ward). Strangely, the experimental data revealed a Simpson’s reversal: Although the drug proved beneficial to the population as a whole, drug takers were less likely to recover than nontakers, among both lollipop receivers and lollipop nonreceivers. Assuming that lollipop sucking in itself has no effect whatsoever on recovery, answer the following questions:
1. Is the drug beneficial to the population as a whole or harmful?
2. Does your answer contradict our gender example, where sex-specific data was deemed more appropriate?
3. Draw a graph (informally) that more or less captures the story.
4. How would you explain the emergence of Simpson’s reversal in this story?
5. Would your answer change if the lollipops were handed out (by the same criterion) a day after the study?
[Hint: Use the fact that receiving a lollipop indicates a greater likelihood of being assigned to drug treatment, as well as depression, which is a symptom of risk factors that lower the likelihood of recovery.]
Given the hint, here is my attempt at forming some assumptions that might have produced the observed data:
Lollipop receivers consist of depressed patients (majority) and patients without depression. So depression affects lollipop receipt.
Lollipop receivers consist of treated patients (majority) and untreated patients. So treatment affects lollipop receipt.
However, depression and treatment assignment cannot be related because treatment assignment was randomized. This means that the number of depressed patients in the treatment group should be similar to the number of depressed patients in the placebo group.
Treatment affects recovery.
Depression affects recovery.
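The assumptions listed above can be written down as a small directed graph. The following is only a sketch of my reading of the story: the node names and the edge list are my own labels, not something given in the book.

```python
# Directed edges (cause, effect) implied by the assumptions above.
# The node names are my own labels for this sketch.
edges = [
    ("Depression", "Lollipop"),
    ("Treatment", "Lollipop"),
    ("Treatment", "Recovery"),
    ("Depression", "Recovery"),
]

# Collect the parents (direct causes) of every node.
parents = {}
for cause, effect in edges:
    parents.setdefault(cause, set())
    parents.setdefault(effect, set()).add(cause)

for node in sorted(parents):
    print(node, "<-", sorted(parents[node]))

# Randomization means nothing points into Treatment, and Lollipop is a
# common effect of Depression and Treatment with no arrow into Recovery.
assert parents["Treatment"] == set()
assert parents["Lollipop"] == {"Depression", "Treatment"}
```

In this graph, Lollipop is a common effect of Depression and Treatment. Selecting on a common effect can make its two causes look associated within the selected group, even though they are independent in the whole population.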
I tried to come up with a concrete extreme case that would be consistent with the problem's description and my assumptions.
One extreme scenario in which Simpson's reversal might occur is the following:
1. Within the lollipop group, most treated patients are depressed and none of the untreated patients are. Since the untreated patients are in better health than the treated ones, the placebo group can have a higher recovery rate than the treatment group within the lollipop group, depending on how severe the depression of the treated patients is.
2. If 1 holds, then within the non-lollipop group all the treated patients must be non-depressed and the majority of the untreated patients must be depressed. In this case, the untreated patients should have a lower recovery rate, since they are both untreated and in worse health, which contradicts Simpson's reversal.
I have been trying different scenarios and creating contingency tables, but I still can't find a satisfactory concrete example.
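For reference, a candidate scenario can be checked with a short script along these lines. It is only a sketch under my assumptions above (treatment and depression are independent, and recovery depends only on treatment and depression). All the probabilities are hypothetical numbers chosen to resemble a softened version of the extreme scenario, and the function names `stratum_rates` and `is_reversal` are my own.

```python
from itertools import product


def stratum_rates(p_dep, p_lolli, p_rec, p_treat=0.5):
    """Exact recovery rates by treatment, overall and per lollipop stratum.

    p_lolli[(t, d)] = P(lollipop | treatment t, depression d)
    p_rec[(t, d)]   = P(recovery | treatment t, depression d)
    Returns rates[(stratum, t)] with stratum in {"all", 0, 1}.
    """
    # mass[(t, l)] = P(T=t, L=l); rec[(t, l)] = P(T=t, L=l, recovered)
    mass = {(t, l): 0.0 for t, l in product((0, 1), repeat=2)}
    rec = dict(mass)
    for t, d in product((0, 1), repeat=2):
        # Treatment and depression are independent by randomization.
        p_td = (p_treat if t else 1 - p_treat) * (p_dep if d else 1 - p_dep)
        for l in (0, 1):
            p_l = p_lolli[(t, d)] if l else 1 - p_lolli[(t, d)]
            mass[(t, l)] += p_td * p_l
            rec[(t, l)] += p_td * p_l * p_rec[(t, d)]
    rates = {}
    for t in (0, 1):
        total_mass = mass[(t, 0)] + mass[(t, 1)]
        rates[("all", t)] = (rec[(t, 0)] + rec[(t, 1)]) / total_mass
        for l in (0, 1):
            rates[(l, t)] = rec[(t, l)] / mass[(t, l)]
    return rates


def is_reversal(rates):
    """True if the drug helps overall but looks harmful in both strata."""
    helps_overall = rates[("all", 1)] > rates[("all", 0)]
    harmful_in_both = all(rates[(l, 1)] < rates[(l, 0)] for l in (0, 1))
    return helps_overall and harmful_in_both


# Hypothetical numbers: keys are (treatment, depression).
p_lolli = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.0, (0, 0): 0.2}
p_rec = {(1, 1): 0.3, (1, 0): 0.9, (0, 1): 0.2, (0, 0): 0.8}

rates = stratum_rates(p_dep=0.5, p_lolli=p_lolli, p_rec=p_rec)
for key in sorted(rates, key=str):
    print(key, round(rates[key], 3))
print("Simpson's reversal:", is_reversal(rates))
```

With these numbers, treated patients recover less often than untreated ones among lollipop receivers (0.42 vs. 0.80) but more often among nonreceivers (0.78 vs. about 0.47), while the drug helps overall (0.60 vs. 0.50). So the check reports no reversal, which mirrors the contradiction I ran into in the scenario above.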
Could someone please help me?