I am having trouble figuring out what 'Assuming that the accuracy of the test is not affected by the presence of symptom Y' means in the following problem in terms of the events: A - test is positive, B - patient has disease X, C - patient has symptom Y.

First part:

[image of the problem statement, first part; not available]

Second part:

[image of the problem statement, second part; not available]

EDIT: The exercise is from Victor Shoup's free book A Computational Introduction to Number Theory and Algebra, pages 216 (example 8.13) and 220 (exercise 8.10): https://shoup.net/ntb/

StubbornAtom
PhantomR

1 Answer

It means that $\mathsf P(A\mid B,C)=\mathsf P(A\mid B)$ and $\mathsf P(A\mid\overline B,C)=\mathsf P(A\mid\overline B)$; that is, that you can keep using the four numbers for the accuracy of the test in the first part and only have to adapt the $1\%$ rate of incidence (since the incidence among people with symptom $Y$ is higher).

This means that $A$ is conditionally independent of $C$ given $B$ or $\overline B$; that is, while $\mathsf P(A\mid C)\gt\mathsf P(A)$ (a person with symptom $Y$ is more likely than average to test positive for the disease), so that $A$ and $C$ are not independent, they become independent when you condition on $B$ or on $\overline B$. In other words, the test accuracy depends only on the presence of the disease; it depends on the presence of the symptom only in that the symptom makes it more likely that the disease is present, but given whether the disease is present, the symptom has no influence of its own on the test results. For more on conditional independence, you may want to take a look at "Could someone explain conditional independence?".
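
To see where the assumption enters, here is a sketch of Bayes' rule restricted to patients who exhibit symptom $Y$, using the events $A$, $B$, $C$ from the question:

$$\mathsf P(B\mid A,C)=\frac{\mathsf P(A\mid B,C)\,\mathsf P(B\mid C)}{\mathsf P(A\mid B,C)\,\mathsf P(B\mid C)+\mathsf P(A\mid\overline B,C)\,\mathsf P(\overline B\mid C)}=\frac{\mathsf P(A\mid B)\,\mathsf P(B\mid C)}{\mathsf P(A\mid B)\,\mathsf P(B\mid C)+\mathsf P(A\mid\overline B)\,\mathsf P(\overline B\mid C)}$$

The first equality is Bayes' rule with everything conditioned on $C$; the second uses the two conditional-independence equations. The result has the same form as the computation without the symptom, except that the prior $\mathsf P(B)$ is replaced by $\mathsf P(B\mid C)$.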

joriki
  • Just curious, does the stated condition imply anything about $P(\neg A)$? – Jake Tae May 06 '20 at 14:35
  • @JST99: Conditional probabilities are, in particular, probabilities; $\mathsf P(\overline A\mid B)=1-\mathsf P(A\mid B)$ (and likewise for conditioning on $\overline B$ or $B\cap C$ or $\overline B\cap C$) holds as it does for all probabilities; that's why I didn't write it out separately for $\overline A$, as the equations for $A$ imply the ones for $\overline A$. I wrote it out separately for $\overline B$ because it doesn't work that way on the condition side; you could very well have $A$ conditionally independent of $C$ given $B$ but not given $\overline B$. – joriki May 06 '20 at 14:39
  • Thank you so much. I also thought it meant the first one, but not the second (in which we condition on $\overline B$). If I may ask, could you please elaborate on why I have to adapt the $1\%$ rate? Is it no longer valid? I am not able to derive this from the text. – PhantomR May 07 '20 at 11:04
  • @PhantomR: In the population at large, the rate of incidence is $1\%$. But everyone who carries the disease exhibits symptom $Y$, and $10\%$ of the population exhibit symptom $Y$. Thus, $1\%$ of the population exhibit the symptom and carry the disease (this is the original rate of incidence), and $9\%$ of the population (the rest of the $10\%$) exhibit the symptom and don't carry the disease. Thus, among those who exhibit the symptom (this is now the relevant "population", since the patient exhibits the symptom), the rate of incidence is $1\%/10\%=10\%$. – joriki May 07 '20 at 11:40
  • It's a bit unfortunate (and perhaps confusing) that the same number $10\%$ appears in two different roles here: as the proportion of the population that exhibits the symptom and as the proportion of the symptomatic population that carries the disease. There's no reason for those two numbers to be the same. – joriki May 07 '20 at 11:40
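
The arithmetic in the comments above can be checked with a short sketch. The $1\%$ incidence and the $10\%$ symptom rate come from the discussion; the test's sensitivity and false-positive rate are given only in the missing images, so the values `sens` and `fpr` below are hypothetical placeholders, and the function names are made up for illustration.

```python
from fractions import Fraction


def prevalence_given_symptom(p_disease, p_symptom, p_symptom_given_disease=Fraction(1)):
    """P(B | C) = P(C | B) * P(B) / P(C).

    The default P(C | B) = 1 encodes "everyone with the disease has the symptom".
    """
    return p_symptom_given_disease * p_disease / p_symptom


def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' rule for a given prior."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)


p_b = Fraction(1, 100)   # incidence in the whole population
p_c = Fraction(10, 100)  # share of the population with symptom Y
p_b_given_c = prevalence_given_symptom(p_b, p_c)
print(p_b_given_c)  # 1/10

# Hypothetical test accuracy (NOT the values from the exercise).
sens = Fraction(95, 100)  # assumed P(A | B)
fpr = Fraction(5, 100)    # assumed P(A | not B)

# Conditional independence lets us reuse sens and fpr; only the prior changes.
print(float(posterior(p_b, sens, fpr)))          # prior P(B)
print(float(posterior(p_b_given_c, sens, fpr)))  # prior P(B | C)
```

With any fixed accuracy numbers, raising the prior from $1\%$ to $10\%$ raises the probability of disease given a positive test; the exercise's own accuracy figures can be substituted for `sens` and `fpr`.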