I’ve encountered the following seemingly simple probability interview question at my workplace:
Two reviewers were tasked with finding errors in a book. The first found 40 errors and the second found 60. Of these, 20 errors were found by both reviewers. Give an estimate of the total number of errors in the book.
A few clarifications were given:
- The errors are not false positives.
- The probability of the reviewers to find any error is independent of each other. (Problematic phrasing?)
- The lower bound is not what is being asked for (i.e., at least 80 errors).
It was my opinion that this problem is not well defined and any answer would rely on hidden assumptions.
My coworker said that the solution can be calculated easily with the following method, where x denotes the total number of errors:
$$P(A) = \frac{40}{x}$$
$$P(B) = \frac{60}{x}$$
$$P(A \cap B) = \frac{20}{x}$$
$$P(A \cap B) = P(A) \cdot P(B)$$
$$\frac{20}{x} = \frac{40}{x} \cdot \frac{60}{x}$$
$$20x = 2400$$
$$x = 120$$
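For reference, the algebra above reduces to x = 40 · 60 / 20. The following is a minimal Python sketch of that calculation, reading A and B as the events that the first and second reviewer find a given error. The function name `estimate_total` is made up for illustration. The same formula appears in the capture-recapture literature as the Lincoln-Petersen estimator.

```python
from fractions import Fraction


def estimate_total(found_a: int, found_b: int, found_both: int) -> Fraction:
    """Solve found_both/x = (found_a/x) * (found_b/x) for x."""
    if found_both == 0:
        # With no overlap the equation has no finite solution.
        raise ValueError("found_both must be positive")
    return Fraction(found_a * found_b, found_both)


total = estimate_total(40, 60, 20)
print(total)  # 120

# Per-error detection probabilities implied by that estimate.
print(Fraction(40) / total, Fraction(60) / total)  # 1/3 1/2
```

Note that the formula fails outright when the reviewers share no errors, which already hints that it is an estimate resting on a model rather than an exact answer.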
I found this answer unsatisfying, but I am struggling to coherently explain why. I believe there are various assumptions hidden in the above “solution”.
I need help identifying these assumptions, or any phrasing issues with the question itself, that make it not well defined. It could be that I’m mistaken, that the problem is well defined, and that I’ve overcomplicated it.
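One candidate hidden assumption is that every error is equally easy to find. A small simulation can probe how much the estimate depends on this. The sketch below is illustrative only: the true total of 200 errors and all detection probabilities are made-up values, and `mean_estimate` is a hypothetical helper. In both scenarios the two reviewers act independently of each other once the difficulty of an error is fixed, which is one possible reading of the second clarification.

```python
import random


def mean_estimate(probs_a, probs_b, rounds=2000, seed=0):
    """Average of n_a * n_b / n_both over simulated reviews.

    probs_a[i] and probs_b[i] are the chances that reviewer A and
    reviewer B find error i. Given those chances, the two reviewers
    act independently.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(rounds):
        n_a = n_b = n_both = 0
        for p_a, p_b in zip(probs_a, probs_b):
            a_finds = rng.random() < p_a
            b_finds = rng.random() < p_b
            n_a += a_finds
            n_b += b_finds
            n_both += a_finds and b_finds
        if n_both > 0:  # the estimate is undefined without any overlap
            estimates.append(n_a * n_b / n_both)
    return sum(estimates) / len(estimates)


TRUE_TOTAL = 200  # hypothetical true number of errors

# Scenario 1: every error is equally easy to find.
uniform = mean_estimate([0.20] * TRUE_TOTAL, [0.30] * TRUE_TOTAL)

# Scenario 2: half the errors are easy and half are hard, for both reviewers.
half = TRUE_TOTAL // 2
mixed = mean_estimate([0.38] * half + [0.02] * half,
                      [0.57] * half + [0.03] * half)

print(f"true total: {TRUE_TOTAL}")
print(f"equal difficulty, mean estimate: {uniform:.0f}")
print(f"mixed difficulty, mean estimate: {mixed:.0f}")
```

In the second scenario the reviewers find on average about 40 and 60 errors with roughly 20 in common, much like the numbers in the question, yet the true total is 200 and the estimate tends to land far below it.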
I am also interested in alternative solutions that rest on different assumptions but don’t contradict the clarifications given.
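As one possible starting point for such an alternative, the sketch below treats the second reviewer's 60 finds as a uniformly random subset of all errors, so the overlap with the first reviewer's 40 follows a hypergeometric distribution. It then scans candidate totals for the most likely ones. This modelling choice is an assumption, not something stated in the question, and the search range of 80 to 400 and the 1/8 cutoff are arbitrary.

```python
from fractions import Fraction
from math import comb


def overlap_likelihood(total, found_a=40, found_b=60, both=20):
    """P(overlap == both) if B's finds are a uniform random subset of `total` errors."""
    if total < found_a + found_b - both:
        return Fraction(0)  # fewer errors than were actually found
    ways = comb(found_a, both) * comb(total - found_a, found_b - both)
    return Fraction(ways, comb(total, found_b))


# Exact likelihood for each candidate total (range chosen arbitrarily).
likelihoods = {n: overlap_likelihood(n) for n in range(80, 401)}
peak = max(likelihoods.values())

# Totals that maximise the likelihood.
best = [n for n, lik in likelihoods.items() if lik == peak]
print(best)  # [119, 120]

# Totals whose likelihood is at least 1/8 of the peak (arbitrary cutoff).
plausible = [n for n, lik in likelihoods.items() if lik * 8 >= peak]
print(min(plausible), max(plausible))
```

Under this model 120 is a most likely value (tied with 119) rather than a certainty, and the spread of the plausible range gives a sense of how uncertain the estimate is.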