
I have seen a lot of questions in this forum related to mine, but I didn't find a convincing answer. So I would like to put this question:

When we are dealing with a 95% confidence interval, we mean that if we repeat the process of collecting samples of the same size and calculate a 95% interval from each sample, then 95% of those intervals will contain the true population parameter.
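For example, this repeated-sampling statement can be illustrated with a quick simulation: repeatedly draw samples from a population whose mean we happen to know, compute a 95% interval for the mean from each sample, and count how often the interval covers the true mean. A minimal Python sketch (the normal population and sample size here are just assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, sigma, n, reps = 10.0, 2.0, 30, 10_000  # assumed population and sample size

covered = 0
for _ in range(reps):
    x = rng.normal(true_mu, sigma, size=n)
    # t-based 95% interval for the mean (sigma treated as unknown)
    hw = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - hw <= true_mu <= x.mean() + hw)

print(covered / reps)  # close to 0.95: about 95% of the intervals cover the true mean
```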

Let the infinite collection of intervals be represented by 100 intervals for simplicity. Then 95 of these intervals will contain the true population parameter.

Suppose we obtained an interval (L,U) at the start of the above process. If I then ask what the probability is that this interval (L,U) contains the true population parameter, shouldn't it be 95/100 = 0.95? (Because this interval (L,U) could be any one of the 100, and it would contain the true population parameter if it were one of those 95.)

But this interpretation of a confidence interval is considered incorrect. Can someone explain why this is so?

5 Answers


For an analogy, consider the following game. Alice pays Bob five dollars to flip a fair coin. If the coin lands heads, Alice wins ten dollars; if the coin lands tails, Alice wins nothing. Let $W$ be the random variable representing Alice's net winnings. Consider the question, "Did Alice win five dollars?" (i.e. "Is $W = +5$?")

Now:

  • before Bob flips the coin, we have: $$P(W = +5) = P(W = -5) = 0.5.$$ So the answer is Yes with probability $0.5$.

But,

  • after Bob flips it, the coin either came up heads, or it came up tails. So $W$ is now either equal to $+5$, or not. The answer is now Yes either with probability $1$, or probability $0$.

This is the case generally: the act of performing an experiment changes probabilities to certainties. Whatever likelihood we assign to an event happening or not happening beforehand ceases to matter after the experiment has been performed; the event either did actually happen, or did not actually happen.

Similarly for your question about 95% confidence intervals. When we ask the question, "Does the 95% confidence interval $(L, U)$ contain the true population parameter?" where $L, U$ are the random variables representing the lower and upper endpoints of the interval, then before we take our sample, the answer is Yes with probability $0.95$.

But after we take our sample, $L$ and $U$ are no longer random variables, but have taken specific numerical values. Once the sample is taken and the endpoints are calculated, either $(L, U)$ actually contains the true population parameter, or does not actually contain the true population parameter. So the probability of the answer being Yes is now either $1$ (if it does contain the true parameter) or $0$ (if it does not).
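The same point can be made concrete with a short sketch (the normal population below is assumed purely for illustration; in practice the true mean is of course unknown): once the sample is drawn, $L$ and $U$ are plain numbers, and whether they bracket the true parameter is a fact, not a 95% event.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mu, sigma, n = 10.0, 2.0, 30  # assumed values; true_mu is known only to the simulation

x = rng.normal(true_mu, sigma, size=n)
hw = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
L, U = x.mean() - hw, x.mean() + hw

# L and U are now fixed numbers; coverage is either True or False, never "0.95"
print((round(L, 2), round(U, 2)), L <= true_mu <= U)
```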


I think a better way to conceptualize confidence intervals (in the frequentist sense) is to first go back to point estimates.

Suppose we calculate a point estimate $W$ for a fixed but unknown parameter $\theta$. This value of $W$ is a statistic: it is a random variable whose value is calculated from the sample $(X_1, \ldots, X_n)$, and does not depend on any unknown parameters. It is random in the sense that it inherits the randomness from the sample, not that the calculation of $W$ from the sample is itself random. For example, we can write a specific formula or rule that calculates $W$ when we have observed $X_1, \ldots, X_n$, but each time we collect a sample, the resulting $W$ we calculate may vary from sample to sample.

As such, we do not have any difficulty understanding that $W$ is an estimate, rather than the true value of $\theta$, which remains unknown to us. We could collect many samples and calculate many different estimates. A histogram of these estimates would show what is called the sampling distribution of $W$, and if the estimator is a "good" one, most outcomes will tend to "cluster" around the true value of $\theta$.

Now, when we calculate a confidence interval, the idea is to move away from point estimation and talk about pairs of random variables that enclose a range of values that estimate $\theta$. For instance, if we collected ten point estimates and they were $$\{4.2, 4.9, 3.9, 3.75, 4.1, 4.3, 4.45, 3.95, 4.05, 4.5\},$$ this gives us some idea of $\theta$. But ten confidence intervals might look like $$\{(3.7, 4.5), (3.85, 4.65), (4.0, 4.8), (3.9, 4.7), (4.1, 4.9), \\ (4.3, 5.1), (3.35, 4.1), (3.6, 4.4), (4.2, 5.0), (4.25, 5.05)\}.$$ Each time we collect a sample, we calculate two statistics--one for the lower endpoint and one for the upper, with the understanding that their difference incorporates in some sense the underlying variability observed in the sample. But how do we interpret this interval? What does "$95\%$ confidence" mean?
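As a quick illustration of those varying endpoints, a sketch like the following (with an invented normal population) produces ten such realized intervals, one pair of endpoint statistics per sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta, sigma, n = 4.2, 0.6, 25  # assumed true parameter and population spread

for _ in range(10):
    x = rng.normal(theta, sigma, size=n)
    hw = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    # each sample yields its own lower and upper endpoint statistics
    print(f"({x.mean() - hw:.2f}, {x.mean() + hw:.2f})")
```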

The idea is that in order for us to calculate an interval estimate, we not only need a measure of the variability in the sample, but we also need to set a criterion called "confidence" that expresses how tolerant we are of the possibility that our interval might fail to enclose the true value of $\theta$. For example, if we want $99.9\%$ confidence, this means we want the chance that the resulting interval encloses $\theta$ to be at least that high. Such an interval will be at least as wide as (and generally speaking much wider than) an interval with only $90\%$ confidence.

So why don't we ask for $100\%$ confidence intervals? Because except in trivial cases, to be $100\%$ confident you "caught" $\theta$ in your interval, you'd need an infinitely large interval, thus negating the value of computing an estimate at all.
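The trade-off is easy to see numerically. Holding the sample fixed, a t-based interval for the mean gets wider as the confidence level rises (the sample size and standard deviation below are just assumed numbers):

```python
import numpy as np
from scipy import stats

n, s = 30, 2.0  # assumed sample size and sample standard deviation

for conf in (0.90, 0.95, 0.99, 0.999):
    hw = stats.t.ppf((1 + conf) / 2, df=n - 1) * s / np.sqrt(n)
    print(f"{conf:.3f} confidence: half-width = {hw:.3f}")
# the half-width grows with the confidence level; 100% confidence would need an infinite interval
```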

For American audiences, there is a game show called "The Price is Right." One of the games played with contestants is called the Range Game. The host shows the contestant a car, and the contestant needs to guess the price of the car in order to win it. But the guess isn't a point estimate: the contestant doesn't have to guess the exact value. Instead, they watch a transparent red slider of fixed width move steadily upward over a vertical chart of prices, and they press a button to stop the slider when they think it is covering the true price of the car. Once it is stopped, the edges of the slider indicate the range of prices that the contestant believes contains the true price.

This game is exactly analogous to how confidence intervals work. The true price is fixed but unknown to the contestant. The act of pressing the button is equivalent to calculating the confidence interval. If you were to play this game with the same car but with many different contestants, each one would stop the slider at a slightly different point. Not all of the contestants would win; some would miss. The proportion of contestants who win corresponds to the confidence level. If the game show made the slider huge, it would be easier to win, but the precision of the price estimate would suffer. Conversely, if the slider were made tiny, it would be hard to win, but the estimate would have better precision.

heropup

Look at the problem this way: $L$ and $U$ are fixed numbers you know, since you have calculated them. They are non-random given your sample. Now the true population parameter, let's call it $\theta$, may be unknown to you, but it is again a fixed, non-random number. So you have three fixed numbers $L$, $U$ and $\theta$, and you ask: what is the probability that $\theta\in[L,U]$? This is really easy to answer: either $\theta$ is in the interval, in which case the probability is one, or it isn't, in which case the probability is zero.

g g

The explanation given by @g g is what most people give, but I am actually not convinced. I am using the following analogy: say the NYC gender distribution of male vs. female is 60% : 40%. Before I draw a random person from NYC, I know there is a 60% chance that the draw is a male. This is analogous to saying: before I run my estimation, I know that 95% of the time the confidence interval that comes out of it will cover the truth. Now the question asks: given a person that was sampled from the NYC population, what is the probability that this person is male? Surely the person is either male or female, but my estimate is still 60% male before I know the true answer. I don't see why conditioning on the sample changes that probability.

Jing

In the frequentist approach, which is the setting of classical confidence intervals, intervals may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice. By contrast, Bayesian thinking permits the common-sense interpretation of a statistical conclusion: a Bayesian credible interval for an unknown quantity of interest can be regarded directly as having a high probability of containing the unknown quantity.
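For instance, here is a minimal sketch of a credible interval for a normal mean with known standard deviation and a conjugate normal prior (all numbers are assumptions for illustration). Under that model, "$\theta$ lies in $(L, U)$ with probability $0.95$" is a direct probability statement about $\theta$:

```python
import numpy as np
from scipy import stats

# Assumed setup: normal data with known sigma, conjugate normal prior on the mean.
rng = np.random.default_rng(3)
sigma, n = 2.0, 30
prior_mean, prior_sd = 0.0, 10.0      # a vague normal prior
x = rng.normal(10.0, sigma, size=n)   # the "observed" data

# Conjugate update: the posterior for the mean is again normal.
post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
post_mean = post_var * (prior_mean / prior_sd**2 + x.sum() / sigma**2)
L, U = stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=np.sqrt(post_var))

# Under this model, P(theta in (L, U) | data) = 0.95 is a statement about theta itself.
print(L, U)
```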

A similar comment applies to the comparison of Bayesian and frequentist approaches to hypothesis testing.

tommik