On page 482 of Statistical Inference (Second Edition) by Casella & Berger, the authors define the breakdown value as follows:

Definition 10.2.2 Let $X_{(1)} < \dots < X_{(n)}$ be an ordered sample of size $n$, and let $T_n$ be a statistic based on the sample. $T_n$ has a breakdown value $b$, $0 \leq b \leq 1$, if, for every $\epsilon > 0$,

$\lim_{X_{(\{(1-b)n\})} \rightarrow \infty} T_n < \infty$ and $\lim_{X_{(\{(1-(b+\epsilon))n\})} \rightarrow \infty} T_n = \infty$

where the curly brackets $\{\cdot \}$ indicate rounding to the closest integer.

Now on the next page Casella & Berger state that the breakdown value of the mean is $0$, which is generally accepted, I think. But if I apply the definition, both of the limits would go to infinity, would they not?
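To spell out the computation for the sample mean (a sketch only, assuming the limit sends $X_{(k)}$ to infinity while the smaller order statistics stay fixed; that reading is an assumption, not something the book states): write $k = \{(1-b)n\}$ and suppose $k \geq 1$. Because the sample is ordered, $X_{(j)} \geq X_{(k)}$ for every $j \geq k$, so

$$ T_n = \frac{1}{n} \sum_{i=1}^{n} X_{(i)} \geq \frac{1}{n} \left( \sum_{i=1}^{k-1} X_{(i)} + (n-k+1) X_{(k)} \right) \rightarrow \infty \quad \text{as } X_{(k)} \rightarrow \infty $$

Under this reading the first limit is infinite for every $b$ with $k \geq 1$, including $b = 0$, where $k = n$.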

I would appreciate it if anybody could point out the error in my understanding or provide a different formal definition. I am aware that, very generally speaking, the breakdown value is the proportion of the sample that can be changed without the statistic becoming arbitrarily large.

BBB

1 Answer

I have never seen this notion of breakdown point before. Read literally, it may need to be extended so that $b = 0$ when no $b > 0$ satisfies the first limit. Another notion of breakdown point, which is (as far as I know) more common in the literature, was proposed by Donoho here. Let $X_n = (x_1, \dots, x_n)$ denote a fixed sample of $n$ points and let $X'_n$ denote an $\epsilon$-corrupted sample, obtained by replacing an $\epsilon$ proportion of the points of $X_n$ with arbitrary values. For a statistic $T$, define the largest bias caused by $\epsilon$-corruption as

$$ b(\epsilon; X_n, T) = \sup |T(X'_n) - T(X_n)| $$

where the supremum is taken over all possible $\epsilon$-corrupted samples $X'_n$. The breakdown point is then

$$ \epsilon^*(X_n, T) = \inf \{ \epsilon : b(\epsilon; X_n, T) = \infty \} $$

This definition can be generalized by using other distances between $T(X_n)$ and $T(X'_n)$; see for example the quantity defined in (2.2) here.
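The replacement definition above can be explored numerically. The following is only a rough sketch, not a faithful computation of the supremum: it assumes that pushing $m$ of the points to ever larger positive values is enough to detect an unbounded bias, which happens to work for the mean and the median but is not guaranteed for other statistics. The function names `breaks_down` and `empirical_breakdown`, the sample, and the magnitudes are all made up for illustration.

```python
import statistics


def breaks_down(stat, sample, m, magnitudes=(1e3, 1e6, 1e9)):
    """Heuristic: replace m points by increasingly large values and
    report whether the bias |T(X') - T(X)| keeps growing."""
    base = stat(sample)
    biases = []
    for big in magnitudes:
        # Assumption: corrupting the first m points with one large value
        # stands in for the supremum over all corrupted samples.
        corrupted = [big] * m + list(sample[m:])
        biases.append(abs(stat(corrupted) - base))
    # A bounded bias stays constant once `big` exceeds the clean data.
    return all(later > earlier for earlier, later in zip(biases, biases[1:]))


def empirical_breakdown(stat, sample):
    """Smallest fraction m/n of replaced points at which the bias grows."""
    n = len(sample)
    for m in range(1, n + 1):
        if breaks_down(stat, sample, m):
            return m / n
    return 1.0


sample = [2.1, 3.4, 1.7, 5.0, 4.2, 2.8, 3.9, 4.6, 1.2, 3.3]
print(empirical_breakdown(statistics.mean, sample))    # 0.1, i.e. 1/n
print(empirical_breakdown(statistics.median, sample))  # 0.5
```

For the mean, a single replaced point already makes the bias grow without bound, so the finite-sample value is $1/n$, which tends to $0$ as $n$ grows. For the median, about half of the sample has to be replaced.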

WeakLearner
  • That seems like a sensible definition to me, thank you.

    As for the extension of the Casella & Berger definition, I'm not sure it would entirely solve the problem. If we assume the breakdown value of the median is 0.5, as is commonly stated, the definition would still not hold, if I'm not mistaken. I think it would be appropriate to change the definition to $\lim_{X_{(\{(1-(b-\epsilon))n\})} \rightarrow \infty} T_n < \infty$ and $\lim_{X_{(\{(1-b)n\})} \rightarrow \infty} T_n = \infty$, and to extend the definition for $b = 0$ as you stated. I would be very curious to read your thoughts on this.

    – BBB Sep 23 '22 at 16:31